Hi,

I am experiencing a severe connection leak in my MR client, which uses
HBase as input/output. Every job that uses TableInputFormat leaks one
ZooKeeper connection per run, as evidenced by netstat.

As I understand it, HTable currently creates a new HBase connection
(and with it a new ZooKeeper connection) for each instance of
Configuration it is initialized with. Looking at the code of the
TableInputFormat class, I see that it creates an HTable in the front
end during configuration (presumably because it needs one to
determine the region splits).

Since I have to configure each job individually, I must create a new
instance of Configuration for each one. Thus I am not able to use
shared HBase connections (which I would prefer, but there seems to be
no way to do that now).
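
To make the behaviour I am describing concrete, here is a toy model of
it. This is only a sketch: none of these classes are HBase or Hadoop
classes. ToyConfiguration, ToyConnection and ToyConnectionRegistry are
made-up names, and caching one connection per Configuration instance
(by object identity) is my assumption about what the real connection
manager does.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Simulates three jobs, each configured with its own fresh configuration.
class LeakDemo {
    public static void main(String[] args) {
        ToyConnectionRegistry registry = new ToyConnectionRegistry();
        for (int i = 1; i <= 3; i++) {
            // One new configuration per job, as in my MR client.
            ToyConfiguration conf = new ToyConfiguration("job-" + i);
            registry.getConnection(conf);
            // The job finishes here and nothing releases the connection.
        }
        System.out.println("open connections: " + registry.openCount());
    }
}

// Hypothetical stand-in for a Hadoop Configuration; not the real class.
class ToyConfiguration {
    final String jobName;

    ToyConfiguration(String jobName) {
        this.jobName = jobName;
    }
}

// Hypothetical stand-in for an HBase connection plus its ZooKeeper session.
class ToyConnection {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    void close() {
        open = false;
    }
}

// Toy registry: one cached connection per configuration *instance*.
// Keying by identity is an assumption, not confirmed HBase behaviour.
class ToyConnectionRegistry {
    private final Map<ToyConfiguration, ToyConnection> cache =
            new IdentityHashMap<>();

    // Returns the cached connection for this instance, creating it if needed.
    ToyConnection getConnection(ToyConfiguration conf) {
        return cache.computeIfAbsent(conf, c -> new ToyConnection());
    }

    // Closes and forgets the connection for this exact instance, if any.
    boolean deleteConnection(ToyConfiguration conf) {
        ToyConnection conn = cache.remove(conf);
        if (conn == null) {
            return false;
        }
        conn.close();
        return true;
    }

    int openCount() {
        return cache.size();
    }
}
```

In this model the count of open connections grows by one with every
job, because each job brings its own configuration instance and nothing
ever releases the connection cached for it.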

So, after I run an MR job, its HBase connection seems to be leaked,
and the ZooKeeper connection with it. That is a problem because
ZooKeeper instances limit how many connections can be made from the
same IP, so eventually the client cannot create any new HTables
because it cannot establish any new ZooKeeper connections.

I tried to do explicit cleanup by calling
HConnectionManager.deleteConnection(Configuration), passing in the
configuration that I used to create the MR job. It doesn't seem to
work.

So, is there a way to run an MR job with TableInputFormat without
leaking a connection? I am pretty sure I am not creating any HTables
on the client side. Or is it a bug? I have spent several days
investigating this issue, but I still have not come up with a
workaround for the ZooKeeper connection leaks in HBase MR jobs.

Thank you very much.
-Dmitriy
