Hi,

I am experiencing a severe connection leak in my MR client, which uses HBase as input/output. Every job that uses TableInputFormat leaks one ZooKeeper connection per run, as evidenced by netstat.

As I understand it, HTable currently creates a new HBase connection (and with it a ZooKeeper connection) for each instance of Configuration it is initialized with. Looking at the code of the TableInputFormat class, I see that it creates an HTable in the front end during configuration (presumably it needs one to determine the region splits). Since I have to configure each job individually, I must create a new instance of Configuration for each one. Thus I cannot use shared HBase connections (which I would prefer, but there seems to be no way to do that now).

So after I run an MR job, the HBase connection seems to be leaked. The ZooKeeper connection leaks with it, which is a problem because ZooKeeper instances limit how many connections can be made from the same IP. Eventually the client cannot create any new HTables, because it cannot establish any new ZooKeeper connections.

I tried explicit cleanup by calling HConnectionManager.deleteConnection(Configuration), passing in the configuration I used to create the MR job. That doesn't seem to work.

So, is there a way to run an MR job with TableInputFormat without leaking a connection? I am pretty sure I am not creating any HTables on the client side. Or is it a bug? I have spent several days investigating this issue, but I still have not come up with a workaround for ZooKeeper connection leaks in HBase MR jobs.

Thank you very much.
-Dmitriy
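
P.S. To make the suspected behavior concrete, here is a minimal, self-contained sketch of a connection cache keyed by Configuration instance. It is only a toy model of what I described above, not HBase code: FakeConfiguration, FakeConnection and ConnectionCacheSketch are hypothetical names, and keying the cache by object identity is my assumption about how connections are tracked. It shows how one connection per job can accumulate, and why a cleanup call would do nothing if it were handed a different Configuration instance than the one the connection was opened with.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT the real HBase classes.
final class FakeConfiguration {}

final class FakeConnection {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

final class ConnectionCacheSketch {
    // Assumption: one cached connection per Configuration *instance* (identity, not content).
    private final Map<FakeConfiguration, FakeConnection> cache = new IdentityHashMap<>();

    // Returns the cached connection for this instance, opening a new one if none exists.
    FakeConnection getConnection(FakeConfiguration conf) {
        return cache.computeIfAbsent(conf, c -> new FakeConnection());
    }

    // Closes and forgets the connection for this exact instance; returns false if none was cached.
    boolean deleteConnection(FakeConfiguration conf) {
        FakeConnection conn = cache.remove(conf);
        if (conn == null) {
            return false;
        }
        conn.close();
        return true;
    }

    int openConnections() {
        return cache.size();
    }

    public static void main(String[] args) {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();

        // Each "job" is configured with its own fresh Configuration instance.
        for (int job = 0; job < 3; job++) {
            cache.getConnection(new FakeConfiguration());
        }
        System.out.println("open connections after 3 jobs: " + cache.openConnections());

        // Cleanup with an instance the cache has never seen finds nothing to close.
        boolean closed = cache.deleteConnection(new FakeConfiguration());
        System.out.println("cleanup with a different instance closed something: " + closed);
    }
}
```

If the real cache behaves anything like this, cleanup would only succeed when it is given the exact Configuration object that the HTable inside TableInputFormat was created with. I have not confirmed that this is what actually happens.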
