Thanks for reporting this, Shawn. Do you want to try out HBASE-4508, which is in HBase 0.90.5?

On Tue, Jan 24, 2012 at 9:15 AM, Shawn Quinn <[email protected]> wrote:
> Hello,
>
> Our application runs Map/Reduce tasks fairly frequently against HBase
> (Cloudera distribution 0.90.4), and we're making use of the default
> org.apache.hadoop.hbase.mapreduce.TableOutputFormat class for the reduce
> step, which TableMapReduceUtil.initTableReducerJob() sets up. We invoke
> the Map/Reduce tasks via the standard Hadoop Job API, but they're all
> triggered from the same virtual machine, which stays running (so we aren't
> shutting down the virtual machine after each job runs). We've been
> noticing that we run out of ZooKeeper connections in this
> configuration, and believe we've tracked the "leak" down to the
> TableOutputFormat class. Specifically, that class does the following:
>
>   public void setConf(Configuration otherConf) {
>     this.conf = HBaseConfiguration.create(otherConf);
>     String tableName = this.conf.get(OUTPUT_TABLE);
>     String address = this.conf.get(QUORUM_ADDRESS);
>     String serverClass = this.conf.get(REGION_SERVER_CLASS);
>     String serverImpl = this.conf.get(REGION_SERVER_IMPL);
>     try {
>       if (address != null) {
>         ZKUtil.applyClusterKeyToConf(this.conf, address);
>       }
>       if (serverClass != null) {
>         this.conf.set(HConstants.REGION_SERVER_CLASS, serverClass);
>         this.conf.set(HConstants.REGION_SERVER_IMPL, serverImpl);
>       }
>       this.table = new HTable(this.conf, tableName);
>       this.table.setAutoFlush(false);
>       LOG.info("Created table instance for " + tableName);
>     } catch(IOException e) {
>       LOG.error(e);
>     }
>   }
>
> I believe in previous releases of HBase this was different, but at some
> point the code to clone the configuration object (first line of that
> method) was added. Then, in that same method, when that code creates the
> HTable instance, the HTable internally gets a new connection to ZooKeeper
> every time (since the configuration instance is different).
>
> I believe I can get around this in my application by creating a custom
> TableOutputFormat. However, can anyone confirm whether this is indeed a
> problem, or whether there is some other way to work around the default
> TableOutputFormat class creating a new connection to ZooKeeper every time
> it runs?
>
> Thanks,
>
> -Shawn
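
To make the mechanism Shawn describes concrete, here is a minimal, self-contained sketch. It is not HBase code: ConnectionCacheSketch, FakeConnection, and both methods are hypothetical names, and java.util.Properties stands in for the Hadoop Configuration. It assumes, as described above, that the connection cache is keyed on the configuration instance, so every cloned configuration opens a new connection. The second method shows one possible alternative, keying the cache on the connection-relevant property values; that is an illustration only, not a description of what HBASE-4508 actually changes.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical model of a connection cache; not part of HBase.
final class ConnectionCacheSketch {

    /** Stand-in for a connection that would hold a ZooKeeper session. */
    static final class FakeConnection {}

    /** Copies the configuration, as setConf() does with the job's Configuration. */
    static Properties copyOf(Properties base) {
        Properties clone = new Properties();
        clone.putAll(base);
        return clone;
    }

    /**
     * Cache keyed on the configuration instance (assumed behavior).
     * IdentityHashMap is used because Properties compares by content,
     * whereas the assumption here is that the key compares by reference.
     */
    static int connectionsWithIdentityKey(Properties base, int jobs) {
        Map<Properties, FakeConnection> cache = new IdentityHashMap<>();
        for (int i = 0; i < jobs; i++) {
            Properties clone = copyOf(base); // a fresh instance per job
            if (!cache.containsKey(clone)) {
                cache.put(clone, new FakeConnection()); // never reused
            }
        }
        return cache.size();
    }

    /** Cache keyed on the property values that identify the cluster. */
    static int connectionsWithValueKey(Properties base, int jobs) {
        Map<String, FakeConnection> cache = new HashMap<>();
        for (int i = 0; i < jobs; i++) {
            Properties clone = copyOf(base);
            String key = clone.getProperty("hbase.zookeeper.quorum") + ":"
                    + clone.getProperty("hbase.zookeeper.property.clientPort");
            cache.putIfAbsent(key, new FakeConnection()); // shared across clones
        }
        return cache.size();
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("hbase.zookeeper.quorum", "zk1.example.com");
        base.setProperty("hbase.zookeeper.property.clientPort", "2181");
        System.out.println("identity-keyed: " + connectionsWithIdentityKey(base, 5));
        System.out.println("value-keyed:    " + connectionsWithValueKey(base, 5));
    }
}
```

With five simulated jobs, the identity-keyed cache ends up holding five connections and the value-keyed cache holds one, which matches the pattern of a long-running JVM accumulating one ZooKeeper connection per job.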
