Yeah, for a JVM that runs forever it won't work. If you know for a fact that the configuration passed to TIF (TableInputFormat) won't be changed, then you can subclass it and override setConf to not clone the conf.
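Such a subclass might look like the minimal sketch below (against the 0.90 API; the class name is made up, and note the stock setConf also builds the Scan from the conf's SCAN* properties, which would have to be copied into the override as well):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

// Sketch: like TableInputFormat, but hands the caller's Configuration
// straight to HTable instead of a clone, so the connection stays keyed
// on an object the caller still holds and can close later via
// HConnectionManager.deleteConnection(conf, true).
public class NonCloningTableInputFormat extends TableInputFormat {

  private Configuration conf;

  @Override
  public Configuration getConf() {
    return conf;
  }

  @Override
  public void setConf(Configuration configuration) {
    this.conf = configuration;
    try {
      // No `new Configuration(conf)` here -- that clone is what creates
      // an HConnection/ZK session the caller can never look up again.
      setHTable(new HTable(conf, conf.get(INPUT_TABLE)));
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
    // Omitted for brevity: the Scan setup the real setConf performs.
  }
}
```

This is only safe if nothing mutates the conf while the job runs, which is the caveat above.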
J-D

On Wed, Apr 13, 2011 at 12:45 PM, Ruben Quintero <[email protected]> wrote:
> The problem is the connections are never closed... so they just keep piling
> up until it hits the max. My max is at 400 right now, so after 14-15 hours
> of running, it gets stuck in an endless connection retry.
>
> I saw that the HConnectionManager will kick older HConnections out, but the
> problem is that their ZooKeeper threads continue on. Those need to be
> explicitly closed.
>
> Again, this is only an issue inside JVMs set to run forever, like Venkatesh
> said, because that's when the orphaned ZK connections will have a chance to
> build up to whatever your maximum is. Setting that higher and higher is just
> prolonging uptime before the eventual crash. It's essentially a memory
> (connection) leak within TableInputFormat, since there is no way that I can
> see to properly access and close those spawned connections.
>
> One question for you, J-D: inside of TableInputFormat.setConf, does the
> Configuration need to be cloned? (i.e. setHTable(new HTable(new
> Configuration(conf), tableName));). I'm guessing this is to prevent changes
> within the job from affecting the table and vice-versa... but if it weren't
> cloned, then you could use the job configuration (job.getConfiguration()) to
> close the connection...
>
> Other quick fixes that I can think of, none of which are very pretty:
> 1 - Just call deleteAllConnections(bool), and have any other processes using
> HConnections recover from that.
> 2 - Make the static HBASE_INSTANCES map accessible (public)... then you
> could iterate through open connections and try to match configs...
>
> Venkatesh - unless you have other processes in your JVM accessing HBase (I
> have one), #1 might be the best bet.
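For reference, quick fix #1 above might look like the sketch below (JobRunner and runAndCleanUp are hypothetical names; the cost is exactly the disruption described, since every other HConnection user in the JVM loses its connection and must be able to re-open it):

```java
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.mapreduce.Job;

// Sketch of quick fix #1: after each internally-triggered MapReduce job,
// drop every cached HConnection (and its ZooKeeper thread) so they cannot
// pile up in a long-lived JVM.
public class JobRunner {
  public static boolean runAndCleanUp(Job job) throws Exception {
    try {
      return job.waitForCompletion(true);
    } finally {
      // 0.90 API: the boolean also stops the underlying RPC proxies.
      HConnectionManager.deleteAllConnections(true);
    }
  }
}
```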
> - Ruben
>
> ________________________________
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Sent: Wed, April 13, 2011 3:22:48 PM
> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>
> Like I said, it's a zookeeper configuration that you can change. If
> hbase is managing your zookeeper, then set
> hbase.zookeeper.property.maxClientCnxns to something higher than 30
> and restart the zk server (this can be done while hbase is running).
>
> J-D
>
> On Wed, Apr 13, 2011 at 12:04 PM, Venkatesh <[email protected]> wrote:
>> Ruben:
>> Yes.. I have the exact same issue now.. & I'm also kicking off from
>> another jvm that runs forever..
>> I don't have an alternate solution.. either modify hbase code (or) modify
>> my code to kick off as a standalone jvm (or) hopefully 0.90.3 releases
>> soon :)
>> J-D/St.Ack may have some suggestions
>>
>> V
>>
>> -----Original Message-----
>> From: Ruben Quintero <[email protected]>
>> To: [email protected]
>> Sent: Wed, Apr 13, 2011 2:39 pm
>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>>
>> The problem I'm having is in getting the conf that is used to init the
>> table within TableInputFormat. That's the one that's leaving open ZK
>> connections for me.
>>
>> Following the code through, TableInputFormat initializes its HTable with
>> new Configuration(new JobConf(conf)), where conf is the config I pass in
>> via job initiation. I don't see a way of getting the initialized
>> TableInputFormat in order to then get its table and its config to be able
>> to properly close that connection. Cloned configs don't appear to produce
>> similar hashes, either. The only other option I'm left with is closing all
>> connections, but that disrupts things across the board.
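Concretely, for an HBase-managed zookeeper that setting goes in hbase-site.xml (the value 300 below is only an example; pick whatever headroom you need):

```xml
<!-- hbase-site.xml: raise the per-host ZooKeeper client connection cap.
     The thread above suggests the effective 0.90 limit is around 30,
     and each new Configuration costs one connection. -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>300</value>
</property>
```

For an externally managed zookeeper, the equivalent is `maxClientCnxns=300` in zoo.cfg.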
>> For MapReduce jobs run in their own JVM, this wouldn't be much of an
>> issue, as the connection would just be closed on completion, but in my
>> case (our code triggers the jobs internally), they simply pile up until
>> the ConnectionLoss hits due to too many ZK connections.
>>
>> Am I missing a way to get that buried table's config, or another way to
>> kill the orphaned connections?
>>
>> - Ruben
>>
>> ________________________________
>> From: Venkatesh <[email protected]>
>> To: [email protected]
>> Sent: Wed, April 13, 2011 10:20:50 AM
>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>>
>> Thanks J-D
>> I made sure to pass conf objects around in places where I wasn't..
>> will give it a try
>>
>> -----Original Message-----
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Tue, Apr 12, 2011 6:40 pm
>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>>
>> Yes, there are a few places like that. Also, when you create new HTables,
>> you should close their connections as well (this is not done in
>> htable.close).
>>
>> See HTable's javadoc, which says:
>>
>> Instances of HTable passed the same Configuration instance will share
>> connections to servers out on the cluster and to the zookeeper ensemble,
>> as well as caches of region locations. This is usually a *good* thing.
>> This happens because they will all share the same underlying HConnection
>> instance. See HConnectionManager for more on how this mechanism works.
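The sharing that javadoc describes might look like this in practice (a sketch; the table names are invented, and the final deleteConnection call is the part that htable.close does not do in 0.90):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTable;

public class SharedConnectionExample {
  public static void main(String[] args) throws Exception {
    // One Configuration instance == one HConnection == one ZK session.
    Configuration conf = HBaseConfiguration.create();
    HTable users  = new HTable(conf, "users");   // opens the shared connection
    HTable events = new HTable(conf, "events");  // reuses it, no new ZK session
    try {
      // ... reads and writes against both tables ...
    } finally {
      users.close();
      events.close();
      // close() flushes write buffers but does NOT release the HConnection:
      HConnectionManager.deleteConnection(conf, true);
    }
  }
}
```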
>> and it points to HCM, which has more information:
>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HConnectionManager.html
>>
>> J-D
>>
>> On Tue, Apr 12, 2011 at 3:09 PM, Ruben Quintero <[email protected]> wrote:
>>> I'm running into the same issue, but did some poking around, and it seems
>>> that ZooKeeper connections are being left open by an HBase internal.
>>>
>>> Basically, I'm running a mapreduce job within another program, and
>>> noticed in the logs that every time the job is run, a connection is
>>> opened, but I never see it closed again. The connection is opened within
>>> job.submit().
>>>
>>> I looked closer and checked the jstack after running it for just under an
>>> hour, and sure enough there are a ton of ZooKeeper threads just sitting
>>> there. Here's a pastebin link: http://pastebin.com/MccEuvrc
>>>
>>> I'm running 0.90.0 right now.
>>>
>>> - Ruben
>>>
>>> ________________________________
>>> From: Jean-Daniel Cryans <[email protected]>
>>> To: [email protected]
>>> Sent: Tue, April 12, 2011 4:23:05 PM
>>> Subject: Re: hbase -0.90.x upgrade - zookeeper exception in mapreduce job
>>>
>>> It's more in the vein of
>>> https://issues.apache.org/jira/browse/HBASE-3755 and
>>> https://issues.apache.org/jira/browse/HBASE-3771
>>>
>>> Basically, 0.90 has a regression in the handling of zookeeper connections
>>> that means you have to be super careful not to have more than 30 per
>>> machine (each new Configuration is one new ZK connection).
>>> Upping your zookeeper max connection config should get rid of your issues
>>> since you only get it occasionally.
>>>
>>> J-D
>>>
>>> On Tue, Apr 12, 2011 at 7:59 AM, Venkatesh <[email protected]> wrote:
>>>>
>>>> I get this occasionally.. (not all the time).. Upgrading from 0.20.6 to
>>>> 0.90.2
>>>> Is this issue the same as this JIRA?
>>>> https://issues.apache.org/jira/browse/HBASE-3578
>>>>
>>>> I'm using HBaseConfiguration.create() & setting that in the job
>>>> thx
>>>> v
>>>>
>>>> 2011-04-12 02:13:06,870 ERROR Timer-0
>>>> org.apache.hadoop.hbase.mapreduce.TableInputFormat -
>>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>>> org.apache.hadoop.hbase.ZooKeeperConnectionException:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /hbase
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:1000)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:303)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:294)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
>>>>   at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
>>>>   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
>>>>   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>>>>   at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:882)
>>>>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
>>>>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>>>>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:448)
