Hey Jack,

Seems like you're getting a lot of strange ZooKeeper behavior.
How many nodes are you running with in your quorum? Do you have any weird networking issues? Check out the ZK server logs as well and see if there's anything suspicious going on in there.

Also, if you enable ZK debug on the HBase side, you'll see all the session ids of these clients that seem to be out of sync. You can see which server each one gets connected to, match it up with that server's logs, and try to figure out if there's anything in common with all these clients getting odd stuff out of ZK.

JG

> -----Original Message-----
> From: Jack Levin [mailto:[email protected]]
> Sent: Friday, October 22, 2010 12:23 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: large store file split
>
> Yes exactly
>
> -Jack
>
>
> On Oct 22, 2010, at 10:49 AM, Stack <[email protected]> wrote:
>
> > Thats all that is in the log file? You run at DEBUG level, right?
> > Was that regionserver working fine otherwise? Just failing the split
> > because couldn't "find" root?
> >
> > St.Ack
> >
> > On Fri, Oct 22, 2010 at 10:39 AM, Jack Levin <[email protected]> wrote:
> >> Everything else is humming along nicely... regions are loaded, and
> >> there are no issues.
> >>
> >> -Jack
> >>
> >> PS. I was able to split it finally by doing split 'table' a couple of times.
> >>
> >> On Fri, Oct 22, 2010 at 10:26 AM, Stack <[email protected]> wrote:
> >>> The root region is not on line according to the below. Is that the case?
> >>> St.Ack
> >>>
> >>> On Fri, Oct 22, 2010 at 8:47 AM, Jack Levin <[email protected]> wrote:
> >>>> I am trying to split a 20G regionfile, and getting timeouts see below:
> >>>>
> >>>> 2010-10-22 08:41:44,851 INFO
> >>>> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split
> >>>> of region
> >>>> test_bulk7_tsv,ds18115092010.th.jpg,1287730617803.07eb62bf729e1f9cbb39e8fbefe2a1e0.
> >>>> 2010-10-22 08:44:06,065 INFO
> >>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Running
> >>>> rollback of failed split of
> >>>> test_bulk7_tsv,ds18115092010.th.jpg,1287730617803.07eb62bf729e1f9cbb39e8fbefe2a1e0.;
> >>>> Timed out trying to locate root region
> >>>> 2010-10-22 08:44:06,066 INFO
> >>>> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Successful
> >>>> rollback of failed split of
> >>>> test_bulk7_tsv,ds18115092010.th.jpg,1287730617803.07eb62bf729e1f9cbb39e8fbefe2a1e0.
> >>>>
> >>>>
> >>>> Is there a way to adjust some parameters to have this finish?
> >>>>
> >>>> -Jack
> >>>>
> >>>
> >>
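
For matching session ids to servers as suggested at the top of this message, here is a rough Python sketch. It assumes the ZooKeeper client logger (the org.apache.zookeeper package) is logging its connect messages, and that those messages look like "Session establishment complete on server HOST/IP:PORT, sessionid = 0x..."; that wording is an assumption about your ZK version, so adjust the regex to whatever your logs actually print. The function name and the sample lines are made up for illustration and are not from this thread.

```python
import re
from collections import defaultdict

# Assumed shape of the ZooKeeper client's connect message; adjust to your logs.
SESSION_RE = re.compile(
    r"Session establishment complete on server (?P<server>\S+), "
    r"sessionid = (?P<sid>0x[0-9a-fA-F]+)"
)


def sessions_by_server(lines):
    """Group session ids by the ZK server each client session connected to."""
    grouped = defaultdict(set)
    for line in lines:
        match = SESSION_RE.search(line)
        if match:
            grouped[match.group("server")].add(match.group("sid"))
    return dict(grouped)


# Hypothetical sample lines (hosts and session ids are invented).
SAMPLE_LINES = [
    "2010-10-22 08:40:01,100 INFO org.apache.zookeeper.ClientCnxn: "
    "Session establishment complete on server zk1.example.com/10.0.0.1:2181, "
    "sessionid = 0x12bd4e3f0a10001, negotiated timeout = 40000",
    "2010-10-22 08:40:02,200 INFO org.apache.zookeeper.ClientCnxn: "
    "Session establishment complete on server zk2.example.com/10.0.0.2:2181, "
    "sessionid = 0x22bd4e3f0a10001, negotiated timeout = 40000",
    "2010-10-22 08:40:03,300 INFO org.apache.zookeeper.ClientCnxn: "
    "Session establishment complete on server zk1.example.com/10.0.0.1:2181, "
    "sessionid = 0x12bd4e3f0a10002, negotiated timeout = 40000",
    "2010-10-22 08:41:44,851 INFO some.other.Logger: unrelated line",
]

for server, sids in sorted(sessions_by_server(SAMPLE_LINES).items()):
    print(server, sorted(sids))
```

Feed it the regionserver and client logs, then grep each ZK server's own log for the session ids listed under it. If the clients seeing odd results all land on the same server, that server is the one to look at first.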
