Stuart,

I already answered this in your last series of emails some hours ago, and
I still stand behind my observations:

So the first thing you saw in the master is that it was splitting the
HLogs from the dead region servers (it looks like they were kill -9'd or
failed very hard during/after your mapreduce job). As for all the HDFS
error messages, it sounds like you are missing the last requirement on
your cluster:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
(either ulimit or xcievers isn't configured correctly). What probably
happened is that the datanodes' xcievers were exhausted while the master
was splitting logs, but you can verify that by grepping for either
"Too many open files" or "xceiverCount" in the datanode logs.

J-D

On Wed, Jul 14, 2010 at 4:44 PM, Stuart Smith <stu24m...@yahoo.com> wrote:
> Hello All,
>
> Ok, I'm seeing similar behavior to what I mentioned yesterday, but the cause 
> seems to be hbase not handling the Delete/Put load.
>
> I have:
>
>  - A small cluster with 4 1950s as regionservers & 1 M600 as zookeeper & 
> master. NTP is active, and the datetimes are in sync.
>
>  - 1 java client with about 32 threads all uploading files to hbase. Pretty
> slow though, only 1-4 files a second; this is due to non-hbase issues.
>
>  -  An M/R task that is iterating over a table, gathering data about each
> row, and deleting rows with improperly formed keys. There are about 40
> million rows, and probably around 7-10 million improperly formed keys.
>
> I do the validation & delete in the Map task. My connection to hbase is 
> initialized in the Map class constructor. I noticed some map tasks are 
> failing due to being unable to connect (to namenode proxy, or something). The 
> number of failures is low compared to the total number of map tasks, so it's 
> not every connection that's failing (not a network visibility issue).
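
A minimal sketch of that kind of map task, for reference, against the 0.20-era
client API. The table name and the key-validity check below are placeholders,
and it opens the HTable in setup() rather than the mapper's constructor (same
effect: one client connection per map task):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

/** Map-only pass over a table: validate each row key, delete the bad ones. */
public class KeyCleanupMapper extends TableMapper<NullWritable, NullWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // "filestore" is a placeholder table name.
    table = new HTable(new HBaseConfiguration(), "filestore");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result columns, Context context)
      throws IOException, InterruptedException {
    String key = Bytes.toString(row.get());
    // ... gather whatever per-row data is needed here ...
    if (!isWellFormed(key)) {
      // Each delete is a separate RPC to the regionserver hosting the row.
      table.delete(new Delete(row.get()));
    }
  }

  /** Placeholder check; the real validation rule lives in the actual job. */
  private boolean isWellFormed(String key) {
    return key.length() == 64 && key.matches("[0-9a-f]+");
  }
}

Since each delete is its own RPC, clearing 7-10 million bad keys is a
substantial write load on the regionservers by itself, on top of the uploader
threads.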
>
> The hbase master eventually went down, and I restarted everything. Hbase now 
> takes a long time coming back, and appears to be re-generating a lot of edits 
> trying to recover. However, as it goes through the recovery process, region 
> servers just start dying off, and I have to continuously restart them.
>
> Is this known behavior? Is hbase just kind of flaky on recovery at this 
> point? Or did I do something wrong? Should I just not push it that hard with 
> inserts & deletes?
>
> I went through this yesterday, and it did eventually stabilize and come
> back, but it takes several hours of sitting around and babysitting the
> cluster.
>
> I'm wondering if it would be bad to just write a cron job that calls my "log 
> on and try to start all regionservers" script. Is there anything bad about 
> trying to start regionservers via ./hbase start regionserver every minute or 
> so? They die off pretty quick.
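
Nothing stops you from cron'ing it, a sketch of which is below (the install
path and interval are guesses, and hbase-daemon.sh should refuse to start a
second instance if its pid file says one is already running), but it only
papers over whatever is actually killing the regionservers:

  # hypothetical crontab entry for the user hbase runs as
  */5 * * * * /opt/hbase/bin/hbase-daemon.sh start regionserver >> /tmp/rs-restart.log 2>&1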
>
> Take care,
>  -stu
>
>
> --- On Tue, 7/13/10, Stuart Smith <stu24m...@yahoo.com> wrote:
>
>> From: Stuart Smith <stu24m...@yahoo.com>
>> Subject: Re: Trying to write too much to stdout destabilizes cluster across 
>> reboots
>> To: user@hbase.apache.org
>> Date: Tuesday, July 13, 2010, 9:38 PM
>> Last follow up:
>>
>> It appears just rebooting, babysitting regionservers,
>> watching it die.. and repeating the process eventually
>> recovered the cluster.
>>
>> I still haven't gone through all the errors very carefully.
>> I did have hundreds of threads of hbase clients uploading
>> data while I was running my M/R task that dumped the output.
>> Based on the logs, I'm wondering if it was just getting
>> swamped trying to recover & rebuild all the writes? Not
>> sure.
>>
>> Well, it worked eventually, so I'm going to just upgrade to
>> 0.20.5 ASAP, and try to be more careful about how much I
>> write to stdout.
>>
>> Take care,
>>   -stu
>>
>> --- On Tue, 7/13/10, Stuart Smith <stu24m...@yahoo.com>
>> wrote:
>>
>> > From: Stuart Smith <stu24m...@yahoo.com>
>> > Subject: Re: Trying to write too much to stdout destabilizes cluster
>> > across reboots
>> > To: user@hbase.apache.org
>> > Date: Tuesday, July 13, 2010, 9:10 PM
>> > One follow up.
>> >
>> > I noticed the master was saying this as well:
>> >
>> > 2010-07-13 17:48:17,595 WARN org.apache.hadoop.hbase.master.HMaster: Processing pending operations: ProcessRegionClose of filestore,5c4ee356835d82de844600895a7df1595c8b53bd378436f6e8d2a1a99acb550c,1279043794914, false, reassign: true
>> > java.lang.RuntimeException: java.lang.NullPointerException
>> >     at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:96)
>> >     at org.apache.hadoop.hbase.master.ProcessRegionClose.process(ProcessRegionClose.java:63)
>> >     at org.apache.hadoop.hbase.master.HMaster.processToDoQueue(HMaster.java:494)
>> >     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:429)
>> > Caused by: java.lang.NullPointerException
>> >
>> > Which led me to this:
>> >
>> > https://issues.apache.org/jira/browse/HBASE-2428
>> >
>> > I guess I shouldn't have put off upgrading to 0.20.5 :(
>> > I'm curious if, by this comment:
>> >
>> > "Since delayqueue is processed first, all we did was try to process the
>> > close even though in the other more live todo queue, we might have the
>> > process of the regionserver shutdown waiting, the process of the
>> > regionserver that was carrying .meta. We'd never recover."
>> >
>> > they mean never as in _never_, or just until the next reboot. Or is my
>> > cluster (and my data) dead and gone now?
>> >
>> > Take care,
>> >   -stu
>> >
>> > --- On Tue, 7/13/10, Stuart Smith <stu24m...@yahoo.com>
>> > wrote:
>> >
>> > > From: Stuart Smith <stu24m...@yahoo.com>
>> > > Subject: Trying to write too much to stdout destabilizes cluster
>> > > across reboots
>> > > To: user@hbase.apache.org
>> > > Date: Tuesday, July 13, 2010, 8:58 PM
>> > > Hello,
>> > >
>> > > I have an M/R task running over hbase tables. I added one line of
>> > > output in my reduce tasks running over a set of tens of millions of
>> > > records. This was a little silly, but I was kind of curious what would
>> > > happen.
>> > >
>> > > Well, the reduce ran very slowly, and eventually I had to kill the
>> > > tasks and re-start the whole cluster, as other clients were slowing up
>> > > considerably and the M/R job started dumping all the stdout directly
>> > > to the console it was running on.
>> > >
>> > > I've re-started it twice, but I couldn't get hbase to come back up
>> > > (fully). And when I shut down, the region servers will go down when I
>> > > tell them, but the master needs to be killed with -s 9 (SIGKILL? I'm a
>> > > little rusty). It eventually seemed to hobble towards coming back
>> > > up.. it took a looong time, with messages like:
>> > >
>> > > 2010-07-13 16:32:47,722 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-9203944767743886537_123926
>> > > 2010-07-13 16:32:48,207 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Waiting for hlog writers to terminate, iteration #93
>> > > 2010-07-13 16:32:52,205 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> > > 2010-07-13 16:32:52,205 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7742458689391305082_123926
>> > > 2010-07-13 16:32:53,072 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> > > 2010-07-13 16:32:53,072 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-281711885322381263_123926
>> > > 2010-07-13 16:32:53,207 DEBUG org.apache.hadoop.hbase.regionserver.HLog: Waiting for hlog writers to terminate, iteration #94
>> > > 2010-07-13 16:32:53,726 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.193.155:50010
>> > > 2010-07-13 16:32:53,726 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-2730636850915812524_123926
>> > >
>> > > This just seems odd.. dumping that much standard output was kind of
>> > > silly, but I wouldn't think it would wedge the cluster like this... or
>> > > did something else happen? Everything has been fine for a while until
>> > > I did this.
>> > >
>> > > I let the cluster sit around for a while, and it did come back for a
>> > > little while, after printing out this over and over (on master):
>> > >
>> > > da7738a3ad9,1279039241464 from ubuntu-hadoop-2,60020,1279066792598; 1 of 1
>> > > 2010-07-13 17:28:30,897 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: filemetadata,unknown_23bf6f189f81218a877a41d1ccd82751527abf19beb61281b8a0f6475e682764,1278533813300 from ubuntu-hadoop-4,60020,1279066792615; 1 of 3
>> > > 2010-07-13 17:28:30,897 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: filestore,7f45c6a2f5c7d09ae8618bf6d0db5b04f346c583a1dbf24d9db135f2c8da9b57,1279050466715 from ubuntu-hadoop-4,60020,1279066792615; 2 of 3
>> > > 2010-07-13 17:28:30,897 INFO org.apache.hadoop.hbase.master.ServerManager: Processing MSG_REPORT_PROCESS_OPEN: .META.,,1 from ubuntu-hadoop-4,60020,1279066792615; 3 of 3
>> > >
>> > > But regionservers kept coming down, and bringing down the .META.
>> > > region.. eventually the master seemed to recover, and I would bring
>> > > the regionservers back up individually.. one of them would soon crash
>> > > again, repeating the cycle. And the master would die with this on the
>> > > web page:
>> > >
>> > > Trying to contact region server 192.168.193.155:60020 for region
>> > > .META.,,1, row '', but failed after 3 attempts.
>> > > Exceptions:
>> > > java.net.ConnectException: Connection refused
>> > > java.net.ConnectException: Connection refused
>> > > java.net.ConnectException: Connection refused
>> > >
>> > > RequestURI=/master.jsp
>> > >
>> > > The regionservers appear to start dying with this statement:
>> > >
>> > > 2010-07-13 02:12:59,257 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region filestore,5a1ec52137fb8bb38affb5ca034ec3817c6611adc2f5a2b399a4ab92effcdedd,1278970890717/1436533481 because: regionserver/192.168.193.145:60020.cacheFlusher
>> > > 2010-07-13 02:13:19,336 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /192.168.193.145:50010 for file /hbase/.META./1028785192/info/3836661676936096933 for block -649923918996291451:java.io.EOFException
>> > >         at java.io.DataInputStream.readShort(DataInputStream.java:298)
>> > >         at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1384)
>> > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1853)
>> > >         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1922)
>> > >         at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
>> > >         at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:101)
>> > >
>> > >
>> > > Note that it is failing to contact 192.168.193.145 ... this is the IP
>> > > of the node itself (the dying one).
>> > >
>> > > There's a long sequence of exceptions, and the regionserver's death
>> > > ends with this:
>> > >
>> > > 2010-07-13 17:36:57,068 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> > > 2010-07-13 17:36:57,069 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/filestore/compaction.dir/1408434857/4172852130952668434 : java.io.EOFException
>> > > java.io.EOFException
>> > >         at java.io.DataInputStream.readByte(DataInputStream.java:250)
>> > >         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>> > >         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>> > >         at org.apache.hadoop.io.Text.readString(Text.java:400)
>> > >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream
>> > >
>> > > It just seems like a lot of trouble from one little
>> > > System.out.println() statement..
>> > >
>> > > Restarting hbase just repeats the cycle: "Bad connect ack with
>> > > firstBadLink", eventual recovery, regionservers start crashing, the
>> > > cluster becomes unusable..
>> > >
>> > > In summary, it would seem printing out too much data to stdout can
>> > > render the whole cluster unstable - across reboots - and I haven't
>> > > figured out how to stabilize it again.
>> > >
>> > >  Take care,
>> > >    -stu
>> > >
>> > > PS. The address for the mailing list on the hbase page is wrong. It
>> > > shows:
>> > >
>> > > u...@hhbase.apache.org
>> > >
>> > > But it should be u...@hbase.apache.org. I hope! (If this mail goes
>> > > through, I got it right).
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>
>
>
>
