When a master is confused like this, you can try the following:

- make sure it isn't undergoing log splitting
- kill -9 the master
- restart the master

The startup code will check the cluster status, then take the appropriate action.
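
Before killing anything, it may be worth confirming that the master really is stuck rather than just slow. Here is a minimal Python sketch, assuming each master log line starts with a timestamp and ends in "a of b", as in the excerpt quoted below. The function names, the regex, and the three-second threshold are illustrative assumptions, not part of HBase.

```python
import re
from collections import OrderedDict

# Assumed line shape, based on the quoted log excerpt:
#   "2010-08-31 23:55:15,886 INFO ... Processing MSG_REPORT_PROCESS_OPEN: ... 1 of 200"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*?(\d+) of (\d+)\s*$"
)

def totals_per_second(lines):
    """Map each timestamp (to the second) to the last 'b' seen in 'a of b'."""
    totals = OrderedDict()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            totals[m.group(1)] = int(m.group(3))
    return totals

def looks_stuck(lines, min_seconds=3):
    """True if 'b' never changed across at least min_seconds distinct seconds."""
    totals = list(totals_per_second(lines).values())
    return len(totals) >= min_seconds and len(set(totals)) == 1
```

Feed it the tail of the master log (for example, the lines from an open file object). If `looks_stuck` returns True over a long enough window, the counter is not moving and the restart above is probably the faster route.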

There is a new master which will make all this junk better very soon.
Thanks for putting up with it :-)

-ryan

On Tue, Aug 31, 2010 at 5:19 PM, Matthew LeMieux <[email protected]> wrote:
> I've been very happy with HBase, and am very much looking forward to more 
> stable releases in the future.    Today, I had another one of those 
> unfortunate crashes that seems to occur every few days and need some help 
> understanding how I can speed up the recovery, which is taking longer than 
> usual.   I'm running on CDH3.
>
> Right now, I'm getting log messages printed out at a rate of hundreds per second
> in the master log file.
>
> They start with: "2010-08-31 23:55:15,886 INFO 
> org.apache.hadoop.hbase.master.ServerManager: Processing 
> MSG_REPORT_PROCESS_OPEN:"
>
> And end with:  "a of b"
>
> Where a counts up to b each second.  I seem to remember that I used to see b 
> count down during a previous recovery.  So, for example, I might get 200
> messages one second with lines ending in "1 of 200", "2 of 200", ... "200 of 
> 200".  Then the next second  b might be 199, so the lines would end in "1 of 
> 199", "2 of 199", ....  "199 of 199".
>
> Unfortunately, right now, b seems to stay constant at 148 for a half hour.   
> The only work HBase appears to be doing is printing hundreds of log messages.
>
> It says all the region servers are online.  DFS is healthy with proper 
> replication.  The machines are under low load, having no other jobs or 
> services running on them.  Region servers have either 4 or 6 GB allocated to 
> them. The machines appear to all have CPU utilization of under 15%.
>
> Not all of the region servers are showing progress... on at least one of them 
> I can see messages of the form:
>
> "2010-09-01 00:14:35,209 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:"
>
> These are appearing VERY SLOWLY, and other region servers appear to be 
> completely idle while this is going on.
>
> I really need some help to get things back up and running.  I have people who 
> are waiting to get work done.
>
> How can I convince HBase to just start up and stop fooling around?  (Is the
> INFO log level intended to be so verbose?)
>
> Thank you for your help,
>
> Matthew
>
>
>
