When you have a master that is confused like this, you can try this:
- make sure it isn't undergoing log splitting
- kill -9 the master
- restart the master
The startup code will check the cluster status, then take appropriate action.

There is a new master which will make all this junk better very soon. Thanks for putting up with it :-)

-ryan

On Tue, Aug 31, 2010 at 5:19 PM, Matthew LeMieux <[email protected]> wrote:
> I've been very happy with HBase, and am very much looking forward to more
> stable releases in the future. Today, I had another one of those
> unfortunate crashes that seem to occur every few days, and I need some help
> understanding how I can speed up the recovery, which is taking longer than
> usual. I'm running on CDH3.
>
> Right now, I'm getting log messages printed out at a rate of 100's / second
> in the master log file.
>
> They start with: "2010-08-31 23:55:15,886 INFO
> org.apache.hadoop.hbase.master.ServerManager: Processing
> MSG_REPORT_PROCESS_OPEN:"
>
> And end with: "a of b"
>
> Where a counts up to b each second. I seem to remember that I used to see b
> count down during a previous recovery. So, for example, I might get 200
> messages one second with lines ending in "1 of 200", "2 of 200", ... "200 of
> 200". Then the next second b might be 199, so the lines would end in "1 of
> 199", "2 of 199", .... "199 of 199".
>
> Unfortunately, right now, b has stayed constant at 148 for half an hour.
> The only work HBase appears to be doing is printing hundreds of log messages.
>
> It says all the region servers are online. DFS is healthy with proper
> replication. The machines are under low load, having no other jobs or
> services running on them. Region servers have either 4 or 6 GB allocated to
> them. The machines appear to all have CPU utilization of under 15%.
>
> Not all of the region servers are showing progress...
> on at least one of them I can see messages of the form:
>
> "2010-09-01 00:14:35,209 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Worker: MSG_REGION_OPEN:"
>
> These are appearing VERY SLOWLY, and other region servers appear to be
> completely idle while this is going on.
>
> I really need some help to get things back up and running. I have people who
> are waiting to get work done.
>
> How can I convince HBase to just start up and stop fooling around? (Is the
> INFO log level intended to be so verbose?)
>
> Thank you for your help,
>
> Matthew
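
One way to tell whether the "a of b" counter described above is still moving, rather than watching the master log by eye, is to pull b out of each line and compare it second by second. The following is only a rough sketch: the timestamp prefix and the "a of b" suffix are taken from the message above, but the exact text between them is not shown in the thread, so the pattern simply skips over it. The function names and the `window` parameter are hypothetical helpers, not part of HBase.

```python
import re

# Assumed line shape: "<date> <time>,<millis> <anything> <a> of <b>".
# Only the timestamp prefix and the "a of b" suffix come from the thread;
# whatever sits in between is skipped without being interpreted.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} .*?(\d+) of (\d+)\s*$"
)


def pending_per_second(lines):
    """Map each one-second timestamp to the largest b seen in that second."""
    totals = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an "a of b" line
        second = match.group(1)
        b = int(match.group(3))
        totals[second] = max(b, totals.get(second, 0))
    return totals


def is_stalled(totals, window=3):
    """Return True if b was identical over the last `window` seconds seen."""
    # Timestamps in this format sort chronologically as plain strings.
    recent = [totals[key] for key in sorted(totals)][-window:]
    return len(recent) == window and len(set(recent)) == 1
```

Feeding the master log's lines to `pending_per_second` and then calling `is_stalled` on the result with a large enough `window` would flag the situation described above, where b sits at 148 instead of counting down.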
