If that doesn't work, you probably have an invalid reference file. You will find it in the RS logs, in the HLog split that never finishes.

On Aug 1, 2013 1:38 PM, "Kevin O'dell" <[email protected]> wrote:
> JM,
>
> Stop HBase
> rmr /hbase from zkcli
> Sideline META
> Run offline meta repair
> Start HBase
> On Aug 1, 2013 1:01 PM, "Jean-Marc Spaggiari" <[email protected]> wrote:
>
>> Hi Jimmy,
>>
>> I should still have all the logs.
>>
>> What I did is pretty simple.
>>
>> I tried to turn the cluster off while a single-region 250GB table was
>> under major_compaction to get split.
>>
>> I will targz all the logs for the last few days and make that available.
>>
>> On the other side, I'm still not able to bring it back up...
>>
>> JM
>>
>> 2013/8/1 Jimmy Xiang <[email protected]>
>>
>> > Something went wrong with split. It should be easy to fix your cluster.
>> > However, it will be more interesting to find out how it happened. Do you
>> > remember what has happened since it was good previously? Do you have all
>> > the logs?
>> >
>> >
>> > On Thu, Aug 1, 2013 at 7:08 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>> >
>> > > I tried to remove the znodes but got the same result. So I shut down all
>> > > the RS and restarted HBase, and now I have 0 regions for this table.
>> > > Running HBCK. Seems that it has a lot to do...
>> > >
>> > > 2013/8/1 Kevin O'dell <[email protected]>
>> > >
>> > > > Yes, you can if HBase is down. First I would copy .META out of HDFS to
>> > > > local, and then you can search it for split issues. Deleting those znodes
>> > > > should clear this up though.
>> > > > On Aug 1, 2013 8:52 AM, "Jean-Marc Spaggiari" <[email protected]> wrote:
>> > > >
>> > > > > I can't check the meta since HBase is down.
>> > > > >
>> > > > > Regarding HDFS, I took a few random lines like:
>> > > > > 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing
>> > > > > 2013-08-01 08:45:57,260 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split
>> > > > >
>> > > > > And each time, there is nothing like that.
>> > > > > hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -lsr / | grep 28328fdb7181cbd9cc4d6814775e8895
>> > > > >
>> > > > > On ZK side:
>> > > > > [zk: localhost:2181(CONNECTED) 3] ls /hbase/splitlog
>> > > > >
>> > > > > [zk: localhost:2181(CONNECTED) 10] ls /hbase/unassigned
>> > > > > [28328fdb7181cbd9cc4d6814775e8895, a8781a598c46f19723a2405345b58470,
>> > > > > b7ebfeb63b10997736fd12920fde2bb8, d95bb27cc026511c2a8c8ad155e79bf6,
>> > > > > 270a9c371fcbe9cd9a04986e0b77d16b, aff4d1d8bf470458bb19525e8aef0759]
>> > > > >
>> > > > > Can I just delete those znodes? Worst case hbck will find them back from
>> > > > > HDFS if required?
>> > > > >
>> > > > > JM
>> > > > >
>> > > > > 2013/8/1 Kevin O'dell <[email protected]>
>> > > > >
>> > > > > > Does it exist in meta or hdfs?
>> > > > > > On Aug 1, 2013 8:24 AM, "Jean-Marc Spaggiari" <[email protected]> wrote:
>> > > > > >
>> > > > > > > My master keeps logging this:
>> > > > > > >
>> > > > > > > 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:52:59,201 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:52:59,339 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:52:59,461 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:52:59,636 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:53:00,074 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:53:00,261 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 21:53:00,417 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > >
>> > > > > > > hbase@node3:~/hbase-0.94.3$ cat logs/hbase-hbase-master-node3.log* | grep "Region 270a9c371fcbe9cd9a04986e0b77d16b not found " | wc
>> > > > > > > 5042 65546 927728
>> > > > > > >
>> > > > > > >
>> > > > > > > Then crashed.
>> > > > > > > 2013-07-31 22:22:46,072 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
>> > > > > > > 2013-07-31 22:22:46,073 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00 http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6 . state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
>> > > > > > > java.lang.IllegalStateException: Unexpected state : work_proposed,\x02\xE8\x92'\x00\x00\x00\x00 http://video.inportnews.ca/search/all/source/sun-news-network/harry-potter-in-translation/68463493001/page/1526,1375307272709.d95bb27cc026511c2a8c8ad155e79bf6 . state=OPENING, ts=1375323766008, server=node7,60020,1375319044055 .. Cannot transit it to OFFLINE.
>> > > > > > > at org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1879)
>> > > > > > > at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1688)
>> > > > > > > at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
>> > > > > > > at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
>> > > > > > > at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
>> > > > > > > at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
>> > > > > > > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:175)
>> > > > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> > > > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> > > > > > > at java.lang.Thread.run(Thread.java:722)
>> > > > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
>> > > > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
>> > > > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.HMaster$2: node3,60000,1375322220614-BalancerChore exiting
>> > > > > > > 2013-07-31 22:22:46,075 INFO org.apache.hadoop.hbase.master.CatalogJanitor: node3,60000,1375322220614-CatalogJanitor exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 2 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 1 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC Server handler 0 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,076 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.cleaner.HFileCleaner: master-node3,60000,1375322220614.archivedHFileCleaner exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.cleaner.LogCleaner: master-node3,60000,1375322220614.oldLogCleaner exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
>> > > > > > > 2013-07-31 22:22:46,077 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
>> > > > > > > 2013-07-31 22:22:46,078 INFO org.mortbay.log: Stopped [email protected]:60010
>> > > > > > > 2013-07-31 22:22:46,127 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 22:22:46,127 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 22:22:46,181 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region aff4d1d8bf470458bb19525e8aef0759 not found on server node2,60020,1375319046072; failed processing
>> > > > > > > 2013-07-31 22:22:46,181 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region aff4d1d8bf470458bb19525e8aef0759 from server node2,60020,1375319046072 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 22:22:46,193 ERROR org.apache.hadoop.hbase.executor.ExecutorService: Cannot submit [ClosedRegionHandler-node3,60000,1375322220614-179] because the executor is missing. Is this process shutting down?
>> > > > > > > 2013-07-31 22:22:46,250 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 28328fdb7181cbd9cc4d6814775e8895 not found on server node4,60020,1375319042033; failed processing
>> > > > > > > 2013-07-31 22:22:46,250 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 28328fdb7181cbd9cc4d6814775e8895 from server node4,60020,1375319042033 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 22:22:46,262 INFO org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor: node3,60000,1375322220614.splitLogManagerTimeoutMonitor exiting
>> > > > > > > 2013-07-31 22:22:46,293 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region 270a9c371fcbe9cd9a04986e0b77d16b not found on server node7,60020,1375319044055; failed processing
>> > > > > > > 2013-07-31 22:22:46,293 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region 270a9c371fcbe9cd9a04986e0b77d16b from server node7,60020,1375319044055 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 22:22:46,294 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x240024f5666144b
>> > > > > > > 2013-07-31 22:22:46,361 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region aff4d1d8bf470458bb19525e8aef0759 not found on server node2,60020,1375319046072; failed processing
>> > > > > > > 2013-07-31 22:22:46,362 WARN org.apache.hadoop.hbase.master.AssignmentManager: Received SPLIT for region aff4d1d8bf470458bb19525e8aef0759 from server node2,60020,1375319046072 but it doesn't exist anymore, probably already processed its split
>> > > > > > > 2013-07-31 22:22:46,388 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: node3,60000,1375322220614.timeoutMonitor exiting
>> > > > > > > 2013-07-31 22:22:46,388 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater: node3,60000,1375322220614.timerUpdater exiting
>> > > > > > > 2013-07-31 22:22:46,402 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting
>> > > > > > > 2013-07-31 22:22:46,402 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
>> > > > > > > java.lang.RuntimeException: HMaster Aborted
>> > > > > > > at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
>> > > > > > > at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
>> > > > > > > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> > > > > > > at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
>> > > > > > > at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2100)
>> > > > > > >
>> > > > > > > Seems that HBCK can't do anything. I will start to look at the
>> > > > > > > files in HDFS, but suggestions are welcome.
>> > > > > > >
>> > > > > > > JM
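
For anyone hitting the same flood of AssignmentManager warnings: instead of one grep per region hash, a small script can tally the "not found on server" lines per region and server in one pass. This is only a rough sketch. The function name and the regular expression are my own, and it assumes each log entry sits on a single line in the master log, as it would in the file itself rather than in the wrapped mail quotes.

```python
import re
from collections import Counter

# Matches the AssignmentManager warning quoted in this thread.
# Assumption: the encoded region name is a 32-character hex string.
NOT_FOUND = re.compile(
    r"AssignmentManager: Region (?P<region>[0-9a-f]{32}) "
    r"not found on server (?P<server>[^;]+); failed processing"
)


def count_missing_regions(lines):
    """Count 'Region ... not found on server ...' warnings per (region, server)."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[(match.group("region"), match.group("server"))] += 1
    return counts
```

To use it, open a master log file, pass the file object to count_missing_regions(), and print counts.most_common(). Regions that show up here and also sit under /hbase/unassigned in ZK are probably the ones to chase first.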
