Is there anything meaningful in the RS (region server) logs? I've seen situations like this where an RS fails to start because of problems reading the WAL. If that is the case here, the log should name the problematic WAL file. In my experience that file has been zero-length, and deleting it from HDFS lets things start up.
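If the region server log does point at a WAL, you can spot zero-length candidates by listing the WAL directory recursively and filtering on size. The Python sketch below is hypothetical and not part of HBase or Hadoop. It assumes the HBase 0.90 layout, where WALs live under `/hbase/.logs` (adjust this for your `hbase.rootdir`), and the Hadoop 0.20-era `hadoop fs -lsr` line format: permissions, replication, owner, group, size, date, time, path. It only prints candidates. The file to actually remove is the one the RS log names, for example with `hadoop fs -rm <path>`.

```python
import sys


def zero_length_wals(lsr_output, wal_root="/hbase/.logs"):
    """Return paths of zero-length files under wal_root.

    Assumption: each file line of `hadoop fs -lsr` has 8 whitespace-separated
    fields: perms, replication, owner, group, size, date, time, path.
    """
    hits = []
    for line in lsr_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) != 8:
            continue  # skip blank lines and headers such as "Found N items"
        perms, _repl, _owner, _group, size, _date, _time, path = fields
        if perms.startswith("d"):
            continue  # directories are listed with size 0; ignore them
        if path.startswith(wal_root + "/") and size == "0":
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Hypothetical usage:
    #   hadoop fs -lsr /hbase/.logs > lsr.txt
    #   python find_empty_wals.py lsr.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for p in zero_length_wals(fh.read()):
                print(p)
```

Only empty files are flagged, which matches the zero-length WALs described above. Directory entries are skipped explicitly so they are never mistaken for empty WALs.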
On Mon, May 23, 2011 at 9:16 AM, Himanish Kushary <[email protected]> wrote:

> Both the Master and the hbck command print
>
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
> -ROOT-,,0
>
> After the master thread exits due to the heap space error, the hbck command
> throws:
>
> org.apache.hadoop.hbase.MasterNotRunningException
>
> Is there any way to fix this kind of issue? We are keeping the datanodes up
> to see whether the under-replicated blocks can be recovered. Does improper
> shutdown of the Hadoop/HBase services cause this kind of issue? What
> happens in a disaster recovery situation, and how are those situations
> handled?
>
> Thanks
>
>
> On Mon, May 23, 2011 at 11:36 AM, Stack <[email protected]> wrote:
>
> > What does hbase hbck say? (http://hbase.apache.org/book.html#hbck).
> >
> > What does the master log have in it? Anything of interest?
> >
> > St.Ack
> >
> > On Mon, May 23, 2011 at 7:53 AM, Himanish Kushary <[email protected]>
> > wrote:
> > > Pressed the send button too soon...
> > >
> > > Also, here is the output from hadoop fsck:
> > >
> > > Status: HEALTHY
> > >  Total size:                   37678848280 B
> > >  Total dirs:                   941
> > >  Total files:                  902 (Files currently being written: 1)
> > >  Total blocks (validated):     1141 (avg. block size 33022654 B) (Total open file blocks (not validated): 1)
> > >  Minimally replicated blocks:  1141 (100.0 %)
> > >  Over-replicated blocks:       0 (0.0 %)
> > >  Under-replicated blocks:      906 (79.40403 %)
> > >  Mis-replicated blocks:        0 (0.0 %)
> > >  Default replication factor:   2
> > >  Average block replication:    2.0
> > >  Corrupt blocks:               0
> > >  Missing replicas:             1886 (82.646805 %)
> > >  Number of data-nodes:         2
> > >  Number of racks:              1
> > > FSCK ended at Mon May 23 10:51:13 EDT 2011 in 257 milliseconds
> > >
> > > The filesystem under path '/' is HEALTHY
> > >
> > > Could anybody please help with how to recover from this scenario?
> > >
> > > Thanks
> > >
> > >
> > > On Mon, May 23, 2011 at 10:50 AM, Himanish Kushary <[email protected]> wrote:
> > >
> > >> Hi,
> > >>
> > >> Our HBase/Hadoop server machines were shut down without bringing the
> > >> Hadoop and HBase services down properly. Now when we try to bring up
> > >> HBase we get the following error in the master log:
> > >>
> > >> org.apache.hadoop.hbase.NotServingRegionException: Region is not online:
> > >> -ROOT-,,0
> > >>
> > >> Hadoop services (namenode, jobtracker, datanode, etc.) have come up
> > >> properly and we are able to see the files in HDFS. But the HBase Master
> > >> keeps throwing this exception and then finally throws a Java heap space
> > >> error.
> > >>
> > >> Note: We have two datanodes, replication set to 2, and around 900 blocks
> > >> are shown as under-replicated.
> > >>
> > >> ---------------------------------
> > >> Thanks & Regards
> > >> Himanish
> > >
> > > --
> > > Thanks & Regards
> > > Himanish
>
> --
> Thanks & Regards
> Himanish
