This is great!! Thank you for seeing the "fear" in the eyes of us users who get ourselves into such states. This will go a long way toward giving us some hope when we need it most.
Now I think we need to learn a little more about HDFS, as that seems to be the root cause of all of our problems in this case (a couple of command sketches for that are appended below the quoted thread). Thanks!!

On Fri, Jul 8, 2011 at 6:12 PM, Stack <[email protected]> wrote:

> Going by the below, where HDFS reports 173 "lost" blocks, I think the only recourse is, as you suggest below, Wayne, "a recovery mode that goes through and sees what is out there and rebuilds the meta based on it."
>
> I've started work on a recovery tool over in https://issues.apache.org/jira/browse/HBASE-4058.
>
> St.Ack
>
> On Sat, Jul 2, 2011 at 7:27 PM, Wayne <[email protected]> wrote:
> > Like most problems, we brought it on ourselves. To me the bigger issue is how to get out. Since region definitions are the core of what HBase does, it would be great to have a bulletproof recovery process that we can invoke to get us out. Bugs and human error will bring on problems and nothing will ever change that, but not having tools to help recover out of the hole is where I think it is lacking. HDFS is very stable. The HBase .META. table (and -ROOT-?) are the core of how HBase manages things. If this gets out of whack, all is lost. I think it would be great to have an automatic backup of the meta table and the ability to recover everything based on the HDFS data out there and the backup. Something like a recovery mode that goes through and sees what is out there and rebuilds the meta based on it. With corrupted data, lost regions, etc., like any relational database there should be one or more recovery modes that go through everything and rebuild it consistently. Data may be lost, but at least the cluster will be left in a 100% consistent/clean state. Manual editing of .META. is not something anyone should do (especially me). It is prone to human error... it should be easy to have well-tested recovery tools that can do the hard work for us.
> >
> > Below is an attempt at the play-by-play in case it helps. It all started with the root partition of the namenode/HMaster filling up due to a table export.
> >
> > When I restarted Hadoop this error was in the namenode log:
> > "java.io.IOException: Incorrect data format. logVersion is -18 but writables.length is 0"
> >
> > So I found this <https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/e35ee876da1a3bbc>, which mentioned editing the namenode log files. After verifying that our namenode log files seemed to have the same symptom, I copied each namenode "name" file to root's home directory and followed their advice.
> >
> > That allowed the namenode to start, but then HDFS wouldn't come up. It kept hanging in safe mode with the repeated error:
> > "The ratio of reported blocks 0.9925 has not reached the threshold 0.9990. Safe mode will be turned off automatically."
> >
> > So I turned safe mode off with "hadoop dfsadmin -safemode leave" and tried to run "hadoop fsck" a few times; it still showed HDFS as "corrupt", so I did "hadoop fsck -move", and this is the last part of the output:
> >
> > ....................................................................................Status: CORRUPT
> >  Total size:    1423140871890 B (Total open files size: 668770828 B)
> >  Total dirs:    3172
> >  Total files:   2584 (Files currently being written: 11)
> >  Total blocks (validated):      23095 (avg. block size 61621167 B) (Total open file blocks (not validated): 10)
> >   ********************************
> >   CORRUPT FILES:        65
> >   MISSING BLOCKS:       173
> >   MISSING SIZE:         8560948988 B
> >   CORRUPT BLOCKS:       173
> >   ********************************
> >  Minimally replicated blocks:   22922 (99.25092 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     2.9775276
> >  Corrupt blocks:                173
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          10
> >  Number of racks:               1
> >
> > I ran it again and got this:
> >
> > .Status: HEALTHY
> >  Total size:    1414579922902 B (Total open files size: 668770828 B)
> >  Total dirs:    3272
> >  Total files:   2519 (Files currently being written: 11)
> >  Total blocks (validated):      22922 (avg. block size 61712761 B) (Total open file blocks (not validated): 10)
> >  Minimally replicated blocks:   22922 (100.0 %)
> >  Over-replicated blocks:        0 (0.0 %)
> >  Under-replicated blocks:       0 (0.0 %)
> >  Mis-replicated blocks:         0 (0.0 %)
> >  Default replication factor:    3
> >  Average block replication:     3.0
> >  Corrupt blocks:                0
> >  Missing replicas:              0 (0.0 %)
> >  Number of data-nodes:          10
> >  Number of racks:               1
> >
> > The filesystem under path '/' is HEALTHY
> >
> > So I started everything and it seemed to be superficially functional.
> >
> > I then shut down Hadoop and restarted. Hadoop came up in a matter of a few minutes; then HBase took about ten minutes of seeming to copy files around, based on the HBase master logs.
> >
> > After this we saw "region not found" client errors on some tables. I ran hbase hbck to look for problems and saw the errors I reported in the original post. Add in the Ganglia problems and a botched attempt to edit the .META. table, which brought us even further down the rabbit hole. I then decided to drop the affected tables, but lo and behold, one cannot disable a table that has messed-up regions... so I manually deleted the data, but some of the .META. table entries were still there. Finally, this afternoon we reformatted the entire cluster.
> >
> > Thanks.
> >
> >
> > On Sat, Jul 2, 2011 at 5:25 PM, Stack <[email protected]> wrote:
> >
> >> On Sat, Jul 2, 2011 at 9:55 AM, Wayne <[email protected]> wrote:
> >> > It just returns a ton of errors (import: command not found). Our cluster is hosed anyway. I am waiting to get it completely re-installed from scratch. Hope has long since flown out the window. I just changed my opinion of what it takes to manage HBase: a Java engineer is required on staff. I also realized now that a backup strategy is more important than for an RDBMS. Having RF=3 in HDFS offers no insurance against HBase losing its shirt and .META. getting corrupted. I think I just found the Achilles heel.
> >> >
> >>
> >> Yeah, stability is primary, but I do not know how you got into the circumstance you find yourself in. All I can offer is to try and do diagnostics, since avoiding hitting this situation again is of utmost importance.
> >>
> >> St.Ack
> >
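
For anyone who lands on this thread later, the HDFS side of the walkthrough above corresponds roughly to the commands below. This is only a sketch against a Hadoop 0.20-era command line; the threshold property name is an assumption for that era, and forcing safe mode off or running "fsck -move" is a judgment call you make only once you have accepted that the missing blocks are not coming back.

  # Check whether the namenode is still in safe mode and how block reporting
  # is progressing (the 0.9990 figure is the dfs.safemode.threshold.pct
  # setting from hdfs-site.xml).
  hadoop dfsadmin -safemode get
  hadoop dfsadmin -report

  # Read-only: list the corrupt/missing blocks and the files they belong to.
  hadoop fsck / -files -blocks -locations

  # Only after deciding the missing blocks are unrecoverable: leave safe mode
  # and move the affected files aside into /lost+found so the rest of the
  # filesystem can report HEALTHY again.
  hadoop dfsadmin -safemode leave
  hadoop fsck / -move

Running the read-only fsck first is the important part; -move (or -delete) is what turns a CORRUPT report into a HEALTHY one by sacrificing the affected files, which is exactly the trade-off described in the thread.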

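On the HBase side, a recovery tool like the one proposed in HBASE-4058 would have to work from what is actually laid out in HDFS, so here is a rough, read-only sketch of where to look. The /hbase root and the table name are assumptions: hbase.rootdir is commonly set to /hbase, "mytable" is a hypothetical table, and in a 0.90-era layout each region directory is expected to carry a serialized .regioninfo file that a meta-rebuild tool could read back.

  # HBase's own consistency report of .META. versus what is deployed
  # (the same tool used in the thread above).
  hbase hbck

  # What actually exists on HDFS: one directory per table under the HBase
  # root, one directory per region under each table, and a .regioninfo file
  # inside each region directory.
  hadoop fs -ls /hbase
  hadoop fs -ls /hbase/mytable
  hadoop fs -ls /hbase/mytable/*/.regioninfo

None of this changes anything on the cluster; it only shows the raw material that a "go through and see what is out there and rebuild the meta" mode would have to scan.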