I don't know. See the name of the file that failed w/ 0.89. Look for it being replayed in your 0.90.1. Did it succeed, or did we hit EOFE toward the end of recovered.edits but in 0.90.1 keep going?
St.Ack

On Thu, Mar 17, 2011 at 6:26 PM, Chris Tarnas wrote:
> Good news, so I restarted with 0.90.1, and now have all 288 regions online
> including the three problematic ones. Could it be those were already updated
> to 0.90.1 from my earlier attempt and 0.89 could not cope?
>
> Thank you all!
> -chris
>
> On Mar 17, 2011, at 6:16 PM, Chris Tarnas wrote:
>
>> So we lose this data, no recovery options?
>>
>> -chris
>>
>> On Mar 17, 2011, at 6:13 PM, Stack wrote:
>>
>>> Those files look like they were trashed on their tail. There is an
>>> issue on this, where recovered.edits files EOFE. For now, only 'soln'
>>> is to move them aside. Doesn't look related to your other troubles.
>>> May be from 0.89 since I have not seen this in a good while.
>>>
>>> St.Ack
>>>
>>> On Thu, Mar 17, 2011 at 6:04 PM, Chris Tarnas wrote:
>>>> Could these have been regions that were updated to 0.90.1 during the first
>>>> attempted startup? Should I now go back to that?
>>>>
>>>> thank you,
>>>> -chris
>>>>
>>>> On Mar 17, 2011, at 5:16 PM, Chris Tarnas wrote:
>>>>
>>>>> I restarted it with 0.89 (CDHb3b3, patched in the new hadoop jar), it has
>>>>> come up but is having trouble opening three regions (of 285), from hbck:
>>>>>
>>>>> ERROR: Region
>>>>> sequence,8eUWjPYt2fBStS32zCJFzQ\x09A2740005-e5d6f259a1b7617eecd56aadd2867a24-1\x09,1299147700483.6b72bbe5fe43ae429215c1217cf8d6c6.
>>>>> is not served by any region server but is listed in META to be on
>>>>> server null
>>>>> ERROR: Region
>>>>> sequence,synonyms\x00unknown\x00accession\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-8f9efae82805e42c08bc982f4e03523f-2\x09,1299140082607.f9997faf88d52328bfc44b891b9da8c3.
>>>>> is not served by any region server but is listed in META to be on
>>>>> server null
>>>>> ERROR: Region
>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>> is not served by any region server but is listed in META to be on
>>>>> server null
>>>>>
>>>>>
>>>>> This is the error that is happening in the regionserver logs:
>>>>>
>>>>> 2011-03-17 19:10:46,842 ERROR
>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>> java.io.EOFException:
>>>>> hdfs://lxbtdv003-pvt:8020/hbase/sequence/a276ba37eb7f0df9bf8f14dd4d131ff2/recovered.edits/0000000000036949961,
>>>>> entryStart=4147415714, pos=4147415714, end=8294831428, edit=9769
>>>>>   at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>>>>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1588)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1553)
>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1465)
>>>>>   at java.lang.Thread.run(Thread.java:619)
>>>>> Caused by: java.io.EOFException
>>>>>   at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>   at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1910)
>>>>>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1940)
>>>>>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
>>>>>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:140)
>>>>>
>>>>> On Mar 17, 2011, at 4:55 PM, Stack wrote:
>>>>>
>>>>>> When we looked at it here at SU the log was REALLY old. Is yours? If
>>>>>> really old, you have been living w/o the edits for a while anyways so
>>>>>> just remove and press on. Regards going back, we say no -- but sounds
>>>>>> like you didn't get off the ground so perhaps you can go back to
>>>>>> 0.20.x to replay the old logs.
>>>>>> St.Ack
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 4:43 PM, Chris Tarnas wrote:
>>>>>>> I know I didn't have a clean shutdown, I thought I had hit HBASE-3038,
>>>>>>> but looking further I first had an OOME on a region server. Can I revert
>>>>>>> to the older HBase to reconstruct the log or has that ship sailed?
>>>>>>>
>>>>>>> thanks,
>>>>>>> -chris
>>>>>>> On Mar 17, 2011, at 4:22 PM, Ryan Rawson wrote:
>>>>>>>
>>>>>>>> If you know you had a clean shutdown just nuke all directories in
>>>>>>>> /hbase/.logs
>>>>>>>>
>>>>>>>> we hit this @ SU as well, it's older logfile formats messing us up.
>>>>>>>>
>>>>>>>> remember, only if you had a CLEAN shutdown, or else you lose data!!!!
>>>>>>>>
>>>>>>>> On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I just had to upgrade our second cluster to CDH3B4 (the 2GB log file
>>>>>>>>> problem, same as the reason for upgrading another cluster) and now
>>>>>>>>> the master is not coming up, it dies with this error:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2011-03-17 18:15:24,209 FATAL org.apache.hadoop.hbase.master.HMaster:
>>>>>>>>> Unhandled exception. Starting shutdown.
>>>>>>>>> java.lang.RuntimeException: java.lang.IllegalArgumentException:
>>>>>>>>> java.net.URISyntaxException: Relative path in absolute URI:
>>>>>>>>> sequence,lists-Gbaa-KOdBQHTxUyTq8MAwGA10:4:16:629:647%230/1Nr24og9ZJoEEzRue1qKSCg%09GA10:4:16:629:647%230/1%09,1300314038804.2e7bdb018c92a7e22be79f21fcb6bee6.
>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:461)
>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$100(HLogSplitter.java:66)
>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:745)
>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:300)
>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>>>>>>>>>   at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>>>>>>>>   at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:180)
>>>>>>>>>   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
>>>>>>>>>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> HDFS is fine.. fsck ran clean.
>>>>>>>>>
>>>>>>>>> Here is more of the master log:
>>>>>>>>>
>>>>>>>>> http://pastebin.com/Uq5Riczz
>>>>>>>>>
>>>>>>>>> Thanks for any help!
>>>>>>>>> -chris
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>
>
>
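For anyone landing on this thread later: the "move them aside" workaround can be prepared with a small script. What follows is only a minimal sketch, not anything posted in the thread. It assumes the regionserver log carries EOFException lines shaped like the one quoted above, and the function name `sideline_commands` and the target directory `/hbase-sidelined` are made up for illustration. It only builds `hadoop fs -mv` command strings for review and executes nothing; the target directory would have to be created first.

```python
import re

# Matches the recovered.edits path that follows "java.io.EOFException:" in a
# regionserver log line like the one quoted in this thread.
EDITS_RE = re.compile(
    r"java\.io\.EOFException:\s*(hdfs://\S+?/recovered\.edits/\d+)"
)


def sideline_commands(log_text, sideline_root="/hbase-sidelined"):
    """Return one 'hadoop fs -mv' command string per distinct edits file.

    sideline_root is a hypothetical directory outside the HBase root; the
    commands are returned for a human to review, they are never executed here.
    """
    commands = []
    seen = set()
    for match in EDITS_RE.finditer(log_text):
        src = match.group(1)
        if src in seen:
            continue  # the same file is usually reported on every open attempt
        seen.add(src)
        # .../<region-hash>/recovered.edits/<sequence-id>
        parts = src.rsplit("/", 3)
        region, name = parts[1], parts[3]
        # Keep the region hash in the target name so files cannot collide.
        dst = f"{sideline_root}/{region}-{name}"
        commands.append(f"hadoop fs -mv {src} {dst}")
    return commands


if __name__ == "__main__":
    sample = (
        "java.io.EOFException: hdfs://lxbtdv003-pvt:8020/hbase/sequence/"
        "a276ba37eb7f0df9bf8f14dd4d131ff2/recovered.edits/0000000000036949961, "
        "entryStart=4147415714, pos=4147415714, end=8294831428, edit=9769\n"
    )
    for cmd in sideline_commands(sample):
        print(cmd)
```

Moving the files rather than deleting them keeps the option of replaying them later, as discussed above; any edits they hold stay unavailable to the region until then.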
