Tracked it down: http://pastebin.com/QaFktFKg
From my novice eyes it looks to have been played back cleanly and then deleted.

Thanks again!
-chris

On Mar 17, 2011, at 7:21 PM, Stack wrote:

> But did you see log of its replay of recovered.edits and then
> subsequent delete of this file just before open (The file is only
> deleted if we successfully opened a region).
>
> St.Ack
>
> On Thu, Mar 17, 2011 at 6:38 PM, Chris Tarnas <[email protected]> wrote:
>> I looked in the master log and the regionserver log that is hosting a
>> formerly damaged region now, but the only reference to it was during the
>> 0.89 timeframe, no EOFE after restart with 0.90.1.
>>
>> thanks,
>> -chris
>>
>> On Mar 17, 2011, at 6:30 PM, Stack wrote:
>>
>>> I don't know. See the name of the file that failed w/ 0.89. Look
>>> for it being replayed in your 0.90.1. Did it succeed or did we hit
>>> EOFE toward the end of recovered.edits but in 0.90.1 keep going?
>>>
>>> St.Ack
>>>
>>> On Thu, Mar 17, 2011 at 6:26 PM, Chris Tarnas <[email protected]> wrote:
>>>> Good news, so I restarted with 0.90.1, and now have all 288 regions online
>>>> including the three problematic ones. Could it be those were already
>>>> updated to 0.90.1 from my earlier attempt and 0.89 could not cope?
>>>>
>>>> Thank you all!
>>>> -chris
>>>>
>>>> On Mar 17, 2011, at 6:16 PM, Chris Tarnas wrote:
>>>>
>>>>> So we lose this data, no recovery options?
>>>>>
>>>>> -chris
>>>>>
>>>>> On Mar 17, 2011, at 6:13 PM, Stack wrote:
>>>>>
>>>>>> Those files look like they were trashed on their tail. There is an
>>>>>> issue on this, where recovered.edits files EOFE. For now, only 'soln'
>>>>>> is to move them aside. Doesn't look related to your other troubles.
>>>>>> May be from 0.89 since I have not seen this in a good while.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 6:04 PM, Chris Tarnas <[email protected]> wrote:
>>>>>>> Could these have been regions that were updated to 0.90.1 during the
>>>>>>> first attempted startup? Should I now go back to that?
>>>>>>>
>>>>>>> thank you,
>>>>>>> -chris
>>>>>>>
>>>>>>> On Mar 17, 2011, at 5:16 PM, Chris Tarnas wrote:
>>>>>>>
>>>>>>>> I restarted it with 0.89 (CDHb3b3, patched in the new hadoop jar), it
>>>>>>>> has come up but is having trouble opening three regions (of 285), from
>>>>>>>> hbck:
>>>>>>>>
>>>>>>>> ERROR: Region
>>>>>>>> sequence,8eUWjPYt2fBStS32zCJFzQ\x09A2740005-e5d6f259a1b7617eecd56aadd2867a24-1\x09,1299147700483.6b72bbe5fe43ae429215c1217cf8d6c6.
>>>>>>>> is not served by any region server but is listed in META to be on
>>>>>>>> server null
>>>>>>>> ERROR: Region
>>>>>>>> sequence,synonyms\x00unknown\x00accession\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-8f9efae82805e42c08bc982f4e03523f-2\x09,1299140082607.f9997faf88d52328bfc44b891b9da8c3.
>>>>>>>> is not served by any region server but is listed in META to be on
>>>>>>>> server null
>>>>>>>> ERROR: Region
>>>>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>>>>> is not served by any region server but is listed in META to be on
>>>>>>>> server null
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the error that is happening in the regionserver logs:
>>>>>>>>
>>>>>>>> 2011-03-17 19:10:46,842 ERROR
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>>>>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>>>>> java.io.EOFException:
>>>>>>>> hdfs://lxbtdv003-pvt:8020/hbase/sequence/a276ba37eb7f0df9bf8f14dd4d131ff2/recovered.edits/0000000000036949961,
>>>>>>>> entryStart=4147415714, pos=4147415714, end=8294831428, edit=9769
>>>>>>>>     at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>>>>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>>>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1588)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1553)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1465)
>>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>     at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1910)
>>>>>>>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1940)
>>>>>>>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
>>>>>>>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:140)
>>>>>>>>
>>>>>>>> On Mar 17, 2011, at 4:55 PM, Stack wrote:
>>>>>>>>
>>>>>>>>> When we looked at it here at SU the log was REALLY old. Is yours? If
>>>>>>>>> really old, you have been living w/o the edits for a while anyways so
>>>>>>>>> just remove and press on. Regards going back, we say no -- but sounds
>>>>>>>>> like you didn't get off the ground so perhaps you can go back to
>>>>>>>>> 0.20.x to replay the old logs.
>>>>>>>>> St.Ack
>>>>>>>>>
>>>>>>>>> On Thu, Mar 17, 2011 at 4:43 PM, Chris Tarnas <[email protected]> wrote:
>>>>>>>>>> I know I didn't have a clean shutdown, I thought I had hit
>>>>>>>>>> HBASE-3038, but looking further I first had a OOME on a region
>>>>>>>>>> server. Can I revert to the older HBASE to reconstruct the log or has
>>>>>>>>>> that ship sailed?
>>>>>>>>>>
>>>>>>>>>> thanks,
>>>>>>>>>> -chris
>>>>>>>>>> On Mar 17, 2011, at 4:22 PM, Ryan Rawson wrote:
>>>>>>>>>>
>>>>>>>>>>> If you know you had a clean shutdown just nuke all directories in
>>>>>>>>>>> /hbase/.logs
>>>>>>>>>>>
>>>>>>>>>>> we hit this @ SU as well, it's older logfile formats messing us up.
>>>>>>>>>>>
>>>>>>>>>>> remember, only if you had a CLEAN shutdown, or else you lose
>>>>>>>>>>> data!!!!
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I just had to upgrade our second cluster to CDH3B4 (the 2GB log file
>>>>>>>>>>>> problem, same as the reason for upgrading another cluster) and now
>>>>>>>>>>>> the master is not coming up, it dies with this error:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 2011-03-17 18:15:24,209 FATAL
>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster: Unhandled exception.
>>>>>>>>>>>> Starting shutdown.
>>>>>>>>>>>> java.lang.RuntimeException: java.lang.IllegalArgumentException:
>>>>>>>>>>>> java.net.URISyntaxException: Relative path in absolute URI:
>>>>>>>>>>>> sequence,lists-Gbaa-KOdBQHTxUyTq8MAwGA10:4:16:629:647%230/1Nr24og9ZJoEEzRue1qKSCg%09GA10:4:16:629:647%230/1%09,1300314038804.2e7bdb018c92a7e22be79f21fcb6bee6.
>>>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:461)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$100(HLogSplitter.java:66)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:745)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:300)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:180)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
>>>>>>>>>>>>     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> HDFS is fine.. fsck ran clean.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is more of the master log:
>>>>>>>>>>>>
>>>>>>>>>>>> http://pastebin.com/Uq5Riczz
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>>> -chris
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>
>>
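
The interim workaround quoted in this thread for a recovered.edits file that hits EOFE is to move it aside. A minimal Python sketch of that step follows, under some assumptions: it pulls the file path and offsets out of the EOFException message as it appears in the regionserver log quoted above, then builds (but does not run) the `hadoop fs` commands to sideline the file. The helper names `parse_eofe` and `move_aside_commands` and the `/hbase-sidelined-edits` directory are hypothetical, chosen only for illustration. As the thread notes, the edits in a sidelined file are not replayed, so this is only appropriate once that loss has been accepted.

```python
import posixpath
import re

# Matches the detail appended to the EOFException message in the 0.89
# regionserver log quoted in this thread (path, entryStart, pos, end, edit).
EOFE_RE = re.compile(
    r"java\.io\.EOFException:\s*"
    r"(?P<path>hdfs://\S+?/recovered\.edits/\d+),\s*"
    r"entryStart=(?P<entryStart>\d+),\s*pos=(?P<pos>\d+),\s*"
    r"end=(?P<end>\d+),\s*edit=(?P<edit>\d+)"
)


def parse_eofe(log_text):
    """Return the path and numeric fields of the first EOFE message, or None."""
    match = EOFE_RE.search(log_text)
    if match is None:
        return None
    info = {k: int(v) for k, v in match.groupdict().items() if k != "path"}
    info["path"] = match.group("path")
    return info


def move_aside_commands(path, sideline_dir="/hbase-sidelined-edits"):
    """Build the hadoop fs commands that would move the file aside.

    sideline_dir is a hypothetical location outside the region directory;
    pick whatever suits your cluster. Nothing is executed here.
    """
    # .../<region>/recovered.edits/<seqid> -> "<region>-<seqid>"
    region = posixpath.basename(posixpath.dirname(posixpath.dirname(path)))
    target = posixpath.join(sideline_dir, region + "-" + posixpath.basename(path))
    return [
        ["hadoop", "fs", "-mkdir", sideline_dir],
        ["hadoop", "fs", "-mv", path, target],
    ]


# The message exactly as quoted earlier in the thread.
SAMPLE = (
    "java.io.EOFException: "
    "hdfs://lxbtdv003-pvt:8020/hbase/sequence/"
    "a276ba37eb7f0df9bf8f14dd4d131ff2/recovered.edits/0000000000036949961, "
    "entryStart=4147415714, pos=4147415714, end=8294831428, edit=9769"
)

if __name__ == "__main__":
    info = parse_eofe(SAMPLE)
    for cmd in move_aside_commands(info["path"]):
        print(" ".join(cmd))
```

Printing the commands rather than executing them leaves room to check the region and file name against hbck output before anything is moved.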
