Tracked it down:

http://pastebin.com/QaFktFKg

From my novice eyes it looks to have been played back cleanly and then deleted. 
Thanks again!

-chris
 
On Mar 17, 2011, at 7:21 PM, Stack wrote:

> But did you see a log of its replay of recovered.edits and then the
> subsequent delete of this file just before open? (The file is only
> deleted if we successfully opened a region.)
> 
> St.Ack
> 
> On Thu, Mar 17, 2011 at 6:38 PM, Chris Tarnas <[email protected]> wrote:
>> I looked in the master log and the regionserver log that is hosting a 
>> formerly damaged region now, but the only reference to it was during the 
>> 0.89 timeframe, no EOFE after restart with 0.90.1.
>> 
>> thanks,
>> -chris
>> 
>> On Mar 17, 2011, at 6:30 PM, Stack wrote:
>> 
>>> I don't know.   See the name of the file that failed w/ 0.89.  Look
>>> for it being replayed in your 0.90.1.  Did it succeed or did we hit
>>> EOFE toward the end of recovered.edits but in 0.90.1 keep going?
>>> 
>>> St.Ack
>>> 
>>> On Thu, Mar 17, 2011 at 6:26 PM, Chris Tarnas <[email protected]> wrote:
>>>> Good news, so I restarted with 0.90.1, and now have all 288 regions online 
>>>> including the three problematic ones. Could it be those were already 
>>>> updated to 0.90.1 from my earlier attempt and 0.89 could not cope?
>>>> 
>>>> Thank you all!
>>>> -chris
>>>> 
>>>> On Mar 17, 2011, at 6:16 PM, Chris Tarnas wrote:
>>>> 
>>>>> So we lose this data, no recovery options?
>>>>> 
>>>>> -chris
>>>>> 
>>>>> On Mar 17, 2011, at 6:13 PM, Stack wrote:
>>>>> 
>>>>>> Those files look like they were trashed on their tail.  There is an
>>>>>> issue on this, where recovered.edits files EOFE.  For now, only 'soln'
>>>>>> is to move them aside.  Doesn't look related to your other troubles.
>>>>>> May be from 0.89 since I have not seen this in a good while.
>>>>>> 
>>>>>> St.Ack
>>>>>> 
>>>>>> On Thu, Mar 17, 2011 at 6:04 PM, Chris Tarnas <[email protected]> wrote:
>>>>>>> Could these have been regions that were updated to 0.90.1 during the 
>>>>>>> first attempted startup? Should I now go back to that?
>>>>>>> 
>>>>>>> thank you,
>>>>>>> -chris
>>>>>>> 
>>>>>>> On Mar 17, 2011, at 5:16 PM, Chris Tarnas wrote:
>>>>>>> 
>>>>>>>> I restarted it with 0.89 (CDHb3b3, patched in the new hadoop jar), it 
>>>>>>>> has come up but is having trouble opening three regions (of 285), from 
>>>>>>>> hbck:
>>>>>>>> 
>>>>>>>> ERROR: Region 
>>>>>>>> sequence,8eUWjPYt2fBStS32zCJFzQ\x09A2740005-e5d6f259a1b7617eecd56aadd2867a24-1\x09,1299147700483.6b72bbe5fe43ae429215c1217cf8d6c6.
>>>>>>>>  is not served by any region server  but is listed in META to be on 
>>>>>>>> server null
>>>>>>>> ERROR: Region 
>>>>>>>> sequence,synonyms\x00unknown\x00accession\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-8f9efae82805e42c08bc982f4e03523f-2\x09,1299140082607.f9997faf88d52328bfc44b891b9da8c3.
>>>>>>>>  is not served by any region server  but is listed in META to be on 
>>>>>>>> server null
>>>>>>>> ERROR: Region 
>>>>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>>>>>  is not served by any region server  but is listed in META to be on 
>>>>>>>> server null
>>>>>>>> 
>>>>>>>> 
>>>>>>>> This is the error that is happening in the regionserver logs:
>>>>>>>> 
>>>>>>>> 2011-03-17 19:10:46,842 ERROR 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening 
>>>>>>>> sequence,tags\x00pair\x00A2740005-413946f4da4749a65e080e1d703f7309-1\x008eUWjPYt2fBStS32zCJFzQ\x09A2740005-413946f4da4749a65e080e1d703f7309-2\x09,1299140669680.a276ba37eb7f0df9bf8f14dd4d131ff2.
>>>>>>>> java.io.EOFException: 
>>>>>>>> hdfs://lxbtdv003-pvt:8020/hbase/sequence/a276ba37eb7f0df9bf8f14dd4d131ff2/recovered.edits/0000000000036949961,
>>>>>>>>  entryStart=4147415714, pos=4147415714, end=8294831428, edit=9769
>>>>>>>>       at 
>>>>>>>> sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>>>>>>>>       at 
>>>>>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>>>>       at 
>>>>>>>> java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1588)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1553)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1465)
>>>>>>>>       at java.lang.Thread.run(Thread.java:619)
>>>>>>>> Caused by: java.io.EOFException
>>>>>>>>       at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1910)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1940)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1845)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1891)
>>>>>>>>       at 
>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:140)
>>>>>>>> 
>>>>>>>> On Mar 17, 2011, at 4:55 PM, Stack wrote:
>>>>>>>> 
>>>>>>>>> When we looked at it here at SU the log was REALLY old.  Is yours?  If
>>>>>>>>> really old, you have been living w/o the edits for a while anyways so
>>>>>>>>> just remove and press on.  Regarding going back, we say no -- but sounds
>>>>>>>>> like you didn't get off the ground so perhaps you can go back to
>>>>>>>>> 0.20.x to replay the old logs.
>>>>>>>>> St.Ack
>>>>>>>>> 
>>>>>>>>> On Thu, Mar 17, 2011 at 4:43 PM, Chris Tarnas <[email protected]> wrote:
>>>>>>>>>> I know I didn't have a clean shutdown. I thought I had hit 
>>>>>>>>>> HBASE-3038, but looking further, I first had an OOME on a region 
>>>>>>>>>> server. Can I revert to the older HBase to reconstruct the log, or has 
>>>>>>>>>> that ship sailed?
>>>>>>>>>> 
>>>>>>>>>> thanks,
>>>>>>>>>> -chris
>>>>>>>>>> On Mar 17, 2011, at 4:22 PM, Ryan Rawson wrote:
>>>>>>>>>> 
>>>>>>>>>>> If you know you had a clean shutdown just nuke all directories in 
>>>>>>>>>>> /hbase/.logs
>>>>>>>>>>> 
>>>>>>>>>>> we hit this @ SU as well, it's older logfile formats messing us up.
>>>>>>>>>>> 
>>>>>>>>>>> remember, only if you had a CLEAN shutdown, or else you lose 
>>>>>>>>>>> data!!!!
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Mar 17, 2011 at 4:20 PM, Chris Tarnas <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> I just had to upgrade our second cluster to CDH3B4 (the 2GB log file 
>>>>>>>>>>>> problem, same as the reason for upgrading another cluster) and now 
>>>>>>>>>>>> the master is not coming up, it dies with this error:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2011-03-17 18:15:24,209 FATAL 
>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster: Unhandled exception. 
>>>>>>>>>>>> Starting shutdown.
>>>>>>>>>>>> java.lang.RuntimeException: java.lang.IllegalArgumentException: 
>>>>>>>>>>>> java.net.URISyntaxException: Relative path in absolute URI: 
>>>>>>>>>>>> sequence,lists-Gbaa-KOdBQHTxUyTq8MAwGA10:4:16:629:647%230/1Nr24og9ZJoEEzRue1qKSCg%09GA10:4:16:629:647%230/1%09,1300314038804.2e7bdb018c92a7e22be79f21fcb6bee6.
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:461)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$100(HLogSplitter.java:66)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWritingAndClose(HLogSplitter.java:745)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:300)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:188)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:180)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
>>>>>>>>>>>>      at 
>>>>>>>>>>>> org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> HDFS is fine.. fsck ran clean.
>>>>>>>>>>>> 
>>>>>>>>>>>> Here is more of the master log:
>>>>>>>>>>>> 
>>>>>>>>>>>> http://pastebin.com/Uq5Riczz
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for any help!
>>>>>>>>>>>> -chris
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
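
The EOFException quoted in this thread names the recovered.edits file that could not be replayed. Below is a minimal sketch, assuming the regionserver log line has the same shape as the one quoted above (unwrapped, without the email quote markers), that pulls those file paths out of a log so they can be reviewed. The function name and the regex are illustrative and are not part of HBase.

```python
import re

# Pattern modeled on the EOFException message quoted in this thread.
# Assumption: other HBase versions may format this line differently.
EOF_RE = re.compile(
    r"java\.io\.EOFException:\s*"
    r"(?P<path>hdfs://\S+?/recovered\.edits/\d+),\s*"
    r"entryStart=(?P<entry_start>\d+),\s*pos=(?P<pos>\d+),\s*"
    r"end=(?P<end>\d+),\s*edit=(?P<edit>\d+)"
)


def find_truncated_edits(log_text):
    """Return one dict per recovered.edits file that hit EOF during replay."""
    hits = []
    for m in EOF_RE.finditer(log_text):
        path = m.group("path")
        # The directory just above recovered.edits is the encoded region name.
        region = path.split("/recovered.edits/")[0].rsplit("/", 1)[1]
        hits.append({
            "path": path,
            "region": region,
            "entry_start": int(m.group("entry_start")),
            "pos": int(m.group("pos")),
            "end": int(m.group("end")),
            "edit": int(m.group("edit")),
        })
    return hits
```

Calling find_truncated_edits(open(logfile).read()) on a regionserver log gives a list of candidate files. Whether to move any of them aside is still a manual decision, since edits in a file that is moved aside are not replayed.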
