I think you could try applying both patches to whatever you're running
right now instead; they are pretty small.

Another option is to use the 0.89 build we run in production here, which
is already patched: https://github.com/stumbleupon/hbase

J-D
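
A side note on why the file size matters: the offsets in the EOFException
quoted below (entryStart=2336278916, end=4672557832) are both larger than
Integer.MAX_VALUE, and the reported end is exactly twice the reported pos.
The sketch below only illustrates what happens to those two numbers if they
are narrowed to a 32-bit int. It is not the HBase reader code, and the class
and method names are made up for illustration; whether this is the exact
mechanism behind HBASE-3038 is an assumption based on the "over 2GB" remark
in the thread.

```java
public class OffsetOverflowSketch {
    // Values copied from the EOFException message quoted in this thread.
    static final long ENTRY_START = 2336278916L;
    static final long END = 4672557832L;

    /** True when the offset cannot be represented as a 32-bit signed int. */
    static boolean exceedsIntRange(long offset) {
        return offset > Integer.MAX_VALUE || offset < Integer.MIN_VALUE;
    }

    /** Narrowing cast, shown only to illustrate the wraparound. */
    static int truncate(long offset) {
        return (int) offset;
    }

    public static void main(String[] args) {
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("entryStart exceeds int range: " + exceedsIntRange(ENTRY_START));
        System.out.println("entryStart as int: " + truncate(ENTRY_START));
        System.out.println("end as int: " + truncate(END));
        System.out.println("end == 2 * entryStart: " + (END == 2 * ENTRY_START));
    }
}
```

Any position tracked as an int wraps to a negative or otherwise wrong value
once the file grows past 2GB, which is consistent with a reader believing it
has hit end-of-file early.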

On Wed, Mar 2, 2011 at 8:55 AM, Chris Tarnas <[email protected]> wrote:
> If HBASE-3038 is the problem, is there anything I should be aware of when
> upgrading while this region is in this state?
>
> thanks,
> -chris
>
> On Mar 2, 2011, at 8:22 AM, Chris Tarnas wrote:
>
>> I'm pretty sure I hit HBASE-3038; the recovered.edits file is over 2GB.
>>
>> I'll push up my upgrade plans.
>>
>> -chris
>>
>> On Mar 2, 2011, at 2:44 AM, Chris Tarnas wrote:
>>
>>> Actually, I see now that this EOFException is keeping a region offline. Is
>>> there any way around this error to bring the region back online? I don't
>>> have the regionserver logs from when it went offline, but here is the
>>> section of the master log from then:
>>>
>>> http://pastebin.com/4ZBKGbnZ
>>>
>>> thanks again
>>> -chris
>>>
>>> On Mar 2, 2011, at 1:03 AM, Chris Tarnas wrote:
>>>
>>>> Under heavy load I've seen a few EOFException errors in my
>>>> regionserver logs:
>>>>
>>>> 2011-03-02 02:27:03,669 ERROR 
>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening 
>>>> sequence,h7BpVjo07UDYrkBZBLwWfg\x09fc00fc97be11e00d731605f8e061462c-A2610001-1\x09,1298335975607.8a5d1e4a300792d74f516ba26de869c8.
>>>> java.io.EOFException: 
>>>> hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364,
>>>>  entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370
>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
>>>> Method)
>>>>     at 
>>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>     at 
>>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>
>>>> Checking the same timeframe in the namenode logs on lxbt006-pvt reveals no
>>>> ominous messages (no warnings, no errors), just the same file being
>>>> opened by a different node:
>>>>
>>>> 2011-03-02 02:27:05,466 INFO 
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop      
>>>> ip=/10.56.24.13 cmd=open        
>>>> src=/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364
>>>>         dst=null        perm=null
>>>>
>>>>
>>>> The Troubleshooting Wiki says this is related to swapping, but none of
>>>> the nodes are swapping; they all have plenty of RAM. Are there other
>>>> common causes? Should I be worried about this, or are these just "normal"
>>>> exceptions? Is there anything else I should look for? I'm on cdh3b3 and
>>>> will be moving to b4 once I get a chance to run it through a test cluster.
>>>>
>>>> thank you,
>>>> -chris
>>>
>>
>
>
