Oh, well that's very nice to hear, Chris!

Let us know if there's anything we can improve in this new release,

J-D

On Wed, Mar 2, 2011 at 4:23 PM, Chris Tarnas <[email protected]> wrote:
> Thanks for your help. I pushed up my upgrade plans and just finished
> installing 0.90.1 (cdh3b4); that solved the EOF error and also gave me a
> general performance boost in my initial testing.
>
> -chris
>
> On Mar 2, 2011, at 9:18 AM, Jean-Daniel Cryans wrote:
>
>> I think you could instead try applying both patches to whatever you're
>> running right now; they are pretty small.
>>
>> Another option is to use the already-patched version of 0.89 that we run
>> here in production: https://github.com/stumbleupon/hbase
>>
>> J-D
>>
>> On Wed, Mar 2, 2011 at 8:55 AM, Chris Tarnas <[email protected]> wrote:
>>> If HBASE-3038 is the problem, is there anything I should be aware of when
>>> upgrading while this region is in this state?
>>>
>>> thanks,
>>> -chris
>>>
>>> On Mar 2, 2011, at 8:22 AM, Chris Tarnas wrote:
>>>
>>>> I'm pretty sure I hit HBASE-3038; the recovered.edits file is over 2GB.
>>>>
>>>> I'll push up my upgrade plans.
>>>>
>>>> -chris
>>>>
>>>> On Mar 2, 2011, at 2:44 AM, Chris Tarnas wrote:
>>>>
>>>>> Actually, I see now that this EOFException is keeping a region offline.
>>>>> Is there any way around this error to bring the region back online? I
>>>>> don't have the regionserver logs from when it went offline, but here is
>>>>> the section of the master log from that time:
>>>>>
>>>>> http://pastebin.com/4ZBKGbnZ
>>>>>
>>>>> thanks again
>>>>> -chris
>>>>>
>>>>> On Mar 2, 2011, at 1:03 AM, Chris Tarnas wrote:
>>>>>
>>>>>> Under heavy load I've seen a few EOFException errors in my
>>>>>> regionserver logs:
>>>>>>
>>>>>> 2011-03-02 02:27:03,669 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening sequence,h7BpVjo07UDYrkBZBLwWfg\x09fc00fc97be11e00d731605f8e061462c-A2610001-1\x09,1298335975607.8a5d1e4a300792d74f516ba26de869c8.
>>>>>> java.io.EOFException: hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364, entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370
>>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>>>>>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>>>>>
>>>>>> Checking the same timeframe in the namenode logs on lxbt006-pvt reveals
>>>>>> no ominous messages (no warnings, errors, or anything else), just the
>>>>>> same file being opened by a different node:
>>>>>>
>>>>>> 2011-03-02 02:27:05,466 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop ip=/10.56.24.13 cmd=open src=/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364 dst=null perm=null
>>>>>>
>>>>>>
>>>>>> The Troubleshooting Wiki mentions that it is related to swapping, but
>>>>>> none of the nodes are swapping; they all have plenty of RAM. Are there
>>>>>> other common causes? Is this something I should be worried about, or are
>>>>>> these just "normal" exceptions? Is there anything else I should look for?
>>>>>> I'm on cdh3b3 and will be moving to b4 once I get a chance to run it
>>>>>> through a test cluster.
>>>>>>
>>>>>> thank you,
>>>>>> -chris
>>>>>
>>>>
>>>
>>>
>
>
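Why a recovered.edits file over 2GB is suspicious: the offsets in the EOFException above (entryStart=2336278916, end=4672557832) are both larger than Integer.MAX_VALUE (2147483647), so any code path that stores a file position in a 32-bit int will silently wrap. The Java sketch below only illustrates that arithmetic under that assumption; it is not the HBase code involved in HBASE-3038, and the class name OffsetCheck is hypothetical.

```java
// Hypothetical illustration, not HBase code: what happens when a file
// offset beyond 2 GB is narrowed from long to int.
public class OffsetCheck {
    // True if the offset cannot be represented as a Java int.
    public static boolean exceedsIntRange(long offset) {
        return offset > Integer.MAX_VALUE;
    }

    // A narrowing cast keeps only the low 32 bits, so large offsets wrap.
    public static int narrow(long offset) {
        return (int) offset;
    }

    public static void main(String[] args) {
        long entryStart = 2336278916L; // entryStart/pos from the EOFException
        long end = 4672557832L;        // end from the EOFException
        for (long offset : new long[] {entryStart, end}) {
            System.out.printf("%d exceedsIntRange=%b narrowed=%d%n",
                    offset, exceedsIntRange(offset), narrow(offset));
        }
    }
}
```

Running it shows entryStart narrowing to -1958688380 and end narrowing to 377590536: one offset turns negative and the other becomes a plausible-looking but wrong positive value, which is why such overflows can surface as confusing read errors rather than obvious crashes.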
