I'm pretty sure I hit HBASE-3038: the recovered.edits file is over 2GB. I'll push up my upgrade plans.
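
A quick sanity check on the offsets in the exception quoted below, as a rough sketch. The constants are copied from the `entryStart`, `pos` and `end` fields of that log line; the helper names are made up for illustration. It only confirms that the read position is past the signed 32-bit limit and that `end` happens to be exactly twice `pos`. It does not by itself prove that HBASE-3038 is the cause.

```python
# Offsets copied from the EOFException message in the regionserver log.
ENTRY_START = 2336278916
POS = 2336278916
END = 4672557832

# Largest signed 32-bit value (the same as Java's Integer.MAX_VALUE).
INT32_MAX = 2**31 - 1


def exceeds_int32(offset: int) -> bool:
    """Return True if a byte offset cannot be held in a signed 32-bit int."""
    return offset > INT32_MAX


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (hypothetical helper, for display only)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print(f"pos = {POS} bytes ({to_gib(POS):.2f} GiB)")
    print(f"pos is past the 32-bit limit: {exceeds_int32(POS)}")
    print(f"end is exactly twice pos: {END == 2 * POS}")
```

If the position check comes back true for a recovered.edits file, that at least matches the "over 2GB" observation above.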
-chris

On Mar 2, 2011, at 2:44 AM, Chris Tarnas wrote:

> Actually I see now that this EOFException is keeping a region offline. Are
> there any ways around this error to bring the region back online? I don't have
> the logs from the regionservers when it went offline but here is the section
> of the master log from then:
>
> http://pastebin.com/4ZBKGbnZ
>
> thanks again
> -chris
>
> On Mar 2, 2011, at 1:03 AM, Chris Tarnas wrote:
>
>> Under heavy loads I've seen a few EOFException errors in my regionserver
>> logs:
>>
>> 2011-03-02 02:27:03,669 ERROR
>> org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening
>> sequence,h7BpVjo07UDYrkBZBLwWfg\x09fc00fc97be11e00d731605f8e061462c-A2610001-1\x09,1298335975607.8a5d1e4a300792d74f516ba26de869c8.
>> java.io.EOFException:
>> hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364,
>> entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>>
>> Checking the same timeframe in the namenode logs on lcbt006-pvt reveals no
>> ominous messages (no warns, errors, anything), just the same file being
>> opened by a different node:
>>
>> 2011-03-02 02:27:05,466 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop
>> ip=/10.56.24.13 cmd=open
>> src=/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364
>> dst=null perm=null
>>
>>
>> The Troubleshooting Wiki mentions it is related to swapping, but none of the
>> nodes are swapping - they all have plenty of RAM. Are there other common
>> causes? Is this anything I should be worried about or just "normal"
>> exceptions, anything else I should look for? I'm on cdh3b3 and will be
>> moving to b4 once I get a chance to run it through a test cluster.
>>
>> thank you,
>> -chris
>
