Under heavy load I've seen a few EOFException errors in my regionserver
logs:
2011-03-02 02:27:03,669 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening sequence,h7BpVjo07UDYrkBZBLwWfg\x09fc00fc97be11e00d731605f8e061462c-A2610001-1\x09,1298335975607.8a5d1e4a300792d74f516ba26de869c8.
java.io.EOFException: hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364, entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
Checking the same timeframe in the namenode logs on lxbt006-pvt reveals no
ominous messages (no warnings, no errors, nothing), just the same file being
opened by a different node:
2011-03-02 02:27:05,466 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop ip=/10.56.24.13 cmd=open src=/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/recovered.edits/0000000000054475364 dst=null perm=null
The Troubleshooting Wiki mentions that this is related to swapping, but none
of the nodes are swapping; they all have plenty of RAM. Are there other common
causes? Is this something I should be worried about, or are these just "normal"
exceptions? Is there anything else I should look for? I'm on cdh3b3 and will be
moving to b4 once I get a chance to run it through a test cluster.
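
In case it helps anyone compare against their own logs, here is a minimal
Python sketch of how the fields in the EOFException message could be pulled
out of a regionserver log. The regular expression and the name parse_eof_line
are my own assumptions, based only on the single message quoted above; they
are not part of HBase, and the message format may differ between versions.

```python
import re

# Pattern inferred from the one EOFException message quoted in this post.
# Assumption: other HBase versions may format the message differently.
EOF_PATTERN = re.compile(
    r"java\.io\.EOFException:\s*(?P<path>\S+?),\s*"
    r"entryStart=(?P<entryStart>\d+),\s*pos=(?P<pos>\d+),\s*"
    r"end=(?P<end>\d+),\s*edit=(?P<edit>\d+)"
)


def parse_eof_line(text):
    """Return a dict of fields from an EOFException message, or None."""
    match = EOF_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or subtracted.
    for key in ("entryStart", "pos", "end", "edit"):
        fields[key] = int(fields[key])
    return fields


# The message from the log excerpt above, joined onto one line.
sample = (
    "java.io.EOFException: "
    "hdfs://lxbt006-pvt:8020/hbase/sequence/8a5d1e4a300792d74f516ba26de869c8/"
    "recovered.edits/0000000000054475364, "
    "entryStart=2336278916, pos=2336278916, end=4672557832, edit=13370"
)

info = parse_eof_line(sample)
print(info["path"])
print("bytes between pos and end:", info["end"] - info["pos"])
```

Since \s* also matches newlines, the same pattern should cope with messages
that a mail client or terminal has wrapped across lines.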
Thank you,
-chris