[ https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jieshan Bean updated HBASE-8253:
--------------------------------
    Attachment: HBASE-8253-94.patch

Patch for discussion. In ReplicationSource#readAllEntriesToReplicateOrNextFile, only the read of the first edit can throw EOF, so whenever we get an EOF, currentNbEntries should be 0. I see no other case. Please correct me if I am wrong.

> A corrupted log blocked ReplicationSource
> -----------------------------------------
>
>                 Key: HBASE-8253
>                 URL: https://issues.apache.org/jira/browse/HBASE-8253
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>         Attachments: HBASE-8253-94.patch
>
>
> A log that was being written got corrupted when we forcibly powered down one node. Only
> part of the last WALEdit was written into that log, and that log was not the
> last one in the replication queue.
> ReplicationSource was blocked in this scenario. Many log messages like the ones below
> were printed:
> {noformat}
> 2013-03-30 06:53:48,628 WARN [regionserver26003-EventThread.replicationSource,1] 1 Got: org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510, entryStart=40434738, pos=40450048, end=40450048, edit=0
>         at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
>         ... 3 more
> ..........
> 2013-03-30 06:54:38,899 WARN [regionserver26003-EventThread.replicationSource,1] 1 Got: org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510, entryStart=40434738, pos=40450048, end=40450048, edit=0
>         at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
>         at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>         at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
>         ... 3 more
> ...........
> {noformat}
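
To make the reasoning about EOF and currentNbEntries concrete, here is a minimal, self-contained sketch of one possible policy. It is not the attached patch and does not use any HBase classes: EntryReader, EofSkipSketch, shouldSkipLog, readerOf and drain are hypothetical names invented for illustration. The sketch assumes that a log may be skipped only when the EOF is hit before any edit was read in the current pass (currentNbEntries == 0) and a newer log is already queued behind it, so the broken log cannot be the one still being written.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a log reader; not an HBase class. */
interface EntryReader {
    /** Returns the next edit, or null at a clean end of file; throws EOFException on a truncated edit. */
    String next() throws EOFException;
}

final class EofSkipSketch {
    private EofSkipSketch() {}

    /**
     * Assumed policy: skip a log only when the EOF was hit before any edit
     * was read in this pass and at least one newer log is queued behind it.
     */
    static boolean shouldSkipLog(int currentNbEntries, int newerLogsQueued) {
        return currentNbEntries == 0 && newerLogsQueued > 0;
    }

    /** Builds an in-memory reader; if truncated, it throws EOFException after the given edits. */
    static EntryReader readerOf(boolean truncated, String... edits) {
        Iterator<String> it = Arrays.asList(edits).iterator();
        return () -> {
            if (it.hasNext()) {
                return it.next();
            }
            if (truncated) {
                throw new EOFException("truncated edit");
            }
            return null;
        };
    }

    /** Drains the queue oldest-first and returns the edits that were read. */
    static List<String> drain(Deque<EntryReader> queue) {
        List<String> shipped = new ArrayList<>();
        while (!queue.isEmpty()) {
            EntryReader current = queue.peekFirst();
            int currentNbEntries = 0;
            try {
                String edit;
                while ((edit = current.next()) != null) {
                    shipped.add(edit);
                    currentNbEntries++;
                }
                queue.pollFirst(); // clean end of file: move on to the next log
            } catch (EOFException e) {
                if (shouldSkipLog(currentNbEntries, queue.size() - 1)) {
                    queue.pollFirst(); // truncated log with a newer one behind it: drop it
                } else {
                    break; // keep the log queued and let the caller retry later
                }
            }
        }
        return shipped;
    }
}
```

With a queue holding a log that is truncated at its first edit followed by a healthy log, drain skips the first and returns the edits of the second. With a single truncated log, it returns whatever was readable and leaves that log in the queue.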