[ 
https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8253:
--------------------------------

    Attachment: HBASE-8253-94.patch

Patch for discussion.

In ReplicationSource#readAllEntriesToReplicateOrNextFile, only the read of the 
first edit can throw an EOFException. So whenever we get an EOF, 
currentNbEntries should be 0. I do not see any other case.
Please correct me if I am wrong.
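
To make the condition above concrete, here is a minimal, self-contained sketch. It is not the attached patch and it uses no HBase classes: FakeLog, shouldSkipLog, replicate and batchSize are hypothetical names invented for illustration. It assumes that a log can be skipped only when the EOF arrives before any entry was read in the current pass (currentNbEntries == 0) and more logs are queued behind it. It does not rely on the claim that EOF can only occur on the first read: if edits were read before the EOF, the next pass simply hits the EOF as its first read.

```java
import java.io.EOFException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Illustrative only: none of these types are HBase classes.
final class EofSkipSketch {

    /** Hypothetical stand-in for a WAL: complete edits plus an optional partial edit at the tail. */
    static final class FakeLog {
        private final List<String> edits;
        private final boolean partialTail;
        private int position = 0; // index of the next edit to read

        FakeLog(boolean partialTail, String... edits) {
            this.partialTail = partialTail;
            this.edits = Arrays.asList(edits);
        }

        /** Returns the next edit, null at a clean end of file, or throws on a partial edit. */
        String next() throws EOFException {
            if (position < edits.size()) {
                return edits.get(position++);
            }
            if (partialTail) {
                throw new EOFException("partial edit at index " + position);
            }
            return null;
        }
    }

    /** Assumed rule: skip only if nothing was read in this pass and another log is queued. */
    static boolean shouldSkipLog(int currentNbEntries, boolean moreLogsQueued) {
        return currentNbEntries == 0 && moreLogsQueued;
    }

    /** Drains the queue in batches, skipping a log whose tail is a partial edit. */
    static List<String> replicate(Deque<FakeLog> queue, int batchSize) throws EOFException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> shipped = new ArrayList<>();
        while (!queue.isEmpty()) {
            FakeLog log = queue.peek();
            int currentNbEntries = 0;
            boolean doneWithLog = false;
            try {
                while (currentNbEntries < batchSize) {
                    String edit = log.next();
                    if (edit == null) { // clean end of this log
                        doneWithLog = true;
                        break;
                    }
                    shipped.add(edit);
                    currentNbEntries++;
                }
            } catch (EOFException e) {
                if (shouldSkipLog(currentNbEntries, queue.size() > 1)) {
                    doneWithLog = true; // corrupted tail, newer logs exist: move on
                } else if (currentNbEntries == 0) {
                    throw e; // last log in the queue: surface the error to the caller
                }
                // Otherwise edits were read before the EOF; the next pass will
                // hit the same EOF as its first read and be handled above.
            }
            if (doneWithLog) {
                queue.poll();
            }
        }
        return shipped;
    }

    public static void main(String[] args) throws EOFException {
        Deque<FakeLog> queue = new ArrayDeque<>();
        queue.add(new FakeLog(true, "a", "b")); // ends in a partial edit
        queue.add(new FakeLog(false, "c"));     // newer, intact log
        System.out.println(replicate(queue, 10));
    }
}
```

In this sketch a truncated log that is the last one in the queue still raises the EOFException, on the assumption that such a log may still be open for writing and should be retried rather than dropped.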
                
> A corrupted log blocked ReplicationSource
> -----------------------------------------
>
>                 Key: HBASE-8253
>                 URL: https://issues.apache.org/jira/browse/HBASE-8253
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 0.94.6
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>         Attachments: HBASE-8253-94.patch
>
>
> A log that was being written got corrupted when we forcibly powered down one 
> node. Only part of the last WALEdit was written to that log, and that log 
> was not the last one in the replication queue. 
> ReplicationSource was blocked in this scenario. Many log messages like the 
> ones below were printed:
> {noformat}
> 2013-03-30 06:53:48,628 WARN  
> [regionserver26003-EventThread.replicationSource,1] 1 Got:  
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: 
> hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
>  entryStart=40434738, pos=40450048, end=40450048, edit=0
>       at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
> Source)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
> Caused by: java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at 
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>       at 
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
>       ... 3 more
> ..........    
> 2013-03-30 06:54:38,899 WARN  
> [regionserver26003-EventThread.replicationSource,1] 1 Got:  
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
> java.io.EOFException: 
> hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
>  entryStart=40434738, pos=40450048, end=40450048, edit=0
>       at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
> Source)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
> Caused by: java.io.EOFException
>       at java.io.DataInputStream.readFully(DataInputStream.java:180)
>       at 
> org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>       at 
> org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
>       at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
>       ... 3 more
> ...........   
> {noformat}

