[jira] [Commented] (HDFS-2090) BackupNode fails when log is streamed due checksum error
[ https://issues.apache.org/jira/browse/HDFS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071239#comment-13071239 ]

Luis Ramos commented on HDFS-2090:
--

Is there an official workaround for this? The only thing that worked for me was to clear the edits file in dfs.name.dir. I got this error after upgrading the Cloudera release from cdh3u0 to cdh3u1.

BackupNode fails when log is streamed due checksum error
-
Key: HDFS-2090
URL: https://issues.apache.org/jira/browse/HDFS-2090
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: André Oriani

*Reproduction steps:*
1) An HDFS cluster is up and running
2) A BackupNode is up, running, and registered to the NameNode
3) Do a write operation like copying a file to the FS.

*Expected Result:*
No exception is thrown

*Actual Result:*
An exception is thrown due to a checksum error in the streamed log:

{panel:title=log| borderStyle=solid}
11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164, [B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637 from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit log at offset 13
Recent opcode offsets: 1
java.io.IOException: Error replaying edit log at offset 13
Recent opcode offsets: 1
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
    at org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
    at org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt. Calculated checksum is -2116249809 but read checksum 0
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
    ... 13 more
{panel}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2090) BackupNode fails when log is streamed due checksum error
[ https://issues.apache.org/jira/browse/HDFS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071245#comment-13071245 ]

Todd Lipcon commented on HDFS-2090:
---

Luis: I think you're looking at the wrong bug. This is related to the BackupNode, which is a new feature in 0.21. CDH3 is based on 0.20 and does not include the BackupNode.

Regarding this issue, it is fixed by HDFS-1073. So, when 1073 is merged, we can close this as a dup.
[jira] [Commented] (HDFS-2090) BackupNode fails when log is streamed due checksum error
[ https://issues.apache.org/jira/browse/HDFS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052321#comment-13052321 ]

André Oriani commented on HDFS-2090:

According to my investigation, done with the help of Ivan Kelly from Yahoo, the commit below introduced the bug:

{panel:borderStyle=solid}
Commit 27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3
Author: Hairong Kuang <hair...@apache.org>
Date: Mon Apr 11 17:15:27 2011 +

HDFS-1630. Support fsedits checksum. Contrbuted by Hairong Kuang.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@1091131 13f79535-47bb-0310-9956-ffa450edef68
{panel}

PS: This is a github commit.
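
For readers unfamiliar with per-transaction checksums, here is a minimal Java sketch of the kind of check that produces a message like "Calculated checksum is ... but read checksum 0". It is not Hadoop's code: the class `EditChecksumSketch`, its methods, the record layout (body followed by a 4-byte checksum), and the use of `java.util.zip.CRC32` are all assumptions made for illustration. In particular, the `writeChecksum` flag only models one hypothetical way a reader could see a stored checksum of 0; the thread does not establish that this is the actual cause.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration only; not taken from FSEditLogLoader.
class EditChecksumSketch {

    /** CRC32 of the record body, truncated to a signed 32-bit int. */
    static int checksum(byte[] body) {
        CRC32 crc = new CRC32();
        crc.update(body, 0, body.length);
        return (int) crc.getValue();
    }

    /**
     * Writer side: the body followed by a 4-byte checksum field.
     * With writeChecksum == false the field is left as 0, which models
     * (as an assumption) a path that never fills it in.
     */
    static byte[] frame(byte[] body, boolean writeChecksum) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + 4);
        buf.put(body);
        buf.putInt(writeChecksum ? checksum(body) : 0);
        return buf.array();
    }

    /** Reader side: recompute the checksum and compare it with the stored one. */
    static void validate(byte[] framed) throws IOException {
        if (framed.length < 4) {
            throw new IOException("Record too short to hold a checksum");
        }
        ByteBuffer buf = ByteBuffer.wrap(framed);
        byte[] body = new byte[framed.length - 4];
        buf.get(body);
        int read = buf.getInt();
        int calculated = checksum(body);
        if (calculated != read) {
            throw new IOException("Transaction is corrupt. Calculated checksum is "
                    + calculated + " but read checksum " + read);
        }
    }

    public static void main(String[] args) {
        byte[] body = "123456789".getBytes(StandardCharsets.US_ASCII);
        try {
            validate(frame(body, true));   // checksum present: passes silently
            validate(frame(body, false));  // checksum field left at 0: throws
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch keeps the writer and the reader in one class so that the mismatch is easy to reproduce: any path that frames a record without filling in the checksum field will fail validation on the receiving side, even though the record body itself is intact.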