[jira] [Commented] (HDFS-7934) During Rolling upgrade rollback ,standby namenode startup fails.
[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494294#comment-14494294 ]

Hadoop QA commented on HDFS-7934:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12725207/HDFS-7934.2.patch
  against trunk revision b5a0b24.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/10273//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/10273//console

This message is automatically generated.

During Rolling upgrade rollback ,standby namenode startup fails.
Key: HDFS-7934
URL: https://issues.apache.org/jira/browse/HDFS-7934
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Critical
Attachments: HDFS-7934.1.patch, HDFS-7934.2.patch

During rolling upgrade rollback, standby namenode startup fails while loading edits when there is no local copy of the edits created after the upgrade (which have already been removed by the Active Namenode from the journal manager and from the Active's local storage).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7934) During Rolling upgrade rollback ,standby namenode startup fails.
[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386281#comment-14386281 ]

Vinayakumar B commented on HDFS-7934:

Hi [~arpitagarwal],
Do you have any opinion on this?
[jira] [Commented] (HDFS-7934) During Rolling upgrade rollback ,standby namenode startup fails.
[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367391#comment-14367391 ]

Hadoop QA commented on HDFS-7934:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12705338/HDFS-7934.1.patch
  against trunk revision 9d72f93.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
  org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9958//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9958//console

This message is automatically generated.
[jira] [Commented] (HDFS-7934) During Rolling upgrade rollback ,standby namenode startup fails.
[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362978#comment-14362978 ]

Vinayakumar B commented on HDFS-7934:

I think the problematic area is the code block below in FSImage#loadFSImage(..)
{code}
// For rollback in rolling upgrade, we need to set the toAtLeastTxId to
// the txid right before the upgrade marker.
long toAtLeastTxId = editLog.isOpenForWrite() ? inspector
    .getMaxSeenTxId() : 0;
if (rollingRollback) {
  // note that the first image in imageFiles is the special checkpoint
  // for the rolling upgrade
  toAtLeastTxId = imageFiles.get(0).getCheckpointTxId() + 2;
}
{code}
In the case of rollingRollback, nothing is read from the edit streams, so setting {{toAtLeastTxId = imageFiles.get(0).getCheckpointTxId() + 2;}} is not required. Removing this line will solve the problem, IMO.

Any thoughts?
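To make the arithmetic in the comment above concrete, here is a minimal sketch comparing the quoted logic with the proposed removal. It is not the Hadoop code and not necessarily what the attached patches do: the class, method, and parameter names are hypothetical stand-ins for {{editLog.isOpenForWrite()}}, {{inspector.getMaxSeenTxId()}} and {{imageFiles.get(0).getCheckpointTxId()}}. It assumes the standby's edit log is not open for write, and it takes the rollback checkpoint txid 29 and the maximum seen txid 62 from the directory listings in the reproduction steps reported on this issue.

```java
// Hypothetical, simplified model of the threshold computation quoted above.
class RollbackThresholdSketch {

    // Mirrors the quoted block: a rolling rollback overrides the threshold
    // with the rollback checkpoint txid + 2.
    static long currentThreshold(boolean openForWrite, long maxSeenTxId,
                                 boolean rollingRollback, long rollbackCheckpointTxId) {
        long toAtLeastTxId = openForWrite ? maxSeenTxId : 0;
        if (rollingRollback) {
            toAtLeastTxId = rollbackCheckpointTxId + 2;
        }
        return toAtLeastTxId;
    }

    // The change proposed in the comment: drop the rollback override entirely.
    static long proposedThreshold(boolean openForWrite, long maxSeenTxId) {
        return openForWrite ? maxSeenTxId : 0;
    }

    public static void main(String[] args) {
        // Assumed values: standby (not open for write), rollback checkpoint at
        // txid 29, maximum seen txid 62.
        System.out.println("current=" + currentThreshold(false, 62, true, 29));
        System.out.println("proposed=" + proposedThreshold(false, 62));
    }
}
```

Under these assumptions the sketch prints current=31 and proposed=0. The value 31 is the txid named in the "Gap in transactions" exception, while a threshold of 0 would not require any particular edit log segment to be present.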
[jira] [Commented] (HDFS-7934) During Rolling upgrade rollback ,standby namenode startup fails.
[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362832#comment-14362832 ]

J.Andreina commented on HDFS-7934:

Steps to Reproduce:

Step 1: Start NN1 as active and NN2 as standby.
Step 2: Perform hdfs dfsadmin -rollingUpgrade prepare
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade started option.
Step 4: The DN is also restarted in upgrade mode.
{noformat}
NN2 active:
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:36 edits_inprogress_031
-rw-r--r-- 1 Rex users     350 Mar 13 17:33 fsimage_000
-rw-r--r-- 1 Rex users      62 Mar 13 17:33 fsimage_000.md5
-rw-r--r-- 1 Rex users     622 Mar 13 17:36 fsimage_rollback_029
-rw-r--r-- 1 Rex users      71 Mar 13 17:36 fsimage_rollback_029.md5
-rw-r--r-- 1 Rex users       2 Mar 13 17:33 seen_txid
-rw-r--r-- 1 Rex users     206 Mar 13 17:36 VERSION
{noformat}
Step 5: Shut down the active NN2.
Step 6: Write files.
{noformat}
NN1 active:
-rw-r--r-- 1 Rex users    1817 Mar 13 17:35 edits_001-026
-rw-r--r-- 1 Rex users      67 Mar 13 17:35 edits_027-029
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:35 edits_030-030
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:39 edits_inprogress_032
-rw-r--r-- 1 Rex users     350 Mar 13 17:32 fsimage_000
-rw-r--r-- 1 Rex users      62 Mar 13 17:32 fsimage_000.md5
-rw-r--r-- 1 Rex users     622 Mar 13 17:36 fsimage_rollback_029
-rw-r--r-- 1 Rex users      71 Mar 13 17:36 fsimage_rollback_029.md5
-rw-r--r-- 1 Rex users       3 Mar 13 17:35 seen_txid
-rw-r--r-- 1 Rex users     206 Mar 13 17:32 VERSION
{noformat}
Step 7: Bring down both NNs.
Step 8: Start NN2 and NN1 with the rolling upgrade rollback option.

Issue:

NN2 (active) started successfully, but NN1 (standby) startup failed with the following exception:
{noformat}
15/03/13 17:41:30 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 31 but unable to find any edit logs containing txid 31
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1617)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1575)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:647)
{noformat}
{noformat}
NN2 active:
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:36 edits_031-031.trash
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:40 edits_inprogress_030
-rw-r--r-- 1 Rex users     350 Mar 13 17:33 fsimage_000
-rw-r--r-- 1 Rex users      62 Mar 13 17:33 fsimage_000.md5
-rw-r--r-- 1 Rex users     622 Mar 13 17:36 fsimage_029
-rw-r--r-- 1 Rex users      62 Mar 13 17:40 fsimage_029.md5
-rw-r--r-- 1 Rex users       2 Mar 13 17:33 seen_txid
-rw-r--r-- 1 Rex users     206 Mar 13 17:40 VERSION
{noformat}
{noformat}
NN1 standby:
-rw-r--r-- 1 Rex users    1817 Mar 13 17:35 edits_001-026
-rw-r--r-- 1 Rex users      67 Mar 13 17:35 edits_027-029
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:35 edits_030-030
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:39 edits_032-062
-rw-r--r-- 1 Rex users     350 Mar 13 17:32 fsimage_000
-rw-r--r-- 1 Rex users      62 Mar 13 17:32 fsimage_000.md5
-rw-r--r-- 1 Rex users     622 Mar 13 17:36 fsimage_rollback_029
-rw-r--r-- 1 Rex users      71 Mar 13 17:36 fsimage_rollback_029.md5
-rw-r--r-- 1 Rex users       3 Mar 13 17:35 seen_txid
-rw-r--r-- 1 Rex users     206 Mar 13 17:32 VERSION
{noformat}
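The failure reported above can be illustrated with a small, self-contained model of a gap check over finalized edit log segments. This is a sketch under stated assumptions, not the logic of FSEditLog.checkForGaps: the class EditLogGapSketch, the Segment type, and firstMissingTxId are hypothetical names. The segment ranges are read off the NN1 standby listing (edits_001-026, edits_027-029, edits_030-030, edits_032-062), loading is assumed to start at txid 30 (one past the rollback image at txid 29), and the required upper bound 31 is the one named in the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a gap check; not the Hadoop implementation.
class EditLogGapSketch {

    /** A finalized edit log segment covering txids first..last, inclusive. */
    static final class Segment {
        final long first;
        final long last;

        Segment(long first, long last) {
            this.first = first;
            this.last = last;
        }
    }

    /**
     * Returns the first txid in [fromTxId, toAtLeastTxId] that no segment
     * covers, or -1 if the whole range is covered.
     */
    static long firstMissingTxId(List<Segment> segments, long fromTxId, long toAtLeastTxId) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort((a, b) -> Long.compare(a.first, b.first));
        long next = fromTxId; // lowest txid not yet known to be covered
        for (Segment s : sorted) {
            if (next > toAtLeastTxId || s.first > next) {
                break; // range satisfied, or a hole starts at 'next'
            }
            if (s.last >= next) {
                next = s.last + 1;
            }
        }
        return next <= toAtLeastTxId ? next : -1;
    }

    public static void main(String[] args) {
        // Segments as they appear in the NN1 standby listing.
        List<Segment> nn1 = List.of(
            new Segment(1, 26), new Segment(27, 29),
            new Segment(30, 30), new Segment(32, 62));
        System.out.println("first missing txid: " + firstMissingTxId(nn1, 30, 31));
    }
}
```

With these inputs the sketch reports txid 31 as missing, which is consistent with the listing: NN1 has no segment containing txid 31, and on NN2 the corresponding file has been renamed to edits_031-031.trash. If the required upper bound were 30 instead, the same check would find no gap.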