[jira] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
[ https://issues.apache.org/jira/browse/HDFS-16954 ] chuanjie.duan deleted comment on HDFS-16954:
--
was (Author: chuanjie.duan): If srcdir and dstdir are the same (single mount point or multi mount point), is renaming allowed?

> RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16954
> URL: https://issues.apache.org/jira/browse/HDFS-16954
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: Max Xie
> Assignee: Max Xie
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Renaming a multi-subcluster directory to a single-subcluster directory may leave the file system in an inconsistent state. The operation should throw an exception instead.
> Example:
> 1. Add a HASH_ALL mount point: `hdfs dfsrouteradmin -add /tmp/foo subcluster1,subcluster2 /tmp/foo -order HASH_ALL`
> 2. Add a single-destination mount point: `hdfs dfsrouteradmin -add /user/foo subcluster1 /user/foo`
> 3. Create a directory on all subclusters: `hdfs dfs -mkdir /tmp/foo/123`
> 4. Check that every subcluster now has the directory `/tmp/foo/123`:
>    `hdfs dfs -ls /tmp/foo/` shows `/tmp/foo/123`;
>    `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` shows `hdfs://subcluster1/tmp/foo/123`;
>    `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` shows `hdfs://subcluster2/tmp/foo/123`.
> 5. Rename `/tmp/foo/123` to `/user/foo/123`; the operation succeeds: `hdfs dfs -mv /tmp/foo/123 /user/foo/123`
> 6. Check again; the RBF cluster still shows the directory `/tmp/foo/123`:
>    `hdfs dfs -ls /tmp/foo/` shows `/tmp/foo/123`;
>    `hdfs dfs -ls hdfs://subcluster1/tmp/foo/` shows no directories;
>    `hdfs dfs -ls hdfs://subcluster2/tmp/foo/` shows `hdfs://subcluster2/tmp/foo/123`.
> Step 5 should throw an exception.
--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
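The inconsistency in steps 4-6 arises because src and dst resolve to different subcluster sets in the mount table. A minimal sketch of the proposed guard (illustrative only, not the actual Router code; `destinations` is a hypothetical stand-in for mount-table resolution, with entries mirroring the example above):

```java
import java.util.Arrays;
import java.util.List;

public class RenameGuard {
    // Hypothetical stand-in for mount-table resolution.
    static List<String> destinations(String mountPoint) {
        if (mountPoint.equals("/tmp/foo")) {
            // HASH_ALL mount point spanning two subclusters
            return Arrays.asList("subcluster1", "subcluster2");
        }
        // /user/foo and anything else: single-destination mount point
        return Arrays.asList("subcluster1");
    }

    // Proposed rule: only allow rename when src and dst resolve to the
    // same single subcluster; otherwise an IOException should be thrown.
    static boolean renameAllowed(String srcMount, String dstMount) {
        List<String> src = destinations(srcMount);
        return src.size() == 1 && src.equals(destinations(dstMount));
    }

    public static void main(String[] args) {
        // Step 5 of the report: multi-subcluster src, single-subcluster dst.
        System.out.println(renameAllowed("/tmp/foo", "/user/foo")); // false
        // Same single subcluster on both sides: safe.
        System.out.println(renameAllowed("/user/foo", "/user/bar")); // true
    }
}
```

This also addresses the comment's question: when src and dst map to the same single subcluster, the rename stays consistent.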
[jira] [Comment Edited] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
[ https://issues.apache.org/jira/browse/HDFS-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844893#comment-17844893 ] chuanjie.duan edited comment on HDFS-16954 at 5/9/24 7:32 AM:
--
If srcdir and dstdir are the same (single mount point or multi mount point), is renaming allowed?

was (Author: chuanjie.duan): If srcdir and dstdir is the same (single mount point or multi mount point), is renaming allowed?
[jira] [Commented] (HDFS-16954) RBF: The operation of renaming a multi-subcluster directory to a single-cluster directory should throw IOException
[ https://issues.apache.org/jira/browse/HDFS-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844893#comment-17844893 ] chuanjie.duan commented on HDFS-16954:
--
If srcdir and dstdir are the same (single mount point or multi mount point), is renaming allowed?
[jira] [Commented] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs
[ https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840305#comment-17840305 ] chuanjie.duan commented on HDFS-15555:
--
[~elgoiri] [~aajisaka] Not sure why the "ioe instanceof ConnectException" check was deleted.

> RBF: Refresh cacheNS when SocketException occurs
> ------------------------------------------------
>
> Key: HDFS-15555
> URL: https://issues.apache.org/jira/browse/HDFS-15555
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Affects Versions: 3.3.1, 3.4.0
> Environment: HDFS 3.3.0, Java 11
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading its fsimage, DFSRouters slow down significantly.
> Investigation:
> While the active NameNode is loading the fsimage, RouterRpcClient receives SocketException. Since RouterRpcClient#isUnavailableException(IOException) returns false when the argument is a SocketException, MembershipNameNodeResolver#cacheNS is not refreshed. That is why the order of the NameNodes returned by MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged and the active NameNode is still returned first. Therefore RouterRpcClient keeps trying to connect to the NameNode that is loading the fsimage. After loading the fsimage, the NameNode throws StandbyException, which is one of the "unavailable" exceptions, and cacheNS is refreshed.
> Workaround:
> Stop the NameNode and wait 1 minute before starting it again, instead of restarting it directly.
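The question about dropping the "ioe instanceof ConnectException" branch can be answered from the exception hierarchy: java.net.ConnectException is a subclass of java.net.SocketException, so a single SocketException check subsumes it. A simplified sketch of that classification (not the actual RouterRpcClient source):

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;

public class UnavailableCheck {
    // Simplified stand-in for the isUnavailableException decision: should
    // this IOException mark the NameNode unavailable and refresh cacheNS?
    static boolean isUnavailable(IOException ioe) {
        // ConnectException extends SocketException, so this one test also
        // covers the removed "ioe instanceof ConnectException" branch.
        return ioe instanceof SocketException;
    }

    public static void main(String[] args) {
        System.out.println(isUnavailable(new ConnectException("refused"))); // true
        System.out.println(isUnavailable(new SocketException("reset")));    // true
        System.out.println(isUnavailable(new IOException("other")));        // false
    }
}
```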
[jira] [Commented] (HDFS-11737) Backport HDFS-7964 to branch-2.7: add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17504095#comment-17504095 ] chuanjie.duan commented on HDFS-11737:
--
[~zhz], your patch is missing the start of the sync thread. It worked when I overrode the "openForWrite" method:

class FSEditLogAsync extends FSEditLog implements Runnable {
  ...
  @Override
  void openForWrite(int layoutVersion) throws IOException {
    try {
      startSyncThread();
      super.openForWrite(layoutVersion);
    } catch (IOException ioe) {
      stopSyncThread();
      throw ioe;
    }
  }
  ...
}

> Backport HDFS-7964 to branch-2.7: add support for async edit logging
> --------------------------------------------------------------------
>
> Key: HDFS-11737
> URL: https://issues.apache.org/jira/browse/HDFS-11737
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Reporter: Zhe Zhang
> Assignee: Zhe Zhang
> Priority: Critical
> Attachments: HDFS-11737-branch-2.7.00.patch
>
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17464552#comment-17464552 ] chuanjie.duan commented on HDFS-5920:
--
[~jingzhao] Regarding "3. This NN will load the special fsimage right before the upgrade marker, then discard all the editlog segments after the txid of the fsimage": can we keep those editlog segments, either to replay them on the rollback fsimage, or to start up with the latest fsimage and the old Hadoop version? I mean, "rollback" would then only roll back the Hadoop library version, without losing data.

> Support rollback of rolling upgrade in NameNode and JournalNodes
> ----------------------------------------------------------------
>
> Key: HDFS-5920
> URL: https://issues.apache.org/jira/browse/HDFS-5920
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: journal-node, namenode
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Priority: Major
> Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, HDFS-5920.001.patch, HDFS-5920.002.patch, HDFS-5920.003.patch
>
> This jira provides rollback functionality for the NameNode and JournalNode in a rolling upgrade.
> The currently proposed rollback for rolling upgrade is:
> 1. Shut down both NNs.
> 2. Start one of the NNs with the "-rollingUpgrade rollback" option.
> 3. This NN will load the special fsimage taken right before the upgrade marker, then discard all the editlog segments after the txid of the fsimage.
> 4. The NN will also send RPC requests to all the JNs to discard editlog segments. This call expects responses from all the JNs. The NN will keep running if the call succeeds.
> 5. We start the other NN using bootstrapStandby rather than "-rollingUpgrade rollback".
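Step 3's discard can be pictured as filtering finalized segments by the rollback fsimage's txid; the segments the comment asks to keep are exactly the ones this filter drops. A toy model under that reading (illustrative only, not JournalNode code):

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentDiscard {
    // Each segment is {startTxId, endTxId}. Keep only segments that begin
    // at or before the txid of the rollback fsimage; segments written
    // after the upgrade marker are the ones rollback throws away.
    static List<long[]> discardAfter(List<long[]> segments, long fsimageTxId) {
        List<long[]> kept = new ArrayList<>();
        for (long[] s : segments) {
            if (s[0] <= fsimageTxId) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<long[]> segs = new ArrayList<>();
        segs.add(new long[]{1, 100});
        segs.add(new long[]{101, 200}); // written after the upgrade marker
        // fsimage taken at txid 100: the second segment is discarded.
        System.out.println(discardAfter(segs, 100).size()); // 1
    }
}
```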
[jira] [Updated] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-16349:
--
Fix Version/s: (was: 3.2.2)
               (was: 3.2.3)
Affects Version/s: 3.3.1
                   3.3.2

> FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
> ---------------------------------------------------------
>
> Key: HDFS-16349
> URL: https://issues.apache.org/jira/browse/HDFS-16349
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.2, 3.3.1, 3.2.3, 3.3.2
> Reporter: chuanjie.duan
> Priority: Blocker
> Attachments: HDFS-16349-branch-3.2.3.patch
>
> 2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState {
>   startTxId: 3906965
>   endTxId: 3906965
>   isInProgress: false
> }
> lastWriterEpoch: 5
> lastCommittedTxId: 3906964
> 2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
> 2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_3898378 -> /data12/data/flashHadoopU/namenode/current/edits_3898378-3898412
> 2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
> java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
> 2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
> 2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
> 2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
> 2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
> 2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
>
> Old version: 2.7.3
> New version: 3.2.2
>
> Steps to Reproduce
> Step 1: Start NN1 as active, NN2 as standby.
> Step 2: Perform "hdfs dfsadmin -rollingUpgrade prepare"
> Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade started option.
> Step 4: DNs are also restarted in upgrade mode.
> Step 5: Restart the journalnodes with the new Hadoop version.
> Step 6: A few days later...
> Step 7: Bring down both NNs, the journalnodes, and the DNs.
> Step 8: Start the JNs with the old version.
> Step 9: Start NN1 with the rolling upgrade rollback option. The NameNode fails to start with the above ERROR (txid 2510933 mentioned above has been deleted by the checkpoint mechanism).
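The failing check can be modeled as a contiguity walk over the available edit-log segments: starting from the first txid the fsimage does not cover, every required txid must fall inside some segment. A simplified model of that logic (not the actual FSEditLog source):

```java
public class GapCheck {
    // segments: sorted {startTxId, endTxId} pairs of available edit logs.
    // Returns the first missing txid in [fromTxId, toAtLeastTxId],
    // or -1 if the whole range is covered (no gap).
    static long firstMissingTxId(long[][] segments, long fromTxId, long toAtLeastTxId) {
        long next = fromTxId;
        for (long[] s : segments) {
            if (s[0] > next) {
                break;                       // gap before this segment
            }
            next = Math.max(next, s[1] + 1); // covered through endTxId
        }
        return next > toAtLeastTxId ? -1 : next;
    }

    public static void main(String[] args) {
        long[][] segs = { {2510934, 3898377}, {3898378, 3898412} };
        // Normal startup: fsimage covers through 2510933, reads from 2510934.
        System.out.println(firstMissingTxId(segs, 2510934, 3898412)); // -1
        // Rollback: the older fsimage needs txid 2510933, but the segments
        // containing it were purged by checkpointing -> gap detected.
        System.out.println(firstMissingTxId(segs, 2510933, 3898412)); // 2510933
    }
}
```

Under this model, the reported patch's effect of relaxing the required starting txid would make the rollback case fall outside the checked range, which is why it avoids the IOException.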
[jira] [Updated] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-16349:
--
Attachment: HDFS-16349-branch-3.2.3.patch
Fix Version/s: 3.2.3
               3.2.2
Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-16349:
--
Attachment: (was: HDFS-16349-branch-3.2.3.patch)
[jira] [Commented] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450297#comment-17450297 ] chuanjie.duan commented on HDFS-16349:
--
Uploaded a patch. I just removed the "+ 2" to escape checkForGaps, because we won't load any editlog during rollback.
[jira] [Updated] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-16349:
--
Affects Version/s: 3.2.3
[jira] [Updated] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-16349:
- Attachment: HDFS-16349-branch-3.2.3.patch

> FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
>
> Key: HDFS-16349
> URL: https://issues.apache.org/jira/browse/HDFS-16349
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.2.2, 3.2.3
> Reporter: chuanjie.duan
> Priority: Blocker
> Attachments: HDFS-16349-branch-3.2.3.patch
[jira] [Commented] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447862#comment-17447862 ] chuanjie.duan commented on HDFS-16349:

When I change the offset to -2:

    if (rollingRollback) {
      // note that the first image in imageFiles is the special checkpoint
      // for the rolling upgrade
      toAtLeastTxId = imageFiles.get(0).getCheckpointTxId() - 2;
    }

checkForGaps then takes the early-return branch:

    if (txId > toAtLeastTxId) return;

and in the end the NameNode started successfully.
[jira] [Commented] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447802#comment-17447802 ] chuanjie.duan commented on HDFS-16349:

    private void checkForGaps(List<EditLogInputStream> streams, long fromTxId,
        long toAtLeastTxId, boolean inProgressOk) throws IOException {
      Iterator<EditLogInputStream> iter = streams.iterator();
      long txId = fromTxId;
      while (true) {
        if (txId > toAtLeastTxId) return;
        if (!iter.hasNext()) break;
        EditLogInputStream elis = iter.next();
        if (elis.getFirstTxId() > txId) break;
        ..
          return;
        }
        txId = next + 1;
      }
      throw new IOException(String.format("Gap in transactions. Expected to "
          + "be able to read up until at least txid %d but unable to find any "
          + "edit logs containing txid %d", toAtLeastTxId, txId));
    }

So during a rollback, txId is always smaller than toAtLeastTxId, and elis.getFirstTxId() is always larger than txId, so the loop breaks and the IOException above is thrown.

    private boolean loadFSImage(FSNamesystem target, StartupOption startOpt,
        MetaRecoveryContext recovery) throws IOException {
      ..
      if (rollingRollback) {
        // note that the first image in imageFiles is the special checkpoint
        // for the rolling upgrade
        toAtLeastTxId = imageFiles.get(0).getCheckpointTxId() + 2; // Should this be - 2?
      }
    }
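The failure mode described in this comment can be reproduced outside HDFS with a simplified, hypothetical re-implementation of the gap check (this is not the real FSEditLog code; the txids 2510932/2510933 and the segment boundary 2844451 are taken from the logs in this issue, and the Segment type is invented for illustration):

```java
import java.util.List;

// Simplified model of FSEditLog.checkForGaps: walk the edit-log segments and
// require contiguous coverage from fromTxId up to at least toAtLeastTxId.
public class CheckForGapsSketch {
    // Each segment covers txids [firstTxId, lastTxId] inclusive.
    record Segment(long firstTxId, long lastTxId) {}

    static boolean hasGap(List<Segment> segments, long fromTxId, long toAtLeastTxId) {
        long txId = fromTxId;
        for (Segment s : segments) {
            if (txId > toAtLeastTxId) return false;  // already covered far enough
            if (s.firstTxId() > txId) return true;   // hole before this segment
            txId = s.lastTxId() + 1;
        }
        return txId <= toAtLeastTxId;                // segments ran out too early
    }

    public static void main(String[] args) {
        long checkpointTxId = 2510932;               // rolling-upgrade checkpoint
        // Edit logs before txid 2844451 were purged by a later checkpoint.
        List<Segment> onDisk = List.of(new Segment(2844451, 3906965));
        // +2 (current code): demands txid 2510934, which no segment covers -> gap.
        System.out.println(hasGap(onDisk, checkpointTxId + 1, checkpointTxId + 2)); // true
        // -2 (proposed): the bound falls below fromTxId -> early return, no gap.
        System.out.println(hasGap(onDisk, checkpointTxId + 1, checkpointTxId - 2)); // false
    }
}
```

This only shows why the `- 2` bound lets the check pass when the checkpointed txids have been purged; it says nothing about whether relaxing the bound is safe in general.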
[jira] [Commented] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
[ https://issues.apache.org/jira/browse/HDFS-16349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447800#comment-17447800 ] chuanjie.duan commented on HDFS-16349:

2021-11-23 14:40:42,777 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: rollingRollback true
2021-11-23 14:40:42,777 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: toAtLeastTxId 2510934
2021-11-23 14:40:42,777 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: imageFiles.get(0).getCheckpointTxId() + 1 25109321
2021-11-23 14:40:42,861 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: iter.hasNext() true
2021-11-23 14:40:42,861 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: elis.getFirstTxId() 2844451
2021-11-23 14:40:42,965 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
[jira] [Created] (HDFS-16349) FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
chuanjie.duan created HDFS-16349:

Summary: FSEditLog checkForGaps break HDFS RollingUpgrade Rollback
Key: HDFS-16349
URL: https://issues.apache.org/jira/browse/HDFS-16349
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.2.2
Reporter: chuanjie.duan

2021-11-22 20:36:44,440 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.65.57.133:8485=segmentState {
startTxId: 3906965
endTxId: 3906965
isInProgress: false
}
lastWriterEpoch: 5
lastCommittedTxId: 3906964
2021-11-22 20:36:44,457 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /data12/data/flashHadoopU/namenode/current
2021-11-22 20:36:44,495 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /data12/data/flashHadoopU/namenode/current/edits_inprogress_3898378 -> /data12/data/flashHadoopU/namenode/current/edits_3898378-3898412
2021-11-22 20:36:44,657 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 2510934 but unable to find any edit logs containing txid 2510933
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1578)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1536)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:976)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2021-11-22 20:36:44,660 INFO org.mortbay.log: Stopped HttpServer2$selectchannelconnectorwithsafestar...@pro-hadoop-dc01-057133.vm.dc01.hellocloud.tech:50070
2021-11-22 20:36:44,760 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-11-22 20:36:44,761 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-11-22 20:36:44,761 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.

Old version: 2.7.3
New version: 3.2.2

Steps to Reproduce
Step 1: Start NN1 as active, NN2 as standby.
Step 2: Perform "hdfs dfsadmin -rollingUpgrade prepare"
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade started option.
Step 4: DN also restarted in upgrade mode.
Step 5: Restart the journalnode with the new hadoop version
Step 6: a few days later
Step 7: bring down both NNs, the journalnode, and the DNs
Step 8: Start the JN with the old version
Step 9: Start NN1 with the rolling upgrade rollback option. The NN failed to start with the above ERROR (the above-mentioned txid 2510933 had been deleted by the checkpoint mechanism)
[jira] [Updated] (HDFS-15658) Improve datanode capability balancing
[ https://issues.apache.org/jira/browse/HDFS-15658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-15658:
- Attachment: HDFS-15658-branch-2.7.patch
  Status: Patch Available (was: Open)

> Improve datanode capability balancing
>
> Key: HDFS-15658
> URL: https://issues.apache.org/jira/browse/HDFS-15658
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: chuanjie.duan
> Priority: Major
> Attachments: HDFS-15658-branch-2.7.patch
[jira] [Created] (HDFS-15658) Improve datanode capability balancing
chuanjie.duan created HDFS-15658:

Summary: Improve datanode capability balancing
Key: HDFS-15658
URL: https://issues.apache.org/jira/browse/HDFS-15658
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Reporter: chuanjie.duan

How about adjusting the order used when choosing a replica for deletion? Is there any other reason for choosing "oldestHeartbeatStorage" first?

    public DatanodeStorageInfo chooseReplicaToDelete(
        Collection<DatanodeStorageInfo> moreThanOne,
        Collection<DatanodeStorageInfo> exactlyOne,
        final List<StorageType> excessTypes,
        Map<String, List<DatanodeStorageInfo>> rackMap) {
      ..
      final DatanodeStorageInfo storage;
      if (minSpaceStorage != null) {
        storage = minSpaceStorage;
      } else if (oldestHeartbeatStorage != null) {
        storage = oldestHeartbeatStorage;
      } else {
        return null;
      }
      excessTypes.remove(storage.getStorageType());
      return storage;
    }
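The ordering question can be illustrated with a toy model (this is not the real BlockPlacementPolicyDefault logic; the Storage type, thresholds, and numbers are invented). It shows the ordering in the snippet above, where a low-free-space storage wins over a stale-heartbeat one:

```java
import java.util.List;

// Toy model of "which replica do we delete when a block is over-replicated":
// prefer a storage that is nearly full (capacity balancing), and only fall
// back to the one with the oldest heartbeat.
public class ReplicaDeleteSketch {
    record Storage(String id, long remainingBytes, long lastHeartbeatMs) {}

    static Storage choose(List<Storage> candidates, long nowMs,
                          long staleAfterMs, long lowSpaceBytes) {
        Storage oldestHeartbeat = null;
        Storage minSpace = null;
        for (Storage s : candidates) {
            boolean stale = nowMs - s.lastHeartbeatMs() > staleAfterMs;
            if (stale && (oldestHeartbeat == null
                    || s.lastHeartbeatMs() < oldestHeartbeat.lastHeartbeatMs())) {
                oldestHeartbeat = s;
            }
            if (s.remainingBytes() < lowSpaceBytes && (minSpace == null
                    || s.remainingBytes() < minSpace.remainingBytes())) {
                minSpace = s;
            }
        }
        // Ordering under discussion: free space first, heartbeat age second.
        if (minSpace != null) return minSpace;
        return oldestHeartbeat;
    }

    public static void main(String[] args) {
        // A: stale heartbeat but lots of room; B: healthy but nearly full.
        Storage a = new Storage("A", 5_000_000L, 50_000L);
        Storage b = new Storage("B", 100_000L, 99_000L);
        Storage picked = choose(List.of(a, b), 100_000L, 10_000L, 1_000_000L);
        System.out.println(picked.id()); // B: deleting here frees the fullest node
    }
}
```

With the opposite priority, A would be picked, keeping the extra replica on the nearly-full node B; that is the capacity-balancing trade-off the issue raises.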
[jira] [Updated] (HDFS-13037) Support protected path configuration
[ https://issues.apache.org/jira/browse/HDFS-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chuanjie.duan updated HDFS-13037:

Description:
After Hadoop 2.7 the root path ("/") cannot be deleted in any situation. But paths like '/tmp', '/user', and '/user/hive/warehouse' usually should not be deleted either. So can we let users configure their own custom protected paths, just in case of accidents?
1. add a configuration to hdfs-site.xml
2. add a command to dfsadmin for refreshing

was:
After Hadoop 2.7 the root path ("/") cannot be deleted in any situation. But paths like '/tmp', '/user', and '/user/hive/warehouse' usually should not be deleted either. So can we add a configuration so users can customize their own protected paths, just in case of accidents?
1. add a configuration to hdfs-site.xml
2. add a command to dfsadmin for refreshing

> Support protected path configuration
>
> Key: HDFS-13037
> URL: https://issues.apache.org/jira/browse/HDFS-13037
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Reporter: chuanjie.duan
> Priority: Major
[jira] [Created] (HDFS-13037) Support protected path configuration
chuanjie.duan created HDFS-13037:

Summary: Support protected path configuration
Key: HDFS-13037
URL: https://issues.apache.org/jira/browse/HDFS-13037
Project: Hadoop HDFS
Issue Type: New Feature
Components: namenode
Reporter: chuanjie.duan

After Hadoop 2.7 the root path ("/") cannot be deleted in any situation. But paths like '/tmp', '/user', and '/user/hive/warehouse' usually should not be deleted either. So can we add a configuration so users can customize their own protected paths, just in case of accidents?
1. add a configuration to hdfs-site.xml
2. add a command to dfsadmin for refreshing
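A minimal sketch of the delete-time check such a configuration could drive (the helper name is hypothetical, not actual NameNode code; later Hadoop releases added a similar guard via the fs.protected.directories setting):

```java
import java.util.List;

// Sketch: refuse a delete whose target is, or is an ancestor of, a configured
// protected path. Paths are assumed normalized with no trailing slash
// (except "/" itself).
public class ProtectedPathSketch {
    static boolean isDeleteAllowed(String path, List<String> protectedPaths) {
        String prefix = path.endsWith("/") ? path : path + "/";
        for (String p : protectedPaths) {
            // Deleting "/user" (or "/") would also remove "/user/hive/warehouse".
            if (p.equals(path) || p.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> prot = List.of("/tmp", "/user/hive/warehouse");
        System.out.println(isDeleteAllowed("/user/hive/warehouse", prot)); // false
        System.out.println(isDeleteAllowed("/user/hive", prot));           // false (contains a protected dir)
        System.out.println(isDeleteAllowed("/user/alice", prot));          // true
    }
}
```

The list itself would come from hdfs-site.xml and be swapped atomically by the proposed dfsadmin refresh command.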