[jira] [Resolved] (HDFS-14587) Support fail fast when client wait ACK by pipeline over threshold
[ https://issues.apache.org/jira/browse/HDFS-14587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang resolved HDFS-14587. Resolution: Duplicate I believe this is a dup of HDFS-8311, so I'll resolve this one. Feel free to reopen if I am wrong. > Support fail fast when client wait ACK by pipeline over threshold > - > > Key: HDFS-14587 > URL: https://issues.apache.org/jira/browse/HDFS-14587 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > > Recently, I met a corner case where a client waited for data to be acknowledged by > the pipeline for over 9 hours. After checking branch trunk, I think this issue still > exists. So I propose to add a threshold for the wait timeout and then fail fast. > {code:java} > 2019-06-18 12:53:46,217 WARN [Thread-127] org.apache.hadoop.hdfs.DFSClient: > Slow waitForAckedSeqno took 35560718ms (threshold=3ms) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
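The proposed fail-fast behavior amounts to a bounded wait: instead of blocking indefinitely for the pipeline ack, the client waits with a deadline and throws once the threshold elapses. A minimal sketch of that idea, assuming hypothetical names (`AckWaiter` and its methods are illustrative, not the actual `DFSOutputStream` API):

```java
import java.io.IOException;
import java.io.InterruptedIOException;

public class AckWaiter {
    private long lastAckedSeqno = -1;

    /** Called by the ack-receiving thread when the pipeline acks a seqno. */
    public synchronized void acked(long seqno) {
        lastAckedSeqno = seqno;
        notifyAll();
    }

    /** Wait until seqno is acked, failing fast once timeoutMs has elapsed. */
    public synchronized void waitForAckedSeqno(long seqno, long timeoutMs)
            throws IOException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (lastAckedSeqno < seqno) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                // fail fast instead of hanging for hours
                throw new IOException("Timed out after " + timeoutMs
                    + "ms waiting for ack of seqno " + seqno);
            }
            try {
                // bounded wait; the loop guards against spurious wakeups
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("Interrupted waiting for ack");
            }
        }
    }
}
```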
[jira] [Commented] (HDFS-15075) Remove process command timing from BPServiceActor
[ https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065343#comment-17065343 ] Wei-Chiu Chuang commented on HDFS-15075: Great work. I wish the extra metrics were in a separate jira. Looking at the jira summary I wouldn't know the bulk of the change is for adding extra metrics. Also, the added metrics are a little similar to, and overlap with, the per-volume file IO metrics added in HDFS-10959 (which is not enabled by default). > Remove process command timing from BPServiceActor > - > > Key: HDFS-15075 > URL: https://issues.apache.org/jira/browse/HDFS-15075 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15075.001.patch, HDFS-15075.002.patch, > HDFS-15075.003.patch, HDFS-15075.004.patch, HDFS-15075.005.patch, > HDFS-15075.006.patch, HDFS-15075.007.patch > > > HDFS-14997 made the command processing asynchronous. > Right now, we are checking the time it takes to add to a queue. > We should remove this and maybe move the timing into the processing thread.
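The change discussed above amounts to timing the work inside the async processing thread rather than timing the enqueue, which is cheap. A minimal sketch of that idea (class and counter names are illustrative, not the actual BPServiceActor code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CommandProcessor implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    final AtomicLong totalProcessingNanos = new AtomicLong();
    final AtomicLong processedCount = new AtomicLong();
    private volatile boolean running = true;

    public void enqueue(Runnable command) {
        queue.add(command);  // no timing here: adding to the queue is O(1)
    }

    public void shutdown() { running = false; }

    @Override
    public void run() {
        while (running || !queue.isEmpty()) {
            Runnable cmd;
            try {
                cmd = queue.poll(10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (cmd == null) continue;
            long start = System.nanoTime();  // time the actual processing...
            cmd.run();
            // ...and record it inside the worker thread
            totalProcessingNanos.addAndGet(System.nanoTime() - start);
            processedCount.incrementAndGet();
        }
    }
}
```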
[jira] [Commented] (HDFS-15088) RBF: Correct annotation typo of RouterPermissionChecker#checkPermission
[ https://issues.apache.org/jira/browse/HDFS-15088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065339#comment-17065339 ] Hudson commented on HDFS-15088: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18077 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18077/]) HDFS-15088. RBF: Correct annotation typo of (weichiu: rev 5eddc82fb85d2f60bf13b1eca03d78deb8981d51) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterPermissionChecker.java > RBF: Correct annotation typo of RouterPermissionChecker#checkPermission > --- > > Key: HDFS-15088 > URL: https://issues.apache.org/jira/browse/HDFS-15088 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-15088.patch > > > Correct annotation typo of RouterPermissionChecker#checkPermission. > {code:java} > /** >* Whether a mount table entry can be accessed by the current context. >* >* @param mountTable >* MountTable being accessed >* @param access >* type of action being performed on the cache pool >* @throws AccessControlException >* if mount table cannot be accessed >*/ > public void checkPermission(MountTable mountTable, FsAction access) > throws AccessControlException { > } > {code}
[jira] [Updated] (HDFS-15088) RBF: Correct annotation typo of RouterPermissionChecker#checkPermission
[ https://issues.apache.org/jira/browse/HDFS-15088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15088: --- Fix Version/s: 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) > RBF: Correct annotation typo of RouterPermissionChecker#checkPermission > --- > > Key: HDFS-15088 > URL: https://issues.apache.org/jira/browse/HDFS-15088 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Trivial > Fix For: 3.3.0 > > Attachments: HDFS-15088.patch > > > Correct annotation typo of RouterPermissionChecker#checkPermission. > {code:java} > /** >* Whether a mount table entry can be accessed by the current context. >* >* @param mountTable >* MountTable being accessed >* @param access >* type of action being performed on the cache pool >* @throws AccessControlException >* if mount table cannot be accessed >*/ > public void checkPermission(MountTable mountTable, FsAction access) > throws AccessControlException { > } > {code}
[jira] [Commented] (HDFS-15088) RBF: Correct annotation typo of RouterPermissionChecker#checkPermission
[ https://issues.apache.org/jira/browse/HDFS-15088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065335#comment-17065335 ] Wei-Chiu Chuang commented on HDFS-15088: +1 > RBF: Correct annotation typo of RouterPermissionChecker#checkPermission > --- > > Key: HDFS-15088 > URL: https://issues.apache.org/jira/browse/HDFS-15088 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Trivial > Attachments: HDFS-15088.patch > > > Correct annotation typo of RouterPermissionChecker#checkPermission. > {code:java} > /** >* Whether a mount table entry can be accessed by the current context. >* >* @param mountTable >* MountTable being accessed >* @param access >* type of action being performed on the cache pool >* @throws AccessControlException >* if mount table cannot be accessed >*/ > public void checkPermission(MountTable mountTable, FsAction access) > throws AccessControlException { > } > {code}
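For reference, the typo appears to be in the `@param access` description, which says "on the cache pool" although the method checks a mount table entry. A presumed correction (inferred from the issue title; the wording in the committed patch may differ):

```java
/**
 * Whether a mount table entry can be accessed by the current context.
 *
 * @param mountTable
 *          MountTable being accessed
 * @param access
 *          type of action being performed on the mount table entry
 * @throws AccessControlException
 *           if mount table cannot be accessed
 */
```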
[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065332#comment-17065332 ] Wei-Chiu Chuang commented on HDFS-15160: Looks correct to me. nit: {code} /** * Acquire the lock of the data set. */ AutoCloseableLock acquireDatasetLock(); /*** * Acquire the read lock of the data set. * @return The AutoClosable read lock instance. */ AutoCloseableLock acquireDatasetReadLock(); {code} It would be great to make the javadoc clearer about the expected behavior. With the read lock, it is expected that the block map does not change; however, the Block data structure may be updated (update gen stamp, etc.). Am I correct? > ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl > methods should use datanode readlock > --- > > Key: HDFS-15160 > URL: https://issues.apache.org/jira/browse/HDFS-15160 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, > HDFS-15160.003.patch > > > Now we have HDFS-15150, we can start to move some DN operations to use the > read lock rather than the write lock to improve concurrency. The first step > is to make the changes to ReplicaMap, as many other methods make calls to it. > This Jira switches read operations against the volume map to use the readLock > rather than the write lock. > Additionally, some methods make a call to replicaMap.replicas() (eg > getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result > in a read-only fashion, so they can also be switched to using a readLock. > Next is the directory scanner and disk balancer, which only require a read > lock. > Finally (for this Jira) are various "low hanging fruit" items in BlockSender > and fsdatasetImpl where it is fairly obvious they only need a read lock. > For now, I have avoided changing anything which looks too risky, as I think > it's better to do any larger refactoring or risky changes each in their own > Jira.
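The read/write split discussed above can be sketched with a `ReentrantReadWriteLock` exposed through AutoCloseable handles, so call sites can use try-with-resources. Names here are illustrative (the actual FsDatasetSpi instrumentation differs); the javadoc reflects the clarified semantics from the comment: under the read lock the replica map is not structurally modified, though individual Block objects may still be updated.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatasetLock {
    // fair ordering so readers cannot starve a pending writer
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);

    /** AutoCloseable wrapper: the lock releases via try-with-resources. */
    public static final class AutoCloseableLock implements AutoCloseable {
        private final Lock lock;
        AutoCloseableLock(Lock lock) { this.lock = lock; this.lock.lock(); }
        @Override public void close() { lock.unlock(); }
    }

    /** Exclusive lock: the replica map may be structurally modified. */
    public AutoCloseableLock acquireDatasetLock() {
        return new AutoCloseableLock(rwLock.writeLock());
    }

    /** Shared lock: the replica map is not structurally modified, though
     *  individual Block objects may still be updated (gen stamp, etc.). */
    public AutoCloseableLock acquireDatasetReadLock() {
        return new AutoCloseableLock(rwLock.readLock());
    }

    public int getReadHoldCount() { return rwLock.getReadHoldCount(); }
}
```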
[jira] [Updated] (HDFS-15236) Upgrade googletest to the latest version
[ https://issues.apache.org/jira/browse/HDFS-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15236: - Issue Type: Improvement (was: Bug) > Upgrade googletest to the latest version > > > Key: HDFS-15236 > URL: https://issues.apache.org/jira/browse/HDFS-15236 > Project: Hadoop HDFS > Issue Type: Improvement > Components: test >Reporter: Akira Ajisaka >Priority: Major > > Now libhdfspp is using gmock-1.7.0 with the patch in HDFS-15232. gmock was > moved to googletest and the latest version is 1.10.0. Let's upgrade it to > remove our own patch.
[jira] [Updated] (HDFS-15236) Upgrade googletest to the latest version
[ https://issues.apache.org/jira/browse/HDFS-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15236: - Component/s: native > Upgrade googletest to the latest version > > > Key: HDFS-15236 > URL: https://issues.apache.org/jira/browse/HDFS-15236 > Project: Hadoop HDFS > Issue Type: Improvement > Components: native, test >Reporter: Akira Ajisaka >Priority: Major > > Now libhdfspp is using gmock-1.7.0 with the patch in HDFS-15232. gmock was > moved to googletest and the latest version is 1.10.0. Let's upgrade it to > remove our own patch.
[jira] [Updated] (HDFS-9145) Tracking methods that hold FSNamesystemLock for too long
[ https://issues.apache.org/jira/browse/HDFS-9145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bianqi updated HDFS-9145: - Summary: Tracking methods that hold FSNamesystemLock for too long (was: Tracking methods that hold FSNamesytemLock for too long) > Tracking methods that hold FSNamesystemLock for too long > > > Key: HDFS-9145 > URL: https://issues.apache.org/jira/browse/HDFS-9145 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Jing Zhao >Assignee: Mingliang Liu >Priority: Major > Fix For: 2.8.0, 2.7.4, 3.0.0-alpha1 > > Attachments: HDFS-9145.000.patch, HDFS-9145.001.patch, > HDFS-9145.002.patch, HDFS-9145.003.patch, testlog.txt > > > It will be helpful if we can have a way to track (or at least log a msg) > if some operation is holding the FSNamesystem lock for a long time.
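The kind of tracking HDFS-9145 describes can be sketched as a lock wrapper that records the acquire time and reports operations that held the lock past a threshold. This is a hypothetical, simplified sketch, not the actual FSNamesystemLock implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long thresholdMs;
    private long lockAcquiredAt;                       // guarded by lock
    final List<String> slowHolders = new ArrayList<>();

    public TimedLock(long thresholdMs) { this.thresholdMs = thresholdMs; }

    public void lock() {
        lock.lock();
        lockAcquiredAt = System.currentTimeMillis();   // start the hold timer
    }

    /** Release the lock, recording the caller if it held it too long. */
    public void unlock(String opName) {
        long heldMs = System.currentTimeMillis() - lockAcquiredAt;
        lock.unlock();
        if (heldMs >= thresholdMs) {
            // in a real system this would be a WARN log with a stack trace
            slowHolders.add(opName + " held lock for " + heldMs + "ms");
        }
    }
}
```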
[jira] [Commented] (HDFS-15232) Fix libhdfspp test failures with GCC 7
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065265#comment-17065265 ] Akira Ajisaka commented on HDFS-15232: -- Filed a Jira to upgrade the version of googletest: HDFS-15236 > Fix libhdfspp test failures with GCC 7 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test > Environment: Ubuntu Bionic, GCC 7.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode
[jira] [Commented] (HDFS-15236) Upgrade googletest to the latest version
[ https://issues.apache.org/jira/browse/HDFS-15236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065264#comment-17065264 ] Akira Ajisaka commented on HDFS-15236: -- It's better if the version of gtest used in hadoop-common is upgraded as well. > Upgrade googletest to the latest version > > > Key: HDFS-15236 > URL: https://issues.apache.org/jira/browse/HDFS-15236 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Akira Ajisaka >Priority: Major > > Now libhdfspp is using gmock-1.7.0 with the patch in HDFS-15232. gmock was > moved to googletest and the latest version is 1.10.0. Let's upgrade it to > remove our own patch.
[jira] [Commented] (HDFS-15232) Fix libhdfspp test failures with GCC 7
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065263#comment-17065263 ] Hudson commented on HDFS-15232: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18076 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18076/]) HDFS-15232. Fix libhdfspp test failures with GCC 7. (#1906) (github: rev f59f6891c8bf70bdddc1f2535588bea41155b480) * (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/third_party/gmock-1.7.0/gmock/gmock.h > Fix libhdfspp test failures with GCC 7 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test > Environment: Ubuntu Bionic, GCC 7.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode
[jira] [Created] (HDFS-15236) Upgrade googletest to the latest version
Akira Ajisaka created HDFS-15236: Summary: Upgrade googletest to the latest version Key: HDFS-15236 URL: https://issues.apache.org/jira/browse/HDFS-15236 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Akira Ajisaka Now libhdfspp is using gmock-1.7.0 with the patch in HDFS-15232. gmock was moved to googletest and the latest version is 1.10.0. Let's upgrade it to remove our own patch.
[jira] [Updated] (HDFS-15232) Fix libhdfspp test failures with GCC 7
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15232: - Fix Version/s: 3.2.2 3.3.0 Resolution: Fixed Status: Resolved (was: Patch Available) Merged the PR into trunk and branch-3.2. > Fix libhdfspp test failures with GCC 7 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test > Environment: Ubuntu Bionic, GCC 7.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode
[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: - Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1 becomes unhealthy and the admin tries to manually failover to NN2 by running command {code:java} $ hdfs haadmin -failover NN1 NN2 {code} NN2 receives the request and becomes active: {code:java} 2020-03-24 00:24:56,412 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2020-03-24 00:24:56,413 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted: sleep interrupted 2020-03-24 00:24:56,415 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state 2020-03-24 00:24:56,417 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/ha-name-dir-shared/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/nn2/name/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa expecting start txid #1 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file /app/ha-name-dir-shared/current/edits_001-019 maxTxnsToRead = 9223372036854775807 2020-03-24 00:24:56,441 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream '/app/ha-name-dir-shared/current/edits_001-019' to transaction ID 1 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named 
/app/ha-name-dir-shared/current/edits_001-019) of total size 1305.0, total edits 19.0, total load time 109.0 ms 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datanodes as stale 2020-03-24 00:24:56,568 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 messages from DataNodes that were previously queued during standby state 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing replication queues 2020-03-24 00:24:56,570 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 20 2020-03-24 00:24:56,571 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 2020-03-24 00:24:56,812 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s) 2020-03-24 00:24:56,819 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 6 millisecondsname space=3storage space=24690storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 2020-03-24 00:24:56,827 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 3 milliseconds {code} But NN2 fails to send back the RPC response because of temporary network partitioning. 
{code:java} java.io.EOFException: End of File Exception between local host is: "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1597) at org.apache.hadoop.ipc.Client.call(Client.java:1539) at org.apache.hadoop.ipc.Client.call(Client.java:1436) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source) at
[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: - Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1 becomes unhealthy and the admin tries to manually failover to NN2 by running command {code:java} $ hdfs haadmin -failover NN1 NN2 {code} NN2 receives the request and becomes active: {code:java} 2020-03-24 00:24:56,412 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2020-03-24 00:24:56,413 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted: sleep interrupted 2020-03-24 00:24:56,415 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state 2020-03-24 00:24:56,417 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/ha-name-dir-shared/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/nn2/name/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa expecting start txid #1 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file /app/ha-name-dir-shared/current/edits_001-019 maxTxnsToRead = 9223372036854775807 2020-03-24 00:24:56,441 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream '/app/ha-name-dir-shared/current/edits_001-019' to transaction ID 1 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named 
/app/ha-name-dir-shared/current/edits_001-019) of total size 1305.0, total edits 19.0, total load time 109.0 ms 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datanodes as stale 2020-03-24 00:24:56,568 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 messages from DataNodes that were previously queued during standby state 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing replication queues 2020-03-24 00:24:56,570 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 20 2020-03-24 00:24:56,571 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 2020-03-24 00:24:56,812 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s) 2020-03-24 00:24:56,819 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 6 millisecondsname space=3storage space=24690storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 2020-03-24 00:24:56,827 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 3 milliseconds {code} But NN2 fails to send back the RPC response because of temporary network partitioning. 
{code:java} java.io.EOFException: End of File Exception between local host is: "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1597) at org.apache.hadoop.ipc.Client.call(Client.java:1539) at org.apache.hadoop.ipc.Client.call(Client.java:1436) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) at com.sun.proxy.$Proxy8.transitionToActive(Unknown Source) at
[jira] [Created] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable
YCozy created HDFS-15235: Summary: Transient network failure during NameNode failover makes cluster unavailable Key: HDFS-15235 URL: https://issues.apache.org/jira/browse/HDFS-15235 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.3.0 Reporter: YCozy We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1 becomes unhealthy and the admin tries to manually failover to NN2 by running command {code:java} $ hdfs haadmin -failover NN1 NN2 {code} NN2 receives the request and becomes active: {code:java} 2020-03-24 00:24:56,412 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2020-03-24 00:24:56,413 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Edit log tailer interrupted: sleep interrupted 2020-03-24 00:24:56,415 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state 2020-03-24 00:24:56,417 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/ha-name-dir-shared/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /app/nn2/name/current 2020-03-24 00:24:56,419 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Catching up to latest edits from old active before taking over writer role in edits logs 2020-03-24 00:24:56,435 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c3095fa expecting start txid #1 2020-03-24 00:24:56,436 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file /app/ha-name-dir-shared/current/edits_001-019 maxTxnsToRead = 9223372036854775807 2020-03-24 00:24:56,441 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream '/app/ha-name-dir-shared/current/edits_001-019' to transaction ID 1 2020-03-24 00:24:56,567 INFO 
org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named /app/ha-name-dir-shared/current/edits_001-019) of total size 1305.0, total edits 19.0, total load time 109.0 ms 2020-03-24 00:24:56,567 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Marking all datanodes as stale 2020-03-24 00:24:56,568 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Processing 4 messages from DataNodes that were previously queued during standby state 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Reprocessing replication and invalidation queues 2020-03-24 00:24:56,569 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: initializing replication queues 2020-03-24 00:24:56,570 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 20 2020-03-24 00:24:56,571 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 20 2020-03-24 00:24:56,812 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Initializing quota with 4 thread(s) 2020-03-24 00:24:56,819 INFO org.apache.hadoop.hdfs.server.namenode.FSDirectory: Quota initialization completed in 6 millisecondsname space=3storage space=24690storage types=RAM_DISK=0, SSD=0, DISK=0, ARCHIVE=0, PROVIDED=0 2020-03-24 00:24:56,827 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 3 milliseconds {code} But NN2 fails to send back the RPC response because of temporary network partitioning. 
{code:java} java.io.EOFException: End of File Exception between local host is: "24e7b5a52e85/172.17.0.2"; destination host is: "127.0.0.3":8180; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1597) at org.apache.hadoop.ipc.Client.call(Client.java:1539) at org.apache.hadoop.ipc.Client.call(Client.java:1436) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
[jira] [Commented] (HDFS-13377) The owner of folder can set quota for his sub folder
[ https://issues.apache.org/jira/browse/HDFS-13377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065163#comment-17065163 ] Íñigo Goiri commented on HDFS-13377: +1 on [^HDFS-13377.006.patch]. > The owner of folder can set quota for his sub folder > > > Key: HDFS-13377 > URL: https://issues.apache.org/jira/browse/HDFS-13377 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Minor > Attachments: HDFS-13377.003.patch, HDFS-13377.004.patch, > HDFS-13377.005.patch, HDFS-13377.006.patch, HDFS-13377.patch, > HDFS-13377.patch, HDFS-13377.patch > > > Currently, only the super user can set quotas. That is a huge burden for > administrators in a large system. Add a new feature to let the owner of a > folder also have the privilege to set quotas for his sub-folders.
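The permission change the issue proposes can be sketched as a simple predicate: allow a quota change when the caller is either a superuser or the owner of the enclosing folder. This is a hypothetical sketch (not the actual FSDirAttrOp/FSPermissionChecker code):

```java
public class QuotaPermission {
    /** Returns true if the caller may set a quota on a sub-folder. */
    public static boolean canSetQuota(String caller, boolean isSuperUser,
                                      String parentDirOwner) {
        if (isSuperUser) {
            return true;  // current behavior: only the superuser may set quotas
        }
        // proposed behavior: the owner of the folder may quota its sub-folders
        return caller.equals(parentDirOwner);
    }
}
```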
[jira] [Commented] (HDFS-15169) RBF: Router FSCK should consider the mount table
[ https://issues.apache.org/jira/browse/HDFS-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065162#comment-17065162 ] Íñigo Goiri commented on HDFS-15169: Thanks [~hexiaoqiao]. Is it possible to add a test that is a little more specific to the change? > RBF: Router FSCK should consider the mount table > > > Key: HDFS-15169 > URL: https://issues.apache.org/jira/browse/HDFS-15169 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15169.001.patch > > > HDFS-13989 implemented FSCK to DFSRouter, however, it just redirects the > requests to all the active downstream NameNodes for now. The DFSRouter should > consider the mount table when redirecting the requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14783) Expired SampleStat needs to be removed from SlowPeersReport
[ https://issues.apache.org/jira/browse/HDFS-14783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065161#comment-17065161 ] Íñigo Goiri commented on HDFS-14783: My bad, yes, I was wrong in the comment. +1 on [^HDFS-14783-005.patch]. > Expired SampleStat needs to be removed from SlowPeersReport > --- > > Key: HDFS-14783 > URL: https://issues.apache.org/jira/browse/HDFS-14783 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haibin Huang >Assignee: Haibin Huang >Priority: Major > Attachments: HDFS-14783, HDFS-14783-001.patch, HDFS-14783-002.patch, > HDFS-14783-003.patch, HDFS-14783-004.patch, HDFS-14783-005.patch > > > SlowPeersReport is calculated from the SampleStat between two DataNodes, so it can > appear in the NameNode's JMX like this: > {code:java} > "SlowPeersReport" :[{"SlowNode":"dn2","ReportingNodes":["dn1"]}] > {code} > The SampleStat is stored in a LinkedBlockingDeque and won't be > removed until the queue is full and a newer one is generated. Therefore, if > dn1 doesn't send any packet to dn2 for a long time, the old SampleStat will > stay in the queue and will be used to calculate the slow peer. I think > these old SampleStats should be considered expired messages and ignored > when generating a new SlowPeersReport. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
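A minimal sketch of the fix the patches aim at: skip samples older than some window when building the report, so a stale measurement cannot keep marking a peer slow. The class and method names here are illustrative, not Hadoop's actual SampleStat/SlowPeerTracker API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the HDFS-14783 idea: when averaging latency samples
// kept in a deque, ignore entries older than a freshness window instead of
// letting them linger until the deque fills up.
public class SlowPeerSketch {
    static class Sample {
        final long timestampMs;    // when dn1 last measured dn2
        final double avgLatencyMs;
        Sample(long t, double l) { timestampMs = t; avgLatencyMs = l; }
    }

    // Average only over samples newer than windowMs; stale entries that would
    // otherwise stay in the deque forever are skipped.
    static double recentAverage(Deque<Sample> samples, long nowMs, long windowMs) {
        double sum = 0; int n = 0;
        for (Sample s : samples) {
            if (nowMs - s.timestampMs <= windowMs) { sum += s.avgLatencyMs; n++; }
        }
        return n == 0 ? 0.0 : sum / n;
    }

    public static void main(String[] args) {
        Deque<Sample> q = new ArrayDeque<>();
        q.add(new Sample(0, 900.0));     // stale: measured long ago, looks "slow"
        q.add(new Sample(99_000, 5.0));  // fresh measurement
        // With a 60s window at t=100s, only the fresh sample counts.
        System.out.println(recentAverage(q, 100_000, 60_000)); // prints 5.0
    }
}
```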
[jira] [Commented] (HDFS-15075) Remove process command timing from BPServiceActor
[ https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065158#comment-17065158 ] Íñigo Goiri commented on HDFS-15075: +1 on [^HDFS-15075.007.patch]. Let's see what Jenkins says (thanks [~weichiu] for triggering). > Remove process command timing from BPServiceActor > - > > Key: HDFS-15075 > URL: https://issues.apache.org/jira/browse/HDFS-15075 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15075.001.patch, HDFS-15075.002.patch, > HDFS-15075.003.patch, HDFS-15075.004.patch, HDFS-15075.005.patch, > HDFS-15075.006.patch, HDFS-15075.007.patch > > > HDFS-14997 moved the command processing into async. > Right now, we are checking the time to add to a queue. > We should remove this one and maybe move the timing within the thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14442) Disagreement between HAUtil.getAddressOfActive and RpcInvocationHandler.getConnectionId
[ https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065103#comment-17065103 ] Erik Krogen commented on HDFS-14442: Apologies, I was on leave when pinged earlier. Thanks a lot for fixing this up and getting it committed! > Disagreement between HAUtil.getAddressOfActive and > RpcInvocationHandler.getConnectionId > --- > > Key: HDFS-14442 > URL: https://issues.apache.org/jira/browse/HDFS-14442 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Erik Krogen >Assignee: Ravuri Sushma sree >Priority: Major > Fix For: 3.3.0, 3.2.2 > > Attachments: HDFS-14442.001.patch, HDFS-14442.002.patch, > HDFS-14442.003.patch, HDFS-14442.004.patch > > > While working on HDFS-14245, we noticed a discrepancy in some proxy-handling > code. > The description of {{RpcInvocationHandler.getConnectionId()}} states: > {code} > /** >* Returns the connection id associated with the InvocationHandler instance. >* @return ConnectionId >*/ > ConnectionId getConnectionId(); > {code} > It does not make any claims about whether this connection ID will be an > active proxy or not. Yet in {{HAUtil}} we have: > {code} > /** >* Get the internet address of the currently-active NN. This should rarely > be >* used, since callers of this method who connect directly to the NN using > the >* resulting InetSocketAddress will not be able to connect to the active NN > if >* a failover were to occur after this method has been called. >* >* @param fs the file system to get the active address of. >* @return the internet address of the currently-active NN. >* @throws IOException if an error occurs while resolving the active NN. >*/ > public static InetSocketAddress getAddressOfActive(FileSystem fs) > throws IOException { > if (!(fs instanceof DistributedFileSystem)) { > throw new IllegalArgumentException("FileSystem " + fs + " is not a > DFS."); > } > // force client address resolution.
> fs.exists(new Path("/")); > DistributedFileSystem dfs = (DistributedFileSystem) fs; > DFSClient dfsClient = dfs.getClient(); > return RPC.getServerAddress(dfsClient.getNamenode()); > } > {code} > Where the call {{RPC.getServerAddress()}} eventually terminates into > {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} -> > {{RPC.getConnectionIdForProxy()}} -> > {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to be making > an incorrect assumption that {{RpcInvocationHandler}} will necessarily return > an _active_ connection ID. {{ObserverReadProxyProvider}} demonstrates a > counter-example to this, since the current connection ID may be pointing at, > for example, an Observer NameNode. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...
[ https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14743: --- Release Note: A new INodeAttributeProvider API checkPermissionWithContext(AuthorizationContext) is added. Authorization provider implementations may implement this API to get additional context (operation name and caller context) of an authorization request. > Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc... > --- > > Key: HDFS-14743 > URL: https://issues.apache.org/jira/browse/HDFS-14743 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Ramesh Mani >Assignee: Wei-Chiu Chuang >Priority: Critical > Fix For: 3.3.0 > > Attachments: HDFS-14743 Enhance INodeAttributeProvider_ > AccessControlEnforcer Interface.pdf > > > Enhance INodeAttributeProvider / AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc..., this should > help the implementors of the interface like Apache Ranger's HDFS > Authorization plugin to authorize and audit those command sets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...
[ https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14743: --- Issue Type: New Feature (was: Improvement) > Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc... > --- > > Key: HDFS-14743 > URL: https://issues.apache.org/jira/browse/HDFS-14743 > Project: Hadoop HDFS > Issue Type: New Feature > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Ramesh Mani >Assignee: Wei-Chiu Chuang >Priority: Critical > Fix For: 3.3.0 > > Attachments: HDFS-14743 Enhance INodeAttributeProvider_ > AccessControlEnforcer Interface.pdf > > > Enhance INodeAttributeProvider / AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc..., this should > help the implementors of the interface like Apache Ranger's HDFS > Authorization plugin to authorize and audit those command sets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14743) Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to support Authorization of mkdir, rm, rmdir, copy, move etc...
[ https://issues.apache.org/jira/browse/HDFS-14743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-14743: --- Issue Type: Improvement (was: Bug) > Enhance INodeAttributeProvider/ AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc... > --- > > Key: HDFS-14743 > URL: https://issues.apache.org/jira/browse/HDFS-14743 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.1.0 >Reporter: Ramesh Mani >Assignee: Wei-Chiu Chuang >Priority: Critical > Fix For: 3.3.0 > > Attachments: HDFS-14743 Enhance INodeAttributeProvider_ > AccessControlEnforcer Interface.pdf > > > Enhance INodeAttributeProvider / AccessControlEnforcer Interface in HDFS to > support Authorization of mkdir, rm, rmdir, copy, move etc..., this should > help the implementors of the interface like Apache Ranger's HDFS > Authorization plugin to authorize and audit those command sets. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15234) Add a default method body for the INodeAttributeProvider#checkPermissionWithContext API
[ https://issues.apache.org/jira/browse/HDFS-15234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15234: --- Description: The new API INodeAttributeProvider#checkPermissionWithContext() needs a default method body. Otherwise old implementations fail to compile. (was: The new API INodeAttributeProvider#checkPermissionWithContext() needs a default method body. Otherwise old implementations fails to compile.) > Add a default method body for the > INodeAttributeProvider#checkPermissionWithContext API > --- > > Key: HDFS-15234 > URL: https://issues.apache.org/jira/browse/HDFS-15234 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > > The new API INodeAttributeProvider#checkPermissionWithContext() needs a > default method body. Otherwise old implementations fail to compile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15234) Add a default method body for the INodeAttributeProvider#checkPermissionWithContext API
Wei-Chiu Chuang created HDFS-15234: -- Summary: Add a default method body for the INodeAttributeProvider#checkPermissionWithContext API Key: HDFS-15234 URL: https://issues.apache.org/jira/browse/HDFS-15234 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.3.0 Reporter: Wei-Chiu Chuang Assignee: Wei-Chiu Chuang The new API INodeAttributeProvider#checkPermissionWithContext() needs a default method body. Otherwise old implementations fails to compile. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
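The compatibility concern behind this issue can be illustrated with a stripped-down interface. The names below are simplified stand-ins, not the real INodeAttributeProvider API:

```java
// Minimal illustration of the HDFS-15234 fix: giving a newly added interface
// method a default body means implementations written before the method
// existed still compile (and keep working) unchanged.
public class DefaultMethodSketch {
    interface AccessEnforcer {
        boolean checkPermission(String path);

        // New API added later; the default body (delegating to the old check)
        // keeps old implementations source-compatible.
        default boolean checkPermissionWithContext(String path, String op) {
            return checkPermission(path);
        }
    }

    // An "old" implementation that knows nothing about the new method,
    // yet still compiles against the extended interface.
    static class LegacyEnforcer implements AccessEnforcer {
        public boolean checkPermission(String path) {
            return path.startsWith("/public");
        }
    }

    public static void main(String[] args) {
        AccessEnforcer e = new LegacyEnforcer();
        System.out.println(e.checkPermissionWithContext("/public/data", "mkdir")); // prints true
        System.out.println(e.checkPermissionWithContext("/private", "rm"));        // prints false
    }
}
```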
[jira] [Commented] (HDFS-15075) Remove process command timing from BPServiceActor
[ https://issues.apache.org/jira/browse/HDFS-15075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065055#comment-17065055 ] Wei-Chiu Chuang commented on HDFS-15075: Retriggered jenkins: https://builds.apache.org/job/PreCommit-HDFS-Build/29009/ > Remove process command timing from BPServiceActor > - > > Key: HDFS-15075 > URL: https://issues.apache.org/jira/browse/HDFS-15075 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Íñigo Goiri >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15075.001.patch, HDFS-15075.002.patch, > HDFS-15075.003.patch, HDFS-15075.004.patch, HDFS-15075.005.patch, > HDFS-15075.006.patch, HDFS-15075.007.patch > > > HDFS-14997 moved the command processing into async. > Right now, we are checking the time to add to a queue. > We should remove this one and maybe move the timing within the thread. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature
[ https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065047#comment-17065047 ] Hudson commented on HDFS-15113: --- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #18075 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18075/]) HDFS-15113. Addendum: Missing IBR when NameNode restart if open (weichiu: rev af64ce2f4a72705c5b68ddfe4d29f0d208fd38e7) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java > Missing IBR when NameNode restart if open processCommand async feature > -- > > Key: HDFS-15113 > URL: https://issues.apache.org/jira/browse/HDFS-15113 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Blocker > Fix For: 3.3.0 > > Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, > HDFS-15113.003.patch, HDFS-15113.004.patch, HDFS-15113.005.patch, > HDFS-15113.addendum.patch > > > Recently, I met a case where the NameNode was missing blocks after a restart, which is > related to HDFS-14997. > a. During a NameNode restart, it will return the command `DNA_REGISTER` to a DataNode > when it receives an RPC request from that DataNode. > b. When a DataNode receives the `DNA_REGISTER` command, it will run #reRegister > asynchronously. > {code:java} > void reRegister() throws IOException { > if (shouldRun()) { > // re-retrieve namespace info to make sure that, if the NN > // was restarted, we still match its version (HDFS-2120) > NamespaceInfo nsInfo = retrieveNamespaceInfo(); > // and re-register > register(nsInfo); > scheduler.scheduleHeartbeat(); > // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down > // for sometime. > if (state == HAServiceState.STANDBY || state == > HAServiceState.OBSERVER) { > ibrManager.clearIBRs(); > } > } > } > {code} > c. As we know, #register will trigger a BR immediately. > d.
Because #reRegister runs asynchronously, we cannot be sure which runs > first: sending the FBR or clearing the IBRs. If clearing the IBRs runs first, it is OK. > But if the FBR is sent first and the IBRs are cleared afterwards, blocks received > between these two points in time will be missing until the next FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
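The ordering hazard described in item d. can be modeled with a toy simulation. This is plain Java with hypothetical names, not the real BPServiceActor/IBR-manager code:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the HDFS-15113 race: re-registration triggers a full block
// report (FBR) and also clears pending incremental reports (IBRs). If the
// FBR snapshot is taken before a block arrives, and the IBR queue is cleared
// after, that block is reported nowhere until the next FBR.
public class IbrRaceSketch {
    static List<String> blocks = new ArrayList<>();       // blocks on the DataNode
    static List<String> pendingIbrs = new ArrayList<>();  // queued incremental reports

    public static void main(String[] args) {
        blocks.add("blk_1");
        List<String> fbrSnapshot = new ArrayList<>(blocks); // FBR runs first

        blocks.add("blk_2");          // block received after the FBR snapshot
        pendingIbrs.add("blk_2");     // queued for the next incremental report

        pendingIbrs.clear();          // clear-IBR runs second: blk_2 is dropped

        // blk_2 is in neither the sent FBR nor the (now empty) IBR queue.
        boolean blk2Reported = fbrSnapshot.contains("blk_2")
                || pendingIbrs.contains("blk_2");
        System.out.println(blk2Reported); // prints false: the missing-IBR bug
    }
}
```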
[jira] [Updated] (HDFS-15113) Missing IBR when NameNode restart if open processCommand async feature
[ https://issues.apache.org/jira/browse/HDFS-15113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15113: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed the addendum patch. Thanks again [~hexiaoqiao]! > Missing IBR when NameNode restart if open processCommand async feature > -- > > Key: HDFS-15113 > URL: https://issues.apache.org/jira/browse/HDFS-15113 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Blocker > Fix For: 3.3.0 > > Attachments: HDFS-15113.001.patch, HDFS-15113.002.patch, > HDFS-15113.003.patch, HDFS-15113.004.patch, HDFS-15113.005.patch, > HDFS-15113.addendum.patch > > > Recently, I met a case where the NameNode was missing blocks after a restart, which is > related to HDFS-14997. > a. During a NameNode restart, it will return the command `DNA_REGISTER` to a DataNode > when it receives an RPC request from that DataNode. > b. When a DataNode receives the `DNA_REGISTER` command, it will run #reRegister > asynchronously. > {code:java} > void reRegister() throws IOException { > if (shouldRun()) { > // re-retrieve namespace info to make sure that, if the NN > // was restarted, we still match its version (HDFS-2120) > NamespaceInfo nsInfo = retrieveNamespaceInfo(); > // and re-register > register(nsInfo); > scheduler.scheduleHeartbeat(); > // HDFS-9917,Standby NN IBR can be very huge if standby namenode is down > // for sometime. > if (state == HAServiceState.STANDBY || state == > HAServiceState.OBSERVER) { > ibrManager.clearIBRs(); > } > } > } > {code} > c. As we know, #register will trigger a BR immediately. > d. Because #reRegister runs asynchronously, we cannot be sure which runs > first: sending the FBR or clearing the IBRs. If clearing the IBRs runs first, it is OK. > But if the FBR is sent first and the IBRs are cleared afterwards, blocks received > between these two points in time will be missing until the next FBR.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-15118: --- Assignee: Chen Liang (was: Brahma Reddy Battula) > [SBN Read] Slow clients when Observer reads are enabled but there are no > Observers on the cluster. > -- > > Key: HDFS-15118 > URL: https://issues.apache.org/jira/browse/HDFS-15118 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Chen Liang >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15118.001.patch, HDFS-15118.002.patch > > > We see substantial degradation in performance of HDFS clients, when Observer > reads are enabled via {{ObserverReadProxyProvider}}, but there are no > ObserverNodes on the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15118) [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster.
[ https://issues.apache.org/jira/browse/HDFS-15118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-15118: --- Assignee: Brahma Reddy Battula (was: Chen Liang) > [SBN Read] Slow clients when Observer reads are enabled but there are no > Observers on the cluster. > -- > > Key: HDFS-15118 > URL: https://issues.apache.org/jira/browse/HDFS-15118 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 2.10.0 >Reporter: Konstantin Shvachko >Assignee: Brahma Reddy Battula >Priority: Major > Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1 > > Attachments: HDFS-15118.001.patch, HDFS-15118.002.patch > > > We see substantial degradation in performance of HDFS clients, when Observer > reads are enabled via {{ObserverReadProxyProvider}}, but there are no > ObserverNodes on the cluster. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15233) Add -S option in "Count" command to show only Snapshot Counts
hemanthboyina created HDFS-15233: Summary: Add -S option in "Count" command to show only Snapshot Counts Key: HDFS-15233 URL: https://issues.apache.org/jira/browse/HDFS-15233 Project: Hadoop HDFS Issue Type: Improvement Reporter: hemanthboyina Assignee: hemanthboyina -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15232) Fix libhdfspp test failures with GCC 7
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15232: - Environment: Ubuntu Bionic, GCC 7.4.0 Summary: Fix libhdfspp test failures with GCC 7 (was: Some CTESTs are failing after HADOOP-16054) > Fix libhdfspp test failures with GCC 7 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test > Environment: Ubuntu Bionic, GCC 7.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064709#comment-17064709 ] Akira Ajisaka commented on HDFS-15232: -- I created a PR to patch gmock-1.7.0 instead of upgrading to 1.8.1. Now I don't have enough time to modify the CMakeLists.txt to catch up with the code structure changes between gmock 1.7.0 and googletest 1.8.1. > Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15232: - Status: Patch Available (was: Open) > Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-15232: - Component/s: test > Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native, test >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064625#comment-17064625 ] Akira Ajisaka commented on HDFS-15232: -- After the change suggested in [https://github.com/google/googletest/issues/705#issuecomment-235067917], the tests passed. Here is the diff:
{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/third_party/gmock-1.7.0/gmoc
index e8dd7fc..dd1b865 100644
--- a/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/third_party/gmock-1.7.0/gmock/gmock
+++ b/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/third_party/gmock-1.7.0/gmock/gmock
@@ -9930,6 +9930,8 @@ class ActionResultHolder : public UntypedActionResultHolderBase {
 template <>
 class ActionResultHolder<void> : public UntypedActionResultHolderBase {
  public:
+  explicit ActionResultHolder() {}
+
   void GetValueAndDelete() const { delete this; }

   virtual void PrintAsActionResult(::std::ostream* /* os */) const {}
@@ -9941,7 +9943,7 @@ class ActionResultHolder<void> : public UntypedActionResultHolderBase {
       const typename Function<F>::ArgumentTuple& args,
       const string& call_description) {
     func_mocker->PerformDefaultAction(args, call_description);
-    return NULL;
+    return new ActionResultHolder();
   }

   // Performs the given action and returns NULL.
@@ -9950,7 +9952,7 @@ class ActionResultHolder<void> : public UntypedActionResultHolderBase {
       const Action<F>& action,
       const typename Function<F>::ArgumentTuple& args) {
     action.Perform(args);
-    return NULL;
+    return new ActionResultHolder();
   }
 };
{code}
This code has been fixed since gmock 1.8.x, so I'll upgrade gmock (moved to googletest) to 1.8.1.
> Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned HDFS-15232: Assignee: Akira Ajisaka > Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15232) Some CTESTs are failing after HADOOP-16054
[ https://issues.apache.org/jira/browse/HDFS-15232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064615#comment-17064615 ] Akira Ajisaka commented on HDFS-15232: -- These test failures are caused by the bug of gmock-1.7.0: https://github.com/google/googletest/issues/705 > Some CTESTs are failing after HADOOP-16054 > -- > > Key: HDFS-15232 > URL: https://issues.apache.org/jira/browse/HDFS-15232 > Project: Hadoop HDFS > Issue Type: Bug > Components: native >Reporter: Akira Ajisaka >Priority: Major > > Failed CTEST tests after HADOOP-16054: > * remote_block_reader > * memcheck_remote_block_reader > * bad_datanode > * memcheck_bad_datanode -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15169) RBF: Router FSCK should consider the mount table
[ https://issues.apache.org/jira/browse/HDFS-15169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064569#comment-17064569 ] Xiaoqiao He commented on HDFS-15169: Attaching the Jenkins result link https://builds.apache.org/job/PreCommit-HDFS-Build/28995/console due to Jenkins misbehaving for a few days. Hi [~aajisaka], [~elgoiri], [~ayushtkn], would you like to review? > RBF: Router FSCK should consider the mount table > > > Key: HDFS-15169 > URL: https://issues.apache.org/jira/browse/HDFS-15169 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Reporter: Akira Ajisaka >Assignee: Xiaoqiao He >Priority: Major > Attachments: HDFS-15169.001.patch > > > HDFS-13989 implemented FSCK in DFSRouter; however, it just redirects the > requests to all the active downstream NameNodes for now. The DFSRouter should > consider the mount table when redirecting the requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
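The mount-table-aware redirection the issue asks for amounts to a longest-prefix match over mount points, forwarding fsck only to the matching downstream namespace instead of fanning out to every NameNode. The mount points, nameservice names, and `resolve` helper below are hypothetical, not DFSRouter's actual mount-table API:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of "consider the mount table" for router fsck:
// resolve the requested path to one downstream nameservice by
// longest-prefix match over the mount points.
public class MountResolveSketch {
    // mount point -> downstream nameservice (hypothetical entries)
    static final TreeMap<String, String> MOUNTS = new TreeMap<>(Map.of(
        "/", "ns0",
        "/data", "ns1",
        "/data/warehouse", "ns2"));

    static String resolve(String path) {
        // iterate mount points from longest to shortest; first match wins
        for (String mount : MOUNTS.descendingKeySet()) {
            String prefix = mount.endsWith("/") ? mount : mount + "/";
            if (path.equals(mount) || path.startsWith(prefix)) {
                return MOUNTS.get(mount);
            }
        }
        return MOUNTS.get("/"); // fall back to the root mount
    }

    public static void main(String[] args) {
        System.out.println(resolve("/data/warehouse/t1")); // prints ns2
        System.out.println(resolve("/data/tmp"));          // prints ns1
        System.out.println(resolve("/user/alice"));        // prints ns0
    }
}
```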