[jira] [Commented] (HDFS-9126) namenode crash in fsimage download/transfer
[ https://issues.apache.org/jira/browse/HDFS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362736#comment-17362736 ]

Seokchan Yoon commented on HDFS-9126:
-------------------------------------

Why is this closed? I ran into the same situation and need to figure out why the previous active NN failed in doHealthChecks.

> namenode crash in fsimage download/transfer
> -------------------------------------------
>
>                 Key: HDFS-9126
>                 URL: https://issues.apache.org/jira/browse/HDFS-9126
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>        Environment: OS: CentOS 6.5 (final)
>                     Apache Hadoop 2.6.0
>                     NameNode HA with 5 JournalNodes
>            Reporter: zengyongping
>            Priority: Critical
>
> In our production Hadoop cluster, when the active NameNode begins to
> download/transfer the fsimage from the standby NameNode, the ZKFC health
> check of the NameNode sometimes hits a socket timeout. ZKFC then judges the
> active NameNode to be in state SERVICE_NOT_RESPONDING, an HA failover
> happens, and the old active NameNode is fenced.
> zkfc logs:
> 2015-09-24 11:44:44,739 WARN org.apache.hadoop.ha.HealthMonitor:
> Transport-level exception trying to monitor health of NameNode at
> hostname1/192.168.10.11:8020: Call From hostname1/192.168.10.11 to
> hostname1:8020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 45000 millis timeout while waiting for
> channel to be ready for read.
> ch : java.nio.channels.SocketChannel[connected
> local=/192.168.10.11:22614 remote=hostname1/192.168.10.11:8020]; For more
> details see: http://wiki.apache.org/hadoop/SocketTimeout
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.HealthMonitor: Entering
> state SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: Local
> service NameNode at hostname1/192.168.10.11:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController:
> Quitting master election for NameNode at hostname1/192.168.10.11:8020 and
> marking that fencing is necessary
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Yielding from election
> 2015-09-24 11:44:44,761 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x54d81348fe503e3 closed
> 2015-09-24 11:44:44,761 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Ignoring stale result from old client with sessionId 0x54d81348fe503e3
> 2015-09-24 11:44:44,764 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> namenode logs:
> 2015-09-24 11:43:34,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
> 192.168.10.12
> 2015-09-24 11:43:34,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
> 2015-09-24 11:43:34,075 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment
> 2317430129
> 2015-09-24 11:43:34,253 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
> 272988 Total time for transactions(ms): 5502 Number of transactions batched
> in Syncs: 146274 Number of syncs: 32375 SyncTimes(ms): 274465 319599
> 2015-09-24 11:43:46,005 INFO
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
> Rescanning after 3 milliseconds
> 2015-09-24 11:44:21,054 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
> PendingReplicationMonitor timed out blk_1185804191_112164210
> 2015-09-24 11:44:36,076 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
> file
> /software/data/hadoop-data/hdfs/namenode/current/edits_inprogress_02317430129
> ->
> /software/data/hadoop-data/hdfs/namenode/current/edits_02317430129-02317703116
> 2015-09-24 11:44:36,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
> 2317703117
> 2015-09-24 11:45:38,008 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
> Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0
> Number of syncs: 0 SyncTimes(ms): 0 61585
> 2015-09-24 11:45:38,009 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 222.88s
> at 63510.29 KB/s
> 2015-09-24 11:45:38,009 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_02317430128 size 14495092105 bytes.
> 2015-09-24 11:45:38,416 WARN
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal
> 192.168.10.13:8485 failed to write txns 2317703117-2317703117. Will try to
> write to this JN again after the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 44 is
> less than the last promised epoch 45
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
> a
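The "45000 millis timeout" in the zkfc log above is the default ZKFC health-monitor RPC timeout, `ha.health-monitor.rpc-timeout.ms`. As a hedged sketch only, not a fix proposed in this thread: raising that timeout on the ZKFC hosts gives the NameNode more headroom while a large fsimage transfer is in progress. The 90-second value below is purely illustrative.

```xml
<!-- core-site.xml on the ZKFC hosts (illustrative value, not a recommendation
     from this issue): allow the NameNode health check more time to respond
     while it is busy serving a multi-GB fsimage transfer. -->
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>90000</value>
</property>
```

Whether a longer timeout is appropriate depends on how long fsimage transfers actually take in the cluster; it only masks, rather than removes, the contention described in this report.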
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610002 ]

ASF GitHub Bot logged work on HDFS-16055:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:08
Start Date: 14/Jun/21 07:08
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3078:
URL: https://github.com/apache/hadoop/pull/3078#issuecomment-859196473

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 50s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 33s |  | trunk passed |
| +1 :green_heart: | compile | 1m 21s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 14s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 1s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 55s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 24s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 17s |  | trunk passed |
| +1 :green_heart: | shadedclient | 19m 4s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 13s |  | the patch passed |
| +1 :green_heart: | compile | 1m 17s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 17s |  | the patch passed |
| +1 :green_heart: | compile | 1m 7s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 7s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 56s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 15s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 19s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 20s |  | the patch passed |
| +1 :green_heart: | shadedclient | 18m 48s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 330m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3078/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 39s |  | The patch does not generate ASF License warnings. |
| | | | 422m 20s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
| | hadoop.hdfs.TestDFSShell |
| | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3078/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3078 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux ed5e5dcaaa2c 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / fdb574b1debb926be5ee32daae4e624d11900383 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test R
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=609998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609998 ]

ASF GitHub Bot logged work on HDFS-13671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:08
Start Date: 14/Jun/21 07:08
Worklog Time Spent: 10m
Work Description: xiaoyuyao commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r648608033

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3220,21 +3172,28 @@ private void reportDiffSortedInner(
       // comes from the IBR / FBR and hence what we should use to compare
       // against the memory state.
       // See HDFS-6289 and HDFS-15422 for more context.
-      queueReportedBlock(storageInfo, replica, reportedState,
+      queueReportedBlock(storageInfo, block, reportedState,

Review comment: Should `block` be `storedBlock` here?

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
##
@@ -256,9 +256,6 @@ message BlockReportContextProto {
   // The block report lease ID, or 0 if we are sending without a lease to
   // bypass rate-limiting.
   optional uint64 leaseId = 4 [ default = 0 ];
-
-  // True if the reported blocks are sorted by increasing block IDs
-  optional bool sorted = 5 [default = false];

Review comment: Should we keep this field as-is for backward compatibility, but leave it unused?
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3111,106 +3042,127 @@ void processFirstBlockReport(
     }
   }

-  private void reportDiffSorted(DatanodeStorageInfo storageInfo,
-      Iterable newReport,
+  private void reportDiff(DatanodeStorageInfo storageInfo,
+      BlockListAsLongs newReport,
       Collection toAdd,        // add to DatanodeDescriptor
       Collection toRemove,     // remove from DatanodeDescriptor
       Collection toInvalidate, // should be removed from DN
       Collection toCorrupt,    // add to corrupt replicas list
       Collection toUC) {       // add to under-construction list
-    // The blocks must be sorted and the storagenodes blocks must be sorted
-    Iterator storageBlocksIterator = storageInfo.getBlockIterator();
+    // place a delimiter in the list which separates blocks
+    // that have been reported from those that have not
     DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor();
-    BlockInfo storageBlock = null;
-
-    for (BlockReportReplica replica : newReport) {
-
-      long replicaID = replica.getBlockId();
-      if (BlockIdManager.isStripedBlockID(replicaID)
-          && (!hasNonEcBlockUsingStripedID ||
-              !blocksMap.containsBlock(replica))) {
-        replicaID = BlockIdManager.convertToStripedID(replicaID);
-      }
-
-      ReplicaState reportedState = replica.getState();
-
-      LOG.debug("Reported block {} on {} size {} replicaState = {}",
-          replica, dn, replica.getNumBytes(), reportedState);
-
-      if (shouldPostponeBlocksFromFuture
-          && isGenStampInFuture(replica)) {
-        queueReportedBlock(storageInfo, replica, reportedState,
-            QUEUE_REASON_FUTURE_GENSTAMP);
-        continue;
-      }
-
-      if (storageBlock == null && storageBlocksIterator.hasNext()) {
-        storageBlock = storageBlocksIterator.next();
-      }
-
-      do {
-        int cmp;
-        if (storageBlock == null ||
-            (cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) {
-          // Check if block is available in NN but not yet on this storage
-          BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID));
-          if (nnBlock != null) {
-            reportDiffSortedInner(storageInfo, replica, reportedState,
-                nnBlock, toAdd, toCorrupt, toUC);
-          } else {
-            // Replica not found anywhere so it should be invalidated
-            toInvalidate.add(new Block(replica));
-          }
-          break;
-        } else if (cmp == 0) {
-          // Replica matched current storageblock
-          reportDiffSortedInner(storageInfo, replica, reportedState,
-              storageBlock, toAdd, toCorrupt, toUC);
-          storageBlock = null;
-        } else {
-          // replica has higher ID than storedBlock
-          // Remove all stored blocks with IDs lower than replica
-          do {
-            toRemove.add(storageBlock);
-            storageBlock = storageBlocksIterator.hasNext()
-                ? storageBlocksIterator.next() : null;
-          } while (storageBlock != null &&
-              Long.compare(replicaID, storageBlock.getBlockId()) > 0);
+    Block d
[jira] [Work logged] (HDFS-16062) When a DataNode hot reload configuration, JMX will block for a long time
[ https://issues.apache.org/jira/browse/HDFS-16062?focusedWorklogId=610008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610008 ]

ASF GitHub Bot logged work on HDFS-16062:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:09
Start Date: 14/Jun/21 07:09
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3092:
URL: https://github.com/apache/hadoop/pull/3092#issuecomment-859160527

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 54s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 21s |  | trunk passed |
| +1 :green_heart: | compile | 1m 25s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 15s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 3s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 56s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 22s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 18s |  | trunk passed |
| +1 :green_heart: | shadedclient | 19m 3s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 15s |  | the patch passed |
| +1 :green_heart: | compile | 1m 17s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 17s |  | the patch passed |
| +1 :green_heart: | compile | 1m 11s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 11s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 55s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3092/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 161 unchanged - 1 fixed = 162 total (was 162) |
| +1 :green_heart: | mvnsite | 1m 14s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 25s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 18s |  | the patch passed |
| -1 :x: | shadedclient | 18m 56s |  | patch has errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 765m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3092/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 8s |  | The patch does not generate ASF License warnings. |
| | | | 858m 19s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestSetrepDecreasing |
| | hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages |
| | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| | hadoop.fs.TestResolveHdfsSymlink |
| | hadoop.hdfs.TestClose |
| | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.fs.viewfs.TestViewFsDefaultValue |
| | hadoop.hdfs.TestWriteBlockGetsBlockLengthHint |
| | hadoop.hdfs.TestDatanodeConfig |
| | hadoop.hdfs.tools.TestECAdmin |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.server.namenode.TestXAttrConfigFlag |
| | hadoop.hdfs.web.TestWebHdfsTokens |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.server.datanode.TestDa
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610018 ]

ASF GitHub Bot logged work on HDFS-16057:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:10
Start Date: 14/Jun/21 07:10
Worklog Time Spent: 10m
Work Description: tomscut commented on pull request #3084:
URL: https://github.com/apache/hadoop/pull/3084#issuecomment-859187424

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 610018)
    Time Spent: 1.5h (was: 1h 20m)

> Make sure the order for location in ENTERING_MAINTENANCE state
> --------------------------------------------------------------
>
>                 Key: HDFS-16057
>                 URL: https://issues.apache.org/jira/browse/HDFS-16057
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We sort locations in getBlockLocations() with a comparator, and the expected
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() can disrupt that order. We should also
> filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling
> networktopology.sortByDistance().
>
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if (nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>       lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>       createSecondaryNodeSorter());
> }
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
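The expected ordering described in this issue can be sketched with a plain rank-based comparator. This is a stand-alone illustration, not the actual DatanodeManager code: the `State` enum, `rank` helper, and `sortLocations` method are hypothetical names introduced only for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch (not HDFS code): sort replica locations so that
// live nodes come first, then stale, then ENTERING_MAINTENANCE, then
// decommissioned, matching the expected order stated in the issue.
public class LocationSort {
    enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

    // Lower rank sorts earlier.
    static int rank(State s) {
        switch (s) {
            case LIVE: return 0;
            case STALE: return 1;
            case ENTERING_MAINTENANCE: return 2;
            default: return 3; // DECOMMISSIONED
        }
    }

    static void sortLocations(State[] locations) {
        Arrays.sort(locations, Comparator.comparingInt(LocationSort::rank));
    }

    public static void main(String[] args) {
        State[] di = { State.DECOMMISSIONED, State.LIVE,
                       State.ENTERING_MAINTENANCE, State.STALE };
        sortLocations(di);
        // prints [LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED]
        System.out.println(Arrays.toString(di));
    }
}
```

The point of the issue is that any later distance-based sort (like `sortByDistance`) must be restricted to the leading "active" prefix of the array, or it will shuffle ENTERING_MAINTENANCE nodes back among the live ones.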
[jira] [Work logged] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?focusedWorklogId=610048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610048 ]

ASF GitHub Bot logged work on HDFS-15671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:12
Start Date: 14/Jun/21 07:12
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3097:
URL: https://github.com/apache/hadoop/pull/3097#issuecomment-859199021

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 33s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 10s |  | trunk passed |
| +1 :green_heart: | compile | 1m 31s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 21s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 6s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 30s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 3s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 33s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s |  | trunk passed |
| +1 :green_heart: | shadedclient | 16m 47s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 19s |  | the patch passed |
| +1 :green_heart: | compile | 1m 22s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 22s |  | the patch passed |
| +1 :green_heart: | compile | 1m 13s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 13s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 57s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 19s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 27s |  | the patch passed |
| +1 :green_heart: | shadedclient | 16m 41s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 230m 22s |  | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s |  | The patch does not generate ASF License warnings. |
| | | | 318m 14s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3097 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux b130e6aeca2a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7dbfcfb38ed0b58aa72223bf8c9b0b4d276d5e93 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/testReport/ |
| Max. process+thread count | 3043 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This mes
[jira] [Work logged] (HDFS-16023) Improve blockReportLeaseId acquisition to avoid repeated FBR
[ https://issues.apache.org/jira/browse/HDFS-16023?focusedWorklogId=610042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610042 ]

ASF GitHub Bot logged work on HDFS-16023:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:12
Start Date: 14/Jun/21 07:12
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3091:
URL: https://github.com/apache/hadoop/pull/3091#issuecomment-859581181

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 36s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 28s |  | trunk passed |
| +1 :green_heart: | compile | 1m 23s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 20s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 0s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 25s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 59s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 31s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s |  | trunk passed |
| +1 :green_heart: | shadedclient | 16m 41s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 15s |  | the patch passed |
| +1 :green_heart: | compile | 1m 19s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 19s |  | the patch passed |
| +1 :green_heart: | compile | 1m 11s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 11s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 14s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 27s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 5s |  | the patch passed |
| +1 :green_heart: | shadedclient | 15m 57s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 249m 26s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 47s |  | The patch does not generate ASF License warnings. |
| | | | 335m 17s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
| | hadoop.hdfs.TestFileChecksum |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3091 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux da722c292617 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 917dc482186cc690a16304032aedb194cf9dc1ed |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/testReport/ |
| Max. process+thread count | 3163 (vs. ulimit of 5500)
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610085 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:15 Start Date: 14/Jun/21 07:15 Worklog Time Spent: 10m Work Description: AlphaGouGe commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-859239871 @xiaoyuyao Thanks for review, i have update this PR, take a look please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610085) Time Spent: 4h 10m (was: 4h) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete the INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be the more expensive operation and take > more time. However, we now always see the NN hang during the remove-block > operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, for > better performance when handling FBRs/IBRs. But compared with the earlier > implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, > since it takes additional time to rebalance tree nodes. When there are many > blocks to be removed/deleted, this hurts. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide > {{getBlockIterator}} to return a block iterator, and no other get operation > for a specified block. Do we still need to use {{FoldedTreeSet}} in > {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not > Update. Maybe we can revert this to the earlier implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
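The "remove blocks chunk by chunk in a loop" step described in the report can be sketched in isolation. This is an illustrative stand-in, not the actual NameNode code: the chunk-size constant and the plain `HashSet` standing in for the per-storage block structure (a `FoldedTreeSet` in the affected versions) are assumptions for the sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChunkedBlockRemovalSketch {
    // Hypothetical chunk size; the NameNode processes one chunk per
    // pass so it can drop and re-take the namesystem lock between chunks.
    static final int BLOCK_DELETION_INCREMENT = 1000;

    // Stand-in for the per-storage block set whose remove() cost is the
    // subject of HDFS-13671 (FoldedTreeSet vs. the earlier structure).
    private final Set<Long> storedBlocks = new HashSet<>();

    public void addBlock(long id) {
        storedBlocks.add(id);
    }

    /**
     * Step 2 of the delete: the blocks were collected up front (step 1);
     * here they are removed from the map chunk by chunk in a loop.
     * Returns how many blocks were actually removed.
     */
    public int deleteBlocks(List<Long> collected) {
        int removed = 0;
        for (int start = 0; start < collected.size();
             start += BLOCK_DELETION_INCREMENT) {
            int end = Math.min(start + BLOCK_DELETION_INCREMENT,
                collected.size());
            for (Long id : collected.subList(start, end)) {
                if (storedBlocks.remove(id)) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

The point of the issue is that each `remove()` call above is cheap for a hash- or list-based structure but pays a rebalancing cost in a tree-based one, which dominates when millions of blocks are deleted at once.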
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610095 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:16 Start Date: 14/Jun/21 07:16 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r650097026 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: I just want to avoid having two caches of the same thing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610095) Time Spent: 1h 10m (was: 1h) > RBF: Some indicators of RBFMetrics count inaccurately > -- > > Key: HDFS-16039 > URL: https://issues.apache.org/jira/browse/HDFS-16039 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity > The current statistical algorithm accumulates the indicators of all NNs, > which leads to inaccurate counts. I think that for the same ClusterID we > only need to take the max once and then do the accumulation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
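The proposed fix — take one max per ClusterID, then accumulate — can be sketched as below. NameNodes of the same nameservice report the same cluster-wide figure, so summing every NN's report double-counts. This is an assumed, simplified stand-in for something like RBFMetrics#getTotalCapacity, not the actual code; the method and map names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityAggregationSketch {

    /**
     * Aggregate a per-NameNode metric across a federation.
     * capacityByNamenode: NN id -> value that NN reported.
     * namenodeToCluster:  NN id -> its ClusterID / nameservice.
     * Instead of summing every report, keep only the max per cluster
     * (all NNs of one cluster describe the same storage), then sum.
     */
    public static long totalCapacity(Map<String, Long> capacityByNamenode,
                                     Map<String, String> namenodeToCluster) {
        Map<String, Long> maxPerCluster = new HashMap<>();
        for (Map.Entry<String, Long> e : capacityByNamenode.entrySet()) {
            String cluster = namenodeToCluster.get(e.getKey());
            maxPerCluster.merge(cluster, e.getValue(), Math::max);
        }
        long total = 0;
        for (long v : maxPerCluster.values()) {
            total += v;
        }
        return total;
    }
}
```

With an active/standby pair in `ns1` both reporting 100 and a single NN in `ns2` reporting 50, this yields 150 rather than the 250 the naive accumulation produces.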
[jira] [Work logged] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?focusedWorklogId=610114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610114 ] ASF GitHub Bot logged work on HDFS-15671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: jbrennan333 merged pull request #3097: URL: https://github.com/apache/hadoop/pull/3097 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610114) Time Spent: 0.5h (was: 20m) > TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk > -- > > Key: HDFS-15671 > URL: https://issues.apache.org/jira/browse/HDFS-15671 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log > > Time Spent: 0.5h > Remaining Estimate: 0h > > qbt report shows failures on TestBalancer > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault > Failing for the past 1 build (Since Failed#317 ) > Took 45 sec. 
> Error Message > Timed out waiting for /tmp.txt to reach 20 replicas > Stacktrace > java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to > reach 20 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610117 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: AlphaGouGe commented on a change in pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#discussion_r649666587 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3220,21 +3172,28 @@ private void reportDiffSortedInner( // comes from the IBR / FBR and hence what we should use to compare // against the memory state. // See HDFS-6289 and HDFS-15422 for more context. -queueReportedBlock(storageInfo, replica, reportedState, +queueReportedBlock(storageInfo, block, reportedState, Review comment: @xiaoyuyao you are right, it should be storedBlock -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610117) Time Spent: 4h 20m (was: 4h 10m) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 20m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete the INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be the more expensive operation and take > more time. However, we now always see the NN hang during the remove-block > operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, for > better performance when handling FBRs/IBRs. But compared with the earlier > implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, > since it takes additional time to rebalance tree nodes. When there are
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610115 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#issuecomment-859227661 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 50s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 23m 24s | | trunk passed | | +1 :green_heart: | compile | 23m 52s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 21s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 4m 12s | | trunk passed | | +1 :green_heart: | mvnsite | 27m 33s | | trunk passed | | +1 :green_heart: | javadoc | 8m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 8m 4s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 35m 26s | | trunk passed | | +1 :green_heart: | shadedclient | 48m 20s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 21m 22s | | the patch passed | | +1 :green_heart: | compile | 21m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 21m 49s | | the patch passed | | +1 :green_heart: | compile | 18m 43s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 43s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/blanks-eol.txt) | The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 3m 46s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | mvnsite | 21m 4s | | the patch passed | | +1 :green_heart: | javadoc | 7m 51s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 7m 46s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 34m 56s | | the patch passed | | +1 :green_heart: | shadedclient | 46m 49s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 780m 40s | [/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/patch-unit-root.txt) | root in the patch passed. | | -1 :x: | asflicense | 1m 32s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. 
| | | | 1117m 13s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.yarn.server.router.clientrm.TestFederationClientInterceptor | | | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination | | | hadoop.hdfs.server.federation.metrics.TestRBFMetrics | | | hadoop.hdfs.server.federation.router.TestRouterRpc | | | hadoop.hdfs.TestRollingUpgrade | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610155 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:22 Start Date: 14/Jun/21 07:22 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-859499847 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 59s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | buf | 0m 1s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 18 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 34m 42s | | trunk passed | | +1 :green_heart: | compile | 1m 25s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 8s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 22s | | trunk passed | | +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 16s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 15s | | the patch passed | | +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | cc | 1m 18s | | the patch passed | | +1 :green_heart: | javac | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | cc | 1m 8s | | the patch passed | | +1 :green_heart: | javac | 1m 8s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 1338 unchanged - 13 fixed = 1339 total (was 1351) | | +1 :green_heart: | mvnsite | 1m 17s | | the patch passed | | +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 18s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 39s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 346m 2s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | -1 :x: | asflicense | 0m 38s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/results-asflicense.txt) | The patch generated 2 ASF License warnings. 
| | | | 439m 46s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList | | | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.or
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610170 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:24 Start Date: 14/Jun/21 07:24 Worklog Time Spent: 10m Work Description: aajisaka commented on a change in pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#discussion_r649700823 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3111,106 +3042,127 @@ void processFirstBlockReport( } } - private void reportDiffSorted(DatanodeStorageInfo storageInfo, - Iterable newReport, + private void reportDiff(DatanodeStorageInfo storageInfo, + BlockListAsLongs newReport, Collection toAdd, // add to DatanodeDescriptor Collection toRemove, // remove from DatanodeDescriptor Collection toInvalidate, // should be removed from DN Collection toCorrupt, // add to corrupt replicas list Collection toUC) { // add to under-construction list -// The blocks must be sorted and the storagenodes blocks must be sorted -Iterator storageBlocksIterator = storageInfo.getBlockIterator(); +// place a delimiter in the list which separates blocks +// that have been reported from those that have not DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor(); -BlockInfo storageBlock = null; - -for (BlockReportReplica replica : newReport) { - - long replicaID = replica.getBlockId(); - if (BlockIdManager.isStripedBlockID(replicaID) - && (!hasNonEcBlockUsingStripedID || - !blocksMap.containsBlock(replica))) { -replicaID = BlockIdManager.convertToStripedID(replicaID); - } - - ReplicaState reportedState = replica.getState(); - - LOG.debug("Reported block {} on {} size {} replicaState = {}", - replica, dn, replica.getNumBytes(), reportedState); - - if (shouldPostponeBlocksFromFuture - && isGenStampInFuture(replica)) { -queueReportedBlock(storageInfo, replica, 
reportedState, - QUEUE_REASON_FUTURE_GENSTAMP); -continue; - } - - if (storageBlock == null && storageBlocksIterator.hasNext()) { -storageBlock = storageBlocksIterator.next(); - } - - do { -int cmp; -if (storageBlock == null || -(cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) { - // Check if block is available in NN but not yet on this storage - BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID)); - if (nnBlock != null) { -reportDiffSortedInner(storageInfo, replica, reportedState, - nnBlock, toAdd, toCorrupt, toUC); - } else { -// Replica not found anywhere so it should be invalidated -toInvalidate.add(new Block(replica)); - } - break; -} else if (cmp == 0) { - // Replica matched current storageblock - reportDiffSortedInner(storageInfo, replica, reportedState, -storageBlock, toAdd, toCorrupt, toUC); - storageBlock = null; -} else { - // replica has higher ID than storedBlock - // Remove all stored blocks with IDs lower than replica - do { -toRemove.add(storageBlock); -storageBlock = storageBlocksIterator.hasNext() - ? 
storageBlocksIterator.next() : null; - } while (storageBlock != null && - Long.compare(replicaID, storageBlock.getBlockId()) > 0); +Block delimiterBlock = new Block(); +BlockInfo delimiter = new BlockInfoContiguous(delimiterBlock, +(short) 1); +AddBlockResult result = storageInfo.addBlock(delimiter, delimiterBlock); +assert result == AddBlockResult.ADDED +: "Delimiting block cannot be present in the node"; +int headIndex = 0; //currently the delimiter is in the head of the list +int curIndex; + +if (newReport == null) { + newReport = BlockListAsLongs.EMPTY; +} +// scan the report and process newly reported blocks +for (BlockReportReplica iblk : newReport) { + ReplicaState iState = iblk.getState(); + LOG.debug("Reported block {} on {} size {} replicaState = {}", iblk, dn, + iblk.getNumBytes(), iState); + BlockInfo storedBlock = processReportedBlock(storageInfo, + iblk, iState, toAdd, toInvalidate, toCorrupt, toUC); + + // move block to the head of the list + if (storedBlock != null) { +curIndex = storedBlock.findStorageInfo(storageInfo); +if (curIndex >= 0) { + headIndex = + stor
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610208 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:27 Start Date: 14/Jun/21 07:27 Worklog Time Spent: 10m Work Description: base111 commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r650119510 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: Yes,They should use the same dncache. In addition, I want to extract NamesystemMetrics and NameNodeInfoMetrics into RBFMetrics. I don't think they should be serialized to StateStore and then de-serialized to be used by RBFMetrics. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610208) Time Spent: 1.5h (was: 1h 20m) > RBF: Some indicators of RBFMetrics count inaccurately > -- > > Key: HDFS-16039 > URL: https://issues.apache.org/jira/browse/HDFS-16039 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity > The current statistical algorithm accumulates the indicators of all NNs, > which leads to inaccurate counts. I think that for the same ClusterID we > only need to take the max once and then do the accumulation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
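A single shared, time-expiring DN-report cache — the direction both reviewers lean toward above — can be sketched with plain Java in place of Guava's `CacheBuilder`. Everything here (class name, `Supplier`-based loader, explicit clock parameter) is a hypothetical stand-in for illustration, not the RBFMetrics or RouterRpcServer code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ExpiringCacheSketch<K, V> {
    // One immutable entry: the cached value and when it was loaded.
    private static final class Entry<V> {
        final V value;
        final long loadedAtMs;
        Entry(V value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long expireMs;

    public ExpiringCacheSketch(long expireMs) {
        this.expireMs = expireMs;
    }

    /**
     * Return the cached value for key, reloading via the supplied loader
     * when the entry is missing or older than expireMs. The clock is an
     * explicit parameter so the behavior is easy to test deterministically.
     */
    public V get(K key, Supplier<V> loader, long nowMs) {
        Entry<V> e = cache.get(key);
        if (e == null || nowMs - e.loadedAtMs >= expireMs) {
            e = new Entry<>(loader.get(), nowMs);
            cache.put(key, e);
        }
        return e.value;
    }
}
```

Sharing one such cache between the metrics bean and the RPC server avoids fetching the (expensive) DN report twice within one expiry window, which is the review's point about "two caches of the same thing".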
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610252 ] ASF GitHub Bot logged work on HDFS-16057: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:32 Start Date: 14/Jun/21 07:32 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3084: URL: https://github.com/apache/hadoop/pull/3084#issuecomment-859387291 Merged. Thanks for your contribution, @tomscut. Thanks for your reviews, @jojochuang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610252) Time Spent: 1h 40m (was: 1.5h) > Make sure the order for location in ENTERING_MAINTENANCE state > -- > > Key: HDFS-16057 > URL: https://issues.apache.org/jira/browse/HDFS-16057 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use a comparator to sort locations in getBlockLocations(), and the expected > result is: live -> stale -> entering_maintenance -> decommissioned. > But networktopology.sortByDistance() will disrupt that order. We should > also filter out nodes in state AdminStates.ENTERING_MAINTENANCE before > networktopology.sortByDistance(). 
> > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock() > {code:java} > DatanodeInfoWithStorage[] di = lb.getLocations(); > // Move decommissioned/stale datanodes to the bottom > Arrays.sort(di, comparator); > // Sort nodes by network distance only for located blocks > int lastActiveIndex = di.length - 1; > while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) { > --lastActiveIndex; > } > int activeLen = lastActiveIndex + 1; > if(nonDatanodeReader) { > networktopology.sortByDistanceUsingNetworkLocation(client, > lb.getLocations(), activeLen, createSecondaryNodeSorter()); > } else { > networktopology.sortByDistance(client, lb.getLocations(), activeLen, > createSecondaryNodeSorter()); > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
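The fix described above amounts to treating ENTERING_MAINTENANCE like DECOMMISSIONED when computing `activeLen`, so that the subsequent `sortByDistance()` only touches the live/stale prefix and cannot disturb the tail. A simplified, assumed sketch of that computation (not the actual DatanodeManager code; the enum stands in for the combination of liveness, staleness, and AdminStates):

```java
import java.util.Arrays;
import java.util.Comparator;

public class LocationOrderSketch {
    // Declaration order encodes the desired sort order:
    // live -> stale -> entering_maintenance -> decommissioned.
    public enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

    static int rank(State s) {
        return s.ordinal(); // lower ranks sort first
    }

    /**
     * Sort locations into the expected order, then return the length of
     * the prefix that sortByDistance() is allowed to reorder. Counting
     * ENTERING_MAINTENANCE as inactive (alongside DECOMMISSIONED) is
     * exactly the change this issue proposes.
     */
    public static int activeLength(State[] nodes) {
        Arrays.sort(nodes, Comparator.comparingInt(LocationOrderSketch::rank));
        int lastActive = nodes.length - 1;
        while (lastActive >= 0
            && (nodes[lastActive] == State.DECOMMISSIONED
                || nodes[lastActive] == State.ENTERING_MAINTENANCE)) {
            lastActive--;
        }
        return lastActive + 1;
    }
}
```

With the returned length passed as `activeLen`, the network-distance sort reorders only the live and stale replicas, leaving entering-maintenance and decommissioned nodes fixed at the end of the list.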
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610275 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:34 Start Date: 14/Jun/21 07:34 Worklog Time Spent: 10m Work Description: zhuxiangyi commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r649727373 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: > RouterRpcServer has a similar cache, can we use that? Yes we can use it. NamesystemMetrics and NamenodeInfoMetrics will be stored in StateStore by NamenodeBeanMetrics. It does not need to be stored, right? Is it better for us to cache it in RBFMetrics. 
```java
private void updateJMXParameters(
    String address, NamenodeStatusReport report) {
  try {
    // TODO part of this should be moved to its own utility
    getFsNamesystemMetrics(address, report);
    getNamenodeInfoMetrics(address, report);
  } catch (Exception e) {
    LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
  }
}
```
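The review thread above is about memoizing the expensive datanode-report fetch behind an expiring cache (Guava's `CacheBuilder` with the `DN_REPORT_CACHE_EXPIRE` window). As a rough, dependency-free sketch of that expire-after-write idea — the class and field names here are illustrative, not Hadoop's actual implementation:

```java
import java.util.function.Supplier;

// Minimal pure-Java sketch of the expire-after-write behaviour the review
// proposes getting from Guava's CacheBuilder: the expensive loader (e.g. a
// JMX / getDatanodeReport fetch) runs at most once per expiry window, and
// other callers inside the window reuse the cached value.
class ExpiringSupplier<T> {
    private final Supplier<T> loader;
    private final long expireNanos;
    private T value;
    private long loadedAt = Long.MIN_VALUE;

    ExpiringSupplier(Supplier<T> loader, long expireNanos) {
        this.loader = loader;
        this.expireNanos = expireNanos;
    }

    synchronized T get() {
        long now = System.nanoTime();
        if (value == null || now - loadedAt > expireNanos) {
            value = loader.get();  // reload only when stale
            loadedAt = now;
        }
        return value;
    }
}

public class CacheSketch {
    static int calls = 0;

    public static void main(String[] args) {
        ExpiringSupplier<String> reports = new ExpiringSupplier<>(
            () -> { calls++; return "dn-report"; },
            java.util.concurrent.TimeUnit.SECONDS.toNanos(10));
        reports.get();
        reports.get();  // served from cache; loader is not re-invoked
        System.out.println(calls);  // 1
    }
}
```

The real patch keys the cache by report type and uses Guava's loading-cache machinery; this sketch only shows why a cache with an expiry window stops metric scrapes from hammering the Namenodes.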
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610304 ]

ASF GitHub Bot logged work on HDFS-16057:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:37
Start Date: 14/Jun/21 07:37
Worklog Time Spent: 10m
Work Description: tasanuma merged pull request #3084:
URL: https://github.com/apache/hadoop/pull/3084

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 610304)
Time Spent: 1h 50m (was: 1h 40m)

> Make sure the order for location in ENTERING_MAINTENANCE state
> --
>
> Key: HDFS-16057
> URL: https://issues.apache.org/jira/browse/HDFS-16057
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: tomscut
> Assignee: tomscut
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> We use a comparator to sort locations in getBlockLocations(), and the expected
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() will disrupt that order. We should
> also filter out nodes in state AdminStates.ENTERING_MAINTENANCE before
> calling networktopology.sortByDistance().
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if (nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>       lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>       createSecondaryNodeSorter());
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
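The snippet quoted in the issue boils down to two phases: a state sort that pushes inactive replicas to the bottom, then a network-distance sort over only the leading "active" prefix. The bug was that ENTERING_MAINTENANCE nodes were not treated as inactive, so the distance sort could pull them ahead of live nodes. A self-contained sketch of that prefix computation, using plain stand-in types rather than the actual DatanodeInfoWithStorage API:

```java
import java.util.Arrays;
import java.util.Comparator;

// Stand-in for DatanodeInfoWithStorage: just a name plus an admin state,
// enough to illustrate the ordering problem from the issue.
enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

class Node {
    final String name;
    final State state;
    Node(String name, State state) { this.name = name; this.state = state; }
}

public class SortSketch {
    // Expected order: live -> stale -> entering_maintenance -> decommissioned.
    static final Comparator<Node> BY_STATE =
        Comparator.comparingInt(n -> n.state.ordinal());

    // A node is "inactive" if it must stay at the bottom and be excluded
    // from the network-distance sort; the fix adds ENTERING_MAINTENANCE here.
    static boolean isInactive(Node n) {
        return n.state == State.DECOMMISSIONED
            || n.state == State.ENTERING_MAINTENANCE;
    }

    // Mirrors the loop in sortLocatedBlock(): after the state sort, find the
    // length of the leading prefix that is safe to re-order by distance.
    static int activeLength(Node[] nodes) {
        int last = nodes.length - 1;
        while (last > 0 && isInactive(nodes[last])) {
            last--;
        }
        return last + 1;
    }

    public static void main(String[] args) {
        Node[] di = {
            new Node("dn1", State.DECOMMISSIONED),
            new Node("dn2", State.LIVE),
            new Node("dn3", State.ENTERING_MAINTENANCE),
            new Node("dn4", State.STALE),
        };
        Arrays.sort(di, BY_STATE);
        int activeLen = activeLength(di);
        // Only di[0..activeLen) may be shuffled by the distance sort.
        System.out.println(activeLen);  // 2: dn2 (live) and dn4 (stale)
    }
}
```

If `isInactive` ignored ENTERING_MAINTENANCE, `activeLen` here would be 3 and the distance sort could move dn3 above the stale node — exactly the disruption the issue describes.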
[jira] [Updated] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16065: -- Labels: pull-request-available (was: ) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610341 ]

ASF GitHub Bot logged work on HDFS-16065:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:40
Start Date: 14/Jun/21 07:40
Worklog Time Spent: 10m
Work Description: symious opened a new pull request #3100:
URL: https://github.com/apache/hadoop/pull/3100

## What changes were proposed in this pull request?

Currently, the Router's operations are not well recorded. It would be good to have metrics similar to "Hadoop:service=NameNode,name=NameNodeActivity" for the NameNode, which show the count for each operation. Besides, some operations are invoked concurrently in Routers; knowing the counts for concurrent operations would help us better understand the cluster's state. This ticket adds normal operation metrics and concurrent operation metrics for the Router.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDFS-16065

## How was this patch tested?

Added unit tests for normal operations and concurrent operations.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 610341)
Remaining Estimate: 0h
Time Spent: 10m

> RBF: Add metrics to record Router's operations
> --
>
> Key: HDFS-16065
> URL: https://issues.apache.org/jira/browse/HDFS-16065
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Janus Chow
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, Router's operations are not well recorded. It would be good to
> have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for
> NameNode, which shows the count for each operations.
> Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
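The proposal above is NameNodeActivity-style per-operation counters for the Router. A minimal, thread-safe sketch of per-method counting — the name `incInvokedMethod` mirrors the patch discussion, but this class is illustrative, not the actual Hadoop metrics2 implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the per-operation counting the ticket asks for:
// one counter per invoked RPC method name, safe under the Router's
// concurrent invocations (LongAdder avoids contention on hot methods).
public class OpMetricsSketch {
    private final Map<String, LongAdder> ops = new ConcurrentHashMap<>();

    // Would be called from the RPC client path, e.g. with method.getName().
    void incInvokedMethod(String method) {
        ops.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    long count(String method) {
        LongAdder a = ops.get(method);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        OpMetricsSketch m = new OpMetricsSketch();
        m.incInvokedMethod("getListing");
        m.incInvokedMethod("getListing");
        m.incInvokedMethod("getDatanodeReport");
        System.out.println(m.count("getListing"));  // 2
    }
}
```

The actual patch exposes these counts through the metrics2 framework (JMX beans), but the core bookkeeping is this simple map of counters, with a separate map for operations the Router fans out concurrently to multiple subclusters.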
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610349 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:41 Start Date: 14/Jun/21 07:41 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-859660072 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610349) Time Spent: 20m (was: 10m) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610356 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:42 Start Date: 14/Jun/21 07:42 Worklog Time Spent: 10m Work Description: symious commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-859595826 @goiri Could you have a look at this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610356) Time Spent: 0.5h (was: 20m) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16061) DFTestUtil.waitReplication can produce false positives
[ https://issues.apache.org/jira/browse/HDFS-16061?focusedWorklogId=610353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610353 ]

ASF GitHub Bot logged work on HDFS-16061:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:42
Start Date: 14/Jun/21 07:42
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3095:
URL: https://github.com/apache/hadoop/pull/3095#issuecomment-859933114

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 34m 3s | | trunk passed |
| +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 52s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 4m 15s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 17s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 32s | | the patch passed |
| +1 :green_heart: | compile | 1m 39s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 39s | | the patch passed |
| +1 :green_heart: | compile | 1m 27s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 86 unchanged - 2 fixed = 86 total (was 88) |
| +1 :green_heart: | mvnsite | 1m 38s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 4m 30s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| -1 :x: | unit | 251m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3095/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | | 356m 0s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
| | hadoop.hdfs.TestDFSStripedInputStream |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3095/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3095 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux fa237f688cc1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4040ad64a4537d47e3633d23daaaefe7e9d7192a |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoo
[jira] [Work logged] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=610398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610398 ]

ASF GitHub Bot logged work on HDFS-16043:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:47
Start Date: 14/Jun/21 07:47
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-859555619

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 8 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 53s | | trunk passed |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 9s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 26s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 7s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 7s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 14s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 7s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 927 unchanged - 1 fixed = 931 total (was 928) |
| +1 :green_heart: | mvnsite | 1m 15s | | the patch passed |
| +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 5s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 5s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| -1 :x: | unit | 240m 56s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | | 328m 26s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
| | hadoop.hdfs.TestBlocksScheduledCounter |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3063 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
| uname | Linux 7e43feeb60ba 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2db702b034d7b309083832c10347d793d0700f03 |
| Default Java | Private Buil
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610483 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:56 Start Date: 14/Jun/21 07:56 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r65027 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: Can you point me to the equivalent for the Namenode? I thought that the RouterRpcServer was already tracking most of these metrics. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRouterMetrics.java ## @@ -0,0 +1,121 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hdfs.server.federation.metrics; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.DFSConfigKeys; +import org.apache.hadoop.hdfs.HdfsConfiguration; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.IOException; + +import static org.apache.hadoop.test.MetricsAsserts.assertCounter; +import static org.apache.hadoop.test.MetricsAsserts.getMetrics; + +/** + * Test case for FilesInGetListingOps metric in Namenode + */ +public class TestRouterMetrics { + private static final Configuration CONF = new HdfsConfiguration(); + private static final String ROUTER_METRICS = "RouterActivity"; + static { +CONF.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 100); +CONF.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY, 1); +CONF.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); +CONF.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1); + } + + private static final int NUM_SUBCLUSTERS = 2; + private static final int NUM_DNS = 3; + + + /** Federated HDFS cluster. */ + private static MiniRouterDFSCluster cluster; + + /** Random Router for this federated cluster. */ + private MiniRouterDFSCluster.RouterContext router; + + /** Filesystem interface to the Router. */ + private FileSystem routerFS; + /** Filesystem interface to the Namenode. 
*/ + private FileSystem nnFS; + + @BeforeClass + public static void globalSetUp() throws Exception { +cluster = new MiniRouterDFSCluster(false, NUM_SUBCLUSTERS); +cluster.setNumDatanodesPerNameservice(NUM_DNS); +cluster.startCluster(); + +Configuration routerConf = new RouterConfigBuilder() +.metrics() +.rpc() +.build(); +cluster.addRouterOverrides(routerConf); +cluster.startRouters(); + +// Register and verify all NNs with all routers +cluster.registerNamenodes(); +cluster.waitNamenodeRegistration(); + + } + + @Before + public void testSetup() throws Exception { +// Create mock locations +cluster.installMockLocations(); + +// Delete all files via the NNs and verify +cluster.deleteAllFiles(); + +// Create test fixtures on NN +cluster.createTestDirectoriesNamenode(); + +// Wait to ensure NN has fully created its test directories +Thread.sleep(100); + +router = cluster.getRouters().get(0); +this.routerFS = router.getFileSystem(); + + } + + @AfterClass + public static void tearDown() throws Exception { +cluster.shutdown(); + } + + @Test + public void testGetListing() throws IOException { Review com
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610565 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 08:07 Start Date: 14/Jun/21 08:07 Worklog Time Spent: 10m Work Description: symious commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r650457056 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: @goiri Thanks for the review. A similar metrics for NameNode is "org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics". I think this RouterMetrics is trying to monitor from a different view of RouterRpcServer. IMHO, RouterRpcServer is monitoring Router as a Server, the new metrics would be treating Router as a Client. Some operations are not triggered by user requests but from Router itself. In our cluster, Router's async call queue was jammed, but we couldn't get the operations from the RouterRpcServer metrics. Then we found most of the operations are "getDatanodeReport" which are invoked by Router's FederationMetrics and not by users. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRouterMetrics.java ## @@ -0,0 +1,121 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hdfs.server.federation.metrics; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.DFSConfigKeys; +import org.apache.hadoop.hdfs.HdfsConfiguration; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.IOException; + +import static org.apache.hadoop.test.MetricsAsserts.assertCounter; +import static org.apache.hadoop.test.MetricsAsserts.getMetrics; + +/** + * Test case for FilesInGetListingOps metric in Namenode + */ +public class TestRouterMetrics { + private static final Configuration CONF = new HdfsConfiguration(); + private static final String ROUTER_METRICS = "RouterActivity"; + static { +CONF.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 100); +CONF.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY, 1); +CONF.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); +CONF.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1); + } + + private static final int NUM_SUBCLUSTERS = 2; + private static final int NUM_DNS = 3; + + + /** Federated HDFS cluster. 
*/ + private static MiniRouterDFSCluster cluster; + + /** Random Router for this federated cluster. */ + private MiniRouterDFSCluster.RouterContext router; + + /** Filesystem interface to the Router. */ + private FileSystem routerFS; + /** Filesystem interface to the Namenode. */ + private FileSystem nnFS; + + @BeforeClass + public static void globalSetUp() throws Exception { +cluster = new MiniRouterDFSCluster(false, NUM_SUBCLUSTERS); +cluster.setNumDatanodesPerNameservice(NUM_DNS); +cluster.startCluster(); + +Configuration routerConf = new RouterConfigBuilder() +.metrics() +.rpc() +.build(); +cluster.addRouterOverrides(routerConf); +cluster.startRouters(); + +// Register and verify all NNs with all routers +cluster.registerNamenodes(); +cluster.waitNamenodeRegistration(); + + } + + @Before + public void testSetup() throws Exception { +// Create mock locations +cluster.installMockLo
[jira] [Issue Comment Deleted] (HDFS-9126) namenode crash in fsimage download/transfer
[ https://issues.apache.org/jira/browse/HDFS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seokchan Yoon updated HDFS-9126: Comment: was deleted (was: Why is this closed? I ran into the same situation and need to figure out the reason why the previous active NN failed on doHealthChecks.) > namenode crash in fsimage download/transfer > --- > > Key: HDFS-9126 > URL: https://issues.apache.org/jira/browse/HDFS-9126 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: OS:Centos 6.5(final) > Apache Hadoop:2.6.0 > namenode ha base 5 journalnodes >Reporter: zengyongping >Priority: Critical > > In our product Hadoop cluster,when active namenode begin download/transfer > fsimage from standby namenode.some times zkfc monitor health of NameNode > socket timeout,zkfs judge active namenode status SERVICE_NOT_RESPONDING > ,happen hadoop namenode ha failover,fence old active namenode. > zkfc logs: > 2015-09-24 11:44:44,739 WARN org.apache.hadoop.ha.HealthMonitor: > Transport-level exception trying to monitor health of NameNode at > hostname1/192.168.10.11:8020: Call From hostname1/192.168.10.11 to > hostname1:8020 failed on socket timeout exception: > java.net.SocketTimeoutException: 45000 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/192.168.10.11:22614 remote=hostname1/192.168.10.11:8020]; For more > details see: http://wiki.apache.org/hadoop/SocketTimeout > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.HealthMonitor: Entering > state SERVICE_NOT_RESPONDING > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: Local > service NameNode at hostname1/192.168.10.11:8020 entered state: > SERVICE_NOT_RESPONDING > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: > Quitting master election for NameNode at hostname1/192.168.10.11:8020 and > marking that fencing is necessary > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Yielding from election > 2015-09-24 11:44:44,761 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x54d81348fe503e3 closed > 2015-09-24 11:44:44,761 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Ignoring stale result from old client with sessionId 0x54d81348fe503e3 > 2015-09-24 11:44:44,764 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > namenode logs: > 2015-09-24 11:43:34,074 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from > 192.168.10.12 > 2015-09-24 11:43:34,074 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs > 2015-09-24 11:43:34,075 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment > 2317430129 > 2015-09-24 11:43:34,253 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: > 272988 Total time for transactions(ms): 5502 Number of transactions batched > in Syncs: 146274 Number of syncs: 32375 SyncTimes(ms): 274465 319599 > 2015-09-24 11:43:46,005 INFO > org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: > Rescanning after 3 milliseconds > 2015-09-24 11:44:21,054 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReplicationMonitor timed out blk_1185804191_112164210 > 2015-09-24 
11:44:36,076 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits > file > /software/data/hadoop-data/hdfs/namenode/current/edits_inprogress_02317430129 > -> > /software/data/hadoop-data/hdfs/namenode/current/edits_02317430129-02317703116 > 2015-09-24 11:44:36,077 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at > 2317703117 > 2015-09-24 11:45:38,008 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 > Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 > Number of syncs: 0 SyncTimes(ms): 0 61585 > 2015-09-24 11:45:38,009 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 222.88s > at 63510.29 KB/s > 2015-09-24 11:45:38,009 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file > fsimage.ckpt_02317430128 size 14495092105 bytes. > 2015-09-24 11:45:38,416 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal > 192.168.10.13:8485 failed to write txns 2317703117-2317703117. Will try to > write to this JN again after the next log roll. > org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 44 is > less than the last promised epoch 45 > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414) > at > org.apache.hadoop.hd
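The failure mode in these logs — the ZKFC health probe timing out while the NameNode is saturated by a large fsimage transfer — can be sketched as a toy model. Everything below is invented for illustration (the class name, the lock standing in for the NameNode's busy service path, the millisecond timeouts); the real HealthMonitor issues a monitorHealth RPC with a 45-second timeout rather than taking a lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class HealthCheckTimeoutDemo {
    static final ReentrantLock serviceLock = new ReentrantLock();

    // Bounded health probe: if the service cannot answer within the
    // timeout, report it as not responding (as ZKFC does).
    static String checkHealth(long timeoutMs) {
        try {
            if (serviceLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                try {
                    return "HEALTHY";
                } finally {
                    serviceLock.unlock();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "SERVICE_NOT_RESPONDING";
    }

    public static void main(String[] args) throws Exception {
        Thread transfer = new Thread(() -> {
            serviceLock.lock();  // a long "image transfer" monopolizes the service
            try {
                Thread.sleep(500);
            } catch (InterruptedException ignored) {
            } finally {
                serviceLock.unlock();
            }
        });
        transfer.start();
        Thread.sleep(50);                       // let the transfer take the lock first
        System.out.println(checkHealth(100));   // SERVICE_NOT_RESPONDING
        transfer.join();
        System.out.println(checkHealth(100));   // HEALTHY
    }
}
```

The point of the sketch: a health check with a fixed timeout cannot distinguish "dead" from "busy", which is exactly why a heavyweight checkpoint transfer can trigger a spurious failover and fencing.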
[jira] [Commented] (HDFS-15352) WebHdfsFileSystem does not log the exception that causes retries
[ https://issues.apache.org/jira/browse/HDFS-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362911#comment-17362911 ] Ayush Saxena commented on HDFS-15352: - can we keep the trace in debug mode? > WebHdfsFileSystem does not log the exception that causes retries > > > Key: HDFS-15352 > URL: https://issues.apache.org/jira/browse/HDFS-15352 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 3.3.1 > Environment: When the WebHdfsFileSystem performs retries, it swallows > up the original exception if retries are successful. This makes debugging the > source of latency spikes difficult. >Reporter: Simbarashe Dzinamarira >Assignee: Simbarashe Dzinamarira >Priority: Minor > Attachments: HDFS-15352.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
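The improvement discussed here — keeping the suppressed trace visible at debug level instead of swallowing it on a successful retry — can be sketched as a generic retry wrapper. This is not the actual WebHdfsFileSystem code; the Call interface, logger name, and UncheckedIOException wrapping are assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RetryWithDebugLog {
    private static final Logger LOG = Logger.getLogger("webhdfs.retry");

    interface Call<T> { T run() throws IOException; }

    // Retry up to maxAttempts, but log each swallowed exception at debug
    // (FINE) level so the cause of latency spikes stays diagnosable.
    static <T> T runWithRetries(Call<T> call, int maxAttempts) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                last = e;
                LOG.log(Level.FINE, "Attempt " + attempt + " failed, retrying", e);
            }
        }
        throw new UncheckedIOException("All " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        int[] failuresLeft = {2};  // fail twice, then succeed
        String result = runWithRetries(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new IOException("transient connect error");
            }
            return "ok";
        }, 5);
        System.out.println(result);  // ok
    }
}
```

With the logger at its default level the retries stay quiet, but enabling FINE/debug for the retry logger surfaces every transient failure without changing the success path.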
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610677 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 13:52 Start Date: 14/Jun/21 13:52 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-860701898 Thanks for the review @smengcl. I have addressed your concerns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610677) Time Spent: 4.5h (was: 4h 20m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610682 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 13:53 Start Date: 14/Jun/21 13:53 Worklog Time Spent: 10m Work Description: virajjasani commented on a change in pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#discussion_r650968368
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
## @@ -1104,6 +1122,34 @@ private void sendLifeline() throws IOException { } }
+  class IBRTaskHandler implements Runnable {
+
+    @Override
+    public void run() {
+      LOG.info("Starting IBR Task Handler.");
+      while (shouldRun()) {
+        try {
+          final long startTime = scheduler.monotonicNow();
+          final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
+          if (!dn.areIBRDisabledForTests() &&
+              (ibrManager.sendImmediately() || sendHeartbeat)) {
+            synchronized (sendIBRLock) {
+              ibrManager.sendIBRs(bpNamenode, bpRegistration,
+                  bpos.getBlockPoolId(), getRpcMetricSuffix());
+            }
+          }
+          // There is no work to do; sleep until heartbeat timer elapses,
+          // or work arrives, and then iterate again.
+          ibrManager.waitTillNextIBR(scheduler.getHeartbeatWaitTime());
Review comment: > With IBR separated in a new thread, maybe later we could have a new config key that controls IBR interval separately, or add a configurable constant offset (from the FBR timer) to the IBR timer. This isn't something we need to add to this jira. Just a thought. I agree. We can add a new config or continue with IBR/FBR expiring around the same time for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610682) Time Spent: 4h 40m (was: 4.5h) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
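The waitTillNextIBR call quoted in the review — block until work arrives or the heartbeat-style timer elapses — follows the classic guarded-wait pattern. A minimal standalone sketch (the class and method names below are invented; the real IncrementalBlockReportManager tracks per-storage pending reports rather than a single flag):

```java
public class IbrWaitDemo {
    private final Object lock = new Object();
    private boolean workPending = false;

    // Block until new work arrives or the timeout elapses, mirroring the
    // "work or heartbeat timer" contract. The loop guards against
    // spurious wakeups.
    boolean waitForWork(long timeoutMs) {
        synchronized (lock) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!workPending) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            boolean hadWork = workPending;
            workPending = false;
            return hadWork;
        }
    }

    void submitWork() {
        synchronized (lock) {
            workPending = true;
            lock.notifyAll();  // wake the waiting IBR thread immediately
        }
    }

    public static void main(String[] args) {
        IbrWaitDemo demo = new IbrWaitDemo();
        System.out.println(demo.waitForWork(100));  // false: timer elapsed, no work
        demo.submitWork();
        System.out.println(demo.waitForWork(5000)); // true: returns without waiting
    }
}
```

This shape is what lets the dedicated IBR thread react immediately to new incremental reports while still flushing on the heartbeat cadence when the queue is idle.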
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610744 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 15:51 Start Date: 14/Jun/21 15:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-860794285 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 52s | | trunk passed | | +1 :green_heart: | compile | 0m 46s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 40s | | trunk passed | | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 59s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 19s | | trunk passed | | +1 :green_heart: | shadedclient | 14m 22s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 34s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 0m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 22s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 573 new + 0 unchanged - 0 fixed = 573 total (was 0) | | +1 :green_heart: | mvnsite | 0m 33s | | the patch passed | | +1 :green_heart: | javadoc | 0m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 14s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 19s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. 
| | | | 98m 29s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3100 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 638e995dec2b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 00e68678f1ce2a0e21a640774d00bcd1859821d1 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/testReport/ | | Max. process+thread count | 2359 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/h
[jira] [Updated] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15659: --- Fix Version/s: 3.3.2 > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
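The fix is conceptually a test-only default: the mini cluster seeds dfs.namenode.redundancy.considerLoad to false, while individual tests can still opt back in. A minimal sketch of that defaults-with-override pattern, using a plain map rather than Hadoop's Configuration class (TestClusterConf is an invented name):

```java
import java.util.HashMap;
import java.util.Map;

public class TestClusterConf {
    private final Map<String, String> conf = new HashMap<>();

    TestClusterConf() {
        // MiniDFSCluster-style default: disable load-based block placement
        // in tests, so one busy datanode in a tiny cluster cannot make
        // block allocation fail intermittently.
        conf.put("dfs.namenode.redundancy.considerLoad", "false");
    }

    void set(String key, String value) {
        conf.put(key, value);  // a specific test may re-enable the feature
    }

    boolean considerLoad() {
        // The shipped default in hdfs-default.xml is true; the test cluster
        // pre-seeds false, so the fallback here only matters if the seeded
        // entry were removed.
        return Boolean.parseBoolean(
            conf.getOrDefault("dfs.namenode.redundancy.considerLoad", "true"));
    }

    public static void main(String[] args) {
        TestClusterConf defaults = new TestClusterConf();
        System.out.println(defaults.considerLoad());  // false

        TestClusterConf optIn = new TestClusterConf();
        optIn.set("dfs.namenode.redundancy.considerLoad", "true");
        System.out.println(optIn.considerLoad());     // true
    }
}
```

The design point: flipping the default at the test-cluster layer leaves production behavior untouched while making every existing MiniDFSCluster test deterministic by default.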
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610758 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 16:26 Start Date: 14/Jun/21 16:26 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r651097583 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: You are adding all these metrics raw to RouterMetrics. I'm wondering if we should have something that refers to this being metrics for the client. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610758) Time Spent: 1h 10m (was: 1h) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. 
> Besides, some operations are invoked concurrently in Routers; knowing the counts > for concurrent operations would help us better understand the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
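A per-method invocation counter of the kind discussed in this review is commonly built from a map of striped counters. The sketch below is an assumption about shape only — the class name RouterClientMetrics and its methods are invented here, and the actual patch presumably integrates with Hadoop's metrics2 framework rather than exposing a bare map:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class RouterClientMetrics {
    // One counter per invoked method name; LongAdder keeps the hot
    // increment path cheap under many concurrent RPC handler threads.
    private final ConcurrentMap<String, LongAdder> invoked = new ConcurrentHashMap<>();

    void incInvokedMethod(String method) {
        invoked.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    long getInvoked(String method) {
        LongAdder adder = invoked.get(method);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        RouterClientMetrics metrics = new RouterClientMetrics();
        metrics.incInvokedMethod("getBlockLocations");
        metrics.incInvokedMethod("getBlockLocations");
        metrics.incInvokedMethod("mkdirs");
        System.out.println(metrics.getInvoked("getBlockLocations"));  // 2
        System.out.println(metrics.getInvoked("mkdirs"));             // 1
    }
}
```

Keeping these counters in a client-scoped class (rather than dumping them raw into the general router metrics) is exactly the separation the review comment is asking about.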
[jira] [Commented] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363057#comment-17363057 ] Jim Brennan commented on HDFS-15659: [~ahussein] I cherry-picked this to branch-3.3, but there are merge conflicts when trying to pull back further. Please provide patches for earlier branches, if desired. > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610792 ] ASF GitHub Bot logged work on HDFS-16055: - Author: ASF GitHub Bot Created on: 14/Jun/21 17:48 Start Date: 14/Jun/21 17:48 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #3078: URL: https://github.com/apache/hadoop/pull/3078#issuecomment-860872516 UT failures are unrelated now. Will merge shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610792) Time Spent: 1h 20m (was: 1h 10m) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610794 ] ASF GitHub Bot logged work on HDFS-16055: - Author: ASF GitHub Bot Created on: 14/Jun/21 17:48 Start Date: 14/Jun/21 17:48 Worklog Time Spent: 10m Work Description: smengcl merged pull request #3078: URL: https://github.com/apache/hadoop/pull/3078 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610794) Time Spent: 1.5h (was: 1h 20m) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDFS-16055: -- Fix Version/s: (was: 3.3.2) 3.4.0 > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDFS-16055: -- Fix Version/s: 3.3.2 Target Version/s: (was: 3.3.2) Resolution: Fixed Status: Resolved (was: Patch Available) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
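The bug class behind HDFS-16055 — a snapshot copy that silently drops an attribute, so the metadata comparison against the current directory can never come out equal and snapshotDiff keeps flagging the root as modified — can be shown with a two-field model. DirMeta and metadataEquals below are simplifications invented for illustration, not the real INodeDirectory code:

```java
public class SnapshotQuotaDemo {
    static class DirMeta {
        final String owner;
        final long nsQuota;  // namespace quota; -1 means "no quota set"

        DirMeta(String owner, long nsQuota) {
            this.owner = owner;
            this.nsQuota = nsQuota;
        }
    }

    // Every persisted attribute must participate in the comparison.
    static boolean metadataEquals(DirMeta a, DirMeta b) {
        return a.owner.equals(b.owner) && a.nsQuota == b.nsQuota;
    }

    public static void main(String[] args) {
        DirMeta current = new DirMeta("hdfs", 10000);  // quota set before snapshot

        // Buggy snapshot copy: the quota is dropped at snapshot creation, so
        // the directory always compares as "modified" even when untouched.
        DirMeta lossyCopy = new DirMeta(current.owner, -1);
        System.out.println(metadataEquals(current, lossyCopy));     // false

        // Fixed snapshot copy: quota preserved, unchanged dirs compare equal.
        DirMeta faithfulCopy = new DirMeta(current.owner, current.nsQuota);
        System.out.println(metadataEquals(current, faithfulCopy));  // true
    }
}
```

The takeaway mirrors the fix: whatever the snapshot INode copy preserves must be exactly the set of fields the diff comparison reads, or the diff output is permanently wrong.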
[jira] [Reopened] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15150: -- Thanks [~sodonnell] and [~weichiu] for introducing this optimization. I am opening the issue in order to submit patches backporting the changes to branches 2.10 - 3.x > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > One place I think a read lock would benefit us significantly is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. 
> I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
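The core of the proposal above is swapping the DataNode's exclusive ReentrantLock for a ReentrantReadWriteLock, so that concurrent block reads no longer serialize on each other. The standalone sketch below (invented names, not DataNode code) demonstrates the property that matters: several readers can hold the read lock simultaneously, which would deadlock this test under a plain exclusive lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatanodeLockDemo {

    // Returns true if all readers held the read lock at the same time;
    // with an exclusive lock this can never happen.
    static boolean readersOverlap(int readers) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch allInside = new CountDownLatch(readers);
        AtomicBoolean overlapped = new AtomicBoolean(false);
        Thread[] threads = new Thread[readers];
        for (int i = 0; i < readers; i++) {
            threads[i] = new Thread(() -> {
                lock.readLock().lock();   // shared: many readers may enter
                try {
                    allInside.countDown();
                    // Every reader reaching this point before any releases
                    // proves the read lock was held concurrently.
                    if (allInside.await(2, TimeUnit.SECONDS)) {
                        overlapped.set(true);
                    }
                } catch (InterruptedException ignored) {
                } finally {
                    lock.readLock().unlock();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException ignored) {
            }
        }
        return overlapped.get();
    }

    public static void main(String[] args) {
        System.out.println(readersOverlap(4));  // true
    }
}
```

This also illustrates why the "conservative first step" in the description is safe: routing every caller through writeLock() reproduces the old exclusive behavior exactly, and read paths can then migrate to readLock() one at a time.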
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: HDFS-1515-branch-2.10.001.patch > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: (was: HDFS-1515-branch-2.10.001.patch) > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: HDFS-15150-branch-2.10.001.patch > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150-branch-2.10.001.patch, HDFS-15150.001.patch, > HDFS-15150.002.patch, HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610846 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 19:03 Start Date: 14/Jun/21 19:03 Worklog Time Spent: 10m Work Description: smengcl commented on a change in pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#discussion_r651200801

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeReport.java

@@ -172,8 +172,20 @@ public void testDatanodeReportMissingBlock() throws Exception {
       // all bad datanodes
     }
     cluster.triggerHeartbeats(); // IBR delete ack
-    lb = fs.getClient().getLocatedBlocks(p.toString(), 0).get(0);
-    assertEquals(0, lb.getLocations().length);
+    int retries = 0;
+    while (true) {
+      lb = fs.getClient().getLocatedBlocks(p.toString(), 0).get(0);
+      if (0 != lb.getLocations().length) {
+        retries++;
+        if (retries > 7) {
+          Assert.fail("getLocatedBlocks failed after 7 retries");
+          break;

Review comment:
```suggestion
```
nit: `break` is unnecessary now. `Assert.fail` will throw `AssertionError`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610846) Time Spent: 4h 50m (was: 4h 40m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: JiangHua Zhu > Assignee: Viraj Jasani > Priority: Minor > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things: FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and FBR. 
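The retry idiom under review here, with the unreachable `break` dropped, generalizes to a small helper. This is a hypothetical sketch (invented names, JDK only), not the test code itself: since the failing branch throws, control never falls through, so no `break` is needed.

```java
import java.util.function.IntSupplier;

// Poll a condition up to maxRetries times; on exhaustion, throw.
// The throwing call never returns normally, so nothing follows it.
public class RetrySketch {
    public static int retryUntil(IntSupplier probe, int expected, int maxRetries) {
        int retries = 0;
        while (true) {
            if (probe.getAsInt() == expected) {
                return retries; // condition met; report how many retries it took
            }
            retries++;
            if (retries > maxRetries) {
                // Like Assert.fail: throws, so no `break` is needed after it.
                throw new AssertionError("condition not met after " + maxRetries + " retries");
            }
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // Probe returns 1, 2, 3, ... -- succeeds once it reaches 3.
        System.out.println("retries=" + retryUntil(() -> ++counter[0], 3, 7));
    }
}
```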
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Status: Patch Available (was: Reopened) > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 3.3.0 > Reporter: Stephen O'Donnell > Assignee: Stephen O'Donnell > Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150-branch-2.10.001.patch, HDFS-15150.001.patch, HDFS-15150.002.patch, HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of contention some time ago, but that Jira went in the direction of creating a new FSDataset implementation, which is very risky, and activity on that Jira has stalled for a few years now. Edit: It looks like HDFS-9668 eventually went in a similar direction to what I was thinking, so I will review that Jira in more detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWriteLock within the DN. The current implementation is simply a ReentrantLock, so any locker blocks all others. > One place I think a read lock would benefit us significantly is when the DN is serving a lot of small blocks and there are jobs which perform a lot of reads. Starting to read any block right now takes the lock, but if we moved this to a read lock, many reads could proceed at the same time. > The first conservative step would be to change the current lock and make all accesses to it obtain the write lock. That way, we keep the current behaviour, and we can then selectively move some lock accesses to the read lock in separate Jiras. > I would appreciate any thoughts on this, and also whether anyone has attempted this before and found any blockers. 
[jira] [Reopened] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15659: -- Reopening the issue to submit patches for earlier branches 2.10-3.x > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test > Reporter: Akira Ajisaka > Assignee: Ahmed Hussein > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it for specific tests. > {quote}
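For reference, the property named in the title can be pinned in a test resource file. This fragment is only a sketch of such an override (the property name comes from the issue title; the file placement is an assumption), not the committed change:

```xml
<!-- Sketch: in a test hdfs-site.xml, disable load-aware block placement
     so MiniDFSCluster tests are not skewed by DataNode load; an individual
     test that needs the behaviour can set the value back to true. -->
<property>
  <name>dfs.namenode.redundancy.considerLoad</name>
  <value>false</value>
</property>
```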
[jira] [Updated] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-14575: -- Attachment: HDFS-14575.003.patch > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 3.1.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, HDFS-14575.003.patch > > > Currently a LeaseRenewer (and its daemon thread) with no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition may happen when a new request comes in just after the LeaseRenewer has expired. > Reproducing this race condition: > # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1 thread; after a few seconds, File#1 is closed, so there are no clients in LeaseRenewer#1 now. > # 60 seconds (the grace period) later, LeaseRenewer#1 expires, but the Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the creation of Daemon#2. > # Daemon#1 wakes up and then exits; after that, LeaseRenewer#1 is removed from the factory. > # File#2 is closed after a few seconds; LeaseRenewer#2 is created, since a renewer can no longer be obtained from the factory. > The Daemon#2 thread leaks from then on, since Client#1 inside it can never be removed and it never gets a chance to stop. > To solve this problem, IIUC, a simple way would be to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks!
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363180#comment-17363180 ] Renukaprasad C commented on HDFS-14575: --- [~Tao Yang] [~weichiu] [~hexiaoqiao] [~hemanthboyina] [~brahma] Changes done as [~weichiu] suggested; uploaded HDFS-14575.003.patch. Can you please take a look when you get time? With the changes I ran the test case in a loop, and on a cluster I verified basic read/write operations and the SGL tool with 1K files. > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 3.1.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, HDFS-14575.003.patch > > > Currently a LeaseRenewer (and its daemon thread) with no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition may happen when a new request comes in just after the LeaseRenewer has expired. > Reproducing this race condition: > # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1 thread; after a few seconds, File#1 is closed, so there are no clients in LeaseRenewer#1 now. > # 60 seconds (the grace period) later, LeaseRenewer#1 expires, but the Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the creation of Daemon#2. > # Daemon#1 wakes up and then exits; after that, LeaseRenewer#1 is removed from the factory. > # File#2 is closed after a few seconds; LeaseRenewer#2 is created, since a renewer can no longer be obtained from the factory. > The Daemon#2 thread leaks from then on, since Client#1 inside it can never be removed and it never gets a chance to stop. > To solve this problem, IIUC, a simple way would be to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks! 
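The fix direction suggested at the end of the description (clear all clients when a renewer is removed from the factory) can be sketched with a toy factory. All names here are hypothetical; this is not the DFSClient code, just an illustration of the invariant being proposed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy renewer factory: removal after the grace period clears the client
// list under the factory lock, so a client that raced in just before
// removal is handed back and can re-register against a fresh renewer,
// instead of staying bound to a dead daemon thread forever.
public class RenewerFactorySketch {
    private final Map<String, List<String>> renewers = new HashMap<>();

    public synchronized void addClient(String renewerKey, String client) {
        renewers.computeIfAbsent(renewerKey, k -> new ArrayList<>()).add(client);
    }

    // Called when the grace period elapses; returns any clients that must
    // be re-registered so none leak with the removed renewer.
    public synchronized List<String> removeExpired(String renewerKey) {
        List<String> clients = renewers.remove(renewerKey);
        if (clients == null) {
            return Collections.emptyList();
        }
        List<String> orphaned = new ArrayList<>(clients);
        clients.clear(); // invariant: a removed renewer holds no clients
        return orphaned;
    }
}
```

Usage: on expiry the caller invokes `removeExpired` and re-adds any returned clients under a new key, so the old daemon can exit cleanly.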
[jira] [Created] (HDFS-16066) Enhance NNThroughputBenchmark functionality
Renukaprasad C created HDFS-16066: - Summary: Enhance NNThroughputBenchmark functionality Key: HDFS-16066 URL: https://issues.apache.org/jira/browse/HDFS-16066 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Renukaprasad C -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16067) Support Append API in NNThroughputBenchmark
Renukaprasad C created HDFS-16067: - Summary: Support Append API in NNThroughputBenchmark Key: HDFS-16067 URL: https://issues.apache.org/jira/browse/HDFS-16067 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363206#comment-17363206 ] Hadoop QA commented on HDFS-14575: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 32s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m 20s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 28s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/622/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 2 new + 68 unchanged - 0 fixed = 70 total (was 68) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610942 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 21:22 Start Date: 14/Jun/21 21:22 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-861004273 Thanks @virajjasani for patch. Will merge shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610942) Time Spent: 5h (was: 4h 50m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
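The proposal in this issue (handle IBR on its own thread so heartbeats and FBR are not delayed) can be sketched with a queue and a single-thread executor. This is a hedged illustration with invented names, not BPServiceActor code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Incremental block reports (IBR) are queued by the main service loop and
// drained by a dedicated daemon thread, so a slow IBR send cannot delay
// the heartbeat/FBR loop that enqueued it.
public class IbrOffloadSketch {
    private final BlockingQueue<String> ibrQueue = new LinkedBlockingQueue<>();
    private final ExecutorService ibrSender = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ibr-sender");
        t.setDaemon(true);
        return t;
    });

    // Called by the main loop: enqueue is cheap and never blocks on the RPC.
    public void enqueue(String report) {
        ibrQueue.add(report);
    }

    // Drain everything currently queued on the dedicated thread and report
    // how many IBRs were "sent" (a stand-in for the real RPC).
    public int drainAndCount() {
        Future<Integer> f = ibrSender.submit(() -> {
            int sent = 0;
            String report;
            while ((report = ibrQueue.poll(10, TimeUnit.MILLISECONDS)) != null) {
                sent++; // stand-in for sending one incremental block report
            }
            return sent;
        });
        try {
            return f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            ibrSender.shutdown();
        }
    }
}
```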
[jira] [Work started] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15618 started by Ahmed Hussein. > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
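The latency pattern described here (a long bounded join on each scanner thread, serialized across threads) can be illustrated with plain JDK threads. A hypothetical sketch, not the BlockScanner code: the point is that interrupting and joining with a short bound keeps shutdown fast, and daemon threads need not be joined at all for the process to exit.

```java
// Simulate shutting down N long-running "scanner" threads: interrupt each
// one and join with a short bound instead of a 5-minute wait per thread.
public class ShutdownSketch {
    public static long shutdownMillis(int scanners, long joinTimeoutMs) {
        long start = System.nanoTime();
        Thread[] ts = new Thread[scanners];
        for (int i = 0; i < scanners; i++) {
            ts[i] = new Thread(() -> {
                // Stand-in for a VolumeScanner loop: sleeps until interrupted.
                try { Thread.sleep(300_000); } catch (InterruptedException ignored) { }
            });
            ts[i].setDaemon(true); // daemon: safe to abandon at process exit
            ts[i].start();
        }
        for (Thread t : ts) {
            t.interrupt(); // ask the scanner to stop
            try {
                t.join(joinTimeoutMs); // bounded wait, not 5 minutes each
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("shutdown took " + shutdownMillis(4, 1000) + " ms");
    }
}
```

With a 5-minute join per scanner, four scanners could stall shutdown for up to 20 minutes; the bounded variant finishes as soon as the interrupted threads exit.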
[jira] [Reopened] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15618: -- Reopening to submit a patch for branch-2.10 > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15618: - Attachment: HDFS-15618-branch-2.10.001.patch Status: Patch Available (was: In Progress) > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3, 3.2.2 > > Attachments: HDFS-15618-branch-2.10.001.patch, HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610945 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 21:24 Start Date: 14/Jun/21 21:24 Worklog Time Spent: 10m Work Description: smengcl merged pull request #2998: URL: https://github.com/apache/hadoop/pull/2998 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610945) Time Spent: 5h 10m (was: 5h) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363289#comment-17363289 ] Hadoop QA commented on HDFS-15150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 37s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 7 new or modified test files. 
{color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 18s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 51s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 58s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 9s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 14m 37s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 2m 2s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/621/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.10 has 2 extant spotbugs warnings. {color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 2m 36s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/621/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2.10 has 1 extant spotbugs warnings. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 15s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 15s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite 
{color} | {color:green} 2m 19s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} |
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363293#comment-17363293 ] Hadoop QA commented on HDFS-15618: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 41s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 18s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 7m 27s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 3m 6s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/623/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2.10 has 1 extant spotbugs warnings. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Bu
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611089 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:12 Start Date: 15/Jun/21 01:12 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-861097055 @xiaoyuyao Thanks for review! Is @AlphaGouGe 's fix OK? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611089) Time Spent: 4h 50m (was: 4h 40m) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 50m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect the INodes and all blocks to be deleted, then delete the INodes. > * Remove the blocks chunk by chunk in a loop. > Intuitively, the first step should be the more expensive operation and take more time. However, we always see the NN hang during the remove-block operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get better performance when dealing with FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, since it takes additional time to rebalance tree nodes. When there are a large number of blocks to be removed/deleted, it looks bad. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide {{getBlockIterator}} to return a block iterator, with no other get operation for a specified block. Do we still need to use {{FoldedTreeSet}} in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not updates. Maybe we can revert this to the earlier implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) -
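[Editor's note] The two-step deletion described above can be modeled with a short sketch. This is Python purely for illustration, not Hadoop code; the names `collect_blocks`, `blocks_map`, and the chunk size are hypothetical stand-ins. The point it shows is that step 2 performs one removal per block, so the per-remove cost of the underlying structure (O(1) for a hash map versus O(log n) plus rebalancing for a tree like FoldedTreeSet) dominates when millions of blocks are deleted:

```python
# Illustrative model of the NameNode's two-step delete (names are hypothetical).
# Step 1: collect the INodes and every block they reference.
# Step 2: remove the collected blocks chunk by chunk; in the real NameNode the
# namesystem write lock is released between chunks so other RPCs can make progress.

def collect_blocks(inode_tree, path):
    """Step 1: gather all block ids under `path` (hypothetical helper)."""
    return [b for p, blocks in inode_tree.items() if p.startswith(path)
            for b in blocks]

def remove_blocks_chunked(blocks_map, block_ids, chunk_size=1000):
    """Step 2: total cost is len(block_ids) * cost of one remove."""
    for i in range(0, len(block_ids), chunk_size):
        for block_id in block_ids[i:i + chunk_size]:
            # O(1) here for a dict; a balanced tree pays O(log n) plus
            # rebalancing on every removal, which is the slowdown described above.
            blocks_map.pop(block_id, None)

if __name__ == "__main__":
    inode_tree = {"/big/dir/f1": [1, 2], "/big/dir/f2": [3], "/other": [4]}
    blocks_map = {1: "blk", 2: "blk", 3: "blk", 4: "blk"}
    doomed = collect_blocks(inode_tree, "/big/dir")
    remove_blocks_chunked(blocks_map, doomed, chunk_size=2)
    assert set(blocks_map) == {4}
```

The chunking itself does not reduce total work; it only bounds how long the lock is held at a stretch, which is why a slower per-remove structure still shows up as a long overall hang.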
[jira] [Created] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
Takanobu Asanuma created HDFS-16068: --- Summary: WebHdfsFileSystem has a possible connection leak in connection with HttpFS Key: HDFS-16068 URL: https://issues.apache.org/jira/browse/HDFS-16068 Project: Hadoop HDFS Issue Type: Bug Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma When we use WebHdfsFileSystem for HttpFS, some connections remain for a while after the filesystems are closed, until GC runs. After investigating it for a while, I found that there is a potential connection leak in WebHdfsFileSystem.
{code:java}
// Close both the InputStream and the connection.
@VisibleForTesting
void closeInputStream(RunnerState rs) throws IOException {
  if (in != null) {
    IOUtils.close(cachedConnection);
    in = null;
  }
  cachedConnection = null;
  runnerState = rs;
}
{code}
In the above code, if the variable {{in}} is null and {{cachedConnection}} is not null, {{cachedConnection}} is never closed and the connection remains. I think this is the cause of our problem.
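[Editor's note] The leaky branch is easier to see in a runnable model. Below is a minimal Python sketch of the same logic (the real code is Java in WebHdfsFileSystem; the class and field names here are simplified stand-ins, and the "fixed" variant is one possible fix along the lines the report suggests, not necessarily the patch that was merged). The buggy version only touches the connection when a stream exists, so a cached connection with no open stream is dropped without being closed:

```python
# Minimal model of the closeInputStream logic (names simplified; not Hadoop code).

class FakeConnection:
    """Stands in for the cached HttpURLConnection; records whether close ran."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Runner:
    def __init__(self, cached_connection, in_stream):
        self.cached_connection = cached_connection
        self.in_stream = in_stream  # stands in for the Java field `in`

    def close_input_stream_buggy(self):
        # Mirrors the quoted Java: the connection is closed only when a stream exists.
        if self.in_stream is not None:
            self.cached_connection.close()
            self.in_stream = None
        self.cached_connection = None  # leak: reference dropped, never closed

    def close_input_stream_fixed(self):
        # Close the cached connection whether or not a stream is open.
        if self.cached_connection is not None:
            self.cached_connection.close()
        self.in_stream = None
        self.cached_connection = None

conn = FakeConnection()
Runner(conn, in_stream=None).close_input_stream_buggy()
print(conn.closed)   # False: the connection leaked

conn = FakeConnection()
Runner(conn, in_stream=None).close_input_stream_fixed()
print(conn.closed)   # True: the fix closes it
```

Until GC finalizes the abandoned connection object, the underlying socket stays open, which matches the observed "connections remain until GC runs" symptom.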
[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363306#comment-17363306 ] Hui Fei commented on HDFS-13671: [~huanghaibin] Thanks for sharing this. [~kihwal] Thanks, do you still have time to review this? Thanks to [~aajisaka] and [~xyao] for the review. If there are no other comments, I want to merge it this week. > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 50m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current deletion logic in NameNode, there are mainly two steps: > * Collect the INodes and all blocks to be deleted, then delete the INodes. > * Remove the blocks chunk by chunk in a loop. > Intuitively, the first step should be the more expensive operation and take more time. However, we always see the NN hang during the remove-block operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get better performance when dealing with FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, since it takes additional time to rebalance tree nodes. When there are a large number of blocks to be removed/deleted, it looks bad. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide {{getBlockIterator}} to return a block iterator, with no other get operation for a specified block. Do we still need to use {{FoldedTreeSet}} in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not updates. Maybe we can revert this to the earlier implementation. 
[jira] [Updated] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16068: -- Labels: pull-request-available (was: ) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611092 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:31 Start Date: 15/Jun/21 01:31 Worklog Time Spent: 10m Work Description: tasanuma opened a new pull request #3104: URL: https://github.com/apache/hadoop/pull/3104 JIRA: HDFS-16068 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611092) Remaining Estimate: 0h Time Spent: 10m > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611093 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:31 Start Date: 15/Jun/21 01:31 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861103245 Writing the unit test is not easy because `cachedConnection` is private. But the fix is clear and safe. So I don't think the unit test is required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611093) Time Spent: 20m (was: 10m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Updated] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-16068: Status: Patch Available (was: Open) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611096 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:50 Start Date: 15/Jun/21 01:50 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861108603 LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611096) Time Spent: 0.5h (was: 20m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=611108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611108 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 15/Jun/21 02:32 Start Date: 15/Jun/21 02:32 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-861123365 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 50s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 30m 48s | | trunk passed |
| +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 1s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 25s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 12s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 9s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 8s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 53s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 14s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 21s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 7s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 56s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| -1 :x: | unit | 355m 13s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/25/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 47s | | The patch does not generate ASF License warnings. |
| | | 439m 29s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
| | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatus |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/25/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2998 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 09ad53fd21e2 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0503dfdf06dbf1820aac54130e4d4f86854d5040 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=64&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-64 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 02:51 Start Date: 15/Jun/21 02:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861129182 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 41s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 31m 13s | | trunk passed |
| +1 :green_heart: | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 0m 55s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 59s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 27s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 41s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 0m 46s | | the patch passed |
| +1 :green_heart: | compile | 0m 52s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 0m 52s | | the patch passed |
| +1 :green_heart: | compile | 0m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 0m 47s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 18s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 49s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 30s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 10s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| +1 :green_heart: | unit | 2m 16s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. |
| | | 79m 0s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3104 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 5e1a36ddd1ed 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9765967b2691b64fc25736bf46371a4c3769694e |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/testReport/ |
| Max. process+thread count | 743 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/console |
[jira] [Assigned] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu reassigned HDFS-16069: --- Assignee: JiangHua Zhu > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. > These files (edit logs) are of little significance to the cluster.
[jira] [Created] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
JiangHua Zhu created HDFS-16069: --- Summary: Remove locally stored files (edit log) when NameNode becomes Standby Key: HDFS-16069 URL: https://issues.apache.org/jira/browse/HDFS-16069 Project: Hadoop HDFS Issue Type: Improvement Reporter: JiangHua Zhu When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. These files (edit logs) are of little significance to the cluster.
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Affects Version/s: 2.9.2 > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. > These files (edit logs) are of little significance to the cluster.
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Labels: namenode zkfc (was: ) > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: namenode, zkfc
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Labels: (was: namenode zkfc) > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor
[jira] [Created] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
zhengchenyu created HDFS-16070: -- Summary: DataTransfer block storm when datanode's io is busy. Key: HDFS-16070 URL: https://issues.apache.org/jira/browse/HDFS-16070 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.2.1, 3.3.0 Reporter: zhengchenyu When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below.
{code}
# Logs showing the transfer threads being started
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866
# Markers showing the transfers completed
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded.
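The race described above can be avoided with a simple in-flight guard: record each (block, target) pair when a transfer thread starts, and refuse to start a second thread for the same pair until the first finishes. This is an illustrative sketch only, not the actual DataNode code or the proposed patch; the class name and key format are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: prevent a second DataTransfer thread from being
// started for a (block, target) pair that is still in flight.
public class TransferGuard {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may start the transfer; false if one is already running. */
    public boolean tryStart(String blockId, String target) {
        return inFlight.add(blockId + "->" + target);
    }

    /** Must be called when the transfer thread finishes, on success or failure. */
    public void finish(String blockId, String target) {
        inFlight.remove(blockId + "->" + target);
    }
}
```

`ConcurrentHashMap.newKeySet()` gives an atomic add-if-absent check, so two monitor passes racing on the same block cannot both launch a thread.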
[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611124 ] ASF GitHub Bot logged work on HDFS-16070: - Author: ASF GitHub Bot Created on: 15/Jun/21 03:42 Start Date: 15/Jun/21 03:42 Worklog Time Spent: 10m Work Description: zhengchenyu opened a new pull request #3105: URL: https://github.com/apache/hadoop/pull/3105 When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones quoted in the issue description above. You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611124) Remaining Estimate: 0h Time Spent: 10m > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16070: -- Labels: pull-request-available (was: ) > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363358#comment-17363358 ] zhengchenyu commented on HDFS-16070: [~ayushsaxena][~inigoiri] I have submitted a pull request; could you help review this patch? > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611126 ] ASF GitHub Bot logged work on HDFS-16070: - Author: ASF GitHub Bot Created on: 15/Jun/21 03:47 Start Date: 15/Jun/21 03:47 Worklog Time Spent: 10m Work Description: zhengchenyu commented on pull request #3105: URL: https://github.com/apache/hadoop/pull/3105#issuecomment-861146706 @ayushtkn @goiri could you help review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611126) Time Spent: 20m (was: 10m) > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16070: --- Description: When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones quoted above. You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded. Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique. (was: the same description without the final note about EC blocks)
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611146 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 05:50 Start Date: 15/Jun/21 05:50 Worklog Time Spent: 10m Work Description: hemanthboyina commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861191955 +1 will commit shortly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611146) Time Spent: 50m (was: 40m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem with HttpFS, some connections remain open for a while after the filesystems are closed, until GC runs. After investigating for a while, I found a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if {{in}} is null and {{cachedConnection}} is not null, {{cachedConnection}} is never closed and the connection remains open. I think this is the cause of our problem.
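The quoted method only closes {{cachedConnection}} when a stream was actually opened ({{in != null}}); a connection cached without a stream is merely dereferenced and lives until GC. A minimal standalone sketch of the pattern with the unconditional-close fix — the connection is modeled as a plain Closeable, and this is not the actual WebHdfsFileSystem code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the leak pattern described above, with the fix applied:
// disconnect the cached connection whether or not a stream was opened,
// instead of only when 'in' is non-null.
public class ConnectionHolder {
    InputStream in;
    Closeable cachedConnection; // stands in for an HttpURLConnection

    void closeInputStream() throws IOException {
        // Fixed: close the connection even when 'in' is null, so a
        // connection cached without a stream cannot leak.
        if (cachedConnection != null) {
            cachedConnection.close();
        }
        in = null;
        cachedConnection = null;
    }
}
```

The same shape as the buggy original, minus the `if (in != null)` guard around the close call.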
[jira] [Updated] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16016: Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently, BPServiceActor#offerService() does many things: FBR, IBR, and heartbeats. We can handle IBRs independently to improve the performance of heartbeats and FBRs.
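The idea behind this change can be sketched as a dedicated sender thread fed from a queue: the heartbeat loop only enqueues incremental block reports (IBRs) and never blocks on the report RPC. This is an illustrative sketch with hypothetical names, not the actual BPServiceActor implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: decouple IBR sending from the heartbeat loop by handing reports
// to a dedicated thread through a blocking queue.
public class IbrSender implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final List<String> sent = Collections.synchronizedList(new ArrayList<>());

    /** Called from the heartbeat/actor thread; never blocks on the RPC. */
    public void submit(String report) {
        queue.offer(report);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                send(queue.take()); // blocks until an IBR arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stand-in for the real report RPC to the NameNode. */
    void send(String report) {
        sent.add(report);
    }
}
```

With this split, a slow IBR RPC delays only the sender thread, so heartbeats and full block reports keep their cadence.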