[jira] [Commented] (HDFS-9126) namenode crash in fsimage download/transfer
[ https://issues.apache.org/jira/browse/HDFS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362736#comment-17362736 ]

Seokchan Yoon commented on HDFS-9126:
-------------------------------------

Why is this closed? I ran into the same situation and need to figure out why the previous active NN failed in doHealthChecks.

> namenode crash in fsimage download/transfer
> -------------------------------------------
>
>                 Key: HDFS-9126
>                 URL: https://issues.apache.org/jira/browse/HDFS-9126
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.0
>        Environment: OS: CentOS 6.5 (final)
>                     Apache Hadoop 2.6.0
>                     NameNode HA with 5 JournalNodes
>            Reporter: zengyongping
>            Priority: Critical
>
> In our production Hadoop cluster, when the active NameNode begins to
> download/transfer the fsimage from the standby NameNode, the ZKFC health
> check of the NameNode sometimes hits a socket timeout. ZKFC then judges the
> active NameNode to be in state SERVICE_NOT_RESPONDING, an HA failover
> happens, and the old active NameNode is fenced.
> zkfc logs:
> 2015-09-24 11:44:44,739 WARN org.apache.hadoop.ha.HealthMonitor:
> Transport-level exception trying to monitor health of NameNode at
> hostname1/192.168.10.11:8020: Call From hostname1/192.168.10.11 to
> hostname1:8020 failed on socket timeout exception:
> java.net.SocketTimeoutException: 45000 millis timeout while waiting for
> channel to be ready for read.
> ch : java.nio.channels.SocketChannel[connected
> local=/192.168.10.11:22614 remote=hostname1/192.168.10.11:8020]; For more
> details see: http://wiki.apache.org/hadoop/SocketTimeout
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.HealthMonitor: Entering
> state SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: Local
> service NameNode at hostname1/192.168.10.11:8020 entered state:
> SERVICE_NOT_RESPONDING
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController:
> Quitting master election for NameNode at hostname1/192.168.10.11:8020 and
> marking that fencing is necessary
> 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Yielding from election
> 2015-09-24 11:44:44,761 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x54d81348fe503e3 closed
> 2015-09-24 11:44:44,761 WARN org.apache.hadoop.ha.ActiveStandbyElector:
> Ignoring stale result from old client with sessionId 0x54d81348fe503e3
> 2015-09-24 11:44:44,764 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> namenode logs:
> 2015-09-24 11:43:34,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
> 192.168.10.12
> 2015-09-24 11:43:34,074 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
> 2015-09-24 11:43:34,075 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment
> 2317430129
> 2015-09-24 11:43:34,253 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions:
> 272988 Total time for transactions(ms): 5502 Number of transactions batched
> in Syncs: 146274 Number of syncs: 32375 SyncTimes(ms): 274465 319599
> 2015-09-24 11:43:46,005 INFO
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
> Rescanning after 3 milliseconds
> 2015-09-24 11:44:21,054 WARN
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager:
> PendingReplicationMonitor timed out blk_1185804191_112164210
> 2015-09-24 11:44:36,076 INFO
> org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
> file
> /software/data/hadoop-data/hdfs/namenode/current/edits_inprogress_02317430129
> ->
> /software/data/hadoop-data/hdfs/namenode/current/edits_02317430129-02317703116
> 2015-09-24 11:44:36,077 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at
> 2317703117
> 2015-09-24 11:45:38,008 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1
> Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0
> Number of syncs: 0 SyncTimes(ms): 0 61585
> 2015-09-24 11:45:38,009 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 222.88s
> at 63510.29 KB/s
> 2015-09-24 11:45:38,009 INFO
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file
> fsimage.ckpt_02317430128 size 14495092105 bytes.
> 2015-09-24 11:45:38,416 WARN
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal
> 192.168.10.13:8485 failed to write txns 2317703117-2317703117. Will try to
> write to this JN again after the next log roll.
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 44 is
> less than the last promised epoch 45
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
> a
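The "45000 millis timeout" in the zkfc log above is the default ZKFC health-monitor RPC timeout, `ha.health-monitor.rpc-timeout.ms`. As a hedged sketch only, not a fix proposed in this thread: raising that timeout on the ZKFC hosts gives the NameNode more headroom while a large fsimage transfer is in progress. The 90-second value below is purely illustrative.

```xml
<!-- core-site.xml on the ZKFC hosts (illustrative value, not a recommendation
     from this issue): allow the NameNode health check more time to respond
     while it is busy serving a multi-GB fsimage transfer. -->
<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>90000</value>
</property>
```

Whether a longer timeout is appropriate depends on how long fsimage transfers actually take in the cluster; it only masks, rather than removes, the contention described in this report.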
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610002 ]

ASF GitHub Bot logged work on HDFS-16055:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:08
Start Date: 14/Jun/21 07:08
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3078:
URL: https://github.com/apache/hadoop/pull/3078#issuecomment-859196473

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 50s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 33s |  | trunk passed |
| +1 :green_heart: | compile | 1m 21s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 14s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 1s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 55s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 24s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 17s |  | trunk passed |
| +1 :green_heart: | shadedclient | 19m 4s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 13s |  | the patch passed |
| +1 :green_heart: | compile | 1m 17s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 17s |  | the patch passed |
| +1 :green_heart: | compile | 1m 7s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 7s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 56s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 15s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 19s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 20s |  | the patch passed |
| +1 :green_heart: | shadedclient | 18m 48s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 330m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3078/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 39s |  | The patch does not generate ASF License warnings. |
| | | | 422m 20s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
| | hadoop.hdfs.TestDFSShell |
| | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3078/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3078 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux ed5e5dcaaa2c 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / fdb574b1debb926be5ee32daae4e624d11900383 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test R
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=609998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-609998 ]

ASF GitHub Bot logged work on HDFS-13671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:08
Start Date: 14/Jun/21 07:08
Worklog Time Spent: 10m
Work Description: xiaoyuyao commented on a change in pull request #3065:
URL: https://github.com/apache/hadoop/pull/3065#discussion_r648608033

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3220,21 +3172,28 @@ private void reportDiffSortedInner(
       // comes from the IBR / FBR and hence what we should use to compare
       // against the memory state.
       // See HDFS-6289 and HDFS-15422 for more context.
-      queueReportedBlock(storageInfo, replica, reportedState,
+      queueReportedBlock(storageInfo, block, reportedState,

Review comment: Should `block` be `storedBlock` here?

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
##
@@ -256,9 +256,6 @@ message BlockReportContextProto {
   // The block report lease ID, or 0 if we are sending without a lease to
   // bypass rate-limiting.
   optional uint64 leaseId = 4 [ default = 0 ];
-
-  // True if the reported blocks are sorted by increasing block IDs
-  optional bool sorted = 5 [default = false];

Review comment: Should we keep this field as-is for backward compatibility, but leave it unused?
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
##
@@ -3111,106 +3042,127 @@ void processFirstBlockReport(
     }
   }

-  private void reportDiffSorted(DatanodeStorageInfo storageInfo,
-      Iterable newReport,
+  private void reportDiff(DatanodeStorageInfo storageInfo,
+      BlockListAsLongs newReport,
       Collection toAdd,        // add to DatanodeDescriptor
       Collection toRemove,     // remove from DatanodeDescriptor
       Collection toInvalidate, // should be removed from DN
       Collection toCorrupt,    // add to corrupt replicas list
       Collection toUC) {       // add to under-construction list
-    // The blocks must be sorted and the storagenodes blocks must be sorted
-    Iterator storageBlocksIterator = storageInfo.getBlockIterator();
+    // place a delimiter in the list which separates blocks
+    // that have been reported from those that have not
     DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor();
-    BlockInfo storageBlock = null;
-
-    for (BlockReportReplica replica : newReport) {
-
-      long replicaID = replica.getBlockId();
-      if (BlockIdManager.isStripedBlockID(replicaID)
-          && (!hasNonEcBlockUsingStripedID ||
-              !blocksMap.containsBlock(replica))) {
-        replicaID = BlockIdManager.convertToStripedID(replicaID);
-      }
-
-      ReplicaState reportedState = replica.getState();
-
-      LOG.debug("Reported block {} on {} size {} replicaState = {}",
-          replica, dn, replica.getNumBytes(), reportedState);
-
-      if (shouldPostponeBlocksFromFuture
-          && isGenStampInFuture(replica)) {
-        queueReportedBlock(storageInfo, replica, reportedState,
-            QUEUE_REASON_FUTURE_GENSTAMP);
-        continue;
-      }
-
-      if (storageBlock == null && storageBlocksIterator.hasNext()) {
-        storageBlock = storageBlocksIterator.next();
-      }
-
-      do {
-        int cmp;
-        if (storageBlock == null ||
-            (cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) {
-          // Check if block is available in NN but not yet on this storage
-          BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID));
-          if (nnBlock != null) {
-            reportDiffSortedInner(storageInfo, replica, reportedState,
-                nnBlock, toAdd, toCorrupt, toUC);
-          } else {
-            // Replica not found anywhere so it should be invalidated
-            toInvalidate.add(new Block(replica));
-          }
-          break;
-        } else if (cmp == 0) {
-          // Replica matched current storageblock
-          reportDiffSortedInner(storageInfo, replica, reportedState,
-              storageBlock, toAdd, toCorrupt, toUC);
-          storageBlock = null;
-        } else {
-          // replica has higher ID than storedBlock
-          // Remove all stored blocks with IDs lower than replica
-          do {
-            toRemove.add(storageBlock);
-            storageBlock = storageBlocksIterator.hasNext()
-                ? storageBlocksIterator.next() : null;
-          } while (storageBlock != null &&
-              Long.compare(replicaID, storageBlock.getBlockId()) > 0);
+    Block d
[jira] [Work logged] (HDFS-16062) When a DataNode hot reload configuration, JMX will block for a long time
[ https://issues.apache.org/jira/browse/HDFS-16062?focusedWorklogId=610008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610008 ]

ASF GitHub Bot logged work on HDFS-16062:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:09
Start Date: 14/Jun/21 07:09
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3092:
URL: https://github.com/apache/hadoop/pull/3092#issuecomment-859160527

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 54s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 21s |  | trunk passed |
| +1 :green_heart: | compile | 1m 25s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 15s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 3s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 24s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 56s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 22s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 18s |  | trunk passed |
| +1 :green_heart: | shadedclient | 19m 3s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 15s |  | the patch passed |
| +1 :green_heart: | compile | 1m 17s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 17s |  | the patch passed |
| +1 :green_heart: | compile | 1m 11s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 11s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 55s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3092/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 161 unchanged - 1 fixed = 162 total (was 162) |
| +1 :green_heart: | mvnsite | 1m 14s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 48s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 25s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 18s |  | the patch passed |
| -1 :x: | shadedclient | 18m 56s |  | patch has errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 765m 23s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3092/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 8s |  | The patch does not generate ASF License warnings. |
| | | | 858m 19s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestSetrepDecreasing |
| | hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages |
| | hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport |
| | hadoop.fs.TestResolveHdfsSymlink |
| | hadoop.hdfs.TestClose |
| | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.fs.viewfs.TestViewFsDefaultValue |
| | hadoop.hdfs.TestWriteBlockGetsBlockLengthHint |
| | hadoop.hdfs.TestDatanodeConfig |
| | hadoop.hdfs.tools.TestECAdmin |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters |
| | hadoop.hdfs.TestFileChecksum |
| | hadoop.hdfs.server.namenode.TestXAttrConfigFlag |
| | hadoop.hdfs.web.TestWebHdfsTokens |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.server.datanode.TestDa
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610018&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610018 ]

ASF GitHub Bot logged work on HDFS-16057:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:10
Start Date: 14/Jun/21 07:10
Worklog Time Spent: 10m
Work Description: tomscut commented on pull request #3084:
URL: https://github.com/apache/hadoop/pull/3084#issuecomment-859187424

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id: (was: 610018)
    Time Spent: 1.5h (was: 1h 20m)

> Make sure the order for location in ENTERING_MAINTENANCE state
> --------------------------------------------------------------
>
>                 Key: HDFS-16057
>                 URL: https://issues.apache.org/jira/browse/HDFS-16057
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: tomscut
>            Assignee: tomscut
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.2
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We sort locations in getBlockLocations() with a comparator, and the expected
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() can disrupt that order. We should also
> filter out nodes in state AdminStates.ENTERING_MAINTENANCE before calling
> networktopology.sortByDistance().
>
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if (nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>       lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>       createSecondaryNodeSorter());
> }
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
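The expected ordering described in this issue can be sketched with a plain rank-based comparator. This is a stand-alone illustration, not the actual DatanodeManager code: the `State` enum, `rank` helper, and `sortLocations` method are hypothetical names introduced only for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch (not HDFS code): sort replica locations so that
// live nodes come first, then stale, then ENTERING_MAINTENANCE, then
// decommissioned, matching the expected order stated in the issue.
public class LocationSort {
    enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

    // Lower rank sorts earlier.
    static int rank(State s) {
        switch (s) {
            case LIVE: return 0;
            case STALE: return 1;
            case ENTERING_MAINTENANCE: return 2;
            default: return 3; // DECOMMISSIONED
        }
    }

    static void sortLocations(State[] locations) {
        Arrays.sort(locations, Comparator.comparingInt(LocationSort::rank));
    }

    public static void main(String[] args) {
        State[] di = { State.DECOMMISSIONED, State.LIVE,
                       State.ENTERING_MAINTENANCE, State.STALE };
        sortLocations(di);
        // prints [LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED]
        System.out.println(Arrays.toString(di));
    }
}
```

The point of the issue is that any later distance-based sort (like `sortByDistance`) must be restricted to the leading "active" prefix of the array, or it will shuffle ENTERING_MAINTENANCE nodes back among the live ones.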
[jira] [Work logged] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?focusedWorklogId=610048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610048 ]

ASF GitHub Bot logged work on HDFS-15671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:12
Start Date: 14/Jun/21 07:12
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3097:
URL: https://github.com/apache/hadoop/pull/3097#issuecomment-859199021

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 33s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 10s |  | trunk passed |
| +1 :green_heart: | compile | 1m 31s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 21s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 6s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 30s |  | trunk passed |
| +1 :green_heart: | javadoc | 1m 3s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 33s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s |  | trunk passed |
| +1 :green_heart: | shadedclient | 16m 47s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 19s |  | the patch passed |
| +1 :green_heart: | compile | 1m 22s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 22s |  | the patch passed |
| +1 :green_heart: | compile | 1m 13s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 13s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 57s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 19s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 27s |  | the patch passed |
| +1 :green_heart: | shadedclient | 16m 41s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 230m 22s |  | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s |  | The patch does not generate ASF License warnings. |
| | | | 318m 14s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3097 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux b130e6aeca2a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7dbfcfb38ed0b58aa72223bf8c9b0b4d276d5e93 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/testReport/ |
| Max. process+thread count | 3043 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3097/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |

This mes
[jira] [Work logged] (HDFS-16023) Improve blockReportLeaseId acquisition to avoid repeated FBR
[ https://issues.apache.org/jira/browse/HDFS-16023?focusedWorklogId=610042&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610042 ]

ASF GitHub Bot logged work on HDFS-16023:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 14/Jun/21 07:12
Start Date: 14/Jun/21 07:12
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3091:
URL: https://github.com/apache/hadoop/pull/3091#issuecomment-859581181

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 36s |  | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 31m 28s |  | trunk passed |
| +1 :green_heart: | compile | 1m 23s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 20s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 0s |  | trunk passed |
| +1 :green_heart: | mvnsite | 1m 25s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 59s |  | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 31s |  | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 22s |  | trunk passed |
| +1 :green_heart: | shadedclient | 16m 41s |  | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 15s |  | the patch passed |
| +1 :green_heart: | compile | 1m 19s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 19s |  | the patch passed |
| +1 :green_heart: | compile | 1m 11s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 11s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 2s |  | the patch passed |
| +1 :green_heart: | mvnsite | 1m 14s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 50s |  | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 27s |  | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 5s |  | the patch passed |
| +1 :green_heart: | shadedclient | 15m 57s |  | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 249m 26s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 47s |  | The patch does not generate ASF License warnings. |
| | | | 335m 17s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap |
| | hadoop.hdfs.TestFileChecksum |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3091 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux da722c292617 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 917dc482186cc690a16304032aedb194cf9dc1ed |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3091/2/testReport/ |
| Max. process+thread count | 3163 (vs. ulimit of 5500)
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610085 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:15 Start Date: 14/Jun/21 07:15 Worklog Time Spent: 10m Work Description: AlphaGouGe commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-859239871 @xiaoyuyao Thanks for review, i have update this PR, take a look please. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610085) Time Spent: 4h 10m (was: 4h) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete the INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be the more expensive operation and take > more time. However, we now always see the NN hang during the remove-block > operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, for > better performance when handling FBRs/IBRs. But compared with the earlier > implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, > since it takes additional time to rebalance tree nodes. When there are many > blocks to be removed/deleted, this hurts. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide > {{getBlockIterator}} to return a block iterator, and no other get operation > for a specified block. Do we still need to use {{FoldedTreeSet}} in > {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not > Update. Maybe we can revert this to the earlier implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
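The "remove blocks chunk by chunk in a loop" step described in the report can be sketched in isolation. This is an illustrative stand-in, not the actual NameNode code: the chunk-size constant and the plain `HashSet` standing in for the per-storage block structure (a `FoldedTreeSet` in the affected versions) are assumptions for the sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChunkedBlockRemovalSketch {
    // Hypothetical chunk size; the NameNode processes one chunk per
    // pass so it can drop and re-take the namesystem lock between chunks.
    static final int BLOCK_DELETION_INCREMENT = 1000;

    // Stand-in for the per-storage block set whose remove() cost is the
    // subject of HDFS-13671 (FoldedTreeSet vs. the earlier structure).
    private final Set<Long> storedBlocks = new HashSet<>();

    public void addBlock(long id) {
        storedBlocks.add(id);
    }

    /**
     * Step 2 of the delete: the blocks were collected up front (step 1);
     * here they are removed from the map chunk by chunk in a loop.
     * Returns how many blocks were actually removed.
     */
    public int deleteBlocks(List<Long> collected) {
        int removed = 0;
        for (int start = 0; start < collected.size();
             start += BLOCK_DELETION_INCREMENT) {
            int end = Math.min(start + BLOCK_DELETION_INCREMENT,
                collected.size());
            for (Long id : collected.subList(start, end)) {
                if (storedBlocks.remove(id)) {
                    removed++;
                }
            }
        }
        return removed;
    }
}
```

The point of the issue is that each `remove()` call above is cheap for a hash- or list-based structure but pays a rebalancing cost in a tree-based one, which dominates when millions of blocks are deleted at once.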
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610095 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:16 Start Date: 14/Jun/21 07:16 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r650097026 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: I just want to avoid having two caches of the same thing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610095) Time Spent: 1h 10m (was: 1h) > RBF: Some indicators of RBFMetrics count inaccurately > -- > > Key: HDFS-16039 > URL: https://issues.apache.org/jira/browse/HDFS-16039 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity > The current statistical algorithm accumulates the indicators of all NNs, > which leads to inaccurate counts. I think that for the same ClusterID we > only need to take the max once and then do the accumulation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
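The proposed fix — take one max per ClusterID, then accumulate — can be sketched as below. NameNodes of the same nameservice report the same cluster-wide figure, so summing every NN's report double-counts. This is an assumed, simplified stand-in for something like RBFMetrics#getTotalCapacity, not the actual code; the method and map names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityAggregationSketch {

    /**
     * Aggregate a per-NameNode metric across a federation.
     * capacityByNamenode: NN id -> value that NN reported.
     * namenodeToCluster:  NN id -> its ClusterID / nameservice.
     * Instead of summing every report, keep only the max per cluster
     * (all NNs of one cluster describe the same storage), then sum.
     */
    public static long totalCapacity(Map<String, Long> capacityByNamenode,
                                     Map<String, String> namenodeToCluster) {
        Map<String, Long> maxPerCluster = new HashMap<>();
        for (Map.Entry<String, Long> e : capacityByNamenode.entrySet()) {
            String cluster = namenodeToCluster.get(e.getKey());
            maxPerCluster.merge(cluster, e.getValue(), Math::max);
        }
        long total = 0;
        for (long v : maxPerCluster.values()) {
            total += v;
        }
        return total;
    }
}
```

With an active/standby pair in `ns1` both reporting 100 and a single NN in `ns2` reporting 50, this yields 150 rather than the 250 the naive accumulation produces.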
[jira] [Work logged] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?focusedWorklogId=610114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610114 ] ASF GitHub Bot logged work on HDFS-15671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: jbrennan333 merged pull request #3097: URL: https://github.com/apache/hadoop/pull/3097 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610114) Time Spent: 0.5h (was: 20m) > TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk > -- > > Key: HDFS-15671 > URL: https://issues.apache.org/jira/browse/HDFS-15671 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log > > Time Spent: 0.5h > Remaining Estimate: 0h > > qbt report shows failures on TestBalancer > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault > Failing for the past 1 build (Since Failed#317 ) > Took 45 sec. 
> Error Message > Timed out waiting for /tmp.txt to reach 20 replicas > Stacktrace > java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to > reach 20 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610117 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: AlphaGouGe commented on a change in pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#discussion_r649666587 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3220,21 +3172,28 @@ private void reportDiffSortedInner( // comes from the IBR / FBR and hence what we should use to compare // against the memory state. // See HDFS-6289 and HDFS-15422 for more context. -queueReportedBlock(storageInfo, replica, reportedState, +queueReportedBlock(storageInfo, block, reportedState, Review comment: @xiaoyuyao you are right, it should be storedBlock -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610117) Time Spent: 4h 20m (was: 4h 10m) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 20m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete the INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be the more expensive operation and take > more time. However, we now always see the NN hang during the remove-block > operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, for > better performance when handling FBRs/IBRs. But compared with the earlier > implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, > since it takes additional time to rebalance tree nodes. When there are
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610115&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610115 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:18 Start Date: 14/Jun/21 07:18 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#issuecomment-859227661 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 12m 50s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 23m 24s | | trunk passed | | +1 :green_heart: | compile | 23m 52s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 20m 21s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 4m 12s | | trunk passed | | +1 :green_heart: | mvnsite | 27m 33s | | trunk passed | | +1 :green_heart: | javadoc | 8m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 8m 4s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 35m 26s | | trunk passed | | +1 :green_heart: | shadedclient | 48m 20s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 21m 22s | | the patch passed | | +1 :green_heart: | compile | 21m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 21m 49s | | the patch passed | | +1 :green_heart: | compile | 18m 43s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 18m 43s | | the patch passed | | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/blanks-eol.txt) | The patch has 8 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply | | -0 :warning: | checkstyle | 3m 46s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | mvnsite | 21m 4s | | the patch passed | | +1 :green_heart: | javadoc | 7m 51s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 7m 46s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 34m 56s | | the patch passed | | +1 :green_heart: | shadedclient | 46m 49s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 780m 40s | [/patch-unit-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/patch-unit-root.txt) | root in the patch passed. | | -1 :x: | asflicense | 1m 32s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3086/2/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. 
| | | | 1117m 13s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.yarn.server.router.clientrm.TestFederationClientInterceptor | | | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination | | | hadoop.hdfs.server.federation.metrics.TestRBFMetrics | | | hadoop.hdfs.server.federation.router.TestRouterRpc | | | hadoop.hdfs.TestRollingUpgrade | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610155 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:22 Start Date: 14/Jun/21 07:22 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-859499847 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 59s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | buf | 0m 1s | | buf was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 18 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 34m 42s | | trunk passed | | +1 :green_heart: | compile | 1m 25s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 1m 8s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 22s | | trunk passed | | +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 16s | | trunk passed | | +1 :green_heart: | shadedclient | 18m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 15s | | the patch passed | | +1 :green_heart: | compile | 1m 18s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | cc | 1m 18s | | the patch passed | | +1 :green_heart: | javac | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | cc | 1m 8s | | the patch passed | | +1 :green_heart: | javac | 1m 8s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 1338 unchanged - 13 fixed = 1339 total (was 1351) | | +1 :green_heart: | mvnsite | 1m 17s | | the patch passed | | +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. | | +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 18s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 39s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 346m 2s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | -1 :x: | asflicense | 0m 38s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3065/6/artifact/out/results-asflicense.txt) | The patch generated 2 ASF License warnings. 
| | | | 439m 46s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsVolumeList | | | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.or
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=610170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610170 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:24 Start Date: 14/Jun/21 07:24 Worklog Time Spent: 10m Work Description: aajisaka commented on a change in pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#discussion_r649700823 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java ## @@ -3111,106 +3042,127 @@ void processFirstBlockReport( } } - private void reportDiffSorted(DatanodeStorageInfo storageInfo, - Iterable newReport, + private void reportDiff(DatanodeStorageInfo storageInfo, + BlockListAsLongs newReport, Collection toAdd, // add to DatanodeDescriptor Collection toRemove, // remove from DatanodeDescriptor Collection toInvalidate, // should be removed from DN Collection toCorrupt, // add to corrupt replicas list Collection toUC) { // add to under-construction list -// The blocks must be sorted and the storagenodes blocks must be sorted -Iterator storageBlocksIterator = storageInfo.getBlockIterator(); +// place a delimiter in the list which separates blocks +// that have been reported from those that have not DatanodeDescriptor dn = storageInfo.getDatanodeDescriptor(); -BlockInfo storageBlock = null; - -for (BlockReportReplica replica : newReport) { - - long replicaID = replica.getBlockId(); - if (BlockIdManager.isStripedBlockID(replicaID) - && (!hasNonEcBlockUsingStripedID || - !blocksMap.containsBlock(replica))) { -replicaID = BlockIdManager.convertToStripedID(replicaID); - } - - ReplicaState reportedState = replica.getState(); - - LOG.debug("Reported block {} on {} size {} replicaState = {}", - replica, dn, replica.getNumBytes(), reportedState); - - if (shouldPostponeBlocksFromFuture - && isGenStampInFuture(replica)) { -queueReportedBlock(storageInfo, replica, 
reportedState, - QUEUE_REASON_FUTURE_GENSTAMP); -continue; - } - - if (storageBlock == null && storageBlocksIterator.hasNext()) { -storageBlock = storageBlocksIterator.next(); - } - - do { -int cmp; -if (storageBlock == null || -(cmp = Long.compare(replicaID, storageBlock.getBlockId())) < 0) { - // Check if block is available in NN but not yet on this storage - BlockInfo nnBlock = blocksMap.getStoredBlock(new Block(replicaID)); - if (nnBlock != null) { -reportDiffSortedInner(storageInfo, replica, reportedState, - nnBlock, toAdd, toCorrupt, toUC); - } else { -// Replica not found anywhere so it should be invalidated -toInvalidate.add(new Block(replica)); - } - break; -} else if (cmp == 0) { - // Replica matched current storageblock - reportDiffSortedInner(storageInfo, replica, reportedState, -storageBlock, toAdd, toCorrupt, toUC); - storageBlock = null; -} else { - // replica has higher ID than storedBlock - // Remove all stored blocks with IDs lower than replica - do { -toRemove.add(storageBlock); -storageBlock = storageBlocksIterator.hasNext() - ? 
storageBlocksIterator.next() : null; - } while (storageBlock != null && - Long.compare(replicaID, storageBlock.getBlockId()) > 0); +Block delimiterBlock = new Block(); +BlockInfo delimiter = new BlockInfoContiguous(delimiterBlock, +(short) 1); +AddBlockResult result = storageInfo.addBlock(delimiter, delimiterBlock); +assert result == AddBlockResult.ADDED +: "Delimiting block cannot be present in the node"; +int headIndex = 0; //currently the delimiter is in the head of the list +int curIndex; + +if (newReport == null) { + newReport = BlockListAsLongs.EMPTY; +} +// scan the report and process newly reported blocks +for (BlockReportReplica iblk : newReport) { + ReplicaState iState = iblk.getState(); + LOG.debug("Reported block {} on {} size {} replicaState = {}", iblk, dn, + iblk.getNumBytes(), iState); + BlockInfo storedBlock = processReportedBlock(storageInfo, + iblk, iState, toAdd, toInvalidate, toCorrupt, toUC); + + // move block to the head of the list + if (storedBlock != null) { +curIndex = storedBlock.findStorageInfo(storageInfo); +if (curIndex >= 0) { + headIndex = + stor
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610208 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:27 Start Date: 14/Jun/21 07:27 Worklog Time Spent: 10m Work Description: base111 commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r650119510 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: Yes,They should use the same dncache. In addition, I want to extract NamesystemMetrics and NameNodeInfoMetrics into RBFMetrics. I don't think they should be serialized to StateStore and then de-serialized to be used by RBFMetrics. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610208) Time Spent: 1.5h (was: 1h 20m) > RBF: Some indicators of RBFMetrics count inaccurately > -- > > Key: HDFS-16039 > URL: https://issues.apache.org/jira/browse/HDFS-16039 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity > The current statistical algorithm accumulates the indicators of all NNs, > which leads to inaccurate counts. I think that for the same ClusterID we > only need to take the max once and then do the accumulation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
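A single shared, time-expiring DN-report cache — the direction both reviewers lean toward above — can be sketched with plain Java in place of Guava's `CacheBuilder`. Everything here (class name, `Supplier`-based loader, explicit clock parameter) is a hypothetical stand-in for illustration, not the RBFMetrics or RouterRpcServer code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class ExpiringCacheSketch<K, V> {
    // One immutable entry: the cached value and when it was loaded.
    private static final class Entry<V> {
        final V value;
        final long loadedAtMs;
        Entry(V value, long loadedAtMs) {
            this.value = value;
            this.loadedAtMs = loadedAtMs;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long expireMs;

    public ExpiringCacheSketch(long expireMs) {
        this.expireMs = expireMs;
    }

    /**
     * Return the cached value for key, reloading via the supplied loader
     * when the entry is missing or older than expireMs. The clock is an
     * explicit parameter so the behavior is easy to test deterministically.
     */
    public V get(K key, Supplier<V> loader, long nowMs) {
        Entry<V> e = cache.get(key);
        if (e == null || nowMs - e.loadedAtMs >= expireMs) {
            e = new Entry<>(loader.get(), nowMs);
            cache.put(key, e);
        }
        return e.value;
    }
}
```

Sharing one such cache between the metrics bean and the RPC server avoids fetching the (expensive) DN report twice within one expiry window, which is the review's point about "two caches of the same thing".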
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610252 ] ASF GitHub Bot logged work on HDFS-16057: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:32 Start Date: 14/Jun/21 07:32 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3084: URL: https://github.com/apache/hadoop/pull/3084#issuecomment-859387291 Merged. Thanks for your contribution, @tomscut. Thanks for your reviews, @jojochuang. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610252) Time Spent: 1h 40m (was: 1.5h) > Make sure the order for location in ENTERING_MAINTENANCE state > -- > > Key: HDFS-16057 > URL: https://issues.apache.org/jira/browse/HDFS-16057 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use a comparator to sort locations in getBlockLocations(), and the expected > result is: live -> stale -> entering_maintenance -> decommissioned. > But networktopology.sortByDistance() will disrupt that order. We should > also filter out nodes in state AdminStates.ENTERING_MAINTENANCE before > networktopology.sortByDistance(). 
> > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock() > {code:java} > DatanodeInfoWithStorage[] di = lb.getLocations(); > // Move decommissioned/stale datanodes to the bottom > Arrays.sort(di, comparator); > // Sort nodes by network distance only for located blocks > int lastActiveIndex = di.length - 1; > while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) { > --lastActiveIndex; > } > int activeLen = lastActiveIndex + 1; > if(nonDatanodeReader) { > networktopology.sortByDistanceUsingNetworkLocation(client, > lb.getLocations(), activeLen, createSecondaryNodeSorter()); > } else { > networktopology.sortByDistance(client, lb.getLocations(), activeLen, > createSecondaryNodeSorter()); > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
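The fix described above amounts to treating ENTERING_MAINTENANCE like DECOMMISSIONED when computing `activeLen`, so that the subsequent `sortByDistance()` only touches the live/stale prefix and cannot disturb the tail. A simplified, assumed sketch of that computation (not the actual DatanodeManager code; the enum stands in for the combination of liveness, staleness, and AdminStates):

```java
import java.util.Arrays;
import java.util.Comparator;

public class LocationOrderSketch {
    // Declaration order encodes the desired sort order:
    // live -> stale -> entering_maintenance -> decommissioned.
    public enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

    static int rank(State s) {
        return s.ordinal(); // lower ranks sort first
    }

    /**
     * Sort locations into the expected order, then return the length of
     * the prefix that sortByDistance() is allowed to reorder. Counting
     * ENTERING_MAINTENANCE as inactive (alongside DECOMMISSIONED) is
     * exactly the change this issue proposes.
     */
    public static int activeLength(State[] nodes) {
        Arrays.sort(nodes, Comparator.comparingInt(LocationOrderSketch::rank));
        int lastActive = nodes.length - 1;
        while (lastActive >= 0
            && (nodes[lastActive] == State.DECOMMISSIONED
                || nodes[lastActive] == State.ENTERING_MAINTENANCE)) {
            lastActive--;
        }
        return lastActive + 1;
    }
}
```

With the returned length passed as `activeLen`, the network-distance sort reorders only the live and stale replicas, leaving entering-maintenance and decommissioned nodes fixed at the end of the list.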
[jira] [Work logged] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?focusedWorklogId=610275&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610275 ] ASF GitHub Bot logged work on HDFS-16039: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:34 Start Date: 14/Jun/21 07:34 Worklog Time Spent: 10m Work Description: zhuxiangyi commented on a change in pull request #3086: URL: https://github.com/apache/hadoop/pull/3086#discussion_r649727373 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java ## @@ -165,11 +172,46 @@ public RBFMetrics(Router router) throws IOException { // Initialize the cache for the DN reports Configuration conf = router.getConfig(); -this.timeOut = conf.getTimeDuration(RBFConfigKeys.DN_REPORT_TIME_OUT, -RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); this.topTokenRealOwners = conf.getInt( RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY, RBFConfigKeys.DFS_ROUTER_METRICS_TOP_NUM_TOKEN_OWNERS_KEY_DEFAULT); +// Initialize the cache for the DN reports +this.dnReportTimeOut = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_TIME_OUT, +RBFConfigKeys.DN_REPORT_TIME_OUT_MS_DEFAULT, TimeUnit.MILLISECONDS); +long dnCacheExpire = conf.getTimeDuration( +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE, +RBFConfigKeys.DN_REPORT_CACHE_EXPIRE_MS_DEFAULT, TimeUnit.MILLISECONDS); +this.dnCache = CacheBuilder.newBuilder() Review comment: > RouterRpcServer has a similar cache, can we use that? Yes we can use it. NamesystemMetrics and NamenodeInfoMetrics will be stored in StateStore by NamenodeBeanMetrics. It does not need to be stored, right? Is it better for us to cache it in RBFMetrics. 
```java
private void updateJMXParameters(
    String address, NamenodeStatusReport report) {
  try {
    // TODO part of this should be moved to its own utility
    getFsNamesystemMetrics(address, report);
    getNamenodeInfoMetrics(address, report);
  } catch (Exception e) {
    LOG.error("Cannot get stat from {} using JMX", getNamenodeDesc(), e);
  }
}
```
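The review thread above is about memoizing the expensive datanode-report fetch behind an expiring cache (Guava's `CacheBuilder` with the `DN_REPORT_CACHE_EXPIRE` window). As a rough, dependency-free sketch of that expire-after-write idea — the class and field names here are illustrative, not Hadoop's actual implementation:

```java
import java.util.function.Supplier;

// Minimal pure-Java sketch of the expire-after-write behaviour the review
// proposes getting from Guava's CacheBuilder: the expensive loader (e.g. a
// JMX / getDatanodeReport fetch) runs at most once per expiry window, and
// other callers inside the window reuse the cached value.
class ExpiringSupplier<T> {
    private final Supplier<T> loader;
    private final long expireNanos;
    private T value;
    private long loadedAt = Long.MIN_VALUE;

    ExpiringSupplier(Supplier<T> loader, long expireNanos) {
        this.loader = loader;
        this.expireNanos = expireNanos;
    }

    synchronized T get() {
        long now = System.nanoTime();
        if (value == null || now - loadedAt > expireNanos) {
            value = loader.get();  // reload only when stale
            loadedAt = now;
        }
        return value;
    }
}

public class CacheSketch {
    static int calls = 0;

    public static void main(String[] args) {
        ExpiringSupplier<String> reports = new ExpiringSupplier<>(
            () -> { calls++; return "dn-report"; },
            java.util.concurrent.TimeUnit.SECONDS.toNanos(10));
        reports.get();
        reports.get();  // served from cache; loader is not re-invoked
        System.out.println(calls);  // 1
    }
}
```

The real patch keys the cache by report type and uses Guava's loading-cache machinery; this sketch only shows why a cache with an expiry window stops metric scrapes from hammering the Namenodes.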
[jira] [Work logged] (HDFS-16057) Make sure the order for location in ENTERING_MAINTENANCE state
[ https://issues.apache.org/jira/browse/HDFS-16057?focusedWorklogId=610304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610304 ]

ASF GitHub Bot logged work on HDFS-16057:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:37
Start Date: 14/Jun/21 07:37
Worklog Time Spent: 10m
Work Description: tasanuma merged pull request #3084:
URL: https://github.com/apache/hadoop/pull/3084

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 610304)
Time Spent: 1h 50m (was: 1h 40m)

> Make sure the order for location in ENTERING_MAINTENANCE state
> --
>
> Key: HDFS-16057
> URL: https://issues.apache.org/jira/browse/HDFS-16057
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: tomscut
> Assignee: tomscut
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> We use a comparator to sort locations in getBlockLocations(), and the expected
> order is: live -> stale -> entering_maintenance -> decommissioned.
> But networktopology.sortByDistance() will disrupt that order. We should
> also filter out nodes in state AdminStates.ENTERING_MAINTENANCE before
> calling networktopology.sortByDistance().
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager#sortLocatedBlock()
> {code:java}
> DatanodeInfoWithStorage[] di = lb.getLocations();
> // Move decommissioned/stale datanodes to the bottom
> Arrays.sort(di, comparator);
> // Sort nodes by network distance only for located blocks
> int lastActiveIndex = di.length - 1;
> while (lastActiveIndex > 0 && isInactive(di[lastActiveIndex])) {
>   --lastActiveIndex;
> }
> int activeLen = lastActiveIndex + 1;
> if (nonDatanodeReader) {
>   networktopology.sortByDistanceUsingNetworkLocation(client,
>       lb.getLocations(), activeLen, createSecondaryNodeSorter());
> } else {
>   networktopology.sortByDistance(client, lb.getLocations(), activeLen,
>       createSecondaryNodeSorter());
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
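The snippet quoted in the issue boils down to two phases: a state sort that pushes inactive replicas to the bottom, then a network-distance sort over only the leading "active" prefix. The bug was that ENTERING_MAINTENANCE nodes were not treated as inactive, so the distance sort could pull them ahead of live nodes. A self-contained sketch of that prefix computation, using plain stand-in types rather than the actual DatanodeInfoWithStorage API:

```java
import java.util.Arrays;
import java.util.Comparator;

// Stand-in for DatanodeInfoWithStorage: just a name plus an admin state,
// enough to illustrate the ordering problem from the issue.
enum State { LIVE, STALE, ENTERING_MAINTENANCE, DECOMMISSIONED }

class Node {
    final String name;
    final State state;
    Node(String name, State state) { this.name = name; this.state = state; }
}

public class SortSketch {
    // Expected order: live -> stale -> entering_maintenance -> decommissioned.
    static final Comparator<Node> BY_STATE =
        Comparator.comparingInt(n -> n.state.ordinal());

    // A node is "inactive" if it must stay at the bottom and be excluded
    // from the network-distance sort; the fix adds ENTERING_MAINTENANCE here.
    static boolean isInactive(Node n) {
        return n.state == State.DECOMMISSIONED
            || n.state == State.ENTERING_MAINTENANCE;
    }

    // Mirrors the loop in sortLocatedBlock(): after the state sort, find the
    // length of the leading prefix that is safe to re-order by distance.
    static int activeLength(Node[] nodes) {
        int last = nodes.length - 1;
        while (last > 0 && isInactive(nodes[last])) {
            last--;
        }
        return last + 1;
    }

    public static void main(String[] args) {
        Node[] di = {
            new Node("dn1", State.DECOMMISSIONED),
            new Node("dn2", State.LIVE),
            new Node("dn3", State.ENTERING_MAINTENANCE),
            new Node("dn4", State.STALE),
        };
        Arrays.sort(di, BY_STATE);
        int activeLen = activeLength(di);
        // Only di[0..activeLen) may be shuffled by the distance sort.
        System.out.println(activeLen);  // 2: dn2 (live) and dn4 (stale)
    }
}
```

If `isInactive` ignored ENTERING_MAINTENANCE, `activeLen` here would be 3 and the distance sort could move dn3 above the stale node — exactly the disruption the issue describes.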
[jira] [Updated] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16065: -- Labels: pull-request-available (was: ) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610341 ]

ASF GitHub Bot logged work on HDFS-16065:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:40
Start Date: 14/Jun/21 07:40
Worklog Time Spent: 10m
Work Description: symious opened a new pull request #3100:
URL: https://github.com/apache/hadoop/pull/3100

## What changes were proposed in this pull request?

Currently, the Router's operations are not well recorded. It would be good to have metrics similar to "Hadoop:service=NameNode,name=NameNodeActivity" for the NameNode, which show the count for each operation. Besides, some operations are invoked concurrently in Routers; knowing the counts for concurrent operations would help us better understand the cluster's state. This ticket adds normal operation metrics and concurrent operation metrics for the Router.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDFS-16065

## How was this patch tested?

Added unit tests for normal operations and concurrent operations.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 610341)
Remaining Estimate: 0h
Time Spent: 10m

> RBF: Add metrics to record Router's operations
> --
>
> Key: HDFS-16065
> URL: https://issues.apache.org/jira/browse/HDFS-16065
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Janus Chow
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, Router's operations are not well recorded. It would be good to
> have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for
> NameNode, which shows the count for each operations.
> Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
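The proposal above is NameNodeActivity-style per-operation counters for the Router. A minimal, thread-safe sketch of per-method counting — the name `incInvokedMethod` mirrors the patch discussion, but this class is illustrative, not the actual Hadoop metrics2 implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of the per-operation counting the ticket asks for:
// one counter per invoked RPC method name, safe under the Router's
// concurrent invocations (LongAdder avoids contention on hot methods).
public class OpMetricsSketch {
    private final Map<String, LongAdder> ops = new ConcurrentHashMap<>();

    // Would be called from the RPC client path, e.g. with method.getName().
    void incInvokedMethod(String method) {
        ops.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    long count(String method) {
        LongAdder a = ops.get(method);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        OpMetricsSketch m = new OpMetricsSketch();
        m.incInvokedMethod("getListing");
        m.incInvokedMethod("getListing");
        m.incInvokedMethod("getDatanodeReport");
        System.out.println(m.count("getListing"));  // 2
    }
}
```

The actual patch exposes these counts through the metrics2 framework (JMX beans), but the core bookkeeping is this simple map of counters, with a separate map for operations the Router fans out concurrently to multiple subclusters.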
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610349 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:41 Start Date: 14/Jun/21 07:41 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-859660072 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610349) Time Spent: 20m (was: 10m) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610356 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:42 Start Date: 14/Jun/21 07:42 Worklog Time Spent: 10m Work Description: symious commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-859595826 @goiri Could you have a look at this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610356) Time Spent: 0.5h (was: 20m) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. > Besides, some operations are invoked concurrently in Routers, know the counts > for concurrent operations would help us better knowing about the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16061) DFTestUtil.waitReplication can produce false positives
[ https://issues.apache.org/jira/browse/HDFS-16061?focusedWorklogId=610353&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610353 ]

ASF GitHub Bot logged work on HDFS-16061:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:42
Start Date: 14/Jun/21 07:42
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3095:
URL: https://github.com/apache/hadoop/pull/3095#issuecomment-859933114

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 34m 3s | | trunk passed |
| +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 52s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 4m 15s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 17s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 32s | | the patch passed |
| +1 :green_heart: | compile | 1m 39s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 39s | | the patch passed |
| +1 :green_heart: | compile | 1m 27s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 86 unchanged - 2 fixed = 86 total (was 88) |
| +1 :green_heart: | mvnsite | 1m 38s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 2s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 4m 30s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 33s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| -1 :x: | unit | 251m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3095/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | | 356m 0s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
| | hadoop.hdfs.TestDFSStripedInputStream |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3095/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3095 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux fa237f688cc1 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 4040ad64a4537d47e3633d23daaaefe7e9d7192a |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoo
[jira] [Work logged] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?focusedWorklogId=610398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610398 ]

ASF GitHub Bot logged work on HDFS-16043:
-
Author: ASF GitHub Bot
Created on: 14/Jun/21 07:47
Start Date: 14/Jun/21 07:47
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #3063:
URL: https://github.com/apache/hadoop/pull/3063#issuecomment-859555619

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 8 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 53s | | trunk passed |
| +1 :green_heart: | compile | 1m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 9s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 26s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 7s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 7s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 14s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 7s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 7s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 0s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 927 unchanged - 1 fixed = 931 total (was 928) |
| +1 :green_heart: | mvnsite | 1m 15s | | the patch passed |
| +1 :green_heart: | xml | 0m 2s | | The patch has no ill-formed XML file. |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 5s | | the patch passed |
| +1 :green_heart: | shadedclient | 16m 5s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| -1 :x: | unit | 240m 56s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 45s | | The patch does not generate ASF License warnings. |
| | | | 328m 26s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork |
| | hadoop.hdfs.TestBlocksScheduledCounter |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3063/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3063 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell xml |
| uname | Linux 7e43feeb60ba 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2db702b034d7b309083832c10347d793d0700f03 |
| Default Java | Private Buil
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610483 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 07:56 Start Date: 14/Jun/21 07:56 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r65027 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: Can you point me to the equivalent for the Namenode? I thought that the RouterRpcServer was already tracking most of these metrics. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRouterMetrics.java ## @@ -0,0 +1,121 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hdfs.server.federation.metrics; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.DFSConfigKeys; +import org.apache.hadoop.hdfs.HdfsConfiguration; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.IOException; + +import static org.apache.hadoop.test.MetricsAsserts.assertCounter; +import static org.apache.hadoop.test.MetricsAsserts.getMetrics; + +/** + * Test case for FilesInGetListingOps metric in Namenode + */ +public class TestRouterMetrics { + private static final Configuration CONF = new HdfsConfiguration(); + private static final String ROUTER_METRICS = "RouterActivity"; + static { +CONF.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 100); +CONF.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY, 1); +CONF.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); +CONF.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1); + } + + private static final int NUM_SUBCLUSTERS = 2; + private static final int NUM_DNS = 3; + + + /** Federated HDFS cluster. */ + private static MiniRouterDFSCluster cluster; + + /** Random Router for this federated cluster. */ + private MiniRouterDFSCluster.RouterContext router; + + /** Filesystem interface to the Router. */ + private FileSystem routerFS; + /** Filesystem interface to the Namenode. 
*/ + private FileSystem nnFS; + + @BeforeClass + public static void globalSetUp() throws Exception { +cluster = new MiniRouterDFSCluster(false, NUM_SUBCLUSTERS); +cluster.setNumDatanodesPerNameservice(NUM_DNS); +cluster.startCluster(); + +Configuration routerConf = new RouterConfigBuilder() +.metrics() +.rpc() +.build(); +cluster.addRouterOverrides(routerConf); +cluster.startRouters(); + +// Register and verify all NNs with all routers +cluster.registerNamenodes(); +cluster.waitNamenodeRegistration(); + + } + + @Before + public void testSetup() throws Exception { +// Create mock locations +cluster.installMockLocations(); + +// Delete all files via the NNs and verify +cluster.deleteAllFiles(); + +// Create test fixtures on NN +cluster.createTestDirectoriesNamenode(); + +// Wait to ensure NN has fully created its test directories +Thread.sleep(100); + +router = cluster.getRouters().get(0); +this.routerFS = router.getFileSystem(); + + } + + @AfterClass + public static void tearDown() throws Exception { +cluster.shutdown(); + } + + @Test + public void testGetListing() throws IOException { Review com
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610565 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 08:07 Start Date: 14/Jun/21 08:07 Worklog Time Spent: 10m Work Description: symious commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r650457056 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: @goiri Thanks for the review. A similar metrics for NameNode is "org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics". I think this RouterMetrics is trying to monitor from a different view of RouterRpcServer. IMHO, RouterRpcServer is monitoring Router as a Server, the new metrics would be treating Router as a Client. Some operations are not triggered by user requests but from Router itself. In our cluster, Router's async call queue was jammed, but we couldn't get the operations from the RouterRpcServer metrics. Then we found most of the operations are "getDatanodeReport" which are invoked by Router's FederationMetrics and not by users. ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/metrics/TestRouterMetrics.java ## @@ -0,0 +1,121 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. 
The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.hadoop.hdfs.server.federation.metrics; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.DFSConfigKeys; +import org.apache.hadoop.hdfs.HdfsConfiguration; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.junit.AfterClass; +import org.junit.Before; +import org.junit.BeforeClass; +import org.junit.Test; + +import java.io.IOException; + +import static org.apache.hadoop.test.MetricsAsserts.assertCounter; +import static org.apache.hadoop.test.MetricsAsserts.getMetrics; + +/** + * Test case for FilesInGetListingOps metric in Namenode + */ +public class TestRouterMetrics { + private static final Configuration CONF = new HdfsConfiguration(); + private static final String ROUTER_METRICS = "RouterActivity"; + static { +CONF.setLong(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 100); +CONF.setInt(DFSConfigKeys.DFS_BYTES_PER_CHECKSUM_KEY, 1); +CONF.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); +CONF.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1); + } + + private static final int NUM_SUBCLUSTERS = 2; + private static final int NUM_DNS = 3; + + + /** Federated HDFS cluster. 
*/ + private static MiniRouterDFSCluster cluster; + + /** Random Router for this federated cluster. */ + private MiniRouterDFSCluster.RouterContext router; + + /** Filesystem interface to the Router. */ + private FileSystem routerFS; + /** Filesystem interface to the Namenode. */ + private FileSystem nnFS; + + @BeforeClass + public static void globalSetUp() throws Exception { +cluster = new MiniRouterDFSCluster(false, NUM_SUBCLUSTERS); +cluster.setNumDatanodesPerNameservice(NUM_DNS); +cluster.startCluster(); + +Configuration routerConf = new RouterConfigBuilder() +.metrics() +.rpc() +.build(); +cluster.addRouterOverrides(routerConf); +cluster.startRouters(); + +// Register and verify all NNs with all routers +cluster.registerNamenodes(); +cluster.waitNamenodeRegistration(); + + } + + @Before + public void testSetup() throws Exception { +// Create mock locations +cluster.installMockLo
[jira] [Issue Comment Deleted] (HDFS-9126) namenode crash in fsimage download/transfer
[ https://issues.apache.org/jira/browse/HDFS-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seokchan Yoon updated HDFS-9126: Comment: was deleted (was: Why is this closed? I ran into the same situation and need to figure out the reason why the previous active NN failed on doHealthChecks.) > namenode crash in fsimage download/transfer > --- > > Key: HDFS-9126 > URL: https://issues.apache.org/jira/browse/HDFS-9126 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: OS:Centos 6.5(final) > Apache Hadoop:2.6.0 > namenode ha base 5 journalnodes >Reporter: zengyongping >Priority: Critical > > In our product Hadoop cluster,when active namenode begin download/transfer > fsimage from standby namenode.some times zkfc monitor health of NameNode > socket timeout,zkfs judge active namenode status SERVICE_NOT_RESPONDING > ,happen hadoop namenode ha failover,fence old active namenode. > zkfc logs: > 2015-09-24 11:44:44,739 WARN org.apache.hadoop.ha.HealthMonitor: > Transport-level exception trying to monitor health of NameNode at > hostname1/192.168.10.11:8020: Call From hostname1/192.168.10.11 to > hostname1:8020 failed on socket timeout exception: > java.net.SocketTimeoutException: 45000 millis timeout while waiting for > channel to be ready for read. 
ch : java.nio.channels.SocketChannel[connected > local=/192.168.10.11:22614 remote=hostname1/192.168.10.11:8020]; For more > details see: http://wiki.apache.org/hadoop/SocketTimeout > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.HealthMonitor: Entering > state SERVICE_NOT_RESPONDING > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: Local > service NameNode at hostname1/192.168.10.11:8020 entered state: > SERVICE_NOT_RESPONDING > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ZKFailoverController: > Quitting master election for NameNode at hostname1/192.168.10.11:8020 and > marking that fencing is necessary > 2015-09-24 11:44:44,740 INFO org.apache.hadoop.ha.ActiveStandbyElector: > Yielding from election > 2015-09-24 11:44:44,761 INFO org.apache.zookeeper.ZooKeeper: Session: > 0x54d81348fe503e3 closed > 2015-09-24 11:44:44,761 WARN org.apache.hadoop.ha.ActiveStandbyElector: > Ignoring stale result from old client with sessionId 0x54d81348fe503e3 > 2015-09-24 11:44:44,764 INFO org.apache.zookeeper.ClientCnxn: EventThread > shut down > namenode logs: > 2015-09-24 11:43:34,074 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from > 192.168.10.12 > 2015-09-24 11:43:34,074 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs > 2015-09-24 11:43:34,075 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment > 2317430129 > 2015-09-24 11:43:34,253 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: > 272988 Total time for transactions(ms): 5502 Number of transactions batched > in Syncs: 146274 Number of syncs: 32375 SyncTimes(ms): 274465 319599 > 2015-09-24 11:43:46,005 INFO > org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: > Rescanning after 3 milliseconds > 2015-09-24 11:44:21,054 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReplicationMonitor timed out blk_1185804191_112164210 > 2015-09-24 
11:44:36,076 INFO > org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits > file > /software/data/hadoop-data/hdfs/namenode/current/edits_inprogress_02317430129 > -> > /software/data/hadoop-data/hdfs/namenode/current/edits_02317430129-02317703116 > 2015-09-24 11:44:36,077 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at > 2317703117 > 2015-09-24 11:45:38,008 INFO > org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 > Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 > Number of syncs: 0 SyncTimes(ms): 0 61585 > 2015-09-24 11:45:38,009 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 222.88s > at 63510.29 KB/s > 2015-09-24 11:45:38,009 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Downloaded file > fsimage.ckpt_02317430128 size 14495092105 bytes. > 2015-09-24 11:45:38,416 WARN > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal > 192.168.10.13:8485 failed to write txns 2317703117-2317703117. Will try to > write to this JN again after the next log roll. > org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 44 is > less than the last promised epoch 45 > at > org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414) > at > org.apache.hadoop.hd
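The failure mode in these logs — the ZKFC health probe timing out while the NameNode is saturated by a large fsimage transfer — can be sketched as a toy model. Everything below is invented for illustration (the class name, the lock standing in for the NameNode's busy service path, the millisecond timeouts); the real HealthMonitor issues a monitorHealth RPC with a 45-second timeout rather than taking a lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class HealthCheckTimeoutDemo {
    static final ReentrantLock serviceLock = new ReentrantLock();

    // Bounded health probe: if the service cannot answer within the
    // timeout, report it as not responding (as ZKFC does).
    static String checkHealth(long timeoutMs) {
        try {
            if (serviceLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                try {
                    return "HEALTHY";
                } finally {
                    serviceLock.unlock();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "SERVICE_NOT_RESPONDING";
    }

    public static void main(String[] args) throws Exception {
        Thread transfer = new Thread(() -> {
            serviceLock.lock();  // a long "image transfer" monopolizes the service
            try {
                Thread.sleep(500);
            } catch (InterruptedException ignored) {
            } finally {
                serviceLock.unlock();
            }
        });
        transfer.start();
        Thread.sleep(50);                       // let the transfer take the lock first
        System.out.println(checkHealth(100));   // SERVICE_NOT_RESPONDING
        transfer.join();
        System.out.println(checkHealth(100));   // HEALTHY
    }
}
```

The point of the sketch: a health check with a fixed timeout cannot distinguish "dead" from "busy", which is exactly why a heavyweight checkpoint transfer can trigger a spurious failover and fencing.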
[jira] [Commented] (HDFS-15352) WebHdfsFileSystem does not log the exception that causes retries
[ https://issues.apache.org/jira/browse/HDFS-15352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362911#comment-17362911 ] Ayush Saxena commented on HDFS-15352: - can we keep the trace in debug mode? > WebHdfsFileSystem does not log the exception that causes retries > > > Key: HDFS-15352 > URL: https://issues.apache.org/jira/browse/HDFS-15352 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 3.3.1 > Environment: When the WebHdfsFileSystem performs retries, it swallows > up the original exception if retries are successful. This makes debugging the > source of latency spikes difficult. >Reporter: Simbarashe Dzinamarira >Assignee: Simbarashe Dzinamarira >Priority: Minor > Attachments: HDFS-15352.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
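The improvement discussed here — keeping the suppressed trace visible at debug level instead of swallowing it on a successful retry — can be sketched as a generic retry wrapper. This is not the actual WebHdfsFileSystem code; the Call interface, logger name, and UncheckedIOException wrapping are assumptions made to keep the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class RetryWithDebugLog {
    private static final Logger LOG = Logger.getLogger("webhdfs.retry");

    interface Call<T> { T run() throws IOException; }

    // Retry up to maxAttempts, but log each swallowed exception at debug
    // (FINE) level so the cause of latency spikes stays diagnosable.
    static <T> T runWithRetries(Call<T> call, int maxAttempts) {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                last = e;
                LOG.log(Level.FINE, "Attempt " + attempt + " failed, retrying", e);
            }
        }
        throw new UncheckedIOException("All " + maxAttempts + " attempts failed", last);
    }

    public static void main(String[] args) {
        int[] failuresLeft = {2};  // fail twice, then succeed
        String result = runWithRetries(() -> {
            if (failuresLeft[0]-- > 0) {
                throw new IOException("transient connect error");
            }
            return "ok";
        }, 5);
        System.out.println(result);  // ok
    }
}
```

With the logger at its default level the retries stay quiet, but enabling FINE/debug for the retry logger surfaces every transient failure without changing the success path.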
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610677&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610677 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 13:52 Start Date: 14/Jun/21 13:52 Worklog Time Spent: 10m Work Description: virajjasani commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-860701898 Thanks for the review @smengcl. I have addressed your concerns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610677) Time Spent: 4.5h (was: 4h 20m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610682 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 13:53 Start Date: 14/Jun/21 13:53 Worklog Time Spent: 10m Work Description: virajjasani commented on a change in pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#discussion_r650968368
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java
## @@ -1104,6 +1122,34 @@ private void sendLifeline() throws IOException { } }
+  class IBRTaskHandler implements Runnable {
+
+    @Override
+    public void run() {
+      LOG.info("Starting IBR Task Handler.");
+      while (shouldRun()) {
+        try {
+          final long startTime = scheduler.monotonicNow();
+          final boolean sendHeartbeat = scheduler.isHeartbeatDue(startTime);
+          if (!dn.areIBRDisabledForTests() &&
+              (ibrManager.sendImmediately() || sendHeartbeat)) {
+            synchronized (sendIBRLock) {
+              ibrManager.sendIBRs(bpNamenode, bpRegistration,
+                  bpos.getBlockPoolId(), getRpcMetricSuffix());
+            }
+          }
+          // There is no work to do; sleep until heartbeat timer elapses,
+          // or work arrives, and then iterate again.
+          ibrManager.waitTillNextIBR(scheduler.getHeartbeatWaitTime());
Review comment: > With IBR separated in a new thread, maybe later we could have a new config key that controls IBR interval separately, or add a configurable constant offset (from the FBR timer) to the IBR timer. This isn't something we need to add to this jira. Just a thought. I agree. We can add a new config or continue with IBR/FBR expiring around the same time for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610682) Time Spent: 4h 40m (was: 4.5h) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
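The waitTillNextIBR call quoted in the review — block until work arrives or the heartbeat-style timer elapses — follows the classic guarded-wait pattern. A minimal standalone sketch (the class and method names below are invented; the real IncrementalBlockReportManager tracks per-storage pending reports rather than a single flag):

```java
public class IbrWaitDemo {
    private final Object lock = new Object();
    private boolean workPending = false;

    // Block until new work arrives or the timeout elapses, mirroring the
    // "work or heartbeat timer" contract. The loop guards against
    // spurious wakeups.
    boolean waitForWork(long timeoutMs) {
        synchronized (lock) {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!workPending) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    break;
                }
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            boolean hadWork = workPending;
            workPending = false;
            return hadWork;
        }
    }

    void submitWork() {
        synchronized (lock) {
            workPending = true;
            lock.notifyAll();  // wake the waiting IBR thread immediately
        }
    }

    public static void main(String[] args) {
        IbrWaitDemo demo = new IbrWaitDemo();
        System.out.println(demo.waitForWork(100));  // false: timer elapsed, no work
        demo.submitWork();
        System.out.println(demo.waitForWork(5000)); // true: returns without waiting
    }
}
```

This shape is what lets the dedicated IBR thread react immediately to new incremental reports while still flushing on the heartbeat cadence when the queue is idle.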
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610744 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 15:51 Start Date: 14/Jun/21 15:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#issuecomment-860794285 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 52s | | trunk passed | | +1 :green_heart: | compile | 0m 46s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 40s | | trunk passed | | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 59s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 19s | | trunk passed | | +1 :green_heart: | shadedclient | 14m 22s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 34s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 0m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 22s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 573 new + 0 unchanged - 0 fixed = 573 total (was 0) | | +1 :green_heart: | mvnsite | 0m 33s | | the patch passed | | +1 :green_heart: | javadoc | 0m 32s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 18m 14s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 19s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. 
| | | | 98m 29s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3100 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 638e995dec2b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 00e68678f1ce2a0e21a640774d00bcd1859821d1 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3100/3/testReport/ | | Max. process+thread count | 2359 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/h
[jira] [Updated] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jim Brennan updated HDFS-15659: --- Fix Version/s: 3.3.2 > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
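The fix is conceptually a test-only default: the mini cluster seeds dfs.namenode.redundancy.considerLoad to false, while individual tests can still opt back in. A minimal sketch of that defaults-with-override pattern, using a plain map rather than Hadoop's Configuration class (TestClusterConf is an invented name):

```java
import java.util.HashMap;
import java.util.Map;

public class TestClusterConf {
    private final Map<String, String> conf = new HashMap<>();

    TestClusterConf() {
        // MiniDFSCluster-style default: disable load-based block placement
        // in tests, so one busy datanode in a tiny cluster cannot make
        // block allocation fail intermittently.
        conf.put("dfs.namenode.redundancy.considerLoad", "false");
    }

    void set(String key, String value) {
        conf.put(key, value);  // a specific test may re-enable the feature
    }

    boolean considerLoad() {
        // The shipped default in hdfs-default.xml is true; the test cluster
        // pre-seeds false, so the fallback here only matters if the seeded
        // entry were removed.
        return Boolean.parseBoolean(
            conf.getOrDefault("dfs.namenode.redundancy.considerLoad", "true"));
    }

    public static void main(String[] args) {
        TestClusterConf defaults = new TestClusterConf();
        System.out.println(defaults.considerLoad());  // false

        TestClusterConf optIn = new TestClusterConf();
        optIn.set("dfs.namenode.redundancy.considerLoad", "true");
        System.out.println(optIn.considerLoad());     // true
    }
}
```

The design point: flipping the default at the test-cluster layer leaves production behavior untouched while making every existing MiniDFSCluster test deterministic by default.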
[jira] [Work logged] (HDFS-16065) RBF: Add metrics to record Router's operations
[ https://issues.apache.org/jira/browse/HDFS-16065?focusedWorklogId=610758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610758 ] ASF GitHub Bot logged work on HDFS-16065: - Author: ASF GitHub Bot Created on: 14/Jun/21 16:26 Start Date: 14/Jun/21 16:26 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3100: URL: https://github.com/apache/hadoop/pull/3100#discussion_r651097583 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java ## @@ -471,6 +471,9 @@ private Object invokeMethod( if (this.rpcMonitor != null) { this.rpcMonitor.proxyOpComplete(true); } +if (this.router.getRouterMetrics() != null) { + this.router.getRouterMetrics().incInvokedMethod(method); Review comment: You are adding all these metrics raw to RouterMetrics. I'm wondering if we should have something that refers to this being metrics for the client. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610758) Time Spent: 1h 10m (was: 1h) > RBF: Add metrics to record Router's operations > -- > > Key: HDFS-16065 > URL: https://issues.apache.org/jira/browse/HDFS-16065 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: Janus Chow >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently, Router's operations are not well recorded. It would be good to > have a similar metrics as "Hadoop:service=NameNode,name=NameNodeActivity" for > NameNode, which shows the count for each operations. 
> Besides, some operations are invoked concurrently in Routers; knowing the counts > for concurrent operations would help us better understand the cluster's > state. > This ticket is to add normal operation metrics and concurrent operation > metrics for Router. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
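A per-method invocation counter of the kind discussed in this review is commonly built from a map of striped counters. The sketch below is an assumption about shape only — the class name RouterClientMetrics and its methods are invented here, and the actual patch presumably integrates with Hadoop's metrics2 framework rather than exposing a bare map:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

public class RouterClientMetrics {
    // One counter per invoked method name; LongAdder keeps the hot
    // increment path cheap under many concurrent RPC handler threads.
    private final ConcurrentMap<String, LongAdder> invoked = new ConcurrentHashMap<>();

    void incInvokedMethod(String method) {
        invoked.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    long getInvoked(String method) {
        LongAdder adder = invoked.get(method);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        RouterClientMetrics metrics = new RouterClientMetrics();
        metrics.incInvokedMethod("getBlockLocations");
        metrics.incInvokedMethod("getBlockLocations");
        metrics.incInvokedMethod("mkdirs");
        System.out.println(metrics.getInvoked("getBlockLocations"));  // 2
        System.out.println(metrics.getInvoked("mkdirs"));             // 1
    }
}
```

Keeping these counters in a client-scoped class (rather than dumping them raw into the general router metrics) is exactly the separation the review comment is asking about.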
[jira] [Commented] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363057#comment-17363057 ] Jim Brennan commented on HDFS-15659: [~ahussein] I cherry-picked this to branch-3.3, but there are merge conflicts when trying to pull back further. Please provide patches for earlier branches, if desired. > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Reporter: Akira Ajisaka >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing > many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it > for specific tests. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610792 ] ASF GitHub Bot logged work on HDFS-16055: - Author: ASF GitHub Bot Created on: 14/Jun/21 17:48 Start Date: 14/Jun/21 17:48 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #3078: URL: https://github.com/apache/hadoop/pull/3078#issuecomment-860872516 UT failures are unrelated now. Will merge shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610792) Time Spent: 1h 20m (was: 1h 10m) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?focusedWorklogId=610794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610794 ] ASF GitHub Bot logged work on HDFS-16055: - Author: ASF GitHub Bot Created on: 14/Jun/21 17:48 Start Date: 14/Jun/21 17:48 Worklog Time Spent: 10m Work Description: smengcl merged pull request #3078: URL: https://github.com/apache/hadoop/pull/3078 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610794) Time Spent: 1.5h (was: 1h 20m) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDFS-16055: -- Fix Version/s: (was: 3.3.2) 3.4.0 > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode
[ https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siyao Meng updated HDFS-16055: -- Fix Version/s: 3.3.2 Target Version/s: (was: 3.3.2) Resolution: Fixed Status: Resolved (was: Patch Available) > Quota is not preserved in snapshot INode > > > Key: HDFS-16055 > URL: https://issues.apache.org/jira/browse/HDFS-16055 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Quota feature is not preserved during snapshot creation, this causes > {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, > {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if > the quota is set before the snapshot creation: > {code:bash} > $ hdfs snapshotDiff /diffTest s0 . > Difference between snapshot s0 and current directory under directory > /diffTest: > M . > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
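The bug class behind HDFS-16055 — a snapshot copy that silently drops an attribute, so the metadata comparison against the current directory can never come out equal and snapshotDiff keeps flagging the root as modified — can be shown with a two-field model. DirMeta and metadataEquals below are simplifications invented for illustration, not the real INodeDirectory code:

```java
public class SnapshotQuotaDemo {
    static class DirMeta {
        final String owner;
        final long nsQuota;  // namespace quota; -1 means "no quota set"

        DirMeta(String owner, long nsQuota) {
            this.owner = owner;
            this.nsQuota = nsQuota;
        }
    }

    // Every persisted attribute must participate in the comparison.
    static boolean metadataEquals(DirMeta a, DirMeta b) {
        return a.owner.equals(b.owner) && a.nsQuota == b.nsQuota;
    }

    public static void main(String[] args) {
        DirMeta current = new DirMeta("hdfs", 10000);  // quota set before snapshot

        // Buggy snapshot copy: the quota is dropped at snapshot creation, so
        // the directory always compares as "modified" even when untouched.
        DirMeta lossyCopy = new DirMeta(current.owner, -1);
        System.out.println(metadataEquals(current, lossyCopy));     // false

        // Fixed snapshot copy: quota preserved, unchanged dirs compare equal.
        DirMeta faithfulCopy = new DirMeta(current.owner, current.nsQuota);
        System.out.println(metadataEquals(current, faithfulCopy));  // true
    }
}
```

The takeaway mirrors the fix: whatever the snapshot INode copy preserves must be exactly the set of fields the diff comparison reads, or the diff output is permanently wrong.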
[jira] [Reopened] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15150: -- Thanks [~sodonnell] and [~weichiu] for introducing this optimization. I am opening the issue in order to submit patches backporting the changes to branches 2.10 - 3.x > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > One place I think a read lock would benefit us significantly is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. 
> I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
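The core of the proposal above is swapping the DataNode's exclusive ReentrantLock for a ReentrantReadWriteLock, so that concurrent block reads no longer serialize on each other. The standalone sketch below (invented names, not DataNode code) demonstrates the property that matters: several readers can hold the read lock simultaneously, which would deadlock this test under a plain exclusive lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DatanodeLockDemo {

    // Returns true if all readers held the read lock at the same time;
    // with an exclusive lock this can never happen.
    static boolean readersOverlap(int readers) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch allInside = new CountDownLatch(readers);
        AtomicBoolean overlapped = new AtomicBoolean(false);
        Thread[] threads = new Thread[readers];
        for (int i = 0; i < readers; i++) {
            threads[i] = new Thread(() -> {
                lock.readLock().lock();   // shared: many readers may enter
                try {
                    allInside.countDown();
                    // Every reader reaching this point before any releases
                    // proves the read lock was held concurrently.
                    if (allInside.await(2, TimeUnit.SECONDS)) {
                        overlapped.set(true);
                    }
                } catch (InterruptedException ignored) {
                } finally {
                    lock.readLock().unlock();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException ignored) {
            }
        }
        return overlapped.get();
    }

    public static void main(String[] args) {
        System.out.println(readersOverlap(4));  // true
    }
}
```

This also illustrates why the "conservative first step" in the description is safe: routing every caller through writeLock() reproduces the old exclusive behavior exactly, and read paths can then migrate to readLock() one at a time.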
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: HDFS-1515-branch-2.10.001.patch > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: (was: HDFS-1515-branch-2.10.001.patch) > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150.001.patch, HDFS-15150.002.patch, > HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Attachment: HDFS-15150-branch-2.10.001.patch > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150-branch-2.10.001.patch, HDFS-15150.001.patch, > HDFS-15150.002.patch, HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of > contention some time ago, but that Jira went in a direction of creating a new > FSDataset implementation which is very risky, and activity on the Jira has > stalled for a few years now. Edit: Looks like HDFS-9668 eventually went in a > similar direction to what I was thinking, so I will review that Jira in more > detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWrite > lock within the DN. The current implementation is simply a ReentrantLock so > any locker blocks all others. > Once place I think a read lock would benefit us significantly, is when the DN > is serving a lot of small blocks and there are jobs which perform a lot of > reads. The start of reading any blocks right now takes the lock, but if we > moved this to a read lock, many reads could do this at the same time. > The first conservative step, would be to change the current lock and then > make all accesses to it obtain the write lock. That way, we should keep the > current behaviour and then we can selectively move some lock accesses to the > readlock in separate Jiras. > I would appreciate any thoughts on this, and also if anyone has attempted it > before and found any blockers. 
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610846 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 19:03 Start Date: 14/Jun/21 19:03 Worklog Time Spent: 10m Work Description: smengcl commented on a change in pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#discussion_r651200801

## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeReport.java

@@ -172,8 +172,20 @@ public void testDatanodeReportMissingBlock() throws Exception {
       // all bad datanodes
     }
     cluster.triggerHeartbeats(); // IBR delete ack
-    lb = fs.getClient().getLocatedBlocks(p.toString(), 0).get(0);
-    assertEquals(0, lb.getLocations().length);
+    int retries = 0;
+    while (true) {
+      lb = fs.getClient().getLocatedBlocks(p.toString(), 0).get(0);
+      if (0 != lb.getLocations().length) {
+        retries++;
+        if (retries > 7) {
+          Assert.fail("getLocatedBlocks failed after 7 retries");
+          break;

Review comment:
```suggestion
```
nit: `break` is unnecessary now. `Assert.fail` will throw `AssertionError`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610846) Time Spent: 4h 50m (was: 4h 40m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: JiangHua Zhu > Assignee: Viraj Jasani > Priority: Minor > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things: FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and FBR. 
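The retry idiom under review here, with the unreachable `break` dropped, generalizes to a small helper. This is a hypothetical sketch (invented names, JDK only), not the test code itself: since the failing branch throws, control never falls through, so no `break` is needed.

```java
import java.util.function.IntSupplier;

// Poll a condition up to maxRetries times; on exhaustion, throw.
// The throwing call never returns normally, so nothing follows it.
public class RetrySketch {
    public static int retryUntil(IntSupplier probe, int expected, int maxRetries) {
        int retries = 0;
        while (true) {
            if (probe.getAsInt() == expected) {
                return retries; // condition met; report how many retries it took
            }
            retries++;
            if (retries > maxRetries) {
                // Like Assert.fail: throws, so no `break` is needed after it.
                throw new AssertionError("condition not met after " + maxRetries + " retries");
            }
        }
    }

    public static void main(String[] args) {
        int[] counter = {0};
        // Probe returns 1, 2, 3, ... -- succeeds once it reaches 3.
        System.out.println("retries=" + retryUntil(() -> ++counter[0], 3, 7));
    }
}
```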
[jira] [Updated] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15150: - Status: Patch Available (was: Reopened) > Introduce read write lock to Datanode > - > > Key: HDFS-15150 > URL: https://issues.apache.org/jira/browse/HDFS-15150 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 3.3.0 > Reporter: Stephen O'Donnell > Assignee: Stephen O'Donnell > Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-15150-branch-2.10.001.patch, HDFS-15150.001.patch, HDFS-15150.002.patch, HDFS-15150.003.patch > > > HDFS-9668 pointed out the issues around the DN lock being a point of contention some time ago, but that Jira went in the direction of creating a new FSDataset implementation, which is very risky, and activity on that Jira has stalled for a few years now. Edit: It looks like HDFS-9668 eventually went in a similar direction to what I was thinking, so I will review that Jira in more detail to see if this one is necessary. > I feel there could be significant gains by moving to a ReentrantReadWriteLock within the DN. The current implementation is simply a ReentrantLock, so any locker blocks all others. > One place I think a read lock would benefit us significantly is when the DN is serving a lot of small blocks and there are jobs which perform a lot of reads. Starting to read any block right now takes the lock, but if we moved this to a read lock, many reads could proceed at the same time. > The first conservative step would be to change the current lock and make all accesses to it obtain the write lock. That way, we keep the current behaviour, and we can then selectively move some lock accesses to the read lock in separate Jiras. > I would appreciate any thoughts on this, and also whether anyone has attempted this before and found any blockers. 
[jira] [Reopened] (HDFS-15659) Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15659: -- Reopening the issue to submit patches for earlier branches 2.10-3.x > Set dfs.namenode.redundancy.considerLoad to false in MiniDFSCluster > --- > > Key: HDFS-15659 > URL: https://issues.apache.org/jira/browse/HDFS-15659 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test > Reporter: Akira Ajisaka > Assignee: Ahmed Hussein > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > dfs.namenode.redundancy.considerLoad is true by default and it is causing many test failures. Let's disable it in MiniDFSCluster. > Originally reported by [~weichiu]: > https://github.com/apache/hadoop/pull/2410#pullrequestreview-51612 > {quote} > I've certainly seen this option causing test failures in the past. > Maybe we should turn it off by default in MiniDFSCluster, and only enable it for specific tests. > {quote}
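For reference, the property named in the title can be pinned in a test resource file. This fragment is only a sketch of such an override (the property name comes from the issue title; the file placement is an assumption), not the committed change:

```xml
<!-- Sketch: in a test hdfs-site.xml, disable load-aware block placement
     so MiniDFSCluster tests are not skewed by DataNode load; an individual
     test that needs the behaviour can set the value back to true. -->
<property>
  <name>dfs.namenode.redundancy.considerLoad</name>
  <value>false</value>
</property>
```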
[jira] [Updated] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renukaprasad C updated HDFS-14575: -- Attachment: HDFS-14575.003.patch > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 3.1.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, HDFS-14575.003.patch > > > Currently a LeaseRenewer (and its daemon thread) with no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition may happen when a new request comes in just after the LeaseRenewer has expired. > Reproducing this race condition: > # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1 thread; after a few seconds, File#1 is closed, so there are no clients in LeaseRenewer#1 now. > # 60 seconds (the grace period) later, LeaseRenewer#1 expires, but the Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the creation of Daemon#2. > # Daemon#1 wakes up and then exits; after that, LeaseRenewer#1 is removed from the factory. > # File#2 is closed after a few seconds; LeaseRenewer#2 is created, since a renewer can no longer be obtained from the factory. > The Daemon#2 thread leaks from then on, since Client#1 inside it can never be removed and it never gets a chance to stop. > To solve this problem, IIUC, a simple way would be to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks!
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363180#comment-17363180 ] Renukaprasad C commented on HDFS-14575: --- [~Tao Yang] [~weichiu] [~hexiaoqiao] [~hemanthboyina] [~brahma] Changes done as [~weichiu] suggested; uploaded HDFS-14575.003.patch. Can you please take a look when you get time? With the changes I ran the test case in a loop, and on a cluster I verified basic read/write operations and the SGL tool with 1K files. > LeaseRenewer#daemon threads leak in DFSClient > - > > Key: HDFS-14575 > URL: https://issues.apache.org/jira/browse/HDFS-14575 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 3.1.0 > Reporter: Tao Yang > Assignee: Tao Yang > Priority: Major > Attachments: HDFS-14575.001.patch, HDFS-14575.002.patch, HDFS-14575.003.patch > > > Currently a LeaseRenewer (and its daemon thread) with no clients should be terminated after a grace period, which defaults to 60 seconds. A race condition may happen when a new request comes in just after the LeaseRenewer has expired. > Reproducing this race condition: > # Client#1 creates File#1: this creates LeaseRenewer#1 and starts the Daemon#1 thread; after a few seconds, File#1 is closed, so there are no clients in LeaseRenewer#1 now. > # 60 seconds (the grace period) later, LeaseRenewer#1 expires, but the Daemon#1 thread is still asleep; Client#1 creates File#2, leading to the creation of Daemon#2. > # Daemon#1 wakes up and then exits; after that, LeaseRenewer#1 is removed from the factory. > # File#2 is closed after a few seconds; LeaseRenewer#2 is created, since a renewer can no longer be obtained from the factory. > The Daemon#2 thread leaks from then on, since Client#1 inside it can never be removed and it never gets a chance to stop. > To solve this problem, IIUC, a simple way would be to make sure that all clients are cleared when a LeaseRenewer is removed from the factory. Please feel free to give your suggestions. Thanks! 
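The fix direction suggested at the end of the description (clear all clients when a renewer is removed from the factory) can be sketched with a toy factory. All names here are hypothetical; this is not the DFSClient code, just an illustration of the invariant being proposed:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy renewer factory: removal after the grace period clears the client
// list under the factory lock, so a client that raced in just before
// removal is handed back and can re-register against a fresh renewer,
// instead of staying bound to a dead daemon thread forever.
public class RenewerFactorySketch {
    private final Map<String, List<String>> renewers = new HashMap<>();

    public synchronized void addClient(String renewerKey, String client) {
        renewers.computeIfAbsent(renewerKey, k -> new ArrayList<>()).add(client);
    }

    // Called when the grace period elapses; returns any clients that must
    // be re-registered so none leak with the removed renewer.
    public synchronized List<String> removeExpired(String renewerKey) {
        List<String> clients = renewers.remove(renewerKey);
        if (clients == null) {
            return Collections.emptyList();
        }
        List<String> orphaned = new ArrayList<>(clients);
        clients.clear(); // invariant: a removed renewer holds no clients
        return orphaned;
    }
}
```

Usage: on expiry the caller invokes `removeExpired` and re-adds any returned clients under a new key, so the old daemon can exit cleanly.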
[jira] [Created] (HDFS-16066) Enhance NNThroughputBenchmark functionality
Renukaprasad C created HDFS-16066: - Summary: Enhance NNThroughputBenchmark functionality Key: HDFS-16066 URL: https://issues.apache.org/jira/browse/HDFS-16066 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Renukaprasad C -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16067) Support Append API in NNThroughputBenchmark
Renukaprasad C created HDFS-16067: - Summary: Support Append API in NNThroughputBenchmark Key: HDFS-16067 URL: https://issues.apache.org/jira/browse/HDFS-16067 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Renukaprasad C Assignee: Renukaprasad C Append API needs to be added into NNThroughputBenchmark tool. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14575) LeaseRenewer#daemon threads leak in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-14575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363206#comment-17363206 ] Hadoop QA commented on HDFS-14575: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 49s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 43s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 14m 32s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 18m 20s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 28s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 22s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/622/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 2 new + 68 unchanged - 0 fixed = 70 total (was 68) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 37s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s{color} | {color:green}{
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610942&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610942 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 21:22 Start Date: 14/Jun/21 21:22 Worklog Time Spent: 10m Work Description: smengcl commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-861004273 Thanks @virajjasani for patch. Will merge shortly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610942) Time Spent: 5h (was: 4h 50m) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
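The proposal in this issue (handle IBR on its own thread so heartbeats and FBR are not delayed) can be sketched with a queue and a single-thread executor. This is a hedged illustration with invented names, not BPServiceActor code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Incremental block reports (IBR) are queued by the main service loop and
// drained by a dedicated daemon thread, so a slow IBR send cannot delay
// the heartbeat/FBR loop that enqueued it.
public class IbrOffloadSketch {
    private final BlockingQueue<String> ibrQueue = new LinkedBlockingQueue<>();
    private final ExecutorService ibrSender = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "ibr-sender");
        t.setDaemon(true);
        return t;
    });

    // Called by the main loop: enqueue is cheap and never blocks on the RPC.
    public void enqueue(String report) {
        ibrQueue.add(report);
    }

    // Drain everything currently queued on the dedicated thread and report
    // how many IBRs were "sent" (a stand-in for the real RPC).
    public int drainAndCount() {
        Future<Integer> f = ibrSender.submit(() -> {
            int sent = 0;
            String report;
            while ((report = ibrQueue.poll(10, TimeUnit.MILLISECONDS)) != null) {
                sent++; // stand-in for sending one incremental block report
            }
            return sent;
        });
        try {
            return f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            ibrSender.shutdown();
        }
    }
}
```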
[jira] [Work started] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15618 started by Ahmed Hussein. > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
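The latency pattern described here (a long bounded join on each scanner thread, serialized across threads) can be illustrated with plain JDK threads. A hypothetical sketch, not the BlockScanner code: the point is that interrupting and joining with a short bound keeps shutdown fast, and daemon threads need not be joined at all for the process to exit.

```java
// Simulate shutting down N long-running "scanner" threads: interrupt each
// one and join with a short bound instead of a 5-minute wait per thread.
public class ShutdownSketch {
    public static long shutdownMillis(int scanners, long joinTimeoutMs) {
        long start = System.nanoTime();
        Thread[] ts = new Thread[scanners];
        for (int i = 0; i < scanners; i++) {
            ts[i] = new Thread(() -> {
                // Stand-in for a VolumeScanner loop: sleeps until interrupted.
                try { Thread.sleep(300_000); } catch (InterruptedException ignored) { }
            });
            ts[i].setDaemon(true); // daemon: safe to abandon at process exit
            ts[i].start();
        }
        for (Thread t : ts) {
            t.interrupt(); // ask the scanner to stop
            try {
                t.join(joinTimeoutMs); // bounded wait, not 5 minutes each
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("shutdown took " + shutdownMillis(4, 1000) + " ms");
    }
}
```

With a 5-minute join per scanner, four scanners could stall shutdown for up to 20 minutes; the bounded variant finishes as soon as the interrupted threads exit.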
[jira] [Reopened] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein reopened HDFS-15618: -- Reopening to submit a patch for branch-2.10 > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
[jira] [Updated] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmed Hussein updated HDFS-15618: - Attachment: HDFS-15618-branch-2.10.001.patch Status: Patch Available (was: In Progress) > Improve datanode shutdown latency > - > > Key: HDFS-15618 > URL: https://issues.apache.org/jira/browse/HDFS-15618 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode > Reporter: Ahmed Hussein > Assignee: Ahmed Hussein > Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3, 3.2.2 > > Attachments: HDFS-15618-branch-2.10.001.patch, HDFS-15618-branch-3.3.004.patch, HDFS-15618.001.patch, HDFS-15618.002.patch, HDFS-15618.003.patch, HDFS-15618.004.patch > > > Datanode shutdown has very high latency: the block scanner waits up to 5 minutes to join each VolumeScanner thread. > Since the scanners are daemon threads and do not alter block content, it is safe to skip these waits on Datanode shutdown.
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=610945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-610945 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 14/Jun/21 21:24 Start Date: 14/Jun/21 21:24 Worklog Time Spent: 10m Work Description: smengcl merged pull request #2998: URL: https://github.com/apache/hadoop/pull/2998 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 610945) Time Spent: 5h 10m (was: 5h) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > > Now BPServiceActor#offerService() is doing many things, FBR, IBR, heartbeat. > We can handle IBR independently to improve the performance of heartbeat and > FBR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15150) Introduce read write lock to Datanode
[ https://issues.apache.org/jira/browse/HDFS-15150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363289#comment-17363289 ] Hadoop QA commented on HDFS-15150: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 21m 37s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 7 new or modified test files. 
{color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 18s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 51s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 58s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 9s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 6s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 24s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 33s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 54s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 14m 37s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 2m 2s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/621/artifact/out/branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html{color} | {color:red} hadoop-common-project/hadoop-common in branch-2.10 has 2 extant spotbugs warnings. {color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 2m 36s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/621/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2.10 has 1 extant spotbugs warnings. {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 20s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 15s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 11m 15s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite 
{color} | {color:green} 2m 19s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} |
[jira] [Commented] (HDFS-15618) Improve datanode shutdown latency
[ https://issues.apache.org/jira/browse/HDFS-15618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363293#comment-17363293 ] Hadoop QA commented on HDFS-15618: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 41s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 2 new or modified test files. 
{color} | || || || || {color:brown} branch-2.10 Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 18s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} branch-2.10 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green}{color} | {color:green} branch-2.10 passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 7m 27s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:red}-1{color} | {color:red} spotbugs {color} | {color:red} 3m 6s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/623/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2.10 has 1 extant spotbugs warnings. 
{color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~16.04.1-b10 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 57s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s{color} | {color:green}{color} | {color:green} the patch passed with JDK Azul Systems, Inc.-1.7.0_262-b10 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 50s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Bu
[jira] [Work logged] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?focusedWorklogId=611089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611089 ] ASF GitHub Bot logged work on HDFS-13671: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:12 Start Date: 15/Jun/21 01:12 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #3065: URL: https://github.com/apache/hadoop/pull/3065#issuecomment-861097055 @xiaoyuyao Thanks for review! Is @AlphaGouGe 's fix OK? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611089) Time Spent: 4h 50m (was: 4h 40m) > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 50m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. 
The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current 
deletion logic in NameNode, there are mainly two steps: > * Collect the INodes and all blocks to be deleted, then delete the INodes. > * Remove the blocks chunk by chunk in a loop. > Intuitively, the first step should be the more expensive operation and take more time. However, we always see the NN hang during the remove-block operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get better performance when dealing with FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, since it takes additional time to rebalance tree nodes. When there are a large number of blocks to be removed/deleted, it looks bad. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide {{getBlockIterator}} to return a block iterator, with no other get operation for a specified block. Do we still need to use {{FoldedTreeSet}} in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not updates. Maybe we can revert this to the earlier implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) -
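[Editor's note] The two-step deletion described above can be modeled with a short sketch. This is Python purely for illustration, not Hadoop code; the names `collect_blocks`, `blocks_map`, and the chunk size are hypothetical stand-ins. The point it shows is that step 2 performs one removal per block, so the per-remove cost of the underlying structure (O(1) for a hash map versus O(log n) plus rebalancing for a tree like FoldedTreeSet) dominates when millions of blocks are deleted:

```python
# Illustrative model of the NameNode's two-step delete (names are hypothetical).
# Step 1: collect the INodes and every block they reference.
# Step 2: remove the collected blocks chunk by chunk; in the real NameNode the
# namesystem write lock is released between chunks so other RPCs can make progress.

def collect_blocks(inode_tree, path):
    """Step 1: gather all block ids under `path` (hypothetical helper)."""
    return [b for p, blocks in inode_tree.items() if p.startswith(path)
            for b in blocks]

def remove_blocks_chunked(blocks_map, block_ids, chunk_size=1000):
    """Step 2: total cost is len(block_ids) * cost of one remove."""
    for i in range(0, len(block_ids), chunk_size):
        for block_id in block_ids[i:i + chunk_size]:
            # O(1) here for a dict; a balanced tree pays O(log n) plus
            # rebalancing on every removal, which is the slowdown described above.
            blocks_map.pop(block_id, None)

if __name__ == "__main__":
    inode_tree = {"/big/dir/f1": [1, 2], "/big/dir/f2": [3], "/other": [4]}
    blocks_map = {1: "blk", 2: "blk", 3: "blk", 4: "blk"}
    doomed = collect_blocks(inode_tree, "/big/dir")
    remove_blocks_chunked(blocks_map, doomed, chunk_size=2)
    assert set(blocks_map) == {4}
```

The chunking itself does not reduce total work; it only bounds how long the lock is held at a stretch, which is why a slower per-remove structure still shows up as a long overall hang.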
[jira] [Created] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
Takanobu Asanuma created HDFS-16068: --- Summary: WebHdfsFileSystem has a possible connection leak in connection with HttpFS Key: HDFS-16068 URL: https://issues.apache.org/jira/browse/HDFS-16068 Project: Hadoop HDFS Issue Type: Bug Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma When we use WebHdfsFileSystem for HttpFS, some connections remain for a while after the filesystems are closed, until GC runs. After investigating it for a while, I found that there is a potential connection leak in WebHdfsFileSystem.
{code:java}
// Close both the InputStream and the connection.
@VisibleForTesting
void closeInputStream(RunnerState rs) throws IOException {
  if (in != null) {
    IOUtils.close(cachedConnection);
    in = null;
  }
  cachedConnection = null;
  runnerState = rs;
}
{code}
In the above code, if the variable {{in}} is null and {{cachedConnection}} is not null, {{cachedConnection}} is never closed and the connection remains. I think this is the cause of our problem.
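[Editor's note] The leaky branch is easier to see in a runnable model. Below is a minimal Python sketch of the same logic (the real code is Java in WebHdfsFileSystem; the class and field names here are simplified stand-ins, and the "fixed" variant is one possible fix along the lines the report suggests, not necessarily the patch that was merged). The buggy version only touches the connection when a stream exists, so a cached connection with no open stream is dropped without being closed:

```python
# Minimal model of the closeInputStream logic (names simplified; not Hadoop code).

class FakeConnection:
    """Stands in for the cached HttpURLConnection; records whether close ran."""
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

class Runner:
    def __init__(self, cached_connection, in_stream):
        self.cached_connection = cached_connection
        self.in_stream = in_stream  # stands in for the Java field `in`

    def close_input_stream_buggy(self):
        # Mirrors the quoted Java: the connection is closed only when a stream exists.
        if self.in_stream is not None:
            self.cached_connection.close()
            self.in_stream = None
        self.cached_connection = None  # leak: reference dropped, never closed

    def close_input_stream_fixed(self):
        # Close the cached connection whether or not a stream is open.
        if self.cached_connection is not None:
            self.cached_connection.close()
        self.in_stream = None
        self.cached_connection = None

conn = FakeConnection()
Runner(conn, in_stream=None).close_input_stream_buggy()
print(conn.closed)   # False: the connection leaked

conn = FakeConnection()
Runner(conn, in_stream=None).close_input_stream_fixed()
print(conn.closed)   # True: the fix closes it
```

Until GC finalizes the abandoned connection object, the underlying socket stays open, which matches the observed "connections remain until GC runs" symptom.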
[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363306#comment-17363306 ] Hui Fei commented on HDFS-13671: [~huanghaibin] Thanks for sharing this. [~kihwal] Thanks, do you still have time to review this? Thanks to [~aajisaka] and [~xyao] for the review. If there are no other comments, I want to merge it this week. > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png > > Time Spent: 4h 50m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current deletion logic in NameNode, there are mainly two steps: > * Collect the INodes and all blocks to be deleted, then delete the INodes. > * Remove the blocks chunk by chunk in a loop. > Intuitively, the first step should be the more expensive operation and take more time. However, we always see the NN hang during the remove-block operation. > Looking into this: we introduced a new structure, {{FoldedTreeSet}}, to get better performance when dealing with FBRs/IBRs. But compared with the earlier implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, since it takes additional time to rebalance tree nodes. When there are a large number of blocks to be removed/deleted, it looks bad. > For the get-type operations in {{DatanodeStorageInfo}}, we only provide {{getBlockIterator}} to return a block iterator, with no other get operation for a specified block. Do we still need to use {{FoldedTreeSet}} in {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits gets, not updates. Maybe we can revert this to the earlier implementation. 
[jira] [Updated] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16068: -- Labels: pull-request-available (was: ) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611092 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:31 Start Date: 15/Jun/21 01:31 Worklog Time Spent: 10m Work Description: tasanuma opened a new pull request #3104: URL: https://github.com/apache/hadoop/pull/3104 JIRA: HDFS-16068 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611092) Remaining Estimate: 0h Time Spent: 10m > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611093&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611093 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:31 Start Date: 15/Jun/21 01:31 Worklog Time Spent: 10m Work Description: tasanuma commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861103245 Writing the unit test is not easy because `cachedConnection` is private. But the fix is clear and safe. So I don't think the unit test is required. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611093) Time Spent: 20m (was: 10m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Updated] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-16068: Status: Patch Available (was: Open) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611096 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 01:50 Start Date: 15/Jun/21 01:50 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861108603 LGTM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611096) Time Spent: 0.5h (was: 20m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > When we use WebHdfsFileSystem for HttpFS, some connections remain for a while > after the filesystems are closed until GC runs. After investigating it for a > while, I found that there is a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if the variable of {{in}} is null and {{cachedConnection}} > is not null, {{cachedConnection}} doesn't close and the connection remains. I > think this is the cause of our problem. 
[jira] [Work logged] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?focusedWorklogId=611108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611108 ] ASF GitHub Bot logged work on HDFS-16016: - Author: ASF GitHub Bot Created on: 15/Jun/21 02:32 Start Date: 15/Jun/21 02:32 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2998: URL: https://github.com/apache/hadoop/pull/2998#issuecomment-861123365 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 50s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 30m 48s | | trunk passed |
| +1 :green_heart: | compile | 1m 24s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 1s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 25s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 29s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 12s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 9s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 1m 14s | | the patch passed |
| +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 1m 13s | | the patch passed |
| +1 :green_heart: | compile | 1m 8s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 1m 8s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 53s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 14s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 21s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 3m 7s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 56s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| -1 :x: | unit | 355m 13s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/25/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 47s | | The patch does not generate ASF License warnings. |
| | | 439m 29s | | |

| Reason | Tests |
|---:|:--|
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor |
| | hadoop.hdfs.server.namenode.ha.TestBootstrapStandby |
| | hadoop.hdfs.server.namenode.TestDecommissioningStatus |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2998/25/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2998 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 09ad53fd21e2 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0503dfdf06dbf1820aac54130e4d4f86854d5040 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=64&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-64 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 02:51 Start Date: 15/Jun/21 02:51 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861129182 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 41s | | Docker mode activated. |

_ Prechecks _

| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |

_ trunk Compile Tests _

| +1 :green_heart: | mvninstall | 31m 13s | | trunk passed |
| +1 :green_heart: | compile | 1m 1s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 0m 55s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 59s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 27s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 41s | | branch has no errors when building and testing our client artifacts. |

_ Patch Compile Tests _

| +1 :green_heart: | mvninstall | 0m 46s | | the patch passed |
| +1 :green_heart: | compile | 0m 52s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 0m 52s | | the patch passed |
| +1 :green_heart: | compile | 0m 47s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 0m 47s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 18s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 49s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 2m 30s | | the patch passed |
| +1 :green_heart: | shadedclient | 15m 10s | | patch has no errors when building and testing our client artifacts. |

_ Other Tests _

| +1 :green_heart: | unit | 2m 16s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | asflicense | 0m 32s | | The patch does not generate ASF License warnings. |
| | | 79m 0s | | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3104 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 5e1a36ddd1ed 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9765967b2691b64fc25736bf46371a4c3769694e |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/testReport/ |
| Max. process+thread count | 743 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3104/1/console |
[jira] [Assigned] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu reassigned HDFS-16069: --- Assignee: JiangHua Zhu > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. > These files (edit logs) are of little significance to the cluster.
[jira] [Created] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
JiangHua Zhu created HDFS-16069: --- Summary: Remove locally stored files (edit log) when NameNode becomes Standby Key: HDFS-16069 URL: https://issues.apache.org/jira/browse/HDFS-16069 Project: Hadoop HDFS Issue Type: Improvement Reporter: JiangHua Zhu When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. These files (edit logs) are of little significance to the cluster.
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Affects Version/s: 2.9.2 > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > > When ZKFC is working, one of the NameNodes (the Active one) will transition to the Standby state. Before the state change, this NameNode has saved some files (edit logs); these files are stored in the directory (dfs.namenode.name.dir) and will not disappear in the short term, until the status of this NameNode becomes Active again. > These files (edit logs) are of little significance to the cluster.
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Labels: namenode zkfc (was: ) > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor > Labels: namenode, zkfc
[jira] [Updated] (HDFS-16069) Remove locally stored files (edit log) when NameNode becomes Standby
[ https://issues.apache.org/jira/browse/HDFS-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] JiangHua Zhu updated HDFS-16069: Labels: (was: namenode zkfc) > Remove locally stored files (edit log) when NameNode becomes Standby > > > Key: HDFS-16069 > URL: https://issues.apache.org/jira/browse/HDFS-16069 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.9.2 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Minor
[jira] [Created] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
zhengchenyu created HDFS-16070: -- Summary: DataTransfer block storm when datanode's io is busy. Key: HDFS-16070 URL: https://issues.apache.org/jira/browse/HDFS-16070 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.2.1, 3.3.0 Reporter: zhengchenyu When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones below.
{code}
# Logs showing the transfer threads being started
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866
# Markers showing the transfers completed
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded.
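The race described above can be avoided with a simple in-flight guard: record each (block, target) pair when a transfer thread starts, and refuse to start a second thread for the same pair until the first finishes. This is an illustrative sketch only, not the actual DataNode code or the proposed patch; the class name and key format are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: prevent a second DataTransfer thread from being
// started for a (block, target) pair that is still in flight.
public class TransferGuard {
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may start the transfer; false if one is already running. */
    public boolean tryStart(String blockId, String target) {
        return inFlight.add(blockId + "->" + target);
    }

    /** Must be called when the transfer thread finishes, on success or failure. */
    public void finish(String blockId, String target) {
        inFlight.remove(blockId + "->" + target);
    }
}
```

`ConcurrentHashMap.newKeySet()` gives an atomic add-if-absent check, so two monitor passes racing on the same block cannot both launch a thread.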
[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611124 ] ASF GitHub Bot logged work on HDFS-16070: - Author: ASF GitHub Bot Created on: 15/Jun/21 03:42 Start Date: 15/Jun/21 03:42 Worklog Time Spent: 10m Work Description: zhengchenyu opened a new pull request #3105: URL: https://github.com/apache/hadoop/pull/3105 When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones quoted in the issue description above. You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611124) Remaining Estimate: 0h Time Spent: 10m > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16070: -- Labels: pull-request-available (was: ) > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Commented] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363358#comment-17363358 ] zhengchenyu commented on HDFS-16070: [~ayushsaxena][~inigoiri] I have submitted a pull request; could you help review this patch? > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?focusedWorklogId=611126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611126 ] ASF GitHub Bot logged work on HDFS-16070: - Author: ASF GitHub Bot Created on: 15/Jun/21 03:47 Start Date: 15/Jun/21 03:47 Worklog Time Spent: 10m Work Description: zhengchenyu commented on pull request #3105: URL: https://github.com/apache/hadoop/pull/3105#issuecomment-861146706 @ayushtkn @goiri could you help review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611126) Time Spent: 20m (was: 10m) > DataTransfer block storm when datanode's io is busy. > > > Key: HDFS-16070 > URL: https://issues.apache.org/jira/browse/HDFS-16070 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.0, 3.2.1 >Reporter: zhengchenyu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h
[jira] [Updated] (HDFS-16070) DataTransfer block storm when datanode's io is busy.
[ https://issues.apache.org/jira/browse/HDFS-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengchenyu updated HDFS-16070: --- Description: When I sped up decommissioning, I found that some DataNodes' I/O was busy, the hosts' load was very high, and tens of thousands of data transfer threads were running. Then I found logs like the ones quoted above. You can see that the first DataTransfer thread did not finish until 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36. If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), the next DataTransfer for the same block is started, and the disks and network become heavily loaded. Note: decommissioning EC blocks triggers this problem easily, because every EC internal block is unique. (was: the same description without the final note about EC blocks)
[jira] [Work logged] (HDFS-16068) WebHdfsFileSystem has a possible connection leak in connection with HttpFS
[ https://issues.apache.org/jira/browse/HDFS-16068?focusedWorklogId=611146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-611146 ] ASF GitHub Bot logged work on HDFS-16068: - Author: ASF GitHub Bot Created on: 15/Jun/21 05:50 Start Date: 15/Jun/21 05:50 Worklog Time Spent: 10m Work Description: hemanthboyina commented on pull request #3104: URL: https://github.com/apache/hadoop/pull/3104#issuecomment-861191955 +1 will commit shortly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 611146) Time Spent: 50m (was: 40m) > WebHdfsFileSystem has a possible connection leak in connection with HttpFS > -- > > Key: HDFS-16068 > URL: https://issues.apache.org/jira/browse/HDFS-16068 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > When we use WebHdfsFileSystem with HttpFS, some connections remain open for a while after the filesystems are closed, until GC runs. After investigating for a while, I found a potential connection leak in WebHdfsFileSystem. > {code:java} > // Close both the InputStream and the connection. > @VisibleForTesting > void closeInputStream(RunnerState rs) throws IOException { > if (in != null) { > IOUtils.close(cachedConnection); > in = null; > } > cachedConnection = null; > runnerState = rs; > } > {code} > In the above code, if {{in}} is null and {{cachedConnection}} is not null, {{cachedConnection}} is never closed and the connection remains open. I think this is the cause of our problem.
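The quoted method only closes {{cachedConnection}} when a stream was actually opened ({{in != null}}); a connection cached without a stream is merely dereferenced and lives until GC. A minimal standalone sketch of the pattern with the unconditional-close fix — the connection is modeled as a plain Closeable, and this is not the actual WebHdfsFileSystem code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the leak pattern described above, with the fix applied:
// disconnect the cached connection whether or not a stream was opened,
// instead of only when 'in' is non-null.
public class ConnectionHolder {
    InputStream in;
    Closeable cachedConnection; // stands in for an HttpURLConnection

    void closeInputStream() throws IOException {
        // Fixed: close the connection even when 'in' is null, so a
        // connection cached without a stream cannot leak.
        if (cachedConnection != null) {
            cachedConnection.close();
        }
        in = null;
        cachedConnection = null;
    }
}
```

The same shape as the buggy original, minus the `if (in != null)` guard around the close call.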
[jira] [Updated] (HDFS-16016) BPServiceActor add a new thread to handle IBR
[ https://issues.apache.org/jira/browse/HDFS-16016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-16016: Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed Status: Resolved (was: Patch Available) > BPServiceActor add a new thread to handle IBR > - > > Key: HDFS-16016 > URL: https://issues.apache.org/jira/browse/HDFS-16016 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: JiangHua Zhu >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Currently, BPServiceActor#offerService() does many things: FBR, IBR, and heartbeats. We can handle IBRs independently to improve the performance of heartbeats and FBRs.
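The idea behind this change can be sketched as a dedicated sender thread fed from a queue: the heartbeat loop only enqueues incremental block reports (IBRs) and never blocks on the report RPC. This is an illustrative sketch with hypothetical names, not the actual BPServiceActor implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: decouple IBR sending from the heartbeat loop by handing reports
// to a dedicated thread through a blocking queue.
public class IbrSender implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final List<String> sent = Collections.synchronizedList(new ArrayList<>());

    /** Called from the heartbeat/actor thread; never blocks on the RPC. */
    public void submit(String report) {
        queue.offer(report);
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                send(queue.take()); // blocks until an IBR arrives
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stand-in for the real report RPC to the NameNode. */
    void send(String report) {
        sent.add(report);
    }
}
```

With this split, a slow IBR RPC delays only the sender thread, so heartbeats and full block reports keep their cadence.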