[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16432:
--
Labels: pull-request-available  (was: )

> Namenode block report add yield to avoid holding write lock too long
> 
>
> Key: HDFS-16432
> URL: https://issues.apache.org/jira/browse/HDFS-16432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-20-15-19-28-384.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> !image-2022-01-20-15-19-28-384.png|width=683,height=132!
> In our cluster, the NameNode holds the write lock for a long time while
> processing a block report if the number of blocks in a storage is more than 10.
> So we want to add a yield mechanism to the block report process to avoid
> holding the write lock for too long.
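The yield mechanism described above is not spelled out in this thread. Below is a minimal sketch of the general idea, under stated assumptions: the block report is processed in fixed-size batches, and the write lock is released between batches. Every name here (`YieldingBlockReportProcessor`, `blocksPerLockHold`, `processBlock`) is hypothetical and does not come from the HDFS code base.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: none of these names come from the HDFS code base.
// A large block report is processed in fixed-size batches, and the write lock
// is released between batches so queued operations can run.
public class YieldingBlockReportProcessor {
  // Fair mode makes a re-acquiring thread queue behind threads that are
  // already waiting, so releasing the lock actually lets them in.
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
  private final int blocksPerLockHold;

  public YieldingBlockReportProcessor(int blocksPerLockHold) {
    if (blocksPerLockHold <= 0) {
      throw new IllegalArgumentException("blocksPerLockHold must be positive");
    }
    this.blocksPerLockHold = blocksPerLockHold;
  }

  /** Processes every reported block and returns how many times the write lock was taken. */
  public int process(List<Long> reportedBlockIds) {
    int acquisitions = 0;
    int i = 0;
    while (i < reportedBlockIds.size()) {
      fsLock.writeLock().lock();
      acquisitions++;
      try {
        int end = Math.min(i + blocksPerLockHold, reportedBlockIds.size());
        for (; i < end; i++) {
          processBlock(reportedBlockIds.get(i));
        }
      } finally {
        // The "yield": other readers and writers can acquire the lock here.
        fsLock.writeLock().unlock();
      }
    }
    return acquisitions;
  }

  private void processBlock(long blockId) {
    // Placeholder for per-block work such as updating the blocks map.
  }

  public static void main(String[] args) {
    List<Long> report = new ArrayList<>();
    for (long id = 0; id < 10_000; id++) {
      report.add(id);
    }
    System.out.println(new YieldingBlockReportProcessor(1_000).process(report)); // 10
  }
}
```

A real implementation would also have to re-validate any state it relies on after re-acquiring the lock, because the namespace can change while the lock is released between batches.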



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16432?focusedWorklogId=711904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711904
 ]

ASF GitHub Bot logged work on HDFS-16432:
-

Author: ASF GitHub Bot
Created on: 20/Jan/22 07:32
Start Date: 20/Jan/22 07:32
Worklog Time Spent: 10m 
  Work Description: liubingxing opened a new pull request #3907:
URL: https://github.com/apache/hadoop/pull/3907


   JIRA: [HDFS-16432](https://issues.apache.org/jira/browse/HDFS-16432)
   
   
![image](https://user-images.githubusercontent.com/2844826/150293279-07d7bbf0-1471-464f-af81-7d5c23aeadcd.png)
   
   In our cluster, the NameNode holds the write lock for a long time while 
processing a block report if the number of blocks in a storage is more than 10. 
   So we want to add a yield mechanism to the block report process to avoid 
holding the write lock for too long.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 711904)
Remaining Estimate: 0h
Time Spent: 10m

> Namenode block report add yield to avoid holding write lock too long
> 
>
> Key: HDFS-16432
> URL: https://issues.apache.org/jira/browse/HDFS-16432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-20-15-19-28-384.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h






[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long

2022-01-19 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16432:

Description: 
!image-2022-01-20-15-19-28-384.png|width=683,height=132!

In our cluster, the NameNode holds the write lock for a long time while 
processing a block report if the number of blocks in a storage is more than 10. 
So we want to add a yield mechanism to the block report process to avoid 
holding the write lock for too long.

  was:
!image-2022-01-20-15-19-28-384.png|width=652,height=126!

In our cluster, the NameNode holds the write lock for a long time while 
processing a block report if the number of blocks in a storage is more than 10. 
So we want to add a yield mechanism to the block report process to avoid 
holding the write lock for too long.


> Namenode block report add yield to avoid holding write lock too long
> 
>
> Key: HDFS-16432
> URL: https://issues.apache.org/jira/browse/HDFS-16432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-20-15-19-28-384.png






[jira] [Updated] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long

2022-01-19 Thread qinyuren (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qinyuren updated HDFS-16432:

Description: 
!image-2022-01-20-15-19-28-384.png|width=652,height=126!

In our cluster, the NameNode holds the write lock for a long time while 
processing a block report if the number of blocks in a storage is more than 10. 
So we want to add a yield mechanism to the block report process to avoid 
holding the write lock for too long.

  was:!image-2022-01-20-15-19-28-384.png|width=652,height=126!


> Namenode block report add yield to avoid holding write lock too long
> 
>
> Key: HDFS-16432
> URL: https://issues.apache.org/jira/browse/HDFS-16432
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-20-15-19-28-384.png






[jira] [Created] (HDFS-16432) Namenode block report add yield to avoid holding write lock too long

2022-01-19 Thread qinyuren (Jira)
qinyuren created HDFS-16432:
---

 Summary: Namenode block report add yield to avoid holding write 
lock too long
 Key: HDFS-16432
 URL: https://issues.apache.org/jira/browse/HDFS-16432
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: qinyuren
 Attachments: image-2022-01-20-15-19-28-384.png

!image-2022-01-20-15-19-28-384.png|width=652,height=126!






[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2022-01-19 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479090#comment-17479090
 ] 

Yuanbo Liu commented on HDFS-15180:
---

[~sodonnell] Thanks for your comments.
Some background needs to be clarified.
Storage machines keep getting bigger. We've seen 12TB x 36 disks (432TB on a 
single DataNode) in production. The global lock is the key factor limiting IO 
performance, so we'd be glad to see this Jira make further progress or even 
get merged.

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Assignee: Mingxiang Li
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png, image-2020-04-09-11-20-36-459.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently the FsDatasetImpl datasetLock is heavy when there are many namespaces
> in a big cluster. We could split the FsDatasetImpl datasetLock by block pool.
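The following is a minimal sketch of what splitting a dataset-wide lock by block pool could look like. The assumption is simply one read/write lock per block pool ID, created lazily, so that operations on different namespaces do not contend. This is not the FsDatasetImpl code or the patches attached to this issue, and all names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not the FsDatasetImpl code: one read/write lock per
// block pool ID instead of a single dataset-wide lock.
public class BlockPoolLockManager {
  private final ConcurrentMap<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

  /** Returns the lock for a block pool, creating it atomically on first use. */
  public ReentrantReadWriteLock lockFor(String blockPoolId) {
    return locks.computeIfAbsent(blockPoolId, id -> new ReentrantReadWriteLock());
  }

  /** Runs an action while holding only the given block pool's write lock. */
  public void runWithWriteLock(String blockPoolId, Runnable action) {
    ReentrantReadWriteLock lock = lockFor(blockPoolId);
    lock.writeLock().lock();
    try {
      action.run();
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Tries to take a write lock from a separate thread; used to show contention. */
  static boolean tryWriteLockFromAnotherThread(ReentrantReadWriteLock lock) {
    boolean[] acquired = new boolean[1];
    Thread t = new Thread(() -> {
      acquired[0] = lock.writeLock().tryLock();
      if (acquired[0]) {
        lock.writeLock().unlock();
      }
    });
    t.start();
    try {
      t.join(); // join() makes the write to acquired[0] visible here
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return acquired[0];
  }

  public static void main(String[] args) {
    BlockPoolLockManager mgr = new BlockPoolLockManager();
    mgr.runWithWriteLock("BP-1", () -> {
      System.out.println(tryWriteLockFromAnotherThread(mgr.lockFor("BP-1"))); // false: same pool is busy
      System.out.println(tryWriteLockFromAnotherThread(mgr.lockFor("BP-2"))); // true: other pool is free
    });
  }
}
```

Any operation that needs several block pools at once would have to take their locks in a fixed order, for example sorted by block pool ID, to avoid deadlock.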






[jira] [Work logged] (HDFS-16430) Validate maximum blocks in EC group when adding an EC policy

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16430?focusedWorklogId=711858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711858
 ]

ASF GitHub Bot logged work on HDFS-16430:
-

Author: ASF GitHub Bot
Created on: 20/Jan/22 03:56
Start Date: 20/Jan/22 03:56
Worklog Time Spent: 10m 
  Work Description: cndaimin commented on pull request #3899:
URL: https://github.com/apache/hadoop/pull/3899#issuecomment-1017095749


   Thanks for your review! @ayushtkn 




Issue Time Tracking
---

Worklog Id: (was: 711858)
Time Spent: 0.5h  (was: 20m)

> Validate maximum blocks in EC group when adding an EC policy
> 
>
> Key: HDFS-16430
> URL: https://issues.apache.org/jira/browse/HDFS-16430
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HDFS EC uses the last 4 bits of the block ID to store the block index within an
> EC block group. Therefore the maximum number of blocks in an EC block group is
> 2^4 = 16, which is defined in HdfsServerConstants#MAX_BLOCKS_IN_GROUP.
> Currently there is no validation or warning when adding a bad EC policy with
> numDataUnits + numParityUnits > 16. The problem only shows up later as
> read/write errors on EC files that use the bad policy, which is not very
> straightforward for users.
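Here is a minimal sketch of the proposed check. It assumes only what the issue states: the block index occupies the low 4 bits of the block ID, so a group can hold at most 16 blocks. The constants mirror that layout. The class and method names are hypothetical and are not the HDFS API or the patch in PR #3899.

```java
// Hypothetical sketch: names are illustrative, not the HDFS API. The constants
// mirror the 4-bit block index layout described in the issue.
public class EcGroupSizeValidator {
  static final int BLOCK_INDEX_BITS = 4;
  static final int MAX_BLOCKS_IN_GROUP = 1 << BLOCK_INDEX_BITS; // 16
  static final long BLOCK_INDEX_MASK = MAX_BLOCKS_IN_GROUP - 1;  // 0b1111

  /** Extracts the index of a block within its EC block group (low 4 bits of the ID). */
  static int blockIndexInGroup(long blockId) {
    return (int) (blockId & BLOCK_INDEX_MASK);
  }

  /** Rejects policies whose data + parity units cannot be addressed with 4 bits. */
  static void validate(int numDataUnits, int numParityUnits) {
    int total = numDataUnits + numParityUnits;
    if (total > MAX_BLOCKS_IN_GROUP) {
      throw new IllegalArgumentException("numDataUnits + numParityUnits = " + total
          + " exceeds the maximum of " + MAX_BLOCKS_IN_GROUP + " blocks in an EC group");
    }
  }

  public static void main(String[] args) {
    validate(6, 3);    // RS-6-3: 9 blocks, accepted
    validate(10, 6);   // 16 blocks, exactly at the limit, accepted
    try {
      validate(12, 6); // 18 blocks: indices 16 and 17 would not fit in 4 bits
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Rejecting the policy when it is added surfaces the error right away, rather than as read/write failures on files that already use the policy.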






[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2022-01-19 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-14768:
--
Attachment: HDFS-14768-branch-3.1.patch

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768-branch-3.1.patch, HDFS-14768-branch-3.2.patch, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission
> indices [3,4] and increase pendingReplicationWithoutTargets of the DataNode
> holding index 6 so that it is larger than replicationStreamsHardLimit (we set
> it to 14). Then, after BlockManager#chooseSourceDatanodes, liveBlockIndices is
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2.
> In BlockManager#scheduleReconstruction, additionalReplRequired is
> 9 - 7 = 2. After the NameNode chooses two target DataNodes, it assigns an
> erasure coding task to the target DataNodes.
> When the DataNode gets the task, it builds targetIndices from
> liveBlockIndices and the number of targets. The code is below.
> {code:java}
> // One slot per target; Java zero-initializes the array
> targetIndices = new short[targets.length];
>
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] always stays 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices,
> i.e. [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to build target indices [6,0] triggers the ISA-L
> bug: the data of block index 6 is corrupted (all zeros).
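The following stripped-down, hypothetical reproduction of the loop above shows why the second slot stays 0. Only index 6 is missing from the live bit set, but two targets were scheduled. It is not the actual DataNode class. The class and method names are illustrative, and the `getBlockLen(i) > 0` check is dropped on the assumption that every block is non-empty.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, simplified reproduction of the target-index computation;
// assumes every block has non-zero length, so the getBlockLen check is omitted.
public class TargetIndicesDemo {
  static short[] initTargetIndices(BitSet live, int totalBlocks, int numTargets) {
    short[] targetIndices = new short[numTargets]; // zero-initialized by Java
    int m = 0;
    for (int i = 0; i < totalBlocks; i++) {
      if (!live.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices; // unfilled slots keep their default value 0
  }

  public static void main(String[] args) {
    BitSet live = new BitSet();
    // Index 6 sits on the busy DataNode, so it is left out of liveBlockIndices.
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    // RS-6-3 has 9 blocks; Live: 7 led the NameNode to schedule 9 - 7 = 2 targets.
    short[] targets = initTargetIndices(live, 9, 2);
    System.out.println(Arrays.toString(targets)); // [6, 0]
  }
}
```

Only one index is actually missing, so the second target silently reuses index 0, which is also one of the source indices. That matches the [6,0] target list described above.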
> I wrote a unit test that reliably reproduces the problem.
> {code:java}
> private int replicationStreamsHardLimit =
>     DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>       .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>       StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>       .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>       (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // the DataNode at location 6 of the block group
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // the first heartbeat will consume 3 replica tasks
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, 

[jira] [Commented] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2022-01-19 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479070#comment-17479070
 ] 

Yuanbo Liu commented on HDFS-14768:
---

Attached patches for branch-3.2 and branch-3.1. Once they're merged, I'll 
attach new patches for HDFS-15186 for branch-3.2 and branch-3.1 as well.

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768-branch-3.1.patch, HDFS-14768-branch-3.2.patch, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt

[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2022-01-19 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-14768:
--
Attachment: HDFS-14768-branch-3.2.patch

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768-branch-3.2.patch, HDFS-14768.000.patch, HDFS-14768.001.patch, 
> HDFS-14768.002.patch, HDFS-14768.003.patch, HDFS-14768.004.patch, 
> HDFS-14768.005.patch, HDFS-14768.006.patch, HDFS-14768.007.patch, 
> HDFS-14768.008.patch, HDFS-14768.009.patch, HDFS-14768.010.patch, 
> HDFS-14768.011.patch, HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt

[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2022-01-19 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-14768:
--
Attachment: (was: HDFS-14768-branch-3.2.patch)

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768.000.patch, HDFS-14768.001.patch, HDFS-14768.002.patch, 
> HDFS-14768.003.patch, HDFS-14768.004.patch, HDFS-14768.005.patch, 
> HDFS-14768.006.patch, HDFS-14768.007.patch, HDFS-14768.008.patch, 
> HDFS-14768.009.patch, HDFS-14768.010.patch, HDFS-14768.011.patch, 
> HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt

[jira] [Updated] (HDFS-14768) EC : Busy DN replica should be consider in live replica check.

2022-01-19 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-14768:
--
Attachment: HDFS-14768-branch-3.2.patch

> EC : Busy DN replica should be consider in live replica check.
> --
>
> Key: HDFS-14768
> URL: https://issues.apache.org/jira/browse/HDFS-14768
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding, hdfs, namenode
>Affects Versions: 3.0.2
>Reporter: guojh
>Assignee: guojh
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0
>
> Attachments: 1568275810244.jpg, 1568276338275.jpg, 1568771471942.jpg, 
> HDFS-14768-branch-3.2.patch, HDFS-14768.000.patch, HDFS-14768.001.patch, 
> HDFS-14768.002.patch, HDFS-14768.003.patch, HDFS-14768.004.patch, 
> HDFS-14768.005.patch, HDFS-14768.006.patch, HDFS-14768.007.patch, 
> HDFS-14768.008.patch, HDFS-14768.009.patch, HDFS-14768.010.patch, 
> HDFS-14768.011.patch, HDFS-14768.jpg, guojh_UT_after_deomission.txt, 
> guojh_UT_before_deomission.txt, zhaoyiming_UT_after_deomission.txt, 
> zhaoyiming_UT_beofre_deomission.txt
>
>
> The policy is RS-6-3-1024K and the version is Hadoop 3.0.2.
> Suppose a file's block indices are [0,1,2,3,4,5,6,7,8]. We decommission
> indices [3,4] and increase pendingReplicationWithoutTargets on the DataNode
> holding index 6 so that it exceeds replicationStreamsHardLimit (set to 14
> here). After BlockManager#chooseSourceDatanodes, liveBlockIndices is
> [0,1,2,3,4,5,7,8] and the block counters are Live: 7, Decommission: 2.
> In BlockManager#scheduleReconstruction, additionalReplRequired is 9 - 7 = 2.
> After the NameNode chooses two target DataNodes, it assigns an erasure-coding
> reconstruction task to a target DataNode.
> When the DataNode receives the task, it builds targetIndices from
> liveBlockIndices and the number of targets, as in the code below.
> {code:java}
> // Excerpt from the DataNode's striped reconstruction writer.
> // In the constructor: targetIndices = new short[targets.length];
> private void initTargetIndices() {
>   BitSet bitset = reconstructor.getLiveBitSet();
>   int m = 0;
>   hasValidTargets = false;
>   for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
>     // Every index missing from the live bitset becomes a target index.
>     if (!bitset.get(i)) {
>       if (reconstructor.getBlockLen(i) > 0) {
>         if (m < targets.length) {
>           targetIndices[m++] = (short) i;
>           hasValidTargets = true;
>         }
>       }
>     }
>   }
> }
> {code}
> targetIndices[0] = 6, and targetIndices[1] always stays 0, its initial value.
> The StripedReader always creates readers from the first 6 block indices,
> [0,1,2,3,4,5].
> Using indices [0,1,2,3,4,5] to reconstruct target indices [6,0] triggers the
> ISA-L bug: the data of block index 6 is corrupted (all zeros).
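The failure can be traced with the numbers from this report. The following is a simplified, hypothetical re-creation of the loop quoted above, not the actual DataNode code. It assumes every internal block has a positive length, and it takes the live bitset and the target count as plain parameters (the class and method names are illustrative). Only index 6 is missing from the bitset, but two targets were scheduled, so the second slot keeps Java's default value 0.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, simplified re-creation of the initTargetIndices() loop:
// assumes every internal block has a positive length (getBlockLen(i) > 0).
class TargetIndicesDemo {
  static short[] initTargetIndices(BitSet liveBitSet, int dataBlkNum,
      int parityBlkNum, int numTargets) {
    short[] targetIndices = new short[numTargets]; // zero-initialized by Java
    int m = 0;
    for (int i = 0; i < dataBlkNum + parityBlkNum; i++) {
      // A non-live block index becomes the next target, if a slot is left.
      if (!liveBitSet.get(i) && m < numTargets) {
        targetIndices[m++] = (short) i;
      }
    }
    return targetIndices;
  }

  public static void main(String[] args) {
    // RS-6-3: index 6 is on the busy DataNode, so it is absent from the live
    // set, while the decommissioning indices 3 and 4 are still present.
    BitSet live = new BitSet(9);
    for (int i : new int[] {0, 1, 2, 3, 4, 5, 7, 8}) {
      live.set(i);
    }
    // The NameNode scheduled two targets (additionalReplRequired = 9 - 7 = 2).
    short[] targetIndices = initTargetIndices(live, 6, 3, 2);
    System.out.println(Arrays.toString(targetIndices)); // [6, 0]
  }
}
```

The mismatch between the NameNode's count (7 live) and the DataNode's bitset (only one missing index) is what leaves a stale 0 in the target list.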
> I wrote a unit test that reliably reproduces the problem.
> {code:java}
> // code placeholder
> private int replicationStreamsHardLimit = 
> DFSConfigKeys.DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_DEFAULT;
> numDNs = dataBlocks + parityBlocks + 10;
> @Test(timeout = 24)
> public void testFileDecommission() throws Exception {
>   LOG.info("Starting test testFileDecommission");
>   final Path ecFile = new Path(ecDir, "testFileDecommission");
>   int writeBytes = cellSize * dataBlocks;
>   writeStripedFile(dfs, ecFile, writeBytes);
>   Assert.assertEquals(0, bm.numOfUnderReplicatedBlocks());
>   FileChecksum fileChecksum1 = dfs.getFileChecksum(ecFile, writeBytes);
>   final INodeFile fileNode = cluster.getNamesystem().getFSDirectory()
>   .getINode4Write(ecFile.toString()).asFile();
>   LocatedBlocks locatedBlocks =
>   StripedFileTestUtil.getLocatedBlocks(ecFile, dfs);
>   LocatedBlock lb = dfs.getClient().getLocatedBlocks(ecFile.toString(), 0)
>   .get(0);
>   DatanodeInfo[] dnLocs = lb.getLocations();
>   LocatedStripedBlock lastBlock =
>   (LocatedStripedBlock)locatedBlocks.getLastLocatedBlock();
>   DatanodeInfo[] storageInfos = lastBlock.getLocations();
>   // Look up the DataNode that stores block index 6; it will be made busy.
>   DatanodeDescriptor datanodeDescriptor = cluster.getNameNode().getNamesystem()
>       .getBlockManager().getDatanodeManager()
>       .getDatanode(storageInfos[6].getDatanodeUuid());
>   BlockInfo firstBlock = fileNode.getBlocks()[0];
>   DatanodeStorageInfo[] dStorageInfos = bm.getStorages(firstBlock);
>   // Queue more replication tasks than replicationStreamsHardLimit on that
>   // DataNode; the first heartbeat will consume 3 replica tasks.
>   for (int i = 0; i <= replicationStreamsHardLimit + 3; i++) {
>     BlockManagerTestUtil.addBlockToBeReplicated(datanodeDescriptor,
>         new Block(i), new DatanodeStorageInfo[]{dStorageInfos[0]});
>   }
>   assertEquals(dataBlocks + parityBlocks, dnLocs.length);
>   int[] decommNodeIndex = {3, 4};
>   final List<DatanodeInfo> decommisionNodes = new ArrayList<>();
>   // add the node which will be decommissioning
>   decommisionNodes.add(dnLocs[decommNodeIndex[0]]);
>   decommisionNodes.add(dnLocs[decommNodeIndex[1]]);
>   decommissionNode(0, decommisionNodes, 

[jira] [Work logged] (HDFS-16398) Reconfig block report parameters for datanode

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16398?focusedWorklogId=711813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711813
 ]

ASF GitHub Bot logged work on HDFS-16398:
-

Author: ASF GitHub Bot
Created on: 20/Jan/22 01:29
Start Date: 20/Jan/22 01:29
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3831:
URL: https://github.com/apache/hadoop/pull/3831#issuecomment-1017027989


   Hi @tasanuma , I have solved the conflict. If you are free, please help to 
review this PR. Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 711813)
Time Spent: 1h  (was: 50m)

> Reconfig block report parameters for datanode
> -
>
> Key: HDFS-16398
> URL: https://issues.apache.org/jira/browse/HDFS-16398
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>







[jira] [Work logged] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16427?focusedWorklogId=711801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711801
 ]

ASF GitHub Bot logged work on HDFS-16427:
-

Author: ASF GitHub Bot
Created on: 20/Jan/22 01:11
Start Date: 20/Jan/22 01:11
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3888:
URL: https://github.com/apache/hadoop/pull/3888#issuecomment-1017019075


   Hi @tasanuma @ayushtkn , could you please take a look. Thanks.




Issue Time Tracking
---

Worklog Id: (was: 711801)
Time Spent: 0.5h  (was: 20m)

> Add debug log for BlockManager#chooseExcessRedundancyStriped
> 
>
> Key: HDFS-16427
> URL: https://issues.apache.org/jira/browse/HDFS-16427
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> To solve HDFS-16420, we added some debug logs that are useful beyond that
> issue as well. When a problem occurs, we can set the log level to DEBUG to
> make it easier to analyze.






[jira] [Updated] (HDFS-16427) Add debug log for BlockManager#chooseExcessRedundancyStriped

2022-01-19 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-16427:
---
Description: To solve HDFS-16420, we added some debug logs that are useful 
beyond that issue as well. When a problem occurs, we can set the log level to 
DEBUG to make it easier to analyze.  (was: To solve this 
issue[HDFS-16420|https://issues.apache.org/jira/browse/HDFS-16420] , we added 
some debug logs, which were also necessary.)

> Add debug log for BlockManager#chooseExcessRedundancyStriped
> 
>
> Key: HDFS-16427
> URL: https://issues.apache.org/jira/browse/HDFS-16427
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> To solve HDFS-16420, we added some debug logs that are useful beyond that
> issue as well. When a problem occurs, we can set the log level to DEBUG to
> make it easier to analyze.






[jira] [Work logged] (HDFS-16398) Reconfig block report parameters for datanode

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16398?focusedWorklogId=711746=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711746
 ]

ASF GitHub Bot logged work on HDFS-16398:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 23:18
Start Date: 19/Jan/22 23:18
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3831:
URL: https://github.com/apache/hadoop/pull/3831#issuecomment-1016959135


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 12s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 53s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 110 unchanged 
- 2 fixed = 112 total (was 112)  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 16s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 368m 38s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 477m 13s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3831 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 48c7e19233c8 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 
11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6bd9edc48583506a286bd453f71dd64862ca5961 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3831/4/testReport/ |
   | Max. process+thread count | 2009 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Work logged] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?focusedWorklogId=711643=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711643
 ]

ASF GitHub Bot logged work on HDFS-16428:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 20:43
Start Date: 19/Jan/22 20:43
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3898:
URL: https://github.com/apache/hadoop/pull/3898#issuecomment-1016852033


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 20s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 46s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 406m 39s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 46s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 506m 42s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3898 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 1ad4e8770126 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8e5315ca5b46938552dc10ee5cf4771c2613a7a0 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/testReport/ |
   | Max. process+thread count | 2658 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3898/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This 

[jira] [Work logged] (HDFS-16429) Add DataSetLockManager to maintain locks for FsDataSetImpl

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16429?focusedWorklogId=711454=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711454
 ]

ASF GitHub Bot logged work on HDFS-16429:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 15:56
Start Date: 19/Jan/22 15:56
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3900:
URL: https://github.com/apache/hadoop/pull/3900#issuecomment-1016609231


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m  5s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  21m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 17s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  19m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   3m 41s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 48s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  9s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 27s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  21m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  19m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   3m 31s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 201 unchanged - 0 fixed = 202 total (was 
201)  |
   | +1 :green_heart: |  mvnsite  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  2s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   2m 24s |  |  the patch passed with JDK 
Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   3m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 12s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  17m 39s | 
[/patch-unit-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 231m 58s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 454m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.ipc.TestIPC |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3900/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3900 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux 9b1217eef5cf 4.15.0-161-generic #169-Ubuntu SMP Fri Oct 15 
13:41:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | 

[jira] [Work logged] (HDFS-16401) Remove the worthless DatasetVolumeChecker#numAsyncDatasetChecks

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16401?focusedWorklogId=711359=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711359
 ]

ASF GitHub Bot logged work on HDFS-16401:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 12:30
Start Date: 19/Jan/22 12:30
Worklog Time Spent: 10m 
  Work Description: jianghuazhu commented on pull request #3838:
URL: https://github.com/apache/hadoop/pull/3838#issuecomment-1016421126


   Hi @ferhui , can this pr be merged into trunk branch or other main branch?
   If not for now, I will continue to work hard.
   Thank you very much.
   




Issue Time Tracking
---

Worklog Id: (was: 711359)
Time Spent: 1h 20m  (was: 1h 10m)

> Remove the worthless DatasetVolumeChecker#numAsyncDatasetChecks
> ---
>
> Key: HDFS-16401
> URL: https://issues.apache.org/jira/browse/HDFS-16401
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> DataNode#checkDiskErrorAsync() was already cleaned up in HDFS-11279, but
> DatasetVolumeChecker#numAsyncDatasetChecks appears to have been overlooked
> and left behind.






[jira] [Work logged] (HDFS-16428) Source path setted storagePolicy will cause wrong typeConsumed in rename operation

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16428?focusedWorklogId=711295=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711295
 ]

ASF GitHub Bot logged work on HDFS-16428:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 11:09
Start Date: 19/Jan/22 11:09
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on a change in pull request #3898:
URL: https://github.com/apache/hadoop/pull/3898#discussion_r787638952



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestQuota.java
##
@@ -958,6 +959,33 @@ public void testQuotaByStorageType() throws Exception {
 6 * fileSpace);
   }
 
+  @Test
+  public void testRenameInodeWithStorageType() throws IOException {
+final int SIZE = 64;
+final short REPL = 1;
+final Path foo = new Path("/foo");
+final Path bs1 = new Path(foo, "bs1");
+final Path wow = new Path(bs1, "wow");
+final Path bs2 = new Path(foo, "bs2");
+final Path wow2 = new Path(bs2,"wow2");
+
+dfs.mkdirs(bs1, FsPermission.getDirDefault());
+dfs.mkdirs(bs2, FsPermission.getDirDefault());
+dfs.setQuota(bs1,1000, 434217728);
+dfs.setQuota(bs2,1000, 434217728);
+dfs.setStoragePolicy(bs2, HdfsConstants.ONESSD_STORAGE_POLICY_NAME);
+
+DFSTestUtil.createFile(dfs, wow, SIZE, REPL, 0);
+DFSTestUtil.createFile(dfs, wow2, SIZE, REPL, 0);
+assertTrue("without storage policy, typeConsumed should be 0.",
+dfs.getQuotaUsage(bs1).getTypeConsumed(StorageType.SSD) == 0);
+assertTrue("with storage policy, typeConsumed should not be 0.",
+dfs.getQuotaUsage(bs2).getTypeConsumed(StorageType.SSD) != 0);
+dfs.rename(bs2, bs1);
+assertTrue("rename with storage policy, typeConsumed should not be 0.",
+dfs.getQuotaUsage(bs1).getTypeConsumed(StorageType.SSD) != 0);

Review comment:
   Hmm, Thanx @ThinkerLei for confirming.
   Rechecking, seems correct. The issue and fix makes sense to me.






Issue Time Tracking
---

Worklog Id: (was: 711295)
Time Spent: 50m  (was: 40m)

> Source path setted storagePolicy will cause wrong typeConsumed  in rename 
> operation
> ---
>
> Key: HDFS-16428
> URL: https://issues.apache.org/jira/browse/HDFS-16428
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
> Attachments: example.txt
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When computing quota usage for a rename operation, we use the storage policy
> of the target directory to compute the quota usage of src. This produces a
> wrong typeConsumed value when a storage policy has been set on the source
> path. I provided a unit test that reproduces this situation.
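The accounting error can be illustrated with a deliberately simplified model. In this hypothetical sketch, storage policies are reduced to HOT (every replica on DISK) and ONE_SSD (first replica on SSD, the rest on DISK), and a destination without an explicit policy is treated as HOT. None of these classes are the NameNode's QuotaCounts or BlockStoragePolicy APIs. It mirrors the unit test above: charging the renamed subtree with the destination's policy reports zero SSD usage, while using the source's own policy (falling back to the destination's only when none is set) counts the SSD replica.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical model of per-storage-type quota usage; not HDFS classes.
class RenameQuotaDemo {
  enum StorageType { DISK, SSD }

  // HOT keeps every replica on DISK; ONE_SSD puts replica 0 on SSD.
  enum Policy {
    HOT, ONE_SSD;

    StorageType typeForReplica(int replicaIndex) {
      return (this == ONE_SSD && replicaIndex == 0) ? StorageType.SSD : StorageType.DISK;
    }
  }

  // Bytes consumed per storage type by one file under the given policy.
  static Map<StorageType, Long> typeConsumed(long fileSize, int replication, Policy policy) {
    Map<StorageType, Long> usage = new EnumMap<>(StorageType.class);
    for (StorageType t : StorageType.values()) {
      usage.put(t, 0L);
    }
    for (int r = 0; r < replication; r++) {
      usage.merge(policy.typeForReplica(r), fileSize, Long::sum);
    }
    return usage;
  }

  // The policy that governs the moved subtree: its own if one was set on the
  // source path, otherwise the one inherited from the destination.
  static Policy effectivePolicy(Policy srcPolicy, Policy dstPolicy) {
    return srcPolicy != null ? srcPolicy : dstPolicy;
  }

  public static void main(String[] args) {
    long size = 64;
    int repl = 1;
    Policy src = Policy.ONE_SSD; // like /foo/bs2 in the test
    Policy dst = Policy.HOT;     // like /foo/bs1, which has no policy set
    System.out.println("dst policy: " + typeConsumed(size, repl, dst)); // {DISK=64, SSD=0}
    System.out.println("effective policy: "
        + typeConsumed(size, repl, effectivePolicy(src, dst)));        // {DISK=0, SSD=64}
  }
}
```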






[jira] [Commented] (HDFS-16214) Lock optimization for large deleteing, no locks on the collection block

2022-01-19 Thread Xiangyi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478544#comment-17478544
 ] 

Xiangyi Zhu commented on HDFS-16214:


[~John Smith] This issue aims to reduce the long lock-holding time spent 
collecting blocks when deleting large directories. 
[HDFS-16043|https://issues.apache.org/jira/browse/HDFS-16043], by contrast, 
implements asynchronous deletion of blocks. The two issues address different 
problems.

> Lock optimization for large deleteing, no locks on the collection block
> ---
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Deletion time is dominated by three operations: collecting blocks, removing
> inodes from the InodeMap, and deleting blocks. The current deletion is split
> into two major steps. Step 1 acquires the lock, collects the blocks and
> inodes, deletes the inodes, and releases the lock. Step 2 acquires the lock,
> deletes the blocks, and releases the lock.
> Step 2 already deletes blocks in batches, which bounds the lock-holding time;
> blocks could also be deleted asynchronously.
> Step 1, however, can still hold the lock for a long time.
> For step 1, we can collect blocks without holding the lock. The proposed
> process is as follows. Step 1 acquires the lock, calls parent.removeChild,
> writes the editLog, and releases the lock. Step 2 collects the blocks without
> the lock. Step 3 acquires the lock, updates the quota, releases the lease, and
> releases the lock. Step 4 acquires the lock, deletes the inodes from the
> InodeMap, and releases the lock. Step 5 acquires the lock, deletes the blocks,
> and releases the lock.
> The above process may cause the following problems:
> 1. Suppose /a/b/c is being written when the /a/b directory is deleted. While
> the deletion is in the block-collection stage, which runs without the lock
> after delete /a/b has already been written to the editLog, the client may
> call complete or addBlock on /a/b/c. The editLog then records delete /a/b
> before complete /a/b/c. When the standby NameNode replays the editLog, /a/b/c
> has already been deleted, so replaying complete /a/b/c fails.
> *The process is as follows:*
> *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c* 
> *replay  editLog order:* *delete /a/b/c ->* *delete /a/b ->* *complete /a/b/c 
> {color:#ff}(not found){color}*
> 2. If a delete operation is in the block-collection stage when an
> administrator runs saveNamespace and then restarts the NameNode, an inode that
> has already been removed from its parent's children list may remain in the
> InodeMap.
> To solve these problems, step 1 adds the inode being deleted to a Set. When a
> file write operation occurs (WriteFileOp, i.e. logAllocateBlockId/logCloseFile
> editLog entries), we check whether the file or any of its parent inodes is in
> the Set, and if so, throw a FileNotFoundException.
> In addition, saveNamespace must wait until all inodes have been removed from
> the Set before it executes.
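The Set-based guard proposed above can be sketched as follows. This is a hypothetical illustration, not NameNode code: absolute path strings stand in for INode objects, and the class and method names are invented. A write operation is rejected when the file itself or any ancestor directory is marked as being deleted, and saveNamespace would be allowed only once the set is empty.

```java
import java.io.FileNotFoundException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard for inodes under deletion; paths stand in for INodes.
class DeletingInodeGuard {
  private final Set<String> deleting = ConcurrentHashMap.newKeySet();

  // Step 1 (under the lock): remember the subtree root being deleted.
  void markDeleting(String path) {
    deleting.add(path);
  }

  // After the inodes have been removed from the InodeMap: forget the root.
  void unmarkDeleting(String path) {
    deleting.remove(path);
  }

  // saveNamespace would wait until this returns true.
  boolean isEmpty() {
    return deleting.isEmpty();
  }

  // Called before logging a write op (addBlock/complete) for filePath:
  // walk from the file up to "/" and reject it if any level is being deleted.
  void checkNotDeleting(String filePath) throws FileNotFoundException {
    String p = filePath;
    while (!p.isEmpty()) {
      if (deleting.contains(p)) {
        throw new FileNotFoundException(filePath + " is being deleted (via " + p + ")");
      }
      int slash = p.lastIndexOf('/');
      p = slash <= 0 ? "" : p.substring(0, slash);
    }
  }

  public static void main(String[] args) {
    DeletingInodeGuard guard = new DeletingInodeGuard();
    guard.markDeleting("/a/b");
    try {
      guard.checkNotDeleting("/a/b/c");
      System.out.println("complete /a/b/c allowed");
    } catch (FileNotFoundException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    guard.unmarkDeleting("/a/b");
    System.out.println("saveNamespace may run: " + guard.isEmpty());
  }
}
```

Checking exact ancestor paths, rather than string prefixes, keeps a sibling such as /a/bc from being rejected when /a/b is deleted.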






[jira] [Updated] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously

2022-01-19 Thread Xiangyi Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangyi Zhu updated HDFS-16043:
---
Description: Add markedDeleteBlockScrubberThread to delete blocks 
asynchronously.  (was: Deleting a large directory caused the NN to hold the 
lock for too long, which caused our NameNode to be killed by ZKFC.
The flame graph shows that most of the time goes to the QuotaCount 
calculation, removeBlocks(toRemovedBlocks), and inode deletion, with 
removeBlocks(toRemovedBlocks) taking the larger share.
h3. solution:

1. Process removeBlocks asynchronously. A thread is started in the 
BlockManager to process the deleted blocks and control the lock holding time.
2. Optimize the QuotaCount calculation, similar to the optimization in 
HDFS-16000.
h3. Comparison before and after optimization:

Test: delete 10 million inodes and 10 million blocks.
*before:*
remove inode elapsed time: 7691 ms
remove block elapsed time: 11107 ms
*after:*
remove inode elapsed time: 4149 ms
remove block elapsed time: 0 ms)

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> ---
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, namanode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Attachments: 20210527-after.svg, 20210527-before.svg
>
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> Add markedDeleteBlockScrubberThread to delete blocks asynchronously.
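The old description proposes a BlockManager thread that removes deleted blocks in the background while keeping lock hold times short. A minimal sketch of that idea, assuming a write lock shared with the namesystem and a per-block removal callback; `MarkedDeleteBlockScrubber`, `enqueue`, `processOne`, and the batch size are hypothetical names and parameters, not the committed HDFS-16043 code.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical background scrubber that removes marked-deleted blocks in batches. */
class MarkedDeleteBlockScrubber implements Runnable {
  private final ConcurrentLinkedQueue<List<Long>> markedDeleteQueue = new ConcurrentLinkedQueue<>();
  private final ReentrantReadWriteLock namesystemLock; // stand-in for the NameNode's global lock
  private final Consumer<Long> removeBlock;            // stand-in for the real per-block removal
  private final int batchSize;
  private volatile boolean running = true;

  MarkedDeleteBlockScrubber(ReentrantReadWriteLock lock, Consumer<Long> removeBlock, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.namesystemLock = lock;
    this.removeBlock = removeBlock;
    this.batchSize = batchSize;
  }

  /** Called by delete() after collecting blocks: hand them off instead of removing inline. */
  void enqueue(List<Long> blockIds) {
    markedDeleteQueue.add(blockIds);
  }

  /** Removes one queued list, releasing the write lock between batches. Returns false if idle. */
  boolean processOne() {
    List<Long> blocks = markedDeleteQueue.poll();
    if (blocks == null) {
      return false;
    }
    for (int start = 0; start < blocks.size(); start += batchSize) {
      int end = Math.min(start + batchSize, blocks.size());
      namesystemLock.writeLock().lock();
      try {
        for (Long id : blocks.subList(start, end)) {
          removeBlock.accept(id);
        }
      } finally {
        namesystemLock.writeLock().unlock(); // other RPCs can run between batches
      }
    }
    return true;
  }

  @Override
  public void run() {
    while (running) {
      if (!processOne()) {
        try {
          Thread.sleep(100); // nothing queued; back off briefly
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  void stop() {
    running = false;
  }
}
```

Because each batch takes and releases the lock separately, the delete RPC itself only pays for enqueuing, which is consistent with the "remove block elapsed time: 0 ms" figure reported after the change.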






[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711233=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711233
 ]

ASF GitHub Bot logged work on HDFS-16331:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:56
Start Date: 19/Jan/22 09:56
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3676:
URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016268249


   @tomscut Thanks for working on it and letting me know. I would like to 
review it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 711233)
Time Spent: 5h 40m  (was: 5.5h)

> Make dfs.blockreport.intervalMsec reconfigurable
> 
>
> Key: HDFS-16331
> URL: https://issues.apache.org/jira/browse/HDFS-16331
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2021-11-18-09-33-24-236.png, 
> image-2021-11-18-09-35-35-400.png
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> We have a cold-data cluster that stores data with an EC policy. Each node has 
> 24 fast disks of 7 TB each.
> Recently, many nodes have exceeded 10 million blocks, and the FBR interval is 
> 6h by default. Such frequent FBRs put great pressure on the NN.
> !image-2021-11-18-09-35-35-400.png|width=334,height=229!
> !image-2021-11-18-09-33-24-236.png|width=566,height=159!
> We want to increase the FBR interval, but that requires a rolling restart of 
> the DNs, which is a very heavy operation. In this scenario, it is necessary to 
> make _dfs.blockreport.intervalMsec_ reconfigurable.
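In Hadoop, runtime changes like this are normally applied through the reconfiguration framework and triggered with `hdfs dfsadmin -reconfig datanode <host:ipc_port> start`. The class below is only a hypothetical standalone stand-in showing the core idea: keep the interval in a thread-safe field, validate the new value, and let the block report scheduler read it on every cycle instead of caching it at startup. The 6h default matches the value cited above.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for a runtime-reconfigurable full block report (FBR) interval. */
class BlockReportIntervalConf {
  static final String KEY = "dfs.blockreport.intervalMsec";
  static final long DEFAULT_MS = 6L * 60 * 60 * 1000; // 6 hours

  private final AtomicLong intervalMs = new AtomicLong(DEFAULT_MS);

  /** The scheduler reads this each cycle, so a new value takes effect without a restart. */
  long getIntervalMs() {
    return intervalMs.get();
  }

  /** Applies a new value at runtime; null restores the default. Returns the effective value. */
  String reconfigure(String property, String newValue) {
    if (!KEY.equals(property)) {
      throw new IllegalArgumentException("Not reconfigurable: " + property);
    }
    long ms = (newValue == null) ? DEFAULT_MS : Long.parseLong(newValue.trim());
    if (ms <= 0) {
      throw new IllegalArgumentException(KEY + " must be positive, got " + ms);
    }
    intervalMs.set(ms);
    return Long.toString(ms);
  }
}
```

Rejecting non-positive values before storing them keeps the previous interval in force if an operator submits a bad value.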






[jira] [Commented] (HDFS-16214) Lock optimization for large deleteing, no locks on the collection block

2022-01-19 Thread Yuxuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478505#comment-17478505
 ] 

Yuxuan Wang commented on HDFS-16214:


[~zhuxiangyi] I haven't read it entirely, but it looks similar to HDFS-16043?

> Lock optimization for large deleteing, no locks on the collection block
> ---
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Xiangyi Zhu
>Assignee: Xiangyi Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The time-consuming part of a deletion lies in three pieces of logic: 
> collecting blocks, deleting inodes from the InodeMap, and deleting blocks. The 
> current deletion is divided into two major steps. Step 1 acquires the lock, 
> collects the blocks and inodes, deletes the inodes, and releases the lock. 
> Step 2 acquires the lock, deletes the blocks, and releases the lock.
> Step 2 already deletes blocks in batches, which bounds the lock holding time; 
> the blocks could also be deleted asynchronously here.
> Step 1, however, can still hold the lock for a long time.
> For step 1, we can collect the blocks without holding the lock. The process 
> is as follows. Step 1: acquire the lock, call parent.removeChild, write the 
> editLog, release the lock. Step 2: collect the blocks without the lock. 
> Step 3: acquire the lock, update the quota, release the lease, release the 
> lock. Step 4: acquire the lock, delete the inode from the InodeMap, release 
> the lock. Step 5: acquire the lock, delete the blocks, release the lock.
> The process above can cause the following problems:
> 1. Suppose /a/b/c is being written and the /a/b directory is deleted. If the 
> deletion has reached the block-collection stage, the client may now call 
> complete or addBlock on /a/b/c. That stage holds no lock, and the delete of 
> /a/b and its editLog entry have already been written successfully, so the 
> editLog order becomes delete /a/b followed by complete /a/b/c. When the 
> standby NameNode replays the editLog, /a/b/c has already been deleted, so 
> replaying complete /a/b/c fails.
> *The process is as follows:*
> *editLog write order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
> *editLog replay order: delete /a/b/c -> delete /a/b -> complete /a/b/c (not found)*
> 2. If a delete operation is in the block-collection stage when the 
> administrator runs saveNamespace and then restarts the NameNode, an inode 
> that has already been removed from its parent's children list may remain in 
> the InodeMap.
> To solve these problems, step 1 adds the inode being deleted to a Set. When a 
> file write operation (WriteFileOp: the logAllocateBlockId/logCloseFile 
> editLog entries) occurs, check whether the file or any of its parent inodes 
> is in the Set; if so, throw a FileNotFoundException.
> In addition, saveNamespace must wait until all inodes have been removed from 
> the Set before it runs.
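The Set-based guard proposed in this description can be sketched as follows. This is a minimal illustration, assuming inodes are identified by long IDs and that the caller can supply the IDs along the path from the root to the file; `DeletingInodeTracker` and its methods are hypothetical names, not the HDFS-16214 patch.

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker for inodes whose lock-free deletion is still in progress. */
class DeletingInodeTracker {
  private final Set<Long> deleting = ConcurrentHashMap.newKeySet();
  private final Object emptyMonitor = new Object();

  /** Step 1 (under the write lock): record the inode being deleted. */
  void markDeleting(long inodeId) {
    deleting.add(inodeId);
  }

  /** Called once the inode has been removed from the InodeMap. */
  void unmarkDeleting(long inodeId) {
    deleting.remove(inodeId);
    synchronized (emptyMonitor) {
      if (deleting.isEmpty()) {
        emptyMonitor.notifyAll(); // wake a waiting saveNamespace
      }
    }
  }

  /**
   * Before logging a write op (allocate block / close file), reject it if the file
   * or any ancestor is being deleted.
   *
   * @param pathInodeIds inode IDs from the root down to the file itself
   */
  void checkNotDeleting(String path, List<Long> pathInodeIds) throws FileNotFoundException {
    for (long id : pathInodeIds) {
      if (deleting.contains(id)) {
        throw new FileNotFoundException(path + " (or a parent) is being deleted");
      }
    }
  }

  /** saveNamespace waits until no deletion is in flight. */
  void awaitNoPendingDeletes() throws InterruptedException {
    synchronized (emptyMonitor) {
      while (!deleting.isEmpty()) {
        emptyMonitor.wait();
      }
    }
  }
}
```

One design caveat: in a real NameNode, saveNamespace must not block while holding a lock that the in-flight deletion needs to finish (steps 3 to 5 above), or the two would deadlock; this sketch leaves locking to the caller.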






[jira] [Updated] (HDFS-16399) Reconfig cache report parameters for datanode

2022-01-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16399:

Fix Version/s: 3.3.3

> Reconfig cache report parameters for datanode
> -
>
> Key: HDFS-16399
> URL: https://issues.apache.org/jira/browse/HDFS-16399
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>







[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711228=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711228
 ]

ASF GitHub Bot logged work on HDFS-16331:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:49
Start Date: 19/Jan/22 09:49
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #3676:
URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016262441


   > Cherry-picked it into branch-3.3, fixing small conflicts.
   
   Thanks @tasanuma for cherry-picking it.
   
   I submitted a new PR [#3831](https://github.com/apache/hadoop/pull/3831) 
with other parameters related to blockReport, but it currently conflicts with 
trunk. I'll resolve the conflict later. 
   
   I'm sorry I didn't tell you earlier. 
   
   




Issue Time Tracking
---

Worklog Id: (was: 711228)
Time Spent: 5.5h  (was: 5h 20m)

> Make dfs.blockreport.intervalMsec reconfigurable
> 
>
> Key: HDFS-16331
> URL: https://issues.apache.org/jira/browse/HDFS-16331
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2021-11-18-09-33-24-236.png, 
> image-2021-11-18-09-35-35-400.png
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> We have a cold-data cluster that stores data with an EC policy. Each node has 
> 24 fast disks of 7 TB each.
> Recently, many nodes have exceeded 10 million blocks, and the FBR interval is 
> 6h by default. Such frequent FBRs put great pressure on the NN.
> !image-2021-11-18-09-35-35-400.png|width=334,height=229!
> !image-2021-11-18-09-33-24-236.png|width=566,height=159!
> We want to increase the FBR interval, but that requires a rolling restart of 
> the DNs, which is a very heavy operation. In this scenario, it is necessary to 
> make _dfs.blockreport.intervalMsec_ reconfigurable.






[jira] [Work logged] (HDFS-16399) Reconfig cache report parameters for datanode

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16399?focusedWorklogId=711224=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711224
 ]

ASF GitHub Bot logged work on HDFS-16399:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:45
Start Date: 19/Jan/22 09:45
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3841:
URL: https://github.com/apache/hadoop/pull/3841#issuecomment-1016258796


   Cherry-picked it into branch-3.3.




Issue Time Tracking
---

Worklog Id: (was: 711224)
Time Spent: 1h 40m  (was: 1.5h)

> Reconfig cache report parameters for datanode
> -
>
> Key: HDFS-16399
> URL: https://issues.apache.org/jira/browse/HDFS-16399
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>







[jira] [Created] (HDFS-16431) Truncate CallerContext in client side

2022-01-19 Thread Chengwei Wang (Jira)
Chengwei Wang created HDFS-16431:


 Summary: Truncate CallerContext in client side
 Key: HDFS-16431
 URL: https://issues.apache.org/jira/browse/HDFS-16431
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: nn
Reporter: Chengwei Wang
Assignee: Chengwei Wang


The CallerContext context is truncated on the server side when it exceeds the 
maximum allowed length. I think it's better to check and truncate it on the 
client side, to avoid unnecessary network and NN memory overhead.
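A minimal client-side sketch of the proposal, assuming the limit is a character count that the client already knows (for example, read from the same configuration the server uses); `CallerContextTruncator` is a hypothetical helper, not Hadoop's `CallerContext` API.

```java
/** Hypothetical client-side helper: cap a caller context before it is sent to the NN. */
final class CallerContextTruncator {
  private CallerContextTruncator() {}

  static String truncate(String context, int maxLength) {
    if (maxLength < 0) {
      throw new IllegalArgumentException("maxLength must be >= 0");
    }
    if (context == null || context.length() <= maxLength) {
      return context; // nothing to trim
    }
    int end = maxLength;
    // Do not cut a UTF-16 surrogate pair in half; drop the dangling high surrogate.
    if (end > 0 && Character.isHighSurrogate(context.charAt(end - 1))) {
      end--;
    }
    return context.substring(0, end);
  }
}
```

Truncating before the RPC is built means the oversized string is never serialized or held in NameNode memory, which is the overhead the issue wants to avoid.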






[jira] [Updated] (HDFS-16400) Reconfig DataXceiver parameters for datanode

2022-01-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16400:

Fix Version/s: 3.3.3

> Reconfig DataXceiver parameters for datanode
> 
>
> Key: HDFS-16400
> URL: https://issues.apache.org/jira/browse/HDFS-16400
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> To avoid frequent rolling restarts of the DN, we should make DataXceiver 
> parameters reconfigurable.






[jira] [Work logged] (HDFS-16400) Reconfig DataXceiver parameters for datanode

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16400?focusedWorklogId=711223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711223
 ]

ASF GitHub Bot logged work on HDFS-16400:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:44
Start Date: 19/Jan/22 09:44
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3843:
URL: https://github.com/apache/hadoop/pull/3843#issuecomment-1016258225


   Cherry-picked it into branch-3.3.




Issue Time Tracking
---

Worklog Id: (was: 711223)
Time Spent: 3h 40m  (was: 3.5h)

> Reconfig DataXceiver parameters for datanode
> 
>
> Key: HDFS-16400
> URL: https://issues.apache.org/jira/browse/HDFS-16400
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> To avoid frequent rolling restarts of the DN, we should make DataXceiver 
> parameters reconfigurable.






[jira] [Updated] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2022-01-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-16331:

Fix Version/s: 3.3.3

> Make dfs.blockreport.intervalMsec reconfigurable
> 
>
> Key: HDFS-16331
> URL: https://issues.apache.org/jira/browse/HDFS-16331
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2021-11-18-09-33-24-236.png, 
> image-2021-11-18-09-35-35-400.png
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> We have a cold-data cluster that stores data with an EC policy. Each node has 
> 24 fast disks of 7 TB each.
> Recently, many nodes have exceeded 10 million blocks, and the FBR interval is 
> 6h by default. Such frequent FBRs put great pressure on the NN.
> !image-2021-11-18-09-35-35-400.png|width=334,height=229!
> !image-2021-11-18-09-33-24-236.png|width=566,height=159!
> We want to increase the FBR interval, but that requires a rolling restart of 
> the DNs, which is a very heavy operation. In this scenario, it is necessary to 
> make _dfs.blockreport.intervalMsec_ reconfigurable.






[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=711221=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711221
 ]

ASF GitHub Bot logged work on HDFS-16331:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:42
Start Date: 19/Jan/22 09:42
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3676:
URL: https://github.com/apache/hadoop/pull/3676#issuecomment-1016256231


   Cherry-picked it into branch-3.3, fixing small conflicts.




Issue Time Tracking
---

Worklog Id: (was: 711221)
Time Spent: 5h 20m  (was: 5h 10m)

> Make dfs.blockreport.intervalMsec reconfigurable
> 
>
> Key: HDFS-16331
> URL: https://issues.apache.org/jira/browse/HDFS-16331
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-11-18-09-33-24-236.png, 
> image-2021-11-18-09-35-35-400.png
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> We have a cold-data cluster that stores data with an EC policy. Each node has 
> 24 fast disks of 7 TB each.
> Recently, many nodes have exceeded 10 million blocks, and the FBR interval is 
> 6h by default. Such frequent FBRs put great pressure on the NN.
> !image-2021-11-18-09-35-35-400.png|width=334,height=229!
> !image-2021-11-18-09-33-24-236.png|width=566,height=159!
> We want to increase the FBR interval, but that requires a rolling restart of 
> the DNs, which is a very heavy operation. In this scenario, it is necessary to 
> make _dfs.blockreport.intervalMsec_ reconfigurable.






[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711211=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711211
 ]

ASF GitHub Bot logged work on HDFS-16423:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:34
Start Date: 19/Jan/22 09:34
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on pull request #3883:
URL: https://github.com/apache/hadoop/pull/3883#issuecomment-1016249836


   Thanks @tasanuma @tomscut 




Issue Time Tracking
---

Worklog Id: (was: 711211)
Time Spent: 3h 20m  (was: 3h 10m)

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We met the problem described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed on a stale storage. This results in a 
> block with many copies, and the redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!
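As we understand it, a storage is "stale" until the NameNode has received a block report from it (for example after a NameNode failover; HDFS exposes this per storage via `DatanodeStorageInfo#areBlockContentsStale`), and excess replicas on such storages are not invalidated in the meantime. The fix therefore amounts to filtering stale storages out when blocks are handed to the balancer. A minimal model of that filter, with hypothetical `Replica` and `selectMovable` names standing in for the real `getBlocks` path:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of balancer-side block selection that skips stale storages. */
class StaleAwareBlockSelector {
  /** Minimal stand-in for a replica location: block ID plus whether its storage is stale. */
  static final class Replica {
    final long blockId;
    final boolean storageStale; // no block report received yet, e.g. after NN failover

    Replica(long blockId, boolean storageStale) {
      this.blockId = blockId;
      this.storageStale = storageStale;
    }
  }

  /** Returns IDs of blocks the balancer may move, skipping replicas on stale storages. */
  static List<Long> selectMovable(List<Replica> candidates) {
    List<Long> result = new ArrayList<>();
    for (Replica r : candidates) {
      if (r.storageStale) {
        continue; // the source copy could not be removed after the move, so skip it
      }
      result.add(r.blockId);
    }
    return result;
  }
}
```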






[jira] [Work logged] (HDFS-16426) fix nextBlockReportTime when trigger full block report force

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16426?focusedWorklogId=711210=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711210
 ]

ASF GitHub Bot logged work on HDFS-16426:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:33
Start Date: 19/Jan/22 09:33
Worklog Time Spent: 10m 
  Work Description: liubingxing commented on pull request #3887:
URL: https://github.com/apache/hadoop/pull/3887#issuecomment-1016249379


   Thanks @tasanuma @tomscut 




Issue Time Tracking
---

Worklog Id: (was: 711210)
Time Spent: 2h 10m  (was: 2h)

> fix nextBlockReportTime when trigger full block report force
> 
>
> Key: HDFS-16426
> URL: https://issues.apache.org/jira/browse/HDFS-16426
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When we force a full block report from the command line, the next block 
> report time is set like this:
> nextBlockReportTime.getAndAdd(blockReportIntervalMs);
> As a result, nextBlockReportTime ends up more than blockReportIntervalMs in 
> the future. If we trigger a full block report twice, nextBlockReportTime ends 
> up more than 2 * blockReportIntervalMs in the future, which is clearly not 
> what we want.
> We fix this by setting nextBlockReportTime = now + blockReportIntervalMs 
> after a full block report is triggered from the command line.
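A minimal sketch of the difference, assuming a single `AtomicLong` holding the absolute time of the next report, as in the `getAndAdd` snippet in the description; `BlockReportScheduler` and its methods are hypothetical names, not the DataNode's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical scheduler contrasting the old and fixed updates after a forced FBR. */
class BlockReportScheduler {
  final long blockReportIntervalMs;
  final AtomicLong nextBlockReportTime;

  BlockReportScheduler(long now, long intervalMs) {
    this.blockReportIntervalMs = intervalMs;
    this.nextBlockReportTime = new AtomicLong(now + intervalMs);
  }

  /** Old behaviour: pushes an already-future deadline further out on every forced report. */
  void onForcedReportOld() {
    nextBlockReportTime.getAndAdd(blockReportIntervalMs);
  }

  /** Fixed behaviour: the next report is one interval after the forced one. */
  void onForcedReportFixed(long now) {
    nextBlockReportTime.set(now + blockReportIntervalMs);
  }
}
```

With a 100 ms interval starting at time 0, two forced reports under the old update push the deadline to 300 ms, while the fixed update at times 10 and 20 leaves it at 120 ms, one interval after the last forced report.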






[jira] [Resolved] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-19 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16423.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We met the problem described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed on a stale storage. This results in a 
> block with many copies, and the redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!






[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711179
 ]

ASF GitHub Bot logged work on HDFS-16423:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 09:00
Start Date: 19/Jan/22 09:00
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on pull request #3883:
URL: https://github.com/apache/hadoop/pull/3883#issuecomment-1016220775


   Merged it. Thanks for your contribution, @liubingxing, and thanks for your 
review, @tomscut.




Issue Time Tracking
---

Worklog Id: (was: 711179)
Time Spent: 3h 10m  (was: 3h)

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We met the problem described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed on a stale storage. This results in a 
> block with many copies, and the redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!






[jira] [Work logged] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16423?focusedWorklogId=711177=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711177
 ]

ASF GitHub Bot logged work on HDFS-16423:
-

Author: ASF GitHub Bot
Created on: 19/Jan/22 08:59
Start Date: 19/Jan/22 08:59
Worklog Time Spent: 10m 
  Work Description: tasanuma merged pull request #3883:
URL: https://github.com/apache/hadoop/pull/3883


   




Issue Time Tracking
---

Worklog Id: (was: 711177)
Time Spent: 3h  (was: 2h 50m)

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer  mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We met the problem described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed on a stale storage. This results in a 
> block with many copies, and the redundant copies are not deleted until the 
> storage is no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!


