[jira] [Work logged] (HDFS-15624) Fix the SetQuotaByStorageTypeOp problem after updating hadoop

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15624?focusedWorklogId=524282&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524282
 ]

ASF GitHub Bot logged work on HDFS-15624:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 07:08
Start Date: 15/Dec/20 07:08
Worklog Time Spent: 10m 
  Work Description: huangtianhua commented on pull request #2377:
URL: https://github.com/apache/hadoop/pull/2377#issuecomment-745100099


   @ayushtkn Hi, maybe you can help review this :)? Thanks very much!





Issue Time Tracking
---

Worklog Id: (was: 524282)
Time Spent: 7h 20m  (was: 7h 10m)

>  Fix the SetQuotaByStorageTypeOp problem after updating hadoop 
> ---
>
> Key: HDFS-15624
> URL: https://issues.apache.org/jira/browse/HDFS-15624
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: YaYun Wang
>Priority: Major
>  Labels: pull-request-available, release-blocker
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> HDFS-15025 adds a new storage type, NVDIMM, which changes the ordinal() 
> values of the StorageType enum. Setting a quota by storage type depends on 
> the ordinal(), so quotas set before an upgrade may become invalid after it.
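
To see why persisting an ordinal() is fragile, here is a minimal, 
self-contained sketch; the enum contents are illustrative stand-ins, not 
Hadoop's actual StorageType declaration order.

{code:java}
// Illustrative only: the constant order below is assumed for the demo.
enum StorageTypeOld { RAM_DISK, SSD, DISK, ARCHIVE }
enum StorageTypeNew { NVDIMM, RAM_DISK, SSD, DISK, ARCHIVE } // NVDIMM inserted

public class OrdinalPitfall {
  public static void main(String[] args) {
    // Suppose an edit-log entry recorded a quota for SSD by its ordinal:
    int persisted = StorageTypeOld.SSD.ordinal();           // 1
    // Replaying that ordinal against the new enum resolves differently:
    System.out.println(StorageTypeNew.values()[persisted]); // RAM_DISK, not SSD
  }
}
{code}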






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249504#comment-17249504
 ] 

Ayush Saxena commented on HDFS-15170:
-

Thanx [~weichiu] for the commit.

This broke compilation on 3.2 and 3.1, since the config key 
{{DFS_NAMENODE_CORRUPT_BLOCK_DELETE_IMMEDIATELY_ENABLED}} wasn't present there.

I have quickly backported HDFS-15200 and HDFS-15187, and things seem to be 
working (I just ran the related UTs).

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch, HDFS-15170-04.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode stopped in step 2.
> The block report from that datanode will mark the block corrupt, and 
> invalidation won't remove it, since post failover the blocks would be on 
> stale storage.
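
A rough MiniDFSCluster sketch of these steps, assuming the XOR-2-1-1024k 
system EC policy so only a handful of datanodes are needed; this is 
illustrative scaffolding, not the unit test from the patch.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.MiniDFSNNTopology;

public class EcFailoverRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .nnTopology(MiniDFSNNTopology.simpleHATopology())
        .numDataNodes(4)  // XOR-2-1 needs 3 DNs; one spare for the stop/restart
        .build();
    try {
      cluster.waitActive();
      cluster.transitionToActive(0);
      DistributedFileSystem fs = cluster.getFileSystem(0);
      fs.enableErasureCodingPolicy("XOR-2-1-1024k");
      fs.mkdirs(new Path("/ec"));
      fs.setErasureCodingPolicy(new Path("/ec"), "XOR-2-1-1024k");

      FSDataOutputStream out = fs.create(new Path("/ec/file"));
      out.write(new byte[3 * 1024 * 1024]);  // steps 1-2: more than one stripe
      MiniDFSCluster.DataNodeProperties dn = cluster.stopDataNode(0);
      out.write(new byte[1024 * 1024]);      // step 3: keep writing
      out.close();                           // step 4

      cluster.transitionToStandby(0);        // step 5: standby and back
      cluster.transitionToActive(0);

      cluster.restartDataNode(dn, true);     // step 6: bring the DN back
      // ... now inspect the corrupt replica count on the active NN.
    } finally {
      cluster.shutdown();
    }
  }
}
{code}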






[jira] [Commented] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-12-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249502#comment-17249502
 ] 

Ayush Saxena commented on HDFS-15200:
-

backported to 3.2 and 3.1

> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15200-01.patch, HDFS-15200-02.patch, 
> HDFS-15200-03.patch, HDFS-15200-04.patch, HDFS-15200-05.patch
>
>
> Presently, before adding a replica to invalidates, {{invalidateBlock(..)}} 
> checks whether any replica of the block is on stale storage; if so, it 
> postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> }
> {code}
>  
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't be corrected.
> One outcome of the present behavior is the namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN marks it for deletion and 
> removes it from corruptReplicas and excessRedundancyMap.
> If failover happens before the replica is deleted, the standby Namenode 
> marks all the storages as stale.
> It then starts processing IBRs; since the replicas are now on stale storage, 
> it skips both the deletion and the removal from corruptReplicas.
> Hence the two namenodes show different numbers and different corrupt 
> replicas.
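
A minimal sketch of the proposed short-circuit, reusing the names from the 
snippet above (nr, blockLog, postponeBlock, b, dn) plus a hypothetical 
isCorruptReplica flag; it shows the idea, not the committed change.

{code:java}
if (nr.replicasOnStaleNodes() > 0 && !isCorruptReplica) {
  // Postpone only for non-corrupt replicas: a stale block report might still
  // prove a healthy replica exists, but a corrupt replica can never recover.
  blockLog.debug("BLOCK* invalidateBlocks: postponing invalidation of {} on {}",
      b, dn);
  postponeBlock(b.getCorrupted());
  return false;
}
// A corrupt replica falls through and is added to invalidates immediately.
{code}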






[jira] [Updated] (HDFS-15200) Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage

2020-12-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15200:

Fix Version/s: 3.2.3
   3.1.5

> Delete Corrupt Replica Immediately Irrespective of Replicas On Stale Storage 
> -
>
> Key: HDFS-15200
> URL: https://issues.apache.org/jira/browse/HDFS-15200
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15200-01.patch, HDFS-15200-02.patch, 
> HDFS-15200-03.patch, HDFS-15200-04.patch, HDFS-15200-05.patch
>
>
> Presently, before adding a replica to invalidates, {{invalidateBlock(..)}} 
> checks whether any replica of the block is on stale storage; if so, it 
> postpones deletion of the replica.
> Here:
> {code:java}
> // Check how many copies we have of the block
> if (nr.replicasOnStaleNodes() > 0) {
>   blockLog.debug("BLOCK* invalidateBlocks: postponing " +
>       "invalidation of {} on {} because {} replica(s) are located on " +
>       "nodes with potentially out-of-date block reports", b, dn,
>       nr.replicasOnStaleNodes());
>   postponeBlock(b.getCorrupted());
>   return false;
> }
> {code}
>  
> In the case of a corrupt replica, we can skip this logic and delete the 
> corrupt replica immediately, as a corrupt replica can't be corrected.
> One outcome of the present behavior is the namenodes showing different block 
> states post failover:
> If a replica is marked corrupt, the Active NN marks it for deletion and 
> removes it from corruptReplicas and excessRedundancyMap.
> If failover happens before the replica is deleted, the standby Namenode 
> marks all the storages as stale.
> It then starts processing IBRs; since the replicas are now on stale storage, 
> it skips both the deletion and the removal from corruptReplicas.
> Hence the two namenodes show different numbers and different corrupt 
> replicas.






[jira] [Updated] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover

2020-12-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15187:

Fix Version/s: 3.2.3
   3.1.5

> CORRUPT replica mismatch between namenodes after failover
> -
>
> Key: HDFS-15187
> URL: https://issues.apache.org/jira/browse/HDFS-15187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, 
> HDFS-15187-03.patch
>
>
> A corrupt replica identified by the Active Namenode isn't identified by the 
> other Namenode once that one fails over to Active, in the case where the 
> replica is marked corrupt due to updatePipeline.
> Scenario to repro:
> 1. Create a file; while writing, bring one datanode down to trigger update 
> pipeline.
> 2. Write some more data.
> 3. Close the file.
> 4. Restart the shut-down datanode.
> 5. The replica on that datanode will be identified as CORRUPT and the 
> corrupt count will be 1.
> 6. Failover to the other Namenode.
> 7. Wait for all pending IBR processing.
> 8. The corrupt count will not be the same, and FSCK won't show the corrupt 
> replica.
> 9. Failover back to the first namenode.
> 10. The corrupt count and corrupt replica will be there.
> The two Namenodes show different state.






[jira] [Commented] (HDFS-15187) CORRUPT replica mismatch between namenodes after failover

2020-12-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249500#comment-17249500
 ] 

Ayush Saxena commented on HDFS-15187:
-

Cherry-picked to 3.2 and 3.1

> CORRUPT replica mismatch between namenodes after failover
> -
>
> Key: HDFS-15187
> URL: https://issues.apache.org/jira/browse/HDFS-15187
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15187-01.patch, HDFS-15187-02.patch, 
> HDFS-15187-03.patch
>
>
> A corrupt replica identified by the Active Namenode isn't identified by the 
> other Namenode once that one fails over to Active, in the case where the 
> replica is marked corrupt due to updatePipeline.
> Scenario to repro:
> 1. Create a file; while writing, bring one datanode down to trigger update 
> pipeline.
> 2. Write some more data.
> 3. Close the file.
> 4. Restart the shut-down datanode.
> 5. The replica on that datanode will be identified as CORRUPT and the 
> corrupt count will be 1.
> 6. Failover to the other Namenode.
> 7. Wait for all pending IBR processing.
> 8. The corrupt count will not be the same, and FSCK won't show the corrupt 
> replica.
> 9. Failover back to the first namenode.
> 10. The corrupt count and corrupt replica will be there.
> The two Namenodes show different state.






[jira] [Commented] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249479#comment-17249479
 ] 

Ayush Saxena commented on HDFS-15728:
-

Merged PR to trunk

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Resolved] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-15728.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Updated] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15170:
---
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks Ayush for the patch. The change is merged into trunk all the way to 
branch-3.1

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch, HDFS-15170-04.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode stopped in step 2.
> The block report from that datanode will mark the block corrupt, and 
> invalidation won't remove it, since post failover the blocks would be on 
> stale storage.






[jira] [Work logged] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?focusedWorklogId=524239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524239
 ]

ASF GitHub Bot logged work on HDFS-15728:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 04:42
Start Date: 15/Dec/20 04:42
Worklog Time Spent: 10m 
  Work Description: ayushtkn edited a comment on pull request #2534:
URL: https://github.com/apache/hadoop/pull/2534#issuecomment-745047847


   Merged to trunk.
   @liuyanpunk Thanx for the contribution, and next time please raise the PR 
against the trunk branch, not master. Anyway, this time I have sorted it out.
   Thanx @jojochuang for the review!!!





Issue Time Tracking
---

Worklog Id: (was: 524239)
Time Spent: 50m  (was: 40m)

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Work logged] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?focusedWorklogId=524238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524238
 ]

ASF GitHub Bot logged work on HDFS-15728:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 04:41
Start Date: 15/Dec/20 04:41
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2534:
URL: https://github.com/apache/hadoop/pull/2534#issuecomment-745047847


   Merged to trunk.
   @liuyanpunk Thanx for the contribution, and next time please raise the PR 
against the trunk branch, not master. Anyway, this time I have sorted it out.
   Thanx @jojochuang for the review!!!





Issue Time Tracking
---

Worklog Id: (was: 524238)
Time Spent: 40m  (was: 0.5h)

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Work logged] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?focusedWorklogId=524234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524234
 ]

ASF GitHub Bot logged work on HDFS-15728:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 04:21
Start Date: 15/Dec/20 04:21
Worklog Time Spent: 10m 
  Work Description: ayushtkn merged pull request #2534:
URL: https://github.com/apache/hadoop/pull/2534


   





Issue Time Tracking
---

Worklog Id: (was: 524234)
Time Spent: 0.5h  (was: 20m)

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Updated] (HDFS-15728) Update description of dfs.datanode.handler.count in hdfs-default.xml

2020-12-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15728:

Summary: Update description of dfs.datanode.handler.count in 
hdfs-default.xml  (was: Updating definition to dfs.datanode.handler.count 
documentation description.)

> Update description of dfs.datanode.handler.count in hdfs-default.xml
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Assigned] (HDFS-15728) Updating definition to dfs.datanode.handler.count documentation description.

2020-12-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-15728:
---

Assignee: liuyan

> Updating definition to dfs.datanode.handler.count documentation description.
> 
>
> Key: HDFS-15728
> URL: https://issues.apache.org/jira/browse/HDFS-15728
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Reporter: liuyan
>Assignee: liuyan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For dfs.namenode.handler.count, the documentation reads:
>  _"The number of Namenode RPC server threads that listen to requests from 
> clients. If dfs.namenode.servicerpc-address is not configured then Namenode 
> RPC server threads listen to requests from all nodes."_
> However, for dfs.datanode.handler.count the documentation reads only:
> _"The number of server threads for the datanode."_
>  
> The purpose of this Jira is to update the description of 
> dfs.datanode.handler.count to 
> _"The number of Datanode RPC server threads that listen to requests from 
> client."_ to make it clearer to users.






[jira] [Commented] (HDFS-15729) Show progress of Balancer in Namenode UI

2020-12-14 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249471#comment-17249471
 ] 

Srinivasu Majeti commented on HDFS-15729:
-

Sure [~weichiu], any UI that's useful for tracking progress :). Thank you.

> Show progress of Balancer in Namenode UI
> 
>
> Key: HDFS-15729
> URL: https://issues.apache.org/jira/browse/HDFS-15729
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 3.1.4
>Reporter: Srinivasu Majeti
>Priority: Major
>
> It would be nice to track the Balancer process in the Namenode UI, showing 
> whether it is running and how far along it is. This would be similar to the 
> Namenode startup progress.






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249431#comment-17249431
 ] 

Hadoop QA commented on HDFS-15170:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
47s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 1 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 33m 
10s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
17s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
18s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 55s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
28s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
7s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; 
considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
5s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
11s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
38s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 50s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} the 

[jira] [Work logged] (HDFS-15704) Mitigate lease monitor's rapid infinite loop

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15704?focusedWorklogId=524201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524201
 ]

ASF GitHub Bot logged work on HDFS-15704:
-

Author: ASF GitHub Bot
Created on: 15/Dec/20 01:37
Start Date: 15/Dec/20 01:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2511:
URL: https://github.com/apache/hadoop/pull/2511#issuecomment-744977311


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 27s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch 
appears to include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  35m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  checkstyle  |   0m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 20s |  |  branch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +0 :ok: |  spotbugs  |   3m 16s |  |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   3m 12s |  |  trunk passed  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javac  |   1m 14s |  |  
hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 with 
JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 generated 0 new + 599 unchanged - 3 
fixed = 599 total (was 602)  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  
hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01
 with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 generated 0 new 
+ 583 unchanged - 3 fixed = 583 total (was 586)  |
   | +1 :green_heart: |  checkstyle  |   0m 41s |  |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 26 unchanged - 1 
fixed = 26 total (was 27)  |
   | +1 :green_heart: |  mvnsite  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no 
whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  16m 50s |  |  patch has no errors 
when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04  |
   | +1 :green_heart: |  javadoc  |   1m 22s |  |  the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01  |
   | +1 :green_heart: |  findbugs  |   3m 17s |  |  the patch passed  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 117m  8s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 214m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2511/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2511 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux a60018bc009b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 3234e5eaf36 |
   | Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
   |  Test 

[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249354#comment-17249354
 ] 

Wei-Chiu Chuang commented on HDFS-15170:


I'll commit the patch. The 04 patch is essentially the 03 patch with one word 
removed in a comment. Attaching it here for future reference.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch, HDFS-15170-04.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode stopped in step 2.
> The block report from that datanode will mark the block corrupt, and 
> invalidation won't remove it, since post failover the blocks would be on 
> stale storage.






[jira] [Updated] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15170:
---
Attachment: HDFS-15170-04.patch

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch, HDFS-15170-04.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode stopped in step 2.
> The block report from that datanode will mark the block corrupt, and 
> invalidation won't remove it, since post failover the blocks would be on 
> stale storage.






[jira] [Commented] (HDFS-15170) EC: Block gets marked as CORRUPT in case of failover and pipeline recovery

2020-12-14 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249350#comment-17249350
 ] 

Wei-Chiu Chuang commented on HDFS-15170:


We should move forward with the fix... even though HDFS-15200 mostly covered it 
in trunk, we should fix it in the lower release lines as well.

> EC: Block gets marked as CORRUPT in case of failover and pipeline recovery
> --
>
> Key: HDFS-15170
> URL: https://issues.apache.org/jira/browse/HDFS-15170
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
> Attachments: HDFS-15170-01.patch, HDFS-15170-02.patch, 
> HDFS-15170-03.patch
>
>
> Steps to repro:
> 1. Start writing an EC file.
> 2. After more than one stripe has been written, stop one datanode.
> 3. Post pipeline recovery, keep writing data.
> 4. Close the file.
> 5. Transition the namenode to standby and back to active.
> 6. Restart the datanode stopped in step 2.
> The block report from that datanode will mark the block corrupt, and 
> invalidation won't remove it, since post failover the blocks would be on 
> stale storage.






[jira] [Work logged] (HDFS-15716) TestUpgradeDomainBlockPlacementPolicy flaky

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15716?focusedWorklogId=524136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524136
 ]

ASF GitHub Bot logged work on HDFS-15716:
-

Author: ASF GitHub Bot
Created on: 14/Dec/20 21:59
Start Date: 14/Dec/20 21:59
Worklog Time Spent: 10m 
  Work Description: amahussein commented on pull request #2528:
URL: https://github.com/apache/hadoop/pull/2528#issuecomment-744736690


   > The list of failed unit tests in the last few days is getting worse and 
worse.
   > @amahussein, you've been making lots of fixes in the last month; any idea 
why is this suddenly getting so bad?
   
   Thanks @goiri. 
   I took a look at   the  build latest 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/355/
   
   ```bash
   Test Result (23 failures / -45)
   
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithStripedFile
   
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithIncludeListWithPorts
   
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerWithSortTopNodes
   
org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerCliWithIncludeListWithPorts
   
org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks.testSetRepIncWithUnderReplicatedBlocks
   org.apache.hadoop.hdfs.server.namenode.TestFsck.testECFsck
   
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater.testNodeDecommision
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppStateXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppQueueXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppTimeoutsXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetContainersXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppPriorityXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testUpdateAppQueueXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppTimeoutXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppAttemptXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetAppAttemptXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppStateXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testAppPriorityXML
   
org.apache.hadoop.yarn.server.router.webapp.TestRouterWebServicesREST.testGetAppsMultiThread
   
org.apache.hadoop.tools.dynamometer.TestDynamometerInfra.org.apache.hadoop.tools.dynamometer.TestDynamometerInfra
   
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers
   
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithEnforceExecutionType
   
   ```
   
   - TestFsck: could have started failing after the installation of intel-ISA.
   - TestUnderReplicatedBlocks: I remember seeing that unit test fail before.
   - TestBalancer: interesting that there are several failures. I haven't 
looked into that yet; I guess there is a race condition somewhere in the code 
path.
   - TestRouterWebServicesREST, TestDynamometerInfra, TestDistributedShell: 
have been failing for some time now.
   
   By the way, I found that TestDistributedShell does not clean up at all. The 
problem is that the two failing unit tests leave several processes running for 
some time. That could be one of the reasons the system crashes, as the 
background containers eat memory and CPU resources.
   I am going to address that soon. Hopefully this will improve the stability 
of the overall Yetus execution.
   
   





Issue Time Tracking
---

Worklog Id: (was: 524136)
Time Spent: 1h 50m  (was: 1h 40m)

> TestUpgradeDomainBlockPlacementPolicy flaky
> ---
>
> Key: HDFS-15716
> URL: https://issues.apache.org/jira/browse/HDFS-15716
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode, test
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In some slow runs 

[jira] [Commented] (HDFS-15729) Show progress of Balancer in Namenode UI

2020-12-14 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17249306#comment-17249306
 ] 

Wei-Chiu Chuang commented on HDFS-15729:


The Balancer runs as a separate process, so it's probably not a good idea to 
add its progress to the NN web UI.

That said, we recently made the balancer a daemon, which exposes JMX. Maybe we 
can add a UI to the balancer and let it show progress.

> Show progress of Balancer in Namenode UI
> 
>
> Key: HDFS-15729
> URL: https://issues.apache.org/jira/browse/HDFS-15729
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 3.1.4
>Reporter: Srinivasu Majeti
>Priority: Major
>
> It would be nice to track the Balancer process in the Namenode UI, showing 
> whether it is running and how far along it is. This would be similar to the 
> Namenode startup progress.






[jira] [Work logged] (HDFS-15704) Mitigate lease monitor's rapid infinite loop

2020-12-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15704?focusedWorklogId=524104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-524104
 ]

ASF GitHub Bot logged work on HDFS-15704:
-

Author: ASF GitHub Bot
Created on: 14/Dec/20 20:50
Start Date: 14/Dec/20 20:50
Worklog Time Spent: 10m 
  Work Description: jbrennan333 commented on a change in pull request #2511:
URL: https://github.com/apache/hadoop/pull/2511#discussion_r542744744



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
##
@@ -92,21 +91,15 @@
   private long lastHolderUpdateTime;
   private String internalLeaseHolder;
 
+  //
   // Used for handling lock-leases
   // Mapping: leaseHolder -> Lease
-  private final SortedMap<String, Lease> leases = new TreeMap<>();
-  // Set of: Lease
-  private final NavigableSet<Lease> sortedLeases = new TreeSet<>(
-      new Comparator<Lease>() {
-        @Override
-        public int compare(Lease o1, Lease o2) {
-          if (o1.getLastUpdate() != o2.getLastUpdate()) {
-            return Long.signum(o1.getLastUpdate() - o2.getLastUpdate());
-          } else {
-            return o1.holder.compareTo(o2.holder);
-          }
-        }
-      });
+  // TreeMap has O(log(n)) complexity but it is more space efficient
+  // compared to HashMap. Therefore, replacing TreeMap with a
+  // HashMap can be considered to get faster O(1) time complexity
+  // on the expense of 30% memory waste.
+  //

Review comment:
   This explanation belongs in the Jira, not in a comment in the code.  
When looking at current code, it's not really clear why you are talking about 
TreeMap at all.
   
   Also, I think this comment misses the point of the TreeMap.  The reason a 
TreeMap was used here was to maintain a sorted order, which allowed the 
checkLeases() to exit the while loop as soon as it hit an unexpired lease. 
   
   The new design removes the need for the TreeMap by pruning the list it 
passes to checkLeases().

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
##
@@ -541,10 +550,10 @@ public void run() {
            fsnamesystem.getEditLog().logSync();
          }
        }
-
-        Thread.sleep(fsnamesystem.getLeaseRecheckIntervalMs());
      } catch(InterruptedException ie) {
-        LOG.debug("{} is interrupted", name, ie);
+        if (LOG.isDebugEnabled()) {

Review comment:
   This LOG.isDebugEnabled() check is not needed anymore.

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
##
@@ -515,6 +508,13 @@ public void setLeasePeriod(long softLimit, long hardLimit) {
     this.softLimit = softLimit;
     this.hardLimit = hardLimit;
   }
+
+  private synchronized Collection<Lease> getExpiredCandidateLeases() {
+    final long now = Time.monotonicNow();
+    return leases.values().stream()
+        .filter(lease -> lease.expiredHardLimit(now))
+        .collect(Collectors.toCollection(HashSet::new));
+  }

Review comment:
   I much prefer the loop in @daryn-sharp's original code.
   
   Collection<Lease> expired = new HashSet<>();
   for (Lease lease : leases) {
     if (lease.expiredHardLimit(now)) {
       expired.add(lease);
     }
   }
   This streams code will have to change if we want to pull this back to 
branch-2.
   I think @daryn-sharp also said that stream()s are more expensive.







Issue Time Tracking
---

Worklog Id: (was: 524104)
Time Spent: 1h  (was: 50m)

> Mitigate lease monitor's rapid infinite loop
> 
>
> Key: HDFS-15704
> URL: https://issues.apache.org/jira/browse/HDFS-15704
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [~daryn] reported that the lease monitor goes into a rapid infinite loop if 
> an exception occurs during a lease recovery. The two main issues are:
> # the lease monitor thread does not sleep if an exception occurs before 
> looping again
> # the loop peeks at the first element of a sorted tree set, so when an 
> exception occurs the "bad" lease remains the first element, preventing 
> recovery of 
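
A minimal sketch of a fix for the first issue, assuming names loosely modeled 
on LeaseManager's monitor (checkLeases, getLeaseRecheckIntervalMs); not the 
actual patch. The point is that the sleep runs whether or not the check threw, 
so one bad lease cannot turn the loop into a busy spin.

{code:java}
while (fsnamesystem.isRunning()) {
  try {
    checkLeases();
  } catch (Throwable t) {
    // Log and fall through to the sleep instead of looping immediately.
    LOG.warn("Lease monitor iteration failed", t);
  }
  try {
    Thread.sleep(fsnamesystem.getLeaseRecheckIntervalMs());
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
    break;
  }
}
{code}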

[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-14 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15725:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, 
> HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at 
> the time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into 
> a committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, closing the 
> file and moving the replicas to FINALIZED.
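
A rough sketch of that idea, using the real server-side state 
BlockUCState.COMMITTED and hypothetical helpers (hasMinStorages, 
revertToUnderConstruction); the real change sits in the namenode's lease 
recovery path and is more involved.

{code:java}
BlockInfo lastBlock = file.getLastBlock();
if (lastBlock != null
    && lastBlock.getBlockUCState() == BlockUCState.COMMITTED
    && !hasMinStorages(lastBlock)) {
  // The client died after "complete": the NN sees COMMITTED while the DNs
  // still hold RBW replicas. Revert to UNDER_CONSTRUCTION so the next lease
  // recovery attempt runs block recovery, finalizing the replicas and letting
  // the file close.
  revertToUnderConstruction(file, lastBlock);
  return false; // recovery not finished; the lease monitor will retry
}
{code}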






[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-14 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15725:
-
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, 
> HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at 
> the time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into 
> a committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, closing the 
> file and moving the replicas to FINALIZED.






[jira] [Commented] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2020-12-14 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248969#comment-17248969
 ] 

Stephen O'Donnell commented on HDFS-15725:
--

Committed this all the way down to branch-2.10. Thanks for the reviews 
[~kihwal] and [~szetszwo].

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, 
> HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at 
> the time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into 
> a committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs and the block is still waiting on 
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, closing the 
> file and moving the replicas to FINALIZED.






[jira] [Commented] (HDFS-13965) hadoop.security.kerberos.ticket.cache.path setting is not honored when KMS encryption is enabled.

2020-12-14 Thread Arun Prabu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17248902#comment-17248902
 ] 

Arun Prabu commented on HDFS-13965:
---

Thanks for the suggestion. We are planning to use a ticket cache file specific 
to our software, as that would solve the issue. We will validate this 
completely and get back.

> hadoop.security.kerberos.ticket.cache.path setting is not honored when KMS 
> encryption is enabled.
> -
>
> Key: HDFS-13965
> URL: https://issues.apache.org/jira/browse/HDFS-13965
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, kms
>Affects Versions: 2.7.3, 2.7.7
>Reporter: LOKESKUMAR VIJAYAKUMAR
>Assignee: Kitti Nanasi
>Priority: Major
>
> _We use the *+hadoop.security.kerberos.ticket.cache.path+* setting to provide 
> a custom kerberos cache path for all hadoop operations to be run as a 
> specified user. But this setting is not honored when KMS encryption is 
> enabled._
> _The program below, which reads a file, works when KMS encryption is not 
> enabled, but fails when KMS encryption is enabled._
> _It looks like the *hadoop.security.kerberos.ticket.cache.path* setting is 
> not honored by *createConnection in KMSClientProvider.java.*_
>  
> HadoopTest.java (CLASSPATH needs to be set to compile and run):
> {code:java}
> import java.io.InputStream;
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class HadoopTest {
>   public static int runRead(String[] args) throws Exception {
>     if (args.length < 3) {
>       System.err.println("HadoopTest hadoop_file_path hadoop_user kerberos_cache");
>       return 1;
>     }
>     Path inputPath = new Path(args[0]);
>     Configuration conf = new Configuration();
>     URI defaultURI = FileSystem.getDefaultUri(conf);
>     conf.set("hadoop.security.kerberos.ticket.cache.path", args[2]);
>     FileSystem fs = FileSystem.newInstance(defaultURI, conf, args[1]);
>     InputStream is = fs.open(inputPath);
>     byte[] buffer = new byte[4096];
>     int nr = is.read(buffer);
>     while (nr != -1) {
>       System.out.write(buffer, 0, nr);
>       nr = is.read(buffer);
>     }
>     return 0;
>   }
>
>   public static void main(String[] args) throws Exception {
>     int returnCode = HadoopTest.runRead(args);
>     System.exit(returnCode);
>   }
> }
> {code}
>  
>  
>  
> [root@lstrost3 testhadoop]# pwd
> /testhadoop
>  
> [root@lstrost3 testhadoop]# ls
> HadoopTest.java
>  
> [root@lstrost3 testhadoop]# export CLASSPATH=`hadoop classpath --glob`:.
>  
> [root@lstrost3 testhadoop]# javac HadoopTest.java
>  
> [root@lstrost3 testhadoop]# java HadoopTest
> HadoopTest  hadoop_file_path  hadoop_user  kerberos_cache
>  
> [root@lstrost3 testhadoop]# java HadoopTest /loki/loki.file loki 
> /tmp/krb5cc_1006
> 18/09/27 23:23:20 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 18/09/27 23:23:21 WARN shortcircuit.DomainSocketFactory: The short-circuit 
> local reads feature cannot be used because libhadoop cannot be loaded.
> Exception in thread "main" java.io.IOException: 
> org.apache.hadoop.security.authentication.client.AuthenticationException: 
> GSSException: *{color:#FF0000}No valid credentials provided (Mechanism level: 
> Failed to find any Kerberos tgt){color}*
>     at 
> {color:#FF0000}*org.apache.hadoop.crypto.key.kms.KMSClientProvider.createConnection(KMSClientProvider.java:551)*{color}
>     at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:831)
>     at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>     at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1393)
>     at 
> org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:1463)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:333)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
>     at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:340)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
>