[jira] [Commented] (HDFS-15273) CacheReplicationMonitor hold lock for long time and lead to NN out of service

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099476#comment-17099476
 ] 

Wei-Chiu Chuang commented on HDFS-15273:


Thanks for reporting the issue! Do you have any estimate of how many cache 
directives correspond to how much scan time? Any data points? Thanks

> CacheReplicationMonitor hold lock for long time and lead to NN out of service
> -
>
> Key: HDFS-15273
> URL: https://issues.apache.org/jira/browse/HDFS-15273
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: caching, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
>
> CacheReplicationMonitor scans the cache directives and the cached block map 
> periodically. If we add more and more cache directives, 
> CacheReplicationMonitor will take a very long time to rescan all of the cache 
> directives and cached blocks. Meanwhile, the scan operation holds the global 
> write lock, so during the scan period the NameNode cannot process other 
> requests.
> So I think we should warn end users who turn on the CacheManager feature 
> about this risk, until this implementation is improved.
> {code:java}
>   private void rescan() throws InterruptedException {
> scannedDirectives = 0;
> scannedBlocks = 0;
> try {
>   namesystem.writeLock(); // NB: global write lock held for the entire rescan
>   try {
> lock.lock();
> if (shutdown) {
>   throw new InterruptedException("CacheReplicationMonitor was " +
>   "shut down.");
> }
> curScanCount = completedScanCount + 1;
>   } finally {
> lock.unlock();
>   }
>   resetStatistics();
>   rescanCacheDirectives();
>   rescanCachedBlockMap();
>   blockManager.getDatanodeManager().resetLastCachingDirectiveSentTime();
> } finally {
>   namesystem.writeUnlock();
> }
>   }
> {code}
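
A hedged sketch of how to produce such a data point (the pool name and file 
paths below are hypothetical): create a large number of directives, then watch 
the NameNode's lock-held warnings around the next rescan.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

/** Add n cache directives so a single rescan has n directives to process. */
static void addDirectives(Configuration conf, int n) throws IOException {
  DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
  for (int i = 0; i < n; i++) {
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/file-" + i)) // hypothetical paths
        .setPool("testPool")                  // hypothetical pool name
        .build());
  }
}
{code}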






[jira] [Updated] (HDFS-15272) Backport HDFS-12862 to branch-3.1

2020-05-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15272:
---
Fix Version/s: 3.1.5
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Backport HDFS-12862 to branch-3.1
> -
>
> Key: HDFS-15272
> URL: https://issues.apache.org/jira/browse/HDFS-15272
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.4
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.1.5
>
> Attachments: HDFS-15272.branch-3.1.001.patch
>
>
> Backport HDFS-12862 (CacheDirective becomes invalid when NN restarts or 
> fails over) to branch-3.1.4.






[jira] [Commented] (HDFS-14599) HDFS-12487 breaks test TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099463#comment-17099463
 ] 

Wei-Chiu Chuang commented on HDFS-14599:


Sorry I missed this one. Just cherry-picked the change to branch-3.2 and 
branch-3.1.
[~gabor.bota] fyi (3.1.4 RM)

> HDFS-12487 breaks test 
> TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty
> -
>
> Key: HDFS-14599
> URL: https://issues.apache.org/jira/browse/HDFS-14599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Wei-Chiu Chuang
>Assignee: Xiaoqiao He
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.2, 3.1.5
>
> Attachments: HDFS-14599.001.patch, HDFS-14599.002.patch
>
>
> It looks like HDFS-12487 changed the error message expected by 
> {{TestDiskBalancer#testDiskBalancerWithFedClusterWithOneNameServiceEmpty}}.
> The test expects the error "There are no blocks in the blockPool", but after 
> HDFS-12487 the returned error string is "NextBlock call returned null.No valid 
> block to copy."
> Probably the simplest way to fix it is to update the expected error 
> string.
> Thoughts? [~bharatviswa], you crafted the test in HDFS-13715. Should we update 
> the expected error string, or revert HDFS-12487?
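
A minimal sketch of that fix, assuming the test asserts on the exception text 
(the try/fail scaffolding below is hypothetical; the real assertion lives in 
TestDiskBalancer):

{code:java}
try {
  // ... drive the disk balancer against the name service with no blocks ...
  fail("expected the block copy to be rejected");
} catch (Exception e) {
  // Old expectation: "There are no blocks in the blockPool"
  GenericTestUtils.assertExceptionContains(
      "NextBlock call returned null.No valid block to copy.", e);
}
{code}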






[jira] [Commented] (HDFS-15305) Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme configurable.

2020-05-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099464#comment-17099464
 ] 

Hudson commented on HDFS-15305:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18216 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18216/])
HDFS-15305. Extend ViewFS and provide ViewFileSystemOverloadScheme (github: rev 
9c8236d04dfc3d4cefe7a00b63625f60ee232cfe)
* (add) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemOverloadSchemeLocalFileSystem.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/viewfs/ViewFileSystemBaseTest.java
* (edit) 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileSystemContractBaseTest.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemOverloadSchemeHdfsFileSystemContract.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemOverloadSchemeWithHdfsScheme.java
* (add) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystemOverloadScheme.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FsConstants.java


> Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme 
> configurable.
> ---
>
> Key: HDFS-15305
> URL: https://issues.apache.org/jira/browse/HDFS-15305
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hadoop-client, hdfs-client, viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> Provide a ViewFsOverloadScheme implementation by extending the ViewFileSystem 
> class.
>  # When the target scheme and the uri scheme match, it should create the 
> target filesystems in a different way than via the FileSystem.get API (which 
> would loop back into the overloaded scheme).
>  # Provide the flexibility to configure the overload scheme.
> ex: by setting the hdfs scheme impl to ViewFsOverloadScheme, users should be 
> able to continue working with hdfs scheme uris and should be able to mount 
> any hadoop compatible file system as a target. It will follow the same mount 
> link configuration pattern as ViewFileSystem. 
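
A hedged configuration sketch of that idea (the mount table name "cluster1" 
and the link targets are hypothetical; the key names follow the ViewFS 
mount-link pattern mentioned above):

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// Overload the hdfs scheme with the ViewFS-based implementation.
conf.set("fs.hdfs.impl",
    "org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme");
// Mount links follow the ViewFileSystem pattern; the uri authority selects
// the mount table.
conf.set("fs.viewfs.mounttable.cluster1.link./data", "hdfs://nn1/data");
conf.set("fs.viewfs.mounttable.cluster1.link./backup", "s3a://bucket/backup");
// Existing hdfs:// uris keep working, now resolved through the mount table.
FileSystem fs = FileSystem.get(URI.create("hdfs://cluster1/data"), conf);
{code}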






[jira] [Updated] (HDFS-14599) HDFS-12487 breaks test TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty

2020-05-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14599:
---
Fix Version/s: 3.1.5
   3.2.2

> HDFS-12487 breaks test 
> TestDiskBalancer.testDiskBalancerWithFedClusterWithOneNameServiceEmpty
> -
>
> Key: HDFS-14599
> URL: https://issues.apache.org/jira/browse/HDFS-14599
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: diskbalancer
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Wei-Chiu Chuang
>Assignee: Xiaoqiao He
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.0, 3.2.2, 3.1.5
>
> Attachments: HDFS-14599.001.patch, HDFS-14599.002.patch
>
>
> It looks like HDFS-12487 changed the error message expected by 
> {{TestDiskBalancer#testDiskBalancerWithFedClusterWithOneNameServiceEmpty}}.
> The test expects the error "There are no blocks in the blockPool", but after 
> HDFS-12487 the returned error string is "NextBlock call returned null.No valid 
> block to copy."
> Probably the simplest way to fix it is to update the expected error 
> string.
> Thoughts? [~bharatviswa], you crafted the test in HDFS-13715. Should we update 
> the expected error string, or revert HDFS-12487?






[jira] [Updated] (HDFS-15305) Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme configurable.

2020-05-04 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15305:
---
Hadoop Flags: Reviewed
  Resolution: Fixed
  Status: Resolved  (was: Patch Available)

This PR has been merged into trunk now. Thanks

> Extend ViewFS and provide ViewFSOverloadScheme implementation with scheme 
> configurable.
> ---
>
> Key: HDFS-15305
> URL: https://issues.apache.org/jira/browse/HDFS-15305
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hadoop-client, hdfs-client, viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> Provide a ViewFsOverloadScheme implementation by extending the ViewFileSystem 
> class.
>  # When the target scheme and the uri scheme match, it should create the 
> target filesystems in a different way than via the FileSystem.get API (which 
> would loop back into the overloaded scheme).
>  # Provide the flexibility to configure the overload scheme.
> ex: by setting the hdfs scheme impl to ViewFsOverloadScheme, users should be 
> able to continue working with hdfs scheme uris and should be able to mount 
> any hadoop compatible file system as a target. It will follow the same mount 
> link configuration pattern as ViewFileSystem. 






[jira] [Commented] (HDFS-15272) Backport HDFS-12862 to branch-3.1

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099448#comment-17099448
 ] 

Wei-Chiu Chuang commented on HDFS-15272:


+1 I'm sorry I missed this one.

> Backport HDFS-12862 to branch-3.1
> -
>
> Key: HDFS-15272
> URL: https://issues.apache.org/jira/browse/HDFS-15272
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.4
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15272.branch-3.1.001.patch
>
>
> Backport HDFS-12862 (CacheDirective becomes invalid when NN restarts or 
> fails over) to branch-3.1.4.






[jira] [Updated] (HDFS-15272) Backport HDFS-12862 to branch-3.1

2020-05-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15272:
---
Fix Version/s: (was: 3.1.5)

> Backport HDFS-12862 to branch-3.1
> -
>
> Key: HDFS-15272
> URL: https://issues.apache.org/jira/browse/HDFS-15272
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.4
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-15272.branch-3.1.001.patch
>
>
> Backport HDFS-12862 (CacheDirective becomes invalid when NN restarts or 
> fails over) to branch-3.1.4.






[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099434#comment-17099434
 ] 

Wei-Chiu Chuang commented on HDFS-13183:


I am really sorry; I meant to review this but got distracted.

I would like to push this feature to the finish line, because CRFS is a big 
feature and will take time to stabilize. Plus, it requires an additional 
Observer NameNode, and the logistics of adding an extra master namenode add 
complexity.

A few comments on the patch:
* does it work in a federated cluster? IIRC you have a large federated cluster, 
so I am assuming the answer is yes, but does it work out of the box or does it 
require extra configuration? (Sorry, I don't have much experience with HDFS 
federation)
* Looks like the balancer determines which NN is the sbnn at start, and then 
uses it until the end. There are two issues:
** failover. If a failover happens, the balancer can't adapt and will then send 
the requests to the ANN. That is fine, as it shouldn't fail the balancer, but it 
increases the new ANN's overhead.
** multiple standby namenode support. The balancer always chooses the first 
available standby namenode. This is fine, since in any case there can be only 
one balancer running at a time.

Also, just want to say that you don't actually need to mark 
FSNamesystem#getBlocks() as UNCHECKED. If dfs.ha.allow.stale.reads is true, the 
Standby NN accepts the request as well. That is an extra configuration, though, 
so it is probably not ideal.

> Standby NameNode process getBlocks request to reduce Active load
> 
>
> Key: HDFS-13183
> URL: https://issues.apache.org/jira/browse/HDFS-13183
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, namenode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch
>
>
> The performance of the Active NameNode can be impacted when the {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> extremely inefficient currently. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one {{NameNodeRpcServer#getBlocks}} reader and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even 
> minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.






[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Gabor Bota (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099387#comment-17099387
 ] 

Gabor Bota commented on HDFS-15323:
---

[~shv], sorry, but it won't be included; I'm going to send the mail with RC0 
in a few minutes.

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Commented] (HDFS-15311) [SBN Read] High frequency reQueue cause Reader's performance to degrade

2020-05-04 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099384#comment-17099384
 ] 

Konstantin Shvachko commented on HDFS-15311:


I was proposing to avoid re-queueing altogether with a cyclical queue; see 
HDFS-15291.
That said, the throughput decrease with {{autoMsyncPeriodMs = 0}} is probably 
because it doubles the number of rpc calls.

> [SBN Read] High frequency reQueue cause Reader's performance to degrade
> ---
>
> Key: HDFS-15311
> URL: https://issues.apache.org/jira/browse/HDFS-15311
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: xuzq
>Priority: Major
>
> If _autoMsyncPeriodMs_ is 0, an _msync_ is performed for each read rpc.
> On the observer server side, this causes high-frequency reQueue in the Handlers.
> As the queue is a BlockingQueue, the Readers (small in number) and 
> Handlers (large in number) end up competing for the BlockingQueue locks.
> This causes the throughput to decrease.
>  
> Maybe we can let the handler sleep for a little while, waiting for the 
> StateId, in order to reduce the reQueueing.
>  
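
A hedged sketch of that proposal (illustrative only; {{call}} and 
{{alignmentContext}} approximate the RPC Server internals, and interrupt 
handling is omitted):

{code:java}
// Instead of immediately re-queueing a call whose client stateId is ahead,
// let the handler wait briefly for the observer's state to catch up.
long clientStateId = call.getClientStateId();
while (alignmentContext.getLastSeenStateId() < clientStateId) {
  Thread.sleep(1); // short wait instead of BlockingQueue re-queue churn
}
// then process the call as usual
{code}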






[jira] [Commented] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099345#comment-17099345
 ] 

Hadoop QA commented on HDFS-15332:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
4m 43s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
1s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
59s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 35s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 97m 15s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
44s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 16s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29235/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15332 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002022/HDFS-15332.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux f8e64ce5028a 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / ebb878bab99 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/29235/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 

[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099342#comment-17099342
 ] 

Wei-Chiu Chuang commented on HDFS-15160:


[~zhuqi] did you try the latest patch and how did it go? Thanks

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now that we have HDFS-15150, we can start to move some DN operations to use 
> the read lock rather than the write lock to improve concurrency. The first 
> step is to make the changes to ReplicaMap, as many other methods make calls 
> to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to using a readLock.
> Next are the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in 
> BlockSender and FsDatasetImpl where it is fairly obvious they only need a 
> read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
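
A minimal sketch of the pattern, assuming the lock split exposes an 
{{AutoCloseableLock}}-style read lock (the field name {{datasetReadLock}} is 
an assumption):

{code:java}
// Read-only scan of the volume map under the dataset *read* lock, so the
// directory scanner, disk balancer and block reports no longer serialize
// behind each other or behind writers.
try (AutoCloseableLock l = datasetReadLock.acquire()) {
  for (ReplicaInfo r : volumeMap.replicas(bpid)) {
    // inspect r; nothing here mutates the map
  }
}
{code}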






[jira] [Commented] (HDFS-15270) Account for *env == NULL in hdfsThreadDestructor

2020-05-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099329#comment-17099329
 ] 

Hudson commented on HDFS-15270:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18215 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18215/])
HDFS-15270. Account for *env == NULL in hdfsThreadDestructor (#1951) (github: 
rev 1996351b0b7be6866eda73223ab6ef1ec78d30cd)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/windows/thread_local_storage.c
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/posix/thread_local_storage.c


> Account for *env == NULL in hdfsThreadDestructor
> 
>
> Key: HDFS-15270
> URL: https://issues.apache.org/jira/browse/HDFS-15270
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Please refer to the "steps to reproduce" the failure in 
> https://github.com/eclipse/openj9/issues/7752#issue-521732953.
>Reporter: Babneet Singh
>Assignee: Babneet Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> The OpenJ9 JVM properly terminates the thread before hdfsThreadDestructor is
> invoked. JNIEnv is a mirror of J9VMThread in OpenJ9. After proper thread
> termination, accessing the JNIEnv in hdfsThreadDestructor via
> (*env)->GetJavaVM yields a SIGSEGV, since *env is NULL after thread cleanup
> is performed.
> The main purpose of hdfsThreadDestructor is to invoke
> DetachCurrentThread, which performs thread cleanup in OpenJ9. Since
> OpenJ9 performs thread cleanup before hdfsThreadDestructor is invoked,
> hdfsThreadDestructor should account for *env == NULL and skip
> DetachCurrentThread.






[jira] [Commented] (HDFS-15311) [SBN Read] High frequency reQueue cause Reader's performance to degrade

2020-05-04 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099321#comment-17099321
 ] 

Wei-Chiu Chuang commented on HDFS-15311:


[~cliang] [~xkrogen] [~shv] thoughts?

> [SBN Read] High frequency reQueue cause Reader's performance to degrade
> ---
>
> Key: HDFS-15311
> URL: https://issues.apache.org/jira/browse/HDFS-15311
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: xuzq
>Priority: Major
>
> If _autoMsyncPeriodMs_ is 0, an _msync_ is performed for each read rpc.
> On the observer server side, this causes high-frequency reQueue in the Handlers.
> As the queue is a BlockingQueue, the Readers (small in number) and 
> Handlers (large in number) end up competing for the BlockingQueue locks.
> This causes the throughput to decrease.
>  
> Maybe we can let the handler sleep for a little while, waiting for the 
> StateId, in order to reduce the reQueueing.
>  






[jira] [Resolved] (HDFS-15270) Account for *env == NULL in hdfsThreadDestructor

2020-05-04 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang resolved HDFS-15270.

Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks [~babsingh], this is in trunk now. Do you have a branch in mind that you 
want this cherry-picked to?

> Account for *env == NULL in hdfsThreadDestructor
> 
>
> Key: HDFS-15270
> URL: https://issues.apache.org/jira/browse/HDFS-15270
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Please refer to the "steps to reproduce" the failure in 
> https://github.com/eclipse/openj9/issues/7752#issue-521732953.
>Reporter: Babneet Singh
>Assignee: Babneet Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> The OpenJ9 JVM properly terminates the thread before hdfsThreadDestructor is
> invoked. JNIEnv is a mirror of J9VMThread in OpenJ9. After proper thread
> termination, accessing the JNIEnv in hdfsThreadDestructor via
> (*env)->GetJavaVM yields a SIGSEGV, since *env is NULL after thread cleanup
> is performed.
> The main purpose of hdfsThreadDestructor is to invoke
> DetachCurrentThread, which performs thread cleanup in OpenJ9. Since
> OpenJ9 performs thread cleanup before hdfsThreadDestructor is invoked,
> hdfsThreadDestructor should account for *env == NULL and skip
> DetachCurrentThread.






[jira] [Updated] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15323:

Fix Version/s: 3.3.0

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Commented] (HDFS-14283) DFSInputStream to prefer cached replica

2020-05-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099302#comment-17099302
 ] 

Ayush Saxena commented on HDFS-14283:
-

Thanx [~leosun08] for the patch.

{code:java}
+  if (!deadNodes.containsKey(cachedLocs[i])
{code}
For this, can we use {{dfsClient.getDeadNodes(this).containsKey(nodes[i])}}? It 
is added as part of the DeadDatanodeDetection feature. If yes, maybe we can 
refactor the if checks into a single method and use it at both places.

{code:java}
return new DNAddrPair(chosenNode, targetAddr, storageType, block);
{code}
{{storageType}} will be {{null}} if using {{cachedReplica}}; is that ok?


> DFSInputStream to prefer cached replica
> ---
>
> Key: HDFS-14283
> URL: https://issues.apache.org/jira/browse/HDFS-14283
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.6.0
> Environment: HDFS Caching
>Reporter: Wei-Chiu Chuang
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14283.001.patch, HDFS-14283.002.patch, 
> HDFS-14283.003.patch, HDFS-14283.004.patch, HDFS-14283.005.patch, 
> HDFS-14283.006.patch, HDFS-14283.007.patch
>
>
> HDFS Caching offers performance benefits. However, currently the NameNode does 
> not treat cached replicas with higher priority, so HDFS caching is only useful 
> when cache replication = 3, that is to say, all replicas are cached in 
> memory, so that a client doesn't randomly pick an uncached replica.
> HDFS-6846 proposed to let the NameNode give higher priority to cached replicas. 
> Changing logic in the NameNode is always tricky, so that didn't get much 
> traction. Here I propose a different approach: let the client (DFSInputStream) 
> prefer cached replicas.
> A {{LocatedBlock}} object already contains the cached replica locations, so a 
> client has the needed information. I think we can change 
> {{DFSInputStream#getBestNodeDNAddrPair()}} for this purpose.
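
A hedged sketch of that idea inside {{DFSInputStream#getBestNodeDNAddrPair()}} 
(simplified; {{connectToDnViaHostname}} and the surrounding fall-through are 
assumptions about the enclosing method, not the actual patch):

{code:java}
// Prefer a cached replica first: cached replicas are served from memory.
DatanodeInfo[] cachedLocs = block.getCachedLocations();
for (DatanodeInfo cached : cachedLocs) {
  if (!deadNodes.containsKey(cached)) {
    InetSocketAddress targetAddr = NetUtils.createSocketAddr(
        cached.getXferAddr(connectToDnViaHostname));
    // Cached locations carry no StorageType, hence the null discussed above.
    return new DNAddrPair(cached, targetAddr, null, block);
  }
}
// Otherwise fall through to the regular selection over block.getLocations().
{code}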






[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099281#comment-17099281
 ] 

Konstantin Shvachko commented on HDFS-15323:


Thanks [~ayushtkn], please do.
[~gabor.bota], it would be good if this jira could make it into the 3.1.4 release.

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Updated] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Konstantin Shvachko (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-15323:
---
Fix Version/s: 3.4.0
   2.10.1
   3.2.2

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.4.0
>
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Updated] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15332:
-
Description: 
When calculating the space quota usage:
{code:java}
   if (file.getBlocks() != null) {
allBlocks.addAll(Arrays.asList(file.getBlocks()));
   }
   if (removed.getBlocks() != null) {
allBlocks.addAll(Arrays.asList(removed.getBlocks()));
   }  
   for (BlockInfo b: allBlocks) { {code}
we missed the blocks held by the file's snapshot feature diffs.
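
A hedged sketch of the missing piece (method names follow the snapshot feature 
classes; the surrounding scaffolding is assumed):

{code:java}
// Also gather the blocks recorded in the file's snapshot diffs.
FileWithSnapshotFeature sf = file.getFileWithSnapshotFeature();
if (sf != null) {
  for (FileDiff diff : sf.getDiffs().asList()) {
    BlockInfo[] diffBlocks = diff.getBlocks();
    if (diffBlocks != null) {
      allBlocks.addAll(Arrays.asList(diffBlocks));
    }
  }
}
{code}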

> Quota Space consumed was wrong in truncate with Snapshots
> -
>
> Key: HDFS-15332
> URL: https://issues.apache.org/jira/browse/HDFS-15332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15332.001.patch
>
>
> When calculating the space quota usage:
> {code:java}
>if (file.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(file.getBlocks()));
>}
>if (removed.getBlocks() != null) {
> allBlocks.addAll(Arrays.asList(removed.getBlocks()));
>}  
>for (BlockInfo b: allBlocks) { {code}
> we missed the blocks held by the file's snapshot feature diffs.






[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099219#comment-17099219
 ] 

Ayush Saxena commented on HDFS-15323:
-

Thanx [~shv].
For 3.1.4, RC0 seems to have been created; not sure, I guess we need to check 
with the Release Manager. I can cherry-pick to the 3.3.0 branch.

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Commented] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions

2020-05-04 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099220#comment-17099220
 ] 

Erik Krogen commented on HDFS-13904:


Hi [~umamaheswararao], I'm not actively working on this. I don't believe we 
applied any fix to the NN; instead we focused on migrating users to the 
{{getQuotaUsage()}} API, since it was quota checks which caused the really 
large issues.

Yes, the NN had consistent load throughout (besides some minor blips around 
restarts of course). It indeed was interesting to see the difference across 
restarts. I don't have any good ideas there.

GC pauses were low and consistent with normal behavior.

> ContentSummary does not always respect processing limit, resulting in long 
> lock acquisitions
> 
>
> Key: HDFS-13904
> URL: https://issues.apache.org/jira/browse/HDFS-13904
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an 
> administrator to set a limit on the number of entries processed during a 
> single acquisition of the {{FSNamesystemLock}} during the creation of a 
> content summary. This is useful to prevent very long (multiple seconds) 
> pauses on the NameNode when {{getContentSummary}} is called on large 
> directories.
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read 
> lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large 
> directory on a heavily-loaded NameNode. Such long pauses completely destroy 
> the performance of the NameNode. We have the limit set to its default of 
> 5000; if it was respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}} 
> looks like:
> {code}
>   public boolean yield() {
> // Are we set up to do this?
> if (limitPerRun <= 0 || dir == null || fsn == null) {
>   return false;
> }
> // Have we reached the limit?
> long currentCount = counts.getFileCount() +
> counts.getSymlinkCount() +
> counts.getDirectoryCount() +
> counts.getSnapshotableDirectoryCount();
> if (currentCount <= nextCountLimit) {
>   return false;
> }
> // Update the next limit
> nextCountLimit = currentCount + limitPerRun;
> boolean hadDirReadLock = dir.hasReadLock();
> boolean hadDirWriteLock = dir.hasWriteLock();
> boolean hadFsnReadLock = fsn.hasReadLock();
> boolean hadFsnWriteLock = fsn.hasWriteLock();
> // sanity check.
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
> hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
> fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> // unlock
> dir.readUnlock();
> fsn.readUnlock("contentSummary");
> try {
>   Thread.sleep(sleepMilliSec, sleepNanoSec);
> } catch (InterruptedException ie) {
> } finally {
>   // reacquire
>   fsn.readLock();
>   dir.readLock();
> }
> yieldCount++;
> return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
> hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
> fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> {code}
> The content summary computation will only relinquish the lock if it is 
> currently the _only_ holder of the lock. Given the high volume of read 
> requests on a heavily loaded NameNode, especially when unfair locking is 
> enabled, it is likely there may be another holder of the read lock performing 
> some short-lived operation. By refusing to give up the lock in this case, the 
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}.

[jira] [Updated] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hemanthboyina updated HDFS-15332:
-
Attachment: HDFS-15332.001.patch
Status: Patch Available  (was: Open)

> Quota Space consumed was wrong in truncate with Snapshots
> -
>
> Key: HDFS-15332
> URL: https://issues.apache.org/jira/browse/HDFS-15332
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15332.001.patch
>
>







[jira] [Comment Edited] (HDFS-15289) Allow viewfs mounts with hdfs scheme and centralized mount table

2020-05-04 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091107#comment-17091107
 ] 

Uma Maheswara Rao G edited comment on HDFS-15289 at 5/4/20, 6:18 PM:
-

Thanks a lot, [~virajith], for the comments. Glad to hear that you guys are 
looking for similar things.
 Pretty much all of our targeted use cases are similar to what you mentioned. 
 First and foremost, our goal is to make ViewFSOverloadScheme configurable with 
different schemes, and “hdfs” is a priority use case, as Hive-like systems 
persist “hdfs://nn1” uris in meta stores.

Coming to tools support, we discussed some of the details and thought we 
should first make ViewFS support different schemes (ex: hdfs) and keep the 
configuration central so that mount configurations are easy to manage.
{quote}saveNamespace and other methods in FileSystem all need to be 
implemented in ViewFSOverloadScheme. Do you have any specific plans around 
testing this?
{quote}
I have a question here. In the ViewFSOverloadScheme case, we will have multiple 
target file systems.
 So, when a user calls ViewFSOverloadScheme#saveNameSpace, do we need to 
delegate this to all hdfs-specific target filesystems? Or would users, in 
reality, rather run this on specific targets?
 The DistributedFileSystem interface is tagged with:
{quote}@InterfaceAudience.LimitedPrivate({ "MapReduce", "HBase" })
 @InterfaceStability.Unstable
{quote}
Unfortunately, some/many users use DFS classes directly. But we have a publicly 
exposed class for administration functions:
{quote}/**
 * The public API for performing administrative functions on HDFS. Those writing
 * applications against HDFS should prefer this interface to directly accessing
 * functionality in DistributedFileSystem or DFSClient.
 *
 * Note that this is distinct from the similarly-named DFSAdmin, which
 * is a class that provides the functionality for the CLI `hdfs dfsadmin ...'
 * commands.
 */
 @InterfaceAudience.Public
 @InterfaceStability.Evolving
 public class HdfsAdmin {
{quote}
Can we extend this class to support ViewFS functionality for administration 
functions?
 I mean we could do something like this: currently HdfsAdmin holds a DFS 
instance and delegates calls to it. Probably we can modify this class, or 
extend it, to support ViewFSOverloadScheme-specific functionality.
 If that does not work, sure, we can discuss which APIs need to be added to 
ViewFSOverloadScheme; we may need additional APIs, for example for when users 
want to run an operation on specific target child filesystems.
 Actually, ViewFS already exposes APIs like getChildFileSystems etc. We can add 
more functions there to achieve this. 
 example: ViewFSOverloadScheme#getTargetFS("/mountPath"); this would return DFS 
if /mountPath pointed to a dfs cluster. 
 It would be great if you have some thoughts on how we want to use a 
"saveNameSpace"-like API when we have multiple target hdfs links mounted.
{quote}Admins will not have a way to directly access HDFS unless admin tooling 
explicitly sets the right properties. Is this something you considered? How do 
you plan to make admin tools work?
{quote}
Yes, I agree. However, supporting a single target dfs (the overloaded-scheme 
target fs) would be easy: DFSAdmin gets the FS from ViewFSOverLoadScheme, gets 
the overloadedScheme fs from there, and delegates calls. 
 The challenge here is that we will have multiple DFS clusters configured as 
targets. We should make the current DFSAdmin get all matching hdfs-scheme 
target file systems from the OverLoadedScheme and delegate the calls. A more 
appropriate way may be to extend DFSAdmin. 
 I think today, if a user configures defaultFS as “viewfs://” and wants to 
connect to some of the child hdfs clusters using DFSAdmin, we have the same 
problem. So this problem exists in ViewFS itself, and we should improve it to 
provide the flexibility to access child filesystems.

One thought is that admin commands use the -fs option to specify the required 
nn address; DFSAdmin can then use ViewFSOverloadScheme#getOverloadSchemeFS and 
pass the calls to that fs. 

Another way is that we probably have to build a ViewDFSAdmin which provides 
access to child file systems via the ViewFSOverLoadedScheme APIs.
{quote}How to handle cases where DistributedFileSystem is used instead of 
FileSystem?
{quote}
If users access DFS directly, they may need to get the childFileSystems from 
ViewFSOverloadScheme and check with instanceof.
{quote}Do you plan to make ViewFSOveraloadScheme extend DistributedFileSystem?
{quote}
The plan is to extend the ViewFileSystem class. So we will retain pretty much 
the viewFS client-side mount-building logic as is. And we will address the FS 
looping issues and remote configuration loading in the extended class. Also, we 
can add more usability functions, like getting a child file system by scheme.


was (Author: umamaheswararao):
Thanks a lot, [~virajith], for the comments. Glad to hear that you guys are 
looking for similar things.

[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099209#comment-17099209
 ] 

Hudson commented on HDFS-15323:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18214 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18214/])
HDFS-15323. StandbyNode fails transition to active due to insufficient (shv: 
rev ebb878bab991c242b5089a18881aa10abf318ea0)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyInProgressTail.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java


> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from the QJM), it can crash with an 
> {{IllegalStateException}}.






[jira] [Commented] (HDFS-13904) ContentSummary does not always respect processing limit, resulting in long lock acquisitions

2020-05-04 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099204#comment-17099204
 ] 

Uma Maheswara Rao G commented on HDFS-13904:


Hi [~xkrogen], any updates on this?

Just a question: the GC pause monitor is not reporting any pauses, right?

Did we have a consistent load on the NN in the two restart scenarios mentioned 
above? It is interesting that after one restart the NN started reporting long 
lock holds, while after the other restart it did not. 

Did you apply the above proposed fix in your clusters and try it?

 

> ContentSummary does not always respect processing limit, resulting in long 
> lock acquisitions
> 
>
> Key: HDFS-13904
> URL: https://issues.apache.org/jira/browse/HDFS-13904
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an 
> administrator to set a limit on the number of entries processed during a 
> single acquisition of the {{FSNamesystemLock}} during the creation of a 
> content summary. This is useful to prevent very long (multiple seconds) 
> pauses on the NameNode when {{getContentSummary}} is called on large 
> directories.
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read 
> lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large 
> directory on a heavily-loaded NameNode. Such long pauses completely destroy 
> the performance of the NameNode. We have the limit set to its default of 
> 5000; if it was respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}} 
> looks like:
> {code}
>   public boolean yield() {
> // Are we set up to do this?
> if (limitPerRun <= 0 || dir == null || fsn == null) {
>   return false;
> }
> // Have we reached the limit?
> long currentCount = counts.getFileCount() +
> counts.getSymlinkCount() +
> counts.getDirectoryCount() +
> counts.getSnapshotableDirectoryCount();
> if (currentCount <= nextCountLimit) {
>   return false;
> }
> // Update the next limit
> nextCountLimit = currentCount + limitPerRun;
> boolean hadDirReadLock = dir.hasReadLock();
> boolean hadDirWriteLock = dir.hasWriteLock();
> boolean hadFsnReadLock = fsn.hasReadLock();
> boolean hadFsnWriteLock = fsn.hasWriteLock();
> // sanity check.
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
> hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
> fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> // unlock
> dir.readUnlock();
> fsn.readUnlock("contentSummary");
> try {
>   Thread.sleep(sleepMilliSec, sleepNanoSec);
> } catch (InterruptedException ie) {
> } finally {
>   // reacquire
>   fsn.readLock();
>   dir.readLock();
> }
> yieldCount++;
> return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
> if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
> hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
> fsn.getReadHoldCount() != 1) {
>   // cannot relinquish
>   return false;
> }
> {code}
> The content summary computation will only relinquish the lock if it is 
> currently the _only_ holder of the lock. Given the high volume of read 
> requests on a heavily loaded NameNode, especially when unfair locking is 
> enabled, it is likely there may be another holder of the read lock performing 
> some short-lived operation. By refusing to give up the lock in this case, the 
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}. 
> 
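
A sketch of that proposal against the {{yield()}} shown above (only the sanity 
check changes):

{code:java}
// Drop the getReadHoldCount() != 1 conditions so the computation can still
// yield when other readers happen to hold the lock concurrently.
if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
    hadFsnWriteLock) {
  // cannot relinquish
  return false;
}
{code}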

[jira] [Created] (HDFS-15332) Quota Space consumed was wrong in truncate with Snapshots

2020-05-04 Thread hemanthboyina (Jira)
hemanthboyina created HDFS-15332:


 Summary: Quota Space consumed was wrong in truncate with Snapshots
 Key: HDFS-15332
 URL: https://issues.apache.org/jira/browse/HDFS-15332
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: hemanthboyina
Assignee: hemanthboyina









[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Konstantin Shvachko (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099198#comment-17099198
 ] 

Konstantin Shvachko commented on HDFS-15323:


Thanks [~ayushtkn] and [~xkrogen] for prompt reviews.
I just committed this to trunk and branches 3.3, 3.2, 3.1, and 2.10.
I've lost track of the ongoing releases; please cherry-pick this to the 
respective branches.

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from QJM), it can crash with 
> {{IllegalStateException}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2020-05-04 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099125#comment-17099125
 ] 

Hadoop QA commented on HDFS-15255:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
52s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
3s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
26m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
46s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
38s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 11m 
10s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
26s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
 5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 21m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 21m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
34s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  3m 
14s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs-client generated 1 new 
+ 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  4m 
13s{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0) {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 42s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  2m 10s{color} 
| {color:red} hadoop-hdfs-client in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}112m 39s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  
7s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
59s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |

[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2020-05-04 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099081#comment-17099081
 ] 

Hadoop QA commented on HDFS-12288:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 3 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
3s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
1s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 19s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}117m  6s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}189m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestGetFileChecksum |
|   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
|   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
|   | hadoop.hdfs.TestReconstructStripedFile |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29234/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-12288 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13001992/HDFS-12288.008.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 5d4cc3db6717 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 8dace8ff3a9 |
| Default Java | Private 

[jira] [Commented] (HDFS-15331) Remove invalid exclusions that minicluster dependency on HDFS

2020-05-04 Thread Wanqiang Ji (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099033#comment-17099033
 ] 

Wanqiang Ji commented on HDFS-15331:


[https://github.com/apache/hadoop/pull/1996]

> Remove invalid exclusions that minicluster dependency on HDFS
> -
>
> Key: HDFS-15331
> URL: https://issues.apache.org/jira/browse/HDFS-15331
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
>
> Ozone has been split into an independent repo, but the invalid exclusions 
> (kubernetes client) in the minicluster dependency on HDFS have been kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15331) Remove invalid exclusions that minicluster dependency on HDFS

2020-05-04 Thread Wanqiang Ji (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wanqiang Ji updated HDFS-15331:
---
Status: Patch Available  (was: Open)

> Remove invalid exclusions that minicluster dependency on HDFS
> -
>
> Key: HDFS-15331
> URL: https://issues.apache.org/jira/browse/HDFS-15331
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wanqiang Ji
>Assignee: Wanqiang Ji
>Priority: Major
>
> Ozone has been split into an independent repo, but the invalid exclusions 
> (kubernetes client) in the minicluster dependency on HDFS have been kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15331) Remove invalid exclusions that minicluster dependency on HDFS

2020-05-04 Thread Wanqiang Ji (Jira)
Wanqiang Ji created HDFS-15331:
--

 Summary: Remove invalid exclusions that minicluster dependency on 
HDFS
 Key: HDFS-15331
 URL: https://issues.apache.org/jira/browse/HDFS-15331
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Wanqiang Ji
Assignee: Wanqiang Ji


Ozone has been split into an independent repo, but the invalid exclusions 
(kubernetes client) in the minicluster dependency on HDFS have been kept.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15323) StandbyNode fails transition to active due to insufficient transaction tailing

2020-05-04 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17099021#comment-17099021
 ] 

Erik Krogen commented on HDFS-15323:


+1 pretty simple fix, LGTM.

> StandbyNode fails transition to active due to insufficient transaction tailing
> --
>
> Key: HDFS-15323
> URL: https://issues.apache.org/jira/browse/HDFS-15323
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, qjm
>Affects Versions: 2.7.7
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
>Priority: Major
> Attachments: HDFS-15323-branch-2.10.002.patch, 
> HDFS-15323.000.unitTest.patch, HDFS-15323.001.patch, HDFS-15323.002.patch
>
>
> StandbyNode is asked to {{transitionToActive()}}. If it fell too far behind 
> in tailing journal transactions (from QJM), it can crash with 
> {{IllegalStateException}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-12288) Fix DataNode's xceiver count calculation

2020-05-04 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun reassigned HDFS-12288:
--

Assignee: Lisheng Sun  (was: Chen Zhang)

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch, HDFS-12288.008.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that it is only a 
> very rough estimate, and in reality returns the total number of threads in 
> the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual 
> number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN 
> when choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only accounts for the actual number of DataXceiver threads currently 
> running and thus represents the load on the DN much better.
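As a rough illustration of the plan (a hypothetical sketch, not the actual 
patch; it assumes a getter over the {{dataNodeActiveXceiversCount}} gauge is 
available on DataNodeMetrics):

{code:java}
// DataNode-side sketch: report live DataXceiver threads from metrics.
public int getXceiverCount() {
  // Old estimate: counts every thread in the xceiver ThreadGroup,
  // including threads that are not actually serving data transfers.
  //   return threadGroup == null ? 0 : threadGroup.activeCount();

  // Proposed: use the gauge that is incremented when a DataXceiver
  // starts and decremented when it finishes (getter name assumed).
  return metrics.getDataNodeActiveXceiverCount();
}
{code}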



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12288) Fix DataNode's xceiver count calculation

2020-05-04 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-12288:
---
Attachment: HDFS-12288.008.patch

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch, HDFS-12288.008.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that it is only a 
> very rough estimate, and in reality returns the total number of threads in 
> the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual 
> number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN 
> when choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only accounts for the actual number of DataXceiver threads currently 
> running and thus represents the load on the DN much better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation

2020-05-04 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098889#comment-17098889
 ] 

Lisheng Sun commented on HDFS-12288:


Hi [~zhangchen], are you still working on this JIRA?

If not, I will take it over. Hope you don't mind.

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Chen Zhang
>Priority: Major
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that it is only a 
> very rough estimate, and in reality returns the total number of threads in 
> the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual 
> number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN 
> when choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only accounts for the actual number of DataXceiver threads currently 
> running and thus represents the load on the DN much better.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2020-05-04 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098868#comment-17098868
 ] 

Lisheng Sun commented on HDFS-15255:


Added the v007 patch. 

This patch removes equals and hashCode from DatanodeInfoWithStorage.

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> 
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15255-findbugs-test.001.patch, 
> HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, 
> HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, 
> HDFS-15255.007.patch, experiment-find-bugs.001.patch
>
>
> When only one replica of a block is on SSD and the others are on HDD, the 
> current logic considers only the distance between the client and the DN when 
> the client reads the data. I think it should also consider the StorageType 
> of the replica, and prefer the node with the faster StorageType when the 
> distances are the same.
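A minimal sketch of the idea (illustrative only, not the committed change; 
{{distance()}} and {{storageTypeRank()}} are hypothetical helpers):

{code:java}
import java.util.Comparator;
import java.util.List;
// plus org.apache.hadoop.hdfs.protocol.DatanodeInfo / DatanodeInfoWithStorage

// Secondary sort by storage speed when the network distance ties. One
// possible rank: RAM_DISK=0 < SSD=1 < DISK=2 < ARCHIVE=3 (lower sorts
// first). DatanodeInfoWithStorage exposes getStorageType().
static void sortByDistanceThenStorage(List<DatanodeInfoWithStorage> locs,
                                      DatanodeInfo client) {
  locs.sort(Comparator
      .comparingInt((DatanodeInfoWithStorage dn) -> distance(client, dn))
      .thenComparingInt(dn -> storageTypeRank(dn.getStorageType())));
}
{code}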



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2020-05-04 Thread Lisheng Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lisheng Sun updated HDFS-15255:
---
Attachment: HDFS-15255.007.patch

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> 
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15255-findbugs-test.001.patch, 
> HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, 
> HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, 
> HDFS-15255.007.patch, experiment-find-bugs.001.patch
>
>
> When only one replica of a block is on SSD and the others are on HDD, the 
> current logic considers only the distance between the client and the DN when 
> the client reads the data. I think it should also consider the StorageType 
> of the replica, and prefer the node with the faster StorageType when the 
> distances are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2020-05-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098838#comment-17098838
 ] 

Stephen O'Donnell commented on HDFS-15255:
--

[~leosun08] Patch 06 no longer applies to trunk, probably due to the other 
change related to this that we committed last week.

I asked some of my colleagues to check this findbugs warning, and they both 
believed it could be ignored.

Before setting an ignore annotation on the code, one person suggested just 
removing equals (and probably also hashCode) from DatanodeInfoWithStorage 
(sketched below). Both methods only call super, and the normal inheritance 
chain will do that anyway.

Could you try rebasing the 06 patch against trunk, then remove equals and 
hashCode from DatanodeInfoWithStorage, and let's see if that gets rid of the 
findbugs warning?
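For reference, the redundant overrides being discussed would look roughly like 
the sketch below (based on the description above; dropping them is 
behavior-preserving, since a method that only calls super is equivalent to the 
inherited one):

{code:java}
@Override
public boolean equals(Object obj) {
  return super.equals(obj);   // redundant: same as the inherited method
}

@Override
public int hashCode() {
  return super.hashCode();    // redundant: same as the inherited method
}
{code}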

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> 
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15255-findbugs-test.001.patch, 
> HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, 
> HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, 
> experiment-find-bugs.001.patch
>
>
> When only one replica of a block is on SSD and the others are on HDD, the 
> current logic considers only the distance between the client and the DN when 
> the client reads the data. I think it should also consider the StorageType 
> of the replica, and prefer the node with the faster StorageType when the 
> distances are the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15310) RBF: Not proxy client's clientId and callId caused RetryCache invalid in NameNode.

2020-05-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098807#comment-17098807
 ] 

Ayush Saxena commented on HDFS-15310:
-

[~hexiaoqiao] [~elgoiri] [~xuzq_zander] [~ferhui] I have raised this problem 
on the dev list. Please add anything I have missed or anything that isn't 
clear. :)

> RBF: Not proxy client's clientId and callId caused RetryCache invalid in 
> NameNode.
> --
>
> Key: HDFS-15310
> URL: https://issues.apache.org/jira/browse/HDFS-15310
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Assignee: xuzq
>Priority: Critical
>
> RBF does not proxy the client's clientId and callId to the NameNode, which 
> invalidates the RetryCache in the NameNode, so some RPCs may fail.
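Conceptually, the NameNode's RetryCache keys entries on the (clientId, callId) 
pair read from the RPC header, so a retried call must arrive with the same 
pair to hit the cache. A simplified sketch of the lookup side (the static 
accessors are existing {{org.apache.hadoop.ipc.Server}} methods; the 
surrounding logic is illustrative):

{code:java}
// If the Router substitutes its own ids, a client retry arrives here with
// a fresh (clientId, callId) key, misses the cache, and a non-idempotent
// operation can be executed a second time.
byte[] clientId = Server.getClientId();  // Router's id, not the client's
int callId = Server.getCallId();         // Router's callId, not the client's
RetryCache.CacheEntry entry =
    RetryCache.waitForCompletion(retryCache);  // keyed on the pair above
{code}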



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15310) RBF: Not proxy client's clientId and callId caused RetryCache invalid in NameNode.

2020-05-04 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098782#comment-17098782
 ] 

Ayush Saxena commented on HDFS-15310:
-

Thanks everyone. Yeah, this seems to have dragged on a bit too much; we 
discussed it in detail on the mailing list for the data locality problem, but 
the solution decided there didn't go through due to security concerns.
The data locality problem, I think, was still not causing problems as serious 
as this, but for the RetryCache one we need to do something, since it can 
impact the overall consistency of the system.
Anyway, I will start a discussion on the dev list. Hope we get a solution this 
time. 

> RBF: Not proxy client's clientId and callId caused RetryCache invalid in 
> NameNode.
> --
>
> Key: HDFS-15310
> URL: https://issues.apache.org/jira/browse/HDFS-15310
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: xuzq
>Assignee: xuzq
>Priority: Critical
>
> RBF does not proxy the client's clientId and callId to the NameNode, which 
> invalidates the RetryCache in the NameNode, so some RPCs may fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org