[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571647
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 25/Mar/21 04:57
Start Date: 25/Mar/21 04:57
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-806363752


   > @functioner would you mind adding a unit test to cover this improvement?
   
   I'm writing a unit test to cover this improvement. After reading the test 
cases in `TestEditLog` and `TestEditLogRace`, I think we can add a test 
similar to `TestEditLogRace#testDeadlock`. My current design is basically to 
add an `Edit` that intentionally sleeps for a while in its `logSyncNotify` 
method, blocking the next `Edit`, and then test whether the next `Edit` can 
still finish promptly.
   
   However, with this design the test needs to access some private methods 
and classes in `FSEditLogAsync`, and I don't think it's a good idea to change 
the modifiers in `FSEditLogAsync`. Can we use something like `@exposeToTest`? 
(Sorry, I don't have experience in this area.) Or can you come up with a 
better design? @Hexiaoqiao @jojochuang @ayushtkn @linyiqun 
Thanks.
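
   A rough sketch of the stalling-Edit idea described above, for illustration only: `FSEditLogAsync.Edit` is package-private, so the interface and class names here are hypothetical stand-ins rather than the real test code.

{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: an "edit" whose sync notification deliberately stalls,
// simulating a stuck sendResponse(); the real test would enqueue a second edit
// and assert that it completes well before the stall finishes.
public class SlowSyncEditSketch {
  /** Minimal stand-in for the notify hook that FSEditLogAsync edits implement. */
  interface Edit {
    void logSyncNotify(RuntimeException syncEx);
  }

  /** An edit that sleeps inside logSyncNotify to emulate a blocked network write. */
  static class StallingEdit implements Edit {
    private final long stallMillis;
    StallingEdit(long stallMillis) { this.stallMillis = stallMillis; }
    @Override
    public void logSyncNotify(RuntimeException syncEx) {
      try {
        TimeUnit.MILLISECONDS.sleep(stallMillis);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    new StallingEdit(2000).logSyncNotify(null);
    System.out.printf("stalled notify took %d ms%n",
        TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
  }
}
{code}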


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571647)
Time Spent: 50m  (was: 40m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                          // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                          // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());                  // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                     // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                              // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the 

[jira] [Commented] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308352#comment-17308352
 ] 

Xiaoqiao He commented on HDFS-15919:


Also cherry-picked to branch-3.1.

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15919.001.patch
>
>
> If HDFS is misconfigured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...), but there are a couple of 
> scenarios within it that can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs, so it is clear what caused it.
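
For illustration only (this is not the committed patch), the suggested change amounts to passing the caught exception to the existing WARN call so the logger prints its stack trace; the exception message in the self-contained snippet below is made up.

{code:java}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: contrasts swallowing the cause with attaching it to the
// existing WARN message so the stack trace shows what is wrong with the config.
public class LogWithCauseExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(LogWithCauseExample.class);

  public static void main(String[] args) {
    try {
      throw new IOException("Incorrect configuration: namenode address is not configured");
    } catch (IOException ioe) {
      LOG.warn("Unable to get NameNode addresses.");       // current: no cause logged
      LOG.warn("Unable to get NameNode addresses.", ioe);  // proposed: cause + stack trace
    }
  }
}
{code}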



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15919:
---
Fix Version/s: 3.1.5

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
> Attachments: HDFS-15919.001.patch
>
>
> If HDFS is misconfigured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...), but there are a couple of 
> scenarios within it that can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs, so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-15919:
---
Fix Version/s: 3.2.3
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

LGTM, +1. Committed to trunk and cherry-picked cleanly to branch-3.3 and branch-3.2.
Thanks [~sodonnell] for your contributions! Thanks [~vjasani] and [~ayushtkn] 
for your reviews!

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15919.001.patch
>
>
> If HDFS is misconfigured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...), but there are a couple of 
> scenarios within it that can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs, so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571623
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 25/Mar/21 03:34
Start Date: 25/Mar/21 03:34
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao edited a comment on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-806336802


   @functioner would you mind adding a unit test to cover this improvement?
   cc @jojochuang @ayushtkn @linyiqun I believe this is a great improvement, 
especially for heavy responses. Would you mind taking another look? Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571623)
Time Spent: 40m  (was: 0.5h)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                          // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                          // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());                  // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                     // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                              // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync thread is unable to proceed. In that case, the critical 
> logSync (line 243) cannot be executed for incoming transactions, and the 
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging that is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571618&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571618
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 25/Mar/21 03:28
Start Date: 25/Mar/21 03:28
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-806336802


   @functioner would you mind adding a unit test to cover this improvement?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571618)
Time Spent: 0.5h  (was: 20m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                          // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                          // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());                  // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                     // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                              // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync thread is unable to proceed. In that case, the critical 
> logSync (line 243) cannot be executed for incoming transactions, and the 
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging that is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
>  '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
>  

[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15160:
-
Status: Patch Available  (was: Reopened)

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, 
> HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, 
> HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, 
> HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use 
> the read lock rather than the write lock to improve concurrency. The first 
> step is to make the changes to ReplicaMap, as many other methods call into it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods call replicaMap.replicas() (e.g. 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read-only fashion, so they can also be switched to a readLock.
> Next are the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in 
> BlockSender and FsDatasetImpl where it is fairly obvious they only need a 
> read lock.
> For now, I have avoided changing anything that looks too risky, as I think 
> it's better to do any larger refactoring or riskier changes each in their own 
> Jira.
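
As a conceptual illustration of the read/write-lock split described above (this is not the FsDatasetImpl API; the class and field names are invented for the example):

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Conceptual sketch: read-only accesses to the volume map share a read lock,
// while mutations keep the exclusive write lock, which is the concurrency gain
// this Jira describes for operations such as getBlockReports and deepCopyReplica.
public class ReadWriteLockSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> volumeMap = new HashMap<>();

  public String lookup(long blockId) {
    lock.readLock().lock();                 // many readers may hold this at once
    try {
      return volumeMap.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void add(long blockId, String replica) {
    lock.writeLock().lock();                // writers remain exclusive
    try {
      volumeMap.put(blockId, replica);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}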



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated HDFS-15160:
-
Attachment: HDFS-15160-branch-3.3-001.patch

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, 
> HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, 
> HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, 
> HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrence. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and fsdatasetImpl where is it fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> its better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Jason Wen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308221#comment-17308221
 ] 

Jason Wen commented on HDFS-15916:
--

Is this supposed to be supported between a Hadoop 3.x client and a 2.x server?

In this case, the distcp source is a Hadoop 3 HDFS client and the destination is 
a Hadoop 2 HDFS server.

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp -diff option between two snapshots, from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the exception below; this seems 
> to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-03-24 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-15869:

Description: 
    We were doing some testing of the latest Hadoop stable release 3.2.2 and 
found that a network issue can cause the namenode to hang even with the async 
edit logging (FSEditLogAsync).

    The workflow of the FSEditLogAsync thread is basically:
 # get EditLog from a queue (line 229)
 # do the transaction (line 232)
 # sync the log if doSync (line 243)
 # do logSyncNotify (line 248)

{code:java}
//hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java

  @Override
  public void run() {
    try {
      while (true) {
        boolean doSync;
        Edit edit = dequeueEdit();                            // line 229
        if (edit != null) {
          // sync if requested by edit log.
          doSync = edit.logEdit();                            // line 232
          syncWaitQ.add(edit);
        } else {
          // sync when editq runs dry, but have edits pending a sync.
          doSync = !syncWaitQ.isEmpty();
        }
        if (doSync) {
          // normally edit log exceptions cause the NN to terminate, but tests
          // relying on ExitUtil.terminate need to see the exception.
          RuntimeException syncEx = null;
          try {
            logSync(getLastWrittenTxId());                    // line 243
          } catch (RuntimeException ex) {
            syncEx = ex;
          }
          while ((edit = syncWaitQ.poll()) != null) {
            edit.logSyncNotify(syncEx);                       // line 248
          }
        }
      }
    } catch (InterruptedException ie) {
      LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
    } catch (Throwable t) {
      terminate(t);
    }
  }
{code}
    In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
network write (line 365).
{code:java}
//hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java

  private static class RpcEdit extends Edit {
    // ...
    @Override
    public void logSyncNotify(RuntimeException syncEx) {
      try {
        if (syncEx == null) {
          call.sendResponse();                                // line 365
        } else {
          call.abortResponse(syncEx);
        }
      } catch (Exception e) {} // don't care if not sent.
    }
    // ...
  }{code}
    If the sendResponse operation in line 365 gets stuck, then the whole 
FSEditLogAsync thread is unable to proceed. In that case, the critical 
logSync (line 243) cannot be executed for incoming transactions, and the 
namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
asynchronous edit logging that is supposed to tolerate slow I/O.

    To see why the sendResponse operation in line 365 may get stuck, here is 
the stack trace:
{code:java}
 '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
 '(org.apache.hadoop.ipc.Server,access$1700,139)',
 '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
 '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
 '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
 '(org.apache.hadoop.ipc.Server$Connection,access$300,1799)',
 '(org.apache.hadoop.ipc.Server$RpcCall,doResponse,)',
 '(org.apache.hadoop.ipc.Server$Call,doResponse,903)',
 '(org.apache.hadoop.ipc.Server$Call,sendResponse,889)',
 
'(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$RpcEdit,logSyncNotify,365)',
 '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync,run,248)',
 '(java.lang.Thread,run,748)'
{code}
 The `channelWrite` function is defined as follows:
{code:java}
//hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java

  private int channelWrite(WritableByteChannel channel,
                           ByteBuffer buffer) throws IOException {

    int count = (buffer.remaining() <= NIO_BUFFER_LIMIT) ?
        channel.write(buffer) : channelIO(null, channel, buffer);  // line 3594
    if (count > 0) {
      rpcMetrics.incrSentBytes(count);
    }
    return count;
  }{code}
    The `channel.write(buffer)` operation in line 3594 may be slow. Although 
for this specific stack trace the channel is initialized in non-blocking mode, 
it can still be slow depending on the native write implementation in the OS 
(e.g., a kernel issue). Furthermore, the channelIO invocation in line 3594 may 
also get stuck, since it waits until the buffer is drained:
{code:java}
//hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java

  private static int channelIO(...) throws IOException {
//...
while (buf.remaining() > 0) {

[jira] [Work logged] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?focusedWorklogId=571531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571531
 ]

ASF GitHub Bot logged work on HDFS-15869:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 22:14
Start Date: 24/Mar/21 22:14
Worklog Time Spent: 10m 
  Work Description: functioner commented on pull request #2737:
URL: https://github.com/apache/hadoop/pull/2737#issuecomment-806219473


   Thanks @Hexiaoqiao for your review. I have added the multi-threaded executor 
for it.
   
   For the exception handling part, I checked the original semantics of 
`RpcEdit`
   
https://github.com/apache/hadoop/blob/aaedc51d8783c2808563d5c8b51b68ab79e19820/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java#L379-L387
   and `SyncEdit`
   
https://github.com/apache/hadoop/blob/aaedc51d8783c2808563d5c8b51b68ab79e19820/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java#L347-L353
   so we don't need to add any extra handling mechanism.
   
   I have run the `TestEditLogRace` and `TestEditLog` tests on my local 
machine and they all pass. Let's see whether any other tests fail.
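
   A minimal sketch of the executor idea mentioned above, assuming the goal is simply to decouple logSyncNotify from the sync loop; the class and method names are illustrative and not the actual pull-request code.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: the sync loop only enqueues the notification, so a stuck
// sendResponse() runs on an executor thread instead of blocking edit-log syncing.
public class AsyncNotifySketch {
  interface Edit {
    void logSyncNotify(RuntimeException syncEx);
  }

  private final ExecutorService notifyExecutor = Executors.newFixedThreadPool(4);

  void notifyAfterSync(Edit edit, RuntimeException syncEx) {
    notifyExecutor.execute(() -> edit.logSyncNotify(syncEx));
  }

  void shutdown() {
    notifyExecutor.shutdown();
  }
}
{code}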


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571531)
Time Spent: 20m  (was: 10m)

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Description*
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
>     try {
>       while (true) {
>         boolean doSync;
>         Edit edit = dequeueEdit();                          // line 229
>         if (edit != null) {
>           // sync if requested by edit log.
>           doSync = edit.logEdit();                          // line 232
>           syncWaitQ.add(edit);
>         } else {
>           // sync when editq runs dry, but have edits pending a sync.
>           doSync = !syncWaitQ.isEmpty();
>         }
>         if (doSync) {
>           // normally edit log exceptions cause the NN to terminate, but tests
>           // relying on ExitUtil.terminate need to see the exception.
>           RuntimeException syncEx = null;
>           try {
>             logSync(getLastWrittenTxId());                  // line 243
>           } catch (RuntimeException ex) {
>             syncEx = ex;
>           }
>           while ((edit = syncWaitQ.poll()) != null) {
>             edit.logSyncNotify(syncEx);                     // line 248
>           }
>         }
>       }
>     } catch (InterruptedException ie) {
>       LOG.info(Thread.currentThread().getName() + " was interrupted, exiting");
>     } catch (Throwable t) {
>       terminate(t);
>     }
>   }
> {code}
>     In step 4, FSEditLogAsync$RpcEdit.logSyncNotify essentially performs a 
> network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
>     // ...
>     @Override
>     public void logSyncNotify(RuntimeException syncEx) {
>       try {
>         if (syncEx == null) {
>           call.sendResponse();                              // line 365
>         } else {
>           call.abortResponse(syncEx);
>         }
>       } catch (Exception e) {} // don't care if not sent.
>     }
>     // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync 

[jira] [Commented] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308195#comment-17308195
 ] 

Hadoop QA commented on HDFS-15919:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 
44s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
13s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
19s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
24s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 21s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 20m 
44s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  3m  
5s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
12s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
12s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
9s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
9s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 38s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Updated] (HDFS-15456) TestExternalStoragePolicySatisfier fails intermittently

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15456:
---
Fix Version/s: 3.3.1

> TestExternalStoragePolicySatisfier fails intermittently
> ---
>
> Key: HDFS-15456
> URL: https://issues.apache.org/jira/browse/HDFS-15456
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ahmed Hussein
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available, test
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestExternalStoragePolicySatisfier}} frequently times out on hadoop trunk:
> {code:bash}
> [ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 421.443 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier
> [ERROR] 
> testChooseInSameDatanodeWithONESSDShouldNotChooseIfNoSpace(org.apache.hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier)
>   Time elapsed: 43.983 s  <<< ERROR!
> java.util.concurrent.TimeoutException: 
> Timed out waiting for condition. Thread diagnostics:
> Timestamp: 2020-07-07 07:51:10,267
> "IPC Server handler 4 on default port 44933" daemon prio=5 tid=1138 
> timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
> at 
> org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:307)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2918)
> "ForkJoinPool-2-worker-19" daemon prio=5 tid=235 in Object.wait()
> java.lang.Thread.State: WAITING (on object monitor)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.ForkJoinPool.awaitWork(ForkJoinPool.java:1824)
> at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1693)
> at 
> java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
> "refreshUsed-/home/jenkins/jenkins-slave/workspace/PreCommit-HADOOP-Build/sourcedir/hadoop-hdfs-project/hadoop-hdfs/target/test/data/4/dfs/data/data1/current/BP-912129709-172.17.0.2-1594151429636"
>  daemon prio=5 tid=1217 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at java.lang.Thread.sleep(Native Method)
> at 
> org.apache.hadoop.fs.CachingGetSpaceUsed$RefreshThread.run(CachingGetSpaceUsed.java:205)
> at java.lang.Thread.run(Thread.java:748)
> "Socket Reader #1 for port 0" daemon prio=5 tid=1192 runnable
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
> at 
> org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:1273)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:1252)
> "pool-90-thread-1"  prio=5 tid=1069 timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> "IPC Server handler 2 on default port 37995" daemon prio=5 tid=1169 
> timed_waiting
> java.lang.Thread.State: TIMED_WAITING
> at sun.misc.Unsafe.park(Native Method)
> at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> at 
> 

[jira] [Work logged] (HDFS-15909) Make fnmatch cross platform

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15909?focusedWorklogId=571446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571446
 ]

ASF GitHub Bot logged work on HDFS-15909:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 19:48
Start Date: 24/Mar/21 19:48
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2792:
URL: https://github.com/apache/hadoop/pull/2792#issuecomment-806127546


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 35s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 38s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   2m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  51m 42s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | -1 :x: |  cc  |   2m 27s | 
[/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2792/4/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 3 new + 84 unchanged 
- 3 fixed = 87 total (was 87)  |
   | +1 :green_heart: |  golang  |   2m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | -1 :x: |  cc  |   2m 30s | 
[/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2792/4/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 6 new 
+ 81 unchanged - 6 fixed = 87 total (was 87)  |
   | +1 :green_heart: |  golang  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  13m 11s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  34m 31s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 108m 25s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2792/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2792 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux 9b15851551f3 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8eadda4516d7ee6f8de016a8c402e7d71f3b7e83 |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308064#comment-17308064
 ] 

Íñigo Goiri commented on HDFS-15918:


Thanks [~gautham] for the fixes.
Merged the PR.

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> compilation warning saying that it is deprecated:
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?focusedWorklogId=571358&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571358
 ]

ASF GitHub Bot logged work on HDFS-15918:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 17:52
Start Date: 24/Mar/21 17:52
Worklog Time Spent: 10m 
  Work Description: goiri merged pull request #2811:
URL: https://github.com/apache/hadoop/pull/2811


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571358)
Time Spent: 50m  (was: 40m)

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> compilation warning saying that it is deprecated:
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-15918.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> compilation warning saying that it is deprecated:
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread Mingliang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-15911:
-
Component/s: balancer & mover

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Balancer provides a Result for each iteration and it contains info like exitStatus, 
> bytesLeftToMove, bytesBeingMoved etc. We should also provide the blocksMoved 
> count from NameNodeConnector and print it with the rest of the details in 
> Result#print().
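
A rough sketch of the idea for readers who have not looked at the Balancer code. The class and field names below are illustrative, not the actual Balancer.Result API; it only shows the shape of an iteration result that carries a blocksMoved counter alongside the existing byte counters and prints it with the other details.

{code:java}
// Illustrative sketch only -- not the real Balancer.Result class.
public class IterationResultSketch {
  private final String exitStatus;       // stand-in for the real ExitStatus enum
  private final long bytesLeftToMove;
  private final long bytesBeingMoved;
  private final long blocksMoved;        // the newly surfaced counter

  public IterationResultSketch(String exitStatus, long bytesLeftToMove,
      long bytesBeingMoved, long blocksMoved) {
    this.exitStatus = exitStatus;
    this.bytesLeftToMove = bytesLeftToMove;
    this.bytesBeingMoved = bytesBeingMoved;
    this.blocksMoved = blocksMoved;
  }

  /** Print the iteration summary, now including the blocks moved count. */
  public void print(int iteration, java.io.PrintStream out) {
    out.printf("Iteration %d: exitStatus=%s, bytesLeftToMove=%d, "
        + "bytesBeingMoved=%d, blocksMoved=%d%n",
        iteration, exitStatus, bytesLeftToMove, bytesBeingMoved, blocksMoved);
  }
}
{code}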



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread Mingliang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-15911:
-
Fix Version/s: 3.2.3
   3.1.5
   3.4.0
   3.3.1
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged to all Hadoop 3 branches. Thanks!

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Balancer provides a Result for each iteration and it contains info like exitStatus, 
> bytesLeftToMove, bytesBeingMoved etc. We should also provide the blocksMoved 
> count from NameNodeConnector and print it with the rest of the details in 
> Result#print().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308030#comment-17308030
 ] 

Stephen O'Donnell commented on HDFS-15160:
--

[~weichiu] Could you give the branch-3.3 PR a quick review? There were a couple 
of conflicts caused by new changes on trunk, but they were easy to resolve.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
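
For context, the pattern being applied here is the standard Java read/write lock idiom. A minimal sketch under the assumption of a simple map of replicas; VolumeMapSketch and its fields are illustrative, not the real ReplicaMap or FsDatasetImpl code.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the locking pattern only.
public class VolumeMapSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  public String get(long blockId) {
    lock.readLock().lock();      // shared: many readers may hold this at once
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  public void add(long blockId, String replicaInfo) {
    lock.writeLock().lock();     // exclusive: blocks readers and other writers
    try {
      replicas.put(blockId, replicaInfo);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}

Read-only operations then scale with the number of reader threads, while mutations stay exclusive, which is the trade-off the Jira is after.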



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=571329=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571329
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 17:20
Start Date: 24/Mar/21 17:20
Worklog Time Spent: 10m 
  Work Description: liuml07 merged pull request #2799:
URL: https://github.com/apache/hadoop/pull/2799


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571329)
Time Spent: 5h 20m  (was: 5h 10m)

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Balancer provides a Result for each iteration and it contains info like exitStatus, 
> bytesLeftToMove, bytesBeingMoved etc. We should also provide the blocksMoved 
> count from NameNodeConnector and print it with the rest of the details in 
> Result#print().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=571322=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571322
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 17:14
Start Date: 24/Mar/21 17:14
Worklog Time Spent: 10m 
  Work Description: liuml07 merged pull request #2797:
URL: https://github.com/apache/hadoop/pull/2797


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571322)
Time Spent: 5h 10m  (was: 5h)

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Balancer provides a Result for each iteration and it contains info like exitStatus, 
> bytesLeftToMove, bytesBeingMoved etc. We should also provide the blocksMoved 
> count from NameNodeConnector and print it with the rest of the details in 
> Result#print().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?focusedWorklogId=571283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571283
 ]

ASF GitHub Bot logged work on HDFS-15160:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 16:42
Start Date: 24/Mar/21 16:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2813:
URL: https://github.com/apache/hadoop/pull/2813#issuecomment-805981816


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  18m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ branch-3.3 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 44s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 53s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   3m  2s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  18m 14s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML 
file.  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m  7s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m  5s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 185m  4s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2813/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 287m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestGetFileChecksum |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy |
   |   | hadoop.hdfs.TestStripedFileAppend |
   |   | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2813/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2813 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell xml |
   | uname | Linux c545657525e7 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / a5783af06c7456e31eb31d7f64ea81c6511df2dc |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~18.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2813/1/testReport/ |
   | Max. process+thread count | 3422 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2813/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog 

[jira] [Commented] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307976#comment-17307976
 ] 

Ayush Saxena commented on HDFS-15919:
-

+1

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15919.001.patch
>
>
> If the hdfs config is badly configured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...) but there are a couple of 
> scenarios within it which can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs so it is clear what caused it.
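
A minimal sketch of the kind of change being proposed, assuming an slf4j logger as used elsewhere in HDFS. The class and methods below are illustrative, not the exact BlockPoolManager code; the point is simply to pass the caught exception to the WARN log so the stack trace shows which scenario in DFSUtil actually failed.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch only.
public class RefreshSketch {
  private static final Logger LOG = LoggerFactory.getLogger(RefreshSketch.class);

  void refreshNamenodes() throws java.io.IOException {
    try {
      // hypothetical stand-in for DFSUtil.getNNServiceRpcAddressesForCluster(conf)
      resolveNamenodeAddresses();
    } catch (java.io.IOException ioe) {
      // Before: LOG.warn("Unable to get NameNode addresses."); -- the cause is lost.
      // After: include the exception so the stack trace is logged as well.
      LOG.warn("Unable to get NameNode addresses.", ioe);
      throw ioe;
    }
  }

  private void resolveNamenodeAddresses() throws java.io.IOException {
    // placeholder for the real address resolution logic
  }
}
{code}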



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15919:

Status: Patch Available  (was: Open)

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15919.001.patch
>
>
> If the hdfs config is badly configured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...) but there are a couple of 
> scenarios within it which can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307880#comment-17307880
 ] 

Viraj Jasani commented on HDFS-15919:
-

+1 (non-binding)

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15919.001.patch
>
>
> If the hdfs config is badly configured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...) but there are a couple of 
> scenarios within it which can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=571201=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571201
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 14:34
Start Date: 24/Mar/21 14:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2799:
URL: https://github.com/apache/hadoop/pull/2799#issuecomment-805872201


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |  11m 18s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 
1 new or modified test files.  |
   ||| _ branch-3.1 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  27m 53s |  branch-3.1 passed  |
   | +1 :green_heart: |  compile  |   1m  7s |  branch-3.1 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 50s |  branch-3.1 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 11s |  branch-3.1 passed  |
   | +1 :green_heart: |  shadedclient  |  14m 21s |  branch has no errors when 
building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  branch-3.1 passed  |
   | +0 :ok: |  spotbugs  |   2m 50s |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | +1 :green_heart: |  findbugs  |   2m 48s |  branch-3.1 passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 10s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 59s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 59s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 49s |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 229 unchanged - 4 
fixed = 229 total (was 233)  |
   | +1 :green_heart: |  mvnsite  |   1m  6s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | +1 :green_heart: |  shadedclient  |  13m  9s |  patch has no errors when 
building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  the patch passed  |
   | +1 :green_heart: |  findbugs  |   2m 51s |  the patch passed  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  | 169m 46s |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  The patch does not generate 
ASF License warnings.  |
   |  |   | 253m  2s |   |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | hadoop.fs.TestHdfsNativeCodeLoader |
   |   | hadoop.hdfs.server.namenode.TestEditLogRace |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2799 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 40fa88644a2b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.1 / 332c2a6 |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~16.04-b08 |
   | unit | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/5/testReport/ |
   | Max. process+thread count | 2919 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/5/console |
   | versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571201)
Time Spent: 5h  (was: 4h 50m)

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> 

[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307823#comment-17307823
 ] 

Stephen O'Donnell commented on HDFS-15160:
--

After backporting this to branch-3.3, we also need to backport 

  HDFS-15457 TestFsDatasetImpl fails intermittently
  HDFS-15818 Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig 

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15907) Reduce Memory Overhead of AclFeature by avoiding AtomicInteger

2021-03-24 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307804#comment-17307804
 ] 

Stephen O'Donnell commented on HDFS-15907:
--

Thanks for committing [~ayushtkn]. The issue reported in HDFS-15792 is only 
relevant when parallel image loading is enabled. This was delivered in 
HDFS-14617, which is only on 3.3, trunk and 2.10. Therefore I think it's fine 
that HDFS-15792 and this one are only on branch-3.3.

The branch-2.10 version of HDFS-15792 is different and doesn't use AtomicInt 
for Java 7 compatibility reasons.

> Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
> --
>
> Key: HDFS-15907
> URL: https://issues.apache.org/jira/browse/HDFS-15907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15907.001.patch
>
>
> In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap 
> classes to address a rare bug when loading the FSImage in parallel.
> One change we made was to replace an int inside AclFeature with an 
> AtomicInteger to avoid synchronising the methods in AclFeature.
> Discussing this change with [~weichiu], he pointed out that while the 
> AclFeature cache is intended to reduce the count of AclFeature objects, on a 
> large cluster, it is possible for there to be many millions of AclFeature 
> objects.
> Previously, the int would have taken 4 bytes of heap.
> By moving to an AtomicInteger, we probably have an overhead of:
>  4 bytes (or 8 if the heap is over 32GB) for a reference to the AtomicInteger 
> object
>  12 bytes of overhead for the Java object header
>  4 bytes inside the AtomicInteger to store the int.
>  
> So the total heap overhead has gone from 4 bytes to 20 bytes just to use an 
> AtomicInteger.
> Therefore I think it makes sense to remove the AtomicInteger and just 
> synchronise the methods of AclFeature where the value is incremented / 
> decremented / retrieved.
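
A minimal sketch of the proposed shape, assuming only what the description above says. The class and method names are illustrative, not the actual AclFeature code: the AtomicInteger field becomes a plain int and the accessors are synchronized, so the count stays thread-safe while each instance avoids the extra reference and object that an AtomicInteger would cost.

{code:java}
// Illustrative sketch only.
public class RefCountedFeatureSketch {
  private int refCount = 0;   // 4 bytes inline, no separate object on the heap

  public synchronized void incrementRefCount() {
    refCount++;
  }

  public synchronized void decrementRefCount() {
    refCount--;
  }

  public synchronized int getRefCount() {
    return refCount;
  }
}
{code}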



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15919:
-
Attachment: HDFS-15919.001.patch

> BlockPoolManager should log stack trace if unable to get Namenode addresses
> ---
>
> Key: HDFS-15919
> URL: https://issues.apache.org/jira/browse/HDFS-15919
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15919.001.patch
>
>
> If the hdfs config is badly configured, the datanode can fail to start with 
> this stack trace:
> {code}
> 2021-03-24 05:58:27,026 INFO  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
> nameservices: null
> 2021-03-24 05:58:27,033 WARN  datanode.DataNode 
> (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
> addresses.
> ...
> 2021-03-24 05:58:27,077 ERROR datanode.DataNode 
> (DataNode.java:secureMain(2883)) - Exception in secureMain
> java.io.IOException: No services to connect, missing NameNode address.
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
>   at 
> org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
> {code}
> In this case, the issue was an exception thrown in 
> DFSUtil.getNNServiceRpcAddressesForCluster(...) but there are a couple of 
> scenarios within it which can cause an exception, so it's difficult to figure 
> out what is wrong with the config.
> We should simply add the exception to the existing log message when an 
> error occurs so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15919) BlockPoolManager should log stack trace if unable to get Namenode addresses

2021-03-24 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-15919:


 Summary: BlockPoolManager should log stack trace if unable to get 
Namenode addresses
 Key: HDFS-15919
 URL: https://issues.apache.org/jira/browse/HDFS-15919
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.4.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


If the hdfs config is badly configured, the datanode can fail to start with 
this stack trace:

{code}
2021-03-24 05:58:27,026 INFO  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for 
nameservices: null
2021-03-24 05:58:27,033 WARN  datanode.DataNode 
(BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode 
addresses.
...
2021-03-24 05:58:27,077 ERROR datanode.DataNode 
(DataNode.java:secureMain(2883)) - Exception in secureMain
java.io.IOException: No services to connect, missing NameNode address.
at 
org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.Java:165)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
at 
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
{code}

In this case, the issue was an exception thrown in 
DFSUtil.getNNServiceRpcAddressesForCluster(...) but there are a couple of 
scenarios within it which can cause an exception, so it's difficult to figure 
out what is wrong with the config.

We should simply add the exception to the existing log message when an error 
occurs so it is clear what caused it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?focusedWorklogId=571094=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571094
 ]

ASF GitHub Bot logged work on HDFS-15160:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 11:54
Start Date: 24/Mar/21 11:54
Worklog Time Spent: 10m 
  Work Description: sodonnel opened a new pull request #2813:
URL: https://github.com/apache/hadoop/pull/2813


   PR to backport HDFS-15160 to branch-3.3. This was already committed on trunk 
some time back.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571094)
Remaining Estimate: 0h
Time Spent: 10m

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15160:
--
Labels: pull-request-available  (was: )

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307764#comment-17307764
 ] 

Hadoop QA commented on HDFS-15160:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 14s{color} 
| {color:red}{color} | {color:red} HDFS-15160 does not apply to trunk. Rebase 
required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for 
help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-15160 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13022895/HDFS-15160.branch-3-3.001.patch
 |
| Console output | 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/555/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-15160:
-
Attachment: HDFS-15160.branch-3-3.001.patch

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, 
> image-2020-04-10-17-18-55-938.png
>
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reopened HDFS-15160:
--

Reopening to backport to branch-3.3

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=571045=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571045
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 10:09
Start Date: 24/Mar/21 10:09
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2797:
URL: https://github.com/apache/hadoop/pull/2797#issuecomment-805670478


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ branch-3.2 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  27m 16s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  compile  |   1m  1s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 47s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  branch-3.2 passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  branch-3.2 passed  |
   | -1 :x: |  spotbugs  |   2m 49s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2797/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in branch-3.2 has 4 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  14m 52s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 229 unchanged - 4 
fixed = 229 total (was 233)  |
   | +1 :green_heart: |  mvnsite  |   1m  3s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  16m  2s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 173m 59s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2797/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 245m 26s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestRedudantBlocks |
   |   | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2797/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2797 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux c331f094bdd4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.2 / 72544e6273b377f41008706a3ef861e5ef5cc13e |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~18.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2797/3/testReport/ |
   | Max. process+thread count | 2986 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2797/3/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this 

[jira] [Work logged] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?focusedWorklogId=571043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571043
 ]

ASF GitHub Bot logged work on HDFS-15918:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 10:04
Start Date: 24/Mar/21 10:04
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra commented on pull request #2811:
URL: https://github.com/apache/hadoop/pull/2811#issuecomment-805667461


   This PR doesn't modify any existing functionality and thus no extra unit 
tests are required. The warnings reported in the CI run aren't related to my PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571043)
Time Spent: 40m  (was: 0.5h)

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> warning during compilation that it's deprecated -
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15900) RBF: empty blockpool id on dfsrouter caused by UNAVAILABLE NameNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15900?focusedWorklogId=571041=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571041
 ]

ASF GitHub Bot logged work on HDFS-15900:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 09:59
Start Date: 24/Mar/21 09:59
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2787:
URL: https://github.com/apache/hadoop/pull/2787#issuecomment-805663042


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 14s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  14m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 29s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  13m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  |  17m 46s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2787/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  91m 49s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2787/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2787 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 213e3e38c69b 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 
23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6f9fc3eca23e2a1cba7238ff49a8d4ecd7f9c0ef |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2787/4/testReport/ |
   | Max. process+thread count | 2184 (vs. ulimit of 5500) |
   | modules 

[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=571037=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571037
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 09:56
Start Date: 24/Mar/21 09:56
Worklog Time Spent: 10m 
  Work Description: touchida commented on pull request #2585:
URL: https://github.com/apache/hadoop/pull/2585#issuecomment-805661393


   @ferhui @runitao @aajisaka @tasanuma Thanks for your kind review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571037)
Time Spent: 8h 10m  (was: 8h)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decode an input back using the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.
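
As a rough illustration of the check described above, the following sketch uses a
single XOR parity instead of RS-6-3 and hypothetical class and method names; it is
not the actual Hadoop reconstruction code. The reconstructed output is validated by
decoding one of the original inputs back from the outputs and comparing it with the
data that was read as input.

{code:java}
import java.util.Arrays;

/**
 * Simplified sketch of "verify reconstruction by decoding an input back".
 * Uses a single XOR parity instead of RS-6-3; names are illustrative only.
 */
public class ReconstructionVerifier {

  /** XOR the given buffers together (encode and decode are the same for XOR parity). */
  static byte[] xorDecode(byte[]... units) {
    byte[] out = new byte[units[0].length];
    for (byte[] u : units) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= u[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3}, d1 = {4, 5, 6}, d2 = {7, 8, 9};
    byte[] p0 = xorDecode(d0, d1, d2);          // parity produced at encode time

    // Suppose d1 is lost: reconstruct it from the surviving units.
    byte[] rebuiltD1 = xorDecode(d0, d2, p0);

    // Verification step: decode d0 back using the reconstructed output,
    // then compare it with the original d0 that was read as an input.
    byte[] decodedD0 = xorDecode(rebuiltD1, d2, p0);
    if (!Arrays.equals(decodedD0, d0)) {
      throw new IllegalStateException("EC reconstruction verification failed");
    }
    System.out.println("reconstruction verified: d1 = " + Arrays.toString(rebuiltD1));
  }
}
{code}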



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6

2021-03-24 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307712#comment-17307712
 ] 

Yiqun Lin edited comment on HDFS-15660 at 3/24/21, 9:48 AM:


Hi [~weichiu], this compatibility issue only happens when an old Hadoop version 
client doesn't contain the storage type introduced in HDFS-9806. It's a 
client-side issue, not a server-side one. As versions 3.1, 3.2 and 3.3 already 
contain the new storage type, it should be okay to do the upgrade. So I don't 
cherry-pick to other branches.


was (Author: linyiqun):
Hi [~weichiu], this compatibility issue only happens when an old Hadoop version 
client doesn't contain the storage type introduced in HDFS-9806. It's a 
client-side issue, not a server-side one. As versions 3.1, 3.2 and 3.3 already 
contain the new storage type, it should be okay to do the upgrade. So I only 
push the fix to trunk.

> StorageTypeProto is not compatiable between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DN's version was 
> still 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new PROVIDED 
> storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: 

[jira] [Updated] (HDFS-14546) Document block placement policies

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-14546:
---
Fix Version/s: 3.3.1

> Document block placement policies
> -
>
> Key: HDFS-14546
> URL: https://issues.apache.org/jira/browse/HDFS-14546
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Íñigo Goiri
>Assignee: Amithsha
>Priority: Major
>  Labels: documentation
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-14546-01.patch, HDFS-14546-02.patch, 
> HDFS-14546-03.patch, HDFS-14546-04.patch, HDFS-14546-05.patch, 
> HDFS-14546-06.patch, HDFS-14546-07.patch, HDFS-14546-08.patch, 
> HDFS-14546-09.patch, HdfsDesign.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, all the documentation refers to the default block placement policy.
> However, over time there have been new policies:
> * BlockPlacementPolicyRackFaultTolerant (HDFS-7891)
> * BlockPlacementPolicyWithNodeGroup (HDFS-3601)
> * BlockPlacementPolicyWithUpgradeDomain (HDFS-9006)
> We should update the documentation to refer to them explaining their 
> particularities and probably how to setup each one of them.
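
As a starting point for that setup documentation, selecting a non-default policy
is typically done through the dfs.block.replicator.classname key on the NameNode.
A minimal sketch is shown below; the key name and package are assumptions that
should be verified against the Hadoop release being documented.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: point the NameNode at a non-default block placement policy.
// Verify the exact key and class package against the target Hadoop version.
public class BlockPlacementConfigExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.block.replicator.classname",
        "org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant");
    System.out.println(conf.get("dfs.block.replicator.classname"));
  }
}
{code}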



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6

2021-03-24 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307712#comment-17307712
 ] 

Yiqun Lin edited comment on HDFS-15660 at 3/24/21, 9:47 AM:


Hi [~weichiu], this compatibility issue only happens when an old Hadoop version 
client doesn't contain the storage type introduced in HDFS-9806. It's a 
client-side issue, not a server-side one. As versions 3.1, 3.2 and 3.3 already 
contain the new storage type, it should be okay to do the upgrade. So I only 
push the fix to trunk.


was (Author: linyiqun):
Hi [~weichiu], this compatibility issue only happens when an old Hadoop version 
client doesn't contain the storage type introduced in HDFS-9806. It's a 
client-side issue, not a server-side one. As versions 3.1, 3.2 and 3.3 already 
contain the new storage type, it should be okay to do the upgrade.

> StorageTypeProto is not compatiable between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DN's version was 
> still 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new PROVIDED 
> storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> 

[jira] [Updated] (HDFS-15380) RBF: Could not fetch real remote IP in RouterWebHdfsMethods

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15380:
---
Fix Version/s: 3.3.1

> RBF: Could not fetch real remote IP in RouterWebHdfsMethods
> ---
>
> Key: HDFS-15380
> URL: https://issues.apache.org/jira/browse/HDFS-15380
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.1.0
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: router, webhdfs
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15380.001.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> We plan to add an audit log for the HDFS router, so we fetch the remote IP via 
> Server.getRemoteIp(), but the result is "localhost/127.0.0.1".
>   
>  "REMOTE_ADDRESS" in RouterWebHdfsMethods.java is a ThreadLocal field, set in 
> the constructor RouterWebHdfsMethods() and in init(). When we later call 
> Server.getRemoteIp() to fetch the remote IP, a different thread handles the 
> call, so the ThreadLocal field "REMOTE_ADDRESS" is null there and ends up 
> being resolved to "localhost/127.0.0.1" via InetAddress.getByName().
>   
>  So we can change the field "REMOTE_ADDRESS" to a String value, just like 
> NamenodeWebHdfsMethods does.
>   
> I printed the thread name and the value of "REMOTE_ADDRESS" in the log; the 
> log is shown below:
> {code:java}
> 2020-05-27 19:15:18,797 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:(138)) - RouterWebHdfsMethods 
> REMOTE_ADDRESS: 14.39.39.28, current thread: qtp476579021-1090
> 2020-05-27 19:15:18,827 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:init(150)) - init REMOTE_ADDRESS: 14.39.39.28, 
> current thread: qtp476579021-1090
> 2020-05-27 19:15:18,836 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:getRemoteAddr(170)) - getRemoteAddr 
> REMOTE_ADDRESS: null, current thread: IPC Server handler 75 on 
> 2020-05-27 19:15:18,837 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:getRemoteAddr(170)) - getRemoteAddr 
> REMOTE_ADDRESS: null, current thread: IPC Server handler 75 on 
> 2020-05-27 19:15:18,883 INFO  router.RouterWebHdfsMethods 
> (RouterWebHdfsMethods.java:reset(164)) - reset REMOTE_ADDRESS: null, current 
> thread: IPC Server handler 75 on 
> {code}
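
The root cause above can be reproduced outside Hadoop in a few lines: a ThreadLocal
written on the request thread is simply not visible from a different handler thread.
A minimal, self-contained sketch of that behavior follows; the names are illustrative
and not the Router code.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Demonstrates why a ThreadLocal set on the request thread reads as null
// on a different handler thread; names are illustrative, not Router code.
public class ThreadLocalRemoteAddrDemo {
  private static final ThreadLocal<String> REMOTE_ADDRESS = new ThreadLocal<>();

  public static void main(String[] args) throws Exception {
    REMOTE_ADDRESS.set("14.39.39.28");              // set on the "request" thread
    System.out.println("request thread sees: " + REMOTE_ADDRESS.get());

    ExecutorService handler = Executors.newSingleThreadExecutor();
    handler.submit(() ->
        // the "handler" thread never called set(), so it sees null
        System.out.println("handler thread sees: " + REMOTE_ADDRESS.get())
    ).get();
    handler.shutdown();
  }
}
{code}

Passing the address through as a plain String, as NamenodeWebHdfsMethods does,
avoids the cross-thread lookup entirely.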



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15610) Reduce datanode upgrade/hardlink thread

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15610:
---
Fix Version/s: 3.3.1

> Reduce datanode upgrade/hardlink thread
> ---
>
> Key: HDFS-15610
> URL: https://issues.apache.org/jira/browse/HDFS-15610
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 3.1.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is kernel overhead during a datanode upgrade. If a datanode has millions 
> of blocks and 10+ disks, the block-layout migration becomes very expensive 
> during its hardlink operation. Slowness is observed when running with a large 
> number of hardlink threads (dfs.datanode.block.id.layout.upgrade.threads, 
> default is 12 threads per disk), and the migration runs for 2+ hours. 
> I.e. 10*12 = 120 threads (for 10 disks)
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> The same test was run twice with about 95% consistency (only a few seconds of 
> difference on each iteration). Using 6 threads is faster than 12 threads 
> because of the per-thread overhead. 
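
For reference, the effective hardlink parallelism discussed above is the per-disk
thread count multiplied by the number of data directories. A small sketch of how an
operator might lower it, using the key named in the description, is shown below;
the default value and exact behavior may differ by release.

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: lower the per-disk layout-upgrade thread count before a DN upgrade.
// The key below is the one named in this issue; defaults may vary by release.
public class LayoutUpgradeThreadsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("dfs.datanode.block.id.layout.upgrade.threads", 6);

    int disks = 10;
    int perDisk = conf.getInt("dfs.datanode.block.id.layout.upgrade.threads", 12);
    System.out.println("total hardlink threads = " + disks * perDisk); // 10 * 6 = 60
  }
}
{code}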



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15275) HttpFS: Response of Create was not correct with noredirect and data are true

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15275:
---
Fix Version/s: 3.3.1

> HttpFS: Response of Create was not correct with noredirect and data are true
> 
>
> Key: HDFS-15275
> URL: https://issues.apache.org/jira/browse/HDFS-15275
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Hemanth Boyina
>Assignee: Hemanth Boyina
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15275.001.patch, HDFS-15275.002.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12288) Fix DataNode's xceiver count calculation

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-12288:
---
Fix Version/s: 3.3.1

> Fix DataNode's xceiver count calculation
> 
>
> Key: HDFS-12288
> URL: https://issues.apache.org/jira/browse/HDFS-12288
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Lukas Majercak
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, 
> HDFS-12288.003.patch, HDFS-12288.004.patch, HDFS-12288.005.patch, 
> HDFS-12288.006.patch, HDFS-12288.007.patch, HDFS-12288.008.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that the method is 
> only a very rough estimate, and in reality returns the total number of 
> threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the 
> actual number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN 
> for choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, 
> which only accounts for the actual number of DataXceiver threads currently 
> running and thus represents the load on the DN much better.
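
The contrast described above is easy to see in isolation: ThreadGroup.activeCount()
counts every live thread in the group, while a counter incremented and decremented
around the actual transfer work tracks only active xceivers, which is the idea behind
dataNodeActiveXceiversCount. A sketch with illustrative names, not the DataNode code:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: contrasts a thread-group estimate with an explicit
// active-work counter.
public class XceiverCountDemo {
  private static final AtomicInteger activeXceivers = new AtomicInteger();

  static void runXceiver(Runnable transfer) {
    activeXceivers.incrementAndGet();          // counted only while doing real work
    try {
      transfer.run();
    } finally {
      activeXceivers.decrementAndGet();
    }
  }

  public static void main(String[] args) throws Exception {
    ThreadGroup group = new ThreadGroup("dataXceiverServer");
    Thread idle = new Thread(group, () -> {
      try { Thread.sleep(2000); } catch (InterruptedException ignored) { }
    }, "idle-helper");
    idle.start();                              // alive, but not transferring anything

    System.out.println("ThreadGroup.activeCount() = " + group.activeCount()); // >= 1
    System.out.println("active xceivers           = " + activeXceivers.get()); // 0

    runXceiver(() -> System.out.println("transferring block..."));
    idle.join();
  }
}
{code}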



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6

2021-03-24 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307712#comment-17307712
 ] 

Yiqun Lin commented on HDFS-15660:
--

Hi [~weichiu], this compatibility issue only happens when an old Hadoop version 
client doesn't contain the storage type introduced in HDFS-9806. It's a 
client-side issue, not a server-side one. As versions 3.1, 3.2 and 3.3 already 
contain the new storage type, it should be okay to do the upgrade.

> StorageTypeProto is not compatiable between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DN's version was 
> still 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new PROVIDED 
> storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65392)
>         at 
> 

[jira] [Work logged] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?focusedWorklogId=571029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571029
 ]

ASF GitHub Bot logged work on HDFS-15918:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 09:37
Start Date: 24/Mar/21 09:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2811:
URL: https://github.com/apache/hadoop/pull/2811#issuecomment-805649096


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   2m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  mvnsite  |   0m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  51m 46s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 28s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | -1 :x: |  cc  |   2m 28s | 
[/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2811/1/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-native-client-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04
 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 7 new + 80 unchanged 
- 13 fixed = 87 total (was 93)  |
   | +1 :green_heart: |  golang  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | -1 :x: |  cc  |   2m 32s | 
[/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2811/1/artifact/out/results-compile-cc-hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-native-client-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08
 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 3 new 
+ 84 unchanged - 9 fixed = 87 total (was 93)  |
   | +1 :green_heart: |  golang  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  13m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  31m 53s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 33s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 105m 54s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2811/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2811 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux 3a133aed974d 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 936b080994c403b3e5b5f4d37de133adafe30c3f |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2021-03-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307692#comment-17307692
 ] 

Wei-Chiu Chuang commented on HDFS-15160:


+1 to backport.

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15160.001.patch, HDFS-15160.002.patch, 
> HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, 
> HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the 
> read lock rather than the write lock to improve concurrency. The first step 
> is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) there are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
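
The locking pattern described above boils down to taking the shared (read) lock for
lookups and iteration, and the exclusive (write) lock only for mutations. A minimal
sketch with java.util.concurrent follows; it is not the actual FsDatasetImpl or
ReplicaMap code.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the read-lock/write-lock split described above; not DataNode code.
public class ReplicaMapSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  /** Read-only lookup: scanners and balancers can hold this concurrently. */
  public String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Mutation: requires the exclusive write lock. */
  public void add(long blockId, String replicaInfo) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replicaInfo);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}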



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15902) Improve the log for HTTPFS server operation

2021-03-24 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15902:

Fix Version/s: 3.2.3
   3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-3.3, branch-3.2. Thanks for your contribution, 
[~bpatel].

> Improve the log for HTTPFS server operation
> ---
>
> Key: HDFS-15902
> URL: https://issues.apache.org/jira/browse/HDFS-15902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: httpfs
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15902.001.patch
>
>
> Improve the log for HTTPFS server operations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-03-24 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei updated HDFS-15759:
---
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decode an input back using the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=571004=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571004
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 08:56
Start Date: 24/Mar/21 08:56
Worklog Time Spent: 10m 
  Work Description: ferhui merged pull request #2585:
URL: https://github.com/apache/hadoop/pull/2585


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571004)
Time Spent: 8h  (was: 7h 50m)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decode an input back using the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=571003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-571003
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 08:55
Start Date: 24/Mar/21 08:55
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #2585:
URL: https://github.com/apache/hadoop/pull/2585#issuecomment-805621188


   @touchida thanks for contribution.
   @tasanuma @runitao @aajisaka thanks for review!
   merged


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 571003)
Time Spent: 7h 50m  (was: 7h 40m)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies correctness of outputs decoded from inputs as follows:
> 1. Decode an input back using the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> Then the task will also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6

2021-03-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307657#comment-17307657
 ] 

Wei-Chiu Chuang commented on HDFS-15660:


Sorry, I am still confused. Shouldn't this get cherry-picked to 3.1, 3.2 and 3.3?

> StorageTypeProto is not compatiable between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DN's version was 
> still 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new PROVIDED 
> storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65392)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65331)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:263)
>      

[jira] [Commented] (HDFS-15902) Improve the log for HTTPFS server operation

2021-03-24 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307656#comment-17307656
 ] 

Takanobu Asanuma commented on HDFS-15902:
-

+1 on [^HDFS-15902.001.patch].

> Improve the log for HTTPFS server operation
> ---
>
> Key: HDFS-15902
> URL: https://issues.apache.org/jira/browse/HDFS-15902
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: httpfs
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15902.001.patch
>
>
> Improve the log for HTTPFS server operations



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=570997=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570997
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 08:39
Start Date: 24/Mar/21 08:39
Worklog Time Spent: 10m 
  Work Description: runitao commented on pull request #2585:
URL: https://github.com/apache/hadoop/pull/2585#issuecomment-805611910


   +1, LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570997)
Time Spent: 7h 40m  (was: 7.5h)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover it.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on the DataNode at each 
> reconstruction.
> It verifies the correctness of the decoded outputs as follows:
> 1. Decode one of the inputs from the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1] and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability, so the task will also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.
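
To make the round-trip check above concrete, here is a toy sketch that substitutes a simple XOR parity code for RS-6-3. It only illustrates the "decode one input back from the outputs and compare" step; it is not the DataNode implementation, and all names are illustrative.

{code:java}
// Toy illustration of the verify-by-re-decoding idea: reconstruct a missing
// unit, then decode one of the *inputs* back from the reconstructed output
// and compare it with the original. XOR parity stands in for Reed-Solomon.
import java.util.Arrays;

public class EcVerificationSketch {

  // parity = d0 XOR d1 (stand-in for the real RS encoder/decoder)
  static byte[] xor(byte[] a, byte[] b) {
    byte[] out = new byte[a.length];
    for (int i = 0; i < a.length; i++) {
      out[i] = (byte) (a[i] ^ b[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] d0 = {1, 2, 3, 4};
    byte[] d1 = {5, 6, 7, 8};
    byte[] parity = xor(d0, d1);          // encode

    // Reconstruction: d1 was "lost"; rebuild it from [d0, parity].
    byte[] rebuiltD1 = xor(d0, parity);

    // Verification: decode one of the inputs (d0) back from the
    // reconstructed output [rebuiltD1, parity] and compare.
    byte[] redecodedD0 = xor(rebuiltD1, parity);
    boolean ok = Arrays.equals(redecodedD0, d0);
    System.out.println(ok ? "reconstruction verified"
                          : "reconstruction corrupt, fail the task");
  }
}
{code}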



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15764) Notify Namenode missing or new block on disk as soon as possible

2021-03-24 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307653#comment-17307653
 ] 

Hadoop QA commented on HDFS-15764:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
30s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
24s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
20s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
10s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 43s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 23m  
5s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  3m 
10s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
7s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 4s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
13s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green}{color} | {color:green} The patch has no ill-formed 
XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  0s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK 

[jira] [Work logged] (HDFS-15900) RBF: empty blockpool id on dfsrouter caused by UNAVAILABLE NameNode

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15900?focusedWorklogId=570994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570994
 ]

ASF GitHub Bot logged work on HDFS-15900:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 08:28
Start Date: 24/Mar/21 08:28
Worklog Time Spent: 10m 
  Work Description: hdaikoku commented on pull request #2787:
URL: https://github.com/apache/hadoop/pull/2787#issuecomment-805605184


   @tasanuma I have fixed the style issues and rebased: 6f9fc3e


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570994)
Time Spent: 3h 20m  (was: 3h 10m)

> RBF: empty blockpool id on dfsrouter caused by UNAVAILABLE NameNode
> ---
>
> Key: HDFS-15900
> URL: https://issues.apache.org/jira/browse/HDFS-15900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.3.0
>Reporter: Harunobu Daikoku
>Assignee: Harunobu Daikoku
>Priority: Major
>  Labels: pull-request-available
> Attachments: image.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We observed that when a NameNode becomes UNAVAILABLE, the corresponding 
> blockpool id in MembershipStoreImpl#activeNamespaces on the dfsrouter is 
> unintentionally reset to empty, its initial value.
>  !image.png|height=250!
> As a result, concat operations through the dfsrouter fail with the following 
> error, because the router cannot resolve the block pool id against the 
> recognized active namespaces.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
> Cannot locate a nameservice for block pool BP-...
> {noformat}
> A possible fix is to ignore UNAVAILABLE NameNode registrations and use the 
> namespace information obtained from available NameNode registrations when 
> constructing the cache of active namespaces.
>  
> [https://github.com/apache/hadoop/blob/rel/release-3.3.0/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MembershipStoreImpl.java#L207-L221]
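
A minimal sketch of the possible fix described above, assuming simplified, hypothetical stand-in types (Registration, State) rather than the real MembershipStoreImpl/MembershipState classes: UNAVAILABLE registrations are skipped so their empty block pool ids never enter the cache.

{code:java}
// Sketch only: when rebuilding the active-namespace cache, skip UNAVAILABLE
// NameNode registrations so their empty block pool ids never replace the
// values reported by available NameNodes.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ActiveNamespaceCacheSketch {

  enum State { ACTIVE, STANDBY, UNAVAILABLE }

  static class Registration {
    final String nameserviceId;
    final String blockPoolId;
    final State state;

    Registration(String nameserviceId, String blockPoolId, State state) {
      this.nameserviceId = nameserviceId;
      this.blockPoolId = blockPoolId;
      this.state = state;
    }
  }

  /** Builds nameservice id -> block pool id only from usable registrations. */
  static Map<String, String> buildActiveNamespaces(List<Registration> regs) {
    Map<String, String> nsToBlockPool = new HashMap<>();
    for (Registration r : regs) {
      if (r.state == State.UNAVAILABLE) {
        continue; // carries no usable namespace info; would poison the cache
      }
      if (r.blockPoolId != null && !r.blockPoolId.isEmpty()) {
        nsToBlockPool.putIfAbsent(r.nameserviceId, r.blockPoolId);
      }
    }
    return nsToBlockPool;
  }
}
{code}

With inputs like ("ns1", "", UNAVAILABLE) and ("ns1", "BP-1234-...", STANDBY), the cache ends up with the non-empty block pool id instead of the empty one.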



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15917) Make HDFS native client more secure

2021-03-24 Thread Gautham Banasandra (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautham Banasandra updated HDFS-15917:
--
Description: There's lots of legacy code in HDFS native client. With the 
recent C++17, CMake, Boost and other dependent library upgrades, we're noticing 
some deprecation warnings during compilation. We need to prioritize replacing 
the security related function calls as it's the most important functionality.  
(was: There's lots of legacy code in HDFS native client. With the recent C++17, 
CMake, Boost and other dependent library upgrades, we're noticing warnings 
during compilation that some of these functions are on the path to deprecation.

We need to prioritize replacing the security related function calls as it's the 
most important functionality.)

> Make HDFS native client more secure
> ---
>
> Key: HDFS-15917
> URL: https://issues.apache.org/jira/browse/HDFS-15917
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>
> There's lots of legacy code in HDFS native client. With the recent C++17, 
> CMake, Boost and other dependent library upgrades, we're noticing some 
> deprecation warnings during compilation. We need to prioritize replacing the 
> security related function calls as it's the most important functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?focusedWorklogId=570983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570983
 ]

ASF GitHub Bot logged work on HDFS-15918:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 07:51
Start Date: 24/Mar/21 07:51
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra commented on pull request #2811:
URL: https://github.com/apache/hadoop/pull/2811#issuecomment-80558


   This PR fixes the following compilation warning -
   ```
   [WARNING] 
/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
 warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
[-Wdeprecated-declarations]
   [WARNING]  from 
/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
   [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570983)
Time Spent: 20m  (was: 10m)

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> warning during compilation that it's deprecated -
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?focusedWorklogId=570982=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570982
 ]

ASF GitHub Bot logged work on HDFS-15918:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 07:50
Start Date: 24/Mar/21 07:50
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra opened a new pull request #2811:
URL: https://github.com/apache/hadoop/pull/2811


   * RAND_pseudo_bytes is deprecated in OpenSSL
 1.1.1 and must be replaced by RAND_bytes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570982)
Remaining Estimate: 0h
Time Spent: 10m

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> warning during compilation that it's deprecated -
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15918:
--
Labels: pull-request-available  (was: )

> Replace RAND_pseudo_bytes in sasl_digest_md5.cc
> ---
>
> Key: HDFS-15918
> URL: https://issues.apache.org/jira/browse/HDFS-15918
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following 
> warning during compilation that it's deprecated -
> {code}
> [WARNING] 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
>  warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
> [-Wdeprecated-declarations]
> [WARNING]  from 
> /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
> [WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15918) Replace RAND_pseudo_bytes in sasl_digest_md5.cc

2021-03-24 Thread Gautham Banasandra (Jira)
Gautham Banasandra created HDFS-15918:
-

 Summary: Replace RAND_pseudo_bytes in sasl_digest_md5.cc
 Key: HDFS-15918
 URL: https://issues.apache.org/jira/browse/HDFS-15918
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs++
Affects Versions: 3.4.0
Reporter: Gautham Banasandra
Assignee: Gautham Banasandra


RAND_pseudo_bytes was deprecated in OpenSSL 1.1.1. We get the following warning 
during compilation that it's deprecated -
{code}
[WARNING] 
/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:97:74:
 warning: 'int RAND_pseudo_bytes(unsigned char*, int)' is deprecated 
[-Wdeprecated-declarations]
[WARNING]  from 
/home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-2792/src/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/sasl_digest_md5.cc:20:
[WARNING] /usr/include/openssl/rand.h:44:1: note: declared here
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307625#comment-17307625
 ] 

Srinivasu Majeti commented on HDFS-15916:
-

Hi [~ayushtkn], backporting to or modifying the Hadoop 2.x branch may not help 
much, since newer products like CDP no longer support it. The fallback 
mechanism you suggested could be made to work, in my opinion.

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15917) Make HDFS native client more secure

2021-03-24 Thread Gautham Banasandra (Jira)
Gautham Banasandra created HDFS-15917:
-

 Summary: Make HDFS native client more secure
 Key: HDFS-15917
 URL: https://issues.apache.org/jira/browse/HDFS-15917
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs++
Affects Versions: 3.4.0
Reporter: Gautham Banasandra
Assignee: Gautham Banasandra


There's lots of legacy code in HDFS native client. With the recent C++17, 
CMake, Boost and other dependent library upgrades, we're noticing warnings 
during compilation that some of these functions are on the path to deprecation.

We need to prioritize replacing the security related function calls as it's the 
most important functionality.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15916:

Target Version/s: 3.1.4, 3.3.1, 3.4.0, 3.2.3

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307617#comment-17307617
 ] 

Ayush Saxena commented on HDFS-15916:
-

Yes, this seems to be an issue. I see two ways to proceed: first, backport the 
new API to branch-2.10, which will make things work; or second, add a fallback 
mechanism, so that if the new API throws {{RpcNoSuchMethodException}} we fall 
back to the old API, or something of that sort.
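
A rough sketch of the second option, under simplified assumptions: SnapshotDiffApi and its two methods are hypothetical stand-ins for the real ClientProtocol calls, and only the catch-and-fall-back pattern is the point.

{code:java}
// Sketch of the fallback idea only: try the newer RPC first and fall back to
// the older one when the remote NameNode does not know it. Signatures are
// simplified; error handling is minimal.
import java.io.IOException;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.RpcNoSuchMethodException;

public class SnapshotDiffFallbackSketch {

  interface SnapshotDiffApi {
    Object getSnapshotDiffReportListing(String dir, String from, String to)
        throws IOException;                  // newer, Hadoop 3.x style call
    Object getSnapshotDiffReport(String dir, String from, String to)
        throws IOException;                  // older call, present on 2.x
  }

  static Object getDiff(SnapshotDiffApi nn, String dir, String from, String to)
      throws IOException {
    try {
      return nn.getSnapshotDiffReportListing(dir, from, to);
    } catch (RemoteException re) {
      // Remote side is too old to know the new method: fall back.
      if (RpcNoSuchMethodException.class.getName().equals(re.getClassName())) {
        return nn.getSnapshotDiffReport(dir, from, to);
      }
      throw re;
    }
  }
}
{code}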

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15879) Exclude slow nodes when choose targets for blocks

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15879?focusedWorklogId=570972=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570972
 ]

ASF GitHub Bot logged work on HDFS-15879:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 07:20
Start Date: 24/Mar/21 07:20
Worklog Time Spent: 10m 
  Work Description: tasanuma commented on a change in pull request #2748:
URL: https://github.com/apache/hadoop/pull/2748#discussion_r600222793



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java
##
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.blockmanagement;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.TestBlockStoragePolicy;
+import org.apache.hadoop.hdfs.server.namenode.NameNode;
+import org.apache.hadoop.net.Node;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Parameterized.class)
+public class TestReplicationPolicyExcludeSlowNodes
+extends BaseReplicationPolicyTest {
+
+  public TestReplicationPolicyExcludeSlowNodes(String blockPlacementPolicy) {
+this.blockPlacementPolicy = blockPlacementPolicy;
+  }
+
+  @Parameterized.Parameters
+  public static Iterable<Object[]> data() {
+return Arrays.asList(new Object[][] {

Review comment:
   How about adding BlockPlacementPolicyRackFaultTolerant, 
AvailableSpaceBlockPlacementPolicy, 
AvailableSpaceRackFaultTolerantBlockPlacementPolicy?

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyExcludeSlowNodes.java
##
@@ -0,0 +1,128 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.blockmanagement;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
+import org.apache.hadoop.hdfs.DFSTestUtil;
+import org.apache.hadoop.hdfs.TestBlockStoragePolicy;
+import org.apache.hadoop.hdfs.server.namenode.NameNode;
+import org.apache.hadoop.net.Node;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Set;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+@RunWith(Parameterized.class)
+public class TestReplicationPolicyExcludeSlowNodes
+extends BaseReplicationPolicyTest {
+
+  public TestReplicationPolicyExcludeSlowNodes(String blockPlacementPolicy) {
+this.blockPlacementPolicy = blockPlacementPolicy;
+  }
+
+  @Parameterized.Parameters
+  public static Iterable<Object[]> data() {
+return Arrays.asList(new Object[][] {
+{ BlockPlacementPolicyDefault.class.getName() },
+{ BlockPlacementPolicyWithUpgradeDomain.class.getName() } });
+  }
+
+  @Override
+  DatanodeDescriptor[] getDatanodeDescriptors(Configuration conf) {

[jira] [Work logged] (HDFS-15850) Superuser actions should be reported to external enforcers

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15850?focusedWorklogId=570971=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570971
 ]

ASF GitHub Bot logged work on HDFS-15850:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 07:18
Start Date: 24/Mar/21 07:18
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2784:
URL: https://github.com/apache/hadoop/pull/2784#issuecomment-805564692


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 51s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 58s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   4m 47s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   4m 24s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 51s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 40s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 44s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   5m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   4m 36s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   4m 36s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 498 unchanged - 6 fixed = 498 total (was 504)  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | -1 :x: |  spotbugs  |   3m 21s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/7/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  16m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 341m 54s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  unit  |  22m 56s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2784/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 38s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 482m 54s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  Possible null pointer dereference of r in 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(String, long, 
String, String, long)  Dereferenced at FSNamesystem.java:r in 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.truncate(String, long, 
String, String, long)  

[jira] [Updated] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasu Majeti updated HDFS-15916:

Description: 
Looks like when using the distcp diff option between two snapshots from a Hadoop 
3 cluster to a Hadoop 2 cluster, we get the below exception, which seems to 
break backward compatibility due to the introduction of the new 
getSnapshotDiffReportListing API.

 
{code:java}
hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
 Unknown method getSnapshotDiffReportListing called on 
org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
 {code}
 

 



  was:
Looks like when using distcp diff options between two snapshots from a hadoop 3 
cluster to hadoop 2 cluster , we get below exception and seems to be break 
backward compatibility due to new API introduction getSnapshotDiffReportListing.

hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
 Unknown method getSnapshotDiffReportListing called on 
org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
 


> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
>  
> {code:java}
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasu Majeti updated HDFS-15916:

Component/s: distcp

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasu Majeti updated HDFS-15916:

Affects Version/s: 3.2.2

> Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for 
> snapshotdiff
> 
>
> Key: HDFS-15916
> URL: https://issues.apache.org/jira/browse/HDFS-15916
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Looks like when using the distcp diff option between two snapshots from a 
> Hadoop 3 cluster to a Hadoop 2 cluster, we get the below exception, which 
> seems to break backward compatibility due to the introduction of the new 
> getSnapshotDiffReportListing API.
> hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
>  Unknown method getSnapshotDiffReportListing called on 
> org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15249) ThrottledAsyncChecker is not thread-safe.

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15249:
---
Fix Version/s: 3.3.1

> ThrottledAsyncChecker is not thread-safe.
> -
>
> Key: HDFS-15249
> URL: https://issues.apache.org/jira/browse/HDFS-15249
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: federation
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> ThrottledAsyncChecker should be thread-safe because it can be used by 
> multiple threads when we have multiple namespaces.
> *checksInProgress* and *completedChecks* are a HashMap and a WeakHashMap 
> respectively, which are not thread-safe. So we need to guard every access to 
> them with a synchronized block.
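
A minimal sketch of the locking pattern being described, with illustrative names rather than the actual ThrottledAsyncChecker fields: every access to the two non-thread-safe maps goes through one shared lock.

{code:java}
// Sketch of the pattern only: guard every read and write of the
// non-thread-safe maps with a common lock.
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

public class ThrottledCheckerLockingSketch<T, R> {

  private final Object lock = new Object();
  private final Map<T, R> checksInProgress = new HashMap<>();
  private final Map<T, R> completedChecks = new WeakHashMap<>();

  public void startCheck(T target, R pending) {
    synchronized (lock) {
      checksInProgress.put(target, pending);
    }
  }

  public void completeCheck(T target, R result) {
    synchronized (lock) {
      checksInProgress.remove(target);
      completedChecks.put(target, result);
    }
  }

  public R lastResult(T target) {
    synchronized (lock) {
      return completedChecks.get(target);
    }
  }
}
{code}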



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15916) Backward compatibility - Distcp fails from Hadoop 3 to Hadoop 2 for snapshotdiff

2021-03-24 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HDFS-15916:
---

 Summary: Backward compatibility - Distcp fails from Hadoop 3 to 
Hadoop 2 for snapshotdiff
 Key: HDFS-15916
 URL: https://issues.apache.org/jira/browse/HDFS-15916
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Srinivasu Majeti


Looks like when using the distcp diff option between two snapshots from a Hadoop 
3 cluster to a Hadoop 2 cluster, we get the below exception, which seems to 
break backward compatibility due to the introduction of the new 
getSnapshotDiffReportListing API.

hadoop distcp -diff s1 s2 -update src_cluster_path dst_cluster_path
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException):
 Unknown method getSnapshotDiffReportListing called on 
org.apache.hadoop.hdfs.protocol.ClientProtocol protocol
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15743) Fix -Pdist build failure of hadoop-hdfs-native-client

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15743:
---
Fix Version/s: 3.3.1

> Fix -Pdist build failure of hadoop-hdfs-native-client
> -
>
> Key: HDFS-15743
> URL: https://issues.apache.org/jira/browse/HDFS-15743
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> [INFO] --- exec-maven-plugin:1.3.1:exec (pre-dist) @ 
> hadoop-hdfs-native-client ---
> tar: ./*: Cannot stat: No such file or directory
> tar: Exiting with failure status due to previous errors
> Checking to bundle with:
> bundleoption=false, liboption=snappy.lib, pattern=libsnappy. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=zstd.lib, pattern=libzstd. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=openssl.lib, pattern=libcrypto. libdir=
> Checking to bundle with:
> bundleoption=false, liboption=isal.lib, pattern=libisal. libdir=
> Checking to bundle with:
> bundleoption=, liboption=pmdk.lib, pattern=pmdk libdir=
> Bundling bin files failed
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15610) Reduce datanode upgrade/hardlink thread

2021-03-24 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307589#comment-17307589
 ] 

Wei-Chiu Chuang commented on HDFS-15610:


I think this is quite an important improvement that stabilizes the upgrade 
experience. I'll cherry-pick it to the lower branches.

> Reduce datanode upgrade/hardlink thread
> ---
>
> Key: HDFS-15610
> URL: https://issues.apache.org/jira/browse/HDFS-15610
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.0.0, 3.1.4
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> There is kernel overhead during a datanode upgrade. If a datanode has 
> millions of blocks and 10+ disks, the block-layout migration becomes very 
> expensive during its hardlink operation. Slowness is observed when running 
> with a large number of hardlink threads 
> (dfs.datanode.block.id.layout.upgrade.threads, default is 12 threads for each 
> disk), and the upgrade runs for 2+ hours, i.e. 10*12=120 threads for 10 disks.
> Small test:
> RHEL7, 32 cores, 20 GB RAM, 8 GB DN heap
> ||dfs.datanode.block.id.layout.upgrade.threads||Blocks||Disks||Time taken||
> |12|3.3 Million|1|2 minutes and 59 seconds|
> |6|3.3 Million|1|2 minutes and 35 seconds|
> |3|3.3 Million|1|2 minutes and 51 seconds|
> We tried the same test twice and the results were about 95% consistent (only 
> a few seconds' difference on each iteration). Using 6 threads is faster than 
> 12 threads because of the per-thread overhead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13639) SlotReleaser is not fast enough

2021-03-24 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17307586#comment-17307586
 ] 

Lisheng Sun commented on HDFS-13639:


Hi [~sodonnell],

This patch has been running in production for a long time.

The concurrency of short-circuit reads has improved greatly in the HBase 
scenario.

 

> SlotReleaser is not fast enough
> ---
>
> Key: HDFS-13639
> URL: https://issues.apache.org/jira/browse/HDFS-13639
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
> Environment: 1. YCSB:
> {color:#00} recordcount=20
>  fieldcount=1
>  fieldlength=1000
>  operationcount=1000
>  
>  workload=com.yahoo.ycsb.workloads.CoreWorkload
>  
>  table=ycsb-test
>  columnfamily=C
>  readproportion=1
>  updateproportion=0
>  insertproportion=0
>  scanproportion=0
>  
>  maxscanlength=0
>  requestdistribution=zipfian
>  
>  # default 
>  readallfields=true
>  writeallfields=true
>  scanlengthdistribution=constan{color}
> {color:#00}2. datanode:{color}
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m 
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log 
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled 
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure 
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> {color:#00}3. regionserver:{color}
> {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g 
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc 
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 
> -XX:G1OldCSetRegionThresholdPercent=5{color}
> {color:#00}block cache is disabled:{color}{color:#00} 
>  hbase.bucketcache.size
>  0.9
>  {color}
>  
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, 
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, 
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we found 
> that the SlotReleaser of the ShortCircuitCache has a performance issue. The 
> problem is that the QPS of slot releasing could only reach about 1000+ while 
> the QPS of slot allocating is ~3000. This means that the replica info on the 
> datanode could not be released in time, which causes a lot of GCs and 
> eventually full GCs.
>  
> The flame graph shows that the SlotReleaser spends a lot of time connecting 
> to the domain socket and throwing/catching exceptions when closing the domain 
> socket and its streams. It doesn't make any sense to do the connecting and 
> closing each time. Each time we connect to the domain socket, the Datanode 
> allocates a new thread to free the slot. There is a lot of initialization 
> work, and it's costly. We need to reuse the domain socket. 
>  
> After switching to reusing the domain socket (see the attached diff), we get 
> a great improvement (see the perf graphs):
>  # Without reusing the domain socket, the get QPS of YCSB gets worse and 
> worse, and after about 45 minutes full GC starts. When we reuse the domain 
> socket, no full GC is seen, the stress test finishes smoothly, and the QPS of 
> allocating and releasing match.
>  # Due to the datanode young GC, without the improvement the YCSB get QPS is 
> even lower than with the improvement, ~3700 vs ~4200.
>  
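
A generic sketch of the "reuse the connection" idea, under simplified, hypothetical types (ReleaserConnection, ConnectionFactory); the real patch works on the DomainSocket plumbing inside ShortCircuitCache.SlotReleaser, which is not shown here.

{code:java}
// Sketch only: keep one connection cached and reconnect only when it fails,
// instead of connecting and closing for every released slot.
import java.io.Closeable;
import java.io.IOException;

public class SlotReleaserReuseSketch {

  interface ReleaserConnection extends Closeable {
    void releaseSlot(long slotId) throws IOException;
  }

  interface ConnectionFactory {
    ReleaserConnection connect() throws IOException;
  }

  private final ConnectionFactory factory;
  private ReleaserConnection cached;          // reused across releases

  public SlotReleaserReuseSketch(ConnectionFactory factory) {
    this.factory = factory;
  }

  public synchronized void release(long slotId) throws IOException {
    if (cached == null) {
      cached = factory.connect();             // connect only when needed
    }
    try {
      cached.releaseSlot(slotId);
    } catch (IOException e) {
      cached.close();                         // drop the broken connection
      cached = null;                          // and reconnect next time
      throw e;
    }
  }
}
{code}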



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, 

[jira] [Updated] (HDFS-15910) Replace bzero with explicit_bzero for better safety

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15910:
---
Fix Version/s: 3.3.1

> Replace bzero with explicit_bzero for better safety
> ---
>
> Key: HDFS-15910
> URL: https://issues.apache.org/jira/browse/HDFS-15910
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.2.2
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It is better to always use explicit_bzero since it guarantees that the buffer 
> will be cleared irrespective of the compiler optimizations - 
> https://man7.org/linux/man-pages/man3/bzero.3.html.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15868) Possible Resource Leak in EditLogFileOutputStream

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15868:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in EditLogFileOutputStream
> -
>
> Key: HDFS-15868
> URL: https://issues.apache.org/jira/browse/HDFS-15868
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/1f1a1ef52df896a2b66b16f5bbc17aa39b1a1dd7/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/EditLogFileOutputStream.java#L91].
>  If an I/O error occurs at line 91, rp remains open since the exception isn't 
> caught locally, and there is no way for any caller to close the 
> RandomAccessFile.
>  I'll submit a pull request to fix it.
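
A generic sketch of the pattern that closes the gap described above, with illustrative names rather than the actual EditLogFileOutputStream code: if a later step throws, the partially opened RandomAccessFile is closed before rethrowing, so it cannot leak.

{code:java}
// Sketch only: close the resource opened earlier in the method if a later
// step fails, then rethrow the original exception.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class OpenWithoutLeakSketch {

  static RandomAccessFile openPositionedAtEnd(File name) throws IOException {
    RandomAccessFile rp = new RandomAccessFile(name, "rw");
    try {
      rp.seek(rp.length());                   // may throw IOException
      return rp;                              // ownership passes to the caller
    } catch (IOException e) {
      try {
        rp.close();                           // don't leak the open file handle
      } catch (IOException closeEx) {
        e.addSuppressed(closeEx);
      }
      throw e;
    }
  }
}
{code}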



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15908) Possible Resource Leak in org.apache.hadoop.hdfs.qjournal.server.Journal

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15908:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in org.apache.hadoop.hdfs.qjournal.server.Journal
> 
>
> Key: HDFS-15908
> URL: https://issues.apache.org/jira/browse/HDFS-15908
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/cd44e917d0b331a2d1e1fa63fdd498eac01ae323/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java#L266].
>  The call to close on {{storage}} at line 267 can throw an exception. If it 
> occurs, then {{committedTxnId}} and {{curSegment}} are never closed.
> I'll submit a pull request to fix it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15809:
---
Fix Version/s: 3.3.1

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --
>
> Key: HDFS-15809
> URL: https://issues.apache.org/jira/browse/HDFS-15809
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15809.001.patch, HDFS-15809.002.patch, 
> HDFS-15809.003.patch, HDFS-15809.004.patch, HDFS-15809.005.patch, 
> HDFS-15809.006.patch, HDFS-15809.007.patch
>
>
> We found that in a big cluster the dead node detector might never remove 
> alive nodes from the dead node set. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the 
> deadNodesProbeQueue because the queue's length limit is 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead 
> node set and adds 30 nodes to the deadNodesProbeQueue, but the order is the 
> same as last time, so the 30 nodes that have already been probed are added 
> to the queue again.
>  # Steps 3 and 4 repeat, but we always add the first 30 nodes from the dead 
> set. If they are all dead, then the live nodes behind them can never be 
> recovered.
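
A simplified sketch of one way around the starvation described above, using illustrative names rather than the actual DeadNodeDetector code: nodes already queued or in flight are skipped when the bounded probe queue is refilled, so later entries in the dead set eventually get probed.

{code:java}
// Sketch only: skip nodes that are already queued or being probed when
// refilling the bounded probe queue, so iteration can move past the first
// N entries of the dead node set on later rounds.
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;

public class DeadNodeProbeSketch {

  private final Set<String> deadNodes = new LinkedHashSet<>();
  private final Set<String> probeInFlight = new LinkedHashSet<>();
  private final Queue<String> probeQueue = new ArrayBlockingQueue<>(100);

  public synchronized void addDeadNode(String node) {
    deadNodes.add(node);
  }

  // Called periodically, like DeadNodeDetector#checkDeadNodes().
  public synchronized void scheduleProbes() {
    for (String node : deadNodes) {
      if (probeQueue.contains(node) || probeInFlight.contains(node)) {
        continue;              // already pending: don't re-add the same head
      }
      if (!probeQueue.offer(node)) {
        break;                 // queue full; remaining nodes get the next round
      }
    }
  }

  public synchronized String nextToProbe() {
    String node = probeQueue.poll();
    if (node != null) {
      probeInFlight.add(node);
    }
    return node;
  }

  public synchronized void probeFinished(String node, boolean alive) {
    probeInFlight.remove(node);
    if (alive) {
      deadNodes.remove(node);  // recovered nodes leave the dead set
    }
  }
}
{code}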



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15806) DeadNodeDetector should close all the threads when it is closed.

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15806:
---
Fix Version/s: 3.3.1

> DeadNodeDetector should close all the threads when it is closed.
> 
>
> Key: HDFS-15806
> URL: https://issues.apache.org/jira/browse/HDFS-15806
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15806.001.patch
>
>
> The DeadNodeDetector doesn't close all of its threads when it is closed. This 
> Jira tries to fix that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15791) Possible Resource Leak in FSImageFormatProtobuf

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15791:
---
Fix Version/s: 3.3.1

> Possible Resource Leak in FSImageFormatProtobuf
> ---
>
> Key: HDFS-15791
> URL: https://issues.apache.org/jira/browse/HDFS-15791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Narges Shadab
>Assignee: Narges Shadab
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> We noticed a possible resource leak 
> [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L271].
>  If an I/O error occurs at line 
> [273|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L273]
>  or 
> [277|https://github.com/apache/hadoop/blob/06a5d3437f68546207f18d23fe527895920c756a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java#L277],
>  {{fin}} remains open since the exception isn't caught locally, and there is 
> no way for any caller to close the FileInputStream.
> I'll submit a pull request to fix it.
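> As a hedged illustration of the kind of fix, the sketch below opens the 
> stream with try-with-resources; the class name and the digest loop are 
> simplified stand-ins, not the actual FSImageFormatProtobuf code or the 
> submitted pull request.
> {code:java}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.IOException;
> import java.security.MessageDigest;
> import java.security.NoSuchAlgorithmException;
> 
> class ImageLoadSketch {
>   /**
>    * try-with-resources closes fin even when the body throws, so the stream
>    * can no longer leak to callers that have no way to reach it.
>    */
>   static void load(File imageFile) throws IOException, NoSuchAlgorithmException {
>     MessageDigest digest = MessageDigest.getInstance("MD5");
>     try (FileInputStream fin = new FileInputStream(imageFile)) {
>       byte[] buf = new byte[8192];
>       int n;
>       while ((n = fin.read(buf)) != -1) {
>         digest.update(buf, 0, n); // an IOException here still closes fin
>       }
>     }
>   }
> }
> {code}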



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15551) Tiny Improve for DeadNode detector

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15551:
---
Fix Version/s: 3.3.1

> Tiny Improve for DeadNode detector
> --
>
> Key: HDFS-15551
> URL: https://issues.apache.org/jira/browse/HDFS-15551
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.3.0
>Reporter: dark_num
>Assignee: imbajin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
>  # Add or improve some logs for adding local & global dead nodes
>  # Improve the logic
>  # Fix typos



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15661) The DeadNodeDetector shouldn't be shared by different DFSClients.

2021-03-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15661:
---
Fix Version/s: 3.3.1

> The DeadNodeDetector shouldn't be shared by different DFSClients.
> -
>
> Key: HDFS-15661
> URL: https://issues.apache.org/jira/browse/HDFS-15661
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15661.001.patch, HDFS-15661.002.patch, 
> HDFS-15661.003.patch, HDFS-15661.004.patch, HDFS-15661.005.patch
>
>
> Currently the DeadNodeDetector is a member of ClientContext, which means it 
> is shared by many different DFSClients. When one DFSClient.close() is 
> invoked, the DeadNodeDetector thread is interrupted, which impacts the other 
> DFSClients.
> The original design in HDFS-13571 shows the DeadNodeDetector is meant to 
> share dead nodes among the input streams of the same client.
> We should make the DeadNodeDetector a member of DFSClient instead of 
> ClientContext.
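> A rough sketch of the proposed ownership change follows; the class shapes 
> are illustrative only (the real DFSClient and ClientContext are far larger) 
> and this is not the attached patch.
> {code:java}
> import java.util.concurrent.atomic.AtomicBoolean;
> 
> /** Illustrative detector: one background thread per owning client. */
> class DeadNodeDetectorSketch extends Thread {
>   private final AtomicBoolean running = new AtomicBoolean(true);
> 
>   @Override
>   public void run() {
>     while (running.get()) {
>       // probe suspect/dead nodes for this client's input streams
>     }
>   }
> 
>   void shutdownDetector() {
>     running.set(false);
>     interrupt();
>   }
> }
> 
> class DfsClientSketch implements AutoCloseable {
>   // Owned by the client rather than by the shared ClientContext, so closing
>   // this client cannot interrupt dead node detection for other clients.
>   private final DeadNodeDetectorSketch deadNodeDetector = new DeadNodeDetectorSketch();
> 
>   DfsClientSketch() {
>     deadNodeDetector.start();
>   }
> 
>   @Override
>   public void close() {
>     deadNodeDetector.shutdownDetector(); // stops only this client's detector
>   }
> }
> {code}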



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=570961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570961
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 06:16
Start Date: 24/Mar/21 06:16
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2799:
URL: https://github.com/apache/hadoop/pull/2799#issuecomment-805533036


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |::|--:|:|:|
   | +0 :ok: |  reexec  |   0m 43s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files 
found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any 
@author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  The patch appears to include 
1 new or modified test files.  |
   ||| _ branch-3.1 Compile Tests _ |
   | -1 :x: |  mvninstall  |   4m 39s |  root in branch-3.1 failed.  |
   | -1 :x: |  compile  |   0m 31s |  hadoop-hdfs in branch-3.1 failed.  |
   | -0 :warning: |  checkstyle  |   0m 18s |  The patch fails to run 
checkstyle in hadoop-hdfs  |
   | -1 :x: |  mvnsite  |   0m 13s |  hadoop-hdfs in branch-3.1 failed.  |
   | -1 :x: |  shadedclient  |   1m 22s |  branch has errors when building and 
testing our client artifacts.  |
   | -1 :x: |  javadoc  |   0m 21s |  hadoop-hdfs in branch-3.1 failed.  |
   | +0 :ok: |  spotbugs  |   1m 58s |  Used deprecated FindBugs config; 
considering switching to SpotBugs.  |
   | -1 :x: |  findbugs  |   0m 14s |  hadoop-hdfs in branch-3.1 failed.  |
   ||| _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 12s |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  compile  |   0m 13s |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  javac  |   0m 13s |  hadoop-hdfs in the patch failed.  |
   | -0 :warning: |  checkstyle  |   0m 11s |  The patch fails to run 
checkstyle in hadoop-hdfs  |
   | -1 :x: |  mvnsite  |   0m 12s |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace 
issues.  |
   | -1 :x: |  shadedclient  |   0m 49s |  patch has errors when building and 
testing our client artifacts.  |
   | -1 :x: |  javadoc  |   0m 13s |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  findbugs  |   0m 14s |  hadoop-hdfs in the patch failed.  |
   ||| _ Other Tests _ |
   | -1 :x: |  unit  |   0m 14s |  hadoop-hdfs in the patch failed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  The patch does not generate 
ASF License warnings.  |
   |  |   |  13m 21s |   |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2799 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 9bbc2caea67b 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 
16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.1 / 34e507c |
   | Default Java | Oracle 
Corporation-9-internal+0-2016-04-14-195246.buildd.src |
   | mvninstall | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/branch-mvninstall-root.txt
 |
   | compile | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | checkstyle | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | mvnsite | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | javadoc | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | findbugs | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | mvninstall | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | compile | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | javac | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2799/4/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt
 |
   | checkstyle | 

[jira] [Work logged] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2021-03-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?focusedWorklogId=570958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570958
 ]

ASF GitHub Bot logged work on HDFS-15911:
-

Author: ASF GitHub Bot
Created on: 24/Mar/21 06:04
Start Date: 24/Mar/21 06:04
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on pull request #2797:
URL: https://github.com/apache/hadoop/pull/2797#issuecomment-805527138


   Yes @liuml07, I have confirmed that this is not relevant. I have also 
triggered the build.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 570958)
Time Spent: 4.5h  (was: 4h 20m)

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Balancer provides a Result for each iteration, containing info such as 
> exitStatus, bytesLeftToMove, bytesBeingMoved, etc. We should also provide the 
> blocksMoved count from NameNodeConnector and print it with the rest of the 
> details in Result#print().
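> A hedged sketch of the shape of the change is below; the class and field 
> names are simplified stand-ins for the Balancer's iteration result, not the 
> merged code.
> {code:java}
> /** Simplified stand-in for the Balancer iteration result. */
> class IterationResultSketch {
>   final int exitStatus;
>   final long bytesLeftToMove;
>   final long bytesBeingMoved;
>   final long blocksMoved; // the newly surfaced counter
> 
>   IterationResultSketch(int exitStatus, long bytesLeftToMove,
>                         long bytesBeingMoved, long blocksMoved) {
>     this.exitStatus = exitStatus;
>     this.bytesLeftToMove = bytesLeftToMove;
>     this.bytesBeingMoved = bytesBeingMoved;
>     this.blocksMoved = blocksMoved; // e.g. summed from the connector's move count
>   }
> 
>   void print(java.io.PrintStream out) {
>     out.printf("exit=%d bytesLeftToMove=%d bytesBeingMoved=%d blocksMoved=%d%n",
>         exitStatus, bytesLeftToMove, bytesBeingMoved, blocksMoved);
>   }
> }
> {code}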



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org