[jira] [Commented] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753463#comment-17753463
 ] 

ASF GitHub Bot commented on HDFS-17152:
---

slfan1989 commented on PR #5939:
URL: https://github.com/apache/hadoop/pull/5939#issuecomment-1675718724

   LGTM. 




> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.
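For context, a rough sketch of how the two views map onto the FileSystem API (the path is hypothetical, and this only approximates what the shell command prints):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.QuotaUsage;

public class CountOptionsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/user/example");   // hypothetical path

    // Roughly what "count -q" reports: the quota columns plus the full usage counts.
    ContentSummary cs = fs.getContentSummary(dir);
    System.out.println("name quota=" + cs.getQuota()
        + " space quota=" + cs.getSpaceQuota()
        + " dirs=" + cs.getDirectoryCount()
        + " files=" + cs.getFileCount()
        + " bytes=" + cs.getLength());

    // Roughly what "count -u" reports: the quota view without the detailed counts.
    QuotaUsage qu = fs.getQuotaUsage(dir);
    System.out.println("name quota=" + qu.getQuota()
        + " used=" + qu.getFileAndDirectoryCount()
        + " space quota=" + qu.getSpaceQuota()
        + " space used=" + qu.getSpaceConsumed());
  }
}
{code}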



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17157) Transient network failure in lease recovery could be mitigated to ensure better consistency

2023-08-11 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-17157:

Summary: Transient network failure in lease recovery could be mitigated to 
ensure better consistency  (was: Transient network failure in lease recovery 
could lead to the block in a datanode in an inconsistent state for a long time)

> Transient network failure in lease recovery could be mitigated to ensure 
> better consistency
> ---
>
> Key: HDFS-17157
> URL: https://issues.apache.org/jira/browse/HDFS-17157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Haoze Wu
>Priority: Major
>
> This case is related to HDFS-12070.
> In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
> permanent block recovery failure and leave the file open indefinitely. In 
> the patch, instead of failing the whole lease recovery process when the 
> second stage of block recovery fails at one datanode, the whole lease 
> recovery process fails only if this stage fails at all the datanodes.
> Attached is the code snippet for the second stage of the block recovery, in 
> BlockRecoveryWorker#syncBlock:
> {code:java}
> ...
> final List<BlockRecord> successList = new ArrayList<>();
> for (BlockRecord r : participatingList) {
>   try {
>     r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>         newBlock.getNumBytes());
>     successList.add(r);
>   } catch (IOException e) {
> ...{code}
> However, because of a transient network failure, the updateReplicaUnderRecovery 
> RPC initiated from the primary datanode to another datanode could return an 
> EOFException while the other side does not process the RPC at all, or it could 
> throw an IOException when reading from the socket. 
> {code:java}
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
>         at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) 
> {code}
> Then, if there is any other datanode in which the second stage of block 
> recovery succeeds, the lease recovery succeeds and the file is closed. 
> However, the last block fails to be synced to the failed datanode, and this 
> inconsistency could potentially last for a very long time. 
> To fix the issue, I propose adding a configurable retry of 
> updateReplicaUnderRecovery RPC so that transient network failure could be 
> mitigated.
> Any comments and suggestions would be appreciated.
>  
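
For illustration only, a minimal sketch of the kind of bounded retry being proposed around the inter-datanode call. It reuses the identifiers from the snippet quoted above; the configuration key name, its default value, and the handleFailure helper are hypothetical and not part of any committed patch:

{code:java}
// Hedged sketch: retry updateReplicaUnderRecovery a configurable number of
// times before treating this datanode as failed. Key name and default are made up.
final int maxAttempts = conf.getInt(
    "dfs.datanode.block-recovery.update-replica.retries", 3);

for (BlockRecord r : participatingList) {
  IOException lastFailure = null;
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
          newBlock.getNumBytes());
      successList.add(r);
      lastFailure = null;
      break;  // this replica is synced; stop retrying
    } catch (IOException e) {
      lastFailure = e;  // transient failures such as the EOFException above are retried
    }
  }
  if (lastFailure != null) {
    // After exhausting the retries, keep the existing behavior: treat this
    // datanode as failed for the recovery rather than failing the whole process.
    handleFailure(r, lastFailure);  // hypothetical helper standing in for the current catch block
  }
}
{code}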



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HDFS-17157) Transient network failure in lease recovery could lead to the block in a datanode in an inconsistent state for a long time

2023-08-11 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-17157:

Description: 
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of a transient network failure, the updateReplicaUnderRecovery 
RPC initiated from the primary datanode to another datanode could return an 
EOFException while the other side does not process the RPC at all, or it could 
throw an IOException when reading from the socket. 
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) {code}
Then, if there is any other datanode in which the second stage of block recovery 
succeeds, the lease recovery succeeds and the file is closed. However, 
the last block fails to be synced to the failed datanode, and this 
inconsistency could potentially last for a very long time. 

To fix the issue, I propose adding a configurable retry of 
updateReplicaUnderRecovery RPC so that transient network failure could be 
mitigated.

Any comments and suggestions would be appreciated.

 

  was:
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of transient network failure, the RPC in 
updateReplicaUnderRecovery initiated from the primary datanode to another 
datanode could return an EOFException 

[jira] [Updated] (HDFS-17157) Transient network failure in lease recovery could lead to the block in a datanode in an inconsistent state for a long time

2023-08-11 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-17157:

Description: 
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of a transient network failure, the updateReplicaUnderRecovery 
RPC initiated from the primary datanode to another datanode could return an 
EOFException while the other side does not process the RPC at all, or it could 
throw an IOException when reading from the socket. 
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) {code}
Then, if there is any other datanode in which the second stage of block recovery 
succeeds, the lease recovery succeeds and the file is closed. However, 
the last block fails to be synced to the failed datanode, and this 
inconsistency could potentially last for a very long time. 

To fix the issue, I propose adding a configurable retry of 
updateReplicaUnderRecovery RPC so that transient network failure could be 
mitigated.

 

 

  was:
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of transient network failure, the RPC in 
updateReplicaUnderRecovery initiated from the primary datanode to another 
datanode could return an EOFException while the other side does not process the 
RPC at 

[jira] [Updated] (HDFS-17157) Transient network failure in lease recovery could lead to the block in a datanode in an inconsistent state for a long time

2023-08-11 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-17157:

Description: 
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of a transient network failure, the updateReplicaUnderRecovery 
RPC initiated from the primary datanode to another datanode could return an 
EOFException while the other side does not process the RPC at all, or it could 
throw an IOException when reading from the socket. 
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) {code}
Then, if there is any other datanode in which the second stage of block recovery 
succeeds, the lease recovery succeeds and the file is closed. However, 
the last block fails to be synced to the failed datanode, and this 
inconsistency could potentially last for a very long time. 

To fix the issue, I propose adding a configurable retry of 
updateReplicaUnderRecovery RPC so that transient network failure could be 
tolerated. 

 

 

  was:
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of transient network failure, the RPC in 
updateReplicaUnderRecovery initiated from the primary datanode to another 
datanode could return an EOFException while the other side does not process the 
RPC at 

[jira] [Updated] (HDFS-17157) Transient network failure in lease recovery could lead to the block in a datanode in an inconsistent state for a long time

2023-08-11 Thread Haoze Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haoze Wu updated HDFS-17157:

Summary: Transient network failure in lease recovery could lead to the 
block in a datanode in an inconsistent state for a long time  (was: Transient 
network failure in lease recovery could lead to a datanode in an inconsistent 
state for a long time)

> Transient network failure in lease recovery could lead to the block in a 
> datanode in an inconsistent state for a long time
> --
>
> Key: HDFS-17157
> URL: https://issues.apache.org/jira/browse/HDFS-17157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.0.0-alpha
>Reporter: Haoze Wu
>Priority: Major
>
> This case is related to HDFS-12070.
> In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
> permanent block recovery failure and leave the file open indefinitely. In 
> the patch, instead of failing the whole lease recovery process when the 
> second stage of block recovery fails at one datanode, the whole lease 
> recovery process fails only if this stage fails at all the datanodes.
> Attached is the code snippet for the second stage of the block recovery, in 
> BlockRecoveryWorker#syncBlock:
> {code:java}
> ...
> final List<BlockRecord> successList = new ArrayList<>();
> for (BlockRecord r : participatingList) {
>   try {
>     r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>         newBlock.getNumBytes());
>     successList.add(r);
>   } catch (IOException e) {
> ...{code}
> However, because of a transient network failure, the updateReplicaUnderRecovery 
> RPC initiated from the primary datanode to another datanode could return an 
> EOFException while the other side does not process the RPC at all, or it could 
> throw an IOException when reading from the socket. 
> {code:java}
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
>         at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) 
> {code}
> Then, if there is any other datanode in which the second stage of block 
> recovery succeeds, the lease recovery succeeds and the file is closed. 
> However, the last block fails to be synced to the failed datanode, and this 
> inconsistency could potentially last for a very long time. 
> To fix the issue, I propose adding a configurable retry of 
> updateReplicaUnderRecovery RPC so that transient network failure could be 
> tolerated. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HDFS-17157) Transient network failure in lease recovery could lead to a datanode in an inconsistent state for a long time

2023-08-11 Thread Haoze Wu (Jira)
Haoze Wu created HDFS-17157:
---

 Summary: Transient network failure in lease recovery could lead to 
a datanode in an inconsistent state for a long time
 Key: HDFS-17157
 URL: https://issues.apache.org/jira/browse/HDFS-17157
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.0.0-alpha
Reporter: Haoze Wu


This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. In the 
patch, instead of failing the whole lease recovery process when the second 
stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if this stage fails at all the datanodes.

Attached is the code snippet for the second stage of the block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of a transient network failure, the updateReplicaUnderRecovery 
RPC initiated from the primary datanode to another datanode could return an 
EOFException while the other side does not process the RPC at all, or it could 
throw an IOException when reading from the socket. 
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at 
org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) {code}
Then, if there is any other datanode in which the second stage of block recovery 
succeeds, the lease recovery succeeds and the file is closed. However, 
the last block fails to be synced to the failed datanode, and this 
inconsistency could potentially last for a very long time. 

To fix the issue, I propose adding a configurable retry of 
updateReplicaUnderRecovery RPC so that transient network failure could be 
tolerated. 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753347#comment-17753347
 ] 

ASF GitHub Bot commented on HDFS-17153:
---

hadoop-yetus commented on PR #5940:
URL: https://github.com/apache/hadoop/pull/5940#issuecomment-1675422748

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 59s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 14s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m 14s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 253m 41s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 396m 24s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5940 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0e1308a2f343 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ec4df77e6a8d96162136906646d7ae0d73cdc4e5 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/3/testReport/ |
   | Max. process+thread count | 3382 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Reconfig 'pipelineSupportSlownode' parameters for datanode.
> 

[jira] [Commented] (HDFS-17138) RBF: We changed the hadoop.security.auth_to_local configuration of one router, the other routers stopped working

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753340#comment-17753340
 ] 

ASF GitHub Bot commented on HDFS-17138:
---

goiri commented on code in PR #5921:
URL: https://github.com/apache/hadoop/pull/5921#discussion_r1291733842


##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/delegation/AbstractDelegationTokenSecretManager.java:
##
@@ -81,7 +81,12 @@ class AbstractDelegationTokenSecretManager {
+/**
+ * Create a secret manager
+ *
+ * @param delegationKeyUpdateInterval        the number of milliseconds for
+ *                                           rolling new secret keys.
+ * @param delegationTokenMaxLifetime         the maximum lifetime of the
+ *                                           delegation tokens in milliseconds
+ * @param delegationTokenRenewInterval       how often the tokens must be
+ *                                           renewed in milliseconds
+ * @param delegationTokenRemoverScanInterval how often the tokens are scanned
+ *                                           for expired tokens in milliseconds
+ */
+public MyDelegationTokenSecretManager(long delegationKeyUpdateInterval,
+long delegationTokenMaxLifetime, long delegationTokenRenewInterval,
+long delegationTokenRemoverScanInterval) {
+  super(delegationKeyUpdateInterval,
+  delegationTokenMaxLifetime,
+  delegationTokenRenewInterval,
+  delegationTokenRemoverScanInterval);
+}
+
+@Override
+public DelegationTokenIdentifier createIdentifier() {
+  return null;
+}
+
+@Override
+public void logExpireTokens(Collection<DelegationTokenIdentifier> expiredTokens) throws IOException {
+  super.logExpireTokens(expiredTokens);
+}
+  }
+
+  @Test
+  public void testLogExpireTokensWhenChangeRules() {
+MyDelegationTokenSecretManager myDtSecretManager =
+new MyDelegationTokenSecretManager(10 * 1000, 10 * 1000, 10 * 1000, 10 * 1000);
+setRules("RULE:[2:$1@$0](SomeUser.*)s/.*/SomeUser/");
+DelegationTokenIdentifier dtId = new DelegationTokenIdentifier(
+new Text("SomeUser/h...@example.com"),
+new Text("SomeUser/h...@example.com"),
+new Text("SomeUser/h...@example.com"));
+Set<DelegationTokenIdentifier> expiredTokens = new HashSet<>();
+expiredTokens.add(dtId);
+
+setRules("RULE:[2:$1@$0](OtherUser.*)s/.*/OtherUser/");
+// rules were modified, causing the existing tokens (which may be loaded from
+// other storage systems like zookeeper) to fail to match the kerberos rules,
+// and an exception is thrown that cannot be handled
+try {
+  myDtSecretManager.logExpireTokens(expiredTokens);
+} catch (Exception e) {
+  Assert.fail("Exception in logExpireTokens");

Review Comment:
   If you just let the exception go through, it will fail the test anyway.
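
In other words, a sketch of the reviewer's suggestion (reusing the names from the patch above; illustrative only, not the actual change):

{code:java}
// Hypothetical simplification: any IOException thrown by logExpireTokens
// propagates out of the test method and JUnit marks the test as failed,
// so no explicit try/catch with Assert.fail is needed.
@Test
public void testLogExpireTokensWhenChangeRules() throws IOException {
  MyDelegationTokenSecretManager myDtSecretManager =
      new MyDelegationTokenSecretManager(10 * 1000, 10 * 1000, 10 * 1000, 10 * 1000);
  setRules("RULE:[2:$1@$0](SomeUser.*)s/.*/SomeUser/");
  DelegationTokenIdentifier dtId = new DelegationTokenIdentifier(
      new Text("SomeUser/h...@example.com"),
      new Text("SomeUser/h...@example.com"),
      new Text("SomeUser/h...@example.com"));
  Set<DelegationTokenIdentifier> expiredTokens = new HashSet<>();
  expiredTokens.add(dtId);

  setRules("RULE:[2:$1@$0](OtherUser.*)s/.*/OtherUser/");
  // If the rule change makes the token owner unresolvable, this call throws
  // and the test fails on its own.
  myDtSecretManager.logExpireTokens(expiredTokens);
}
{code}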



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/security/TestDelegationToken.java:
##
@@ -376,4 +382,61 @@ public void testDelegationTokenIdentifierToString() throws 
Exception {
 " for SomeUser with renewer JobTracker",
 dtId.toStringStable());
   }
+
+  public static class MyDelegationTokenSecretManager extends
+  AbstractDelegationTokenSecretManager<DelegationTokenIdentifier> {
+/**
+ * Create a secret manager
+ *
+ * @param delegationKeyUpdateInterval        the number of milliseconds for
+ *                                           rolling new secret keys.
+ * @param delegationTokenMaxLifetime         the maximum lifetime of the
+ *                                           delegation tokens in milliseconds
+ * @param delegationTokenRenewInterval       how often the tokens must be
+ *                                           renewed in milliseconds
+ * @param delegationTokenRemoverScanInterval how often the tokens are scanned
+ *                                           for expired tokens in milliseconds
+ */
+public MyDelegationTokenSecretManager(long delegationKeyUpdateInterval,
+long delegationTokenMaxLifetime, long delegationTokenRenewInterval,
+long delegationTokenRemoverScanInterval) {
+  super(delegationKeyUpdateInterval,

Review Comment:
   Doesn't this happen automatically?





> RBF: We changed the hadoop.security.auth_to_local configuration of one 
> router, the other routers stopped working
> 
>
> Key: HDFS-17138
> URL: https://issues.apache.org/jira/browse/HDFS-17138
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: hadoop 3.3.0
>Reporter: Xiping Zhang
>Assignee: Xiping Zhang
>Priority: Major
>  Labels: 

[jira] [Assigned] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL

2023-08-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri reassigned HDFS-17148:
--

Assignee: Hector Sandoval Chaverri

> RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
> ---
>
> Key: HDFS-17148
> URL: https://issues.apache.org/jira/browse/HDFS-17148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them 
> temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in 
> AbstractDelegationTokenSecretManager runs periodically to clean up any expired 
> tokens from the cache, but by then most tokens have already been evicted 
> automatically per the TTL configuration. This leads to many expired tokens in 
> the SQL database that should be cleaned up.
> The SQLDelegationTokenSecretManager should find expired tokens in SQL instead 
> of in the memory cache when running the periodic cleanup.
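
For context, a rough sketch of the direction described above; the table name, column name, and query are assumptions for illustration, not the actual schema used by SQLDelegationTokenSecretManager:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ExpiredTokenSqlCleanupSketch {
  // Hypothetical table/column names; the real secret manager defines its own schema.
  private static final String DELETE_EXPIRED =
      "DELETE FROM Tokens WHERE renewDate < ?";

  /**
   * Remove tokens whose renewal date has passed directly in SQL, instead of
   * relying on whatever happens to still be in the in-memory cache.
   */
  public static int cleanupExpiredTokens(Connection conn, long nowMillis)
      throws SQLException {
    try (PreparedStatement delete = conn.prepareStatement(DELETE_EXPIRED)) {
      delete.setLong(1, nowMillis);
      return delete.executeUpdate();   // number of expired tokens removed
    }
  }
}
{code}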



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL

2023-08-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-17148.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
> ---
>
> Key: HDFS-17148
> URL: https://issues.apache.org/jira/browse/HDFS-17148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them 
> temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in 
> AbstractDelegationTokenSecretManager runs periodically to clean up any expired 
> tokens from the cache, but by then most tokens have already been evicted 
> automatically per the TTL configuration. This leads to many expired tokens in 
> the SQL database that should be cleaned up.
> The SQLDelegationTokenSecretManager should find expired tokens in SQL instead 
> of in the memory cache when running the periodic cleanup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753338#comment-17753338
 ] 

ASF GitHub Bot commented on HDFS-17148:
---

goiri merged PR #5936:
URL: https://github.com/apache/hadoop/pull/5936




> RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
> ---
>
> Key: HDFS-17148
> URL: https://issues.apache.org/jira/browse/HDFS-17148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
>
> The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them 
> temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in 
> AbstractDelegationTokenSecretManager runs periodically to clean up any expired 
> tokens from the cache, but by then most tokens have already been evicted 
> automatically per the TTL configuration. This leads to many expired tokens in 
> the SQL database that should be cleaned up.
> The SQLDelegationTokenSecretManager should find expired tokens in SQL instead 
> of in the memory cache when running the periodic cleanup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753264#comment-17753264
 ] 

ASF GitHub Bot commented on HDFS-17154:
---

hadoop-yetus commented on PR #5941:
URL: https://github.com/apache/hadoop/pull/5941#issuecomment-1675009020

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  0s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  30m 53s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 19s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 55s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  34m 44s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   5m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m  3s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   5m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  34m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 24s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 217m 13s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5941/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 53s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 386m  2s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5941/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5941 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 63c5ebac5f2b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 19b4726482db20b9d7a9351ce517b298755d71a0 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 

[jira] [Commented] (HDFS-17093) In the case of all datanodes sending FBR when the namenode restarts (large clusters), there is an issue with incomplete block reporting

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753252#comment-17753252
 ] 

ASF GitHub Bot commented on HDFS-17093:
---

hadoop-yetus commented on PR #5855:
URL: https://github.com/apache/hadoop/pull/5855#issuecomment-1674931551

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 28s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 53s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 55s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  1s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 52s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 12s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 198m 28s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 292m 23s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/13/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux a5d677241212 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 
13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ff1a3128b2d8d536356cdb64f9fb85b215478cc6 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/13/testReport/ |
   | Max. process+thread count | 3555 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/13/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> In the case of all datanodes sending FBR when the namenode restarts (large 

[jira] [Commented] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753248#comment-17753248
 ] 

ASF GitHub Bot commented on HDFS-17153:
---

hadoop-yetus commented on PR #5940:
URL: https://github.com/apache/hadoop/pull/5940#issuecomment-1674897862

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  5s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m 59s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 30s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 27s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 36s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m  1s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 269m 58s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 415m 25s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5940 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 4c6165d9c5f7 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7166005e98d774d57c6fd06cfa3859b38ff8f4e7 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/2/testReport/ |
   | Max. process+thread count | 3211 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753246#comment-17753246
 ] 

ASF GitHub Bot commented on HDFS-17153:
---

hadoop-yetus commented on PR #5940:
URL: https://github.com/apache/hadoop/pull/5940#issuecomment-1674889811

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 38s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 11s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  37m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m 10s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 32s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  36m 25s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 269m 21s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 51s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 414m  7s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5940 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 553a2ce85b9b 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 7166005e98d774d57c6fd06cfa3859b38ff8f4e7 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5940/1/testReport/ |
   | Max. process+thread count | 2782 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 

[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-11 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753235#comment-17753235
 ] 

Takanobu Asanuma commented on HDFS-17156:
-

It looked like a bug in RBF SBN, so I moved this jira from HADOOP to HDFS.

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in the verifyAndCopy function inside FSDownload.java when 
> the nodemanager tries to download a file right after the file has been written 
> to HDFS. The write operation runs on the active namenode and the read operation 
> runs on the observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.
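
To make the ordering concrete, below is a minimal, self-contained sketch of the invariant described in the report. Every name in it (ActiveNamenode, ClientStateId, writeAndPublishStateId) is a hypothetical stand-in, not the actual RBF router or observer-read code.

{code:java}
// Hypothetical sketch only; not the real router classes.
public class RouterWriteOrderingSketch {

  /** Stand-in for the active NameNode: a write returns the txid it produced. */
  interface ActiveNamenode {
    long write(String path);
  }

  /** Stand-in for the per-namespace state id the client attaches to its calls. */
  static final class ClientStateId {
    private long stateId;
    synchronized void advanceTo(long txId) { stateId = Math.max(stateId, txId); }
    synchronized long get() { return stateId; }
  }

  private final ActiveNamenode active;
  private final ClientStateId clientState = new ClientStateId();

  RouterWriteOrderingSketch(ActiveNamenode active) {
    this.active = active;
  }

  /**
   * Correct ordering: publish the active NameNode's post-write txid to the
   * client's state id before the write RPC completes. The bug described above
   * is the opposite order: the router answered the client first, so the
   * follow-up read on the observer carried a stale state id and observed the
   * OP_ADD mtime instead of the OP_CLOSE mtime.
   */
  long writeAndPublishStateId(String path) {
    long txId = active.write(path);
    clientState.advanceTo(txId);   // must happen before the client gets its response
    return clientState.get();
  }
}
{code}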



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDFS-17156) mapreduce job encounters java.io.IOException when dfs.client.rbf.observer.read.enable is true

2023-08-11 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma moved HADOOP-18847 to HDFS-17156:
--

Component/s: rbf
 (was: common)
Key: HDFS-17156  (was: HADOOP-18847)
Project: Hadoop HDFS  (was: Hadoop Common)

> mapreduce job encounters java.io.IOException when 
> dfs.client.rbf.observer.read.enable is true
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in the verifyAndCopy function inside FSDownload.java when 
> the nodemanager tries to download a file right after the file has been written 
> to HDFS. The write operation runs on the active namenode and the read operation 
> runs on the observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pb' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17150) EC: Fix the bug of failed lease recovery.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753209#comment-17753209
 ] 

ASF GitHub Bot commented on HDFS-17150:
---

Hexiaoqiao commented on code in PR #5937:
URL: https://github.com/apache/hadoop/pull/5937#discussion_r1291256537


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -3802,16 +3803,26 @@ boolean internalReleaseLease(Lease lease, String src, 
INodesInPath iip,
 lastBlock.getBlockType());
   }
 
-  if (uc.getNumExpectedLocations() == 0 && lastBlock.getNumBytes() == 0) {
+  int minLocationsNum = 1;
+  if (lastBlock.isStriped()) {
+minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
+  }
+  if (uc.getNumExpectedLocations() < minLocationsNum &&
+  lastBlock.getNumBytes() == 0) {
 // There is no datanode reported to this block.
 // may be client have crashed before writing data to pipeline.
 // This blocks doesn't need any recovery.
 // We can remove this block and close the file.
 pendingFile.removeLastBlock(lastBlock);
 finalizeINodeFileUnderConstruction(src, pendingFile,
 iip.getLatestSnapshotId(), false);
-NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
-+ "Removed empty last block and closed file " + src);
+if (uc.getNumExpectedLocations() == 0) {
+  NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "
+  + "Removed empty last block and closed file " + src);
+} else {
+  NameNode.stateChangeLog.warn("BLOCK* internalReleaseLease: "

Review Comment:
   It is a little weird: can we determine that it is an EC file here when 
`uc.getNumExpectedLocations() != 0`? If not, this log print will be ambiguous.





> EC: Fix the bug of failed lease recovery.
> -
>
> Key: HDFS-17150
> URL: https://issues.apache.org/jira/browse/HDFS-17150
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> If the client crashes without writing the minimum number of internal blocks 
> required by the EC policy, the lease recovery process for the corresponding 
> unclosed file may continue to fail. Taking RS(6,3) policy as an example, the 
> timeline is as follows:
> 1. The client writes some data to only 5 datanodes;
> 2. Client crashes;
> 3. NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` completely depends on 
> block report, and there are 5 datanodes reporting internal blocks;
> 5. When the lease expires hard limit, NN issues a block recovery command;
> 6. The datanode checks the command and finds that the number of internal 
> blocks is insufficient, resulting in an error and recovery failure;
> 7. The lease expires hard limit again, and NN issues a block recovery command 
> again, but the recovery fails again.
> When the number of internal blocks written by the client is less than 6, the 
> block group is actually unrecoverable. We should equate this situation to the 
> case where the number of replicas is 0 when processing replica files, i.e., 
> directly remove the last block group and close the file.
>  
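
Below is a hedged, self-contained sketch of the check this issue asks for, modeled on the diff quoted earlier in this thread; only the idea of using the striped block's real data block count as the minimum comes from the patch, the rest is illustrative scaffolding.

{code:java}
public class LastBlockGroupCheckSketch {

  /**
   * Returns true when the empty last block (or block group) can simply be
   * dropped so the file can be closed: a replicated block needs at least one
   * expected location, while a striped block group needs at least its number
   * of data units (6 for RS(6,3)); with fewer internal blocks the group can
   * never be reconstructed, so retrying block recovery is pointless.
   */
  static boolean shouldDropEmptyLastBlock(boolean striped, int realDataBlockNum,
      int numExpectedLocations, long numBytes) {
    int minLocationsNum = striped ? realDataBlockNum : 1;
    return numExpectedLocations < minLocationsNum && numBytes == 0;
  }

  public static void main(String[] args) {
    // RS(6,3) group where the crashed client reached only 5 datanodes.
    System.out.println(shouldDropEmptyLastBlock(true, 6, 5, 0L));   // true
    // Replicated file with one reported replica: keep it and recover normally.
    System.out.println(shouldDropEmptyLastBlock(false, 0, 1, 0L));  // false
  }
}
{code}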



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753141#comment-17753141
 ] 

ASF GitHub Bot commented on HDFS-17152:
---

hadoop-yetus commented on PR #5939:
URL: https://github.com/apache/hadoop/pull/5939#issuecomment-1674539723

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  87m 10s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 41s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 36s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 132m 37s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5939/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5939 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint |
   | uname | Linux db137d81e4ef 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 18922ee71c9accbe6ee294fb78df7b5391a12888 |
   | Max. process+thread count | 533 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5939/2/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17093) In the case of all datanodes sending FBR when the namenode restarts (large clusters), there is an issue with incomplete block reporting

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753126#comment-17753126
 ] 

ASF GitHub Bot commented on HDFS-17093:
---

Hexiaoqiao commented on PR #5855:
URL: https://github.com/apache/hadoop/pull/5855#issuecomment-1674478417

   > @Hexiaoqiao Thank you for your patience. I have corrected it. What else is 
wrong?
   
   Please check the blanks item and try to fix it as Yetus said. 
https://github.com/apache/hadoop/pull/5855#issuecomment-1669741496 Thanks.




> In the case of all datanodes sending FBR when the namenode restarts (large 
> clusters), there is an issue with incomplete block reporting
> ---
>
> Key: HDFS-17093
> URL: https://issues.apache.org/jira/browse/HDFS-17093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: Yanlei Yu
>Priority: Minor
>  Labels: pull-request-available
>
> In our cluster of 800+ nodes, after restarting the namenode, we found that 
> some datanodes did not report enough blocks, causing the namenode to stay in 
> safe mode for a long time after restarting because of incomplete block 
> reporting.
> I found in the logs of the datanode with incomplete block reporting that the 
> first FBR attempt failed, possibly due to namenode stress, and then a second 
> FBR attempt was made as follows:
> {code:java}
> 
> 2023-07-17 11:29:28,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unsuccessfully sent block report 0x6237a52c1e817e,  containing 12 storage 
> report(s), of which we sent 1. The reports had 1099057 total blocks and used 
> 1 RPC(s). This took 294 msec to generate and 101721 msecs for RPC and NN 
> processing. Got back no commands.
> 2023-07-17 11:37:04,014 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Successfully sent block report 0x62382416f3f055,  containing 12 storage 
> report(s), of which we sent 12. The reports had 1099048 total blocks and used 
> 12 RPC(s). This took 295 msec to generate and 11647 msecs for RPC and NN 
> processing. Got back no commands. {code}
> There's nothing wrong with that: retry the send if it fails. But on the 
> namenode side, the logic is:
> {code:java}
> if (namesystem.isInStartupSafeMode()
>     && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
>     && storageInfo.getBlockReportCount() > 0) {
>   blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
>       + "discarded non-initial block report from {}"
>       + " because namenode still in startup phase",
>       strBlockReportId, fullBrLeaseId, nodeID);
>   blockReportLeaseManager.removeLease(node);
>   return !node.hasStaleStorages();
> } {code}
> When a storage is identified as having already reported, i.e. 
> storageInfo.getBlockReportCount() > 0, the namenode removes the lease from the 
> datanode, so the second report fails because it no longer holds a lease.
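
The interaction above can be reproduced with a small, self-contained sketch; the type and method below are illustrative only, not the real BlockManager API.

{code:java}
public class FbrRetrySketch {

  /** Illustrative stand-in for a per-storage record such as DatanodeStorageInfo. */
  static final class StorageInfo {
    int blockReportCount;
  }

  /**
   * Mirrors the quoted check: while the namenode is in startup safe mode, a
   * storage whose blockReportCount is already > 0 has its report discarded and
   * the datanode's lease removed, so a retry of a partially processed first
   * full block report can never succeed.
   */
  static boolean acceptReportDuringStartup(StorageInfo storage, boolean inStartupSafeMode) {
    if (inStartupSafeMode && storage.blockReportCount > 0) {
      return false;   // report discarded, lease removed
    }
    storage.blockReportCount++;
    return true;
  }

  public static void main(String[] args) {
    StorageInfo storage = new StorageInfo();
    // First attempt: only this storage is processed before the RPC gives up.
    System.out.println(acceptReportDuringStartup(storage, true));  // true
    // Retry of the full report: rejected, because blockReportCount is now > 0.
    System.out.println(acceptReportDuringStartup(storage, true));  // false
  }
}
{code}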



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753120#comment-17753120
 ] 

ASF GitHub Bot commented on HDFS-17152:
---

hadoop-yetus commented on PR #5939:
URL: https://github.com/apache/hadoop/pull/5939#issuecomment-1674471343

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 54s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  44m 28s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5939/1/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  84m 55s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 57s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 15s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 130m 41s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5939/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5939 |
   | Optional Tests | dupname asflicense mvnsite codespell detsecrets 
markdownlint |
   | uname | Linux 94c4904c66c7 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 04b9774b79fb3d53269521e08d78ef4f82baccae |
   | Max. process+thread count | 586 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5939/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17155) Fixed small doc in mounttable class

2023-08-11 Thread TIsNotT (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TIsNotT resolved HDFS-17155.

Release Note: 
I targeted the wrong branch.


  Resolution: Won't Fix

> Fixed small doc in mounttable class
> ---
>
> Key: HDFS-17155
> URL: https://issues.apache.org/jira/browse/HDFS-17155
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: TIsNotT
>Priority: Minor
>
>   /**
>* Set the destination paths.
>*
>* @param paths Destination paths.
>*/
>   public abstract void setDestinations(List dests);
> The annotation value is wrong; it should be dests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover

2023-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17154:
--
Labels: pull-request-available  (was: )

> EC: Fix bug in updateBlockForPipeline after failover
> 
>
> Key: HDFS-17154
> URL: https://issues.apache.org/jira/browse/HDFS-17154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
>
> In the method `updateBlockForPipeline`, NameNode uses the 
> `BlockUnderConstructionFeature` of a BlockInfo to generate the member 
> `blockIndices` of `LocatedStripedBlock`. 
> And then, NameNode uses `blockIndices` to generate block tokens for client.
> However, if there is a failover, the location info in 
> BlockUnderConstructionFeature may be incomplete, which results in the absence 
> of the corresponding block tokens.
> When the client receives these incomplete block tokens, it will throw a NPE 
> because `updatedBlks[i]` is null.
> NameNode should just return block tokens for all indices to the client. 
> Client can pick whichever it likes to use. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753114#comment-17753114
 ] 

ASF GitHub Bot commented on HDFS-17154:
---

zhangshuyan0 opened a new pull request, #5941:
URL: https://github.com/apache/hadoop/pull/5941

   In the method `updateBlockForPipeline`, NameNode uses the 
`BlockUnderConstructionFeature` of a BlockInfo to generate the member 
`blockIndices` of `LocatedStripedBlock`. 
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L5308-L5319
   And then, it uses `blockIndices` to generate block tokens for client.
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L1618-L1632
   However, if there is a failover, the location info in 
BlockUnderConstructionFeature may be incomplete, which results in the absence 
of the corresponding block tokens.
   When the client receives these incomplete block tokens, it will throw a NPE 
because `updatedBlks[i]` is null (line 825).
   
https://github.com/apache/hadoop/blob/b6edcb9a84ceac340c79cd692637b3e11c997cc5/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSStripedOutputStream.java#L820-L828
   As a result, the write process in client fails. We need to fix this bug. 
NameNode should just return block tokens for all indices to the client. Client 
can pick whichever it likes to use. 
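   
   To illustrate the proposed behaviour, here is a hedged, self-contained sketch; the names below are stand-ins, not the actual BlockManager / BlockTokenSecretManager API.
   
{code:java}
import java.util.ArrayList;
import java.util.List;

public class StripedBlockTokenSketch {

  /** Minimal placeholder for a per-internal-block token. */
  static final class Token {
    final long blockId;
    final byte index;
    Token(long blockId, byte index) {
      this.blockId = blockId;
      this.index = index;
    }
  }

  /**
   * Issue a token for every index of the block group (data + parity units)
   * instead of only the indices currently recorded in the under-construction
   * feature, which can be incomplete after a failover; the client simply
   * ignores the tokens it does not need.
   */
  static List<Token> tokensForAllIndices(long blockGroupId, int totalBlockNum) {
    List<Token> tokens = new ArrayList<>(totalBlockNum);
    for (byte i = 0; i < totalBlockNum; i++) {
      // For striped blocks the internal block id is the group id plus the index.
      tokens.add(new Token(blockGroupId + i, i));
    }
    return tokens;
  }
}
{code}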
   




> EC: Fix bug in updateBlockForPipeline after failover
> 
>
> Key: HDFS-17154
> URL: https://issues.apache.org/jira/browse/HDFS-17154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>
> In the method `updateBlockForPipeline`, NameNode uses the 
> `BlockUnderConstructionFeature` of a BlockInfo to generate the member 
> `blockIndices` of `LocatedStripedBlock`. 
> And then, NameNode uses `blockIndices` to generate block tokens for client.
> However, if there is a failover, the location info in 
> BlockUnderConstructionFeature may be incomplete, which results in the absence 
> of the corresponding block tokens.
> When the client receives these incomplete block tokens, it will throw a NPE 
> because `updatedBlks[i]` is null.
> NameNode should just return block tokens for all indices to the client. 
> Client can pick whichever it likes to use. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17155) Fixed small doc in mounttable class

2023-08-11 Thread TIsNotT (Jira)
TIsNotT created HDFS-17155:
--

 Summary: Fixed small doc in mounttable class
 Key: HDFS-17155
 URL: https://issues.apache.org/jira/browse/HDFS-17155
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: rbf
Reporter: TIsNotT


  /**
   * Set the destination paths.
   *
   * @param paths Destination paths.
   */
  public abstract void setDestinations(List dests);

The annotation value is wrong; it should be dests.
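
For reference, the corrected javadoc would read:

{code:java}
  /**
   * Set the destination paths.
   *
   * @param dests Destination paths.
   */
  public abstract void setDestinations(List dests);
{code}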



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover

2023-08-11 Thread Shuyan Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuyan Zhang reassigned HDFS-17154:
---

Assignee: Shuyan Zhang

> EC: Fix bug in updateBlockForPipeline after failover
> 
>
> Key: HDFS-17154
> URL: https://issues.apache.org/jira/browse/HDFS-17154
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>
> In the method `updateBlockForPipeline`, NameNode uses the 
> `BlockUnderConstructionFeature` of a BlockInfo to generate the member 
> `blockIndices` of `LocatedStripedBlock`. 
> And then, NameNode uses `blockIndices` to generate block tokens for client.
> However, if there is a failover, the location info in 
> BlockUnderConstructionFeature may be incomplete, which results in the absence 
> of the corresponding block tokens.
> When the client receives these incomplete block tokens, it will throw a NPE 
> because `updatedBlks[i]` is null.
> NameNode should just return block tokens for all indices to the client. 
> Client can pick whichever it likes to use. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17154) EC: Fix bug in updateBlockForPipeline after failover

2023-08-11 Thread Shuyan Zhang (Jira)
Shuyan Zhang created HDFS-17154:
---

 Summary: EC: Fix bug in updateBlockForPipeline after failover
 Key: HDFS-17154
 URL: https://issues.apache.org/jira/browse/HDFS-17154
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Shuyan Zhang


In the method `updateBlockForPipeline`, NameNode uses the 
`BlockUnderConstructionFeature` of a BlockInfo to generate the member 
`blockIndices` of `LocatedStripedBlock`. 

And then, NameNode uses `blockIndices` to generate block tokens for client.

However, if there is a failover, the location info in 
BlockUnderConstructionFeature may be incomplete, which results in the absence 
of the corresponding block tokens.

When the client receives these incomplete block tokens, it will throw a NPE 
because `updatedBlks[i]` is null.

NameNode should just return block tokens for all indices to the client. Client 
can pick whichever it likes to use. 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2023-08-11 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated HDFS-15869:
---
Attachment: 2.png

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.png, 2.png
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit(); // 
> line 229
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit(); // 
> line 232
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId()); // 
> line 243
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);// 
> line 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
> {code}
>     In terms of the step 4, FSEditLogAsync$RpcEdit.logSyncNotify is 
> essentially doing some network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();   // line 
> 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
> // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync thread is not able to proceed. In this case, the critical 
> logSync (line 243) can’t be executed, for the incoming transactions. Then the 
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging that is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
>  '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
>  '(org.apache.hadoop.ipc.Server$Connection,access$300,1799)',
>  '(org.apache.hadoop.ipc.Server$RpcCall,doResponse,)',
>  '(org.apache.hadoop.ipc.Server$Call,doResponse,903)',
>  '(org.apache.hadoop.ipc.Server$Call,sendResponse,889)',
>  
> '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$RpcEdit,logSyncNotify,365)',
>  '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync,run,248)',
>  '(java.lang.Thread,run,748)'
> {code}
>  The `channelWrite` function is defined as follows:
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
>   private int channelWrite(WritableByteChannel channel,
>ByteBuffer 

[jira] [Commented] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2023-08-11 Thread caozhiqiang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753092#comment-17753092
 ] 

caozhiqiang commented on HDFS-15869:


After applying this patch, I found that NameNode performance degraded noticeably 
when measured with the NNThroughputBenchmark tool. The flame graph produced by 
async-profiler shows that most of the CPU time is spent on lock contention.

!1.png|width=568,height=276! !2.png|width=717,height=184!

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.png, 2.png
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit(); // 
> line 229
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit(); // 
> line 232
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId()); // 
> line 243
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);// 
> line 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
> {code}
>     In terms of the step 4, FSEditLogAsync$RpcEdit.logSyncNotify is 
> essentially doing some network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();   // line 
> 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
> // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync thread is not able to proceed. In this case, the critical 
> logSync (line 243) can’t be executed, for the incoming transactions. Then the 
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging that is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
>  '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
>  '(org.apache.hadoop.ipc.Server$Connection,access$300,1799)',
>  '(org.apache.hadoop.ipc.Server$RpcCall,doResponse,)',
>  '(org.apache.hadoop.ipc.Server$Call,doResponse,903)',
>  '(org.apache.hadoop.ipc.Server$Call,sendResponse,889)',
>  
> '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$RpcEdit,logSyncNotify,365)',
>  

[jira] [Updated] (HDFS-15869) Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can cause the namenode to hang

2023-08-11 Thread caozhiqiang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

caozhiqiang updated HDFS-15869:
---
Attachment: 1.png

> Network issue while FSEditLogAsync is executing RpcEdit.logSyncNotify can 
> cause the namenode to hang
> 
>
> Key: HDFS-15869
> URL: https://issues.apache.org/jira/browse/HDFS-15869
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs async, namenode
>Affects Versions: 3.2.2
>Reporter: Haoze Wu
>Assignee: Haoze Wu
>Priority: Major
>  Labels: pull-request-available
> Attachments: 1.png, 2.png
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
>     We were doing some testing of the latest Hadoop stable release 3.2.2 and 
> found that a network issue can cause the namenode to hang even with the async 
> edit logging (FSEditLogAsync).
>     The workflow of the FSEditLogAsync thread is basically:
>  # get EditLog from a queue (line 229)
>  # do the transaction (line 232)
>  # sync the log if doSync (line 243)
>  # do logSyncNotify (line 248)
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   @Override
>   public void run() {
> try {
>   while (true) {
> boolean doSync;
> Edit edit = dequeueEdit(); // 
> line 229
> if (edit != null) {
>   // sync if requested by edit log.
>   doSync = edit.logEdit(); // 
> line 232
>   syncWaitQ.add(edit);
> } else {
>   // sync when editq runs dry, but have edits pending a sync.
>   doSync = !syncWaitQ.isEmpty();
> }
> if (doSync) {
>   // normally edit log exceptions cause the NN to terminate, but tests
>   // relying on ExitUtil.terminate need to see the exception.
>   RuntimeException syncEx = null;
>   try {
> logSync(getLastWrittenTxId()); // 
> line 243
>   } catch (RuntimeException ex) {
> syncEx = ex;
>   }
>   while ((edit = syncWaitQ.poll()) != null) {
> edit.logSyncNotify(syncEx);// 
> line 248
>   }
> }
>   }
> } catch (InterruptedException ie) {
>   LOG.info(Thread.currentThread().getName() + " was interrupted, 
> exiting");
> } catch (Throwable t) {
>   terminate(t);
> }
>   }
> {code}
>     In terms of the step 4, FSEditLogAsync$RpcEdit.logSyncNotify is 
> essentially doing some network write (line 365).
> {code:java}
> //hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogAsync.java
>   private static class RpcEdit extends Edit {
> // ...
> @Override
> public void logSyncNotify(RuntimeException syncEx) {
>   try {
> if (syncEx == null) {
>   call.sendResponse();   // line 
> 365
> } else {
>   call.abortResponse(syncEx);
> }
>   } catch (Exception e) {} // don't care if not sent.
> }
> // ...
>   }{code}
>     If the sendResponse operation in line 365 gets stuck, then the whole 
> FSEditLogAsync thread is not able to proceed. In this case, the critical 
> logSync (line 243) can’t be executed, for the incoming transactions. Then the 
> namenode hangs. This is undesirable because FSEditLogAsync’s key feature is 
> asynchronous edit logging that is supposed to tolerate slow I/O.
>     To see why the sendResponse operation in line 365 may get stuck, here is 
> the stack trace:
> {code:java}
>  '(org.apache.hadoop.ipc.Server,channelWrite,3593)',
>  '(org.apache.hadoop.ipc.Server,access$1700,139)',
>  '(org.apache.hadoop.ipc.Server$Responder,processResponse,1657)',
>  '(org.apache.hadoop.ipc.Server$Responder,doRespond,1727)',
>  '(org.apache.hadoop.ipc.Server$Connection,sendResponse,2828)',
>  '(org.apache.hadoop.ipc.Server$Connection,access$300,1799)',
>  '(org.apache.hadoop.ipc.Server$RpcCall,doResponse,)',
>  '(org.apache.hadoop.ipc.Server$Call,doResponse,903)',
>  '(org.apache.hadoop.ipc.Server$Call,sendResponse,889)',
>  
> '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync$RpcEdit,logSyncNotify,365)',
>  '(org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync,run,248)',
>  '(java.lang.Thread,run,748)'
> {code}
>  The `channelWrite` function is defined as follows:
> {code:java}
> //hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java
>   private int channelWrite(WritableByteChannel channel,
>ByteBuffer 

[jira] [Assigned] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread huangzhaobo99 (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huangzhaobo99 reassigned HDFS-17153:


Assignee: huangzhaobo99

> Reconfig 'pipelineSupportSlownode' parameters for datanode.
> ---
>
> Key: HDFS-17153
> URL: https://issues.apache.org/jira/browse/HDFS-17153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753086#comment-17753086
 ] 

ASF GitHub Bot commented on HDFS-17153:
---

huangzhaobo99 commented on PR #5940:
URL: https://github.com/apache/hadoop/pull/5940#issuecomment-1674330057

   Hi @ayushtkn @tomscut @slfan1989, Could you help review this when you have 
time? Thanks.




> Reconfig 'pipelineSupportSlownode' parameters for datanode.
> ---
>
> Key: HDFS-17153
> URL: https://issues.apache.org/jira/browse/HDFS-17153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17153:
--
Labels: pull-request-available  (was: )

> Reconfig 'pipelineSupportSlownode' parameters for datanode.
> ---
>
> Key: HDFS-17153
> URL: https://issues.apache.org/jira/browse/HDFS-17153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753085#comment-17753085
 ] 

ASF GitHub Bot commented on HDFS-17153:
---

huangzhaobo99 opened a new pull request, #5940:
URL: https://github.com/apache/hadoop/pull/5940

   ### Description of PR
   Reference: [HDFS-16396](https://issues.apache.org/jira/browse/HDFS-16396)
   In large clusters, a rolling restart of the datanodes takes a long time. We can 
make the slow-peer 'pipelineSupportSlownode' parameter of the datanode 
reconfigurable to facilitate cluster operation and maintenance.
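   
   A minimal, self-contained sketch of the kind of toggle this describes is shown below; the property key and class name are placeholders, since the real patch wires the switch through the datanode's existing reconfiguration framework.
   
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

public class SlownodePipelineToggleSketch {

  // Placeholder key; the actual key is defined in DFSConfigKeys.
  static final String PIPELINE_SLOWNODE_ENABLED_KEY =
      "dfs.datanode.pipeline.slownode.enabled.placeholder";

  private final AtomicBoolean pipelineSupportSlownode = new AtomicBoolean(false);

  /** Called by the reconfiguration framework when the property value changes. */
  void reconfigureProperty(String property, String newValue) {
    if (PIPELINE_SLOWNODE_ENABLED_KEY.equals(property)) {
      // A null value means "revert to the default"; the assumed default is false.
      boolean enabled = newValue != null && Boolean.parseBoolean(newValue);
      pipelineSupportSlownode.set(enabled);
    }
  }

  boolean isPipelineSlownodeEnabled() {
    return pipelineSupportSlownode.get();
  }
}
{code}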
   
   ### How was this patch tested?
   UT
   
   
   
   




> Reconfig 'pipelineSupportSlownode' parameters for datanode.
> ---
>
> Key: HDFS-17153
> URL: https://issues.apache.org/jira/browse/HDFS-17153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753084#comment-17753084
 ] 

ASF GitHub Bot commented on HDFS-17152:
---

hfutatzhanghb opened a new pull request, #5939:
URL: https://github.com/apache/hadoop/pull/5939

   ### Description of PR
   Please see HDFS-17152.




> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17152:
--
Labels: pull-request-available  (was: )

> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17153) Reconfig 'pipelineSupportSlownode' parameters for datanode.

2023-08-11 Thread huangzhaobo99 (Jira)
huangzhaobo99 created HDFS-17153:


 Summary: Reconfig 'pipelineSupportSlownode' parameters for 
datanode.
 Key: HDFS-17153
 URL: https://issues.apache.org/jira/browse/HDFS-17153
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: huangzhaobo99






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba reassigned HDFS-17152:


Assignee: farmmamba

> Fix the documentation of count command in FileSystemShell.md
> 
>
> Key: HDFS-17152
> URL: https://issues.apache.org/jira/browse/HDFS-17152
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>
> count -q means show quotas and usage.
> count -u means show quotas.
> We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17152) Fix the documentation of count command in FileSystemShell.md

2023-08-11 Thread farmmamba (Jira)
farmmamba created HDFS-17152:


 Summary: Fix the documentation of count command in 
FileSystemShell.md
 Key: HDFS-17152
 URL: https://issues.apache.org/jira/browse/HDFS-17152
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.4.0
Reporter: farmmamba


count -q means show quotas and usage.

count -u means show quotas.

We should fix this minor documentation error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org