[jira] [Resolved] (HDFS-17376) Distcp creates Factor 1 replication file on target if Source is EC

2024-02-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-17376.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

> Distcp creates Factor 1 replication file on target if Source is EC
> --
>
> Key: HDFS-17376
> URL: https://issues.apache.org/jira/browse/HDFS-17376
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 3.3.6
>Reporter: Sadanand Shenoy
>Assignee: Sadanand Shenoy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> If the source file is EC, distcp without the preserve option creates a file 
> with replication factor 1 on the target, which is not intended.
> This is because getReplication() always returns 1 for an EC file. Instead, 
> distcp should create the file with the default replication of the target.
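A minimal sketch of the intended behaviour, assuming only the standard FileStatus/FileSystem APIs; the helper name is made up for illustration and is not the committed distcp patch:

{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: pick the replication factor for the target file.
// An EC source reports getReplication() == 1, so without the preserve
// replication option we fall back to the target filesystem's default.
public final class TargetReplication {
  static short choose(FileStatus source, FileSystem targetFs, Path target,
      boolean preserveReplication) {
    if (source.isErasureCoded() && !preserveReplication) {
      return targetFs.getDefaultReplication(target);
    }
    return source.getReplication();
  }
}
{code}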






[jira] [Resolved] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-24 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-17237.
--
Resolution: Fixed

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> When an IPCLoggerChannel is created (which is used to read from and write to 
> the Journal nodes) it also creates a metrics object. When the namenodes 
> failover, the IPC loggers are all closed and reopened in read mode on the new 
> SBNN or the read mode is closed on the SBNN and re-opened in write mode. The 
> closing frees the resources and discards the original IPCLoggerChannel object 
> and causes a new one to be created by the caller.
> If a Journal node was down and added back to the cluster with the same 
> hostname, but a different IP, when the failover happens, you end up with 4 
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated, 
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is 
> closed and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original 
> issue on a docker cluster and validated it is resolved with this change in 
> place.
> For info, the logger metrics look like:
> {code}
> {
>"name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
> "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
> "tag.Context" : "dfs",
> "tag.IsOutOfSync" : "false",
> "tag.Hostname" : "957e3e66f10b",
> "QueuedEditsSize" : 0,
> "LagTimeMillis" : 0,
> "CurrentLagTxns" : 0
>   }
> {code}
> Note that the name includes the IP, rather than the hostname.






[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-24 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-17237:
-
Fix Version/s: 3.4.0
   3.3.7

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> When an IPCLoggerChannel is created (which is used to read from and write to 
> the Journal nodes) it also creates a metrics object. When the namenodes 
> failover, the IPC loggers are all closed and reopened in read mode on the new 
> SBNN or the read mode is closed on the SBNN and re-opened in write mode. The 
> closing frees the resources and discards the original IPCLoggerChannel object 
> and causes a new one to be created by the caller.
> If a Journal node was down and added back to the cluster with the same 
> hostname, but a different IP, when the failover happens, you end up with 4 
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated, 
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is 
> closed and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original 
> issue on a docker cluster and validated it is resolved with this change in 
> place.
> For info, the logger metrics look like:
> {code}
> {
>"name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
> "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
> "tag.Context" : "dfs",
> "tag.IsOutOfSync" : "false",
> "tag.Hostname" : "957e3e66f10b",
> "QueuedEditsSize" : 0,
> "LagTimeMillis" : 0,
> "CurrentLagTxns" : 0
>   }
> {code}
> Note that the name includes the IP, rather than the hostname.






[jira] [Updated] (HDFS-17237) Remove IPCLoggerChannel Metrics when the logger is closed

2023-10-23 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-17237:
-
Summary: Remove IPCLoggerChannel Metrics when the logger is closed  (was: 
Remove IPCLogger Metrics when the logger is closed)

> Remove IPCLoggerChannel Metrics when the logger is closed
> -
>
> Key: HDFS-17237
> URL: https://issues.apache.org/jira/browse/HDFS-17237
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> When an IPCLoggerChannel is created (which is used to read from and write to 
> the Journal nodes) it also creates a metrics object. When the namenodes 
> failover, the IPC loggers are all closed and reopened in read mode on the new 
> SBNN or the read mode is closed on the SBNN and re-opened in write mode. The 
> closing frees the resources and discards the original IPCLoggerChannel object 
> and causes a new one to be created by the caller.
> If a Journal node was down and added back to the cluster with the same 
> hostname, but a different IP, when the failover happens, you end up with 4 
> metrics objects for the JNs:
> 1. One for each of the original 3 IPs
> 2. One for the new IP
> The old stale metric will remain forever and will no longer be updated, 
> leading to confusing results in any tools that use the metrics for monitoring.
> This change ensures we un-register the metrics when the logger channel is 
> closed and a new metrics object gets created when the new channel is created.
> I have added a small test to prove this, but also reproduced the original 
> issue on a docker cluster and validated it is resolved with this change in 
> place.
> For info, the logger metrics look like:
> {code}
> {
>"name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
> "modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
> "tag.Context" : "dfs",
> "tag.IsOutOfSync" : "false",
> "tag.Hostname" : "957e3e66f10b",
> "QueuedEditsSize" : 0,
> "LagTimeMillis" : 0,
> "CurrentLagTxns" : 0
>   }
> {code}
> Note that the name includes the IP, rather than the hostname.






[jira] [Created] (HDFS-17237) Remove IPCLogger Metrics when the logger is closed

2023-10-23 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-17237:


 Summary: Remove IPCLogger Metrics when the logger is closed
 Key: HDFS-17237
 URL: https://issues.apache.org/jira/browse/HDFS-17237
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


When an IPCLoggerChannel is created (which is used to read from and write to 
the Journal nodes) it also creates a metrics object. When the namenodes 
failover, the IPC loggers are all closed and reopened in read mode on the new 
SBNN or the read mode is closed on the SBNN and re-opened in write mode. The 
closing frees the resources and discards the original IPCLoggerChannel object 
and causes a new one to be created by the caller.

If a Journal node was down and added back to the cluster with the same 
hostname, but a different IP, when the failover happens, you end up with 4 
metrics objects for the JNs:

1. One for each of the original 3 IPs
2. One for the new IP

The old stale metric will remain forever and will no longer be updated, leading 
to confusing results in any tools that use the metrics for monitoring.

This change ensures we un-register the metrics when the logger channel is 
closed and a new metrics object gets created when the new channel is created.

I have added a small test to prove this, but also reproduced the original issue 
on a docker cluster and validated it is resolved with this change in place.

For info, the logger metrics look like:

{code}
{
   "name" : "Hadoop:service=NameNode,name=IPCLoggerChannel-192.168.32.8-8485",
"modelerType" : "IPCLoggerChannel-192.168.32.8-8485",
"tag.Context" : "dfs",
"tag.IsOutOfSync" : "false",
"tag.Hostname" : "957e3e66f10b",
"QueuedEditsSize" : 0,
"LagTimeMillis" : 0,
"CurrentLagTxns" : 0
  }
{code}

Note that the name includes the IP, rather than the hostname.
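A minimal sketch of the idea behind the fix, assuming Hadoop's metrics2 DefaultMetricsSystem; the class name below is an illustrative assumption, not the exact patch:

{code:java}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;

// Hypothetical sketch: when an IPCLoggerChannel is closed, un-register its
// per-channel metrics source so a stale "IPCLoggerChannel-<ip>-<port>" entry
// does not linger after the JN's address changes.
public class LoggerChannelMetricsCleanup {
  private final String sourceName;

  LoggerChannelMetricsCleanup(String ipAddr, int port) {
    this.sourceName = "IPCLoggerChannel-" + ipAddr + "-" + port;
  }

  void close() {
    // Remove the metrics source from the global metrics system; a fresh source
    // is registered when the replacement channel is created.
    DefaultMetricsSystem.instance().unregisterSource(sourceName);
  }
}
{code}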






[jira] [Assigned] (HDFS-14626) Decommission all nodes hosting last block of open file succeeds unexpectedly

2023-08-29 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-14626:


Assignee: (was: Stephen O'Donnell)

> Decommission all nodes hosting last block of open file succeeds unexpectedly 
> -
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Priority: Major
> Attachments: test-to-reproduce.patch
>
>
> I have been investigating scenarios that cause decommission to hang, 
> especially around one long standing issue. That is, an open block on the host 
> which is being decommissioned can cause the process to never complete.
> Checking the history, there seems to have been at least one change in 
> HDFS-5579 which greatly improved the situation, but from reading comments and 
> support cases, there still seems to be some scenarios where open blocks on a 
> DN host cause the decommission to get stuck.
> No matter what I try, I have not been able to reproduce this, but I think I 
> have uncovered another issue that may partly explain why.
> If I do the following, the nodes will decommission without any issues:
> 1. Create a file and write to it so it crosses a block boundary. Then there 
> is one complete block and one under construction block. Keep the file open, 
> and write a few bytes periodically.
> 2. Now note the nodes which the UC block is currently being written on, and 
> decommission them all.
> 3. The decommission should succeed.
> 4. Now attempt to close the open file, and it will fail to close with an 
> error like below, probably as decommissioned nodes are not allowed to send 
> IBRs:
> {code:java}
> java.io.IOException: Unable to close file because the last block 
> BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have 
> enough number of replicas.
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
>     at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
> Interestingly, if you recommission the nodes without restarting them before 
> closing the file, it will close OK, and writes to it can continue even once 
> decommission has completed.
> I don't think this is expected - i.e. decommission should not complete on all 
> of the nodes hosting the last UC block of a file.
> From what I have figured out, I don't think UC blocks are considered in the 
> DatanodeAdminManager at all. This is because the original list of blocks it 
> cares about is taken from the Datanode block iterator, which takes them 
> from the DatanodeStorageInfo objects attached to the datanode instance. I 
> believe UC blocks don't make it into the DatanodeStorageInfo until after 
> they have been completed and an IBR sent, so the decommission logic never 
> considers them.
> What troubles me about this explanation, is how did open files previously 
> cause decommission to get stuck if it never checks for them, so I suspect I 
> am missing something.
> I will attach a patch with a test case that demonstrates this issue. This 
> reproduces on trunk and I also tested on CDH 5.8.1, which is based on the 2.6 
> branch, but with a lot of backports.
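For illustration, a minimal sketch of step 1 above using only the public FileSystem client API; the path and sizes are assumptions, and the decommission itself (exclude file plus refreshNodes) is left out:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: write past one block boundary and keep the file open so its last
// block stays under construction (UC).
public class OpenFileAcrossBlockBoundary {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/open-uc-file"));
    byte[] chunk = new byte[1024 * 1024];
    // With the default 128MB block size, 130MB gives one complete block and
    // one under construction block.
    for (int i = 0; i < 130; i++) {
      out.write(chunk);
    }
    out.hflush(); // data is visible to readers, but the file stays open
    // Steps 2-3 above: decommission the datanodes hosting the UC block.
    // Calling out.close() afterwards fails with "Unable to close file because
    // the last block ... does not have enough number of replicas".
  }
}
{code}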






[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17738581#comment-17738581
 ] 

Stephen O'Donnell commented on HDFS-17061:
--

I don't know of any tool, and certainly the datanode does not know whether a 
block is a data block or a parity block.

You might be able to do some analysis from the fsck output, but I have never 
tried it for this kind of EC analysis.

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding, hdfs
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN for placing a data block or a parity block, the existing 
> number of data and parity blocks on the datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 will 
> have a high traffic load, while datanodes like DN3, DN4 and DN5 may have 
> little or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> +If we can make data blocks and parity blocks on DNs more balanced, the 
> traffic load in the cluster will be more balanced and the peak traffic load on 
> a DN will be reduced+. Here "balance" refers to how well the number of data 
> blocks and parity blocks on a DN matches its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> How can we reduce this imbalance? I think it's related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it's appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.
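As a rough illustration of the "balance" notion in the description above, a small self-contained sketch that scores how far a DN's data/parity mix deviates from the policy ratio (3:2 for RS-3-2); the class and inputs are assumptions, not part of any HDFS API:

{code:java}
// Hypothetical balance check: compare a DN's share of parity blocks with the
// share the EC policy would give in a perfectly balanced placement.
public final class EcBalanceScore {
  /**
   * @param dataBlocks   number of EC data blocks stored on the DN
   * @param parityBlocks number of EC parity blocks stored on the DN
   * @param dataUnits    data units in the policy (3 for RS-3-2)
   * @param parityUnits  parity units in the policy (2 for RS-3-2)
   * @return absolute deviation of the DN's parity share from the ideal share;
   *         0.0 means perfectly balanced, e.g. 3000 data / 2000 parity blocks
   *         under RS-3-2.
   */
  static double deviation(long dataBlocks, long parityBlocks,
      int dataUnits, int parityUnits) {
    long total = dataBlocks + parityBlocks;
    if (total == 0) {
      return 0.0;
    }
    double actualParityShare = (double) parityBlocks / total;
    double idealParityShare = (double) parityUnits / (dataUnits + parityUnits);
    return Math.abs(actualParityShare - idealParityShare);
  }
}
{code}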






[jira] [Commented] (HDFS-17061) EC: Let data blocks and parity blocks on DNs more balanced

2023-06-27 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17737802#comment-17737802
 ] 

Stephen O'Donnell commented on HDFS-17061:
--

On a large cluster with many datanodes, and approximately random datanode 
selection for the pipelines, would the cluster balance out naturally?

Have you seen a problem like this on a large cluster, where the parity and data 
blocks are not reasonably well balanced?

> EC: Let data blocks and parity blocks on DNs more balanced
> --
>
> Key: HDFS-17061
> URL: https://issues.apache.org/jira/browse/HDFS-17061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, erasure-coding, hdfs
>Reporter: WangYuanben
>Priority: Minor
> Attachments: figure1, unbalanced traffic load on DNs.png, figure2, 
> balanced traffic load on DNs.png
>
>
> When choosing a DN for placing a data block or a parity block, the existing 
> number of data and parity blocks on the datanode is not taken into 
> consideration. This may lead to *uneven traffic load*.
> As shown in figure 1, when reading block groups A, B, C, D and E from five 
> different EC files without any missing blocks, datanodes like DN1 and DN2 will 
> have a high traffic load, while datanodes like DN3, DN4 and DN5 may have 
> little or even no traffic load. 
>  !figure1, unbalanced traffic load on DNs.png|width=600,height=333! 
> +If we can make data blocks and parity blocks on DNs more balanced, the 
> traffic load in the cluster will be more balanced and the peak traffic load on 
> a DN will be reduced+. Here "balance" refers to how well the number of data 
> blocks and parity blocks on a DN matches its EC policy. In the ideal state, 
> each DN has a balanced traffic load, as figure 2 shows. 
>  !figure2, balanced traffic load on DNs.png|width=600,height=333! 
> How can we reduce this imbalance? I think it's related to the EC policy and 
> the ratio of data blocks to parity blocks on each datanode. For RS-3-2-1024k, 
> it's appropriate to keep the ratio close to 3:2. 
> There are two solutions:
> 1. Improve the block placement policy.
> 2. Improve the Balancer.






[jira] [Commented] (HDFS-17002) Erasure coding:Generate parity blocks in time to prevent file corruption

2023-05-10 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721241#comment-17721241
 ] 

Stephen O'Donnell commented on HDFS-17002:
--

If the directory scanner is not working, then the same issue can happen to any 
blocks on the system, replicated or EC, if they are not read frequently. The 
system is designed to have the directory scanner running, and its job is to 
detect corruption such as you have described.

I don't think there is any need to make any changes here.

> Erasure coding:Generate parity blocks in time to prevent file corruption
> 
>
> Key: HDFS-17002
> URL: https://issues.apache.org/jira/browse/HDFS-17002
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Priority: Major
>
> In the current EC implementation, a corrupted parity block will not be 
> regenerated in time. 
> Consider the following scenario with the RS-6-3-1024k EC policy:
> if the three parity blocks p1, p2 and p3 are all corrupted or deleted, we are 
> not aware of it.
> If a data block is also corrupted in this time period, the file has lost four 
> blocks, which is more than the three that RS-6-3 can tolerate, so it cannot 
> be reconstructed by decoding.
>  
> So we should always re-generate a parity block promptly when it is found to 
> be unhealthy.
>  






[jira] [Commented] (HDFS-17002) Erasure coding:Generate parity blocks in time to prevent file corruption

2023-05-09 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721080#comment-17721080
 ] 

Stephen O'Donnell commented on HDFS-17002:
--

If some of the parity blocks go missing, the Namenode should detect this and 
reconstruct them. Have you seen an example where this did not happen? Do you 
have any more details, or do you know the source of the problem?

> Erasure coding:Generate parity blocks in time to prevent file corruption
> 
>
> Key: HDFS-17002
> URL: https://issues.apache.org/jira/browse/HDFS-17002
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Priority: Major
>
> In the current EC implementation, a corrupted parity block will not be 
> regenerated in time. 
> Consider the following scenario with the RS-6-3-1024k EC policy:
> if the three parity blocks p1, p2 and p3 are all corrupted or deleted, we are 
> not aware of it.
> If a data block is also corrupted in this time period, the file has lost four 
> blocks, which is more than the three that RS-6-3 can tolerate, so it cannot 
> be reconstructed by decoding.
>  
> So we should always re-generate a parity block promptly when it is found to 
> be unhealthy.
>  






[jira] [Resolved] (HDFS-16942) Send error to datanode if FBR is rejected due to bad lease

2023-03-11 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16942.
--
Resolution: Fixed

> Send error to datanode if FBR is rejected due to bad lease
> --
>
> Key: HDFS-16942
> URL: https://issues.apache.org/jira/browse/HDFS-16942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.6
>
>
> When a datanode sends an FBR to the namenode, it requires a lease to send it. 
> On a couple of busy clusters, we have seen an issue where the DN is somehow 
> delayed in sending the FBR after requesting the lease. The NN then rejects 
> the FBR and logs a message to that effect, but from the Datanode's point of 
> view the report was successful, so it does not try to send another report 
> until the 6 hour default interval has passed.
> If this happens to a few DNs, there can be missing and under-replicated 
> blocks, further adding to the cluster load. Even worse, I have seen the DNs 
> join the cluster with zero blocks, so it is not obvious that the under 
> replication is caused by a lost FBR, as all DNs appear to be up and running.
> I believe we should propagate an error back to the DN if the FBR is rejected; 
> that way, the DN can request a new lease and try again.






[jira] [Updated] (HDFS-16942) Send error to datanode if FBR is rejected due to bad lease

2023-03-11 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16942:
-
Fix Version/s: 3.2.5
   3.3.6

> Send error to datanode if FBR is rejected due to bad lease
> --
>
> Key: HDFS-16942
> URL: https://issues.apache.org/jira/browse/HDFS-16942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.6
>
>
> When a datanode sends an FBR to the namenode, it requires a lease to send it. 
> On a couple of busy clusters, we have seen an issue where the DN is somehow 
> delayed in sending the FBR after requesting the lease. The NN then rejects 
> the FBR and logs a message to that effect, but from the Datanode's point of 
> view the report was successful, so it does not try to send another report 
> until the 6 hour default interval has passed.
> If this happens to a few DNs, there can be missing and under-replicated 
> blocks, further adding to the cluster load. Even worse, I have seen the DNs 
> join the cluster with zero blocks, so it is not obvious that the under 
> replication is caused by a lost FBR, as all DNs appear to be up and running.
> I believe we should propagate an error back to the DN if the FBR is rejected; 
> that way, the DN can request a new lease and try again.






[jira] [Updated] (HDFS-16942) Send error to datanode if FBR is rejected due to bad lease

2023-03-11 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16942:
-
Fix Version/s: 3.4.0

> Send error to datanode if FBR is rejected due to bad lease
> --
>
> Key: HDFS-16942
> URL: https://issues.apache.org/jira/browse/HDFS-16942
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When a datanode sends an FBR to the namenode, it requires a lease to send it. 
> On a couple of busy clusters, we have seen an issue where the DN is somehow 
> delayed in sending the FBR after requesting the lease. The NN then rejects 
> the FBR and logs a message to that effect, but from the Datanode's point of 
> view the report was successful, so it does not try to send another report 
> until the 6 hour default interval has passed.
> If this happens to a few DNs, there can be missing and under-replicated 
> blocks, further adding to the cluster load. Even worse, I have seen the DNs 
> join the cluster with zero blocks, so it is not obvious that the under 
> replication is caused by a lost FBR, as all DNs appear to be up and running.
> I believe we should propagate an error back to the DN if the FBR is rejected; 
> that way, the DN can request a new lease and try again.






[jira] [Created] (HDFS-16942) Send error to datanode if FBR is rejected due to bad lease

2023-03-07 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16942:


 Summary: Send error to datanode if FBR is rejected due to bad lease
 Key: HDFS-16942
 URL: https://issues.apache.org/jira/browse/HDFS-16942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode, namenode
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


When a datanode sends an FBR to the namenode, it requires a lease to send it. On 
a couple of busy clusters, we have seen an issue where the DN is somehow 
delayed in sending the FBR after requesting the lease. The NN then rejects the 
FBR and logs a message to that effect, but from the Datanode's point of view 
the report was successful, so it does not try to send another report until 
the 6 hour default interval has passed.

If this happens to a few DNs, there can be missing and under-replicated blocks, 
further adding to the cluster load. Even worse, I have seen the DNs join the 
cluster with zero blocks, so it is not obvious that the under replication is 
caused by a lost FBR, as all DNs appear to be up and running.

I believe we should propagate an error back to the DN if the FBR is rejected; 
that way, the DN can request a new lease and try again.
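A minimal hypothetical sketch of the idea; the class and method names are illustrative assumptions, not the Hadoop APIs touched by the actual patch:

{code:java}
import java.io.IOException;

// Sketch: if the FBR lease check fails on the NameNode side, throw an error
// back to the DataNode instead of silently dropping the report, so the DN can
// request a new lease and resend the FBR rather than waiting ~6 hours.
final class FbrLeaseCheck {
  static void requireValidLease(boolean leaseValid, String datanode, long leaseId)
      throws IOException {
    if (!leaseValid) {
      throw new IOException("Full block report from " + datanode
          + " rejected: block report lease 0x" + Long.toHexString(leaseId)
          + " is invalid or expired; request a new lease and retry");
    }
  }
}
{code}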






[jira] [Resolved] (HDFS-14548) Cannot create snapshot when the snapshotCounter reaches MaxSnapshotID

2023-02-27 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-14548.
--
Resolution: Duplicate

> Cannot create snapshot when the snapshotCounter reaches MaxSnapshotID
> -
>
> Key: HDFS-14548
> URL: https://issues.apache.org/jira/browse/HDFS-14548
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: zhangqianqiong
>Priority: Major
> Attachments: 1559717485296.jpg
>
>
> When a new snapshot is created, the snapshotCounter increments, but when 
> a snapshot is deleted, the snapshotCounter does not decrement. Over time, 
> when the snapshotCounter reaches MaxSnapshotID, new snapshots cannot 
> be created.
> By the way, how can I reset the snapshotCounter?
>  
>  
>  






[jira] [Commented] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2023-02-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17689891#comment-17689891
 ] 

Stephen O'Donnell commented on HDFS-16761:
--

Branch 3.2 seems to be OK too, so resolving this one.

> Namenode UI for Datanodes page not loading if any data node is down
> ---
>
> Key: HDFS-16761
> URL: https://issues.apache.org/jira/browse/HDFS-16761
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Krishna Reddy
>Assignee: Zita Dombi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Steps to reproduce:
> - Install the hadoop components and add 3 datanodes
> - Enable namenode HA 
> - Open the Namenode UI and check the datanode page 
> - Check that all datanodes are displayed
> - Now bring one datanode down
> - Wait 10 minutes for the heartbeat to expire
> - Refresh the namenode page and check again
>  
> Actual result: it shows the error message "NameNode is still loading. 
> Redirecting to the Startup Progress page."






[jira] [Updated] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2023-02-16 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16761:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Namenode UI for Datanodes page not loading if any data node is down
> ---
>
> Key: HDFS-16761
> URL: https://issues.apache.org/jira/browse/HDFS-16761
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Krishna Reddy
>Assignee: Zita Dombi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Steps to reproduce:
> - Install the hadoop components and add 3 datanodes
> - Enable namenode HA 
> - Open the Namenode UI and check the datanode page 
> - Check that all datanodes are displayed
> - Now bring one datanode down
> - Wait 10 minutes for the heartbeat to expire
> - Refresh the namenode page and check again
>  
> Actual result: it shows the error message "NameNode is still loading. 
> Redirecting to the Startup Progress page."






[jira] [Updated] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2023-02-15 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16761:
-
Fix Version/s: 3.4.0

> Namenode UI for Datanodes page not loading if any data node is down
> ---
>
> Key: HDFS-16761
> URL: https://issues.apache.org/jira/browse/HDFS-16761
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Krishna Reddy
>Assignee: Zita Dombi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Steps to reproduce:
> - Install the hadoop components and add 3 datanodes
> - Enable namenode HA 
> - Open the Namenode UI and check the datanode page 
> - Check that all datanodes are displayed
> - Now bring one datanode down
> - Wait 10 minutes for the heartbeat to expire
> - Refresh the namenode page and check again
>  
> Actual result: it shows the error message "NameNode is still loading. 
> Redirecting to the Startup Progress page."






[jira] [Assigned] (HDFS-16761) Namenode UI for Datanodes page not loading if any data node is down

2023-02-09 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-16761:


Assignee: Zita Dombi

> Namenode UI for Datanodes page not loading if any data node is down
> ---
>
> Key: HDFS-16761
> URL: https://issues.apache.org/jira/browse/HDFS-16761
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.2.2
>Reporter: Krishna Reddy
>Assignee: Zita Dombi
>Priority: Major
> Fix For: 3.2.2
>
>
> Steps to reproduce:
> - Install the hadoop components and add 3 datanodes
> - Enable namenode HA 
> - Open the Namenode UI and check the datanode page 
> - Check that all datanodes are displayed
> - Now bring one datanode down
> - Wait 10 minutes for the heartbeat to expire
> - Refresh the namenode page and check again
>  
> Actual result: it shows the error message "NameNode is still loading. 
> Redirecting to the Startup Progress page."






[jira] [Commented] (HDFS-8789) Block Placement policy migrator

2022-09-14 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17603945#comment-17603945
 ] 

Stephen O'Donnell commented on HDFS-8789:
-

I don't think this tool is needed, as we have had HDFS-14053 committed since this 
Jira was opened, which allows you to migrate blocks on a path-by-path basis.

There are no plans from our side to move this forward.

> Block Placement policy migrator
> ---
>
> Key: HDFS-8789
> URL: https://issues.apache.org/jira/browse/HDFS-8789
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Major
> Attachments: HDFS-8789-trunk-STRAWMAN-v1.patch
>
>
> As we start to add new block placement policies to HDFS, it will be necessary 
> to have a robust tool that can migrate HDFS blocks between placement 
> policies. This jira is for the design and implementation of that tool.






[jira] [Resolved] (HDFS-16610) Make fsck read timeout configurable

2022-06-07 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16610.
--
Resolution: Fixed

> Make fsck read timeout configurable
> ---
>
> Key: HDFS-16610
> URL: https://issues.apache.org/jira/browse/HDFS-16610
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In a cluster with a lot of small files, we encountered a case where fsck was 
> very slow. I believe it is due to contention with many other threads reading 
> / writing data on the cluster.
> Sometimes fsck does not report any progress for more than 60 seconds and the 
> client times out. Currently the connect and read timeouts are hardcoded to 60 
> seconds. This change makes them configurable.






[jira] [Updated] (HDFS-16610) Make fsck read timeout configurable

2022-06-07 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16610:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.4

> Make fsck read timeout configurable
> ---
>
> Key: HDFS-16610
> URL: https://issues.apache.org/jira/browse/HDFS-16610
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In a cluster with a lot of small files, we encountered a case where fsck was 
> very slow. I believe it is due to contention with many other threads reading 
> / writing data on the cluster.
> Sometimes fsck does not report any progress for more than 60 seconds and the 
> client times out. Currently the connect and read timeouts are hardcoded to 60 
> seconds. This change makes them configurable.






[jira] [Reopened] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop

2022-05-31 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reopened HDFS-16583:
--

Reopening to add a branch-3.2 PR.

> DatanodeAdminDefaultMonitor can get stuck in an infinite loop
> -
>
> Key: HDFS-16583
> URL: https://issues.apache.org/jira/browse/HDFS-16583
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We encountered a case where the decommission monitor in the namenode got 
> stuck for about 6 hours. The logs give:
> {code}
> 2022-05-15 01:09:25,490 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.132:50010
> 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process 
> Thread Dump: jsp requested
> 
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753665_3428271426
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753659_3428271420
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753662_3428271423
> 2022-05-15 01:19:06,810 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
> PendingReconstructionMonitor timed out blk_4501753663_3428271424
> 2022-05-15 06:00:57,281 INFO 
> org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
> maintenance of dead node 10.185.3.34:50010
> 2022-05-15 06:00:58,105 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
> held for 17492614 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>   Number of suppressed write-lock reports: 0
>   Longest write-lock held interval: 17492614
> {code}
> We only have the one thread dump triggered by the FC:
> {code}
> Thread 80 (DatanodeAdminMonitor-0):
>   State: RUNNABLE
>   Blocked count: 16
>   Waited count: 453693
>   Stack:
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)
> 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
> {code}
> This was the line of code:
> {code}
> private void check() {
>   final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
>       it = new CyclicIteration<>(outOfServiceNodeBlocks,
>           iterkey).iterator();
>   final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();
>   while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
>       .isRunning()) {
>     numNodesChecked++;
>     final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
>         entry = it.next();
>     final DatanodeDescriptor dn = entry.getKey();
>     AbstractList<BlockInfo> blocks = entry.getValue();
>     boolean fullScan = false;
>     if (dn.isMaintenance() && 

[jira] [Created] (HDFS-16610) Make fsck read timeout configurable

2022-05-31 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16610:


 Summary: Make fsck read timeout configurable
 Key: HDFS-16610
 URL: https://issues.apache.org/jira/browse/HDFS-16610
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


In a cluster with a lot of small files, we encountered a case where fsck was 
very slow. I believe it is due to contention with many other threads reading / 
writing data on the cluster.

Sometimes fsck does not report any progress for more than 60 seconds and the 
client times out. Currently the connect and read timeouts are hardcoded to 60 
seconds. This change makes them configurable.
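A minimal sketch of the idea, assuming the usual java.net timeout setters; the configuration key names below are placeholders for illustration, not necessarily the keys the committed change added:

{code:java}
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import org.apache.hadoop.conf.Configuration;

// Sketch: read the fsck client's HTTP connect/read timeouts from configuration
// instead of hardcoding 60 seconds. The key names here are assumptions.
public class FsckTimeouts {
  static URLConnection open(URL fsckUrl, Configuration conf) throws IOException {
    int connectMs = conf.getInt("dfs.fsck.connect.timeout.ms", 60_000); // assumed key
    int readMs = conf.getInt("dfs.fsck.read.timeout.ms", 60_000);       // assumed key
    URLConnection connection = fsckUrl.openConnection();
    connection.setConnectTimeout(connectMs);
    connection.setReadTimeout(readMs);
    return connection;
  }
}
{code}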






[jira] [Commented] (HDFS-16594) Many RpcCalls are blocked for a while while Decommission works

2022-05-25 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542230#comment-17542230
 ] 

Stephen O'Donnell commented on HDFS-16594:
--

Some people have reported good results with the DatanodeAdminBackoffMonitor. 
What about giving it a try to see if the locking is better?
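For reference, switching the monitor is a configuration change; a sketch is below. The property name and class are given from memory as an assumption, so verify them against hdfs-default.xml for your release:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch: select the backoff-based decommission monitor instead
// of the default one. Double-check the property and class names.
public class UseBackoffMonitor {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("dfs.namenode.decommission.monitor.class",
        "org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor");
    // ... restart the NameNode with this setting applied.
  }
}
{code}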

> Many RpcCalls are blocked for a while while Decommission works
> --
>
> Key: HDFS-16594
> URL: https://issues.apache.org/jira/browse/HDFS-16594
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
> Attachments: image-2022-05-26-02-05-38-878.png
>
>
> When there are some DataNodes that need to go offline, decommission starts to 
> work and periodically checks the number of blocks remaining to be processed. 
> By default, after checking more than 500,000 
> (${dfs.namenode.decommission.blocks.per.interval}) blocks, the 
> DatanodeAdminDefaultMonitor thread will sleep for a while before continuing.
> If the number of blocks to be checked is very large, for example if the number 
> of replicas managed by the DataNode reaches 900,000 or even 1,000,000, the 
> DatanodeAdminDefaultMonitor will continue to hold the FSNamesystemLock during 
> this period, which will block a lot of RpcCalls. Here are some logs:
>  !image-2022-05-26-02-05-38-878.png! 
> It can be seen that during the last check there were more than 1,000,000 
> blocks.
> When the check is over, FSNamesystemLock is released and RpcCall starts 
> working:
> '
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 36 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3488 milliseconds to process 
> from client Call#5571549 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.145.92:35727
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 135 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3472 milliseconds to process 
> from client Call#36795561 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.99.152:37793
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 108 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3445 milliseconds to process 
> from client Call#5497586 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.146.56:23475
> '
> '
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 33 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3435 milliseconds to process 
> from client Call#6043903 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.82.106:34746
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 139 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3436 milliseconds to process 
> from client Call#274471 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.149.175:46419
> 2022-05-25 13:46:09,712 [4831384907] - WARN  [IPC Server handler 77 on 
> 8021:Server@494] - Slow RPC : sendHeartbeat took 3436 milliseconds to process 
> from client Call#73375524 Retry#0 
> org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.sendHeartbeat from 
> 10.196.81.46:34241
> '
> Since RpcCall is waiting for a long time, RpcQueueTime+RpcProcessingTime will 
> be longer than usual. A very large number of RpcCalls were affected during 
> this time.






[jira] [Updated] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop

2022-05-19 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16583:
-
Description: 
We encountered a case where the decommission monitor in the namenode got stuck 
for about 6 hours. The logs give:

{code}
2022-05-15 01:09:25,490 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
maintenance of dead node 10.185.3.132:50010
2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process Thread 
Dump: jsp requested

2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753665_3428271426
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753659_3428271420
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753662_3428271423
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753663_3428271424
2022-05-15 06:00:57,281 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
maintenance of dead node 10.185.3.34:50010
2022-05-15 06:00:58,105 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
held for 17492614 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 17492614
{code}

We only have the one thread dump triggered by the FC:

{code}
Thread 80 (DatanodeAdminMonitor-0):
  State: RUNNABLE
  Blocked count: 16
  Waited count: 453693
  Stack:

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}

This was the line of code:

{code}
private void check() {
  final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
      it = new CyclicIteration<>(outOfServiceNodeBlocks,
          iterkey).iterator();
  final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();

  while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
      .isRunning()) {
    numNodesChecked++;
    final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
        entry = it.next();
    final DatanodeDescriptor dn = entry.getKey();
    AbstractList<BlockInfo> blocks = entry.getValue();
    boolean fullScan = false;
    if (dn.isMaintenance() && dn.maintenanceExpired()) {
      // If maintenance expires, stop tracking it.
      stopMaintenance(dn);
      toRemove.add(dn);
      continue;
    }
    if (dn.isInMaintenance()) {
      // The dn is IN_MAINTENANCE and the maintenance hasn't expired yet.
      continue;   >>> This line
    }
{code}

With only one sample, we cannot figure out for sure if it is somehow stuck in 
an infinite loop, but I suspect it is.

The problem is twofold:

1) When we call stopMaintenance(dn), which we must have done as it logged the 
"Stopping maintenance of dead node", the code looks like:

{code}

   if (dn.isMaintenance() && dn.maintenanceExpired()) {
  // If maintenance 

[jira] [Created] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop

2022-05-19 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16583:


 Summary: DatanodeAdminDefaultMonitor can get stuck in an infinite 
loop
 Key: HDFS-16583
 URL: https://issues.apache.org/jira/browse/HDFS-16583
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell



We encountered a case where the decommission monitor in the namenode got stuck 
for about 6 hours. The logs give:

{code}
2022-05-15 01:09:25,490 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
maintenance of dead node 10.185.3.132:50010
2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process Thread 
Dump: jsp requested

2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753665_3428271426
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753659_3428271420
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753662_3428271423
2022-05-15 01:19:06,810 WARN 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: 
PendingReconstructionMonitor timed out blk_4501753663_3428271424
2022-05-15 06:00:57,281 INFO 
org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping 
maintenance of dead node 10.185.3.34:50010
2022-05-15 06:00:58,105 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock 
held for 17492614 ms via
java.lang.Thread.getStackTrace(Thread.java:1559)
org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263)
org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601)
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
Number of suppressed write-lock reports: 0
Longest write-lock held interval: 17492614
{code}

We only have the one thread dump triggered by the FC:

{code}
Thread 80 (DatanodeAdminMonitor-0):
  State: RUNNABLE
  Blocked count: 16
  Waited count: 453693
  Stack:

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538)

org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
{code}

This was the line of code:

{code}
private void check() {
  final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>>
      it = new CyclicIteration<>(outOfServiceNodeBlocks,
          iterkey).iterator();
  final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>();

  while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
      .isRunning()) {
    numNodesChecked++;
    final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
        entry = it.next();
    final DatanodeDescriptor dn = entry.getKey();
    AbstractList<BlockInfo> blocks = entry.getValue();
    boolean fullScan = false;
    if (dn.isMaintenance() && dn.maintenanceExpired()) {
      // If maintenance expires, stop tracking it.
      stopMaintenance(dn);
      toRemove.add(dn);
      continue;
    }
    if (dn.isInMaintenance()) {
      // The dn is IN_MAINTENANCE and the maintenance hasn't expired yet.
      continue;   >>> This line
    }
{code}

With only one sample, we cannot say for certain that it is stuck in an 
infinite loop, but I suspect it is.

The problem is twofold:

1) When we call stopMaintenance(dn), which we 

[jira] [Commented] (HDFS-16093) DataNodes under decommission will still be returned to the client via getLocatedBlocks, so the client may request decommissioning datanodes to read which will cause bad

2022-04-27 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529026#comment-17529026
 ] 

Stephen O'Donnell commented on HDFS-16093:
--

I think HDFS-16076 addressed the sorting, to put the out of service nodes last 
in the list.
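
As a rough illustration of that ordering, here is a minimal, self-contained 
sketch. The class and field names below are invented for illustration and are 
not the actual HDFS block-location classes:

{code:java}
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch: order block locations so that in-service replicas come first and
// out-of-service (decommissioning/decommissioned) replicas come last.
public class SortOutOfServiceLast {

  enum AdminState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED }

  static class Location {
    final String host;
    final AdminState state;
    Location(String host, AdminState state) {
      this.host = host;
      this.state = state;
    }
  }

  // In-service replicas sort before any out-of-service replica.
  static final Comparator<Location> OUT_OF_SERVICE_LAST =
      Comparator.comparingInt(l -> l.state == AdminState.IN_SERVICE ? 0 : 1);

  public static void main(String[] args) {
    List<Location> locations = Arrays.asList(
        new Location("dn1", AdminState.DECOMMISSIONING),
        new Location("dn2", AdminState.IN_SERVICE),
        new Location("dn3", AdminState.DECOMMISSIONED));
    locations.sort(OUT_OF_SERVICE_LAST);
    // dn2 (in service) is now first, so clients prefer it for reads.
    locations.forEach(l -> System.out.println(l.host + " " + l.state));
  }
}
{code}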

> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may request decommissioning datanodes to read 
> which will cause bad competition on disk IO.
> --
>
> Key: HDFS-16093
> URL: https://issues.apache.org/jira/browse/HDFS-16093
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
>
> DataNodes under decommission will still be returned to the client via 
> getLocatedBlocks, so the client may request decommissioning datanodes to read 
> which will cause bad competition on disk IO.
> Therefore, datanodes under decommission should be removed from the return 
> list of the getLocatedBlocks API.
> !image-2021-06-29-10-50-44-739.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16531:
-
Fix Version/s: (was: 3.4.0)
   (was: 3.2.4)
   (was: 3.3.4)

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3 - ie leaving it unchanged.
> Even in a case like this, we log an edit record, an audit log, and perform 
> some quota checks etc.
> I believe it should be possible to avoid some of the work if we check for 
> oldRep == newRep and jump out of the method early.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16531.
--
Resolution: Abandoned

Reverted this change down the branches. Sorry for causing the issue, and thanks 
to those who jumped in with suggestions to fix it. It was intended to be a 
simple optimisation, but it's proving too risky to be worth it!

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3 - ie leaving it unchanged.
> Even in a case like this, we log an edit record, an audit log, and perform 
> some quota checks etc.
> I believe it should be possible to avoid some of the work if we check for 
> oldRep == newRep and jump out of the method early.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-20 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525235#comment-17525235
 ] 

Stephen O'Donnell commented on HDFS-16531:
--

I worry about what else might break that is not covered by a test. This was 
intended to be a simple optimisation - perhaps it's not worth the risk. I'm 
going to revert it for now.

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3 - ie leaving it unchanged.
> Even in a case like this, we log an edit record, an audit log, and perform 
> some quota checks etc.
> I believe it should be possible to avoid some of the work if we check for 
> oldRep == newRep and jump out of the method early.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2

2022-04-20 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525133#comment-17525133
 ] 

Stephen O'Donnell commented on HDFS-16548:
--

Hmm, do you see any way to fix this easily, or should we just revert 
HDFS-16531?

> Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
> ---
>
> Key: HDFS-16548
> URL: https://issues.apache.org/jira/browse/HDFS-16548
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: tomscut
>Priority: Major
>
> It seems to be related to HDFS-16531.
> {code:java}
> [ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 
> 143.701 s <<< FAILURE! - in 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
> [ERROR] 
> testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots)
>   Time elapsed: 6.606 s  <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<1>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at org.junit.Assert.assertEquals(Assert.java:633)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-19 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16531:
-
Fix Version/s: 3.2.4
   3.3.4

> Avoid setReplication logging an edit record if old replication equals the new 
> value
> ---
>
> Key: HDFS-16531
> URL: https://issues.apache.org/jira/browse/HDFS-16531
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I recently came across a NN log where about 800k setRep calls were made, 
> setting the replication from 3 to 3 - ie leaving it unchanged.
> Even in a case like this, we log an edit record, an audit log, and perform 
> some quota checks etc.
> I believe it should be possible to avoid some of the work if we check for 
> oldRep == newRep and jump out of the method early.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16530) setReplication debug log creates a new string even if debug is disabled

2022-04-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16530.
--
Resolution: Fixed

> setReplication debug log creates a new string even if debug is disabled
> ---
>
> Key: HDFS-16530
> URL: https://issues.apache.org/jira/browse/HDFS-16530
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In FSDirAttrOp, HDFS-14521 made a good change to move a noisy logger to debug:
> {code}
>   if (oldBR > targetReplication) {
> FSDirectory.LOG.debug("Decreasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else if (oldBR < targetReplication) {
> FSDirectory.LOG.debug("Increasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else {
> FSDirectory.LOG.debug("Replication remains unchanged at {} for {}",
>  oldBR, iip.getPath());
>   }
> }
> {code}
> However the `iip.getPath()` method must be evaluated to pass the resulting 
> string into the LOG.debug method, even if debug is not enabled:
> This code may form a new string where it does not need to:
> {code}
>   public String getPath() {
> if (pathname == null) {
>   pathname = DFSUtil.byteArray2PathString(path);
> }
> return pathname;
>   }
> {code}
> We should wrap the entire logging block in `if (LOG.isDebugEnabled())` to avoid 
> any overhead when the logger is not enabled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16530) setReplication debug log creates a new string even if debug is disabled

2022-04-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16530:
-
Fix Version/s: 3.2.4

> setReplication debug log creates a new string even if debug is disabled
> ---
>
> Key: HDFS-16530
> URL: https://issues.apache.org/jira/browse/HDFS-16530
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In FSDirAttrOp, HDFS-14521 made a good change to move a noisy logger to debug:
> {code}
>   if (oldBR > targetReplication) {
> FSDirectory.LOG.debug("Decreasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else if (oldBR < targetReplication) {
> FSDirectory.LOG.debug("Increasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else {
> FSDirectory.LOG.debug("Replication remains unchanged at {} for {}",
>  oldBR, iip.getPath());
>   }
> }
> {code}
> However the `iip.getPath()` method must be evaluated to pass the resulting 
> string into the LOG.debug method, even if debug is not enabled:
> This code may form a new string where it does not need to:
> {code}
>   public String getPath() {
> if (pathname == null) {
>   pathname = DFSUtil.byteArray2PathString(path);
> }
> return pathname;
>   }
> {code}
> We should wrap the entire logging block in `if (LOG.isDebugEnabled())` to avoid 
> any overhead when the logger is not enabled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16530) setReplication debug log creates a new string even if debug is disabled

2022-04-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16530:
-
Fix Version/s: 3.4.0
   3.3.3

> setReplication debug log creates a new string even if debug is disabled
> ---
>
> Key: HDFS-16530
> URL: https://issues.apache.org/jira/browse/HDFS-16530
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In FSDirAttrOp, HDFS-14521 made a good change to move a noisy logger to debug:
> {code}
>   if (oldBR > targetReplication) {
> FSDirectory.LOG.debug("Decreasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else if (oldBR < targetReplication) {
> FSDirectory.LOG.debug("Increasing replication from {} to {} for {}",
>  oldBR, targetReplication, iip.getPath());
>   } else {
> FSDirectory.LOG.debug("Replication remains unchanged at {} for {}",
>  oldBR, iip.getPath());
>   }
> }
> {code}
> However the `iip.getPath()` method must be evaluated to pass the resulting 
> string into the LOG.debug method, even if debug is not enabled:
> This code may form a new string where it does not need to:
> {code}
>   public String getPath() {
> if (pathname == null) {
>   pathname = DFSUtil.byteArray2PathString(path);
> }
> return pathname;
>   }
> {code}
> We should wrap the entire logging block in `if (LOG.isDebugEnabled())` to avoid 
> any overhead when the logger is not enabled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value

2022-04-05 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16531:


 Summary: Avoid setReplication logging an edit record if old 
replication equals the new value
 Key: HDFS-16531
 URL: https://issues.apache.org/jira/browse/HDFS-16531
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


I recently came across a NN log where about 800k setRep calls were made, 
setting the replication from 3 to 3 - ie leaving it unchanged.

Even in a case like this, we log an edit record, an audit log, and perform some 
quota checks etc.

I believe it should be possible to avoid some of the work if we check for 
oldRep == newRep and jump out of the method early.
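
A minimal sketch of that early exit, using hypothetical names rather than the 
real FSDirAttrOp/FSNamesystem code:

{code:java}
// Sketch of the proposed short-circuit: if the requested factor equals the
// current one, skip the edit record, quota checks and the rest of the work.
// Names are illustrative only.
public class SetReplicationShortCircuit {

  /** Returns true only if the replication factor actually changed. */
  static boolean setReplication(short oldRep, short newRep) {
    if (oldRep == newRep) {
      return false;  // nothing to do: no edit record, no quota update
    }
    // ... existing path: verify quota, update the INode, log the edit record
    return true;
  }

  public static void main(String[] args) {
    System.out.println(setReplication((short) 3, (short) 3)); // false - skipped
    System.out.println(setReplication((short) 3, (short) 2)); // true - applied
  }
}
{code}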



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16530) setReplication debug log creates a new string even if debug is disabled

2022-04-05 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16530:


 Summary: setReplication debug log creates a new string even if 
debug is disabled
 Key: HDFS-16530
 URL: https://issues.apache.org/jira/browse/HDFS-16530
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


In FSDirAttrOp, HDFS-14521 made a good change to move a noisy logger to debug:

{code}
  if (oldBR > targetReplication) {
FSDirectory.LOG.debug("Decreasing replication from {} to {} for {}",
 oldBR, targetReplication, iip.getPath());
  } else if (oldBR < targetReplication) {
FSDirectory.LOG.debug("Increasing replication from {} to {} for {}",
 oldBR, targetReplication, iip.getPath());
  } else {
FSDirectory.LOG.debug("Replication remains unchanged at {} for {}",
 oldBR, iip.getPath());
  }
}
{code}

However the `iip.getPath()` method must be evaluated to pass the resulting 
string into the LOG.debug method, even if debug is not enabled:

This code may form a new string where it does not need to:

{code}
  public String getPath() {
if (pathname == null) {
  pathname = DFSUtil.byteArray2PathString(path);
}
return pathname;
  }
{code}

We should wrap the entire logging block in `if (LOG.isDebugEnabled())` to avoid any 
overhead when the logger is not enabled.
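
For illustration, a minimal sketch of the guarded form, assuming an SLF4J 
logger (slf4j-api on the classpath); the Supplier below stands in for the real 
INodesInPath, so the path string is only built when debug logging is enabled:

{code:java}
import java.util.function.Supplier;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuardedDebugLogging {
  private static final Logger LOG = LoggerFactory.getLogger("FSDirectory");

  static void logReplicationChange(short oldBR, short targetReplication,
      Supplier<String> path) {
    // Guard the whole block: path.get() is never called unless debug is on.
    if (LOG.isDebugEnabled()) {
      if (oldBR > targetReplication) {
        LOG.debug("Decreasing replication from {} to {} for {}",
            oldBR, targetReplication, path.get());
      } else if (oldBR < targetReplication) {
        LOG.debug("Increasing replication from {} to {} for {}",
            oldBR, targetReplication, path.get());
      } else {
        LOG.debug("Replication remains unchanged at {} for {}",
            oldBR, path.get());
      }
    }
  }
}
{code}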



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16496) Snapshot diff on snapshotable directory fails with not snapshottable error

2022-03-08 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16496.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

> Snapshot diff on snapshotable directory fails with not snapshottable error
> --
>
> Key: HDFS-16496
> URL: https://issues.apache.org/jira/browse/HDFS-16496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Running a snapshot diff against some snapshottable folders gives an error:
> {code}
> org.apache.hadoop.hdfs.protocol.SnapshotException: Directory is neither 
> snapshottable nor under a snap root!
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkAndGetSnapshottableAncestorDir(SnapshotManager.java:395)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:744)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:200)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6983)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1977)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> {code}
> This is caused by HDFS-15483 (in order snapshot delete), and the issue is in 
> the following method in SnapshotManager:
> {code}
>   public INodeDirectory getSnapshottableAncestorDir(final INodesInPath iip)
>   throws IOException {
> final String path = iip.getPath();
> final INode inode = iip.getLastINode();
> final INodeDirectory dir;
> if (inode instanceof INodeDirectory) { // THIS SHOULD BE TRUE - change to 
> inode.isDirectory()
>   dir = INodeDirectory.valueOf(inode, path);
> } else {
>   dir = INodeDirectory.valueOf(iip.getINode(-2), iip.getParentPath());
> }
> if (dir.isSnapshottable()) {
>   return dir;
> }
> for (INodeDirectory snapRoot : this.snapshottables.values()) {
>   if (dir.isAncestorDirectory(snapRoot)) {
> return snapRoot;
>   }
> }
> return null;
>   }
> {code}
> After adding some debug, I found the directory which is the snapshot root is 
> not an instance of INodeDirectory, but instead is an 
> "INodeReference$DstReference". I think the directory becomes an instance of 
> this class, if the directory is renamed and one of its children has been 
> moved out of another snapshot.
> The fix is simple - just check `inode.isDirectory()` instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16496) Snapshot diff on snapshotable directory fails with not snapshottable error

2022-03-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16496:
-
Component/s: namanode

> Snapshot diff on snapshotable directory fails with not snapshottable error
> --
>
> Key: HDFS-16496
> URL: https://issues.apache.org/jira/browse/HDFS-16496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> Running a snapshot diff against some snapshottable folders gives an error:
> {code}
> org.apache.hadoop.hdfs.protocol.SnapshotException: Directory is neither 
> snapshottable nor under a snap root!
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkAndGetSnapshottableAncestorDir(SnapshotManager.java:395)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:744)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:200)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6983)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1977)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> {code}
> This is caused by HDFS-15483 (in order snapshot delete), and the issue is in 
> the following method in SnapshotManager:
> {code}
>   public INodeDirectory getSnapshottableAncestorDir(final INodesInPath iip)
>   throws IOException {
> final String path = iip.getPath();
> final INode inode = iip.getLastINode();
> final INodeDirectory dir;
> if (inode instanceof INodeDirectory) { // THIS SHOULD BE TRUE - change to 
> inode.isDirectory()
>   dir = INodeDirectory.valueOf(inode, path);
> } else {
>   dir = INodeDirectory.valueOf(iip.getINode(-2), iip.getParentPath());
> }
> if (dir.isSnapshottable()) {
>   return dir;
> }
> for (INodeDirectory snapRoot : this.snapshottables.values()) {
>   if (dir.isAncestorDirectory(snapRoot)) {
> return snapRoot;
>   }
> }
> return null;
>   }
> {code}
> After adding some debug, I found the directory which is the snapshot root is 
> not an instance of INodeDirectory, but instead is an 
> "INodeReference$DstReference". I think the directory becomes an instance of 
> this class, if the directory is renamed and one of its children has been 
> moved out of another snapshot.
> The fix is simple - just check `inode.isDirectory()` instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16496) Snapshot diff on snapshotable directory fails with not snapshottable error

2022-03-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16496:
-
Affects Version/s: 3.4.0

> Snapshot diff on snapshotable directory fails with not snapshottable error
> --
>
> Key: HDFS-16496
> URL: https://issues.apache.org/jira/browse/HDFS-16496
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> Running a snapshot diff against some snapshottable folders gives an error:
> {code}
> org.apache.hadoop.hdfs.protocol.SnapshotException: Directory is neither 
> snapshottable nor under a snap root!
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkAndGetSnapshottableAncestorDir(SnapshotManager.java:395)
>   at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:744)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:200)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6983)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1977)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1387)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
> {code}
> This is caused by HDFS-15483 (in order snapshot delete), and the issue is in 
> the following method in SnapshotManager:
> {code}
>   public INodeDirectory getSnapshottableAncestorDir(final INodesInPath iip)
>   throws IOException {
> final String path = iip.getPath();
> final INode inode = iip.getLastINode();
> final INodeDirectory dir;
> if (inode instanceof INodeDirectory) { // THIS SHOULD BE TRUE - change to 
> inode.isDirectory()
>   dir = INodeDirectory.valueOf(inode, path);
> } else {
>   dir = INodeDirectory.valueOf(iip.getINode(-2), iip.getParentPath());
> }
> if (dir.isSnapshottable()) {
>   return dir;
> }
> for (INodeDirectory snapRoot : this.snapshottables.values()) {
>   if (dir.isAncestorDirectory(snapRoot)) {
> return snapRoot;
>   }
> }
> return null;
>   }
> {code}
> After adding some debug, I found the directory which is the snapshot root is 
> not an instance of INodeDirectory, but instead is an 
> "INodeReference$DstReference". I think the directory becomes an instance of 
> this class, if the directory is renamed and one of its children has been 
> moved out of another snapshot.
> The fix is simple - just check `inode.isDirectory()` instead.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16496) Snapshot diff on snapshotable directory fails with not snapshottable error

2022-03-06 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16496:


 Summary: Snapshot diff on snapshotable directory fails with not 
snapshottable error
 Key: HDFS-16496
 URL: https://issues.apache.org/jira/browse/HDFS-16496
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


Running a snapshot diff against some snapshottable folders gives an error:

{code}
org.apache.hadoop.hdfs.protocol.SnapshotException: Directory is neither 
snapshottable nor under a snap root!
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.checkAndGetSnapshottableAncestorDir(SnapshotManager.java:395)
at 
org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.diff(SnapshotManager.java:744)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirSnapshotOp.getSnapshotDiffReportListing(FSDirSnapshotOp.java:200)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getSnapshotDiffReportListing(FSNamesystem.java:6983)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getSnapshotDiffReportListing(NameNodeRpcServer.java:1977)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getSnapshotDiffReportListing(ClientNamenodeProtocolServerSideTranslatorPB.java:1387)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
{code}

This is caused by HDFS-15483 (in order snapshot delete), and the issue is in 
the following method in SnapshotManager:

{code}
  public INodeDirectory getSnapshottableAncestorDir(final INodesInPath iip)
  throws IOException {
final String path = iip.getPath();
final INode inode = iip.getLastINode();
final INodeDirectory dir;
if (inode instanceof INodeDirectory) { // THIS SHOULD BE TRUE - change to 
inode.isDirectory()
  dir = INodeDirectory.valueOf(inode, path);
} else {
  dir = INodeDirectory.valueOf(iip.getINode(-2), iip.getParentPath());
}
if (dir.isSnapshottable()) {
  return dir;
}
for (INodeDirectory snapRoot : this.snapshottables.values()) {
  if (dir.isAncestorDirectory(snapRoot)) {
return snapRoot;
  }
}
return null;
  }
{code}

After adding some debug, I found the directory which is the snapshot root is 
not an instance of INodeDirectory, but instead is an 
"INodeReference$DstReference". I think the directory becomes an instance of 
this class, if the directory is renamed and one of its children has been moved 
out of another snapshot.

The fix is simple - just check `inode.isDirectory()` instead.
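
As a self-contained analogue of why that works (illustrative classes only, not 
the real INode hierarchy): a reference/wrapper object is not an instanceof the 
concrete directory class, but a virtual isDirectory()-style check still answers 
correctly.

{code:java}
public class InstanceofVsIsDirectory {

  abstract static class Node {
    boolean isDirectory() { return false; }
  }

  static class Directory extends Node {
    @Override boolean isDirectory() { return true; }
  }

  /** Stand-in for something like INodeReference$DstReference. */
  static class Reference extends Node {
    private final Node referred;
    Reference(Node referred) { this.referred = referred; }
    @Override boolean isDirectory() { return referred.isDirectory(); }
  }

  public static void main(String[] args) {
    Node snapRoot = new Reference(new Directory());
    // The old check fails for the reference wrapper, taking the wrong branch.
    System.out.println(snapRoot instanceof Directory);  // false
    // Delegating via isDirectory() identifies the directory correctly.
    System.out.println(snapRoot.isDirectory());         // true
  }
}
{code}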



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14626) Decommission all nodes hosting last block of open file succeeds unexpectedly

2022-02-22 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496014#comment-17496014
 ] 

Stephen O'Donnell commented on HDFS-14626:
--

[~aajisaka] I have not looked into this issue for years now. It does seem like 
something which should be fixed, but I have no plans to work on it. Feel free 
to take it over if you want to try fixing it.

> Decommission all nodes hosting last block of open file succeeds unexpectedly 
> -
>
> Key: HDFS-14626
> URL: https://issues.apache.org/jira/browse/HDFS-14626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: test-to-reproduce.patch
>
>
> I have been investigating scenarios that cause decommission to hang, 
> especially around one long standing issue. That is, an open block on the host 
> which is being decommissioned can cause the process to never complete.
> Checking the history, there seems to have been at least one change in 
> HDFS-5579 which greatly improved the situation, but from reading comments and 
> support cases, there still seems to be some scenarios where open blocks on a 
> DN host cause the decommission to get stuck.
> No matter what I try, I have not been able to reproduce this, but I think I 
> have uncovered another issue that may partly explain why.
> If I do the following, the nodes will decommission without any issues:
> 1. Create a file and write to it so it crosses a block boundary. Then there 
> is one complete block and one under construction block. Keep the file open, 
> and write a few bytes periodically.
> 2. Now note the nodes which the UC block is currently being written on, and 
> decommission them all.
> 3. The decommission should succeed.
> 4. Now attempt to close the open file, and it will fail to close with an 
> error like below, probably as decommissioned nodes are not allowed to send 
> IBRs:
> {code:java}
> java.io.IOException: Unable to close file because the last block 
> BP-646926902-192.168.0.20-1562099323291:blk_1073741827_1003 does not have 
> enough number of replicas.
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:968)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:911)
>     at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:894)
>     at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:849)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>     at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101){code}
> Interestingly, if you recommission the nodes without restarting them before 
> closing the file, it will close OK, and writes to it can continue even once 
> decommission has completed.
> I don't think this is expected - ie decommission should not complete on all 
> nodes hosting the last UC block of a file?
> From what I have figured out, I don't think UC blocks are considered in the 
> DatanodeAdminManager at all. This is because the original list of blocks it 
> cares about, are taken from the Datanode block Iterator, which takes them 
> from the DatanodeStorageInfo objects attached to the datanode instance. I 
> believe UC blocks don't make it into the DatanodeStoreageInfo until after 
> they have been completed and an IBR sent, so the decommission logic never 
> considers them.
> What troubles me about this explanation, is how did open files previously 
> cause decommission to get stuck if it never checks for them, so I suspect I 
> am missing something.
> I will attach a patch with a test case that demonstrates this issue. This 
> reproduces on trunk and I also tested on CDH 5.8.1, which is based on the 2.6 
> branch, but with a lot of backports.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16423) balancer should not get blocks on stale storages

2022-01-26 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17482324#comment-17482324
 ] 

Stephen O'Donnell commented on HDFS-16423:
--

I have a question on this Jira.

As I understand it, the namenode marks all storages stale after a failover. The 
only way a storage is marked as "not stale" again is when an FBR is sent from 
the datanode. This FBR interval is 6 hours by default, but some clusters set it 
higher. I don't think there is any mechanism to trigger the FBR early due to 
the failover.

If we tell the balancer to not pick blocks from stale storages, then after a 
failover the balancer will effectively not work at all for up to the FBR 
interval, as there will be no storages for it to pick blocks from. Is that 
correct?

I wonder if we should log a message in the NN indicating that all storages are 
stale, to help people understand why the balancer is not working. Most likely 
the balancer will try 5 times to get some blocks to move and then give up with 
a "failed to move any blocks in 5 iterations" message.

> balancer should not get blocks on stale storages
> 
>
> Key: HDFS-16423
> URL: https://issues.apache.org/jira/browse/HDFS-16423
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Reporter: qinyuren
>Assignee: qinyuren
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
> Attachments: image-2022-01-13-17-18-32-409.png
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We have met a problem like the one described in HDFS-16420.
> We found that the balancer copied a block multiple times without deleting the 
> source block if the block was placed in a stale storage, resulting in a block 
> with many copies; these redundant copies are not deleted until the storage is 
> no longer stale.
>  
> !image-2022-01-13-17-18-32-409.png|width=657,height=275!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16438) Avoid holding read locks for a long time when scanDatanodeStorage

2022-01-25 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17481923#comment-17481923
 ] 

Stephen O'Donnell commented on HDFS-16438:
--

[~weichiu] Thanks for pinging me on this. I will check the PR.

[~tomscut] It is interesting to me that you are using the 
DatanodeAdminBackoffMonitor - are you finding it to be better, the same or 
worse than the default monitor in general? I developed it some time ago, but 
have not seen any real world use of it, so I am interested in how you find it 
working.

> Avoid holding read locks for a long time when scanDatanodeStorage
> -
>
> Key: HDFS-16438
> URL: https://issues.apache.org/jira/browse/HDFS-16438
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-01-25-23-18-30-275.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> At the time of decommission, if using {*}DatanodeAdminBackoffMonitor{*}, there 
> is a heavy operation: {*}scanDatanodeStorage{*}. If the number of blocks on a 
> storage is large (more than 5 million) and GC performance is also poor, it 
> may hold the *read lock* for a long time, so we should optimize it.
> !image-2022-01-25-23-18-30-275.png|width=764,height=193!
> {code:java}
> 2021-12-22 07:49:01,279 INFO  namenode.FSNamesystem 
> (FSNamesystemLock.java:readUnlock(220)) - FSNamesystem scanDatanodeStorage 
> read lock held for 5491 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:222)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1641)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.scanDatanodeStorage(DatanodeAdminBackoffMonitor.java:646)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.checkForCompletedNodes(DatanodeAdminBackoffMonitor.java:417)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.check(DatanodeAdminBackoffMonitor.java:300)
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminBackoffMonitor.run(DatanodeAdminBackoffMonitor.java:201)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
>     Number of suppressed read-lock reports: 0
>     Longest read-lock held interval: 5491 {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16262.
--
Resolution: Fixed

I have committed this to trunk and branch-3.3. There are conflicts trying to 
take it to branch 3.2. If you want it on branch-3.2, please create another PR 
(we can re-use this Jira) against branch-3.2 so we get the CI checks to run.

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16262) Async refresh of cached locations in DFSInputStream

2022-01-25 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16262:
-
Fix Version/s: 3.4.0
   3.3.3

> Async refresh of cached locations in DFSInputStream
> ---
>
> Key: HDFS-16262
> URL: https://issues.apache.org/jira/browse/HDFS-16262
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Assignee: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.3
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> HDFS-15119 added the ability to invalidate cached block locations in 
> DFSInputStream. As written, the feature will affect all DFSInputStreams 
> regardless of whether they need it or not. The invalidation also only applies 
> on the next request, so the next request will pay the cost of calling 
> openInfo before reading the data.
> I'm working on a feature for HBase which enables efficient healing of 
> locality through Balancer-style low level block moves (HBASE-26250). I'd like 
> to utilize the idea started in HDFS-15119 in order to update DFSInputStreams 
> after blocks have been moved to local hosts.
> I was considering using the feature as is, but some of our clusters are quite 
> large and I'm concerned about the impact on the namenode:
>  * We have some clusters with over 350k StoreFiles, so that'd be 350k 
> DFSInputStreams. With such a large number and very active usage, having the 
> refresh be in-line makes it too hard to ensure we don't DDOS the NameNode.
>  * Currently we need to pay the price of openInfo the next time a 
> DFSInputStream is invoked. Moving that async would minimize the latency hit. 
> Also, some StoreFiles might be far less frequently accessed, so they may live 
> on for a long time before ever refreshing. We'd like to be able to know that 
> all DFSInputStreams are refreshed by a given time.
>  * We may have 350k files, but only a small percentage of them are ever 
> non-local at a given time. Refreshing only if necessary will save a lot of 
> work.
> In order to make this as painless to end users as possible, I'd like to:
>  * Update the implementation to utilize an async thread for managing 
> refreshes. This will give more control over rate limiting across all 
> DFSInputStreams in a DFSClient, and also ensure that all DFSInputStreams are 
> refreshed.
>  * Only refresh files which are lacking a local replica or have known 
> deadNodes to be cleaned up
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16420) ec + balancer may cause missing block

2022-01-10 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471862#comment-17471862
 ] 

Stephen O'Donnell commented on HDFS-16420:
--

Since release 3.1.0, I think there have been quite a few EC fixes, including 
fixes for data loss issues. Are you running base 3.1.0 with no other patches, 
or have you backported changes into your build?

> ec + balancer may cause missing block
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: qinyuren
>Priority: Major
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>
> We have a problem similar to the one described in HDFS-16297. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and the {color:#de350b}missing block{color} happened. 
> We got the block(blk_-9223372036824119008) info from fsck, only 5 live 
> replications and multiple redundant replications. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the log from all datanode, and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 is as follows:
> ||internal block||delete num||
> |blk_-9223372036824119008|1|
> |blk_-9223372036824119006|1|
> |blk_-9223372036824119005|1|
> |blk_-9223372036824119004|50|
> |blk_-9223372036824119003|1|
> |blk_-9223372036824119000|1|
>  
> {color:#ff}During 08:15 to 08:17, we restarted 2 datanode and triggered 
> full block report immediately.{color}
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why delete the internal block with only one copy?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanode to 168 hours.
> 2. We have done a namenode HA operation.
> 3. After the namenode HA, the state of the storage became 
> {color:#ff}stale{color}, and the state did not change until the next full 
> block report.
> 4. The balancer copied the replica without deleting the replica from the 
> source node, because the source node had the stale storage, and the request 
> was put into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. The balancer continued to copy the replica, eventually resulting in 
> multiple copies of the replica.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of {color:#ff}rescannedMisreplicatedBlocks{color} has many 
> blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16408.
--
Resolution: Fixed

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>   Original Estimate: 1h
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch(Throwable e)', and a warning message will be printed. 
> This behavior is repeated on each subsequent loop iteration. This means that a 
> huge number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178 MB of log data was generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try/catch statement in the 
> LeaseMonitor daemon by moving 'catch(Throwable e)' to the outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program.
>  
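As an illustration of the second suggestion quoted above, a minimal sketch of the
kind of range check that could be applied when the interval is read. The wrapper
method and the 2000 ms default shown here are assumptions for the example; only
the configuration key and the Preconditions.checkArgument() idea come from the
description above.

{code:java}
// Minimal sketch, not the committed patch: validate the recheck interval when
// it is read, so a negative value fails fast instead of being handed to
// Thread.sleep() inside the LeaseMonitor loop.
import com.google.common.base.Preconditions;
import org.apache.hadoop.conf.Configuration;

public class LeaseRecheckIntervalCheck {
  // Hypothetical helper; the real read happens in FSNamesystem.
  static long readLeaseRecheckIntervalMs(Configuration conf) {
    long interval = conf.getLong(
        "dfs.namenode.lease-recheck-interval-ms", 2000L); // default value assumed
    Preconditions.checkArgument(interval > 0,
        "dfs.namenode.lease-recheck-interval-ms must be greater than zero, but was %s",
        interval);
    return interval;
  }
}
{code}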



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16408:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>   Original Estimate: 1h
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch(Throwable e)', and a warning message will be printed. 
> This behavior is repeated on each subsequent loop iteration. This means that a 
> huge number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178 MB of log data was generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try/catch statement in the 
> LeaseMonitor daemon by moving 'catch(Throwable e)' to the outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-16408:


Assignee: Jingxuan Fu  (was: Stephen O'Donnell)

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Assignee: Jingxuan Fu
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch(Throwable e)', and a warning message will be printed. 
> This behavior is repeated on each subsequent loop iteration. This means that a 
> huge number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178 MB of log data was generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try/catch statement in the 
> LeaseMonitor daemon by moving 'catch(Throwable e)' to the outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-16408:


Assignee: Stephen O'Donnell

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch(Throwable e)', and a warning message will be printed. 
> This behavior is repeated on each subsequent loop iteration. This means that a 
> huge number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178 MB of log data was generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try/catch statement in the 
> LeaseMonitor daemon by moving 'catch(Throwable e)' to the outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16408) Ensure LeaseRecheckIntervalMs is greater than zero

2022-01-05 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16408:
-
Summary: Ensure LeaseRecheckIntervalMs is greater than zero  (was: Negative 
LeaseRecheckIntervalMs will let LeaseMonitor loop forever and print huge amount 
of log)

> Ensure LeaseRecheckIntervalMs is greater than zero
> --
>
> Key: HDFS-16408
> URL: https://issues.apache.org/jira/browse/HDFS-16408
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.3, 3.3.1
>Reporter: Jingxuan Fu
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 1h
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> There is a problem with the try/catch statement in the LeaseMonitor daemon 
> (in LeaseManager.java): when an unknown exception is caught, it simply prints 
> a warning message and continues with the next loop. 
> An extreme case is when the configuration item 
> 'dfs.namenode.lease-recheck-interval-ms' is accidentally set to a negative 
> number by the user. As the configuration item is read without checking its 
> range, 'FSNamesystem.getLeaseRecheckIntervalMs()' returns this value and it is 
> used as an argument to Thread.sleep(). A negative argument will cause 
> Thread.sleep() to throw an IllegalArgumentException, which will be caught by 
> 'catch(Throwable e)', and a warning message will be printed. 
> This behavior is repeated on each subsequent loop iteration. This means that a 
> huge number of repetitive messages will be printed to the log file in a short 
> period of time, quickly consuming disk space and affecting the operation of 
> the system.
> As you can see, 178 MB of log data was generated in one minute.
>  
> {code:java}
> ll logs/
> total 174456
> drwxrwxr-x  2 hadoop hadoop      4096 1月   3 15:13 ./
> drwxr-xr-x 11 hadoop hadoop      4096 1月   3 15:13 ../
> -rw-rw-r--  1 hadoop hadoop     36342 1月   3 15:14 
> hadoop-hadoop-datanode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      1243 1月   3 15:13 
> hadoop-hadoop-datanode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop 178545466 1月   3 15:14 
> hadoop-hadoop-namenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop       692 1月   3 15:13 
> hadoop-hadoop-namenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop     33201 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.log
> -rw-rw-r--  1 hadoop hadoop      3764 1月   3 15:14 
> hadoop-hadoop-secondarynamenode-ljq1.out
> -rw-rw-r--  1 hadoop hadoop         0 1月   3 15:13 SecurityAuth-hadoop.audit
>  
> tail -n 15 logs/hadoop-hadoop-namenode-ljq1.log 
> 2022-01-03 15:14:46,032 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> 2022-01-03 15:14:46,033 WARN 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager: Unexpected throwable: 
> java.lang.IllegalArgumentException: timeout value is negative
>         at java.base/java.lang.Thread.sleep(Native Method)
>         at 
> org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:534)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
>  
> I think there are two potential solutions. 
> The first is to adjust the position of the try/catch statement in the 
> LeaseMonitor daemon by moving 'catch(Throwable e)' to the outside of the loop 
> body. This can be done like the NameNodeResourceMonitor daemon, which ends 
> the thread when an unexpected exception is caught. 
> The second is to use Preconditions.checkArgument() to validate the range of 
> the configuration item 'dfs.namenode.lease-recheck-interval-ms' when it is 
> read, so that a wrong configuration value cannot affect the subsequent 
> operation of the program.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-21 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16391.
--
Resolution: Fixed

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
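The description is empty, but the title points at a familiar logging pattern. A
small illustrative sketch follows (not the actual HDFS-16391 patch; the class,
method and variable names are invented for the example):

{code:java}
// Two common ways to avoid paying for a debug message when debug logging is off.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DebugLogExample {
  private static final Logger LOG = LoggerFactory.getLogger(DebugLogExample.class);

  void report(String nameserviceId) {
    // Parameterized logging avoids string concatenation when debug is disabled.
    LOG.debug("Received heartbeat for nameservice {}", nameserviceId);

    // An explicit guard is still needed when building the argument is expensive.
    if (LOG.isDebugEnabled()) {
      LOG.debug("Namenode report: {}", buildExpensiveReport());
    }
  }

  private String buildExpensiveReport() {
    return "report"; // placeholder for an expensive computation
  }
}
{code}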




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-21 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-16391:


Assignee: wangzhaohui

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16391) Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService

2021-12-21 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16391:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3

> Avoid evaluation of LOG.debug statement in NameNodeHeartbeatService
> ---
>
> Key: HDFS-16391
> URL: https://issues.apache.org/jira/browse/HDFS-16391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-21 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16386:
-
Fix Version/s: 3.2.3

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Our DataNode nodes have 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each volume allows 4 
> threads to work, this causes some trouble for the DataNode, such as increased 
> CPU and memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.
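For context, a rough sketch of the effect being described: with 36 volumes and up
to 4 deletion threads per volume, the node can run well over a hundred deletion
threads, and capping the pool bounds that cost. The numbers come from the
description; the structure below is an assumption for illustration, not the real
FsDatasetAsyncDiskService code.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedDeletionPool {
  public static void main(String[] args) {
    int volumes = 36;
    int threadsPerVolume = 4;                  // figure quoted in the description
    int uncapped = volumes * threadsPerVolume; // 144 threads in total
    int capped = Math.min(uncapped, 32);       // example cap; the real limit is a design choice

    System.out.println("Uncapped threads: " + uncapped + ", capped: " + capped);
    ExecutorService pool = Executors.newFixedThreadPool(capped); // bounded pool
    pool.shutdown();
  }
}
{code}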



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16386.
--
Resolution: Fixed

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our DataNode nodes have 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each volume allows 4 
> threads to work, this causes some trouble for the DataNode, such as increased 
> CPU and memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16386) Reduce DataNode load when FsDatasetAsyncDiskService is working

2021-12-20 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16386:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3

> Reduce DataNode load when FsDatasetAsyncDiskService is working
> --
>
> Key: HDFS-16386
> URL: https://issues.apache.org/jira/browse/HDFS-16386
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.9.2
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
> Attachments: monitor.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Our DataNode nodes have 36 disks. When FsDatasetAsyncDiskService is working, it 
> causes a high load on the DataNode.
> Here is some monitoring related to memory:
>  !monitor.png! 
> Since each disk deletes blocks asynchronously, and each volume allows 4 
> threads to work, this causes some trouble for the DataNode, such as increased 
> CPU and memory usage.
> We should appropriately reduce the total number of threads so that 
> the DataNode can work better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15180) DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.

2021-11-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17450379#comment-17450379
 ] 

Stephen O'Donnell commented on HDFS-15180:
--

In Cloudera, we have not been looking into this issue actively, but it is an 
interesting one. We have gone ahead with HDFS-15160 in our latest release and 
so far have not seen any problems from it. Our hope is that the relatively minor 
change in HDFS-15160 can have a large benefit and is easy to disable with a 
config switch if any problems are detected. This change probably has a bigger 
impact than HDFS-15160, but is more complicated and so carries more risk.

It is good to know you have been running it with no issue for some time - that 
does help give us more confidence there are no issues with the approach.

>  DataNode FsDatasetImpl Fine-Grained Locking via BlockPool.
> ---
>
> Key: HDFS-15180
> URL: https://issues.apache.org/jira/browse/HDFS-15180
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: Qi Zhu
>Assignee: Aiphago
>Priority: Major
> Attachments: HDFS-15180.001.patch, HDFS-15180.002.patch, 
> HDFS-15180.003.patch, HDFS-15180.004.patch, 
> image-2020-03-10-17-22-57-391.png, image-2020-03-10-17-31-58-830.png, 
> image-2020-03-10-17-34-26-368.png, image-2020-04-09-11-20-36-459.png
>
>
> Now the FsDatasetImpl datasetLock is heavy when there are many namespaces in a 
> big cluster. We could split the FsDatasetImpl datasetLock via blockpool. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16343) Add some debug logs when the dfsUsed are not used during Datanode startup

2021-11-23 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16343.
--
Resolution: Fixed

> Add some debug logs when the dfsUsed are not used during Datanode startup
> -
>
> Key: HDFS-16343
> URL: https://issues.apache.org/jira/browse/HDFS-16343
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16343) Add some debug logs when the dfsUsed are not used during Datanode startup

2021-11-23 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16343:
-
Fix Version/s: 3.4.0
   3.2.4
   3.3.3

> Add some debug logs when the dfsUsed are not used during Datanode startup
> -
>
> Key: HDFS-16343
> URL: https://issues.apache.org/jira/browse/HDFS-16343
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Mukul Kumar Singh
>Assignee: Mukul Kumar Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.3
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16173) Improve CopyCommands#Put#executor queue configurability

2021-11-22 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16173:
-
Fix Version/s: 3.2.3

> Improve CopyCommands#Put#executor queue configurability
> ---
>
> Key: HDFS-16173
> URL: https://issues.apache.org/jira/browse/HDFS-16173
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: fs
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> In CopyCommands#Put, the executor queue size is a fixed value, 1024.
> We should make it configurable, because there are different usage 
> environments.
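As an illustration of what "make it configurable" typically looks like, a minimal
sketch; the configuration key name used here is hypothetical and not the one
introduced by the patch.

{code:java}
// Sketch only: read the queue capacity from configuration instead of
// hard-coding 1024. "fs.put.executor.queue.size" is an invented key name.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.hadoop.conf.Configuration;

public class PutQueueExample {
  static BlockingQueue<Runnable> createTaskQueue(Configuration conf) {
    int capacity = conf.getInt("fs.put.executor.queue.size", 1024);
    return new ArrayBlockingQueue<>(capacity);
  }
}
{code}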



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file

2021-11-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17438658#comment-17438658
 ] 

Stephen O'Donnell commented on HDFS-16286:
--

Yes, we added it already to the HDFSCommands.md in the PR:

https://github.com/apache/hadoop/pull/3593/commits/51e61547d07d9a0c236b89e5b804aaa8f362f28d

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupt, and the block 
> meta (checksum) is unable to discover the corruption in some cases such as EC 
> reconstruction; related issues are: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check whether any block group 
> of an erasure coded file has data corruption, covering conditions other than 
> EC reconstruction or cases where the HDFS-15759 feature (validation during EC 
> reconstruction) is not enabled (it is disabled by default now).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file

2021-11-03 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16286.
--
Resolution: Fixed

Committed down the active 3.x branches. Thanks for the contribution [~cndaimin].

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupt, and the block 
> meta (checksum) is unable to discover the corruption in some cases such as EC 
> reconstruction; related issues are: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check whether any block group 
> of an erasure coded file has data corruption, covering conditions other than 
> EC reconstruction or cases where the HDFS-15759 feature (validation during EC 
> reconstruction) is not enabled (it is disabled by default now).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file

2021-11-03 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16286:
-
Fix Version/s: 3.3.2
   3.2.3
   3.4.0

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Block data in an erasure coded block group may become corrupt, and the block 
> meta (checksum) is unable to discover the corruption in some cases such as EC 
> reconstruction; related issues are: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check whether any block group 
> of an erasure coded file has data corruption, covering conditions other than 
> EC reconstruction or cases where the HDFS-15759 feature (validation during EC 
> reconstruction) is not enabled (it is disabled by default now).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-11-02 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16259.
--
Resolution: Fixed

Thanks for the review [~weichiu] and the discussion on this [~ayushtkn]

> Catch and re-throw sub-classes of AccessControlException thrown by any 
> permission provider plugins (eg Ranger)
> --
>
> Key: HDFS-16259
> URL: https://issues.apache.org/jira/browse/HDFS-16259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a permission provider plugin is enabled (eg Ranger) there are some 
> scenarios where it can throw a sub-class of an AccessControlException (eg 
> RangerAccessControlException). If this exception is allowed to propagate up 
> the stack, it can give problems in the HDFS Client, when it unwraps the 
> remote exception containing the AccessControlException sub-class.
> Ideally, we should make AccessControlException final so it cannot be 
> sub-classed, but that would be a breaking change at this point. Therefore I 
> believe the safest thing to do, is to catch any AccessControlException that 
> comes out of the permission enforcer plugin, and re-throw an 
> AccessControlException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-11-02 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16259:
-
Fix Version/s: 3.3.2
   3.4.0

> Catch and re-throw sub-classes of AccessControlException thrown by any 
> permission provider plugins (eg Ranger)
> --
>
> Key: HDFS-16259
> URL: https://issues.apache.org/jira/browse/HDFS-16259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a permission provider plugin is enabled (eg Ranger) there are some 
> scenarios where it can throw a sub-class of an AccessControlException (eg 
> RangerAccessControlException). If this exception is allowed to propagate up 
> the stack, it can give problems in the HDFS Client, when it unwraps the 
> remote exception containing the AccessControlException sub-class.
> Ideally, we should make AccessControlException final so it cannot be 
> sub-classed, but that would be a breaking change at this point. Therefore I 
> believe the safest thing to do, is to catch any AccessControlException that 
> comes out of the permission enforcer plugin, and re-throw an 
> AccessControlException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-10-28 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17435315#comment-17435315
 ] 

Stephen O'Donnell commented on HDFS-16259:
--

[~ayushtkn] Thanks for the discussion on this. I will create a PR to catch the 
enforcer ACE subclass exceptions and re-throw ACE in the next day or two, which 
will solve the immediate problem. Then we can consider making incompatible 
changes on trunk later. 

> Catch and re-throw sub-classes of AccessControlException thrown by any 
> permission provider plugins (eg Ranger)
> --
>
> Key: HDFS-16259
> URL: https://issues.apache.org/jira/browse/HDFS-16259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> When a permission provider plugin is enabled (eg Ranger) there are some 
> scenarios where it can throw a sub-class of an AccessControlException (eg 
> RangerAccessControlException). If this exception is allowed to propagate up 
> the stack, it can give problems in the HDFS Client, when it unwraps the 
> remote exception containing the AccessControlException sub-class.
> Ideally, we should make AccessControlException final so it cannot be 
> sub-classed, but that would be a breaking change at this point. Therefore I 
> believe the safest thing to do, is to catch any AccessControlException that 
> comes out of the permission enforcer plugin, and re-throw an 
> AccessControlException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16286) Debug tool to verify the correctness of erasure coding on file

2021-10-27 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17434741#comment-17434741
 ] 

Stephen O'Donnell commented on HDFS-16286:
--

You may be interested in this tool which we developed to search for EC 
corruptions:

https://github.com/sodonnel/hdfs-ec-validator

I have not had time to try to bring it into Hadoop as an official tool.

> Debug tool to verify the correctness of erasure coding on file
> --
>
> Key: HDFS-16286
> URL: https://issues.apache.org/jira/browse/HDFS-16286
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, tools
>Affects Versions: 3.3.0, 3.3.1
>Reporter: daimin
>Assignee: daimin
>Priority: Minor
>  Labels: pull-request-available
>
> Block data in an erasure coded block group may become corrupt, and the block 
> meta (checksum) is unable to discover the corruption in some cases such as EC 
> reconstruction; related issues are: HDFS-14768, HDFS-15186, HDFS-15240.
> In addition to HDFS-15759, a tool is needed to check whether any block group 
> of an erasure coded file has data corruption, covering conditions other than 
> EC reconstruction or cases where the HDFS-15759 feature (validation during EC 
> reconstruction) is not enabled (it is disabled by default now).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16272) Int overflow in computing safe length during EC block recovery

2021-10-18 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16272.
--
Resolution: Fixed

Committed down the active branches. Thanks for the contribution [~cndaimin].

> Int overflow in computing safe length during EC block recovery
> --
>
> Key: HDFS-16272
> URL: https://issues.apache.org/jira/browse/HDFS-16272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.3.0, 3.3.1
> Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
> will produce a negative or zero length:
> 1. With a negative length, it fails the later >=0 check and will crash the 
> BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
> finish.
> 2. With a zero length, it passes the check and directly truncates the block 
> size to zero, leading to data loss.
> If you are using any of the default EC policies (3-2, 6-3 or 10-4) and the 
> default HDFS block size of 128MB, then you will not be impacted by this issue.
> To be impacted, the EC dataNumber * blockSize has to be larger than the Java 
> max int of 2,147,483,647.
> For example 10-4 is 10 * 134217728 = 1,342,177,280 which is OK.
> However 10-4 with 256MB blocks is 2,684,354,560 which overflows the INT and 
> causes the problem.
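The arithmetic above is easy to reproduce. A small sketch showing how the int
multiplication overflows and how widening to long avoids it; this mirrors the
numbers in the description, not the exact code in StripedBlockUtil.

{code:java}
public class SafeLengthOverflow {
  public static void main(String[] args) {
    int dataBlkNum = 10;                          // RS 10-4 data blocks
    int blockSize = 256 * 1024 * 1024;            // 268,435,456 (256 MB)

    int overflowed = dataBlkNum * blockSize;      // wraps to -1,610,612,736
    long widened = (long) dataBlkNum * blockSize; // 2,684,354,560 as expected

    System.out.println("int result:  " + overflowed);
    System.out.println("long result: " + widened);
  }
}
{code}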



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16272) Int overflow in computing safe length during EC block recovery

2021-10-18 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16272:
-
Fix Version/s: 3.3.2
   3.2.3
   3.4.0

> Int overflow in computing safe length during EC block recovery
> --
>
> Key: HDFS-16272
> URL: https://issues.apache.org/jira/browse/HDFS-16272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.3.0, 3.3.1
> Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
> will produce a negative or zero length:
> 1. With a negative length, it fails the later >=0 check and will crash the 
> BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
> finish.
> 2. With a zero length, it passes the check and directly truncates the block 
> size to zero, leading to data loss.
> If you are using any of the default EC policies (3-2, 6-3 or 10-4) and the 
> default HDFS block size of 128MB, then you will not be impacted by this issue.
> To be impacted, the EC dataNumber * blockSize has to be larger than the Java 
> max int of 2,147,483,647.
> For example 10-4 is 10 * 134217728 = 1,342,177,280 which is OK.
> However 10-4 with 256MB blocks is 2,684,354,560 which overflows the INT and 
> causes the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16272) Int overflow in computing safe length during EC block recovery

2021-10-18 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16272:
-
Description: 
There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
will produce a negative or zero length:
1. With a negative length, it fails the later >=0 check and will crash the 
BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
finish.
2. With a zero length, it passes the check and directly truncates the block size 
to zero, leading to data loss.

If you are using any of the default EC policies (3-2, 6-3 or 10-4) and the 
default HDFS block size of 128MB, then you will not be impacted by this issue.

To be impacted, the EC dataNumber * blockSize has to be larger than the Java 
max int of 2,147,483,647.

For example 10-4 is 10 * 134217728 = 1,342,177,280 which is OK.

However 10-4 with 256MB blocks is 2,684,354,560 which overflows the INT and 
causes the problem.

  was:
There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
will produce a negative or zero length:
1. With a negative length, it fails the later >=0 check and will crash the 
BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
finish.
2. With a zero length, it passes the check and directly truncates the block size 
to zero, leading to data loss.


> Int overflow in computing safe length during EC block recovery
> --
>
> Key: HDFS-16272
> URL: https://issues.apache.org/jira/browse/HDFS-16272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.3.0, 3.3.1
> Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
> will produce a negative or zero length:
> 1. With a negative length, it fails the later >=0 check and will crash the 
> BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
> finish.
> 2. With a zero length, it passes the check and directly truncates the block 
> size to zero, leading to data loss.
> If you are using any of the default EC policies (3-2, 6-3 or 10-4) and the 
> default HDFS block size of 128MB, then you will not be impacted by this issue.
> To be impacted, the EC dataNumber * blockSize has to be larger than the Java 
> max int of 2,147,483,647.
> For example 10-4 is 10 * 134217728 = 1,342,177,280 which is OK.
> However 10-4 with 256MB blocks is 2,684,354,560 which overflows the INT and 
> causes the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-16272) Int overflow in computing safe length during EC block recovery

2021-10-13 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell reassigned HDFS-16272:


Assignee: daimin

> Int overflow in computing safe length during EC block recovery
> --
>
> Key: HDFS-16272
> URL: https://issues.apache.org/jira/browse/HDFS-16272
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: 3.1.1
>Affects Versions: 3.3.0, 3.3.1
> Environment: Cluster settings: EC RS-8-2-256k, Block Size 1GiB.
>Reporter: daimin
>Assignee: daimin
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There exists an int overflow problem in StripedBlockUtil#getSafeLength, which 
> will produce a negative or zero length:
> 1. With a negative length, it fails the later >=0 check and will crash the 
> BlockRecoveryWorker thread, which makes the lease recovery operation unable to 
> finish.
> 2. With a zero length, it passes the check and directly truncates the block 
> size to zero, leading to data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-10-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425573#comment-17425573
 ] 

Stephen O'Donnell commented on HDFS-16259:
--

{quote}
What do you think about Compatibility? I think even if you unwrap at DfsClient 
or convert to ACE at Namenode, Compatibility guidelines would definitely break
{quote}

This is why I think catching the enforcer exceptions in the Namenode and 
throwing a plain AccessControlException is the safest bet, at least for the 3.3 
and 3.2 branches. Perhaps we should do something different on trunk that may be 
incompatible, eg change the client.

Nothing that calls the DFS Client should depend on a Ranger or other plugin 
defined exception coming out of the DFS client, and the way the client has been 
coded, it doesn't expect it either, as it only unwraps specific exceptions 
right now.

{quote}
Why we would just need to unwrap only a selective Exceptions
{quote}

Yea, I agree, this was a strange decision. It means that sometimes you get a 
useful exception, and other times you get a RemoteException, and with a 
RemoteException you cannot even call "getCause()" on it to get the real 
exception. It would probably have been better to always unwrap the remote 
exception and just return the real cause to the caller.


> Catch and re-throw sub-classes of AccessControlException thrown by any 
> permission provider plugins (eg Ranger)
> --
>
> Key: HDFS-16259
> URL: https://issues.apache.org/jira/browse/HDFS-16259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> When a permission provider plugin is enabled (eg Ranger) there are some 
> scenarios where it can throw a sub-class of an AccessControlException (eg 
> RangerAccessControlException). If this exception is allowed to propagate up 
> the stack, it can give problems in the HDFS Client, when it unwraps the 
> remote exception containing the AccessControlException sub-class.
> Ideally, we should make AccessControlException final so it cannot be 
> sub-classed, but that would be a breaking change at this point. Therefore I 
> believe the safest thing to do, is to catch any AccessControlException that 
> comes out of the permission enforcer plugin, and re-throw an 
> AccessControlException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-10-07 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425383#comment-17425383
 ] 

Stephen O'Donnell commented on HDFS-16259:
--

I think it can be argued both ways. HDFS should have made 
AccessControlException final so it was clear what Ranger should do, but we 
cannot do that now as it will break Ranger, and any other plugins that may use 
this interface.

The HDFS client currently unwraps specific exceptions, so the change you 
suggested above may need to be made in quite a few places, and it could also 
change what the client returns in some circumstances.

To me it seems safer to ensure that the access plugins internal exceptions 
never make it to the client by catching them at the namenode.

There is already some code that does this in FSPermissionChecker:

{code}
  void checkPermission(INode inode, int snapshotId, FsAction access)
  throws AccessControlException {
byte[][] pathComponents = inode.getPathComponents();
INodeAttributes nodeAttributes = getINodeAttrs(pathComponents,
pathComponents.length - 1, inode, snapshotId);
try {
  INodeAttributes[] iNodeAttr = {nodeAttributes};
  AccessControlEnforcer enforcer = getAccessControlEnforcer();
  String opType = operationType.get();
  if (this.authorizeWithContext && opType != null) {
INodeAttributeProvider.AuthorizationContext.Builder builder =
new INodeAttributeProvider.AuthorizationContext.Builder();
builder.fsOwner(fsOwner)
.supergroup(supergroup)
.callerUgi(callerUgi)
.inodeAttrs(iNodeAttr) // single inode attr in the array
.inodes(new INode[] { inode }) // single inode attr in the array
.pathByNameArr(pathComponents)
.snapshotId(snapshotId)
.path(null)
.ancestorIndex(-1) // this will skip checkTraverse()
   // because not checking ancestor here
.doCheckOwner(false)
.ancestorAccess(null)
.parentAccess(null)
.access(access)// the target access to be checked against
   // the inode
.subAccess(null)   // passing null sub access avoids checking
   // children
.ignoreEmptyDir(false)
.operationName(opType)
.callerContext(CallerContext.getCurrent());

enforcer.checkPermissionWithContext(builder.build());
  } else {
enforcer.checkPermission(
fsOwner, supergroup, callerUgi,
iNodeAttr, // single inode attr in the array
new INode[]{inode}, // single inode in the array
pathComponents, snapshotId,
null, -1, // this will skip checkTraverse() because
// not checking ancestor here
false, null, null,
access, // the target access to be checked against the inode
null, // passing null sub access avoids checking children
false);
  }
} catch (AccessControlException ace) {
  throw new AccessControlException(
  toAccessControlString(nodeAttributes, inode.getFullPathName(),
  access));
}
  }
{code}

The enforcer is also called from the method:

{code}
  void checkPermission(INodesInPath inodesInPath, boolean doCheckOwner,
  FsAction ancestorAccess, FsAction parentAccess, FsAction access,
  FsAction subAccess, boolean ignoreEmptyDir)
  throws AccessControlException {
{code}

Which does not catch it, so right now, the behaviour is inconsistent across 
calls to the enforcer.

> Catch and re-throw sub-classes of AccessControlException thrown by any 
> permission provider plugins (eg Ranger)
> --
>
> Key: HDFS-16259
> URL: https://issues.apache.org/jira/browse/HDFS-16259
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> When a permission provider plugin is enabled (eg Ranger) there are some 
> scenarios where it can throw a sub-class of an AccessControlException (eg 
> RangerAccessControlException). If this exception is allowed to propagate up 
> the stack, it can give problems in the HDFS Client, when it unwraps the 
> remote exception containing the AccessControlException sub-class.
> Ideally, we should make AccessControlException final so it cannot be 
> sub-classed, but that would be a breaking change at this point. Therefore I 
> believe the safest thing to do, is to catch any AccessControlException that 
> comes out of the permission enforcer plugin, and re-throw an 
> AccessControlException instead.



--
This 

[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16252:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in 
> the pair
>/**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of 
> retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String 
> s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
> LOG.warn("Illegal value: there is no element in \"" + s + "\".");
> return null;
>   }
>   if (elements.length % 2 != 0) {
> LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
> + elements.length + " but an even number of elements is 
> expected.");
> return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>   = new ArrayList<>();
>
>   for(int i = 0; i < elements.length; ) {
> //parse the i-th sleep-time
> final int sleep = parsePositiveInt(elements, i++, s);
> if (sleep == -1) {
>   return null; //parse fails
> }
> //parse the i-th number-of-retries
> final int retries = parsePositiveInt(elements, i++, s);
> if (retries == -1) {
>   return null; //parse fails
> }
> pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
> sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16252:
-
Fix Version/s: 3.3.2
   3.4.0

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in 
> the pair
>/**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of 
> retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String 
> s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
> LOG.warn("Illegal value: there is no element in \"" + s + "\".");
> return null;
>   }
>   if (elements.length % 2 != 0) {
> LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
> + elements.length + " but an even number of elements is 
> expected.");
> return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>   = new ArrayList<>();
>
>   for(int i = 0; i < elements.length; ) {
> //parse the i-th sleep-time
> final int sleep = parsePositiveInt(elements, i++, s);
> if (sleep == -1) {
>   return null; //parse fails
> }
> //parse the i-th number-of-retries
> final int retries = parsePositiveInt(elements, i++, s);
> if (retries == -1) {
>   return null; //parse fails
> }
> pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
> sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16259) Catch and re-throw sub-classes of AccessControlException thrown by any permission provider plugins (eg Ranger)

2021-10-06 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16259:


 Summary: Catch and re-throw sub-classes of AccessControlException 
thrown by any permission provider plugins (eg Ranger)
 Key: HDFS-16259
 URL: https://issues.apache.org/jira/browse/HDFS-16259
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


When a permission provider plugin is enabled (eg Ranger) there are some 
scenarios where it can throw a sub-class of an AccessControlException (eg 
RangerAccessControlException). If this exception is allowed to propagate up the 
stack, it can give problems in the HDFS Client, when it unwraps the remote 
exception containing the AccessControlException sub-class.

Ideally, we should make AccessControlException final so it cannot be 
sub-classed, but that would be a breaking change at this point. Therefore I 
believe the safest thing to do, is to catch any AccessControlException that 
comes out of the permission enforcer plugin, and re-throw an 
AccessControlException instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16252:
-
Attachment: HDFS-16252.002.patch

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch, HDFS-16252.002.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in 
> the pair
>/**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of 
> retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String 
> s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
> LOG.warn("Illegal value: there is no element in \"" + s + "\".");
> return null;
>   }
>   if (elements.length % 2 != 0) {
> LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
> + elements.length + " but an even number of elements is 
> expected.");
> return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>   = new ArrayList<>();
>
>   for(int i = 0; i < elements.length; ) {
> //parse the i-th sleep-time
> final int sleep = parsePositiveInt(elements, i++, s);
> if (sleep == -1) {
>   return null; //parse fails
> }
> //parse the i-th number-of-retries
> final int retries = parsePositiveInt(elements, i++, s);
> if (retries == -1) {
>   return null; //parse fails
> }
> pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
> sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16252:
-
Status: Patch Available  (was: Open)

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in 
> the pair
>/**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of 
> retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String 
> s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
> LOG.warn("Illegal value: there is no element in \"" + s + "\".");
> return null;
>   }
>   if (elements.length % 2 != 0) {
> LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
> + elements.length + " but an even number of elements is 
> expected.");
> return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>   = new ArrayList<>();
>
>   for(int i = 0; i < elements.length; ) {
> //parse the i-th sleep-time
> final int sleep = parsePositiveInt(elements, i++, s);
> if (sleep == -1) {
>   return null; //parse fails
> }
> //parse the i-th number-of-retries
> final int retries = parsePositiveInt(elements, i++, s);
> if (retries == -1) {
>   return null; //parse fails
> }
> pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
> sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16252:
-
Attachment: HDFS-16252.001.patch

> Correct docs for dfs.http.client.retry.policy.spec 
> ---
>
> Key: HDFS-16252
> URL: https://issues.apache.org/jira/browse/HDFS-16252
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16252.001.patch
>
>
> The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as 
> it has the wait time and retries switched around in the description. Also, the 
> doc for dfs.client.retry.policy.spec is not present and should be the same as 
> for dfs.http.client.retry.policy.spec.
> The code shows the timeout is first and then the number of retries:
> {code}
> String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
> String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...
> // In RetryPolicies.java, we can see it gets the timeout as the first in 
> the pair
>/**
>  * Parse the given string as a MultipleLinearRandomRetry object.
>  * The format of the string is "t_1, n_1, t_2, n_2, ...",
>  * where t_i and n_i are the i-th pair of sleep time and number of 
> retries.
>  * Note that the white spaces in the string are ignored.
>  *
>  * @return the parsed object, or null if the parsing fails.
>  */
> public static MultipleLinearRandomRetry parseCommaSeparatedString(String 
> s) {
>   final String[] elements = s.split(",");
>   if (elements.length == 0) {
> LOG.warn("Illegal value: there is no element in \"" + s + "\".");
> return null;
>   }
>   if (elements.length % 2 != 0) {
> LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
> + elements.length + " but an even number of elements is 
> expected.");
> return null;
>   }
>   final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
>   = new ArrayList<>();
>
>   for(int i = 0; i < elements.length; ) {
> //parse the i-th sleep-time
> final int sleep = parsePositiveInt(elements, i++, s);
> if (sleep == -1) {
>   return null; //parse fails
> }
> //parse the i-th number-of-retries
> final int retries = parsePositiveInt(elements, i++, s);
> if (retries == -1) {
>   return null; //parse fails
> }
> pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
> sleep));
>   }
>   return new RetryPolicies.MultipleLinearRandomRetry(pairs);
>   }
> {code}
> This change simply updates the docs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16252) Correct docs for dfs.http.client.retry.policy.spec

2021-10-04 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16252:


 Summary: Correct docs for dfs.http.client.retry.policy.spec 
 Key: HDFS-16252
 URL: https://issues.apache.org/jira/browse/HDFS-16252
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


The hdfs-default doc for dfs.http.client.retry.policy.spec is incorrect, as it 
has the wait time and retries switched around in the description. Also, the doc 
for dfs.client.retry.policy.spec is not present and should be the same as for 
dfs.http.client.retry.policy.spec.

The code shows the timeout is first and then the number of retries:

{code}
String  POLICY_SPEC_KEY = PREFIX + "policy.spec";
String  POLICY_SPEC_DEFAULT = "1,6,6,10"; //t1,n1,t2,n2,...


// In RetryPolicies.java, we can see it gets the timeout as the first in 
the pair


   /**
 * Parse the given string as a MultipleLinearRandomRetry object.
 * The format of the string is "t_1, n_1, t_2, n_2, ...",
 * where t_i and n_i are the i-th pair of sleep time and number of retries.
 * Note that the white spaces in the string are ignored.
 *
 * @return the parsed object, or null if the parsing fails.
 */
public static MultipleLinearRandomRetry parseCommaSeparatedString(String s) 
{
  final String[] elements = s.split(",");
  if (elements.length == 0) {
LOG.warn("Illegal value: there is no element in \"" + s + "\".");
return null;
  }
  if (elements.length % 2 != 0) {
LOG.warn("Illegal value: the number of elements in \"" + s + "\" is "
+ elements.length + " but an even number of elements is expected.");
return null;
  }

  final List<RetryPolicies.MultipleLinearRandomRetry.Pair> pairs
  = new ArrayList<>();
   
  for(int i = 0; i < elements.length; ) {
//parse the i-th sleep-time
final int sleep = parsePositiveInt(elements, i++, s);
if (sleep == -1) {
  return null; //parse fails
}

//parse the i-th number-of-retries
final int retries = parsePositiveInt(elements, i++, s);
if (retries == -1) {
  return null; //parse fails
}

pairs.add(new RetryPolicies.MultipleLinearRandomRetry.Pair(retries, 
sleep));
  }
  return new RetryPolicies.MultipleLinearRandomRetry(pairs);
  }
{code}

This change simply updates the docs.
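
As a worked illustration of the ordering (sleep time first, then number of retries, in each pair), here is a small hypothetical example; the spec value "10000,6,60000,10" is just an example string, not necessarily the shipped default:

{code}
// Hypothetical illustration of how a "t1,n1,t2,n2,..." spec is interpreted:
// the sleep time comes first in each pair, then the number of retries.
public class RetrySpecDemo {
  public static void main(String[] args) {
    String spec = "10000,6,60000,10";  // example value only
    String[] e = spec.split(",");
    for (int i = 0; i < e.length; i += 2) {
      int sleepMs = Integer.parseInt(e[i].trim());
      int retries = Integer.parseInt(e[i + 1].trim());
      System.out.println("sleep " + sleepMs + " ms between each of the next "
          + retries + " retries");
    }
  }
}
{code}

With that example value the client would sleep 10000 ms between each of the first 6 retries, and then 60000 ms between each of the following 10 retries.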



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16190) Combine some tests in TestOzoneManagerHAWithData to reuse mini-Clusters

2021-08-27 Thread Stephen O'Donnell (Jira)
Stephen O'Donnell created HDFS-16190:


 Summary: Combine some tests in TestOzoneManagerHAWithData to reuse 
mini-Clusters
 Key: HDFS-16190
 URL: https://issues.apache.org/jira/browse/HDFS-16190
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell


Some tests in TestOzoneManagerHAWithData can naturally be combined. For example 
some run the same test with all OMs up and then one OM down.

Some other tests can be run on the same cluster in isolated buckets, as they do 
not restart any nodes.

This test suite typically runs for about 490 seconds, so it would be good to 
reduce its runtime.

{code}
[INFO] Running org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData
Warning:  Tests run: 12, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
490.975 s - in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithData
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16153) Avoid evaluation of LOG.debug statement in QuorumJournalManager

2021-08-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16153.
--
Resolution: Fixed

> Avoid evaluation of LOG.debug statement in QuorumJournalManager
> ---
>
> Key: HDFS-16153
> URL: https://issues.apache.org/jira/browse/HDFS-16153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Trivial
>  Labels: patch-available, pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16153-001_patch.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16153) Avoid evaluation of LOG.debug statement in QuorumJournalManager

2021-08-06 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16153:
-
Fix Version/s: 3.3.2
   3.2.3
   3.4.0

> Avoid evaluation of LOG.debug statement in QuorumJournalManager
> ---
>
> Key: HDFS-16153
> URL: https://issues.apache.org/jira/browse/HDFS-16153
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Trivial
>  Labels: patch-available, pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16153-001_patch.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

2021-08-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393386#comment-17393386
 ] 

Stephen O'Donnell commented on HDFS-16147:
--

With your patch in place I think the output file looks like this:

{code}
OVERALL_STREAM
  INODE_SECTION (only in index, not in the data stream)
COMPRESSED_INODE_SUB_SECTION
COMPRESSED_INODE_SUB_SECTION
COMPRESSED_INODE_SUB_SECTION
...
EMPTY_COMPRESSED_INODE_SUB_SECTION
  ...
  DIR_SECTION  (only in index, not in the data stream)
COMPRESSED_DIR_SUB_SECTION
COMPRESSED_DIR_SUB_SECTION
...
EMPTY_COMPRESSED_DIR_SUB_SECTION
  ...
{code}

The reason for the empty ones is that at the end of the INODE section, you 
call commitSectionAndSubSection(), which closes the sub-section and opens a new 
compressed stream. Then you immediately close the section, which closes it and 
opens a new one. I don't think it does any harm, but it would be better if it 
did not do that, if we can fix it without making the code too complex.

Then I think the reason this works is that if you try to read in parallel, it 
reads each compressed sub-section. This is fine.

When you try to read it serially (ie turn parallel off and load an image, or use 
OIV), it will try to read all the compressed INODE sections using a single 
stream. I think this is like a series of streams concatenated together, and the 
decompressor must handle that (concatenated streams) and return the output as if 
it were a single compressed stream. We can probably test this out somehow to be 
sure.

In TestFSImage.testNoParallelSectionsWithCompressionEnabled(..) - could you 
remove or rename that test and make it load an image with parallel enabled, 
rather than the current test, which checks that it does not work.

Provided my understanding is correct, this patch looks mostly good apart from 
the two things above, but I will give it a more detailed review in the next day 
or two.

Have you tested this patch on a large image with millions of inodes?

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

2021-08-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393356#comment-17393356
 ] 

Stephen O'Donnell commented on HDFS-16147:
--

On further review, most of what I wrote above is wrong!

When saving the image, there is a single output stream, but each section is 
compressed within that stream, each as a separate compressed stream, eg:

{code}
OVERALL_STREAM
COMPRESSED_INODE_SECTION
COMPRESSED_DIR_SECTION
...
{code}

You can see this in the commitSection() method, where the stream is finished().

So this means that when we load a section (not in parallel), it jumps to the 
start of a compressed section, and reads it in full.

This means it is still unknown how you can save a compressed image with 
sub-sections and load it without parallel. Perhaps a compressed stream can read 
embedded compressed streams within itself - I am not sure, but I would like to 
understand how this is working.

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-16147) load fsimage with parallelization and compression

2021-08-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393310#comment-17393310
 ] 

Stephen O'Donnell edited comment on HDFS-16147 at 8/4/21, 4:44 PM:
---

When the image is saved, there is a single stream written out serially. To 
enable parallel load on the image, index entries are added for the sub-sections 
as they are written out.

This means we have a single stream, with the position of the sub-sections saved.

That means, when we load the image, there are two choices:

1. We start at the beginning of a section and open a stream and read the entire 
section.

2. We open several streams, reading each sub-section in parallel by jumping to 
the indexed sub-section position and reading the given length.

When you enable compression too, this means the entire stream is compressed 
from end to end as a single compressed stream. I wrongly thought there would be 
many compressed streams within the image file, and that is why I thought OIV etc 
would have trouble reading this.

So it makes sense OIV can read the image serially, and the namenode can also 
read the image with parallel disabled when compression is on. The surprise to 
me, is that we can load the image in parallel, as that involves jumping into 
the compressed stream somewhere in the middle and starting to read, which most 
compression codecs do not support. It was my belief that gzip does not support 
this.

However, looking at the existing code, before this change, I see that we jump 
around in the stream already:

{code}
for (FileSummary.Section s : sections) {
  channel.position(s.getOffset());
  InputStream in = new BufferedInputStream(new LimitInputStream(fin,
  s.getLength()));

  in = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), in);
{code}

So that must mean the compression codecs are splittable somehow, and they can 
start decompressing from a random position in the stream. Due to this, if the 
image is compressed, the existing parallel code can be mostly reused to load 
the sub-sections within the compressed stream.

From the above, could we allow parallel loading of compressed images by simply 
removing the code which disallows it?

{code}
-if (loadInParallel) {
-  if (compressionEnabled) {
-LOG.warn("Parallel Image loading and saving is not supported when {}" +
-" is set to true. Parallel will be disabled.",
-DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
-loadInParallel = false;
-  }
-}
{code}

Then let the image save compressed with the sub-sections indexed and try to 
load it?


was (Author: sodonnell):
When the image is saved, there is a single stream written out serially. To 
enable parallel load on the image, index entries are added for the sub-sections 
as they are written out.

This means we have a single stream, with the position of the sub-sections saved.

That means, when we load the image, there are two choices:

1. We start at the beginning of a section and open a stream and read the entire 
section.

2. We open several streams by jumping to that position and read the given 
length.

When you enabled compression too, this means the entire stream is compressed 
from end to end as a single compressed stream. I wrongly thought there would be 
many compressed streams within the image file, and that is why I thought OIV etc 
would have trouble reading this.

So it makes sense OIV can read the image serially, and the namenode can also 
read the image with parallel disabled when compression is on. The surprise to 
me, is that we can load the image in parallel, as that involves jumping into 
the compressed stream somewhere in the middle and starting to read, which more 
compression codecs do not support. It was my belief that gzip does not support 
this.


However, looking at the existing code, before this change, I see that we jump 
around in the stream already:

{code}
for (FileSummary.Section s : sections) {
  channel.position(s.getOffset());
  InputStream in = new BufferedInputStream(new LimitInputStream(fin,
  s.getLength()));

  in = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), in);
{code}

So that must mean the compression codecs are splittable somehow, and they can 
start decompressing from a random position in the stream. Due to this, if the 
image is compressed, the existing parallel code can be mostly reused to load 
the sub-sections within the compressed stream.

From the above, could we allow parallel loading of compressed images by simply 
removing the code:

{code}
-if (loadInParallel) {
-  if (compressionEnabled) {
-LOG.warn("Parallel Image loading and saving is not supported when {}" +
-" is set to true. Parallel will be disabled.",
-DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
-loadInParallel = false;
-  }
-}

[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

2021-08-04 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393310#comment-17393310
 ] 

Stephen O'Donnell commented on HDFS-16147:
--

When the image is saved, there is a single stream written out serially. To 
enable parallel load on the image, index entries are added for the sub-sections 
as they are written out.

This means we have a single stream, with the position of the sub-sections saved.

That means, when we load the image, there are two choices:

1. We start at the beginning of a section and open a stream and read the entire 
section.

2. We open several streams by jumping to that position and read the given 
length.

When you enable compression too, this means the entire stream is compressed 
from end to end as a single compressed stream. I wrongly thought there would be 
many compressed streams within the image file, and that is why I thought OIV etc 
would have trouble reading this.

So it makes sense OIV can read the image serially, and the namenode can also 
read the image with parallel disabled when compression is on. The surprise to 
me, is that we can load the image in parallel, as that involves jumping into 
the compressed stream somewhere in the middle and starting to read, which most 
compression codecs do not support. It was my belief that gzip does not support 
this.


However, looking at the existing code, before this change, I see that we jump 
around in the stream already:

{code}
for (FileSummary.Section s : sections) {
  channel.position(s.getOffset());
  InputStream in = new BufferedInputStream(new LimitInputStream(fin,
  s.getLength()));

  in = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), in);
{code}

So that must mean the compression codecs are splittable somehow, and they can 
start decompressing from a random position in the stream. Due to this, if the 
image is compressed, the existing parallel code can be mostly reused to load 
the sub-sections within the compressed stream.

From the above, could we allow parallel loading of compressed images by simply 
removing the code:

{code}
-if (loadInParallel) {
-  if (compressionEnabled) {
-LOG.warn("Parallel Image loading and saving is not supported when {}" +
-" is set to true. Parallel will be disabled.",
-DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY);
-loadInParallel = false;
-  }
-}
{code}

Then let the image save compressed with the sub-sections indexed and try to 
load it?

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch, 
> subsection.svg
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode

2021-08-03 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16055:
-
Fix Version/s: 3.2.3

> Quota is not preserved in snapshot INode
> 
>
> Key: HDFS-16055
> URL: https://issues.apache.org/jira/browse/HDFS-16055
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Quota feature is not preserved during snapshot creation, this causes 
> {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, 
> {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if 
> the quota is set before the snapshot creation:
> {code:bash}
> $ hdfs snapshotDiff /diffTest s0 .
> Difference between snapshot s0 and current directory under directory 
> /diffTest:
> M .
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16055) Quota is not preserved in snapshot INode

2021-08-03 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16055:
-
Fix Version/s: 3.3.2

> Quota is not preserved in snapshot INode
> 
>
> Key: HDFS-16055
> URL: https://issues.apache.org/jira/browse/HDFS-16055
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Quota feature is not preserved during snapshot creation, this causes 
> {{INodeDirectory#metadataEquals}} to ALWAYS return true. Therefore, 
> {{snapshotDiff}} will ALWAYS return the snapshot root as modified, even if 
> the quota is set before the snapshot creation:
> {code:bash}
> $ hdfs snapshotDiff /diffTest s0 .
> Difference between snapshot s0 and current directory under directory 
> /diffTest:
> M .
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

2021-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389923#comment-17389923
 ] 

Stephen O'Donnell commented on HDFS-16147:
--

It is important that OIV can read these images.

If you create a parallel compressed image with this patch, and then try to load 
it in a NN without this patch and parallel loading disabled, is the NN still 
able to load it? I suspect it won't as there are multiple compressed sections, 
so it cannot read a single compressed stream from end to end.

{quote}
It can load 300M Fsimage with  compression and parallelization , which can 
improve 50% loading time.
{quote}

Is the 50% improvement measured against a compressed single-threaded load 
versus parallel compressed loading?

How do the load times compare between "parallel not compressed" and "parallel 
compressed"?

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Attachments: HDFS-16147.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16147) load fsimage with parallelization and compression

2021-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389880#comment-17389880
 ] 

Stephen O'Donnell commented on HDFS-16147:
--

I only quickly looked at this, but I have a few questions.

With this change, will tools like OIV be able to read the image? It looks like 
there are a series of new compressed sections, and OIV currently does not read 
the image in parallel - it will just try to read it from the start to the end - 
will this work?

If we have parallel enabled and compressed sub-sections, and then disable 
parallel for some reason, will the image still be readable?

Have you been able to benchmark loading a large image in parallel with 
compression enabled and disabled so we can see if compression makes it faster 
or slower?

> load fsimage with parallelization and compression
> -
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.3.0
>Reporter: liuyongpan
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-16147.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389821#comment-17389821
 ] 

Stephen O'Donnell commented on HDFS-15175:
--

Thanks for committing it. It saved me some work today!

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0, 3.2.3, 3.3.2, 3.2.4
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by the two 
> CloseOps is the same instance, which causes the first CloseOp to have the wrong 
> block size. When the SNN rolls the editlog, TruncateOp does not put the file 
> into the UnderConstruction state. Then, when the second CloseOp is executed, 
> the file is not in the UnderConstruction state, and the SNN crashes.
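
For illustration of the shared-instance problem described above, here is a small hypothetical demo (not the actual HDFS-15175 patch) showing how holding a reference to the shared Block picks up the later truncate, while copying it at log time preserves the original size:

{code}
import org.apache.hadoop.hdfs.protocol.Block;

public class BlockCopyDemo {
  public static void main(String[] args) {
    // Values taken from the editlog in the description above.
    Block shared = new Block(5568434562L, 185818648L, 4495417845L);

    Block referenced = shared;         // sharing the same instance
    Block copied = new Block(shared);  // snapshot of the state at log time

    shared.setNumBytes(185818644L);    // the later truncate changes the size

    System.out.println("referenced numBytes = " + referenced.getNumBytes()); // 185818644
    System.out.println("copied numBytes     = " + copied.getNumBytes());     // 185818648
  }
}
{code}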



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17389810#comment-17389810
 ] 

Stephen O'Donnell commented on HDFS-15175:
--

We should backport to branch-3.3 too, otherwise the change might get lost if 
someone is on 3.2.x and upgrades. 3.2, 3.3 and trunk are the active 3.x 
branches now. 3.1 is end of life.

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Fix For: 3.4.0, 3.2.3, 3.2.4
>
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> 
>  OP_REASSIGN_LEASE
>  
>  32625021150
>  DFSClient_NONMAPREDUCE_-969060727_197760
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625023743
>  0
>  0
>  ..
>  3
>  1581816135883
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> ..
> 
>  OP_TRUNCATE
>  
>  32625024049
>  ..
>  DFSClient_NONMAPREDUCE_1000868229_201260
>  ..
>  185818644
>  1581816136336
>  
>  5568434562
>  185818648
>  4495417845
>  
>  
>  
> ..
> 
>  OP_CLOSE
>  
>  32625024993
>  0
>  0
>  ..
>  3
>  1581816138774
>  1581814760398
>  536870912
>  
>  
>  false
>  
>  5568434562
>  185818644
>  4495417845
>  
>  
>  da_music
>  hdfs
>  416
>  
>  
>  
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. When truncate is 
> used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is 
> synchronized to the JournalNode in the same batch. The block used by the two 
> CloseOps is the same instance, which causes the first CloseOp to have the wrong 
> block size. When the SNN rolls the editlog, TruncateOp does not put the file 
> into the UnderConstruction state. Then, when the second CloseOp is executed, 
> the file is not in the UnderConstruction state, and the SNN crashes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16132) SnapshotDiff report fails with invalid path assertion with external Attribute provider

2021-07-28 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell resolved HDFS-16132.
--
Resolution: Won't Fix

Closing this as it is no longer relevant after HDFS-16144 reverted HDFS-15372.

> SnapshotDiff report fails with invalid path assertion with external Attribute 
> provider
> --
>
> Key: HDFS-16132
> URL: https://issues.apache.org/jira/browse/HDFS-16132
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> The issue can be reproduced with the below unit test:
> {code:java}
> diff --git 
> a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java
>  
> b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java
> index 512d1029835..27b80882766 100644
> --- 
> a/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java
> +++ 
> b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java
> @@ -36,6 +36,7 @@
>  import org.apache.hadoop.hdfs.DistributedFileSystem;
>  import org.apache.hadoop.hdfs.HdfsConfiguration;
>  import org.apache.hadoop.hdfs.MiniDFSCluster;
> +import org.apache.hadoop.hdfs.DFSTestUtil;
>  import org.apache.hadoop.security.AccessControlException;
>  import org.apache.hadoop.security.UserGroupInformation;
>  import org.apache.hadoop.util.Lists;
> @@ -89,7 +90,7 @@ public void checkPermissionWithContext(
>            AuthorizationContext authzContext) throws AccessControlException {
>          if (authzContext.getAncestorIndex() > 1
>              && authzContext.getInodes()[1].getLocalName().equals("user")
> -            && authzContext.getInodes()[2].getLocalName().equals("acl")) {
> +            && authzContext.getInodes()[2].getLocalName().equals("acl") || 
> runPermissionCheck) {
>            this.ace.checkPermissionWithContext(authzContext);
>          }
>          CALLED.add("checkPermission|" + authzContext.getAncestorAccess()
> @@ -598,6 +599,55 @@ public Void run() throws Exception {
>          return null;
>        }
>      });
> +  }
>  
> +  @Test
> +  public void testAttrProviderSeesResolvedSnapahotPaths1() throws Exception {
> +    runPermissionCheck = true;
> +    FileSystem fs = FileSystem.get(miniDFS.getConfiguration(0));
> +    DistributedFileSystem hdfs = miniDFS.getFileSystem();
> +    final Path parent = new Path("/user");
> +    hdfs.mkdirs(parent);
> +    fs.setPermission(parent, new FsPermission(HDFS_PERMISSION));
> +    final Path sub1 = new Path(parent, "sub1");
> +    final Path sub1foo = new Path(sub1, "foo");
> +    hdfs.mkdirs(sub1);
> +    hdfs.mkdirs(sub1foo);
> +    Path f = new Path(sub1foo, "file0");
> +    DFSTestUtil.createFile(hdfs, f, 0, (short) 1, 0);
> +    hdfs.allowSnapshot(parent);
> +    hdfs.createSnapshot(parent, "s0");
> +
> +    f = new Path(sub1foo, "file1");
> +    DFSTestUtil.createFile(hdfs, f, 0, (short) 1, 0);
> +    f = new Path(sub1foo, "file2");
> +    DFSTestUtil.createFile(hdfs, f, 0, (short) 1, 0);
> +
> +    final Path sub2 = new Path(parent, "sub2");
> +    hdfs.mkdirs(sub2);
> +    final Path sub2foo = new Path(sub2, "foo");
> +    // mv /parent/sub1/foo to /parent/sub2/foo
> +    hdfs.rename(sub1foo, sub2foo);
> +
> +    hdfs.createSnapshot(parent, "s1");
> +    hdfs.createSnapshot(parent, "s2");
> +
> +    final Path sub3 = new Path(parent, "sub3");
> +    hdfs.mkdirs(sub3);
> +    // mv /parent/sub2/foo to /parent/sub3/foo
> +    hdfs.rename(sub2foo, sub3);
> +
> +    hdfs.delete(sub3, true);
> +    UserGroupInformation ugi =
> +        UserGroupInformation.createUserForTesting("u1", new String[] { "g1" 
> });
> +    ugi.doAs(new PrivilegedExceptionAction<Void>() {
> +      @Override
> +      public Void run() throws Exception {
> +        FileSystem fs = FileSystem.get(miniDFS.getConfiguration(0));
> +        ((DistributedFileSystem)fs).getSnapshotDiffReport(parent, "s1", 
> "s2");
> +        CALLED.clear();
> +        return null;
> +      }
> +    });
>    }
>  }
> {code}
> It fails with the below error when executed:
> {code:java}
> org.apache.hadoop.ipc.RemoteException(java.lang.AssertionError): Absolute path required, but got 'foo'
>   at org.apache.hadoop.hdfs.server.namenode.INode.checkAbsolutePath(INode.java:838)
>   at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents(INode.java:813)
>   at org.apache.hadoop.hdfs.server.namenode.INodesInPath.resolveFromRoot(INodesInPath.java:154)
>   at ...
> {code}

[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2021-07-28 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16144:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. 
> When a user accesses a file in a snapshot and an attribute provider is 
> configured, the provider would see the original file path (i.e. no .snapshot 
> folder) in Hadoop 2, but it would see the snapshot path in Hadoop 3.
> HDFS-15372 changed this back, but I noted at the time that it may make sense 
> for the provider to see the actual snapshot path instead.
> Recently we discovered HDFS-16132, where the HDFS-15372 change does not work 
> correctly. At this stage I believe it is better to revert HDFS-15372, as the 
> fix for this issue is probably not trivial, and to allow providers to see the 
> actual path the user accessed.
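
To make the path-visibility difference concrete, here is a minimal sketch of a provider that only logs the path it is handed, assuming the standard INodeAttributeProvider extension point and the dfs.namenode.inode.attributes.provider.class configuration key; the class name, package and output format are illustrative, not part of this issue.

{code:java}
package org.example.hdfs;  // hypothetical package

import org.apache.hadoop.hdfs.server.namenode.INodeAttributeProvider;
import org.apache.hadoop.hdfs.server.namenode.INodeAttributes;

/**
 * Logs the path components the NameNode resolves before calling the provider.
 * With the Hadoop 2 behaviour described above, a read through a snapshot is
 * reported as the original path (no .snapshot component); with the Hadoop 3
 * behaviour it is reported as the snapshot path.
 */
public class PathLoggingAttributeProvider extends INodeAttributeProvider {

  @Override
  public void start() {
    // No resources to initialise for this sketch.
  }

  @Override
  public void stop() {
    // Nothing to clean up.
  }

  @Override
  public INodeAttributes getAttributes(String[] pathElements, INodeAttributes inode) {
    // Print the resolved path the NameNode passes to the provider.
    System.out.println("provider saw: /" + String.join("/", pathElements));
    return inode;  // pass the attributes through unchanged
  }
}
{code}

Pointing dfs.namenode.inode.attributes.provider.class at such a class on a test cluster is one way to observe which of the two behaviours a given build exhibits.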



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2021-07-28 Thread Stephen O'Donnell (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephen O'Donnell updated HDFS-16144:
-
Fix Version/s: 3.3.2
   3.4.0

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. 
> When a user accesses a file in a snapshot and an attribute provider is 
> configured, the provider would see the original file path (i.e. no .snapshot 
> folder) in Hadoop 2, but it would see the snapshot path in Hadoop 3.
> HDFS-15372 changed this back, but I noted at the time that it may make sense 
> for the provider to see the actual snapshot path instead.
> Recently we discovered HDFS-16132, where the HDFS-15372 change does not work 
> correctly. At this stage I believe it is better to revert HDFS-15372, as the 
> fix for this issue is probably not trivial, and to allow providers to see the 
> actual path the user accessed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2021-07-28 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388784#comment-17388784
 ] 

Stephen O'Donnell commented on HDFS-16144:
--

The only changes between 003 and 004 were white space:

{code}
$ diff HDFS-16144.003.patch HDFS-16144.004.patch 
157c157
< index 512d1029835..84a9b0c08c9 100644
---
> index 512d1029835..776a1981ce4 100644
227,230c227,230
< +  // at org.apache.hadoop.hdfs.server.namenode.INode.checkAbsolutePath
< +  // (INode.java:838)
< +  // at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents
< +  // (INode.java:813)
---
> +  //  at org.apache.hadoop.hdfs.server.namenode.INode.checkAbsolutePath
> +  //(INode.java:838)
> +  //  at org.apache.hadoop.hdfs.server.namenode.INode.getPathComponents
> +  //(INode.java:813)
342c342
< +UserGroupInformation.createUserForTesting("u1", new String[] { "g1" 
});
---
> +UserGroupInformation.createUserForTesting("u1", new String[] {"g1"});
{code}

003 got a good test run and a +1 from [~shashikant], so I will go ahead and 
commit the 004 patch based on that.

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. 
> When a user accesses a file in a snapshot and an attribute provider is 
> configured, the provider would see the original file path (i.e. no .snapshot 
> folder) in Hadoop 2, but it would see the snapshot path in Hadoop 3.
> HDFS-15372 changed this back, but I noted at the time that it may make sense 
> for the provider to see the actual snapshot path instead.
> Recently we discovered HDFS-16132, where the HDFS-15372 change does not work 
> correctly. At this stage I believe it is better to revert HDFS-15372, as the 
> fix for this issue is probably not trivial, and to allow providers to see the 
> actual path the user accessed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


