[jira] [Commented] (HDFS-17093) In the case of all datanodes sending FBR when the namenode restarts (large clusters), there is an issue with incomplete block reporting

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755752#comment-17755752
 ] 

ASF GitHub Bot commented on HDFS-17093:
---

Tre2878 commented on PR #5855:
URL: https://github.com/apache/hadoop/pull/5855#issuecomment-1683193008

   @Hexiaoqiao The unit test error should have nothing to do with this change




> In the case of all datanodes sending FBR when the namenode restarts (large 
> clusters), there is an issue with incomplete block reporting
> ---
>
> Key: HDFS-17093
> URL: https://issues.apache.org/jira/browse/HDFS-17093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: Yanlei Yu
>Priority: Minor
>  Labels: pull-request-available
>
> In our cluster of 800+ nodes, after restarting the namenode, we found that 
> some datanodes did not report enough blocks, causing the namenode to stay in 
> secure mode for a long time after restarting because of incomplete block 
> reporting
> I found in the logs of the datanode with incomplete block reporting that the 
> first FBR attempt failed, possibly due to namenode stress, and then a second 
> FBR attempt was made as follows:
> {code:java}
> 
> 2023-07-17 11:29:28,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unsuccessfully sent block report 0x6237a52c1e817e,  containing 12 storage 
> report(s), of which we sent 1. The reports had 1099057 total blocks and used 
> 1 RPC(s). This took 294 msec to generate and 101721 msecs for RPC and NN 
> processing. Got back no commands.
> 2023-07-17 11:37:04,014 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Successfully sent block report 0x62382416f3f055,  containing 12 storage 
> report(s), of which we sent 12. The reports had 1099048 total blocks and used 
> 12 RPC(s). This took 295 msec to generate and 11647 msecs for RPC and NN 
> processing. Got back no commands. {code}
> There's nothing wrong with that. Retry the send if it fails But on the 
> namenode side of the logic:
> {code:java}
> if (namesystem.isInStartupSafeMode()
>     && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
>     && storageInfo.getBlockReportCount() > 0) {
>   blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
>       + "discarded non-initial block report from {}"
>       + " because namenode still in startup phase",
>       strBlockReportId, fullBrLeaseId, nodeID);
>   blockReportLeaseManager.removeLease(node);
>   return !node.hasStaleStorages();
> } {code}
> When a disk was identified as the report is not the first time, namely 
> storageInfo. GetBlockReportCount > 0, Will remove the ticket from the 
> datanode, lead to a second report failed because no lease



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755744#comment-17755744
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

chunyiyang commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1683172861

   Thanks for updating the title and merging it! @tasanuma 




> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755656#comment-17755656
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

simbadzina commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1682710012

   Thanks.




> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755655#comment-17755655
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

tasanuma commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1682709152

   Yes, I just did it.




> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755653#comment-17755653
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

simbadzina commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1682701064

   Thank you. @tasanuma could you please also backport this into branch-3.3.




> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Fix Version/s: 3.3.9
   (was: 3.3.7)

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17030:

Fix Version/s: 3.3.9

> Limit wait time for getHAServiceState in ObserverReaderProxy
> 
>
> Key: HDFS-17030
> URL: https://issues.apache.org/jira/browse/HDFS-17030
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When namenode HA is enabled and a standby NN is not responsible, we have 
> observed it would take a long time to serve a request, even though we have a 
> healthy observer or active NN. 
> Basically, when a standby is down, the RPC client would (re)try to create 
> socket connection to that standby for _ipc.client.connect.timeout_ _* 
> ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
> heap dump at a standby, the NN still accepts the socket connection but it 
> won't send responses to these RPC requests and we would timeout after 
> _ipc.client.rpc-timeout.ms._ This adds a significantly latency. For clusters 
> at Linkedin, we set _ipc.client.rpc-timeout.ms_ to 120 seconds and thus a 
> request takes more than 2 mins to complete when we take a heap dump at a 
> standby. This has been causing user job failures. 
> We could set _ipc.client.rpc-timeout.ms to_ a smaller value when sending 
> getHAServiceState requests in ObserverReaderProxy (for user rpc requests, we 
> still use the original value from the config). However, that would double the 
> socket connection between clients and the NN (which is a deal-breaker). 
> The proposal is to add a timeout on getHAServiceState() calls in 
> ObserverReaderProxy and we will only wait for the timeout for an NN to 
> respond its HA state. Once we pass that timeout, we will move on to probe the 
> next NN. 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Fix Version/s: 3.3.7

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-17156.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
> Fix For: 3.4.0
>
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17156) Client may receive old state ID which will lead to inconsistent reads

2023-08-17 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-17156:

Summary: Client may receive old state ID which will lead to inconsistent 
reads  (was: mapreduce job encounters java.io.IOException)

> Client may receive old state ID which will lead to inconsistent reads
> -
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755631#comment-17755631
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

tasanuma commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1682643278

   Merged. Thanks for your contribution, @chunyiyang! Thanks for your review 
and providing the unit test, @simbadzina!




> mapreduce job encounters java.io.IOException
> 
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755630#comment-17755630
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

tasanuma merged PR #5951:
URL: https://github.com/apache/hadoop/pull/5951




> mapreduce job encounters java.io.IOException
> 
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17121) BPServiceActor to provide new thread to handle FBR

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1777#comment-1777
 ] 

ASF GitHub Bot commented on HDFS-17121:
---

hadoop-yetus commented on PR #5888:
URL: https://github.com/apache/hadoop/pull/5888#issuecomment-1682311673

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 43s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 57s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  8s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 47s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 44s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 44s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 34s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5888/12/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 23 unchanged - 
0 fixed = 24 total (was 23)  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 52s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 54s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 220m 12s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5888/12/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 313m 11s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDecommissionWithBackoffMonitor |
   |   | hadoop.hdfs.server.namenode.TestFSImage |
   |   | hadoop.hdfs.server.namenode.TestNestedEncryptionZones |
   |   | hadoop.hdfs.TestDFSUpgradeFromImage |
   |   | hadoop.hdfs.server.datanode.TestBlockReplacement |
   |   | hadoop.hdfs.server.namenode.TestFSEditLogLoader |
   |   | hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics |
   |   | hadoop.hdfs.server.datanode.TestTriggerBlockReport |
   |   | hadoop.hdfs.server.namenode.TestFsck |
   |   | hadoop.hdfs.TestEncryptedTransfer |
   |   | hadoop.hdfs.server.blockmanagement.TestBlockReportRateLimiting |
   |   | hadoop.hdfs.server.namenode.TestProcessCorruptBlocks |
   |   | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
   |   | hadoop.hdfs.server.blockmanagement.TestBlockManager |
   |   | hadoop.hdfs.server.namenode.TestListCorruptFileBlocks |
   |   | hadoop.hdfs.TestDFSShell |
   |   | 

[jira] [Commented] (HDFS-17121) BPServiceActor to provide new thread to handle FBR

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1772#comment-1772
 ] 

ASF GitHub Bot commented on HDFS-17121:
---

hadoop-yetus commented on PR #5888:
URL: https://github.com/apache/hadoop/pull/5888#issuecomment-1682286799

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 52s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m  3s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 56s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5888/11/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 23 unchanged - 
0 fixed = 24 total (was 23)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 55s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | -1 :x: |  spotbugs  |   3m 22s | 
[/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5888/11/artifact/out/new-spotbugs-hadoop-hdfs-project_hadoop-hdfs.html)
 |  hadoop-hdfs-project/hadoop-hdfs generated 1 new + 0 unchanged - 0 fixed = 1 
total (was 0)  |
   | -1 :x: |  shadedclient  |  40m 25s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 208m 30s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5888/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 361m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-hdfs-project/hadoop-hdfs |
   |  |  Uninitialized read of fullBlockReportLeaseIdLock in new 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor(String, String, 
InetSocketAddress, InetSocketAddress, BPOfferService)  At 
BPServiceActor.java:new 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor(String, String, 
InetSocketAddress, InetSocketAddress, BPOfferService)  At 
BPServiceActor.java:[line 150] |
   | Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename |
   |   | hadoop.hdfs.server.namenode.TestFSImage |
   |   | hadoop.cli.TestCryptoAdminCLI |
   |   | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd |
   |   | 

[jira] [Commented] (HDFS-17161) Adding test for StripedBlockReader#createBlockReader leaks socket on IOException

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755505#comment-17755505
 ] 

ASF GitHub Bot commented on HDFS-17161:
---

hadoop-yetus commented on PR #5956:
URL: https://github.com/apache/hadoop/pull/5956#issuecomment-1682117418

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  14m 19s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 12s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 28s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 53s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/1/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 6 new + 3 unchanged - 
0 fixed = 9 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  37m 59s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 226m 15s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 388m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5956 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d3aa8279a157 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / aa9c54677f107d2cb8daf7783e9d10f10c5fb900 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/1/testReport/ |
   | Max. process+thread count | 3170 (vs. 

[jira] [Commented] (HDFS-17161) Adding test for StripedBlockReader#createBlockReader leaks socket on IOException

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755503#comment-17755503
 ] 

ASF GitHub Bot commented on HDFS-17161:
---

hadoop-yetus commented on PR #5956:
URL: https://github.com/apache/hadoop/pull/5956#issuecomment-1682114560

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   3m 28s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  45m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/2/artifact/out/blanks-eol.txt)
 |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   1m  5s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 6 new + 3 unchanged - 
0 fixed = 9 total (was 3)  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 227m  3s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 53s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 376m 49s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5956/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5956 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux ad1bc585f255 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 
13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / aa9c54677f107d2cb8daf7783e9d10f10c5fb900 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers

2023-08-17 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755378#comment-17755378
 ] 

Shilun Fan commented on HDFS-17128:
---

[~hchaverri] Sorry for the late reply, Thank you very much for your 
contribution! I will help merge the code to branch-3.3.

> RBF: SQLDelegationTokenSecretManager should use version of tokens updated by 
> other routers
> --
>
> Key: HDFS-17128
> URL: https://issues.apache.org/jira/browse/HDFS-17128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-17128-branch-3.3.patch
>
>
> The SQLDelegationTokenSecretManager keeps tokens that it has interacted with 
> in a memory cache. This prevents routers from connecting to the SQL server 
> for each token operation, improving performance.
> We've noticed issues with some tokens being loaded in one router's cache and 
> later renewed on a different one. If clients try to use the token in the 
> outdated router, it will throw an "Auth failed" error when the cached token's 
> expiration has passed.
> This can also affect cancelation scenarios since a token can be removed from 
> one router's cache and still exist in another one.
> A possible solution is already implemented on the 
> ZKDelegationTokenSecretManager, which consists of having an executor 
> refreshing each router's cache on a periodic basis. We should evaluate 
> whether this will work with the volume of tokens expected to be handled by 
> the SQLDelegationTokenSecretManager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17156) mapreduce job encounters java.io.IOException

2023-08-17 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755375#comment-17755375
 ] 

ASF GitHub Bot commented on HDFS-17156:
---

chunyiyang commented on PR #5951:
URL: https://github.com/apache/hadoop/pull/5951#issuecomment-1681667372

   This PR has passed all the checks, but seems like I don't have the 
permission to merge. Could someone help to merge this if possible? Thanks in 
advance!




> mapreduce job encounters java.io.IOException
> 
>
> Key: HDFS-17156
> URL: https://issues.apache.org/jira/browse/HDFS-17156
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chunyi Yang
>Assignee: Chunyi Yang
>Priority: Minor
>  Labels: Observer, RBF, pull-request-available
>
> While executing a mapreduce job in an environment utilizing Router-Based 
> Federation with Observer read enabled, there is an estimated 1% chance of 
> encountering the following error.
> {code:java}
> "java.io.IOException: Resource 
> hdfs:///user//.staging/job_XX/.tez/application_XX/tez-conf.pb 
> changed on src filesystem - expected: \"2023-07-07T12:41:16.801+0900\", was: 
> \"2023-07-07T12:41:16.822+0900\", current time: 
> \"2023-07-07T12:41:22.386+0900\"",
> {code}
> This error happens in function verifyAndCopy inside FSDownload.java when 
> nodemanager tries to download a file right after the file has been written to 
> the HDFS. The write operation runs on active namenode and read operation runs 
> on observer namenode as expected.
> The edits file and hdfs-audit files indicate that the expected timestamp 
> mentioned in the error message aligns with the OP_CLOSE MTIME of the 
> 'tez-conf.pb' file (which is correct). However, the actual timestamp 
> retrieved from the read operation corresponds to the OP_ADD MTIME of the 
> target 'tez-conf.pf' file (which is incorrect). This inconsistency suggests 
> that the observer namenode responds to the client before its edits file is 
> updated with the latest stateId.
> Further troubleshooting has revealed that during write operations, the router 
> responds to the client before receiving the latest stateId from the active 
> namenode. Consequently, the outdated stateId is then used in the subsequent 
> read operation on the observer namenode, leading to inaccuracies in the 
> information provided by the observer namenode.
> To resolve this issue, it is essential to ensure that the router sends a 
> response to the client only after receiving the latest stateId from the 
> active namenode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org