[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834936#comment-16834936 ] Lukas Majercak commented on HDFS-14134: --- Hi [~John Smith]. I'm not sure it's that simple. Say you have 2 NNs, where nn1 is active and nn2 is standby. If your current target is nn2 and we send a request to both, you can get responses like:
nn1 - RETRY
nn2 - FAILOVER_AND_RETRY
In this case, the ideal outcome would be to fail over to nn1, but what you're proposing would not do that. I think this obviously has room for improvement, as in some cases RETRY > FAILOVER is reasonable, but I'd like to avoid increasing the scope of this JIRA. Maybe we can create a new one and have the discussion there?

> Idempotent operations throwing RemoteException should not be retried by the client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, HDFS-14134.006.patch, HDFS-14134.007.patch, HDFS-14134_retrypolicy_change_proposal.pdf, HDFS-14134_retrypolicy_change_proposal_1.pdf
>
> Currently, some operations that throw IOException on the NameNode are evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail fast.
> For example, when calling getXAttr("user.some_attr", file) where the file does not have the attribute, the NN throws an IOException with the message "could not find attr". The current client retry policy determines the action for that to be FAILOVER_AND_RETRY. The client then fails over and retries until it reaches the maximum number of retries. Ideally, the client should be able to tell that this exception is expected and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at all the retry actions from all requests, and FAILOVER_AND_RETRY takes precedence over the FAIL action.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
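The precedence problem in the last paragraph of the issue description can be modelled in a few lines. This is a simplified, hypothetical sketch, not Hadoop's code. It assumes three things: each hedged request yields one decision, the handler acts on the decision with the highest precedence, and precedence follows enum declaration order. `PrePatch` encodes the ordering implied by the description (FAIL < RETRY < FAILOVER_AND_RETRY). `Patch007` encodes the ordering that the later comments attribute to patch 007 (RETRY < FAILOVER_AND_RETRY < FAIL).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of how per-request retry decisions are combined when a
// call is hedged across several NameNodes. Not Hadoop's actual classes.
class DecisionPrecedence {
  // Pre-patch precedence implied by the issue description:
  // FAIL < RETRY < FAILOVER_AND_RETRY.
  enum PrePatch { FAIL, RETRY, FAILOVER_AND_RETRY }

  // Precedence attributed to patch 007 in the comments:
  // RETRY < FAILOVER_AND_RETRY < FAIL.
  enum Patch007 { RETRY, FAILOVER_AND_RETRY, FAIL }

  // The combined decision is the one with the highest precedence,
  // i.e. the largest value in the enum's declaration order.
  static <E extends Enum<E>> E combine(List<E> perRequest) {
    return Collections.max(perRequest);
  }

  public static void main(String[] args) {
    // The active NameNode's error is classified as FAIL; the standby
    // answers with StandbyException, which maps to FAILOVER_AND_RETRY.
    PrePatch before = combine(Arrays.asList(PrePatch.FAIL, PrePatch.FAILOVER_AND_RETRY));
    Patch007 after = combine(Arrays.asList(Patch007.FAIL, Patch007.FAILOVER_AND_RETRY));
    System.out.println("pre-patch: " + before); // pre-patch: FAILOVER_AND_RETRY
    System.out.println("patch 007: " + after);  // patch 007: FAIL
  }
}
```

Under the pre-patch ordering the standby's answer masks the FAIL, so the client keeps failing over. Moving FAIL to the top of the ordering lets the real error surface immediately.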
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834323#comment-16834323 ] Yuxuan Wang commented on HDFS-14134: Hi [~lukmajercak]. No. I just noticed that {{StandbyException}} always triggers {{FAILOVER_AND_RETRY}}. With the hedging proxy, the {{FAILOVER_AND_RETRY}} action will therefore always be returned, and it will override every action ordered below {{FAILOVER_AND_RETRY}}. I mean that the order of {{enum RetryDecision}} in patch 007 should perhaps be {{FAILOVER_AND_RETRY < RETRY < FAIL}}. In short, {{FAILOVER_AND_RETRY}} should have the lowest order.
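The trade-off between the two orderings can be checked against the two-NameNode scenario from Lukas Majercak's reply at the top of this thread. In that scenario nn1 (active) answers RETRY, nn2 (standby) answers FAILOVER_AND_RETRY, and the client currently targets nn2. This is a hypothetical sketch, not Hadoop code. It assumes the highest-precedence decision wins, RETRY re-sends to the current target, and FAILOVER_AND_RETRY switches to the other NameNode. `Patch007` and `Proposed` are illustrative names for the two orderings under discussion.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical comparison of two precedence orderings for hedged retry
// decisions. Names are illustrative, not Hadoop's actual classes.
class OrderingTradeoff {
  // Ordering in patch 007: RETRY < FAILOVER_AND_RETRY < FAIL.
  enum Patch007 { RETRY, FAILOVER_AND_RETRY, FAIL }

  // Ordering proposed in this comment: FAILOVER_AND_RETRY < RETRY < FAIL.
  enum Proposed { FAILOVER_AND_RETRY, RETRY, FAIL }

  // The highest-precedence decision (largest declaration order) wins.
  static <E extends Enum<E>> E combine(List<E> perRequest) {
    return Collections.max(perRequest);
  }

  // Where the next attempt goes: RETRY stays on the current target,
  // FAILOVER_AND_RETRY switches to the other NameNode, FAIL stops (null).
  static String nextTarget(Enum<?> decision, String current, String other) {
    switch (decision.name()) {
      case "RETRY":
        return current;
      case "FAILOVER_AND_RETRY":
        return other;
      default:
        return null;
    }
  }

  public static void main(String[] args) {
    // nn1 (active) answers RETRY; nn2 (standby) answers FAILOVER_AND_RETRY.
    // The client's current target is nn2.
    Patch007 a = combine(Arrays.asList(Patch007.RETRY, Patch007.FAILOVER_AND_RETRY));
    Proposed b = combine(Arrays.asList(Proposed.RETRY, Proposed.FAILOVER_AND_RETRY));
    System.out.println(a + " -> " + nextTarget(a, "nn2", "nn1")); // FAILOVER_AND_RETRY -> nn1
    System.out.println(b + " -> " + nextTarget(b, "nn2", "nn1")); // RETRY -> nn2
  }
}
```

Under patch 007 the standby's FAILOVER_AND_RETRY masks the active's RETRY, which is the concern raised here, but the client does move to the active nn1. Under the proposed ordering RETRY wins and the next attempt goes back to the standby nn2, which is the objection in the reply. Both orderings still let FAIL override everything else.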
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832730#comment-16832730 ] Lukas Majercak commented on HDFS-14134: --- Hi [~John Smith]. I'm not sure I'm following. What's the concern with StandbyException triggering FAILOVER_AND_RETRY under the logic in patch 007?
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828887#comment-16828887 ] Hadoop QA commented on HDFS-14134: --
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 7s | HDFS-14134 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954490/HDFS-14134.007.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/26723/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828886#comment-16828886 ] Yuxuan Wang commented on HDFS-14134: Hello, is anyone working on this? I found a bug in {{org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider}}. Just like [~lukmajercak] said: {quote}Also note that previously, if a hedging request got FAILOVER_RETRY and some request got SocketExc on nonidempotent operation (e.g. FAIL), the client would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue here as well. {quote} But beyond this, a standby NameNode always throws back a StandbyException, which causes a {{FAILOVER_AND_RETRY}} action. That action overrides every action ordered below {{FAILOVER_AND_RETRY}}, such as {{RETRY}} in [^HDFS-14134.007.patch]. I mean, shouldn't the correct order be {{Ordering: FAILOVER_AND_RETRY < RETRY < FAIL}}?
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767831#comment-16767831 ] Íñigo Goiri commented on HDFS-14134: As nobody other than [~knanasi] has picked up the review, I'll give it a try. I've tested the equivalent of {{testIdempotentOperationShouldNotGetStuckInRetries()}}, and currently it retries on all NameNodes, which I think is wrong, as stated in the JIRA. From the unit test, the new behavior seems better. I would improve the unit test a little:
* Use {{LambdaTestUtils#intercept()}}.
* Can we constrain the verification further and use a tighter check instead of {{atMost()}}?
* Are there other checks we can add here? Negative cases? Are methods with FAILOVER_AND_RETRY and FAIL fully covered?
BTW, [~lukmajercak], have you tested this in your setup? At what scale? What impact have you seen?
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740810#comment-16740810 ] Lukas Majercak commented on HDFS-14134: --- + more people [~szetszwo], [~jingzhao].
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740802#comment-16740802 ] Lukas Majercak commented on HDFS-14134: --- Thanks [~knanasi]. [~atm], [~eli], [~sureshms], [~sanjay.radia], [~xgong], [~jianhe]: I see you have worked on this part of the codebase before. Is anyone available to review this? Thanks!
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740605#comment-16740605 ] Íñigo Goiri commented on HDFS-14134: [^HDFS-14134.007.patch] LGTM. However, I have limited experience with this part of the client. It would be nice for others to give it a look. [~lukmajercak], from the previous commits, who do you think would be a good candidate for review?
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740232#comment-16740232 ] Kitti Nanasi commented on HDFS-14134: - Thanks [~lukmajercak] for the work here! {quote}Also note that previously, if a hedging request got FAILOVER_RETRY and some request got SocketExc on nonidempotent operation (e.g. FAIL), the client would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue here as well. {quote} Sounds good that you found and fixed this issue as well. +1 (non-binding) New patch looks good to me!
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739894#comment-16739894 ] Hadoop QA commented on HDFS-14134: --
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 2s | trunk passed |
| +1 | compile | 15m 49s | trunk passed |
| +1 | checkstyle | 3m 25s | trunk passed |
| +1 | mvnsite | 2m 6s | trunk passed |
| +1 | shadedclient | 19m 24s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 20s | trunk passed |
| +1 | javadoc | 1m 35s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 16m 51s | the patch passed |
| +1 | javac | 16m 51s | the patch passed |
| +1 | checkstyle | 3m 27s | root: The patch generated 0 new + 108 unchanged - 10 fixed = 108 total (was 118) |
| +1 | mvnsite | 2m 17s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 53s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 46s | the patch passed |
| +1 | javadoc | 1m 37s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 39s | hadoop-common in the patch failed. |
| +1 | unit | 1m 55s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 119m 2s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.security.ssl.TestSSLFactory |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954490/HDFS-14134.007.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 4ec5ff729e55 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33c009a4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739821#comment-16739821 ] Lukas Majercak commented on HDFS-14134: --- Patch 007 to fix checkstyle + whitespace warnings.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739809#comment-16739809 ] Hadoop QA commented on HDFS-14134: --
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for branch |
| +1 | mvninstall | 21m 54s | trunk passed |
| +1 | compile | 19m 37s | trunk passed |
| +1 | checkstyle | 3m 42s | trunk passed |
| +1 | mvnsite | 2m 15s | trunk passed |
| +1 | shadedclient | 18m 58s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 49s | trunk passed |
| +1 | javadoc | 1m 32s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 42s | the patch passed |
| +1 | compile | 19m 15s | the patch passed |
| +1 | javac | 19m 15s | the patch passed |
| -0 | checkstyle | 4m 4s | root: The patch generated 10 new + 108 unchanged - 10 fixed = 118 total (was 118) |
| +1 | mvnsite | 2m 31s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 12m 24s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 4m 7s | the patch passed |
| +1 | javadoc | 1m 35s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 9m 30s | hadoop-common in the patch failed. |
| +1 | unit | 2m 5s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 41s | The patch does not generate ASF License warnings. |
| | | 127m 39s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.security.ssl.TestSSLFactory |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954472/HDFS-14134.006.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 40318a766566 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33c009a4 |
| maven | version: Apache
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739719#comment-16739719 ] Lukas Majercak commented on HDFS-14134:
---
Added patch 006 together with HDFS-14134_retrypolicy_change_proposal_1.pdf to explain the changes. After the discussion, it seems that just changing the logic for remote IOExceptions, together with the priority of retry actions, will be enough. Could you review this, [~knanasi]? Thanks!

> Idempotent operations throwing RemoteException should not be retried by the
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs, hdfs-client, ipc
> Reporter: Lukas Majercak
> Assignee: Lukas Majercak
> Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch,
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch,
> HDFS-14134.006.patch, HDFS-14134_retrypolicy_change_proposal.pdf,
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail
> fast.
> For example, when calling getXAttr("user.some_attr", file) where the file
> does not have the attribute, NN throws an IOException with message "could not
> find attr". The current client retry policy determines the action for that to
> be FAILOVER_AND_RETRY. The client then fails over and retries until it
> reaches the maximum number of retries. Ideally, the client should be able
> to tell that this exception is normal and fail fast.
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes
> precedence over the FAIL action.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739704#comment-16739704 ] Hadoop QA commented on HDFS-14134:
--
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HDFS-14134 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25956/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738702#comment-16738702 ] Lukas Majercak commented on HDFS-14134:
---
Also note that previously, if one hedged request got FAILOVER_AND_RETRY and another got a SocketException on a non-idempotent operation (i.e. FAIL), the client would still pick FAILOVER_AND_RETRY over FAIL, so I think we are fixing an issue here as well.
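To make the precedence issue concrete, here is a minimal Java sketch. It is a simplified model with hypothetical names ({{Decision}}, {{HedgedRetryMerge}}), not Hadoop's {{RetryPolicy.RetryAction}} or {{RetryInvocationHandler}} code: each hedged request yields one decision, and the client acts on the highest-priority one. The old ordering follows the issue description (FAILOVER_AND_RETRY beats FAIL); the new ordering (FAIL, then FAILOVER_AND_RETRY, then RETRY) is an assumption drawn from this thread. It needs Java 9+ for {{List.of}}.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of a per-request retry decision.
// Not Hadoop's RetryPolicy.RetryAction.RetryDecision.
enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

final class HedgedRetryMerge {
  private HedgedRetryMerge() {}

  // Old behaviour as described in the issue: the highest ordinal wins,
  // so FAILOVER_AND_RETRY takes precedence over FAIL.
  static final Comparator<Decision> OLD_PRIORITY =
      Comparator.comparingInt(Decision::ordinal);

  // Assumed new ranking: FAIL > FAILOVER_AND_RETRY > RETRY.
  static int newRank(Decision d) {
    switch (d) {
      case FAIL:
        return 2;
      case FAILOVER_AND_RETRY:
        return 1;
      default:
        return 0; // RETRY
    }
  }

  static final Comparator<Decision> NEW_PRIORITY =
      Comparator.comparingInt(HedgedRetryMerge::newRank);

  // Returns the decision the client acts on, given one decision per hedged request.
  static Decision merge(List<Decision> perRequest, Comparator<Decision> priority) {
    return perRequest.stream()
        .max(priority)
        .orElseThrow(() -> new IllegalArgumentException("no decisions"));
  }

  public static void main(String[] args) {
    // One request says FAILOVER_AND_RETRY; another hit a SocketException on a
    // non-idempotent call, which maps to FAIL.
    List<Decision> decisions = List.of(Decision.FAILOVER_AND_RETRY, Decision.FAIL);
    System.out.println("old: " + merge(decisions, OLD_PRIORITY)); // old: FAILOVER_AND_RETRY
    System.out.println("new: " + merge(decisions, NEW_PRIORITY)); // new: FAIL
  }
}
```

Keeping the ranking in a separate comparator, rather than relying on enum declaration order, makes the precedence an explicit, reviewable choice.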
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738700#comment-16738700 ] Lukas Majercak commented on HDFS-14134:
---
I see, that makes sense. I'm happy to change SocketException (non-idempotent) and IOException (non-idempotent) back to FAIL.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720572#comment-16720572 ] Kitti Nanasi commented on HDFS-14134:
-
The relevant part is the following:
{quote}
in FailoverOnNetworkExceptionRetry#shouldRetry we don't fail-over and retry if we're making a non-idempotent call and there's an IOException or SocketException that's not Connect, NoRouteToHost, UnknownHost, or Standby. The rationale of course is that the operation may have reached the server and retrying elsewhere could leave us in an inconsistent state. This means if a client doing a create/delete which gets a SocketTimeoutException (which is an IOE) or an EOF SocketException the exception will be thrown all the way up to the caller of FileSystem/FileContext. That's reasonable because only the user of the API at this level has sufficient knowledge of how to handle the failure, eg if they get such an exception after issuing a delete they can check if the file still exists and if so re-issue the delete (however they may also not want to do this, and FileContext doesn't know which).
{quote}
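The quoted rule boils down to one question: does the exception prove that the request never reached a server? A minimal Java sketch of that classification, using JDK exception types; {{StandbyLikeException}} is a hypothetical stand-in for Hadoop's {{StandbyException}}, and {{FailoverSafety.mayFailover}} is an illustrative name, not the actual {{FailoverOnNetworkExceptionRetry#shouldRetry}} code (which handles many more cases, such as retriable, access-control and remote exceptions).

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical stand-in for Hadoop's StandbyException (not the real class).
class StandbyLikeException extends IOException {
  StandbyLikeException(String msg) {
    super(msg);
  }
}

final class FailoverSafety {
  private FailoverSafety() {}

  // True if the exception shows the call never executed on the server,
  // so trying another server cannot duplicate its effects.
  static boolean requestDidNotReachServer(IOException e) {
    return e instanceof ConnectException
        || e instanceof NoRouteToHostException
        || e instanceof UnknownHostException
        || e instanceof StandbyLikeException;
  }

  // Sketch of the quoted rule: a non-idempotent call only fails over when
  // the request provably did not reach the server.
  static boolean mayFailover(IOException e, boolean idempotentOrAtMostOnce) {
    return idempotentOrAtMostOnce || requestDidNotReachServer(e);
  }

  public static void main(String[] args) {
    IOException timeout = new SocketTimeoutException("read timed out");
    IOException refused = new ConnectException("connection refused");
    System.out.println(mayFailover(timeout, false)); // false: the delete may have run
    System.out.println(mayFailover(refused, false)); // true: never connected
    System.out.println(mayFailover(new EOFException(), true)); // true: idempotent
  }
}
```

A timeout or EOF on a create/delete is exactly the ambiguous case the quote describes, so the sketch surfaces it to the caller instead of failing over.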
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720564#comment-16720564 ] Lukas Majercak commented on HDFS-14134:
---
I'll go through that discussion.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720563#comment-16720563 ] Kitti Nanasi commented on HDFS-14134:
-
Yes, this change covers that. I just wanted to understand why you changed it like that, but we're pretty much on the same page now. I have only one concern: the case of non-remote IOExceptions on non-idempotent operations. I'm not sure whether retrying those will cause any problems. For reference, there is a discussion on [HADOOP-7380|https://issues.apache.org/jira/browse/HADOOP-7380] on why it was introduced. Other than that, patch v5 looks good.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720506#comment-16720506 ] Lukas Majercak commented on HDFS-14134:
---
I agree non-remote IOExceptions could be network related, but isn't this already covered? With this change, non-remote IOExceptions are retried regardless of whether the operation is idempotent.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720507#comment-16720507 ] Lukas Majercak commented on HDFS-14134:
---
I'd argue that this change is even safer, because previously the retry action would be FAIL for:
* SocketExceptions (non-idempotent)
* Non-remote IOExceptions (non-idempotent)
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720204#comment-16720204 ] Kitti Nanasi commented on HDFS-14134:
-
I totally agree with you that retrying getXAttr on the "could not find attr" IOException is wasteful, and that we need a better concept than the current one. But we also have to keep in mind that the FailoverOnNetworkExceptionRetry policy is used by many parts of the code, so it is a bit risky to change it. I think the idea behind the previous design is that non-remote IOExceptions may be network-related, so it is worth retrying them if the operation is idempotent.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719444#comment-16719444 ] Lukas Majercak commented on HDFS-14134:
---
Retrying failed idempotent operations might be safe, but it is certainly wasteful. Check the test I wrote for getXAttr: the client just retries on the same exception over and over for no reason. We need a concept of non-retriable exceptions in HDFS, and I feel like a RemoteException from an idempotent operation is a very good start. The previous design was very strange: it chose to FAIL if the operation was not idempotent and the exception was not remote, which does not make a lot of sense to me.
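The wasted work described here can be reproduced with a toy retry loop. This is a sketch under stated assumptions: {{ServerSideException}} is a hypothetical stand-in for {{RemoteException}}, and {{RetryLoop.invoke}} is an illustrative helper, not Hadoop's {{RetryInvocationHandler}}. With a policy that always retries, a call that keeps failing with a server-side "could not find attr" error is attempted 1 + maxRetries times; failing fast on server-side errors from idempotent calls stops after a single attempt.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

// Hypothetical marker for an error raised by the server itself, loosely
// analogous to Hadoop's RemoteException (not the real class).
class ServerSideException extends IOException {
  ServerSideException(String msg) {
    super(msg);
  }
}

final class RetryLoop {
  private RetryLoop() {}

  // Always retries: mimics the behaviour reported for getXAttr.
  static final BiPredicate<Exception, Boolean> ALWAYS = (e, idempotent) -> true;

  // Fails fast on a server-side error from an idempotent call.
  static final BiPredicate<Exception, Boolean> FAIL_FAST_REMOTE =
      (e, idempotent) -> !(idempotent && e instanceof ServerSideException);

  // Calls op until it succeeds, the policy says stop, or maxRetries retries
  // have been used; the last exception is rethrown on failure.
  static <T> T invoke(Callable<T> op, boolean idempotent, int maxRetries,
      BiPredicate<Exception, Boolean> shouldRetry) throws Exception {
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        return op.call();
      } catch (Exception e) {
        if (attempts > maxRetries || !shouldRetry.test(e, idempotent)) {
          throw e;
        }
      }
    }
  }

  public static void main(String[] args) {
    for (BiPredicate<Exception, Boolean> policy : List.of(ALWAYS, FAIL_FAST_REMOTE)) {
      AtomicInteger calls = new AtomicInteger();
      try {
        invoke(() -> {
          calls.incrementAndGet();
          throw new ServerSideException("could not find attr");
        }, true, 10, policy);
      } catch (Exception e) {
        // Prints "11 call(s), ..." for ALWAYS and "1 call(s), ..." for FAIL_FAST_REMOTE.
        System.out.println(calls.get() + " call(s), then: " + e.getMessage());
      }
    }
  }
}
```

Both policies surface the same exception to the caller; the difference is only in how many round trips are spent before doing so.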
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718752#comment-16718752 ] Kitti Nanasi commented on HDFS-14134:
-
[~lukmajercak], you are correct on the definition of idempotency. I think the original approach to retrying was that idempotent operations don't change internal state, so it is safe to retry them. For example, if you just get a value, it is always safe to retry. But if you renew a delegation token, whether it is safe to retry is a more complex question: maybe the renewal already took place before failing, maybe not, and if it already took place, is it safe to renew it again? By the way, idempotency was originally only considered in the case of non-remote IOExceptions; why did that change?
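The delegation-token example turns on the difference between an operation whose repeated application leaves the same state and one that does not. A toy sketch, assuming a hypothetical in-memory store ({{ToyTokenStore}} is not Hadoop's delegation token API, and real renewal semantics may differ): setting an absolute expiry is idempotent, while extending the current expiry by a fixed amount is not, so a client that retries after an ambiguous failure can end up renewing twice.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory token store; hypothetical, not Hadoop's delegation token
// secret manager.
final class ToyTokenStore {
  private final Map<String, Long> expiry = new HashMap<>();

  // Idempotent: applying it twice leaves the same state as applying it once.
  void setExpiry(String token, long expiresAtMillis) {
    expiry.put(token, expiresAtMillis);
  }

  // Not idempotent: each call moves the expiry further out
  // (for an unknown token, the toy simply starts from the extension).
  void renew(String token, long extensionMillis) {
    expiry.merge(token, extensionMillis, Long::sum);
  }

  long expiryOf(String token) {
    return expiry.getOrDefault(token, 0L);
  }

  public static void main(String[] args) {
    ToyTokenStore store = new ToyTokenStore();
    store.setExpiry("t1", 1_000L);
    store.setExpiry("t1", 1_000L); // a retried set changes nothing
    System.out.println(store.expiryOf("t1")); // 1000

    store.renew("t1", 500L);
    store.renew("t1", 500L); // a retry after a lost reply renews twice
    System.out.println(store.expiryOf("t1")); // 2000
  }
}
```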
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717754#comment-16717754 ] Lukas Majercak commented on HDFS-14134:
---
Why should we retry if the operation is idempotent and the exception is a RemoteException? The definition of an idempotent operation is that it will have the same outcome next time as well, right? In that case, we should just fail fast. Check TestRequestHedgingProxyProvider.testIdempotentOperationShouldNotGetStuckInRetries.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716690#comment-16716690 ] Kitti Nanasi commented on HDFS-14134:
-
Thanks for the new patch [~lukmajercak]! It looks better regarding retrying on non-remote IOExceptions, but there is one thing I don't understand, which I think is wrong in the pdf as well. In the case of a remote IOException, we should retry if the operation is idempotent, not the opposite. So instead of this code:
{code:java}
else if (e instanceof IOException) {
  if (e instanceof RemoteException && isIdempotentOrAtMostOnce) {
    return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
        "Remote exception and the invoked method is idempotent "
            + "or at most once.");
  }
  return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
      getFailoverOrRetrySleepTime(failovers));
}
{code}
I think it should look like this:
{code:java}
else if (e instanceof IOException) {
  if (e instanceof RemoteException && !isIdempotentOrAtMostOnce) {
    return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
        "Remote exception and the invoked method is neither idempotent "
            + "nor at most once.");
  }
  return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
      getFailoverOrRetrySleepTime(failovers));
}
{code}
What do you think [~lukmajercak]?
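The two snippets differ only in one negation, so it can help to lay out the whole decision table. A minimal, self-contained Java sketch, assuming a simplified model: {{Action}} and {{RemoteIoeBranch}} are hypothetical names, and only the remote/idempotent split of this IOException branch is modelled, not the full {{FailoverOnNetworkExceptionRetry#shouldRetry}}.

```java
// Simplified, hypothetical model of the IOException branch discussed above.
enum Action { FAIL, FAILOVER_AND_RETRY }

final class RemoteIoeBranch {
  private RemoteIoeBranch() {}

  // Variant in the patch under review: fail fast on a remote exception
  // from an idempotent (or at-most-once) call.
  static Action patchVariant(boolean remote, boolean idempotentOrAtMostOnce) {
    return (remote && idempotentOrAtMostOnce) ? Action.FAIL : Action.FAILOVER_AND_RETRY;
  }

  // Variant proposed in the comment: fail on a remote exception only when
  // the call is neither idempotent nor at-most-once.
  static Action proposedVariant(boolean remote, boolean idempotentOrAtMostOnce) {
    return (remote && !idempotentOrAtMostOnce) ? Action.FAIL : Action.FAILOVER_AND_RETRY;
  }

  public static void main(String[] args) {
    System.out.println("remote idempotent | patch              | proposed");
    for (boolean remote : new boolean[] {true, false}) {
      for (boolean idem : new boolean[] {true, false}) {
        System.out.printf("%-6s %-10s | %-18s | %s%n", remote, idem,
            patchVariant(remote, idem), proposedVariant(remote, idem));
      }
    }
  }
}
```

In this model the variants disagree only for remote exceptions; both send every non-remote IOException to FAILOVER_AND_RETRY, which is the case the later comments about non-idempotent SocketException/IOException come back to.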
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716043#comment-16716043 ]

Íñigo Goiri commented on HDFS-14134:

I think the change in semantics makes sense; +1 on that.
Regarding [^HDFS-14134.005.patch] itself:
* There are a bunch of changes that are purely cosmetic; I'm on the fence about what to do with them.
* For checking the exception in {{testIdempotentOperationShouldNotGetStuckInRetries()}}, we may want to just use {{LambdaTestUtils#intercept}}.
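The {{LambdaTestUtils#intercept}} suggestion replaces a try/fail/catch block with a single call. That call runs a lambda, asserts that it throws the expected exception type (optionally with a message substring), and returns the exception for further checks. Hadoop's test utility class LambdaTestUtils (package org.apache.hadoop.test) provides intercept overloads of this shape. Its exact overloads should be checked against the Hadoop version in use. Because it is not on a plain JDK classpath, the sketch below uses a hypothetical local stand-in with the assumed signature `intercept(Class<E>, String, Callable<T>)`. It also uses a hypothetical `getXAttr` that fails the way the issue describes.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical stand-in for the intercept() pattern; the real helper lives in
// Hadoop's test code and its exact overloads may differ.
class InterceptSketch {
  // Runs eval, asserts it throws an instance of clazz whose message contains
  // `contained` (if non-null), and returns that exception for further checks.
  static <T, E extends Throwable> E intercept(
      Class<E> clazz, String contained, Callable<T> eval) throws Exception {
    T result = null;
    try {
      result = eval.call();
    } catch (Throwable t) {
      if (!clazz.isInstance(t)) {
        throw new AssertionError("Expected " + clazz.getName() + " but got " + t, t);
      }
      String msg = t.getMessage();
      if (contained != null && (msg == null || !msg.contains(contained))) {
        throw new AssertionError("Message does not contain \"" + contained + "\": " + t, t);
      }
      return clazz.cast(t);
    }
    // Reaching here means no exception was raised, which is a test failure.
    throw new AssertionError("Expected " + clazz.getName() + " but the call returned " + result);
  }

  // Hypothetical client call that fails the way the issue describes.
  static byte[] getXAttr(String path, String name) throws IOException {
    throw new IOException("could not find attr " + name + " on " + path);
  }

  public static void main(String[] args) throws Exception {
    IOException e = intercept(IOException.class, "could not find attr",
        () -> getXAttr("/file", "user.some_attr"));
    System.out.println("Intercepted: " + e.getMessage());
  }
}
```

Running `main` prints `Intercepted: could not find attr user.some_attr on /file`. If the call returned normally instead, intercept would throw an AssertionError. The test therefore needs no explicit `fail()` call.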
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716033#comment-16716033 ]

Hadoop QA commented on HDFS-14134:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 6m 11s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 52s | trunk passed |
| +1 | compile | 16m 59s | trunk passed |
| +1 | checkstyle | 3m 34s | trunk passed |
| +1 | mvnsite | 2m 15s | trunk passed |
| +1 | shadedclient | 18m 50s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 37s | trunk passed |
| +1 | javadoc | 1m 38s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 37s | the patch passed |
| +1 | compile | 16m 10s | the patch passed |
| +1 | javac | 16m 10s | the patch passed |
| +1 | checkstyle | 3m 30s | root: The patch generated 0 new + 108 unchanged - 10 fixed = 108 total (was 118) |
| +1 | mvnsite | 2m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 25s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 56s | the patch passed |
| +1 | javadoc | 1m 36s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 34s | hadoop-common in the patch passed. |
| +1 | unit | 1m 44s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 39s | The patch does not generate ASF License warnings. |
| | | 123m 59s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951288/HDFS-14134.005.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux fc8eabb705f3 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3ff8580 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/25752/testReport/ |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715870#comment-16715870 ]

Hadoop QA commented on HDFS-14134:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 1s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 57s | trunk passed |
| +1 | compile | 14m 57s | trunk passed |
| +1 | checkstyle | 2m 44s | trunk passed |
| +1 | mvnsite | 1m 50s | trunk passed |
| +1 | shadedclient | 15m 46s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 6s | trunk passed |
| +1 | javadoc | 1m 21s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 22s | the patch passed |
| +1 | compile | 13m 58s | the patch passed |
| +1 | javac | 13m 58s | the patch passed |
| -0 | checkstyle | 3m 26s | root: The patch generated 1 new + 108 unchanged - 10 fixed = 109 total (was 118) |
| +1 | mvnsite | 1m 53s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 18s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 28s | the patch passed |
| +1 | javadoc | 1m 23s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 7m 56s | hadoop-common in the patch passed. |
| +1 | unit | 1m 40s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 104m 33s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951276/HDFS-14134.004.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ac796f6b2ff0 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80e59e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715863#comment-16715863 ]

Lukas Majercak commented on HDFS-14134:

Uploaded patch 005 to fix a minor checkstyle issue in UnreliableImplementation.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715862#comment-16715862 ]

Hadoop QA commented on HDFS-14134:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 2s | Maven dependency ordering for branch |
| +1 | mvninstall | 36m 12s | trunk passed |
| +1 | compile | 14m 32s | trunk passed |
| +1 | checkstyle | 3m 2s | trunk passed |
| +1 | mvnsite | 2m 9s | trunk passed |
| +1 | shadedclient | 17m 33s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 23s | trunk passed |
| +1 | javadoc | 1m 45s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 25s | the patch passed |
| +1 | compile | 14m 11s | the patch passed |
| +1 | javac | 14m 11s | the patch passed |
| -0 | checkstyle | 3m 39s | root: The patch generated 1 new + 108 unchanged - 10 fixed = 109 total (was 118) |
| +1 | mvnsite | 1m 50s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 6s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 45s | the patch passed |
| +1 | javadoc | 1m 38s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 1s | hadoop-common in the patch failed. |
| +1 | unit | 1m 44s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 126m 3s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestReadWriteDiskValidator |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951272/HDFS-14134.003.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux a4f98f6eea35 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80e59e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715705#comment-16715705 ]

Lukas Majercak commented on HDFS-14134:

Added tests in patch004 to cover all cases (SocketException, IOException, RemoteException; idempotent and non-idempotent). Is anyone available to review?
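The case matrix named in patch004's description (SocketException, IOException, and RemoteException, each for idempotent and non-idempotent calls) can be enumerated systematically. Each pair is fed to the policy under test, so no combination is skipped. The sketch below is a hypothetical harness, not the tests in patch004. `RemoteIOException` stands in for Hadoop's RemoteException, and `Decision`, `runMatrix`, and the example policy are illustrative names. The example policy encodes only the patch 005 excerpt quoted in the review, so its SocketException rows may not match the full shouldRetry(), which can handle socket errors in an earlier branch.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical harness: enumerates every (exception kind, idempotency) pair so a
// retry policy can be checked against all of them.
class RetryMatrixSketch {
  enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

  // Stand-in for org.apache.hadoop.ipc.RemoteException.
  static class RemoteIOException extends IOException {
    RemoteIOException(String msg) {
      super(msg);
    }
  }

  // The three exception kinds named in the comment.
  static List<Exception> exceptionKinds() {
    List<Exception> kinds = new ArrayList<>();
    kinds.add(new SocketException("connection reset"));
    kinds.add(new IOException("local I/O failure"));
    kinds.add(new RemoteIOException("could not find attr"));
    return kinds;
  }

  // Applies the policy to all 3 x 2 combinations and returns one line per case.
  static List<String> runMatrix(BiFunction<Exception, Boolean, Decision> policy) {
    List<String> rows = new ArrayList<>();
    for (Exception e : exceptionKinds()) {
      for (boolean idempotent : new boolean[] {false, true}) {
        rows.add(e.getClass().getSimpleName() + " idempotent=" + idempotent
            + " -> " + policy.apply(e, idempotent));
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    // Example policy: the patch 005 excerpt (fail fast only for a remote
    // exception on an idempotent call, otherwise fail over and retry).
    BiFunction<Exception, Boolean, Decision> patch005 = (e, idempotent) ->
        (e instanceof RemoteIOException && idempotent)
            ? Decision.FAIL : Decision.FAILOVER_AND_RETRY;
    runMatrix(patch005).forEach(System.out::println);
  }
}
```

Swapping in the reviewer's variant only requires a different lambda. The six printed rows make it easy to compare what each candidate policy decides for every case.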
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715682#comment-16715682 ]

Lukas Majercak commented on HDFS-14134:

Fixed TestFailoverProxy as well; I may still need to add more tests to cover all the exception types.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715579#comment-16715579 ]

Lukas Majercak commented on HDFS-14134:

I realized I needed to change the mock expectations to fix TestLoadBalancingKMSClientProvider. Added patch003 to fix that.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715566#comment-16715566 ]

Hadoop QA commented on HDFS-14134:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 7s | Maven dependency ordering for branch |
| +1 | mvninstall | 20m 49s | trunk passed |
| +1 | compile | 15m 10s | trunk passed |
| +1 | checkstyle | 3m 22s | trunk passed |
| +1 | mvnsite | 2m 3s | trunk passed |
| +1 | shadedclient | 18m 18s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 29s | trunk passed |
| +1 | javadoc | 1m 33s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 31s | the patch passed |
| +1 | compile | 15m 29s | the patch passed |
| +1 | javac | 15m 29s | the patch passed |
| +1 | checkstyle | 3m 23s | the patch passed |
| +1 | mvnsite | 2m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 13s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 48s | the patch passed |
| +1 | javadoc | 1m 31s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 7m 56s | hadoop-common in the patch failed. |
| +1 | unit | 1m 48s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 114m 8s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider |
| | hadoop.io.retry.TestFailoverProxy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951240/HDFS-14134.002.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 53617b2e9409 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 64411a6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715545#comment-16715545 ]

Lukas Majercak commented on HDFS-14134:

Thanks for the review, [~knanasi]. I've uploaded patch002, which adds handling for non-remote IOExceptions and fixes TestDefaultRetryPolicy. It seems this should also fix TestLoadBalancingKMSClientProvider. Next, I'll fix TestFailoverProxy and add more tests there.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714464#comment-16714464 ]

Kitti Nanasi commented on HDFS-14134:

Thanks for the patch, [~lukmajercak]! The solution proposed in the pdf seems good to me. Looking at the code, however, no retry happens on non-remote IOExceptions at all, which differs from the behaviour described in the pdf. TestLoadBalancingKMSClientProvider#testClientRetriesIdempotentOpWithIOExceptionSucceedsSecondTime also fails because of that.
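The test name TestLoadBalancingKMSClientProvider#testClientRetriesIdempotentOpWithIOExceptionSucceedsSecondTime describes a specific behaviour. An idempotent call that fails once with a local (non-remote) IOException should be retried and succeed on the second attempt. A minimal sketch of that behaviour follows, as a hypothetical and simplified retry loop with no failover, sleep, or retry counters. `invoke`, `RemoteIOException`, and `maxRetries` are illustrative names, not Hadoop APIs. Treating RemoteIOException as non-retriable is an assumption made for illustration only, since how remote exceptions should be handled is exactly what this thread debates.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry loop: local IOExceptions on idempotent calls are retried,
// everything else fails fast.
class RetryLoopSketch {
  // Stand-in for org.apache.hadoop.ipc.RemoteException.
  static class RemoteIOException extends IOException {
    RemoteIOException(String msg) {
      super(msg);
    }
  }

  static <T> T invoke(Callable<T> call, boolean idempotent, int maxRetries) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        boolean retriable = idempotent && !(e instanceof RemoteIOException);
        if (!retriable || attempt >= maxRetries) {
          throw e;  // fail fast, or give up after maxRetries retries
        }
        // A real client would fail over to another provider and back off here.
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Fails with a local IOException on the first call, succeeds on the second.
    Callable<String> flaky = () -> {
      if (calls[0]++ == 0) {
        throw new IOException("connection dropped");
      }
      return "key-version";
    };
    System.out.println(invoke(flaky, true, 3) + " after " + calls[0] + " calls");
  }
}
```

Running `main` prints `key-version after 2 calls`. With `idempotent` set to false, the same flaky call would propagate the first IOException instead of retrying it.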
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713440#comment-16713440 ] Lukas Majercak commented on HDFS-14134: --- The unit tests are expected to fail; I can fix them once we agree on how the retry policy should behave.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713419#comment-16713419 ] Hadoop QA commented on HDFS-14134: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 8s | Maven dependency ordering for branch |
| +1 | mvninstall | 18m 36s | trunk passed |
| +1 | compile | 15m 1s | trunk passed |
| +1 | checkstyle | 3m 0s | trunk passed |
| +1 | mvnsite | 1m 52s | trunk passed |
| +1 | shadedclient | 16m 1s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 7s | trunk passed |
| +1 | javadoc | 1m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 20s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 27s | the patch passed |
| +1 | compile | 15m 8s | the patch passed |
| +1 | javac | 15m 8s | the patch passed |
| -0 | checkstyle | 2m 45s | root: The patch generated 2 new + 70 unchanged - 0 fixed = 72 total (was 70) |
| +1 | mvnsite | 1m 52s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 10m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 32s | the patch passed |
| +1 | javadoc | 1m 21s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 23s | hadoop-common in the patch failed. |
| +1 | unit | 1m 51s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 34s | The patch does not generate ASF License warnings. |
| | | 105m 36s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider |
| | hadoop.io.retry.TestDefaultRetryPolicy |
| | hadoop.io.retry.TestFailoverProxy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951049/HDFS-14134.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e77d6c1073ca 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154449f |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713398#comment-16713398 ] Hadoop QA commented on HDFS-14134: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 23s | Maven dependency ordering for branch |
| +1 | mvninstall | 19m 5s | trunk passed |
| +1 | compile | 16m 44s | trunk passed |
| +1 | checkstyle | 2m 53s | trunk passed |
| +1 | mvnsite | 1m 52s | trunk passed |
| +1 | shadedclient | 16m 2s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 17s | trunk passed |
| +1 | javadoc | 1m 30s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 22s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 40s | the patch passed |
| +1 | compile | 15m 54s | the patch passed |
| +1 | javac | 15m 54s | the patch passed |
| -0 | checkstyle | 3m 3s | root: The patch generated 2 new + 70 unchanged - 0 fixed = 72 total (was 70) |
| +1 | mvnsite | 1m 57s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 12s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 3m 43s | the patch passed |
| +1 | javadoc | 1m 24s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 7m 57s | hadoop-common in the patch failed. |
| +1 | unit | 1m 42s | hadoop-hdfs-client in the patch passed. |
| +1 | asflicense | 0m 35s | The patch does not generate ASF License warnings. |
| | | 109m 22s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider |
| | hadoop.io.retry.TestDefaultRetryPolicy |
| | hadoop.io.retry.TestFailoverProxy |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12951030/HDFS-14134.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 124b32531eb9 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154449f |
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713358#comment-16713358 ] Lukas Majercak commented on HDFS-14134: --- For the retry policy changes, maybe it would make sense to just RETRY when the exception is a RemoteException and the operation is not idempotent/atmostonce.
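The decision table under discussion can be sketched as a single pure function. This is a hypothetical illustration, not Hadoop's FailoverOnNetworkExceptionRetry: `ProposedPolicy.decide` and its parameters are invented names. It combines the issue title (idempotent calls that get a RemoteException fail fast), the suggestion above (plain RETRY for non-idempotent calls that get a RemoteException) and the expectation in the review that idempotent calls are still retried on non-remote IOExceptions. Two points are assumptions: FAIL for non-idempotent calls on non-remote IOExceptions, and treating standby/retriable errors as outside `appRemoteException`.

```java
// Hypothetical sketch of the decision table discussed in HDFS-14134;
// it is not Hadoop's FailoverOnNetworkExceptionRetry.
enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

final class ProposedPolicy {
    private ProposedPolicy() {}

    /**
     * @param appRemoteException     true if the NameNode executed the call and
     *                               answered with an application-level
     *                               RemoteException (assumed to exclude
     *                               standby/retriable errors, which a real
     *                               policy must still fail over on)
     * @param idempotentOrAtMostOnce true if the method is idempotent or
     *                               at-most-once
     * @param failovers              failovers already performed for this call
     * @param maxFailovers           upper bound on failovers
     */
    static Decision decide(boolean appRemoteException,
                           boolean idempotentOrAtMostOnce,
                           int failovers, int maxFailovers) {
        if (failovers >= maxFailovers) {
            return Decision.FAIL;  // retry budget exhausted
        }
        if (appRemoteException) {
            // The server already rejected the call (e.g. a missing xattr), so
            // an idempotent call fails fast; a non-idempotent one gets a
            // plain RETRY, following the suggestion in the comment above.
            return idempotentOrAtMostOnce ? Decision.FAIL : Decision.RETRY;
        }
        // Non-remote IOException: the call may not have reached the server.
        // Idempotent calls can safely fail over; for other calls this sketch
        // assumes FAIL, since the client cannot tell whether it was applied.
        return idempotentOrAtMostOnce ? Decision.FAILOVER_AND_RETRY : Decision.FAIL;
    }
}
```

Keeping the rule a pure function of a few booleans makes each cell of the table easy to unit-test in isolation, which is roughly what the failing TestDefaultRetryPolicy and TestFailoverProxy suites would need to be updated against.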
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713326#comment-16713326 ] Lukas Majercak commented on HDFS-14134: --- Re-uploaded the patch because this guy Yetus took my PDF file as the patch.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713314#comment-16713314 ] Lukas Majercak commented on HDFS-14134: --- Added HDFS-14134_retrypolicy_change_proposal.pdf to illustrate the proposed changes in the FailoverOnNetworkExceptionRetry retry policy.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713319#comment-16713319 ] Hadoop QA commented on HDFS-14134: --
| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 6s | HDFS-14134 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/25734/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713301#comment-16713301 ] Íñigo Goiri commented on HDFS-14134: There are a bunch of related JIRAs:
* HADOOP-9792
* HADOOP-7896
* HDFS-5371
* HDFS-4974
* HDFS-2393
* HDFS-1973

The most relevant is HADOOP-9792. [~sureshms], you have been involved in most of those JIRAs. Can you chime in?
[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client
[ https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713233#comment-16713233 ] Lukas Majercak commented on HDFS-14134: --- Added a patch to demonstrate the issue.
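The symptom described in the issue (getXAttr on a file without the attribute being failed over and retried until the retry limit) can be reproduced in miniature without a cluster. This is a hypothetical simulation with no Hadoop dependency: `RetryDemo`, `getXAttrOnServer` and `countAttempts` are invented names, the fake NameNode always throws a plain `IOException` carrying the message from the issue, and the retry policy is passed in as a function so the current and fail-fast behaviours can be compared.

```java
import java.io.IOException;
import java.util.function.Function;

// Hypothetical simulation of the HDFS-14134 symptom; not Hadoop code.
enum Decision { FAIL, RETRY, FAILOVER_AND_RETRY }

final class RetryDemo {
    private RetryDemo() {}

    /**
     * Stand-in for getXAttr on a file that lacks the attribute. Every
     * NameNode gives the same answer, because the attribute is genuinely
     * missing, so failing over cannot change the outcome.
     */
    static String getXAttrOnServer(int nameNodeIndex) throws IOException {
        throw new IOException("could not find attr");
    }

    /**
     * Calls the fake server until it succeeds or the policy (or the
     * attempt limit) says FAIL, and returns the number of attempts made.
     */
    static int countAttempts(Function<IOException, Decision> policy, int maxAttempts) {
        int nameNode = 0;  // index of the current target among two NameNodes
        for (int attempt = 1; ; attempt++) {
            try {
                getXAttrOnServer(nameNode);
                return attempt;
            } catch (IOException e) {
                Decision d = attempt >= maxAttempts ? Decision.FAIL : policy.apply(e);
                switch (d) {
                    case FAIL:
                        return attempt;
                    case FAILOVER_AND_RETRY:
                        nameNode = 1 - nameNode;  // switch to the other NameNode
                        break;
                    case RETRY:
                        break;  // same NameNode, next attempt
                }
            }
        }
    }
}
```

With a policy that always answers FAILOVER_AND_RETRY, `countAttempts(e -> Decision.FAILOVER_AND_RETRY, 10)` makes all 10 attempts, whereas a fail-fast policy such as `e -> Decision.FAIL` stops after one; the expected error costs the client the whole retry budget only in the first case.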