[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-05-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834936#comment-16834936
 ] 

Lukas Majercak commented on HDFS-14134:
---

Hi [~John Smith]. I'm not sure if it's that simple.

 Say you have 2 NNs, where nn1 is active, nn2 is standby. If your current 
target is nn2, but we send a request to both, you can get responses like:
nn1 - RETRY
nn2 - RETRY_AND_FAILOVER

In this case, the ideal scenario would be to failover to nn1, but what you're 
proposing would not do that. 

I think this obviously has room for improvement, as in some cases RETRY > 
FAILOVER is reasonable, but I'd like to refrain from increasing the scope of 
this JIRA. Maybe we can create a new one and have the discussion there?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-05-06 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834323#comment-16834323
 ] 

Yuxuan Wang commented on HDFS-14134:


Hi [~lukmajercak]. No. I just figure that {{StandbyException}} always trigger 
{{FAILOVER_AND_RETRY}}. And with hedging proxy, the action 
{{FAILOVER_AND_RETRY}} will always be gotten and will cover all actions have 
lower order than {{FAILOVER_AND_RETRY}}.
 I mean that the order of {{enum RetryDecision}} may should be 
{{FAILOVER_AND_RETRY < RETRY < FAIL}} in patch 007. 
Or in short, {{FAILOVER_AND_RETRY}}'s order should be the lowest.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-05-03 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832730#comment-16832730
 ] 

Lukas Majercak commented on HDFS-14134:
---

Hi [~John Smith]. I'm not sure I'm following, what's the concern with 
StandbyException triggering FAILOVER_AND_RETRY with the logic in patch 007 ?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-04-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828887#comment-16828887
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} HDFS-14134 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12954490/HDFS-14134.007.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/26723/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-04-28 Thread Yuxuan Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16828886#comment-16828886
 ] 

Yuxuan Wang commented on HDFS-14134:


Hello, anyone is working on this? I find a bug in 
{{org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider}} . 
Just like [~lukmajercak] said:
{quote}Also note that previously, if a hedging request got FAILOVER_RETRY and 
some request got SocketExc on nonidempotent operation (e.g. FAIL), the client 
would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue 
here as well.
{quote}
But more than this, standby namenode will always throw back StandbyException 
which can cause {{FAILOVER_AND_RETRY}} action. It will cover all actions have 
lower order than {{FAILOVER_AND_RETRY}}, such as {{RETRY}} in 
[^HDFS-14134.007.patch].
I mean, the correct order should be {{Ordering: FAILOVER_AND_RETRY < RETRY < 
FAIL}}, right ?

 

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-02-13 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767831#comment-16767831
 ] 

Íñigo Goiri commented on HDFS-14134:


As nobody else other than [~knanasi] is taking over the review, I'll give it a 
try.
I've tested the equivalent of 
{{testIdempotentOperationShouldNotGetStuckInRetries()}} and currently it 
retries in all Namenodes which I think is wrong as stated in the JIRA.
>From the unit test, the new behavior seems better.

I would improve the unit test a little:
* Use {{LambdaTestUtils#intercept()}}.
* Can we constraint a little more the verification and do a tighter check 
instead of {{atMost()}}.
* Is there other checks we can add here? Negative cases? Are methods with 
FAILOVER_AND_RETRY and FAIL fully covered?

BTW, [~lukmajercak] have you tested this in your setup? What scale? What impact 
have you seen?


> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-11 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740810#comment-16740810
 ] 

Lukas Majercak commented on HDFS-14134:
---

+ more people [~szetszwo], [~jingzhao].

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-11 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740802#comment-16740802
 ] 

Lukas Majercak commented on HDFS-14134:
---

Thanks [~knanasi]. 

[~atm], [~eli], [~sureshms], [~sanjay.radia], [~xgong], [~jianhe]; I see you 
guys worked on this part of the codebase before, anyone available to review 
this? Thanks!

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740605#comment-16740605
 ] 

Íñigo Goiri commented on HDFS-14134:


[^HDFS-14134.007.patch] LGTM.
However, I have limited experience with this part of the client.
It would be nice for others to give it a look.
[~lukmajercak], from the previous commits, who do you think would be a good 
candidate for review?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-11 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740232#comment-16740232
 ] 

Kitti Nanasi commented on HDFS-14134:
-

Thanks [~lukmajercak] for the work here!
{quote}Also note that previously, if a hedging request got FAILOVER_RETRY and 
some request got SocketExc on nonidempotent operation (e.g. FAIL), the client 
would still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue 
here as well.
{quote}
Sounds good that you found and fixed this issue as well.

+1 (non-binding) New patch looks good to me!

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739894#comment-16739894
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 24s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
27s{color} | {color:green} root: The patch generated 0 new + 108 unchanged - 10 
fixed = 108 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 53s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 39s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
55s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}119m  2s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.ssl.TestSSLFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12954490/HDFS-14134.007.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 4ec5ff729e55 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33c009a4 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_191 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739821#comment-16739821
 ] 

Lukas Majercak commented on HDFS-14134:
---

Patch 007 to fix checkstyle + whitespace warnings.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134.007.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739809#comment-16739809
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 58s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
32s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 19m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 19m 
15s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
4m  4s{color} | {color:orange} root: The patch generated 10 new + 108 unchanged 
- 10 fixed = 118 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
31s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
35s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  9m 30s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
5s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}127m 39s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.security.ssl.TestSSLFactory |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12954472/HDFS-14134.006.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 40318a766566 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 
08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 33c009a4 |
| maven | version: Apache 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739719#comment-16739719
 ] 

Lukas Majercak commented on HDFS-14134:
---

Added patch006 together with HDFS-14134_retrypolicy_change_proposal_1.pdf to 
explain the changes. After the discussion, it seems like just changing the 
logic for remote IOExceptions together with the priority for retry actions will 
be enough. Could you review this [~knanasi] ? Thanks!

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739704#comment-16739704
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} 
| {color:red} HDFS-14134 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25956/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134.006.patch, HDFS-14134_retrypolicy_change_proposal.pdf, 
> HDFS-14134_retrypolicy_change_proposal_1.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-09 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738702#comment-16738702
 ] 

Lukas Majercak commented on HDFS-14134:
---

Also note that previously, if a hedging request got FAILOVER_RETRY and some 
request got SocketExc on nonidempotent operation (e.g. FAIL), the client would 
still pick FAILOVER_RETRY over FAIL, so i think we are fixing an issue here as 
well.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2019-01-09 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738700#comment-16738700
 ] 

Lukas Majercak commented on HDFS-14134:
---

I see, that makes sense, I'm happy to change
SocketException (non-idempotent)
IOException (non-idempotent)
back to being FAIL.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720572#comment-16720572
 ] 

Kitti Nanasi commented on HDFS-14134:
-

The relevant part is the following:
{quote}in FailoverOnNetworkExceptionRetry#shouldRetry we don't fail-over and 
retry if we're making a non-idempotent call and there's an IOException or 
SocketException that's not Connect, NoRouteToHost, UnknownHost, or Standby. The 
rationale of course is that the operation may have reached the server and 
retrying elsewhere could leave us in an insconsistent state. This means if a 
client doing a create/delete which gets a SocketTimeoutException (which is an 
IOE) or an EOF SocketException the exception will be thrown all the way up to 
the caller of FileSystem/FileContext. That's reasonable because only the user 
of the API at this level has sufficient knoweldge of how to handle the failure, 
eg if they get such an exception after issuing a delete they can check if the 
file still exists and if so re-issue the delete (however they may also not want 
to do this, and FileContext doesn't know which).
{quote}

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720564#comment-16720564
 ] 

Lukas Majercak commented on HDFS-14134:
---

I'll go through that discussion.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720563#comment-16720563
 ] 

Kitti Nanasi commented on HDFS-14134:
-

Yes, this change covers that, I just wanted to understand why you changed it 
like that, but we're pretty much on the same page now.

I have only one concern, which is the case of non-remote IOExceptions on 
non-idempotent operations, I'm not sure if retrying those will cause any 
problems. For reference there is a discussion on 
[HADOOP-7380|https://issues.apache.org/jira/browse/HADOOP-7380] on why it was 
introduced.

Other than that patch v5 looks good.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720506#comment-16720506
 ] 

Lukas Majercak commented on HDFS-14134:
---

I agree non-remote IOExceptions could be network related, but this is covered 
right? Non-remote IOExceptions are retried with this change, no matter whether 
the operation is idempotent.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720507#comment-16720507
 ] 

Lukas Majercak commented on HDFS-14134:
---

I'd argue that this change is even safer, because previously the retry action 
would be FAIL for:
SocketExceptions (non-idempotent)
Non-remote IOExceptions (non-idempotent)

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-13 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720204#comment-16720204
 ] 

Kitti Nanasi commented on HDFS-14134:
-

I totally agree with you that retrying getXAttr on "attr could not find" 
IOException is not good and wasteful, and that we have to have a better concept 
than the current.

But we also have to keep in mind that the FailoverOnNetworkExceptionRetry 
policy is used by many parts of the code and it is a bit risky to change it. I 
think the idea behind the previous design is that non remote IOExceptions may 
be network related exceptions, so it is worth to retry them if the operation is 
idempotent.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-12 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719444#comment-16719444
 ] 

Lukas Majercak commented on HDFS-14134:
---

Retrying failed idempotent operations might be safe, but it surely is wasteful. 
Check the test I wrote for getXAttr, the client just retries the same exception 
over and over for no reason. We have to have a concept of nonretriable 
exceptions in HDFS, and I feel like RemoteException of an idempotent operation 
is a very good start. 

The previous design was very strange, they chose to FAIL if the operation was 
not idempotent and the exception was not Remote, which does not make a lot of 
sense to me.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-12 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718752#comment-16718752
 ] 

Kitti Nanasi commented on HDFS-14134:
-

[~lukmajercak], you are correct on the definition of idempotency. I think the 
original approach in retrying was that idempotent operations don't change 
internal state, so it is safe to retry them. For example if you just get a 
value, it is always safe to retry, but if you renew a delegation token, it is a 
more complex question if it is safe to retry that or not, because maybe the 
renewal already took place before failing, maybe not, and if it already took 
place, is it safe to renew it again?

 

By the way idempotency originally was only considered in case of non-remote 
IOExceptions, why did that change?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-11 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16717754#comment-16717754
 ] 

Lukas Majercak commented on HDFS-14134:
---

Why should we retry if the operation is idempotent and the exception is 
remoteexception? The definition of an idempotent operation is that it will have 
the same outcome next time as well right? In that case, we should just fail 
fast. Check 
TestRequestHedgingProxyProvider.testIdempotentOperationShouldNotGetStuckInRetries

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-11 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716690#comment-16716690
 ] 

Kitti Nanasi commented on HDFS-14134:
-

Thanks for the new patch [~lukmajercak]!

It looks better regarding retrying on non-remote IOExceptions, but there is one 
thing I don't understand, which I think is wrong in the pdf as well. In case of 
remote IOException, we should retry if the operation is idempotent, and not the 
opposite. 
 So instead of this code:
{code:java}
else if (e instanceof IOException) {
if (e instanceof RemoteException && isIdempotentOrAtMostOnce) {
  return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
  "Remote exception and the invoked method is idempotent " +
  "or at most once.");
}
return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
getFailoverOrRetrySleepTime(failovers));
}
{code}
I think it should look like this:
{code:java}
else if (e instanceof IOException) {
if (e instanceof RemoteException && !isIdempotentOrAtMostOnce) {
  return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
  "Remote exception and the invoked method is idempotent " +
  "or at most once.");
}
return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
getFailoverOrRetrySleepTime(failovers));
}
{code}
What do you think [~lukmajercak]?
  

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716043#comment-16716043
 ] 

Íñigo Goiri commented on HDFS-14134:


I think the change in semantics makes sense and +1 on that.

Regarding  [^HDFS-14134.005.patch] itslef:
* There are a bunch of changes that are just cosmetic; on the fence on what to 
do with them.
* For checking the exception in 
{{testIdempotentOperationShouldNotGetStuckInRetries()}} we may want to just use 
{{LambdaTestUtils#intercept}}.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16716033#comment-16716033
 ] 

Hadoop QA commented on HDFS-14134:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  6m 
11s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
15s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 50s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 16m 
10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
30s{color} | {color:green} root: The patch generated 0 new + 108 unchanged - 10 
fixed = 108 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
36s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m 
34s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
39s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}123m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951288/HDFS-14134.005.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux fc8eabb705f3 4.4.0-138-generic #164~14.04.1-Ubuntu SMP Fri Oct 
5 08:56:16 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 3ff8580 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25752/testReport/ |
| 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715870#comment-16715870
 ] 

Hadoop QA commented on HDFS-14134:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
1s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 46s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 13m 
58s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 26s{color} | {color:orange} root: The patch generated 1 new + 108 unchanged 
- 10 fixed = 109 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 
56s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
40s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}104m 33s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951276/HDFS-14134.004.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux ac796f6b2ff0 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80e59e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715863#comment-16715863
 ] 

Lukas Majercak commented on HDFS-14134:
---

Patch 005 to fix minor checkstyle issue in UnreliableImplementation

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, HDFS-14134.005.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715862#comment-16715862
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
2s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 36m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 2s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
9s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
45s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 14m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 14m 
11s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m 39s{color} | {color:orange} root: The patch generated 1 new + 108 unchanged 
- 10 fixed = 109 total (was 118) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m  6s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  1s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
44s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}126m  3s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.util.TestReadWriteDiskValidator |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951272/HDFS-14134.003.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux a4f98f6eea35 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 80e59e7 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| checkstyle | 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715705#comment-16715705
 ] 

Lukas Majercak commented on HDFS-14134:
---

Added tests to cover all cases (SocketExc, IOException, RemoteException, 
non/idempotent) in patch004. Anyone available to review?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134.004.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715682#comment-16715682
 ] 

Lukas Majercak commented on HDFS-14134:
---

Fixed TestFailoverProxy as well, still might need to add more tests to cover 
all the exceptions.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715579#comment-16715579
 ] 

Lukas Majercak commented on HDFS-14134:
---

I realized I needed to change the mock expectations to fix 
TestLoadBalancingKMSClientProvider. Added patch003 to fix that.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134.003.patch, HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715566#comment-16715566
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
15s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
7s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
10s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
33s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 56s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
48s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m  8s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider 
|
|   | hadoop.io.retry.TestFailoverProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951240/HDFS-14134.002.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 53617b2e9409 4.4.0-139-generic #165~14.04.1-Ubuntu SMP Wed Oct 
31 10:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 64411a6 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC1 |
| unit | 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715545#comment-16715545
 ] 

Lukas Majercak commented on HDFS-14134:
---

Thanks for the review [~knanasi]. I've uploaded patch002 to include non-remote 
IOException handling + fix TestDefaultRetryPolicy. Seems like this should also 
fix TestLoadBalancingKMSClientProvider. I'll then fix and add more tests in 
TestFailoverProxy.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, HDFS-14134.002.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-10 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714464#comment-16714464
 ] 

Kitti Nanasi commented on HDFS-14134:
-

Thanks [~lukmajercak] for the patch!

The proposed solution in the pdf seems good to me, but looking at the code, the 
retry does not happen on non-remote IOExceptions at all, which is not the same 
behaviour as described in the pdf. Also 
TestLoadBalancingKMSClientProvider#testClientRetriesIdempotentOpWithIOExceptionSucceedsSecondTime
 fails because of that.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713440#comment-16713440
 ] 

Lukas Majercak commented on HDFS-14134:
---

The unit tests are expected to fail, I can fix them once we agree on how the 
retry policy should behave.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713419#comment-16713419
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
8s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  3m 
 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  1s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m  
8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m  
8s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 45s{color} | {color:orange} root: The patch generated 2 new + 70 unchanged - 
0 fixed = 72 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 21s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m 23s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
51s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
34s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}105m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider 
|
|   | hadoop.io.retry.TestDefaultRetryPolicy |
|   | hadoop.io.retry.TestFailoverProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951049/HDFS-14134.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e77d6c1073ca 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 
17:16:02 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154449f |
| 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713398#comment-16713398
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
13s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 
 5s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 16m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  2s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 15m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 15m 
54s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
3m  3s{color} | {color:orange} root: The patch generated 2 new + 70 unchanged - 
0 fixed = 72 total (was 70) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 12s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
24s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  7m 57s{color} 
| {color:red} hadoop-common in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
42s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
35s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}109m 22s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.crypto.key.kms.TestLoadBalancingKMSClientProvider 
|
|   | hadoop.io.retry.TestDefaultRetryPolicy |
|   | hadoop.io.retry.TestFailoverProxy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8f97d6f |
| JIRA Issue | HDFS-14134 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12951030/HDFS-14134.001.patch |
| Optional Tests |  dupname  asflicense  compile  javac  javadoc  mvninstall  
mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 124b32531eb9 4.4.0-139-generic #165-Ubuntu SMP Wed Oct 24 
10:58:50 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 154449f |
| 

[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713358#comment-16713358
 ] 

Lukas Majercak commented on HDFS-14134:
---

For the retry policy changes, maybe it would make sense to just RETRY when the 
exception is RemoteException and the operation is not idempotent/atmostonce.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713326#comment-16713326
 ] 

Lukas Majercak commented on HDFS-14134:
---

Reuploaded the patch, because this guy Yetus took my pdf file as patch.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713314#comment-16713314
 ] 

Lukas Majercak commented on HDFS-14134:
---

Added HDFS-14134_retrypolicy_change_proposal.pdf to illustrate the proposed 
changes in the FailoverOnNetworkExceptionRetry retry policy.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713319#comment-16713319
 ] 

Hadoop QA commented on HDFS-14134:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  6s{color} 
| {color:red} HDFS-14134 does not apply to trunk. Rebase required? Wrong 
Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14134 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/25734/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch, 
> HDFS-14134_retrypolicy_change_proposal.pdf
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread JIRA


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713301#comment-16713301
 ] 

Íñigo Goiri commented on HDFS-14134:


There are a bunch of related JIRAs:
* HADOOP-9792
* HADOOP-7896
* HDFS-5371
* HDFS-4974
* HDFS-2393
* HDFS-1973

The most relevant is HADOOP-9792.
[~sureshms] you have involved in most of those JIRAs.
Can you chime in?

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where the file 
> does not have the attribute, NN throws an IOException with message "could not 
> find attr". The current client retry policy determines the action for that to 
> be FAILOVER_AND_RETRY. The client then fails over and retries until it 
> reaches the maximum number of retries. Supposedly, the client should be able 
> to tell that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14134) Idempotent operations throwing RemoteException should not be retried by the client

2018-12-07 Thread Lukas Majercak (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713233#comment-16713233
 ] 

Lukas Majercak commented on HDFS-14134:
---

Added a patch to demonstrate the issue.

> Idempotent operations throwing RemoteException should not be retried by the 
> client
> --
>
> Key: HDFS-14134
> URL: https://issues.apache.org/jira/browse/HDFS-14134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, hdfs-client, ipc
>Reporter: Lukas Majercak
>Assignee: Lukas Majercak
>Priority: Critical
> Attachments: HDFS-14134.001.patch
>
>
> Currently, some operations that throw IOException on the NameNode are 
> evaluated by RetryPolicy as FAILOVER_AND_RETRY, but they should just fail 
> fast.
> For example, when calling getXAttr("user.some_attr", file") where file does 
> not have the attribute, NN throws an IOException with message "could not find 
> attr". The current client retry policy determines the action for that to be 
> FAILOVER_AND_RETRY. The client then fails over and retries until it reaches 
> the maximum number of retries. Supposedly, the client should be able to tell 
> that this exception is normal and fail fast. 
> Moreover, even if the action was FAIL, the RetryInvocationHandler looks at 
> all the retry actions from all requests, and FAILOVER_AND_RETRY takes 
> precedence over FAIL action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org