[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-09 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359139#comment-16359139
 ] 

Aaron Fabbri commented on HADOOP-15216:
---

{quote}
We could think about extending the fault injection to inject stream read 
failures intermittently too
{quote}
I have some basic code for this I'm testing now. Will try to post soon, 
probably on HADOOP-13761.

> S3AInputStream to handle reconnect on read() failure better
> ---
>
> Key: HADOOP-15216
> URL: https://issues.apache.org/jira/browse/HADOOP-15216
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
>Priority: Major
>
> {{S3AInputStream}} handles any IOE through a close() of the stream and a single 
> re-invocation of the read, with:
> * no backoff
> * no abort of the HTTPS connection, which is just returned to the pool. If 
> httpclient hasn't noticed the failure, the same connection may be handed back 
> to the caller on the next read.
> Proposed:
> * switch to the invoker
> * a retry policy explicitly for the stream (EOF => throw, timeout => close, 
> sleep, retry, etc.)
> We could think about extending the fault injection to inject stream read 
> failures intermittently too, though it would need something in S3AInputStream 
> to (optionally) wrap the http input streams with the failing stream. 
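The proposed close/abort/backoff/retry loop can be sketched in isolation. This is an illustrative stand-alone example, not the actual S3AInputStream or Invoker code: {{StreamSupplier}}, {{readWithRetry}}, and the backoff parameters are made-up names, and the real S3A stream would abort the underlying HTTP connection rather than just close it.

```java
import java.io.IOException;
import java.io.InputStream;

public class RetryingRead {

    /** Stand-in for "re-issue the GET"; not a real S3A interface. */
    interface StreamSupplier {
        InputStream reopen() throws IOException;
    }

    /**
     * Read one byte, retrying with exponential backoff on IOException.
     * On failure the stream is dropped (sketched as close(); real S3A would
     * abort the HTTP connection so it is not returned to the pool).
     * maxAttempts must be >= 1.
     */
    static int readWithRetry(StreamSupplier supplier, int maxAttempts,
                             long baseDelayMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            InputStream in = supplier.reopen();
            try {
                return in.read(); // success: caller now owns the stream
            } catch (IOException e) {
                last = e;
                in.close();                           // drop, don't recycle
                Thread.sleep(baseDelayMs << attempt); // exponential backoff
            }
        }
        throw last; // all attempts failed
    }
}
```

The key difference from today's behavior is the sleep between attempts and the refusal to hand a possibly-broken connection back to the pool.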



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357725#comment-16357725
 ] 

Aaron Fabbri commented on HADOOP-15216:
---

I'm working on a patch that uses {{retry()}} for {{onReadFailure()}} in the s3a 
input stream, but only when s3guard is enabled.

What do you want to do for HEAD -> 200, GET -> 404 in the non-s3guard case?  
Currently we retry once immediately. I was going to keep that behavior for now, 
unless you think otherwise. We could add another retry policy config knob, 
"input stream retry always" or something, and default it to off.
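A minimal sketch of such a knob, purely for illustration: the key name and the attempt counts below are invented, not real Hadoop properties.

```java
import java.util.Map;

public class StreamRetryConfig {
    // Hypothetical property name, made up for this sketch.
    static final String RETRY_ALWAYS_KEY = "fs.s3a.input.stream.retry.always";
    static final boolean RETRY_ALWAYS_DEFAULT = false;

    /**
     * Default off: keep today's single immediate retry. When opted in,
     * allow the stream's full retry policy (seven attempts here is an
     * arbitrary illustrative figure).
     */
    static int maxReadRetries(Map<String, String> conf) {
        boolean always = Boolean.parseBoolean(
            conf.getOrDefault(RETRY_ALWAYS_KEY,
                              String.valueOf(RETRY_ALWAYS_DEFAULT)));
        return always ? 7 : 1;
    }
}
```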

{quote}
+on s3guard, GET could be 403 -> fail
{quote}
Trying to parse this. We have a couple of cases in open(), when we call 
getFileStatus():
- MetadataStore sees a tombstone and throws FNFE.
- MetadataStore has no state for the path and returns null. We fall through to 
s3GetFileStatus(), which should throw FNFE, which bypasses the retry policy, 
right?
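The two cases above can be sketched as a small decision function. {{MetaResult}}, {{open}}, and the return strings are stand-ins for illustration, not the real S3Guard MetadataStore or s3GetFileStatus() interfaces.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class OpenPathSketch {
    enum MetaResult { TOMBSTONE, UNKNOWN, FOUND }

    static String open(MetaResult meta, boolean existsInS3)
            throws IOException {
        switch (meta) {
            case TOMBSTONE:
                // MetadataStore saw a tombstone: FNFE immediately.
                throw new FileNotFoundException("tombstoned");
            case FOUND:
                return "status-from-metastore";
            default:
                // No metastore state: fall through to the S3 HEAD; a 404
                // surfaces as FNFE, which bypasses the retry policy.
                if (!existsInS3) {
                    throw new FileNotFoundException("404 from s3GetFileStatus");
                }
                return "status-from-s3";
        }
    }
}
```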







[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357458#comment-16357458
 ] 

Steve Loughran commented on HADOOP-15216:
-

+on s3guard, GET could be 403 -> fail




[jira] [Commented] (HADOOP-15216) S3AInputStream to handle reconnect on read() failure better

2018-02-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357390#comment-16357390
 ] 

Steve Loughran commented on HADOOP-15216:
-

HADOOP-13761 covers the condition where S3Guard finds the file in its 
getFileStatus() in the {{FileSystem.open()}} call, but when S3AInputStream 
initiates the GET, a 404 comes back: FNFE should be handled with backoff too.

* Maybe: special handling for that first attempt, as an FNFE on later ones 
probably means someone deleted the file.
* The situation of HEAD -> 200, GET -> 404 could also arise if the GET went to 
a different shard from the HEAD. So the condition could also arise in 
non-S3Guarded buckets, sometimes.
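The first-attempt special casing suggested above can be sketched as a predicate: a 404 on the very first GET (when HEAD said the object exists, so it is likely eventual consistency) gets retried with backoff, while a 404 after data has already been read is treated as a real deletion. {{shouldRetry}} and its policy are illustrative, not the actual S3A retry policy.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class FirstReadPolicy {
    /**
     * Decide whether a failed read should be retried. Only the opening GET
     * gets the benefit of the doubt on FNFE; an FNFE once bytes have been
     * read probably means someone deleted the file, so fail fast.
     */
    static boolean shouldRetry(IOException e, long bytesAlreadyRead) {
        if (e instanceof FileNotFoundException) {
            return bytesAlreadyRead == 0;
        }
        return true; // other IOEs: close, back off, retry as usual
    }
}
```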
