[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream/make S3A uploadPart() retriable

2018-07-26 Thread Steve Loughran (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558730#comment-16558730 ]

Steve Loughran commented on HADOOP-14070:
-----------------------------------------

We can fix this by having {{uploadPart()}} implement the retry logic itself, 
which will also handle other transient errors.

This implies that {{com.amazonaws.ResetException}} will need to go into the 
retry table as retriable.
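
A minimal sketch of what that mapping could look like, in the style of the 
exception-to-policy table in {{S3ARetryPolicy}} and Hadoop's {{RetryPolicies}} 
factory; the map name and the retry counts here are illustrative assumptions, 
not the committed patch:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.amazonaws.ResetException;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

// Sketch only: mark ResetException as retriable by mapping it to a
// bounded retry policy in the exception-to-policy table.
Map<Class<? extends Exception>, RetryPolicy> policyMap = new HashMap<>();
policyMap.put(ResetException.class,
    RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        3, 500, TimeUnit.MILLISECONDS));
{code}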

> S3a: Failed to reset the request input stream/make S3A uploadPart() retriable
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-14070
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14070
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Priority: Major
>
> {code}
> Feb 07, 2017 8:05:46 AM com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
> SEVERE: input future failed.
> com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
> at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
> at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Resetting to invalid mark
> at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
> at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:169)
> at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1219)
> ... 20 more
> 2017-02-07 08:05:46 WARN S3AInstrumentation:777 - Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics{blocksSubmitted=519, blocksInQueue=0, blocksActive=1, blockUploadsCompleted=518, blockUploadsFailed=2, bytesPendingUpload=82528300, bytesUploaded=54316236800, blocksAllocated=519, blocksReleased=519, blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, transferDuration=2637812 ms, queueDuration=839 ms, averageQueueTime=1 ms, totalUploadDuration=2638651 ms, effectiveBandwidth=2.05848506680118E7 bytes/s}
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: Multi-part upload with id 'uDonLgtsyeToSmhyZuNb7YrubCDiyXCCQy4mdVc5ZmYWPPHyZ3H3ZlFZzKktaPUiYb7uT4.oM.lcyoazHF7W8pK4xWmXV4RWmIYGYYhN6m25nWRrBEE9DcJHcgIhFD8xd7EKIjijEd1k4S5JY1HQvA--' to 2017/history-170130.orc on 2017/history-170130.orc: com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int): Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
> {code}

[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

2018-06-25 Thread Seth Fitzsimmons (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522551#comment-16522551 ]

Seth Fitzsimmons commented on HADOOP-14070:
-------------------------------------------

I did not set fs.s3a.fast.upload.buffer explicitly.
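
For reference, that property defaults to buffering blocks on disk; "array" and 
"bytebuffer" are the alternatives. An explicit setting would look like:

{code}
<!-- fs.s3a.fast.upload.buffer: "disk" (default), "array" or "bytebuffer" -->
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <value>disk</value>
</property>
{code}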


[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

2018-06-25 Thread Steve Loughran (JIRA)


[ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522437#comment-16522437 ]

Steve Loughran commented on HADOOP-14070:
-----------------------------------------

Looking at this again, tracing code paths in the IDE:

* When disk is used as the buffer for block uploads, we pass in the File 
reference; the expectation is that the AWS SDK does mark/reset against it, 
reopening the file if needed. We explicitly moved to passing in the file ref 
for better recovery.
* When a byte array or ByteBuffer is used to buffer blocks, we pass in an 
associated stream which has no limits on where it can reset: it's just an 
offset into an array.

The stack trace here doesn't show us which buffering is in use in the S3A 
connector, just that the AWS SDK's buffering wrapper couldn't reset because 
the stream had read too far past the mark. And why was that 
{{SdkBufferedInputStream}} being used at all? That's the interesting question.

# Tracing things through the IDE, it would only seem to happen if 
{{InputStream.markSupported() == false}} (see the sketch after this list).
# The byte array and our own block output streams do return true, and do let 
you reset to anywhere.
# So either that wasn't picked up and they were wrapped anyway, or this was a 
file source (it's the default, after all) and somehow that got wrapped by a 
buffer.
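
A paraphrased sketch of the SDK-side decision described in point 1; this is 
not the actual {{AmazonHttpClient}} source, and the method name and 
{{readLimit}} parameter are illustrative:

{code}
import java.io.InputStream;
import com.amazonaws.internal.SdkBufferedInputStream;

// Paraphrase only: streams that can't mark/reset get wrapped in a
// bounded buffer before transmission.
static InputStream prepareForRetry(InputStream content, int readLimit) {
  if (!content.markSupported()) {
    // reset() on this wrapper is only valid within readLimit bytes;
    // reading past that is what throws "Resetting to invalid mark".
    content = new SdkBufferedInputStream(content, readLimit);
  }
  content.mark(readLimit);
  return content;
}
{code}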

[~mojodna] it's been a while, but do you remember if you explicitly set 
{{fs.s3a.fast.upload.buffer}} to buffer via byte array ("array") or byte 
buffer ("bytebuffer")? If not, something isn't right with what the SDK claims 
to do for wrapping file sources and handling failures there.

W.r.t. actions: if the bug is in the SDK, that's the place for the fix. It's 
happening in the transfer-manager threads and only manifests for us in the 
{{waitForUploads()}} code; there's no way we could wrap and fix it ourselves.



[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

2018-02-19 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369474#comment-16369474 ]

Steve Loughran commented on HADOOP-14070:
-----------------------------------------

Reviewing. Sorry, I'd missed this before.

# 80 MB parts of a 40 GB upload; 519 blocks uploaded.
# The write of one block failed: {{SdkBufferedInputStream}}/{{BufferedInputStream}} 
couldn't return to a previous mark, so it reported a failure to the progress 
listener, then threw {{com.amazonaws.ResetException}}, which extends 
{{SdkClientException}}.
# The {{BlockOutputStream.close()}} call got a failure from the upload, so it 
failed.

I actually think we are close to being able to handle this, as that part 
upload goes through {{WriteOperationHelper.uploadPart()}}, which has the retry 
handler (see the sketch below). And with HADOOP-14028, which went in after 
this was filed, we are handing the File directly to AWS S3, so the mark/reset 
should work.
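
A hedged sketch of that retry wrapping, following the 
{{org.apache.hadoop.fs.s3a.Invoker}} pattern rather than quoting 
{{WriteOperationHelper}} verbatim:

{code}
import java.io.IOException;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;
import org.apache.hadoop.fs.s3a.Invoker;

// Sketch only: wrap the SDK call so that retriable exceptions (including
// ResetException, once mapped) trigger a retry of the whole part upload.
static UploadPartResult uploadPart(Invoker invoker, AmazonS3 s3,
    String destKey, UploadPartRequest request) throws IOException {
  return invoker.retry("upload part", destKey,
      true,                        // idempotent: re-PUTting a part is safe
      () -> s3.uploadPart(request));
}
{code}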

But that doesn't have any explicit code for {{com.amazonaws.ResetException}}, 
which is what we need to look at.

Adding explicit handling for this in the {{S3ARetryPolicy}} would guarantee 
that in any retryable POST/PUT operation this failure would trigger a retry. 
We can do that without waiting to see whether the problem is now completely 
fixed: it's a one-line change to add a new idempotent retry policy, and it 
will isolate us from any AWS client regressions.

Targeting Hadoop 3.2



[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

2018-02-19 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369423#comment-16369423 ]

Steve Loughran commented on HADOOP-14070:
-----------------------------------------

Looks like the stream reset logic isn't going back to the filesystem, which 
can reset to anywhere it likes in the block.
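
A hedged illustration of the difference, using plain JDK I/O rather than SDK 
internals: a file-backed source can seek back to any offset, while a buffered 
mark is only valid within its read limit:

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: a file-backed "mark" is just an offset, so a retry
// can always rewind the part to its start; BufferedInputStream.reset()
// fails once the read moves past its buffered window.
static void replayPart(Path blockFile) throws IOException {
  try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
    long start = ch.position();    // remember where this part begins
    // ... transmit some bytes; the request fails mid-flight ...
    ch.position(start);            // seek back: no invalid-mark failure
  }
}
{code}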
