[jira] [Commented] (HADOOP-14070) S3a: Failed to reset the request input stream

2018-06-25 Thread Seth Fitzsimmons (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522551#comment-16522551
 ] 

Seth Fitzsimmons commented on HADOOP-14070:
---

I did not set fs.s3a.fast.upload.buffer explicitly.
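(For anyone following along: a minimal sketch of setting it explicitly, assuming Hadoop 2.8+/3.x where the documented values for {{fs.s3a.fast.upload.buffer}} are {{disk}} (the default), {{array}}, and {{bytebuffer}}; the bucket URI below is a placeholder.)

{code}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FastUploadBufferSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pick the block buffering mechanism explicitly instead of relying on the default.
        // "disk" buffers blocks in local temp files; "array"/"bytebuffer" buffer in memory.
        conf.set("fs.s3a.fast.upload.buffer", "disk");
        FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf);
        System.out.println(fs.getUri());
    }
}
{code}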

> S3a: Failed to reset the request input stream
> -
>
> Key: HADOOP-14070
> URL: https://issues.apache.org/jira/browse/HADOOP-14070
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Priority: Major
>
> {code}
> Feb 07, 2017 8:05:46 AM 
> com.google.common.util.concurrent.Futures$CombinedFuture 
> setExceptionAndMaybeLog
> SEVERE: input future failed.
> com.amazonaws.ResetException: Failed to reset the request input stream; If 
> the request involves an input stream, the maximum stream buffer size can be 
> configured via request.getRequestClientOptions().setReadLimit(int)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1219)
> at 
> org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Resetting to invalid mark
> at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
> at 
> com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
> at 
> com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:169)
> at 
> com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
> at 
> org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> ... 20 more
> 2017-02-07 08:05:46 WARN S3AInstrumentation:777 - Closing output stream 
> statistics while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=519, blocksInQueue=0, blocksActive=1, 
> blockUploadsCompleted=518, blockUploadsFailed=2, bytesPendingUpload=82528300, 
> bytesUploaded=54316236800, blocksAllocated=519, blocksReleased=519, 
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
> transferDuration=2637812 ms, queueDuration=839 ms, averageQueueTime=1 ms, 
> totalUploadDuration=2638651 ms, effectiveBandwidth=2.05848506680118E7 bytes/s}
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
> Multi-part upload with id 
> 'uDonLgtsyeToSmhyZuNb7YrubCDiyXCCQy4mdVc5ZmYWPPHyZ3H3ZlFZzKktaPUiYb7uT4.oM.lcyoazHF7W8pK4xWmXV4RWmIYGYYhN6m25nWRrBEE9DcJHcgIhFD8xd7EKIjijEd1k4S5JY1HQvA--'
>  to 2017/history-170130.orc on 2017/history-170130.orc: 
> com.amazonaws.ResetException: Failed to reset the request input stream; If 
> the request involves an input stream, the maximum stream buffer size can be 
> configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
> reset the request input stream; If the request involves an input stream, the 
> maximum stream buffer size can be configured via 
> 

[jira] [Commented] (HADOOP-14028) S3A BlockOutputStreams doesn't delete temporary files in multipart uploads or handle part upload failures

2017-02-24 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883571#comment-15883571
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

You can disregard my comments about prematurely/incorrectly closing the output file when the process terminates. It turns out that the input producer was the process being OOM-killed, so I need to fix my handling of premature end of input.

> S3A BlockOutputStreams doesn't delete temporary files in multipart uploads or 
> handle part upload failures
> -
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, 
> HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2-008.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch, HADOOP-14028-branch-2.8-005.patch, 
> HADOOP-14028-branch-2.8-007.patch, HADOOP-14028-branch-2.8-008.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881311#comment-15881311
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

I mean "data hasn't been fully created" (because the producer gets killed 
before it finishes generating it), but what's there may have been fully flushed 
(the "file" hasn't ended yet).

It smelled like a job for a {{finally}} block, but I'm not sure how the OOM killer ends up interacting with Java (specifically, how much cleanup can actually be done).
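(A minimal sketch of why a {{finally}} block alone wouldn't save us here, assuming the kernel OOM killer delivers SIGKILL: neither {{finally}} blocks nor JVM shutdown hooks get a chance to run on SIGKILL, only on normal exits or catchable signals such as SIGTERM.)

{code}
public class OomKillCleanupSketch {
    public static void main(String[] args) throws Exception {
        // Runs on normal exit and on SIGTERM/SIGINT, but never on SIGKILL.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook cleanup")));
        try {
            Thread.sleep(Long.MAX_VALUE); // stand-in for the long-running producer
        } finally {
            // Runs on exceptions and normal returns; skipped entirely if the kernel
            // OOM killer SIGKILLs the process, so temp files can be left behind.
            System.out.println("finally-based cleanup");
        }
    }
}
{code}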

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, 
> HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2.8-002.patch, 
> HADOOP-14028-branch-2.8-003.patch, HADOOP-14028-branch-2.8-004.patch, 
> HADOOP-14028-branch-2.8-005.patch, HADOOP-14028-branch-2.8-007.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881172#comment-15881172
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

Err, actually, with one caveat: if the process is killed (OOM, typically), 
uploaded files are somehow still marked as complete (but not always), even 
though the write didn't actually complete successfully.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, 
> HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2.8-002.patch, 
> HADOOP-14028-branch-2.8-003.patch, HADOOP-14028-branch-2.8-004.patch, 
> HADOOP-14028-branch-2.8-005.patch, HADOOP-14028-branch-2.8-007.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881168#comment-15881168
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

Functionality-wise, the patch has been working well for me on up to 70GB output 
files.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, 
> HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2.8-002.patch, 
> HADOOP-14028-branch-2.8-003.patch, HADOOP-14028-branch-2.8-004.patch, 
> HADOOP-14028-branch-2.8-005.patch, HADOOP-14028-branch-2.8-007.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-16 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869635#comment-15869635
 ] 

Seth Fitzsimmons commented on HADOOP-14071:
---

For sure.

> S3a: Failed to reset the request input stream
> -
>
> Key: HADOOP-14071
> URL: https://issues.apache.org/jira/browse/HADOOP-14071
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> When using the patch from HADOOP-14028, I fairly consistently get {{Failed to 
> reset the request input stream}} exceptions. They're more likely to occur the 
> larger the file that's being written (70GB in the extreme case, but it needs 
> to be one file).
> {code}
> 2017-02-10 04:21:43 WARN S3ABlockOutputStream:692 - Transfer failure of block 
> FileBlock{index=416, 
> destFile=/tmp/hadoop-root/s3a/s3ablock-0416-4228067786955989475.tmp, 
> state=Upload, dataSize=11591473, limit=104857600}
> 2017-02-10 04:21:43 WARN S3AInstrumentation:777 - Closing output stream 
> statistics while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=416, blocksInQueue=0, blocksActive=0, 
> blockUploadsCompleted=416, blockUploadsFailed=3, 
> bytesPendingUpload=209747761, bytesUploaded=43317747712, blocksAllocated=416, 
> blocksReleased=416, blocksActivelyAllocated=0, 
> exceptionsInMultipartFinalize=0, transferDuration=1389936 ms, 
> queueDuration=519 ms, averageQueueTime=1 ms, totalUploadDuration=1390455 ms, 
> effectiveBandwidth=3.1153649497466657E7 bytes/s}
> at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
> Multi-part upload with id 
> 'Xx.ezqT5hWrY1W92GrcodCip88i8rkJiOcom2nuUAqHtb6aQX__26FYh5uYWKlRNX5vY5ktdmQWlOovsbR8CLmxUVmwFkISXxDRHeor8iH9nPhI3OkNbWJJBLrvB3xLUuLX0zvGZWo7bUrAKB6IGxA--'
>  to 2017/planet-170206.orc on 2017/planet-170206.orc: 
> com.amazonaws.ResetException: Failed to reset the request input stream; If 
> the request involves an input stream, the maximum stream buffer size can be 
> configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
> reset the request input stream; If the request involves an input stream, the 
> maximum stream buffer size can be configured via 
> request.getRequestClientOptions().setReadLimit(int)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:351)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.orc.impl.PhysicalFsWriter.close(PhysicalFsWriter.java:221)
> at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2827)
> at net.mojodna.osm2orc.standalone.OsmPbf2Orc.convert(OsmPbf2Orc.java:296)
> at net.mojodna.osm2orc.Osm2Orc.main(Osm2Orc.java:47)
> Caused by: com.amazonaws.ResetException: Failed to reset the request input 
> stream; If the request involves an input stream, the maximum stream buffer 
> size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at 
> 

[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-10 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861684#comment-15861684
 ] 

Seth Fitzsimmons commented on HADOOP-14071:
---

This is short-haul (EC2 in us-east-1 to S3 in us-standard).

> S3a: Failed to reset the request input stream
> -
>
> Key: HADOOP-14071
> URL: https://issues.apache.org/jira/browse/HADOOP-14071
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> When using the patch from HADOOP-14028, I fairly consistently get {{Failed to 
> reset the request input stream}} exceptions. They're more likely to occur the 
> larger the file that's being written (70GB in the extreme case, but it needs 
> to be one file).
> {code}
> 2017-02-10 04:21:43 WARN S3ABlockOutputStream:692 - Transfer failure of block 
> FileBlock{index=416, 
> destFile=/tmp/hadoop-root/s3a/s3ablock-0416-4228067786955989475.tmp, 
> state=Upload, dataSize=11591473, limit=104857600}
> 2017-02-10 04:21:43 WARN S3AInstrumentation:777 - Closing output stream 
> statistics while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=416, blocksInQueue=0, blocksActive=0, 
> blockUploadsCompleted=416, blockUploadsFailed=3, 
> bytesPendingUpload=209747761, bytesUploaded=43317747712, blocksAllocated=416, 
> blocksReleased=416, blocksActivelyAllocated=0, 
> exceptionsInMultipartFinalize=0, transferDuration=1389936 ms, 
> queueDuration=519 ms, averageQueueTime=1 ms, totalUploadDuration=1390455 ms, 
> effectiveBandwidth=3.1153649497466657E7 bytes/s}
> at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
> Multi-part upload with id 
> 'Xx.ezqT5hWrY1W92GrcodCip88i8rkJiOcom2nuUAqHtb6aQX__26FYh5uYWKlRNX5vY5ktdmQWlOovsbR8CLmxUVmwFkISXxDRHeor8iH9nPhI3OkNbWJJBLrvB3xLUuLX0zvGZWo7bUrAKB6IGxA--'
>  to 2017/planet-170206.orc on 2017/planet-170206.orc: 
> com.amazonaws.ResetException: Failed to reset the request input stream; If 
> the request involves an input stream, the maximum stream buffer size can be 
> configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
> reset the request input stream; If the request involves an input stream, the 
> maximum stream buffer size can be configured via 
> request.getRequestClientOptions().setReadLimit(int)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:351)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.orc.impl.PhysicalFsWriter.close(PhysicalFsWriter.java:221)
> at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2827)
> at net.mojodna.osm2orc.standalone.OsmPbf2Orc.convert(OsmPbf2Orc.java:296)
> at net.mojodna.osm2orc.Osm2Orc.main(Osm2Orc.java:47)
> Caused by: com.amazonaws.ResetException: Failed to reset the request input 
> stream; If the request involves an input stream, the maximum stream buffer 
> size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at 
> 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-09 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860725#comment-15860725
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

I'm running into HADOOP-14071 when using {{HADOOP-14028-branch-2.8-003.patch}}.

It manages to complete intermittently and usually takes a few hours to fail.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Created] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-09 Thread Seth Fitzsimmons (JIRA)
Seth Fitzsimmons created HADOOP-14071:
-

 Summary: S3a: Failed to reset the request input stream
 Key: HADOOP-14071
 URL: https://issues.apache.org/jira/browse/HADOOP-14071
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2
Reporter: Seth Fitzsimmons


When using the patch from HADOOP-14028, I fairly consistently get {{Failed to 
reset the request input stream}} exceptions. They're more likely to occur the 
larger the file that's being written (70GB in the extreme case, but it needs to 
be one file).
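The SDK's own suggestion in the trace below is {{request.getRequestClientOptions().setReadLimit(int)}}. As a purely hypothetical sketch (S3A doesn't expose the request object like this; names and sizes are placeholders), raising the read limit on a v1-SDK part upload would look roughly like:

{code}
import com.amazonaws.services.s3.model.UploadPartRequest;

public class ReadLimitSketch {
    public static void main(String[] args) {
        UploadPartRequest part = new UploadPartRequest()
                .withBucketName("example-bucket")       // placeholder
                .withKey("2017/planet-170206.orc")      // key taken from the trace below
                .withUploadId("example-upload-id")      // placeholder
                .withPartNumber(1);
        // Let the SDK's mark/reset buffer cover a whole 100 MB part plus one byte,
        // so the stream can be rewound if the part upload has to be retried.
        part.getRequestClientOptions().setReadLimit(100 * 1024 * 1024 + 1);
    }
}
{code}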

{code}
2017-02-10 04:21:43 WARN S3ABlockOutputStream:692 - Transfer failure of block 
FileBlock{index=416, 
destFile=/tmp/hadoop-root/s3a/s3ablock-0416-4228067786955989475.tmp, 
state=Upload, dataSize=11591473, limit=104857600}
2017-02-10 04:21:43 WARN S3AInstrumentation:777 - Closing output stream 
statistics while data is still marked as pending upload in 
OutputStreamStatistics{blocksSubmitted=416, blocksInQueue=0, blocksActive=0, 
blockUploadsCompleted=416, blockUploadsFailed=3, bytesPendingUpload=209747761, 
bytesUploaded=43317747712, blocksAllocated=416, blocksReleased=416, 
blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
transferDuration=1389936 ms, queueDuration=519 ms, averageQueueTime=1 ms, 
totalUploadDuration=1390455 ms, effectiveBandwidth=3.1153649497466657E7 bytes/s}
at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
Multi-part upload with id 
'Xx.ezqT5hWrY1W92GrcodCip88i8rkJiOcom2nuUAqHtb6aQX__26FYh5uYWKlRNX5vY5ktdmQWlOovsbR8CLmxUVmwFkISXxDRHeor8iH9nPhI3OkNbWJJBLrvB3xLUuLX0zvGZWo7bUrAKB6IGxA--'
 to 2017/planet-170206.orc on 2017/planet-170206.orc: 
com.amazonaws.ResetException: Failed to reset the request input stream; If the 
request involves an input stream, the maximum stream buffer size can be 
configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
reset the request input stream; If the request involves an input stream, the 
maximum stream buffer size can be configured via 
request.getRequestClientOptions().setReadLimit(int)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:351)
at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
at org.apache.orc.impl.PhysicalFsWriter.close(PhysicalFsWriter.java:221)
at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2827)
at net.mojodna.osm2orc.standalone.OsmPbf2Orc.convert(OsmPbf2Orc.java:296)
at net.mojodna.osm2orc.Osm2Orc.main(Osm2Orc.java:47)
Caused by: com.amazonaws.ResetException: Failed to reset the request input 
stream; If the request involves an input stream, the maximum stream buffer size 
can be configured via request.getRequestClientOptions().setReadLimit(int)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
at 
org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
at 
com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
at 
org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
at 

[jira] [Created] (HADOOP-14070) S3a: Failed to reset the request input stream

2017-02-09 Thread Seth Fitzsimmons (JIRA)
Seth Fitzsimmons created HADOOP-14070:
-

 Summary: S3a: Failed to reset the request input stream
 Key: HADOOP-14070
 URL: https://issues.apache.org/jira/browse/HADOOP-14070
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2
Reporter: Seth Fitzsimmons


{code}
Feb 07, 2017 8:05:46 AM 
com.google.common.util.concurrent.Futures$CombinedFuture setExceptionAndMaybeLog
SEVERE: input future failed.
com.amazonaws.ResetException: Failed to reset the request input stream; If the 
request involves an input stream, the maximum stream buffer size can be 
configured via request.getRequestClientOptions().setReadLimit(int)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:586)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
at 
com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:501)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3ABlockOutputStream.java:492)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1219)
at 
org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Resetting to invalid mark
at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
at 
com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
at 
com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:169)
at 
com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
at 
org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
... 20 more
2017-02-07 08:05:46 WARN S3AInstrumentation:777 - Closing output stream 
statistics while data is still marked as pending upload in 
OutputStreamStatistics{blocksSubmitted=519, blocksInQueue=0, blocksActive=1, 
blockUploadsCompleted=518, blockUploadsFailed=2, bytesPendingUpload=82528300, 
bytesUploaded=54316236800, blocksAllocated=519, blocksReleased=519, 
blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
transferDuration=2637812 ms, queueDuration=839 ms, averageQueueTime=1 ms, 
totalUploadDuration=2638651 ms, effectiveBandwidth=2.05848506680118E7 bytes/s}
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
Multi-part upload with id 
'uDonLgtsyeToSmhyZuNb7YrubCDiyXCCQy4mdVc5ZmYWPPHyZ3H3ZlFZzKktaPUiYb7uT4.oM.lcyoazHF7W8pK4xWmXV4RWmIYGYYhN6m25nWRrBEE9DcJHcgIhFD8xd7EKIjijEd1k4S5JY1HQvA--'
 to 2017/history-170130.orc on 2017/history-170130.orc: 
com.amazonaws.ResetException: Failed to reset the request input stream; If the 
request involves an input stream, the maximum stream buffer size can be 
configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
reset the request input stream; If the request involves an input stream, the 
maximum stream buffer size can be configured via 
request.getRequestClientOptions().setReadLimit(int)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
at 
org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
at 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't clear temporary files

2017-01-26 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840605#comment-15840605
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

Reading through the AWS SDK code, it looks like this is the line ultimately 
responsible for closing the input stream: 
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ReleasableInputStream.java#L85

I'm using the default settings (other than {{fs.s3a.fast.upload=true}}).

From watching the atimes, it looks like there's only 1 block going up at a time while the next one fills up. (My producer is relatively slow and I'm running in EC2, so it makes sense that the uploader can keep up).
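For illustration only, here's a rough sketch of the close-triggers-delete pattern being discussed (this is not the actual S3A {{DiskBlock}} code, just the shape of it):

{code}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/** Sketch: delete the backing temp file once the consumer closes the stream. */
public class DeleteOnCloseFileInputStream extends FileInputStream {
    private final File file;

    public DeleteOnCloseFileInputStream(File file) throws IOException {
        super(file);
        this.file = file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // If close() is never reached (e.g. the part upload fails or the process
            // dies), the temp file stays on disk -- the leak this issue is about.
            if (!file.delete()) {
                file.deleteOnExit();
            }
        }
    }
}
{code}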

> S3A block output streams don't clear temporary files
> 
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Comment Edited] (HADOOP-14028) S3A block output streams don't clear temporary files

2017-01-26 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840605#comment-15840605
 ] 

Seth Fitzsimmons edited comment on HADOOP-14028 at 1/26/17 10:52 PM:
-

Reading through the AWS SDK code, it looks like this is the line ultimately 
responsible for closing the input stream: 
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ReleasableInputStream.java#L85

I'm using the default settings (other than {{fs.s3a.fast.upload=true}}).

From watching the atimes, it looks like there's only 1 block going up at a time while the next one fills up. (My producer is relatively slow and I'm running in EC2, so it makes sense that the uploader can keep up).


was (Author: mojodna):
Reading through the AWS SDK code, it looks like this is the line ultimately 
responsible for closing the input stream: 
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ReleasableInputStream.java#L85

I'm using the default settings (other than {{fs.s3a.fast.upload=true}}.

From watching the atimes, it looks like there's only 1 block going up at a time while the next one fills up. (My producer is relatively slow and I'm running in EC2, so it makes sense that the uploader can keep up).

> S3A block output streams don't clear temporary files
> 
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't clear temporary files

2017-01-26 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840574#comment-15840574
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

https://github.com/apache/hadoop/blob/6c348c56918973fd988b110e79231324a8befe12/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java#L490
 suggests that it should.

Once I get this file created and uploaded, I'll try enabling debug logging for 
S3A to get a clearer idea of what's happening.
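(What I mean by enabling debug logging, assuming the log4j 1.2 bundled with these Hadoop builds; the {{log4j.properties}} equivalent is {{log4j.logger.org.apache.hadoop.fs.s3a=DEBUG}}.)

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class S3ADebugLoggingSketch {
    public static void main(String[] args) {
        // Programmatic equivalent of adding
        //   log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
        // to log4j.properties before running the upload job.
        Logger.getLogger("org.apache.hadoop.fs.s3a").setLevel(Level.DEBUG);
    }
}
{code}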

> S3A block output streams don't clear temporary files
> 
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.






[jira] [Created] (HADOOP-14028) S3A block output streams don't clear temporary files

2017-01-26 Thread Seth Fitzsimmons (JIRA)
Seth Fitzsimmons created HADOOP-14028:
-

 Summary: S3A block output streams don't clear temporary files
 Key: HADOOP-14028
 URL: https://issues.apache.org/jira/browse/HADOOP-14028
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0-alpha2
 Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
Reporter: Seth Fitzsimmons


I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I was 
looking for after running into the same OOM problems) and don't see it cleaning 
up the disk-cached blocks.

I'm generating a ~50GB file on an instance with ~6GB free when the process 
starts. My expectation is that local copies of the blocks would be deleted 
after those parts finish uploading, but I'm seeing more than 15 blocks in /tmp 
(and none of them have been deleted thus far).

I see that DiskBlock deletes temporary files when closed, but is it closed 
after individual blocks have finished uploading or when the entire file has 
been fully written to the FS (full upload completed, including all parts)?

As a temporary workaround to avoid running out of space, I'm listing files, 
sorting by atime, and deleting anything older than the first 20: `ls -ut | tail 
-n +21 | xargs rm`

Steve Loughran says:

> They should be deleted as soon as the upload completes; the close() call that 
> the AWS httpclient makes on the input stream triggers the deletion. Though 
> there aren't tests for it, as I recall.


