[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-24 Thread Mingliang Liu (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15883395#comment-15883395 ]

Mingliang Liu commented on HADOOP-14028:


{quote}
IOUtilsClient#cleanup() because it's in the wrong package
{quote}
I missed that... sorry for the noise.

+1

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-006.patch, HADOOP-14028-007.patch, 
> HADOOP-14028-branch-2-001.patch, HADOOP-14028-branch-2.8-002.patch, 
> HADOOP-14028-branch-2.8-003.patch, HADOOP-14028-branch-2.8-004.patch, 
> HADOOP-14028-branch-2.8-005.patch, HADOOP-14028-branch-2.8-007.patch, 
> HADOOP-14028-branch-2.8-008.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.
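To make the expected lifecycle concrete: a minimal sketch, assuming a hypothetical self-deleting stream class (not the actual S3A code), of how the {{close()}} call can reclaim a block's disk space.

{code}
// Illustrative sketch only: a file-backed input stream that deletes its
// temporary block file once close() is called, e.g. by the AWS http
// client after the part upload finishes. Class name is hypothetical.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

class SelfDeletingFileInputStream extends FileInputStream {
  private final File file;

  SelfDeletingFileInputStream(File file) throws IOException {
    super(file);
    this.file = file;
  }

  @Override
  public void close() throws IOException {
    try {
      super.close();
    } finally {
      // Reclaim the disk space as soon as the stream is closed.
      if (!file.delete()) {
        file.deleteOnExit();
      }
    }
  }
}
{code}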






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-24 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882691#comment-15882691 ]

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 11s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 4 new + 27 unchanged - 2 fixed = 31 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 49s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12854467/HADOOP-14028-branch-2.8-008.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux de828f17db34 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / ff87ca8 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_121

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-24 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882504#comment-15882504 ]

Steve Loughran commented on HADOOP-14028:
-

Hadn't seen {{IOUtilsClient}}; it's in HDFS and we don't actually depend on it. Move those methods into hadoop-common and I can use them.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Mingliang Liu (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881629#comment-15881629 ]

Mingliang Liu commented on HADOOP-14028:


+1

I learned a lot from the threads here: how to debug and identify the problem from logging/stats, and how to evaluate a direct fix and end up with a comprehensive one. I'd suggest we change the subject to "not closing" instead of "don't delete temporary files" to make the problem clearer.

One nit: can we use {{IOUtilsClient#cleanup()}} instead of the newly added {{closeAll()}}? Adding one more line of debug info to that method should work fine.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Lei (Eddy) Xu (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881336#comment-15881336 ]

Lei (Eddy) Xu commented on HADOOP-14028:


Hi, Steve

The patch LGTM. +1 pending the following changes. 
 
{code}
public static void closeAll(Logger log,
    java.io.Closeable... closeables) {
  for (java.io.Closeable c : closeables) {
    if (c != null) {
      try {
        LOG.debug("Closing {}", c);
        c.close();
      } catch (Exception e) {
        if (log != null && log.isDebugEnabled()) {
          log.debug("Exception in closing {}", c, e);
        }
      }
    }
  }
}
{code}

Can it be {{package-private}}? And {{LOG.debug("Closing {}", c);}} should be {{log.debug(...)}} to use the logger passed in.

{code}
static final class BlockUploadData implements Closeable {
  public final File file;
{code}

It should be {{private final File file}}.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881311#comment-15881311 ]

Seth Fitzsimmons commented on HADOOP-14028:
---

I mean "data hasn't been fully created" (because the producer gets killed 
before it finishes generating it), but what's there may have been fully flushed 
(the "file" hasn't ended yet).

It smelled like a {{finally}} block, but I'm not sure how the OOM killer ends 
up interacting with Java (specifically how much cleanup can be done).




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881283#comment-15881283 ]

Steve Loughran commented on HADOOP-14028:
-

That's interesting, because we don't issue the complete until the final close has kicked in. When you say "write didn't complete", do you mean "data hasn't been fully flushed"? Maybe the upload stream is being closed before some flush() is called by the source.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881172#comment-15881172 ]

Seth Fitzsimmons commented on HADOOP-14028:
---

Err, actually, with one caveat: if the process is killed (OOM, typically), uploaded files are somehow still marked as complete (though not always), even though the write didn't actually complete successfully.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Seth Fitzsimmons (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881168#comment-15881168 ]

Seth Fitzsimmons commented on HADOOP-14028:
---

Functionality-wise, the patch has been working well for me on up to 70GB output 
files.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Mingliang Liu (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880770#comment-15880770 ]

Mingliang Liu commented on HADOOP-14028:


Will do a review this week. Further reviews would be appreciated, as I've just started looking at the problem and solution here.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-23 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880323#comment-15880323 ]

Steve Loughran commented on HADOOP-14028:
-

Really, really need an urgent review here. [~liuml07], [~eddyxu], [~cnauroth], [~Thomas Demoor] — others?




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-22 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878025#comment-15878025 ]

Steve Loughran commented on HADOOP-14028:
-

The checkstyle complaints are unimportant:
{code}
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java:382: writeOperationHelper.newPutRequest(uploadData.getUploadStream(), size);: Line is longer than 80 characters (found 81).
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ADataBlocks.java:212: protected final long index;:26: Variable 'index' must be private and have accessor methods.
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ADataBlocks.java:213: protected final S3AInstrumentation.OutputStreamStatistics statistics;:63: Variable 'statistics' must be private and have accessor methods.
./hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java:1:/*: File length is 2,374 lines (max allowed is 2,000)
{code}

# 81 chars is acceptable for readability, IMO.
# The visible fields are all final and on package-private classes. We are in control of where these fields are referenced; wrapping them in accessors is needless.
# File length is insurmountable (and getting worse). I've already moved stuff out of S3AFileSystem (e.g. all the listing stuff); not sure what else can be done. I worry more about class complexity, especially with the s3guard changes, than about overall length.

This patch is ready for review, and it is important. Volunteers?




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-21 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876194#comment-15876194 ]

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 7m 40s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 13s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 11s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 4 new + 27 unchanged - 2 fixed = 31 total (was 29) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 4s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853765/HADOOP-14028-branch-2.8-007.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 80e2c0a596ee 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2.8 / 2b3e8b7 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_121

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-20 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874945#comment-15874945 ]

Hadoop QA commented on HADOOP-14028:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HADOOP-14028 does not apply to trunk. Rebase required? Wrong branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853598/HADOOP-14028-007.patch |
| Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/11660/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-16 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870545#comment-15870545 ]

Steve Loughran commented on HADOOP-14028:
-

And that new test shows that file streams don't support mark/reset:
{code}
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 6.235 sec <<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3ABlockOutputDisk
testMarkReset(org.apache.hadoop.fs.s3a.ITestS3ABlockOutputDisk)  Time elapsed: 0.533 sec  <<< FAILURE!
java.lang.AssertionError: Mark not supported in java.io.FileInputStream@7143d536
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.hadoop.fs.s3a.ITestS3ABlockOutputArray.markAndResetDatablock(ITestS3ABlockOutputArray.java:134)
    at org.apache.hadoop.fs.s3a.ITestS3ABlockOutputArray.testMarkReset(ITestS3ABlockOutputArray.java:147)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

Looks like a valid complaint.

Looking at the AWS code, they have a special class, {{com.amazonaws.internal.ResettableInputStream}}, which implements the mark/reset operation using the FileChannel behind a File; we need to always pass that down for mark/reset to work with file sources.
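The idea behind that class is straightforward to sketch (illustrative only; the SDK's actual implementation differs): mark() records the channel position and reset() seeks back to it, so no read limit applies.

{code}
// Illustrative sketch in the spirit of
// com.amazonaws.internal.ResettableInputStream: mark/reset implemented
// by seeking the FileChannel behind the file. Class name is hypothetical.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;

class FileChannelResettableStream extends InputStream {
  private final FileInputStream in;
  private final FileChannel channel;
  private long markPos;

  FileChannelResettableStream(File file) throws IOException {
    in = new FileInputStream(file);
    channel = in.getChannel();
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public boolean markSupported() {
    return true;
  }

  @Override
  public synchronized void mark(int readlimit) {
    try {
      markPos = channel.position();  // readlimit irrelevant: we can seek anywhere
    } catch (IOException e) {
      throw new IllegalStateException("Cannot record file position", e);
    }
  }

  @Override
  public synchronized void reset() throws IOException {
    channel.position(markPos);       // rewind to the marked position
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}
{code}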





[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-16 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870213#comment-15870213 ]

Steve Loughran commented on HADOOP-14028:
-

Looking at the latest code (the stuff we get with HADOOP-14040), the check is purely about mark support and rollback:
{code}
if (requestInputStream.markSupported()) {
    try {
        requestInputStream.reset();
    } catch (IOException ex) {
        throw new ResetException("Failed to reset the request input stream", ex);
    }
}
{code}
I will do a test to verify that all block types support reset of the full datablock.
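A minimal verification sketch (hypothetical helper; the real assertions live in the ITestS3ABlockOutput* tests):

{code}
// Sketch: assert that a block's upload stream honours mark/reset over
// data it has already served. Helper and class names are hypothetical.
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MarkResetCheck {

  static void verifyMarkReset(InputStream in) throws Exception {
    assertTrue("Mark not supported in " + in, in.markSupported());
    in.mark(Integer.MAX_VALUE);
    int first = in.read();       // consume a byte
    in.reset();                  // rewind to the mark
    assertEquals("Stream did not rewind", first, in.read());
  }

  public static void main(String[] args) throws Exception {
    // A ByteArrayInputStream stands in for a datablock's upload stream.
    verifyMarkReset(new ByteArrayInputStream(new byte[]{1, 2, 3}));
  }
}
{code}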




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-15 Thread Steve Loughran (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868425#comment-15868425 ]

Steve Loughran commented on HADOOP-14028:
-

In that case it's not relevant; mark & reset should be doing the work, which is something that all the streams we work with should support. So why the error message in HADOOP-14071 saying "use the option"?

Maybe we should add a test to TestBlocks which verifies that mark & reset works for a multi-MB block; straightforward enough.




[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-15 Thread Thomas Demoor (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868162#comment-15868162 ]

Thomas Demoor commented on HADOOP-14028:


Not sure yet; I'm looking at the AWS SDK code but I'm still confused.

This is the comment for readLimit in {{com.amazonaws.RequestClientOptions}}:
{code}
 /**
 * Used to enable mark-and-reset for
 * non-mark-and-resettable non-file input stream for up to 128K memory
 * buffering by default. Add 1 to get around an implementation quirk of
 * BufferedInputStream.
 *
 * Retries after reading {@link #DEFAULT_STREAM_BUFFER_SIZE} bytes would
 * fail to reset the underlying input stream as the mark position would
 * have been invalidated.
 *
 */
{code}
The upload / part-upload code reads this from a legacy system property and puts it in an awsreq variable, but that variable doesn't seem to be used anymore. Code is at {{com/amazonaws/services/s3/AmazonS3Client.java:1627}} and {{com/amazonaws/services/s3/AmazonS3Client.java:3130}} (SDK 1.11.86).

So I don't think it's set for our input streams; I guess we currently use the default of 128K + 1 byte?






[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-15 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868082#comment-15868082 ]

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 2s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 1s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 16s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 13s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 3 new + 26 unchanged - 2 fixed = 29 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 21s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 18m 42s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12852840/HADOOP-14028-branch-2.8-005.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 702889bc5280 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2.8 / 4c47cb6 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866263#comment-15866263
 ] 

Steve Loughran commented on HADOOP-14028:
-

Regarding marker/stream rollback, Thomas [~Thomas Demoor] has suggested calling 
{{getRequestClientOptions().setReadLimit()}} for the memory block streams, to say 
"roll back as far as you want". Thomas: is this what you were thinking: set the 
limit to the block size to indicate that the stream can be rolled back over the 
whole block?

{code}
putObjectRequest.getRequestClientOptions().setReadLimit(size);
{code}
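
For the disk-backed blocks the equivalent call would presumably go on the 
part-upload request; a minimal sketch with hypothetical names 
({{uploadPartRequest}}, {{blockSize}}), not a quote from the patch:

{code}
// Sketch: let the SDK mark/reset far enough to replay the whole block on
// a retry. The +1 is headroom (an assumption here) so that reading exactly
// blockSize bytes does not invalidate the mark.
uploadPartRequest.getRequestClientOptions()
    .setReadLimit(blockSize + 1);
{code}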

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-09 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860725#comment-15860725
 ] 

Seth Fitzsimmons commented on HADOOP-14028:
---

I'm running into HADOOP-14071 when using {{HADOOP-14028-branch-2.8-003.patch}}.

It completes intermittently; when it fails, it usually takes a few hours to do so.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-06 Thread Aaron Fabbri (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854968#comment-15854968
 ] 

Aaron Fabbri commented on HADOOP-14028:
---

Quick review of the v4 patch:

{noformat}
@@ -392,8 +391,15 @@ private void putObject() throws IOException {
 executorService.submit(new Callable() {
   @Override
   public PutObjectResult call() throws Exception {
-PutObjectResult result = fs.putObjectDirect(putObjectRequest);
-block.close();
+PutObjectResult result;
+try {
+  // the put object call automatically closes the input
+  // stream afterwards.
+  result = writeOperationHelper.putObject(putObjectRequest);
+  block.close();
+} finally {
+  closeCloseables(LOG, block);
+}
 return
{noformat}

Do you still need the {{block.close()}} call above, given that 
{{closeCloseables(LOG, block)}} in the finally clause will close the block anyway?

{noformat}
-  /**
-   * Return the buffer to the pool after the stream is closed.
-   */
-  @Override
-  public synchronized void close() {
-if (byteBuffer != null) {
-  LOG.debug("releasing buffer");
-  releaseBuffer(byteBuffer);
+/**
+ * Return the buffer to the pool after the stream is closed.
+ */
+@Override
+public synchronized void close() {
+  LOG.debug("ByteBufferInputStream.close() for {}",
+  ByteBufferBlock.super.toString());
   byteBuffer = null;
 }
-  }
  
-  /**
-   * Verify that the stream is open.
-   * @throws IOException if the stream is closed
-   */
-  private void verifyOpen() throws IOException {
-if (byteBuffer == null) {
-  throw new IOException(FSExceptionMessages.STREAM_IS_CLOSED);
{noformat}

This part of the diff was hard to read due to the indentation change, but did you 
eliminate the call to {{releaseBuffer(byteBuffer)}}?
If so, can you explain that a bit?
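
If the intent is to keep both the explicit {{block.close()}} and the cleanup in 
the finally clause, one safe shape is an idempotent close(); a minimal sketch 
under that assumption, not the patch itself:

{code}
import java.io.Closeable;
import java.io.IOException;

/**
 * Sketch only: an idempotent close() makes an explicit block.close()
 * plus a finally-block cleanup safe, since the second call is a no-op
 * rather than a double release.
 */
class IdempotentBlock implements Closeable {
  private boolean closed;

  @Override
  public synchronized void close() throws IOException {
    if (closed) {
      return;                  // already closed: nothing to release twice
    }
    closed = true;
    releaseResources();
  }

  private void releaseResources() {
    // e.g. return a buffer to the pool or delete the temporary file
  }
}
{code}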




> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-06 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854824#comment-15854824
 ] 

Thomas Demoor commented on HADOOP-14028:


I'll do some testing tomorrow.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
>Priority: Critical
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852800#comment-15852800
 ] 

Steve Loughran commented on HADOOP-14028:
-

The two checkstyle warnings are about final read-only fields being marked 
protected rather than private with accessors, which would matter if the classes in 
question were part of a public API. As it is, the classes in S3ADataBlocks are all 
package-private, so every use lives in our codebase, and encapsulating the fields 
would be overkill.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15852194#comment-15852194
 ] 

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
25s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
18s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 2s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 3 
new + 26 unchanged - 2 fixed = 29 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 28s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12850887/HADOOP-14028-branch-2.8-004.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 738c7a6cb3e4 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 
10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2.8 / 2bbcaa8 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850746#comment-15850746
 ] 

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
16s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
20s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
22s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
15s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
16s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
39s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 12s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 5 
new + 26 unchanged - 2 fixed = 31 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
11s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12850704/HADOOP-14028-branch-2.8-003.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux ae17cd972161 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2.8 / 4e423ed |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850389#comment-15850389
 ] 

Steve Loughran commented on HADOOP-14028:
-

It's confirmed that the multipart part upload does not close streams; the docs 
will make this clearer in future. So yes, we do need to close the stream ourselves 
on a multipart commit, and no, not on a single-put one.

The code will be updated accordingly.

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.8.0
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
> Attachments: HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848564#comment-15848564
 ] 

Hadoop QA commented on HADOOP-14028:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m  
3s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 4 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
34s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
16s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
19s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
16s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
26s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  1m 
 3s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
36s{color} | {color:green} branch-2.8 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
14s{color} | {color:green} branch-2.8 passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} branch-2.8 passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
0m 11s{color} | {color:orange} hadoop-tools/hadoop-aws: The patch generated 11 
new + 27 unchanged - 1 fixed = 38 total (was 28) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
10s{color} | {color:green} the patch passed with JDK v1.8.0_121 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
13s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
20s{color} | {color:green} hadoop-aws in the patch passed with JDK v1.7.0_121. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 36m 30s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5af2af1 |
| JIRA Issue | HADOOP-14028 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12850437/HADOOP-14028-branch-2.8-002.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux b951db0510f5 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 
15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2.8 / 4f13564 |
| Default Java | 1.7.0_121 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_121 

[jira] [Commented] (HADOOP-14028) S3A block output streams don't delete temporary files in multipart uploads

2017-02-01 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848395#comment-15848395
 ] 

Steve Loughran commented on HADOOP-14028:
-

Looking into {{com.amazonaws.services.s3.model.S3DataSource}}: the cleanup code at 
the end of the multipart put doesn't close the stream, it only restores the 
original one:

{code}
// restore the original input stream so the caller could close
// it if necessary
req.setInputStream(inputStreamOrig);
req.setFile(fileOrig);
{code}

Conclusion: uploadPart really doesn't close input streams, so we need to do it 
ourselves. As putObject *does* close them, I'm going to leave that path alone and 
rely on our new tests to pick up regressions in future.
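
A minimal sketch of "doing it ourselves", using hypothetical names 
({{PartUploader}}, {{uploadAndClose}}) rather than the actual patch:

{code}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;
import org.apache.commons.io.IOUtils;

/** Sketch only, hypothetical names; not the committed patch. */
class PartUploader {
  private final AmazonS3 s3;

  PartUploader(AmazonS3 s3) {
    this.s3 = s3;
  }

  UploadPartResult uploadAndClose(UploadPartRequest request) {
    try {
      // uploadPart() restores the original stream but does not close it...
      return s3.uploadPart(request);
    } finally {
      // ...so close it ourselves to free the buffered block (disk or heap).
      IOUtils.closeQuietly(request.getInputStream());
    }
  }
}
{code}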

> S3A block output streams don't delete temporary files in multipart uploads
> --
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>Assignee: Steve Loughran
> Attachments: HADOOP-14028-branch-2-001.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org