[jira] [Commented] (HADOOP-13421) Switch to v2 of the S3 List Objects API in S3A
[ https://issues.apache.org/jira/browse/HADOOP-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695725#comment-15695725 ] Pieter Reuse commented on HADOOP-13421: --- Thank you for putting this on my radar, [~ste...@apache.org]. I will look deeper into it and come back on this in the course of next week. > Switch to v2 of the S3 List Objects API in S3A > -- > > Key: HADOOP-13421 > URL: https://issues.apache.org/jira/browse/HADOOP-13421 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steven K. Wong >Priority: Minor > > Unlike [version > 1|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html] of the > S3 List Objects API, [version > 2|http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html] by > default does not fetch object owner information, which S3A doesn't need > anyway. By switching to v2, there will be less data to transfer/process. > Also, it should be more robust when listing a versioned bucket with "a large > number of delete markers" ([according to > AWS|https://aws.amazon.com/releasenotes/Java/0735652458007581]). > Methods in S3AFileSystem that use this API include: > * getFileStatus(Path) > * innerDelete(Path, boolean) > * innerListStatus(Path) > * innerRename(Path, Path) > Requires AWS SDK 1.10.75 or later. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
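One practical difference between the two versions is how paging works. Version 1 pages with a marker key, while version 2 returns an opaque continuation token that the next request echoes back. The loop below is a minimal sketch of token-based paging against a hypothetical, in-memory PageSource. It does not call the AWS SDK, and the names Page, PageSource, InMemoryPageSource and listAll are illustrative only. In the AWS SDK for Java 1.x, the corresponding real types are ListObjectsV2Request and ListObjectsV2Result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page shape, loosely modelled on a v2 listing response:
// a batch of keys plus an optional token for fetching the next page.
final class Page {
    final List<String> keys;
    final String nextContinuationToken; // null when the listing is complete

    Page(List<String> keys, String nextContinuationToken) {
        this.keys = keys;
        this.nextContinuationToken = nextContinuationToken;
    }
}

// Hypothetical source of pages; a real implementation would issue a
// v2 List Objects request with the given prefix and continuation token.
interface PageSource {
    Page list(String prefix, String continuationToken);
}

// In-memory fake that serves matching keys in fixed-size pages.
// Its token is simply the index of the next key, encoded as a string.
final class InMemoryPageSource implements PageSource {
    private final List<String> allKeys;
    private final int pageSize;

    InMemoryPageSource(List<String> allKeys, int pageSize) {
        this.allKeys = allKeys;
        this.pageSize = pageSize;
    }

    @Override
    public Page list(String prefix, String continuationToken) {
        List<String> matching = new ArrayList<>();
        for (String k : allKeys) {
            if (k.startsWith(prefix)) {
                matching.add(k);
            }
        }
        int start = continuationToken == null ? 0 : Integer.parseInt(continuationToken);
        int end = Math.min(start + pageSize, matching.size());
        String next = end < matching.size() ? Integer.toString(end) : null;
        return new Page(new ArrayList<>(matching.subList(start, end)), next);
    }
}

final class ListingV2Sketch {
    // Keeps following continuation tokens until the source reports no more pages.
    static List<String> listAll(PageSource source, String prefix) {
        List<String> result = new ArrayList<>();
        String token = null;
        do {
            Page page = source.list(prefix, token);
            result.addAll(page.keys);
            token = page.nextContinuationToken;
        } while (token != null);
        return result;
    }
}
```

Under this sketch's assumptions, switching API versions mostly changes which field carries the paging state, not the shape of the loop.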
[jira] [Commented] (HADOOP-13735) ITestS3AFileContextStatistics.testStatistics() failing
[ https://issues.apache.org/jira/browse/HADOOP-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591724#comment-15591724 ] Pieter Reuse commented on HADOOP-13735: --- Thank you for uploading my patch, Steve. The -1 from Yetus is expected: this patch makes an existing test pass, so in my opinion no additional tests are needed here. > ITestS3AFileContextStatistics.testStatistics() failing > -- > > Key: HADOOP-13735 > URL: https://issues.apache.org/jira/browse/HADOOP-13735 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse >Priority: Minor > Attachments: HADOOP-13735-branch-2-001.patch > > > The test {{ITestS3AFileContextStatistics.testStatistics()}} seems to fail > pretty reliably these days... I'd assumed it was some race condition, but > maybe not. > Fixing this will probably entail adding more diagnostics to the base test > case. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-9565: - Attachment: HADOOP-9565-branch-2-007.patch Renamed the file to HADOOP-9565-branch-2-007.patch so Hadoop QA can apply it. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch > > > We can make explicit the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface for them to implement. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement a server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
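The description proposes substituting a server-side copy for rename. The toy in-memory blobstore below is a minimal sketch of why that matters. The names ToyBlobStore, copy and renameByCopy are hypothetical, and this is not the proposed {{Blobstore}} interface. It shows that a "rename" built from per-object copy followed by delete is not atomic: if it fails part-way, objects can be visible under both the old and the new prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy blobstore: a flat map from object key to bytes. There are no real
// directories; a "directory" is just a shared key prefix.
final class ToyBlobStore {
    final TreeMap<String, byte[]> objects = new TreeMap<>();

    void put(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    // Stand-in for a server-side copy: the data never leaves the store.
    void copy(String srcKey, String dstKey) {
        byte[] data = objects.get(srcKey);
        if (data == null) {
            throw new IllegalArgumentException("No such key: " + srcKey);
        }
        objects.put(dstKey, data.clone());
    }

    void delete(String key) {
        objects.remove(key);
    }

    // "Rename" of a prefix emulated as copy-then-delete, one object at a time.
    // Each step is atomic on its own, but the whole operation is not:
    // a failure after some copies leaves objects under both prefixes.
    void renameByCopy(String srcPrefix, String dstPrefix) {
        List<String> keys = new ArrayList<>();
        for (String k : objects.keySet()) {
            if (k.startsWith(srcPrefix)) {
                keys.add(k); // snapshot first so the map is not modified while iterating
            }
        }
        for (String k : keys) {
            copy(k, dstPrefix + k.substring(srcPrefix.length()));
        }
        for (String k : keys) {
            delete(k);
        }
    }
}
```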
[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-9565: - Attachment: (was: HADOOP-9565-007.patch) > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch > > > We can make explicit the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface for them to implement. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement a server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-9565: - Attachment: HADOOP-9565-007.patch Uploaded patch version 7: * Renamed UnsatisfiedSemanticsException to UnsatisfiedFeatureException * Renamed ObjectStoreFeatures to ObjectStoreFeature and changed to an enum-based approach * Added style improvements to FileOutputCommitter, since we're changing a lot of lines throughout the class anyway. This makes the patch prone to becoming non-applicable when other patches change this class. Chris, thanks for pointing out that Steve will be enjoying a well-earned break in the coming weeks. We will wait patiently for his response. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch, HADOOP-9565-007.patch > > > We can make explicit the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface for them to implement. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement a server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
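Patch 007 is described as moving to an enum-based ObjectStoreFeature together with an UnsatisfiedFeatureException. The sketch below shows one way such an API could look. The constant names, the ObjectStore interface shape and the hasFeature/requireFeature helpers are assumptions for illustration, not the contents of the patch. An EnumSet keeps much of the compactness of a bitmask while staying type-safe, which is the trade-off raised in the review of Strings versus bitmasks.

```java
import java.io.IOException;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature flags; the real patch may use different constants.
enum ObjectStoreFeature {
    ATOMIC_RENAME,
    ATOMIC_DIRECTORY_DELETE,
    CONSISTENT_LISTING,
    SERVER_SIDE_COPY
}

// Thrown when a caller depends on a feature the store does not offer.
class UnsatisfiedFeatureException extends IOException {
    UnsatisfiedFeatureException(ObjectStoreFeature missing) {
        super("Object store does not support " + missing);
    }
}

// Hypothetical interface that a blobstore-backed FileSystem could implement.
interface ObjectStore {
    Set<ObjectStoreFeature> getFeatures();

    default boolean hasFeature(ObjectStoreFeature f) {
        return getFeatures().contains(f);
    }

    default void requireFeature(ObjectStoreFeature f) throws UnsatisfiedFeatureException {
        if (!hasFeature(f)) {
            throw new UnsatisfiedFeatureException(f);
        }
    }
}

// Example store that offers server-side copy but no atomic rename.
final class ExampleBlobStore implements ObjectStore {
    @Override
    public Set<ObjectStoreFeature> getFeatures() {
        return EnumSet.of(ObjectStoreFeature.SERVER_SIDE_COPY);
    }
}
```

A committer could, for example, call hasFeature(ObjectStoreFeature.ATOMIC_RENAME) before relying on rename to publish output, and choose a different strategy otherwise.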
[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397523#comment-15397523 ] Pieter Reuse commented on HADOOP-9565: -- Thanks for patch 006, Steve. Having Strings instead of bitmasks is definitely an improvement, but maybe an enum is worth considering here. I don't think the Put command adds functionality: an ObjectStore object is still a FileSystem object, with the full FileSystem interface available. Or am I missing something here? Thank you for mentioning ViewFs here, Chris. It's important that this patch doesn't break the great ease of use that ViewFs brings. But considering that ViewFs is client-side only and this patch only brings performance enhancements, I don't think it is worth going the extra mile of checking the ViewFs config to see whether a Path belongs to an ObjectStore that is part of a ViewFs instance. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, > HADOOP-9565-006.patch > > > We can make explicit the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface for them to implement. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement a server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374856#comment-15374856 ] Pieter Reuse commented on HADOOP-13139: --- Great that this feature is now in branch-2 as well! Thank you for working on this, [~ste...@apache.org]. Thanks for the input on this backport, [~andrew.wang] and [~cnauroth]. But of course most of the work was done by [~Thomas Demoor] and [~fabbri] on the original patch. > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2-003.patch, > HADOOP-13139-branch-2-004.patch, HADOOP-13139-branch-2-005.patch, > HADOOP-13139-branch-2-006.patch, HADOOP-13139-branch-2.001.patch, > HADOOP-13139-branch-2.002.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Status: Patch Available (was: In Progress) > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch, > HADOOP-13139-branch-2.002.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Attachment: HADOOP-13139-branch-2.002.patch Uploaded patch 002: updated the patch to account for HADOOP-13028, fixed the checkstyle issues and copied the change from HADOOP-12553 to fix the javadoc error. > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch, > HADOOP-13139-branch-2.002.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Status: In Progress (was: Patch Available) > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Attachment: HADOOP-13139-branch-2.001.patch Renamed the patch to HADOOP-13139-branch-2.001.patch > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Hadoop Flags: Incompatible change Affects Version/s: 2.8.0 Status: Patch Available (was: In Progress) > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Affects Versions: 2.8.0 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Attachment: HADOOP-13139-001.patch Added patch 001 > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-13139: -- Fix Version/s: 2.8.0 > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Fix For: 2.8.0 > > Attachments: HADOOP-13139-001.patch > > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Work started] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
[ https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-13139 started by Pieter Reuse. - > Branch-2: S3a to use thread pool that blocks clients > > > Key: HADOOP-13139 > URL: https://issues.apache.org/jira/browse/HADOOP-13139 > Project: Hadoop Common > Issue Type: Task > Components: fs/s3 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > > HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will > attach a patch applicable to branch-2. > It should be noted in CHANGES-2.8.0.txt that the config parameter > 'fs.s3a.threads.core' has been removed and the behavior of the > ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients
Pieter Reuse created HADOOP-13139: - Summary: Branch-2: S3a to use thread pool that blocks clients Key: HADOOP-13139 URL: https://issues.apache.org/jira/browse/HADOOP-13139 Project: Hadoop Common Issue Type: Task Components: fs/s3 Reporter: Pieter Reuse Assignee: Pieter Reuse HADOOP-11684 has been accepted into trunk, but was not applied to branch-2. I will attach a patch applicable to branch-2. It should be noted in CHANGES-2.8.0.txt that the config parameter 'fs.s3a.threads.core' has been removed and the behavior of the ThreadPool for s3a has been changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
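The issue title describes a thread pool that blocks the submitting client once too much work is outstanding. The sketch below shows one way to get that behaviour, using a Semaphore in front of a fixed-size pool. It is an illustration of the idea only, not Hadoop's S3A implementation. The class name BlockingSubmitter and the threads/maxOutstanding parameters are hypothetical and do not correspond to S3A configuration keys.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Wraps a fixed-size pool so that submit() blocks the caller once
// maxOutstanding tasks are queued or running. Without this, work would
// either queue without limit or have to be rejected.
final class BlockingSubmitter implements AutoCloseable {
    private final ExecutorService pool;
    private final Semaphore permits;

    BlockingSubmitter(int threads, int maxOutstanding) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxOutstanding);
    }

    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire(); // blocks the client while the limit is reached
        try {
            return pool.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // free a slot when the task finishes
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // submission itself failed, e.g. pool already shut down
            throw e;
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }

    @Override
    public void close() {
        pool.shutdown();
    }
}
```

Releasing the permit when a task completes, rather than when it starts, bounds queued and running tasks together.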
[jira] [Commented] (HADOOP-12844) Recover when S3A fails on IOException in read()
[ https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264105#comment-15264105 ] Pieter Reuse commented on HADOOP-12844: --- Thank you for looking into this, [~ste...@apache.org]. The patch HADOOP-13028-006 indeed fixes the issue targeted by the patch I've uploaded here. So LGTM, and this ticket can be closed when HADOOP-13028 is accepted. > Recover when S3A fails on IOException in read() > --- > > Key: HADOOP-12844 > URL: https://issues.apache.org/jira/browse/HADOOP-12844 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.1, 2.7.2 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Attachments: HADOOP-12844.001.patch > > > This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int > off, int len) and reopens the connection at the same position it was at > before the exception. > This is similar to the functionality introduced in S3N in > [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly > the same reason. > Patch developed in cooperation with [~emres]. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems
[ https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-9565: - Assignee: Pieter Reuse (was: Thomas Demoor) I've checked with [~Thomas Demoor] and I will work on a new patch addressing the remarks of [~steve_l] and [~aartokhy]. > Add a Blobstore interface to add to blobstore FileSystems > - > > Key: HADOOP-9565 > URL: https://issues.apache.org/jira/browse/HADOOP-9565 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3, fs/swift >Affects Versions: 2.6.0 >Reporter: Steve Loughran >Assignee: Pieter Reuse > Labels: BB2015-05-TBR > Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, > HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch > > > We can make explicit the fact that some {{FileSystem}} implementations are really > blobstores, with different atomicity and consistency guarantees, by adding a > {{Blobstore}} interface for them to implement. > This could also be a place to add a {{Copy(Path,Path)}} method, assuming that > all blobstores implement a server-side copy operation as a substitute for > rename. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12844) S3A fails on IOException
[ https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12844: -- Status: Patch Available (was: Open) > S3A fails on IOException > > > Key: HADOOP-12844 > URL: https://issues.apache.org/jira/browse/HADOOP-12844 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.2, 2.7.1 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Attachments: HADOOP-12844.001.patch > > > This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int > off, int len) and reopens the connection at the same position it was at > before the exception. > This is similar to the functionality introduced in S3N in > [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly > the same reason. > Patch developed in cooperation with [~emres]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12844) S3A fails on IOException
[ https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12844: -- Attachment: HADOOP-12844.001.patch > S3A fails on IOException > > > Key: HADOOP-12844 > URL: https://issues.apache.org/jira/browse/HADOOP-12844 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.1, 2.7.2 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Attachments: HADOOP-12844.001.patch > > > This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int > off, int len) and reopens the connection at the same position it was at > before the exception. > This is similar to the functionality introduced in S3N in > [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly > the same reason. > Patch developed in cooperation with [~emres]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12844) S3A fails on IOException
Pieter Reuse created HADOOP-12844: - Summary: S3A fails on IOException Key: HADOOP-12844 URL: https://issues.apache.org/jira/browse/HADOOP-12844 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.7.2, 2.7.1 Reporter: Pieter Reuse Assignee: Pieter Reuse This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int off, int len) and reopens the connection at the same position it was at before the exception. This is similar to the functionality introduced in S3N in [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly the same reason. Patch developed in cooperation with [~emres]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
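The recovery described here catches an IOException in read(byte[], int, int) and reopens the stream at the position reached so far. It can be sketched as a wrapper stream. The StreamOpener interface (standing in for, say, a ranged GET), the single-retry policy and the class names are assumptions for illustration. This is neither the attached patch nor the code of S3AInputStream.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical factory that opens the remote object starting at a byte offset.
interface StreamOpener {
    InputStream openAt(long offset) throws IOException;
}

// Tracks the current position. If read() fails with an IOException, it closes
// the broken stream, reopens at the same position and retries once.
final class ReopeningInputStream extends InputStream {
    private final StreamOpener opener;
    private InputStream in;
    private long pos;

    ReopeningInputStream(StreamOpener opener) throws IOException {
        this.opener = opener;
        this.in = opener.openAt(0);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n;
        try {
            n = in.read(buf, off, len);
        } catch (IOException first) {
            try {
                in.close();
            } catch (IOException ignored) {
                // the old connection is already broken; nothing more to do
            }
            in = opener.openAt(pos);    // resume where the failed read started
            n = in.read(buf, off, len); // a second failure propagates to the caller
        }
        if (n > 0) {
            pos += n; // only bytes actually returned advance the position
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```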
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-10.patch Thank you for pointing out the missing @Ignore, [~cnauroth]. Added this to patch version 10. Test failures of Hadoop QA on patch version 9 are unrelated. > Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-10.patch, HADOOP-11262-2.patch, > HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, > HADOOP-11262-6.patch, HADOOP-11262-7.patch, HADOOP-11262-8.patch, > HADOOP-11262-9.patch, HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
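The one-line description, "Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem", is an instance of the delegation (adapter) pattern. The API behind FileContext, which YARN uses, is implemented by forwarding each call to the existing FileSystem implementation. The sketch below illustrates only the pattern, using tiny hypothetical interfaces (LegacyFs, NewFs) and classes (DelegatingFs, MapFs) instead of Hadoop's real classes, whose signatures are much larger.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an existing client implementation
// (the role S3AFileSystem plays); only one operation is modelled.
interface LegacyFs {
    long getLength(String path) throws IOException;
}

// Hypothetical stand-in for the second API (the role of AbstractFileSystem).
interface NewFs {
    URI getUri();
    long length(URI path) throws IOException;
}

// Adapter: implements the new API by forwarding every call to the legacy
// implementation, after checking that the URI uses the supported scheme.
final class DelegatingFs implements NewFs {
    private final URI root;
    private final LegacyFs delegate;

    DelegatingFs(URI root, LegacyFs delegate, String supportedScheme) {
        if (!supportedScheme.equals(root.getScheme())) {
            throw new IllegalArgumentException("Unsupported scheme: " + root.getScheme());
        }
        this.root = root;
        this.delegate = delegate;
    }

    @Override
    public URI getUri() {
        return root;
    }

    @Override
    public long length(URI path) throws IOException {
        return delegate.getLength(path.getPath()); // pure forwarding, no new logic
    }
}

// Minimal in-memory legacy implementation for demonstration.
final class MapFs implements LegacyFs {
    final Map<String, Long> sizes = new HashMap<>();

    @Override
    public long getLength(String path) throws IOException {
        Long n = sizes.get(path);
        if (n == null) {
            throw new FileNotFoundException(path);
        }
        return n;
    }
}
```

The benefit of this design is that all storage behaviour stays in one class, while the second API surface is little more than forwarding code.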
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-9.patch Thank you for the +1s, [~mackrorysd], [~eddyxu] and [~ste...@apache.org]. [~cnauroth], thank you for looking at this patch. I have added @Ignore annotations where appropriate in version 9, and removed the copyright lines featuring the year. Regarding the modification times of directories in S3A: as directories are "fakes" in S3A, there is no feasible way to get accurate directory timestamps without extensive locking (and coping with slow listings), which runs counter to the rationale of object stores. Therefore, we chose a "dummy" implementation that doesn't break (too many) things. Setting a fixed time (e.g. the epoch) breaks the history server: it looks at the modification time of the directory object before moving it, and decides the files don't need to be copied if they are "too old". Setting the modification time of directories in S3A to System.currentTimeMillis() ensures the history server never labels them as "too old". It is good that you have taken a deeper look into whether always labelling directories as "too young" can give rise to problems in YARN. Looking deeper into the classes LocalResource and LocalResourceType shows that YARN resource localization is always executed against regular files or .jar archives (these are the only possible values of LocalResourceType), for which S3A returns the correct timestamps. However, looking at YARN's AggregatedLogDeletionService shows that this service will skip removing the appropriate log files, because the directory will be labelled "too young". I did not find any other places in YARN where this patch can cause problems. I documented this behaviour in the index.md file in the patch. As this is not a breaking change, I still propose to go forward with this patch.
> Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, > HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, > HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262-9.patch, > HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
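The trade-off described in the comment reduces to an age check. Below is a minimal sketch, assuming a hypothetical retention rule of the form "an entry is too old once its modification time is before now minus the retention period". The names DirAgeSketch, isTooOld and RETENTION_MILLIS, and the seven-day value, are illustrative and not taken from YARN's AggregatedLogDeletionService or the history server.

```java
import java.util.concurrent.TimeUnit;

final class DirAgeSketch {
    // Hypothetical retention period, chosen only for illustration.
    static final long RETENTION_MILLIS = TimeUnit.DAYS.toMillis(7);

    // Hypothetical retention rule: an entry counts as "too old" (and may be
    // skipped or removed, depending on the service) once its modification
    // time is earlier than now minus the retention period.
    static boolean isTooOld(long modificationTimeMillis, long nowMillis) {
        return modificationTimeMillis < nowMillis - RETENTION_MILLIS;
    }

    // S3A directories are not real objects; as described in the comment,
    // the patch reports the current time as their modification time.
    static long fakeDirectoryMtime() {
        return System.currentTimeMillis();
    }
}
```

The check cuts both ways. A fixed epoch timestamp makes every directory look too old, which is what broke the history server. The current time means an age-based cleaner never considers the directory old enough, which is the AggregatedLogDeletionService limitation noted above.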
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: (was: HADOOP-11262-8.patch) > Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, > HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, > HADOOP-11262-7.patch, HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-8.patch Test failures seem to be unrelated: I could not reproduce them locally, and the code path of these tests is separate from this patch. Re-uploaded patch 8; I forgot to remove the testSetVerifyChecksum() changes yesterday. > Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, > HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, > HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-8.patch Thank you, [~eddyxu] and [~cnauroth], for reviewing this and suggesting improvements. I've uploaded patch version 8, addressing [~eddyxu]'s remarks about coding style, @Override in combination with @Before, @After or @Test, and a try-with-resources. Regarding the setVerifyChecksum test, we ([~Thomas Demoor] and I) noticed that the default FileSystem, and therefore S3A, simply ignores the "setVerifyChecksum" flag, so the test is unnecessary for S3A. Overriding it with an empty method therefore avoids a falsely failing test. The original observation was that this flag is only used in the input stream, and calling setVerifyChecksum() before out.close() is called makes S3A throw an IOException. Simply swapping these two lines fixed the issue we faced there, but patch 8 uses a different approach and no longer changes the testSetVerifyChecksum() method in the superclass. > Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, > HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, > HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14907815#comment-14907815 ] Pieter Reuse commented on HADOOP-11262: --- Test failures in hadoop-common unrelated. Patch still applies. > Enable YARN to use S3A > --- > > Key: HADOOP-11262 > URL: https://issues.apache.org/jira/browse/HADOOP-11262 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Thomas Demoor >Assignee: Pieter Reuse > Labels: amazon, s3 > Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, > HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, > HADOOP-11262-7.patch, HADOOP-11262.patch > > > Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12169: -- Attachment: HADOOP-12169-002.patch Added patch v2, which addresses the issues mentioned by [~ste...@apache.org]: * moved f.makeQualified(uri, workingDir) out of the while-loop for performance reasons * moved the added test to AbstractContractGetFileStatusTest in hadoop-common * added an S3A implementation of this abstract test class * added fs.contract.supports-getfilestatus=true to test/resources/contract/s3a.xml to enable these tests > ListStatus on empty dir in S3A lists itself instead of returning an empty list > -- > > Key: HADOOP-12169 > URL: https://issues.apache.org/jira/browse/HADOOP-12169 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0, 2.7.0, 2.7.1 >Reporter: Pieter Reuse >Assignee: Pieter Reuse > Attachments: HADOOP-12169-001.patch, HADOOP-12169-002.patch > > > Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour > this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty > bucket returns an empty list, while doing the same on an empty directory > returns an array of length 1 containing only this directory itself. > The bugfix is quite simple. In the line of code {code}...if > (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. > the fs and f is not. Therefore, this returns false while it shouldn't. The > bugfix is to make f qualified in this line of code. > More formally: according to the formal definition of [The Hadoop FileSystem > API > Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/], > more specifically FileSystem.listStatus, only child elements of a directory > should be returned upon a listStatus()-call. 
> In detail: > {code} > elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where > f(c) == True] > {code} > and > {code} > def children(FS, p) = {q for q in paths(FS) where parent(q) == p} > {code} > This means that listStatus on an empty directory returns an > empty list. This is the same behaviour as ls has in Unix, which is what > someone would expect from a FileSystem. > Note: it seemed appropriate to add the test of this patch to the same file as > the test for HADOOP-11918, but as a result, one of the two will have to be > rebased wrt. the other before being applied to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
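The keyPath.equals(f) bug and the "qualify once, outside the loop" change from patch v2 can be sketched without Hadoop on the classpath. In this sketch java.net.URI stands in for org.apache.hadoop.fs.Path, the class and method names are hypothetical, and only absolute paths are handled (the real makeQualified also resolves relative paths against the working directory):

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the listing filter; java.net.URI stands in for Hadoop's Path.
class ListingFilterSketch {

    // Buggy variant: keyPaths are fully qualified (s3a://bucket/...), dir is not,
    // so the equality check never matches and the directory lists itself.
    static List<URI> childrenBuggy(URI dir, List<URI> keyPaths) {
        List<URI> result = new ArrayList<>();
        for (URI keyPath : keyPaths) {
            if (!keyPath.equals(dir)) {
                result.add(keyPath);
            }
        }
        return result;
    }

    // Fixed variant: qualify dir once, outside the loop, then compare like with like.
    static List<URI> children(URI fsUri, URI dir, List<URI> keyPaths) {
        // Resolving an absolute path against the filesystem URI adds scheme and authority;
        // an already-qualified dir is returned unchanged by resolve().
        URI qualifiedDir = fsUri.resolve(dir);
        List<URI> result = new ArrayList<>();
        for (URI keyPath : keyPaths) {
            if (!keyPath.equals(qualifiedDir)) {
                result.add(keyPath);
            }
        }
        return result;
    }
}
```

For an empty directory whose only key is its own marker, the buggy filter returns that one entry while the fixed filter returns an empty list.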
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-7.patch Uploaded patch version 7, a rebased version of patch 6, which no longer applied. Other than that, the only difference between versions 6 and 7 is that the first bug mentioned in S3AFileSystem is now fixed in DelegateToFileSystem (see HADOOP-12304), so I removed the duplicate bugfix from this patch. Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, HADOOP-11262-7.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
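The first S3AFileSystem bug from patch 6 is about ports: java.net.URI reports an absent port as -1, whereas FileSystem's default port is 0. The sketch below is a simplified, hypothetical model (DefaultPortSketch and its methods are not Hadoop APIs) of why the difference matters when a wrapper builds the filesystem URI from the default port:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Simplified, hypothetical model of building a filesystem URI from a default port.
// java.net.URI uses -1 for "no port"; FileSystem's default getDefaultPort() is 0.
class DefaultPortSketch {
    static final int UNDEFINED_PORT = -1;

    // Translate "0 = no default port" into the URI convention of -1.
    static int defaultPortIfDefined(int reportedDefaultPort) {
        return reportedDefaultPort > 0 ? reportedDefaultPort : UNDEFINED_PORT;
    }

    // Use the URI's own port if present, else the default; a port of -1 is omitted.
    static URI canonicalUri(URI uri, int defaultPort) {
        int port = uri.getPort() == UNDEFINED_PORT ? defaultPort : uri.getPort();
        try {
            return new URI(uri.getScheme(), null, uri.getHost(), port, "/", null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With a default of 0, s3a://bucket/ turns into s3a://bucket:0/, which no longer matches paths written as s3a://bucket/...; mapping 0 to -1 first keeps the URI port-free.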
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Status: Patch Available (was: In Progress) Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-6.patch Patch 6: As requested, expanded patch 5 with tests by extending the following tests, overriding them with S3A specifics: * _TestFileContext.java_ * _FileContextCreateMkdirBaseTest.java_ * _FileContextMainOperationsBaseTest.java_ * _FCStatisticsBaseTest.java_ * _FileContextURIBase.java_ * _FileContextUtilBase.java_ In doing so, fixed the following bugs in _FileContextMainOperationsBaseTest.java_: * _line 1169_: creating a symlink on an FS that doesn't support symlinks throws an _UnsupportedOperationException_, not an _IOException_ (see _FileContext.java:1441_). * _lines 1252 and 1313_: the contract of _read()_ is not to read the whole file; that's the contract of _readFully()_. For this reason, tests assuming that the whole file has been read should use _readFully()_ instead of _read()_. And added an enhancement for object storage systems in the same file: * _line 1238_: on an object storage system, a file does not exist *before* it is closed (nor does it have a checksum at that moment), so an _IOException_ is thrown. This is resolved by swapping the order of _fc.setVerifyChecksum(true, path)_ and _out.write(data, 0, data.length)_, which does not change the behaviour on HDFS or other file systems. Discovered and patched the following related bugs in S3A: * Bugfix in _S3AFileSystem.java_: ports on S3 should be ignored, which corresponds to a value of -1 (instead of the default 0 in FileSystem). * Another bugfix is in _S3AFileStatus.java_: _getModificationTime()_ is overridden for directories. It returns _System.currentTimeMillis()_ because an object store does not keep track of the modification times of directories. 
Because some parts of the Hadoop ecosystem use the modification time to ignore or delete old directories (e.g. the YarnHistoryServer), returning 0 for directories is not the best option here. Added _TestS3AMiniYarnCluster.java_, which runs a simple _WordCount_-MapReduce job on a _YarnMiniCluster_ using S3A as the filesystem. Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
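The read() versus readFully() distinction from the test fixes at lines 1252 and 1313 can be shown with the JDK alone. This sketch assumes a hypothetical ShortReadInputStream that returns at most a few bytes per call, as a network-backed stream is allowed to; Hadoop's FSDataInputStream extends java.io.DataInputStream, so it offers the same readFully():

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stream that returns at most 'chunk' bytes per read call,
// which the InputStream contract permits.
class ShortReadInputStream extends FilterInputStream {
    private final int chunk;

    ShortReadInputStream(InputStream in, int chunk) {
        super(in);
        this.chunk = chunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return super.read(b, off, Math.min(len, chunk));
    }
}

class ReadContractSketch {
    // A single read() may legally return fewer bytes than requested.
    static int singleRead(byte[] data, int chunk) {
        try (InputStream in = new ShortReadInputStream(new ByteArrayInputStream(data), chunk)) {
            byte[] buf = new byte[data.length];
            return in.read(buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // readFully() keeps reading until the buffer is full (or throws EOFException).
    static byte[] fullRead(byte[] data, int chunk) {
        try (DataInputStream in = new DataInputStream(
                new ShortReadInputStream(new ByteArrayInputStream(data), chunk))) {
            byte[] buf = new byte[data.length];
            in.readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With 10 bytes of data and a chunk size of 4, singleRead() returns 4, while fullRead() returns all 10 bytes. A test that assumes one read() fills the buffer therefore only passes when the stream happens to return everything at once.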
[jira] [Commented] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14612992#comment-14612992 ] Pieter Reuse commented on HADOOP-11262: --- Whoops, sorry for the double post of this comment. You can ignore either one. Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-11262 started by Pieter Reuse. - Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: HADOOP-11262-006.patch Patch 6: As requested, expanded patch 5 with tests by extending the following tests, overriding them with S3A specifics: * _TestFileContext.java_ * _FileContextCreateMkdirBaseTest.java_ * _FileContextMainOperationsBaseTest.java_ * _FCStatisticsBaseTest.java_ * _FileContextURIBase.java_ * _FileContextUtilBase.java_ In doing so, fixed the following bugs in _FileContextMainOperationsBaseTest.java_: * _line 1169_: creating a symlink on an FS that doesn't support symlinks throws an _UnsupportedOperationException_, not an _IOException_ (see _FileContext.java:1441_). * _lines 1252 and 1313_: the contract of _read()_ is not to read the whole file; that's the contract of _readFully()_. For this reason, tests assuming that the whole file has been read should use _readFully()_ instead of _read()_. And added an enhancement for object storage systems in the same file: * _line 1238_: on an object storage system, a file does not exist *before* it is closed (nor does it have a checksum at that moment), so an _IOException_ is thrown. This is resolved by swapping the order of _fc.setVerifyChecksum(true, path)_ and _out.write(data, 0, data.length)_, which does not change the behaviour on HDFS or other file systems. Discovered and patched the following related bugs in S3A: * Bugfix in _S3AFileSystem.java_: ports on S3 should be ignored, which corresponds to a value of -1 (instead of the default 0 in FileSystem). * Another bugfix is in _S3AFileStatus.java_: _getModificationTime()_ is overridden for directories. It returns _System.currentTimeMillis()_ because an object store does not keep track of the modification times of directories. 
Because some parts of the Hadoop ecosystem use the modification time to ignore or delete old directories (e.g. the YarnHistoryServer), returning 0 for directories is not the best option here. Added _TestS3AMiniYarnCluster.java_, which runs a simple _WordCount_-MapReduce job on a _YarnMiniCluster_ using S3A as the filesystem. Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
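The reasoning behind the _S3AFileStatus.java_ change can be illustrated with an age-based cleanup rule. ToyFileStatus and AgeBasedCleaner below are hypothetical simplifications, not the S3AFileStatus source or any real service's retention logic; they only show why 0 (the epoch) makes every directory look expired while "now" does not:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified file status; not the S3AFileStatus source.
class ToyFileStatus {
    private final boolean directory;
    private final long storedModificationTime;

    ToyFileStatus(boolean directory, long storedModificationTime) {
        this.directory = directory;
        this.storedModificationTime = storedModificationTime;
    }

    // Object stores keep no modification time for directories, so report "now"
    // instead of 0 (the epoch), which would make every directory look ancient.
    long getModificationTime() {
        return directory ? System.currentTimeMillis() : storedModificationTime;
    }
}

class AgeBasedCleaner {
    static final long ONE_WEEK = TimeUnit.DAYS.toMillis(7);

    // Hypothetical cleanup rule of the kind some services apply to old directories.
    static boolean isExpired(ToyFileStatus status, long nowMillis, long maxAgeMillis) {
        return nowMillis - status.getModificationTime() > maxAgeMillis;
    }
}
```

The trade-off is that a directory never ages under such a rule; reporting the current time errs on the side of keeping data rather than deleting it.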
[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A
[ https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11262: -- Attachment: (was: HADOOP-11262-006.patch) Enable YARN to use S3A --- Key: HADOOP-11262 URL: https://issues.apache.org/jira/browse/HADOOP-11262 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Thomas Demoor Assignee: Pieter Reuse Labels: amazon, s3 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14611682#comment-14611682 ] Pieter Reuse commented on HADOOP-12169: --- Yes, this bugfix is intended for Hadoop version 2.7. The bug was introduced in Hadoop 2.6, and Hadoop 2.7 is affected by it too. ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.6.0, 2.7.0, 2.7.1 Reporter: Pieter Reuse Assignee: Pieter Reuse Attachments: HADOOP-12169-001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12169: -- Affects Version/s: 2.6.0 ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.6.0, 2.7.0, 2.7.1 Reporter: Pieter Reuse Assignee: Pieter Reuse Attachments: HADOOP-12169-001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12169: -- Affects Version/s: 2.7.1 2.7.0 ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.7.0, 2.7.1 Reporter: Pieter Reuse Assignee: Pieter Reuse Attachments: HADOOP-12169-001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-12169 started by Pieter Reuse. - ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Reporter: Pieter Reuse Assignee: Pieter Reuse -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
Pieter Reuse created HADOOP-12169: - Summary: ListStatus on empty dir in S3A lists itself instead of returning an empty list Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Reporter: Pieter Reuse Assignee: Pieter Reuse Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket returns an empty list, while doing the same on an empty directory returns an array of length 1 containing only this directory itself. The bugfix is quite simple. In the line of code {code}...if (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the fs and f is not. Therefore, this returns false while it shouldn't. The bugfix is to make f qualified in this line of code. More formally: according to the formal definition of [The Hadoop FileSystem API Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/], more specifically FileSystem.listStatus, only child elements of a directory should be returned upon a listStatus()-call. In detail: {code} elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) == True] {code} and {code} def children(FS, p) = {q for q in paths(FS) where parent(q) == p} {code} This means that listStatus on an empty directory returns an empty list. This is the same behaviour as ls has in Unix, which is what someone would expect from a FileSystem. Note: it seemed appropriate to add the test of this patch to the same file as the test for HADOOP-11918, but as a result, one of the two will have to be rebased wrt. the other before being applied to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
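The specification's children(FS, p) can be modelled directly to see why an empty directory must list as empty: no path is its own parent, so a directory never appears among its children. This is a hypothetical in-memory model (ListStatusSpecSketch and its string-based paths are illustrations only, not Hadoop code):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the specification's parent(q) and children(FS, p).
// Paths are absolute strings such as "/a/b"; the root is "/".
class ListStatusSpecSketch {

    // parent(q): everything before the last '/', or "/" for top-level entries.
    static String parent(String q) {
        int slash = q.lastIndexOf('/');
        return slash == 0 ? "/" : q.substring(0, slash);
    }

    // children(FS, p) = {q for q in paths(FS) where parent(q) == p}.
    // The root is excluded explicitly because it has no parent of its own.
    static List<String> children(Set<String> paths, String p) {
        return paths.stream()
                .filter(q -> !q.equals("/") && parent(q).equals(p))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For the paths /, /data, /data/empty and /data/file.txt, children of /data/empty is empty, children of /data is [/data/empty, /data/file.txt], and children of / is [/data].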
[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12169: -- Description: Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket returns an empty list, while doing the same on an empty directory returns an array of length 1 containing only this directory itself. The bugfix is quite simple. In the line of code {code}...if (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the fs and f is not. Therefore, this returns false while it shouldn't. The bugfix is to make f qualified in this line of code. More formally: according to the formal definition of [The Hadoop FileSystem API Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/], more specifically FileSystem.listStatus, only child elements of a directory should be returned upon a listStatus()-call. In detail: {code} elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) == True] {code} and {code} def children(FS, p) = {q for q in paths(FS) where parent(q) == p} {code} This means that listStatus on an empty directory returns an empty list. This is the same behaviour as ls has in Unix, which is what someone would expect from a FileSystem. Note: it seemed appropriate to add the test of this patch to the same file as the test for HADOOP-11918, but as a result, one of the two will have to be rebased wrt. the other before being applied to trunk. was: Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket returns an empty list, while doing the same on an empty directory returns an array of length 1 containing only this directory itself. The bugfix is quite simple. 
In the line of code {code}...if (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the fs and f is not. Therefore, this returns false while it shouldn't. The bugfix is to make f qualified in this line of code. More formally: according to the formal definition of [The Hadoop FileSystem API Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/], more specifically FileSystem.listStatus, only child elements of a directory should be returned upon a listStatus()-call. In detail: {code} elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) == True] {code} and {code} def children(FS, p) = {q for q in paths(FS) where parent(q) == p} {code} This means that listStatus on an empty directory returns an empty list. This is the same behaviour as ls has in Unix, which is what someone would expect from a FileSystem. Note: it seemed appropriate to add the test of this patch to the same file as the test for HADOOP-11918, but as a result, one of the two will have to be rebased wrt. the other before being applied to trunk. ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Reporter: Pieter Reuse Assignee: Pieter Reuse Attachments: HADOOP-12169-001.patch
[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list
[ https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-12169: -- Status: Patch Available (was: In Progress) ListStatus on empty dir in S3A lists itself instead of returning an empty list -- Key: HADOOP-12169 URL: https://issues.apache.org/jira/browse/HADOOP-12169 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Reporter: Pieter Reuse Assignee: Pieter Reuse Attachments: HADOOP-12169-001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11918) Listing an empty s3a root directory throws FileNotFound.
[ https://issues.apache.org/jira/browse/HADOOP-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pieter Reuse updated HADOOP-11918: -- Attachment: HADOOP-11918-002.patch
I verified this patch with the test output below. *However*, running all the JUnit tests leaves spurious directories behind in the test bucket. As a result, the bucket was not empty when TestS3AFileSystem.testListRootDirectory ran, and the bug only surfaces on an empty directory. Because of this, I have added four lines to the test that clean the entire bucket before the actual assert.
--- T E S T S ---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.706 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.564 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.598 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.443 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 10.286 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.942 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.752 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Running org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.564 sec - in org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Running org.apache.hadoop.fs.s3a.TestS3ABlocksize
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.872 sec - in org.apache.hadoop.fs.s3a.TestS3ABlocksize
Running org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.585 sec - in org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Running org.apache.hadoop.fs.s3a.TestS3AConfiguration
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.948 sec - in org.apache.hadoop.fs.s3a.TestS3AConfiguration
Running org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 533.853 sec - in org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Running org.apache.hadoop.fs.s3a.TestS3AFileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.476 sec - in org.apache.hadoop.fs.s3a.TestS3AFileSystem
Results :
Tests run: 101, Failures: 0, Errors: 0, Skipped: 3
[INFO] BUILD SUCCESS
[INFO] Total time: 12:34.020s
[INFO] Finished at: Thu Jun 18 15:57:16 CEST 2015
[INFO] Final Memory: 25M/419M
Listing an empty s3a root directory throws FileNotFound.
Key: HADOOP-11918
URL: https://issues.apache.org/jira/browse/HADOOP-11918
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
Labels: BB2015-05-TBR, s3
Attachments: HADOOP-11918-002.patch, HADOOP-11918.000.patch, HADOOP-11918.001.patch
With an empty S3 bucket, run:
{code}
$ hadoop fs -D... -ls s3a://hdfs-s3a-test/
15/05/04 15:21:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: `s3a://hdfs-s3a-test/': No such file or directory
{code}
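S3 has no real directories, only flat keys. That helps explain why leftover objects from earlier tests can hide this bug. The sketch below is a hypothetical in-memory model, not S3A code and not the attached patch. The class and method names are illustrative.

In the model, a naive existence check treats the bucket root as present only when at least one key exists, so listing an empty bucket fails. An assumed fix special-cases the root. `deleteAll()` stands in for the bucket cleanup that the comment above adds to the test.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a flat object store; not S3A's implementation.
// In this model the bucket root has no marker object of its own.
class EmptyRootListingSketch {
    private final TreeSet<String> keys = new TreeSet<>();

    void put(String key) { keys.add(key); }

    // Stand-in for the test cleanup: remove every object from the bucket.
    void deleteAll() { keys.clear(); }

    // Naive check: the root "exists" only if at least one key is present.
    boolean rootExistsNaive() { return !keys.isEmpty(); }

    // Lists the root's immediate children. With specialCaseRoot == true
    // (the assumed fix), the root is always treated as an existing directory.
    List<String> listRoot(boolean specialCaseRoot) throws FileNotFoundException {
        if (!specialCaseRoot && !rootExistsNaive()) {
            throw new FileNotFoundException("`/': No such file or directory");
        }
        List<String> children = new ArrayList<>();
        for (String k : keys) {
            int slash = k.indexOf('/');
            String child = slash < 0 ? k : k.substring(0, slash);
            if (!children.contains(child)) {
                children.add(child);
            }
        }
        return children;
    }

    public static void main(String[] args) throws FileNotFoundException {
        EmptyRootListingSketch bucket = new EmptyRootListingSketch();
        bucket.put("leftover/dir/file.txt");         // debris from earlier test runs
        System.out.println(bucket.listRoot(false));  // [leftover]: bug stays hidden
        bucket.deleteAll();                          // the cleanup step
        System.out.println(bucket.listRoot(true));   // []
        try {
            bucket.listRoot(false);
        } catch (FileNotFoundException e) {
            System.out.println("naive: " + e.getMessage());  // the reported failure
        }
    }
}
```

With debris in the bucket, the naive check passes, so a root-listing test that runs after other tests can succeed even without a fix. Only an empty bucket exercises the failing path.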