from:"Pieter Reuse \(JIRA\)"

[jira] [Commented] (HADOOP-13421) Switch to v2 of the S3 List Objects API in S3A

2016-11-25 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15695725#comment-15695725
 ] 

Pieter Reuse commented on HADOOP-13421:
---

Thank you for putting this on my radar, [~ste...@apache.org]. I will look 
deeper into it and get back to you on this in the course of next week.

> Switch to v2 of the S3 List Objects API in S3A
> --
>
> Key: HADOOP-13421
> URL: https://issues.apache.org/jira/browse/HADOOP-13421
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steven K. Wong
>Priority: Minor
>
> Unlike [version 
> 1|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html] of the 
> S3 List Objects API, [version 
> 2|http://docs.aws.amazon.com/AmazonS3/latest/API/v2-RESTBucketGET.html] by 
> default does not fetch object owner information, which S3A doesn't need 
> anyway. By switching to v2, there will be less data to transfer/process. 
> Also, it should be more robust when listing a versioned bucket with "a large 
> number of delete markers" ([according to 
> AWS|https://aws.amazon.com/releasenotes/Java/0735652458007581]).
> Methods in S3AFileSystem that use this API include:
> * getFileStatus(Path)
> * innerDelete(Path, boolean)
> * innerListStatus(Path)
> * innerRename(Path, Path)
> Requires AWS SDK 1.10.75 or later.
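
The two properties mentioned in the description (owner information is omitted unless asked for, and a listing is resumed with a continuation token) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it does not call the AWS SDK, and `ToyBucket`, `Entry`, `Page`, `list` and `listAll` are hypothetical names invented for this example. It also assumes `maxKeys` is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory bucket; NOT the AWS SDK. All names here are hypothetical. */
final class ToyBucket {
    /** One listing entry; owner stays null unless the caller asks for it. */
    static final class Entry {
        final String key;
        final String owner;
        Entry(String key, String owner) { this.key = key; this.owner = owner; }
    }

    /** One page of results plus the token needed to fetch the next page. */
    static final class Page {
        final List<Entry> entries = new ArrayList<>();
        String nextToken; // null when the listing is complete
    }

    private final TreeMap<String, String> ownerByKey = new TreeMap<>();

    void put(String key, String owner) { ownerByKey.put(key, owner); }

    /** v2-style listing: resumes after 'token', omits owners unless fetchOwner is true. */
    Page list(String prefix, String token, int maxKeys, boolean fetchOwner) {
        Page page = new Page();
        String start = (token == null) ? prefix : token;
        boolean inclusive = (token == null);
        for (Map.Entry<String, String> e : ownerByKey.tailMap(start, inclusive).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            if (page.entries.size() == maxKeys) {
                // More keys remain: hand back the last returned key as the token.
                page.nextToken = page.entries.get(maxKeys - 1).key;
                break;
            }
            page.entries.add(new Entry(e.getKey(), fetchOwner ? e.getValue() : null));
        }
        return page;
    }

    /** Drains all pages, the way a caller loops on the continuation token. */
    List<Entry> listAll(String prefix, int maxKeys, boolean fetchOwner) {
        List<Entry> all = new ArrayList<>();
        String token = null;
        do {
            Page p = list(prefix, token, maxKeys, fetchOwner);
            all.addAll(p.entries);
            token = p.nextToken;
        } while (token != null);
        return all;
    }
}
```

In this model the default call passes `fetchOwner = false`, so each entry carries less data, which is the saving the description refers to.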



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

[jira] [Commented] (HADOOP-13735) ITestS3AFileContextStatistics.testStatistics() failing

2016-10-20 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591724#comment-15591724
 ] 

Pieter Reuse commented on HADOOP-13735:
---

Thank you for uploading my patch, Steve. The -1 from Yetus is expected. This 
patch makes an existing test pass, so in my opinion no additional tests are 
needed here.

> ITestS3AFileContextStatistics.testStatistics() failing
> --
>
> Key: HADOOP-13735
> URL: https://issues.apache.org/jira/browse/HADOOP-13735
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
>Priority: Minor
> Attachments: HADOOP-13735-branch-2-001.patch
>
>
> The test {{ITestS3AFileContextStatistics.testStatistics()}} seems to fail 
> pretty reliably these days...I'd assumed it was some race condition, but 
> maybe not. 
> Fixing this will probably entail adding more diagnostics to the base test 
> case.




[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems

2016-08-01 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-9565:
-
Attachment: HADOOP-9565-branch-2-007.patch

Renamed the file to HADOOP-9565-branch-2-007.patch so Hadoop QA can apply it.

> Add a Blobstore interface to add to blobstore FileSystems
> -
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3, fs/swift
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, 
> HADOOP-9565-006.patch, HADOOP-9565-branch-2-007.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.




[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems

2016-08-01 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-9565:
-
Attachment: (was: HADOOP-9565-007.patch)

> Add a Blobstore interface to add to blobstore FileSystems
> -
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3, fs/swift
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, 
> HADOOP-9565-006.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.




[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems

2016-07-29 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-9565:
-
Attachment: HADOOP-9565-007.patch

Uploaded patch version 7:
* Renamed UnsatisfiedSemanticsException to UnsatisfiedFeatureException
* Renamed ObjectStoreFeatures to ObjectStoreFeature and changed to an 
enum-based approach
* Added style improvements to FileOutputCommitter, since we're changing a lot 
of lines throughout the class. This makes the patch prone to no longer applying 
cleanly when other patches change this class.
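
As an illustration of what an enum-based approach can look like, here is a minimal sketch. Only the names `ObjectStoreFeature` and `UnsatisfiedFeatureException` come from the list above; the enum constants, the `FeatureAware` interface, the `require` helper, `ToyStore`, and the choice of an unchecked exception are assumptions made for this example and are not taken from the patch.

```java
import java.util.EnumSet;
import java.util.Set;

/** Feature flags; the constants are illustrative, not taken from the patch. */
enum ObjectStoreFeature { ATOMIC_RENAME, SERVER_SIDE_COPY, CONSISTENT_LISTING }

/** Thrown when a caller requires a feature the store does not offer. */
class UnsatisfiedFeatureException extends RuntimeException {
    UnsatisfiedFeatureException(ObjectStoreFeature f) {
        super("Unsatisfied feature: " + f);
    }
}

/** Hypothetical interface for a store that advertises its features. */
interface FeatureAware {
    Set<ObjectStoreFeature> features();

    /** Fails fast if the requested feature is not advertised. */
    default void require(ObjectStoreFeature f) {
        if (!features().contains(f)) throw new UnsatisfiedFeatureException(f);
    }
}

/** Example store that only offers server-side copy. */
final class ToyStore implements FeatureAware {
    @Override public Set<ObjectStoreFeature> features() {
        return EnumSet.of(ObjectStoreFeature.SERVER_SIDE_COPY);
    }
}
```

Compared with Strings or bitmasks, an enum lets the compiler reject misspelled feature names, and `EnumSet` keeps the set representation compact.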

Chris, thanks for pointing out that Steve will be enjoying a well-earned break 
in the coming weeks. We will wait patiently for his response.

> Add a Blobstore interface to add to blobstore FileSystems
> -
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3, fs/swift
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, 
> HADOOP-9565-006.patch, HADOOP-9565-007.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.




[jira] [Commented] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems

2016-07-28 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397523#comment-15397523
 ] 

Pieter Reuse commented on HADOOP-9565:
--

Thanks for patch 006, Steve. Having Strings instead of bitmasks is definitely 
an improvement. But maybe an enum is worth considering here.

I don't think the Put command adds functionality. An ObjectStore-object is 
still a FileSystem-object, having the full FileSystem interface available. Or 
am I missing something here?

Thank you for mentioning ViewFs here, Chris. It's important that this patch 
doesn't break the great ease of use ViewFs brings. But considering that ViewFs 
is client-side only and this patch only brings performance enhancements, I 
don't think it is worth going the extra mile to check the ViewFs config to see 
whether the Path belongs to an ObjectStore that is part of a ViewFs instance.

> Add a Blobstore interface to add to blobstore FileSystems
> -
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3, fs/swift
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch, 
> HADOOP-9565-006.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.




[jira] [Commented] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-07-13 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15374856#comment-15374856
 ] 

Pieter Reuse commented on HADOOP-13139:
---

Great that this feature is now in branch-2 as well! Thank you for working on 
this, [~ste...@apache.org]. Thanks for the input on this backport, 
[~andrew.wang] and [~cnauroth]. But of course most of the work was done by 
[~Thomas Demoor] and [~fabbri] on the original patch.

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2-003.patch, 
> HADOOP-13139-branch-2-004.patch, HADOOP-13139-branch-2-005.patch, 
> HADOOP-13139-branch-2-006.patch, HADOOP-13139-branch-2.001.patch, 
> HADOOP-13139-branch-2.002.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-13 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Status: Patch Available  (was: In Progress)

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch, 
> HADOOP-13139-branch-2.002.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-13 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Attachment: HADOOP-13139-branch-2.002.patch

Uploaded patch 002: changed patch w.r.t. HADOOP-13028, fixed the checkstyle 
issues and copied HADOOP-12553 to fix the javadoc error.

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch, 
> HADOOP-13139-branch-2.002.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Status: In Progress  (was: Patch Available)

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Attachment: HADOOP-13139-branch-2.001.patch

Changed name to HADOOP-13139-branch-2.001.patch

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch, HADOOP-13139-branch-2.001.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
 Hadoop Flags: Incompatible change
Affects Version/s: 2.8.0
   Status: Patch Available  (was: In Progress)

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 2.8.0
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Attachment: HADOOP-13139-001.patch

Added patch 001

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Updated] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-13139:
--
Fix Version/s: 2.8.0

> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Fix For: 2.8.0
>
> Attachments: HADOOP-13139-001.patch
>
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Work started] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-13139 started by Pieter Reuse.
-
> Branch-2: S3a to use thread pool that blocks clients
> 
>
> Key: HADOOP-13139
> URL: https://issues.apache.org/jira/browse/HADOOP-13139
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
>
> HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
> attach a patch applicable to branch-2.
> It should be noted in CHANGES-2.8.0.txt that the config parameter 
> 'fs.s3a.threads.core' has been removed and the behavior of the 
> ThreadPool for s3a has been changed.




[jira] [Created] (HADOOP-13139) Branch-2: S3a to use thread pool that blocks clients

2016-05-12 Thread Pieter Reuse (JIRA)

Pieter Reuse created HADOOP-13139:
-

 Summary: Branch-2: S3a to use thread pool that blocks clients
 Key: HADOOP-13139
 URL: https://issues.apache.org/jira/browse/HADOOP-13139
 Project: Hadoop Common
  Issue Type: Task
  Components: fs/s3
Reporter: Pieter Reuse
Assignee: Pieter Reuse


HADOOP-11684 is accepted into trunk, but was not applied to branch-2. I will 
attach a patch applicable to branch-2.

It should be noted in CHANGES-2.8.0.txt that the config parameter 
'fs.s3a.threads.core' has been removed and the behavior of the ThreadPool 
for s3a has been changed.
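
To make "a thread pool that blocks clients" concrete, here is a minimal sketch of one common way to get that behavior: a semaphore caps the number of outstanding tasks, so a caller that submits too much work waits instead of growing an unbounded queue. This is an illustration only, not the code from HADOOP-11684; `BlockingSubmitter` and its parameters are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

/** Sketch: submit() blocks the caller once maxOutstanding tasks are queued or running. */
final class BlockingSubmitter {
    private final ExecutorService pool;
    private final Semaphore permits;

    BlockingSubmitter(int threads, int maxOutstanding) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.permits = new Semaphore(maxOutstanding);
    }

    /** Blocks until a permit is free, then hands the task to the pool. */
    Future<?> submit(Runnable task) throws InterruptedException {
        permits.acquire();
        try {
            return pool.submit(() -> {
                try {
                    task.run();
                } finally {
                    permits.release(); // free the slot when the task ends
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // task was rejected, so it will never release
            throw e;
        }
    }

    int availablePermits() { return permits.availablePermits(); }

    void shutdown() { pool.shutdown(); }
}
```

The design choice here is back-pressure: a client producing uploads faster than they complete is slowed down at `submit()` rather than exhausting memory.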




[jira] [Commented] (HADOOP-12844) Recover when S3A fails on IOException in read()

2016-04-29 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264105#comment-15264105
 ] 

Pieter Reuse commented on HADOOP-12844:
---

Thank you for looking into this, [~ste...@apache.org]. The patch 
HADOOP-13028-006 indeed fixes the issue targeted by the patch I've uploaded 
here. So LGTM, and this ticket can be closed when HADOOP-13028 is accepted.

> Recover when S3A fails on IOException in read()
> ---
>
> Key: HADOOP-12844
> URL: https://issues.apache.org/jira/browse/HADOOP-12844
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Attachments: HADOOP-12844.001.patch
>
>
> This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int 
> off, int len) and reopens the connection at the same position it was at 
> before the exception.
> This is similar to the functionality introduced in S3N in 
> [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly 
> the same reason.
> Patch developed in cooperation with [~emres].




[jira] [Updated] (HADOOP-9565) Add a Blobstore interface to add to blobstore FileSystems

2016-04-08 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-9565:
-
Assignee: Pieter Reuse  (was: Thomas Demoor)

I've checked with [~Thomas Demoor] and I will work on a new patch addressing 
the remarks of [~steve_l] and [~aartokhy].

> Add a Blobstore interface to add to blobstore FileSystems
> -
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/s3, fs/swift
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Pieter Reuse
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch, HADOOP-9565-004.patch, HADOOP-9565-005.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.




[jira] [Updated] (HADOOP-12844) S3A fails on IOException

2016-02-26 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12844:
--
Status: Patch Available  (was: Open)

> S3A fails on IOException
> 
>
> Key: HADOOP-12844
> URL: https://issues.apache.org/jira/browse/HADOOP-12844
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.7.2, 2.7.1
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Attachments: HADOOP-12844.001.patch
>
>
> This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int 
> off, int len) and reopens the connection at the same position it was at 
> before the exception.
> This is similar to the functionality introduced in S3N in 
> [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly 
> the same reason.
> Patch developed in cooperation with [~emres].




[jira] [Updated] (HADOOP-12844) S3A fails on IOException

2016-02-26 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12844:
--
Attachment: HADOOP-12844.001.patch

> S3A fails on IOException
> 
>
> Key: HADOOP-12844
> URL: https://issues.apache.org/jira/browse/HADOOP-12844
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Attachments: HADOOP-12844.001.patch
>
>
> This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int 
> off, int len) and reopens the connection at the same position it was at 
> before the exception.
> This is similar to the functionality introduced in S3N in 
> [HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly 
> the same reason.
> Patch developed in cooperation with [~emres].




[jira] [Created] (HADOOP-12844) S3A fails on IOException

2016-02-26 Thread Pieter Reuse (JIRA)

Pieter Reuse created HADOOP-12844:
-

 Summary: S3A fails on IOException
 Key: HADOOP-12844
 URL: https://issues.apache.org/jira/browse/HADOOP-12844
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.2, 2.7.1
Reporter: Pieter Reuse
Assignee: Pieter Reuse


This simple patch catches IOExceptions in S3AInputStream.read(byte[] buf, int 
off, int len) and reopens the connection at the same position it was at before 
the exception.
This is similar to the functionality introduced in S3N in 
[HADOOP-6254|https://issues.apache.org/jira/browse/HADOOP-6254], for exactly 
the same reason.

Patch developed in cooperation with [~emres].
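
The recovery idea described above (catch the IOException, reopen at the current position, retry the read) can be sketched as a generic stream wrapper. This is not the patch itself and does not use S3AInputStream; `ReopeningInputStream` and the `Opener` hook are hypothetical names, and retrying exactly once is an assumption made for this example.

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch of "reopen at the same position and retry once"; not the S3AInputStream code. */
final class ReopeningInputStream extends InputStream {
    /** Hypothetical hook that opens the underlying data starting at a byte offset. */
    interface Opener {
        InputStream openAt(long pos) throws IOException;
    }

    private final Opener opener;
    private InputStream in;
    private long pos; // offset of the next byte to read

    ReopeningInputStream(Opener opener) throws IOException {
        this.opener = opener;
        this.in = opener.openAt(0);
    }

    @Override public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : (one[0] & 0xff);
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n;
        try {
            n = in.read(buf, off, len);
        } catch (IOException e) {
            // Drop the broken stream, reopen where we left off, retry exactly once.
            try { in.close(); } catch (IOException ignored) { }
            in = opener.openAt(pos);
            n = in.read(buf, off, len);
        }
        if (n > 0) pos += n; // only advance on bytes actually delivered
        return n;
    }

    @Override public void close() throws IOException { in.close(); }
}
```

Because the position only advances after a successful read, the reopened stream resumes exactly where the failed read began; a second failure during the retry is propagated to the caller.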




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2016-01-11 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-10.patch

Thank you for pointing out the missing @Ignore, [~cnauroth]. Added this to 
patch version 10.

Test failures of Hadoop QA on patch version 9 are unrelated.

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-10.patch, HADOOP-11262-2.patch, 
> HADOOP-11262-3.patch, HADOOP-11262-4.patch, HADOOP-11262-5.patch, 
> HADOOP-11262-6.patch, HADOOP-11262-7.patch, HADOOP-11262-8.patch, 
> HADOOP-11262-9.patch, HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2016-01-10 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-9.patch

Thank you for the +1's, [~mackrorysd], [~eddyxu] and [~ste...@apache.org].
[~cnauroth], thank you for looking at this patch.

I have added @Ignore annotations where appropriate in version 9. I also removed 
the copyright lines featuring the year.

Regarding the modification times of directories in S3A: as directories are 
"fakes" in s3a, there is no feasible way to get accurate directory timestamps 
without extensive locking (and coping with slow listings), which runs counter 
to the rationale of object stores. Therefore, we chose a "dummy" implementation 
that doesn't break (too many) things.

Setting a fixed time (e.g. epoch) breaks the history server, as it looks at the 
modification time of the directory object before moving it and decides the 
files don't need to be copied if they are "too old". Setting the 
modification time of directories in S3A to System.currentTimeMillis() ensures 
the history server never labels them as being "too old".
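
The effect of the two choices can be shown with a small age check. This is a sketch only: `DirAgeCheck`, `isTooOld` and the threshold are hypothetical and do not reproduce the history server's actual logic.

```java
/** Sketch of an age check like the one described above; names are hypothetical. */
final class DirAgeCheck {
    /** True when the entry was last modified more than maxAgeMs before nowMs. */
    static boolean isTooOld(long modTimeMs, long nowMs, long maxAgeMs) {
        return nowMs - modTimeMs > maxAgeMs;
    }

    /** Fixed-epoch choice: every directory appears to date from 1970. */
    static long epochDirModTime() {
        return 0L;
    }

    /** Choice described in the comment: report the current time. */
    static long currentDirModTime() {
        return System.currentTimeMillis();
    }
}
```

With a fixed epoch timestamp, any realistic threshold classifies the directory as "too old"; with the current time, the age is effectively zero, so the check never fires.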

Good that you have taken a deeper look into whether always labelling 
directories as "too young" can give rise to problems in YARN. A closer look at 
the classes LocalResource and LocalResourceType shows that YARN resource 
localization is always executed against regular files or .jar archives 
(these are the only possible values of LocalResourceType), for which S3A 
returns the correct timestamps.

However, a look at YARN's AggregatedLogDeletionService shows that this 
service will not remove the appropriate log files, because the directory will 
be labelled "too young". I did not find any other places in YARN where this 
patch can cause problems. I documented this behaviour in the index.md file in 
the patch. As this does not break anything, I still propose to go forward with 
this patch.

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262-9.patch, 
> HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-11-10 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: (was: HADOOP-11262-8.patch)

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-11-10 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-8.patch

Test failures seem to be unrelated: I could not reproduce them locally, and the 
code path of these tests is separate from this patch.
Re-uploaded patch 8; I forgot to remove the testSetVerifyChecksum() changes 
yesterday.

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-11-09 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-8.patch

Thank you, [~eddyxu] and [~cnauroth], for reviewing this and suggesting 
improvements. I've uploaded patch version 8, addressing [~eddyxu]'s remarks 
about coding style, @Override in combination with @Before, @After or @Test, and 
a try-with-resources.

Regarding the setVerifyChecksum test, we ([~Thomas Demoor] and I) noticed that 
the default FileSystem - and therefore S3A - simply ignores the 
"setVerifyChecksum" flag, and that the test is therefore unnecessary for S3A. 
Overriding it with an empty method avoids a falsely failing test.

The original observation was that this flag is only used in the input stream, 
and calling setVerifyChecksum() before out.close() is called makes S3A throw an 
IOException. Simply changing the order of these two lines fixed the issue we 
faced there, but patch 8 uses a different approach and no longer changes the 
testSetVerifyChecksum() method in the super class.
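
As a rough illustration of why the flag has no effect on S3A, the sketch below models a base class whose setter is a no-op and one subclass that honours it. All class and method names here are hypothetical stand-ins, not the real Hadoop classes.

```java
class VerifyChecksumSketch {

  /** Hypothetical base class: accepts the flag and ignores it. */
  static class BaseFileSystem {
    void setVerifyChecksum(boolean verify) {
      // intentionally empty: the base class has nothing to verify
    }

    boolean verifiesChecksums() {
      return false;
    }
  }

  /** Hypothetical file system that honours the flag. */
  static class ChecksummedFileSystem extends BaseFileSystem {
    private boolean verify = true;

    @Override
    void setVerifyChecksum(boolean verify) {
      this.verify = verify;
    }

    @Override
    boolean verifiesChecksums() {
      return verify;
    }
  }

  /** Hypothetical object store: inherits the no-op setter unchanged. */
  static class ObjectStoreFileSystem extends BaseFileSystem {
  }

  public static void main(String[] args) {
    BaseFileSystem store = new ObjectStoreFileSystem();
    store.setVerifyChecksum(true);
    System.out.println("object store verifies: " + store.verifiesChecksums());

    BaseFileSystem checksummed = new ChecksummedFileSystem();
    checksummed.setVerifyChecksum(false);
    System.out.println("checksummed verifies: " + checksummed.verifiesChecksums());
  }
}
```

A test that asserts on the effect of the flag only makes sense for the second kind of file system, which is why an empty override is enough for the first.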

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262-8.patch, HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Commented] (HADOOP-11262) Enable YARN to use S3A

2015-09-25 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907815#comment-14907815
 ] 

Pieter Reuse commented on HADOOP-11262:
---

Test failures in hadoop-common unrelated. Patch still applies.

> Enable YARN to use S3A 
> ---
>
> Key: HADOOP-11262
> URL: https://issues.apache.org/jira/browse/HADOOP-11262
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Thomas Demoor
>Assignee: Pieter Reuse
>  Labels: amazon, s3
> Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
> HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
> HADOOP-11262-7.patch, HADOOP-11262.patch
>
>
> Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-09-24 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12169:
--
Attachment: HADOOP-12169-002.patch

Added patch v2, which addresses the issues mentioned by [~ste...@apache.org]:
* moved f.makeQualified(uri, workingDir) out of the while-loop for performance 
reasons
* moved the added test to AbstractContractGetFileStatusTest in hadoop-common
* added an S3A implementation of this abstract test class
* added fs.contract.supports-getfilestatus=true to 
test/resources/contract/s3a.xml to enable these tests
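
The first bullet can be sketched roughly as follows. This is not the S3AFileSystem code: the bucket name, the string-based qualify() helper (a stand-in for Path.makeQualified) and the list() method are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QualifiedListingSketch {

  // Hypothetical file system URI used for qualification.
  static final String FS_URI = "s3a://example-bucket";

  /** Stand-in for makeQualified: adds scheme and bucket if missing, drops a trailing slash. */
  static String qualify(String path) {
    String p = path.contains("://")
        ? path
        : FS_URI + (path.startsWith("/") ? path : "/" + path);
    return p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
  }

  /** Lists the entries under dir, skipping the marker key of dir itself. */
  static List<String> list(String dir, List<String> keys) {
    String qualifiedDir = qualify(dir); // qualified once, outside the loop
    List<String> result = new ArrayList<>();
    for (String key : keys) {
      String keyPath = qualify(key);
      if (keyPath.equals(qualifiedDir)) {
        continue; // the directory's own marker is not a child
      }
      result.add(keyPath);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(list("/test/dir", Arrays.asList("test/dir/")));
    System.out.println(list("/test/dir", Arrays.asList("test/dir/", "test/dir/a.txt")));
  }
}
```

Comparing against the unqualified "/test/dir" instead of qualifiedDir would never match, so the directory would be reported as its own child.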

> ListStatus on empty dir in S3A lists itself instead of returning an empty list
> --
>
> Key: HADOOP-12169
> URL: https://issues.apache.org/jira/browse/HADOOP-12169
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.6.0, 2.7.0, 2.7.1
>Reporter: Pieter Reuse
>Assignee: Pieter Reuse
> Attachments: HADOOP-12169-001.patch, HADOOP-12169-002.patch
>
>
> Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
> this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty 
> bucket returns an empty list, while doing the same on an empty directory, 
> returns an array of length 1 containing only this directory itself.
> The bugfix is quite simple. In the line of code {code}...if 
> (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
> the fs and f is not. Therefore, this returns false while it shouldn't. The 
> bugfix is to make f qualified in this line of code.
> More formally: according to the formal definition of [The Hadoop FileSystem 
> API 
> Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
>  more specifically FileSystem.listStatus, only child elements of a directory 
> should be returned upon a listStatus()-call.
> In detail: 
> {code}
> elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where 
> f(c) == True]
> {code}
> and
> {code}
> def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
> {code}
> Which translates to the result of listStatus on an empty directory being an 
> empty list. This is the same behaviour as ls has in Unix, which is what 
> someone would expect from a FileSystem.
> Note: it seemed appropriate to add the test of this patch to the same file as 
> the test for HADOOP-11918, but as a result, one of the two will have to be 
> rebased wrt. the other before being applied to trunk.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-08-19 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-7.patch

Uploaded patch version 7, a rebased version of patch 6, which no longer 
applied.

Other than that, the only difference between versions 6 and 7 is that the 
first-mentioned bug in S3AFileSystem is fixed in DelegateToFileSystem (see 
HADOOP-12304), so I removed the duplicate bugfix from this patch.

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
 HADOOP-11262-7.patch, HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-08-18 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Status: Patch Available  (was: In Progress)

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
 HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-07-03 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-6.patch

Patch 6:

As requested, expanded patch 5 with tests by extending the following tests, 
overriding them with S3A specifics:
* _TestFileContext.java_
* _FileContextCreateMkdirBaseTest.java_
* _FileContextMainOperationsBaseTest.java_
* _FCStatisticsBaseTest.java_
* _FileContextURIBase.java_
* _FileContextUtilBase.java_

In doing so, fixed the following bugs in _FileContextMainOperationsBaseTest.java_:
* _line 1169_: creating a symlink on an FS that doesn't support this throws an 
_UnsupportedOperationException_, not an _IOException_ (see 
_FileContext.java:1441_).
* _lines 1252 and 1313_: the contract of _read()_ is not to read the whole file; 
that's the contract of _readFully()_. For this reason, tests assuming that the 
whole file has been read should use _readFully()_ instead of _read()_.
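
The difference between the two contracts can be shown with plain java.io classes. This is a minimal sketch, not code from the patch: the shortReads() helper is a made-up stream that deliberately hands out at most 3 bytes per call, the way a remote store may return partial reads.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadVersusReadFully {

  /** Hypothetical stream that delivers at most 3 bytes per call. */
  static InputStream shortReads(byte[] data) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, 3));
      }
    };
  }

  /** One read() call: returns the number of bytes the stream chose to deliver. */
  static int singleRead(byte[] data) {
    byte[] buffer = new byte[data.length];
    try (InputStream in = shortReads(data)) {
      return in.read(buffer, 0, buffer.length);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** readFully(): keeps reading until the buffer is full, or throws EOFException. */
  static byte[] fullRead(byte[] data) {
    byte[] buffer = new byte[data.length];
    try (DataInputStream in = new DataInputStream(shortReads(data))) {
      in.readFully(buffer);
      return buffer;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
    System.out.println("read() delivered " + singleRead(data) + " of " + data.length + " bytes");
    System.out.println("readFully() delivered "
        + new String(fullRead(data), StandardCharsets.US_ASCII));
  }
}
```

A test that calls read() once and then compares the whole buffer only passes when the stream happens to deliver everything in one call.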

And added an enhancement for object storage systems in the same file:
* _line 1238_: an object storage system throws an _IOException_ because a file 
does not exist *before* the file is closed (nor does it have a checksum at that 
moment). This object-storage issue is resolved by changing the order of 
_fc.setVerifyChecksum(true, path)_ and _out.write(data, 0, data.length)_; 
this does not impact the behaviour on hdfs or other file systems.

Discovered and patched the following related bugs in S3A:
* Bugfix in _S3AFileSystem.java_: ports on s3 should be ignored, which 
corresponds with a value of -1 (instead of the default 0 in FileSystem).
* Another bugfix is in _S3AFileStatus.java_: _getModificationTime()_ is 
overridden for directories. It returns _System.currentTimeMillis()_ because an 
object store does not keep track of the modification times of directories. 
Because some parts of the Hadoop ecosystem use modification time to ignore or 
delete old directories (e.g. the YarnHistoryServer), returning 0 for 
directories is not the best option here.
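
Both fixes can be sketched in a few lines. The Status class below is a hypothetical stand-in, not the real S3AFileStatus, and the bucket name is made up; the only library fact relied on is that java.net.URI reports -1 for a URI without an explicit port.

```java
import java.net.URI;

class DirectoryStatusSketch {

  /** Hypothetical stand-in for a file status object. */
  static final class Status {
    private final boolean directory;
    private final long storedModificationTime;

    Status(boolean directory, long storedModificationTime) {
      this.directory = directory;
      this.storedModificationTime = storedModificationTime;
    }

    /** Directories report the caller's "now"; files report their stored timestamp. */
    long getModificationTime(long now) {
      return directory ? now : storedModificationTime;
    }
  }

  /** Returns the port of the given URI, which is -1 when none is specified. */
  static int portOf(String uri) {
    return URI.create(uri).getPort();
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println("port: " + portOf("s3a://example-bucket/dir"));
    System.out.println("directory: " + new Status(true, 0L).getModificationTime(now));
    System.out.println("file: " + new Status(false, 1436000000000L).getModificationTime(now));
  }
}
```

Passing the current time in as a parameter is a choice made here to keep the sketch testable; the patch itself is described as calling System.currentTimeMillis() directly.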

Added _TestS3AMiniYarnCluster.java_, which runs a simple _WordCount_-MapReduce 
job on a _YarnMiniCluster_ using S3A as filesystem.

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
 HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Commented] (HADOOP-11262) Enable YARN to use S3A

2015-07-03 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612992#comment-14612992
 ] 

Pieter Reuse commented on HADOOP-11262:
---

Whoops, sorry for the double post of this comment. You can ignore one of both.

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262-6.patch, 
 HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Work started] (HADOOP-11262) Enable YARN to use S3A

2015-07-03 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11262 started by Pieter Reuse.
-
 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-07-03 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: HADOOP-11262-006.patch

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Updated] (HADOOP-11262) Enable YARN to use S3A

2015-07-03 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11262:
--
Attachment: (was: HADOOP-11262-006.patch)

 Enable YARN to use S3A 
 ---

 Key: HADOOP-11262
 URL: https://issues.apache.org/jira/browse/HADOOP-11262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Thomas Demoor
Assignee: Pieter Reuse
  Labels: amazon, s3
 Attachments: HADOOP-11262-2.patch, HADOOP-11262-3.patch, 
 HADOOP-11262-4.patch, HADOOP-11262-5.patch, HADOOP-11262.patch


 Uses DelegateToFileSystem to expose S3A as an AbstractFileSystem.




[jira] [Commented] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-02 Thread Pieter Reuse (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611682#comment-14611682
 ] 

Pieter Reuse commented on HADOOP-12169:
---

Yes, this bugfix is intended for Hadoop version 2.7.
The bug was introduced in Hadoop-2.6 and Hadoop-2.7 is affected by it.

 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0, 2.7.0, 2.7.1
Reporter: Pieter Reuse
Assignee: Pieter Reuse
 Attachments: HADOOP-12169-001.patch


 Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
 this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty 
 bucket returns an empty list, while doing the same on an empty directory, 
 returns an array of length 1 containing only this directory itself.
 The bugfix is quite simple. In the line of code {code}...if 
 (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
 the fs and f is not. Therefore, this returns false while it shouldn't. The 
 bugfix is to make f qualified in this line of code.
 More formally: according to the formal definition of [The Hadoop FileSystem 
 API 
 Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
  more specifically FileSystem.listStatus, only child elements of a directory 
 should be returned upon a listStatus()-call.
 In detail: 
 {code}
 elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where 
 f(c) == True]
 {code}
 and
 {code}
 def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
 {code}
 Which translates to the result of listStatus on an empty directory being an 
 empty list. This is the same behaviour as ls has in Unix, which is what 
 someone would expect from a FileSystem.
 Note: it seemed appropriate to add the test of this patch to the same file as 
 the test for HADOOP-11918, but as a result, one of the two will have to be 
 rebased wrt. the other before being applied to trunk.




[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-02 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12169:
--
Affects Version/s: 2.6.0

 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0, 2.7.0, 2.7.1
Reporter: Pieter Reuse
Assignee: Pieter Reuse
 Attachments: HADOOP-12169-001.patch


 Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
 this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty 
 bucket returns an empty list, while doing the same on an empty directory, 
 returns an array of length 1 containing only this directory itself.
 The bugfix is quite simple. In the line of code {code}...if 
 (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
 the fs and f is not. Therefore, this returns false while it shouldn't. The 
 bugfix is to make f qualified in this line of code.
 More formally: according to the formal definition of [The Hadoop FileSystem 
 API 
 Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
  more specifically FileSystem.listStatus, only child elements of a directory 
 should be returned upon a listStatus()-call.
 In detail: 
 {code}
 elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where 
 f(c) == True]
 {code}
 and
 {code}
 def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
 {code}
 Which translates to the result of listStatus on an empty directory being an 
 empty list. This is the same behaviour as ls has in Unix, which is what 
 someone would expect from a FileSystem.
 Note: it seemed appropriate to add the test of this patch to the same file as 
 the test for HADOOP-11918, but as a result, one of the two will have to be 
 rebased wrt. the other before being applied to trunk.




[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-02 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12169:
--
Affects Version/s: 2.7.1
   2.7.0

 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0, 2.7.1
Reporter: Pieter Reuse
Assignee: Pieter Reuse
 Attachments: HADOOP-12169-001.patch


 Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
 this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty 
 bucket returns an empty list, while doing the same on an empty directory, 
 returns an array of length 1 containing only this directory itself.
 The bugfix is quite simple. In the line of code {code}...if 
 (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
 the fs and f is not. Therefore, this returns false while it shouldn't. The 
 bugfix is to make f qualified in this line of code.
 More formally: according to the formal definition of [The Hadoop FileSystem 
 API 
 Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
  more specifically FileSystem.listStatus, only child elements of a directory 
 should be returned upon a listStatus()-call.
 In detail: 
 {code}
 elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where 
 f(c) == True]
 {code}
 and
 {code}
 def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
 {code}
 Which translates to the result of listStatus on an empty directory being an 
 empty list. This is the same behaviour as ls has in Unix, which is what 
 someone would expect from a FileSystem.
 Note: it seemed appropriate to add the test of this patch to the same file as 
 the test for HADOOP-11918, but as a result, one of the two will have to be 
 rebased wrt. the other before being applied to trunk.




[jira] [Work started] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-01 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-12169 started by Pieter Reuse.
-
 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Pieter Reuse
Assignee: Pieter Reuse

 Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
 this introduces to the S3AFileSystem-class. Calling ListStatus() on an empty 
 bucket returns an empty list, while doing the same on an empty directory, 
 returns an array of length 1 containing only this directory itself.
 The bugfix is quite simple. In the line of code {code}...if 
 (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
 the fs and f is not. Therefore, this returns false while it shouldn't. The 
 bugfix is to make f qualified in this line of code.
 More formally: according to the formal definition of [The Hadoop FileSystem 
 API 
 Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
  more specifically FileSystem.listStatus, only child elements of a directory 
 should be returned upon a listStatus()-call.
 In detail: 
 {code}
 elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where 
 f(c) == True]
 {code}
 and
 {code}
 def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
 {code}
 Which translates to the result of listStatus on an empty directory being an 
 empty list. This is the same behaviour as ls has in Unix, which is what 
 someone would expect from a FileSystem.
 Note: it seemed appropriate to add the test of this patch to the same file as 
 the test for HADOOP-11918, but as a result, one of the two will have to be 
 rebased wrt. the other before being applied to trunk.




[jira] [Created] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-01 Thread Pieter Reuse (JIRA)

Pieter Reuse created HADOOP-12169:
-

 Summary: ListStatus on empty dir in S3A lists itself instead of 
returning an empty list
 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Pieter Reuse
Assignee: Pieter Reuse


Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this 
introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket 
returns an empty list, while doing the same on an empty directory, returns an 
array of length 1 containing only this directory itself.

The bugfix is quite simple. In the line of code {code}...if 
(keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the 
fs and f is not. Therefore, this returns false while it shouldn't. The bugfix 
is to make f qualified in this line of code.

More formally: according to the formal definition of [The Hadoop FileSystem API 
Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
 more specifically FileSystem.listStatus, only child elements of a directory 
should be returned upon a listStatus()-call.

In detail: 
{code}
elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) 
== True]
{code}
and
{code}
def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
{code}

Which translates to the result of listStatus on an empty directory being an 
empty list. This is the same behaviour as ls has in Unix, which is what someone 
would expect from a FileSystem.

Note: it seemed appropriate to add the test of this patch to the same file as 
the test for HADOOP-11918, but as a result, one of the two will have to be 
rebased wrt. the other before being applied to trunk.
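
The children() definition quoted above can be transcribed into Java roughly as follows. This is only a sketch over plain string paths, with a hand-written parent() helper; it does not use the Hadoop Path class, and the example paths are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ChildrenSketch {

  /** Parent of an absolute path such as "/a/b"; the parent of "/a" is "/". */
  static String parent(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? "/" : path.substring(0, idx);
  }

  /** children(FS, p) = {q for q in paths(FS) where parent(q) == p}, in sorted order. */
  static List<String> children(Set<String> paths, String p) {
    List<String> result = new ArrayList<>();
    for (String q : new TreeSet<>(paths)) {
      // The root is excluded explicitly so that it is never its own child.
      if (!q.equals("/") && parent(q).equals(p)) {
        result.add(q);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Set<String> paths = new TreeSet<>(Arrays.asList("/", "/empty", "/full", "/full/x"));
    System.out.println("children of /empty: " + children(paths, "/empty"));
    System.out.println("children of /full:  " + children(paths, "/full"));
  }
}
```

Because a directory is never its own parent, it cannot appear in its own list of children, so an empty directory yields an empty result.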




[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-01 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12169:
--
Description: 
Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this 
introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket 
returns an empty list, while doing the same on an empty directory, returns an 
array of length 1 containing only this directory itself.

The bugfix is quite simple. In the line of code {code}...if 
(keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the 
fs and f is not. Therefore, this returns false while it shouldn't. The bugfix 
is to make f qualified in this line of code.

More formally: according to the formal definition of [The Hadoop FileSystem API 
Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
 more specifically FileSystem.listStatus, only child elements of a directory 
should be returned upon a listStatus()-call.

In detail: 
{code}
elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) 
== True]
{code}
and
{code}
def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
{code}

Which translates to the result of listStatus on an empty directory being an 
empty list. This is the same behaviour as ls has in Unix, which is what someone 
would expect from a FileSystem.

Note: it seemed appropriate to add the test of this patch to the same file as 
the test for HADOOP-11918, but as a result, one of the two will have to be 
rebased wrt. the other before being applied to trunk.

  was:
Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour this 
introduces to the S3AFileSystem-class. Calling ListStatus() on an empty bucket 
returns an empty list, while doing the same on an empty directory, returns an 
array of length 1 containing only this directory itself.

The bugfix is quite simple. In the line of code {code}...if 
(keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. the 
fs and f is not. Therefore, this returns false while it shouldn't. The bugfix 
is to make f qualified in this line of code.

More formally: according to the formal definition of [The Hadoop FileSystem API 
Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
 more specifically FileSystem.listStatus, only child elements of a directory 
should be returned upon a listStatus()-call.

In detail: 
{code}
elif isDir(FS, p): result [getFileStatus(c) for c in children(FS, p) where f(c) 
== True]
{code}
and
{code}
def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
{code}

Which translates to the result of listStatus on an empty directory being an 
empty list. This is the same behaviour as ls has in Unix, which is what someone 
would expect from a FileSystem.

Note: it seemed appropriate to add the test of this patch to the same file as 
the test for HADOOP-11918, but as a result, one of the two will have to be 
rebased wrt. the other before being applied to trunk.


 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Pieter Reuse
Assignee: Pieter Reuse
 Attachments: HADOOP-12169-001.patch



[jira] [Updated] (HADOOP-12169) ListStatus on empty dir in S3A lists itself instead of returning an empty list

2015-07-01 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-12169:
--
Status: Patch Available  (was: In Progress)

 ListStatus on empty dir in S3A lists itself instead of returning an empty list
 --

 Key: HADOOP-12169
 URL: https://issues.apache.org/jira/browse/HADOOP-12169
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Pieter Reuse
Assignee: Pieter Reuse
 Attachments: HADOOP-12169-001.patch


 Upon testing the patch for HADOOP-11918, I stumbled upon a weird behaviour 
 this introduces to the S3AFileSystem-class. Calling listStatus() on an empty 
 bucket returns an empty list, while doing the same on an empty directory 
 returns an array of length 1 containing only this directory itself.
 The bugfix is quite simple. In the line of code {code}...if 
 (keyPath.equals(f)...{code} (S3AFileSystem:758), keyPath is qualified wrt. 
 the fs and f is not. Therefore, this returns false while it shouldn't. The 
 bugfix is to make f qualified in this line of code.
 More formally: according to [The Hadoop FileSystem 
 API 
 Definition|https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/filesystem/],
  more specifically FileSystem.listStatus, only child elements of a directory 
 should be returned upon a listStatus()-call.
 In detail: 
 {code}
 elif isDir(FS, p): result = [getFileStatus(c) for c in children(FS, p) where f(c) == True]
 {code}
 and
 {code}
 def children(FS, p) = {q for q in paths(FS) where parent(q) == p}
 {code}
 This means that listStatus on an empty directory must return an empty list, 
 because an empty directory has no children. This is the same behaviour as ls 
 has in Unix, which is what someone would expect from a FileSystem.
 Note: it seemed appropriate to add the test of this patch to the same file as 
 the test for HADOOP-11918, but as a result, one of the two will have to be 
 rebased wrt. the other before being applied to trunk.




[jira] [Updated] (HADOOP-11918) Listing an empty s3a root directory throws FileNotFound.

2015-06-18 Thread Pieter Reuse (JIRA)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Reuse updated HADOOP-11918:
--
Attachment: HADOOP-11918-002.patch

I verified this patch with the test output below. 

*However*, running all the JUnit tests leaves spurious directories in the test 
bucket. As a result, the bucket wasn't empty during 
TestS3AFileSystem.testListRootDirectory, while the bug only surfaces on an 
empty directory. 

Because of this, I have added four lines to the test that clean the entire 
bucket before doing the actual assert.

---
 T E S T S
---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.706 sec - 
in org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.564 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.598 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.443 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 10.286 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.942 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.752 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Running org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 16.564 sec - in 
org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Running org.apache.hadoop.fs.s3a.TestS3ABlocksize
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.872 sec - in 
org.apache.hadoop.fs.s3a.TestS3ABlocksize
Running org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Tests run: 43, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 91.585 sec - 
in org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Running org.apache.hadoop.fs.s3a.TestS3AConfiguration
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.948 sec - in 
org.apache.hadoop.fs.s3a.TestS3AConfiguration
Running org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 533.853 sec - 
in org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Running org.apache.hadoop.fs.s3a.TestS3AFileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.476 sec - in 
org.apache.hadoop.fs.s3a.TestS3AFileSystem

Results :

Tests run: 101, Failures: 0, Errors: 0, Skipped: 3

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 12:34.020s
[INFO] Finished at: Thu Jun 18 15:57:16 CEST 2015
[INFO] Final Memory: 25M/419M
[INFO] 


 Listing an empty s3a root directory throws FileNotFound.
 

 Key: HADOOP-11918
 URL: https://issues.apache.org/jira/browse/HADOOP-11918
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
  Labels: BB2015-05-TBR, s3
 Attachments: HADOOP-11918-002.patch, HADOOP-11918.000.patch, 
 HADOOP-11918.001.patch


 With an empty S3 bucket, run:
 {code}
 $ hadoop fs -D... -ls s3a://hdfs-s3a-test/
 15/05/04 15:21:34 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 ls: `s3a://hdfs-s3a-test/': No such file or directory
 {code}


