Jordan Mendelson created HADOOP-10400:
-----------------------------------------

             Summary: Incorporate new S3A FileSystem implementation
                 Key: HADOOP-10400
                 URL: https://issues.apache.org/jira/browse/HADOOP-10400
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: Jordan Mendelson


The s3native filesystem has a number of limitations (some of which were 
recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
the aws-sdk instead of the jets3t library. There are a number of improvements 
over s3native including:

- Parallel copy (rename) support (dramatically speeds up commits on large files)
- AWS S3 explorer compatible empty directory marker files: "xyz/" instead of 
"xyz_$folder$" (reduces littering)
- Ignores _$folder$ files created by s3native and other S3 browsing 
utilities
- Supports multiple output buffer dirs to even out IO when uploading files
- Supports IAM role-based authentication
- Allows setting a default canned ACL for uploads (public, private, etc.)
- Better error recovery handling
- Should handle input seeks without having to download the whole file (used for 
splits a lot)
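
As a rough illustration of the last point, a seek can be served by requesting 
only the bytes from the new position onward instead of the whole object. The 
sketch below only computes HTTP Range header values (byte ranges are inclusive 
at both ends). The class and method names are hypothetical and are not taken 
from the patch, which may handle seeks differently.

```java
/** Illustrative only: maps seek positions and splits to HTTP Range header values. */
final class RangedSeekSketch {
    private RangedSeekSketch() {}

    /** Range covering everything from pos to the last byte of the object. */
    static String rangeFrom(long pos, long objectLength) {
        if (pos < 0 || pos >= objectLength) {
            throw new IllegalArgumentException(
                "seek position " + pos + " outside object of length " + objectLength);
        }
        // HTTP byte ranges are inclusive, so the last index is length - 1.
        return "bytes=" + pos + "-" + (objectLength - 1);
    }

    /** Range covering one input split, clamped to the end of the object. */
    static String rangeForSplit(long offset, long splitLength, long objectLength) {
        if (offset < 0 || splitLength <= 0 || offset >= objectLength) {
            throw new IllegalArgumentException("split does not overlap the object");
        }
        long last = Math.min(offset + splitLength, objectLength) - 1;
        return "bytes=" + offset + "-" + last;
    }
}
```

For a 1000-byte object, a seek to position 100 would then ask for 
"bytes=100-999" rather than re-reading from byte 0.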

This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
various pom files to get it to build against trunk. I've been using 0.0.1 in 
production with CDH 4 for several months and CDH 5 for a few days. The version 
here is 0.0.2, which changes some keys around to hopefully bring the key name 
style more in line with the rest of Hadoop 2.x.

It should be largely compatible with s3native except that it won't recognize 
s3native's empty directory marker files "*_$folder$", since it uses "folder/", 
like Amazon's S3 explorer, to denote empty directories.
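
To make the difference between the two marker conventions concrete, here is a 
minimal sketch of how an s3native marker key maps to the "folder/" form. It is 
a hypothetical helper for illustration (for example, when checking an existing 
bucket by hand); it is not part of the patch, and the class and method names 
are assumptions.

```java
/** Illustrative only: converts s3native empty-directory marker keys to the "folder/" form. */
final class DirectoryMarkerSketch {
    /** Suffix s3native appends to an empty directory's key. */
    static final String S3N_SUFFIX = "_$folder$";

    private DirectoryMarkerSketch() {}

    /** True if the key looks like an s3native empty-directory marker. */
    static boolean isS3nMarker(String key) {
        return key.length() > S3N_SUFFIX.length() && key.endsWith(S3N_SUFFIX);
    }

    /** Maps "xyz_$folder$" to "xyz/". */
    static String toS3aMarker(String s3nKey) {
        if (!isS3nMarker(s3nKey)) {
            throw new IllegalArgumentException("not an s3native marker key: " + s3nKey);
        }
        // Strip the suffix and append the trailing slash used by "folder/" markers.
        return s3nKey.substring(0, s3nKey.length() - S3N_SUFFIX.length()) + "/";
    }
}
```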



--
This message was sent by Atlassian JIRA
(v6.2#6252)
