Hi,

You may want to check HADOOP-10400 <https://issues.apache.org/jira/browse/HADOOP-10400> for the overhaul of the S3 filesystem, fixed in 2.6.
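For context, each URI scheme in Hadoop has two independent configuration bindings, one per API. A rough sketch of the relevant core-site.xml entries (the s3 value matches the old FileSystem-side default; the hdfs entry shows what an AbstractFileSystem binding looks like — the absence of such an entry for s3 is the gap discussed below):

```xml
<!-- Sketch only: property names follow Hadoop's core-default conventions. -->
<!-- FileSystem API binding (old interface) - this one exists for s3: -->
<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
</property>
<!-- AbstractFileSystem/FileContext binding - hdfs has one, but before
     HADOOP-10643 there is no equivalent "fs.AbstractFileSystem.s3.impl": -->
<property>
  <name>fs.AbstractFileSystem.hdfs.impl</name>
  <value>org.apache.hadoop.fs.Hdfs</value>
</property>
```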
The subclass of AbstractFileSystem was filed as HADOOP-10643
<https://issues.apache.org/jira/browse/HADOOP-10643>, but it was not
included in HADOOP-10400, though I made a comment
<https://issues.apache.org/jira/browse/HADOOP-10400?focusedCommentId=14104967&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14104967>
there. I suggest not using S3 as the defaultFS, as explained in "Why you
cannot use S3 as a replacement for HDFS"
<https://wiki.apache.org/hadoop/AmazonS3>, to avoid all sorts of these
issues. The best practice is to use S3 as a supplement to Hadoop, to get
lifecycle management (expiration and tiering) and a source/destination
over the internet.

Thanks,
Takenori

On Sun, Sep 28, 2014 at 5:23 PM, Naganarasimha G R (Naga)
<[email protected]> wrote:

> Hi Jay,
>
> Thanks a lot for replying; it clarifies most of it, but some parts are
> still not clear. Some clarifications from my side:
>
> | When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"....
> | That is true. If your file system is configured using HDFS, then s3
> | urls will not be used, ever.
>
> :) I think I am not making that basic mistake. What we have done is set
> "fs.defaultFS" to "viewfs://nsX", and one of the mounts is S3, i.e.
> "fs.viewfs.mounttable.nsX.link./uds" points to "s3://hadoop/test1/".
> When run via "./yarn jar", it fails to even create a YARNRunner
> instance, because there is no mapping for
> "fs.AbstractFileSystem.s3.impl". And as per the code, even setting
> "fs.defaultFS" to S3 directly will not work, because there is no S3
> implementation of the AbstractFileSystem interface.
>
> These are my further queries:
>
> 1. What is the purpose of the AbstractFileSystem and FileSystem
>    interfaces?
> 2. Does the default HDFS package (code) support configuring S3? I see
>    an S3 implementation of the FileSystem interface
>    (org.apache.hadoop.fs.s3.S3FileSystem) but not of AbstractFileSystem,
>    so I presume it doesn't support S3 completely. What is the reason
>    for not supporting both?
> 3. Suppose I need to support Amazon S3: do I need to extend and
>    implement AbstractFileSystem and configure
>    "fs.AbstractFileSystem.s3.impl", or is there something more I need
>    to take care of?
>
> Regards,
> Naga
>
> Huawei Technologies Co., Ltd.
> Phone:
> Fax:
> Mobile: +91 9980040283
> Email: [email protected]
> Huawei Technologies Co., Ltd.
> Bantian, Longgang District, Shenzhen 518129, P.R.China
> http://www.huawei.com
>
> ------------------------------
> From: jay vyas [[email protected]]
> Sent: Saturday, September 27, 2014 02:41
> To: [email protected]
> Subject: Re:
>
> See https://wiki.apache.org/hadoop/HCFS/
>
> YES, YARN is written to the FileSystem interface. It works on
> S3FileSystem and GlusterFileSystem and any other HCFS.
>
> We have run, and continue to run, the many tests in Apache Bigtop's
> test suite against our Hadoop clusters running on alternative file
> system implementations, and it works.
>
> When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"....
> That is true. If your file system is configured using HDFS, then s3
> urls will not be used, ever.
>
> When you create a FileSystem object in Hadoop, it reads the URI (i.e.
> "glusterfs:///") and then finds the file system binding in your
> core-site.xml (i.e. fs.AbstractFileSystem.glusterfs.impl).
>
> So the URI must have a corresponding entry in core-site.xml.
>
> As a reference implementation, you can see
> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>
> On Fri, Sep 26, 2014 at 10:10 AM, Naganarasimha G R (Naga)
> <[email protected]> wrote:
>
>> Hi All,
>>
>> I have the following doubts on pluggable FileSystems and YARN:
>>
>> 1. If all the implementations should extend FileSystem, then why is
>>    there a parallel class AbstractFileSystem, which ViewFs extends?
>> 2. Is YARN supposed to run on any of the pluggable
>>    org.apache.hadoop.fs.FileSystem implementations, like S3? If so,
>>    then when submitting a job, on the client side YARNRunner calls
>>    FileContext.getFileContext(this.conf), which in turn calls
>>    FileContext.getAbstractFileSystem(), which throws an exception for
>>    S3. So I am not able to run a YARN job with ViewFS with S3 as a
>>    mount. And based on the code, even if I configure only S3 it is
>>    going to fail.
>> 3. Does HDFS not support "fs.AbstractFileSystem.s3.impl" with some
>>    default class, similar to org.apache.hadoop.fs.s3.S3FileSystem?
>>
>> Regards,
>> Naga
>
> --
> jay vyas
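The failure mode described in this thread can be modeled without a Hadoop cluster. The sketch below is a toy reimplementation of the scheme lookup that FileContext performs, not Hadoop's actual code; the property keys and the hdfs/viewfs class names mirror the real default bindings, and the deliberately missing "s3" entry reproduces why the FileContext.getFileContext() call in YARNRunner throws for an S3 mount.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Hadoop's real code) of how FileContext resolves a URI
// scheme: it looks up "fs.AbstractFileSystem.<scheme>.impl" in the
// configuration and fails if no binding exists - which is why an s3://
// mount breaks YARNRunner while hdfs:// and viewfs:// work.
public class SchemeLookupDemo {
    static final Map<String, String> conf = new HashMap<>();
    static {
        // Bindings that ship in Hadoop's default configuration (illustrative).
        conf.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
        conf.put("fs.AbstractFileSystem.viewfs.impl",
                 "org.apache.hadoop.fs.viewfs.ViewFs");
        // Note: no entry for "s3" - pre-2.6 Hadoop ships no AbstractFileSystem
        // implementation for it (the gap filed as HADOOP-10643).
    }

    static String resolve(String scheme) {
        String impl = conf.get("fs.AbstractFileSystem." + scheme + ".impl");
        if (impl == null) {
            // Stand-in for Hadoop's UnsupportedFileSystemException.
            throw new UnsupportedOperationException(
                "No AbstractFileSystem configured for scheme: " + scheme);
        }
        return impl;
    }

    public static void main(String[] args) {
        System.out.println(resolve("hdfs"));   // prints org.apache.hadoop.fs.Hdfs
        try {
            resolve("s3");                     // mirrors the reported failure
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Adding the missing key (pointing at an AbstractFileSystem subclass for s3) is exactly what makes the lookup succeed, which matches the advice in the thread to supply "fs.AbstractFileSystem.s3.impl".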
