Hey Steve,

That's correct; see HADOOP-6223 for the history. However, per Andrew's
point, I don't think it's realistic to expect people to migrate off
FileSystem for a while (I filed HADOOP-6446 well over three years
ago).

The unfortunate consequence of the earlier decision to have parallel
interfaces, rather than transition a single interface over time, is that
people effectively need to implement two backends - one that gets used by
clients of FileSystem, and one for clients of FileContext. Implementing in
only one place significantly limits adoption of a feature or file system,
because it can't be effectively adopted in practice unless it's available
to both old and new clients (for example, this is why symlinks are being
backported to FileSystem from FileContext).
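
To make that concrete, here is a rough sketch of the double wiring (the
"myfs" scheme and the org.example class names are made up for
illustration):

  import org.apache.hadoop.conf.Configuration;

  public class HcfsWiringSketch {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Clients that go through FileSystem resolve the backend via
      // fs.<scheme>.impl.
      conf.set("fs.myfs.impl", "org.example.MyFileSystem");
      // Clients that go through FileContext resolve a separate
      // AbstractFileSystem backend via fs.AbstractFileSystem.<scheme>.impl
      // (the same way core-default.xml maps fs.AbstractFileSystem.hdfs.impl
      // to org.apache.hadoop.fs.Hdfs).
      conf.set("fs.AbstractFileSystem.myfs.impl", "org.example.MyAbstractFs");
    }
  }

The same file system ends up registered twice, under two different keys,
backed by two different implementation classes.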

Thanks,
Eli

On Tue, Jun 18, 2013 at 11:15 AM, Stephen Watt <sw...@redhat.com> wrote:
> Hi Folks
>
> My understanding is that from Hadoop 2.0 onwards, AbstractFileSystem is
> now the strategic class to extend when writing Hadoop FileSystem plugins.
> This is a departure from previous versions, where one would extend the
> FileSystem class. This seems to be reinforced by the core-default.xml for
> Hadoop 2.0 in the Apache docs
> (http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-project-dist/hadoop-common/core-default.xml),
> which shows fs.AbstractFileSystem.hdfs.impl being set to
> org.apache.hadoop.fs.Hdfs.
>
> Is my assertion correct? Do we have community consensus around this? That
> is, beyond the Apache distro, are the commercial distros (Intel,
> Hortonworks, Cloudera, WanDisco, EMC Pivotal, etc.) using
> org.apache.hadoop.fs.Hdfs as their filesystem plugin for HDFS? What does
> one lose by using the DistributedFileSystem class instead of the Hdfs
> class?
>
> Regards
> Steve Watt
>
> ----- Original Message -----
> From: "Andrew Wang" <andrew.w...@cloudera.com>
> To: common-dev@hadoop.apache.org
> Cc: "Milind Bhandarkar" <mbhandar...@gopivotal.com>, "shv hadoop" 
> <shv.had...@gmail.com>, "Steve Loughran" <ste...@hortonworks.com>, "Kun Ling" 
> <erlv5...@gmail.com>, "Roman Shaposhnik" <shaposh...@gmail.com>, "Andrew 
> Purtell" <apurt...@apache.org>, cdoug...@apache.org, jayh...@cs.ucsc.edu, 
> "Sanjay Radia" <san...@hortonworks.com>
> Sent: Friday, June 14, 2013 1:32:38 PM
> Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop 
> FileSystems + Workshop
>
> Hey Steve,
>
> I agree that it's confusing. FileSystem and FileContext are essentially two
> parallel sets of interfaces for accessing filesystems in Hadoop.
> FileContext splits the interface and shared code with AbstractFileSystem,
> while FileSystem is all-in-one. If you're looking for the AFS equivalents
> to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.
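>
> To illustrate the two parallel client paths, a rough sketch (the path is
> made up and error handling is omitted):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FileContext;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class ClientPathsSketch {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = new Configuration();
>       Path p = new Path("/tmp/example");
>
>       // Old-style client: resolves a FileSystem (e.g. DistributedFileSystem).
>       FileSystem fs = FileSystem.get(conf);
>       System.out.println("FileSystem: " + fs.exists(p));
>
>       // New-style client: FileContext dispatches to an AbstractFileSystem
>       // (e.g. Hdfs) behind the scenes.
>       FileContext fc = FileContext.getFileContext(conf);
>       System.out.println("FileContext: " + fc.util().exists(p));
>     }
>   }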
>
> Realistically, FileSystem isn't going to be deprecated and removed any time
> soon. There are lots of 3rd-party FileSystem implementations, and most apps
> today use FileSystem (including many HDFS internals, like trash and the
> shell).
>
> When I read the wiki page, I figured that the mention of AFS was
> essentially a typo, since everyone's been steaming ahead with FileSystem.
> Standardizing on FileSystem makes total sense to me; I just wanted to
> confirm that plan.
>
> Best,
> Andrew
>
>
> On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt <sw...@redhat.com> wrote:
>
>> This is a good point, Andrew. The hangout was actually the first time I'd
>> heard about the AbstractFileSystem class. I've been doing some further
>> analysis of the source in Hadoop 2.0, and when I look at the Hadoop 2.0
>> implementations of the DistributedFileSystem and LocalFileSystem classes,
>> they extend the FileSystem class and not AbstractFileSystem. I would
>> imagine that if the plan for Hadoop 2.0 is to build FileSystem
>> implementations using AbstractFileSystem, then those two would use it, so
>> I'm a bit confused.
>>
>> Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
>> clarify this for us?
>>
>> Regards
>> Steve Watt
>>
>> ----- Original Message -----
>> From: "Andrew Wang" <andrew.w...@cloudera.com>
>> To: common-dev@hadoop.apache.org
>> Cc: mbhandar...@gopivotal.com, "shv hadoop" <shv.had...@gmail.com>,
>> ste...@hortonworks.com, erlv5...@gmail.com, shaposh...@gmail.com,
>> apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu,
>> san...@hortonworks.com
>> Sent: Monday, June 10, 2013 5:14:16 PM
>> Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop
>> FileSystems + Workshop
>>
>> Thanks for the summary, Steve - very useful.
>>
>> I'm wondering a bit about the point on testing AbstractFileSystem rather
>> than FileSystem. While their HDFS implementations are both wrappers
>> around DFSClient, they're pretty different in terms of the APIs they
>> expose. Furthermore, AFS is not
>> actually a client-facing API; clients interact with an AFS through
>> FileContext.
>>
>> I ask because I did some work trying to unify the symlink tests for both
>> FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
>> like the default mkdir semantics are different; you can see some of the
>> contortions in HADOOP-9370. I ultimately ended up just adhering to the
>> FileContext-style behavior, but as a result I'm not really testing some
>> parts of FileSystem.
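>>
>> To give a flavor of the kind of difference I mean, here's a rough sketch
>> (the path is made up):
>>
>>   import org.apache.hadoop.conf.Configuration;
>>   import org.apache.hadoop.fs.FileContext;
>>   import org.apache.hadoop.fs.FileSystem;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.fs.permission.FsPermission;
>>
>>   public class MkdirSemanticsSketch {
>>     public static void main(String[] args) throws Exception {
>>       Configuration conf = new Configuration();
>>       Path p = new Path("/tmp/a/b/c");
>>
>>       // FileSystem#mkdirs behaves like "mkdir -p": missing parent
>>       // directories are created implicitly.
>>       FileSystem.get(conf).mkdirs(p);
>>
>>       // FileContext#mkdir takes an explicit createParent flag; with
>>       // false, it fails if the parent /tmp/a/b does not already exist.
>>       FileContext.getFileContext(conf).mkdir(p, FsPermission.getDefault(), false);
>>     }
>>   }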
>>
>> Are we going to end up with two different sets of validation tests? Or just
>> choose one API over the other? FileSystem is supposed to eventually be
>> deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
>> uptake in practice has been slow.
>>
>> Best,
>> Andrew
>>
>>
>> On Mon, Jun 10, 2013 at 1:49 PM, Stephen Watt <sw...@redhat.com> wrote:
>>
>> > For those interested - I posted a recap of this morning's Google Hangout
>> > on the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress
>> >
>> > On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:
>> >
>> > > Hi Folks
>> > >
>> > > Per Roman's recommendation I've created a Wiki Page for organizing
>> > > the work and managing the logistics -
>> > > https://wiki.apache.org/hadoop/HCFS/Progress
>> > >
>> > > I'd like to propose a Google Hangout at 9am PST on Monday June 10th
>> > > to get together and discuss the initiative. Please respond back to me
>> > > if you're interested or would like to propose a different time. I'll
>> > > update our Wiki page with the logistics.
>> > >
>> > > Regards
>> > > Steve Watt
>> > >
>> > > ----- Original Message -----
>> > > From: "Roman Shaposhnik" <shaposh...@gmail.com>
>> > > To: "Stephen Watt" <sw...@redhat.com>
>> > > Cc: common-dev@hadoop.apache.org, mbhandar...@gopivotal.com, "shv
>> > hadoop" <shv.had...@gmail.com>, ste...@hortonworks.com,
>> erlv5...@gmail.com,
>> > apurt...@apache.org
>> > > Sent: Friday, May 31, 2013 5:28:58 PM
>> > > Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative
>> > Hadoop FileSystems + Workshop
>> > >
>> > > On Fri, May 31, 2013 at 1:00 PM, Stephen Watt <sw...@redhat.com> wrote:
>> > >> What is the protocol for organizing the logistics and collaborating?
>> > >> I am loath to flood common-dev with "does this time work for you?"
>> > >> emails from the interested parties. Do we create a high-level JIRA
>> > >> ticket and collaborate and post comments and G+ meetup times on that?
>> > >> Another option might be the Wiki; I'd be happy to be responsible for
>> > >> tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress
>> > >> until we are able to break initiatives down into more granular JIRA
>> > >> tickets.
>> > >
>> > > I'd go with a wiki page and perhaps http://www.doodle.com/
>> > >
>> > >> After we've had a few G+ hangouts, for those that would like to meet
>> > >> face to face, I have also made an all-day reservation for a meeting
>> > >> room that can hold up to 20 people at our Red Hat office on Castro
>> > >> Street, Mountain View, on Tuesday June 25th (the day before Hadoop
>> > >> Summit and a short drive away). We don't have to use the whole day,
>> > >> but it gives us some flexibility around the availability of interested
>> > >> parties. I was thinking something along the lines of 10am - 3pm. We
>> > >> are happy to cater lunch.
>> > >
>> > > That also would be very much appreciated!
>> > >
>> > > Thanks,
>> > > Roman.
>> >
>>
