Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-14 Thread Stephen Watt
This is a good point, Andrew. The hangout was actually the first time I'd heard 
about the AbstractFileSystem class. I've been doing some further analysis of 
the Hadoop 2.0 source, and the Hadoop 2.0 implementations of 
DistributedFileSystem and LocalFileSystem extend the FileSystem class, not 
AbstractFileSystem. I would imagine that if the plan for Hadoop 2.0 were to 
build FileSystem implementations on AbstractFileSystem, then those two would 
use it, so I'm a bit confused.

Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you 
clarify this for us?

Regards
Steve Watt

- Original Message -
From: Andrew Wang andrew.w...@cloudera.com
To: common-dev@hadoop.apache.org
Cc: mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com, 
ste...@hortonworks.com, erlv5...@gmail.com, shaposh...@gmail.com, 
apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu, 
san...@hortonworks.com
Sent: Monday, June 10, 2013 5:14:16 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop 
FileSystems + Workshop

Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather
than FileSystem. While these are both wrappers for DFSClient, they're
pretty different in terms of the APIs they expose. Furthermore, AFS is not
actually a client-facing API; clients interact with an AFS through
FileContext.

I ask because I did some work trying to unify the symlink tests for both
FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
like the default mkdir semantics are different; you can see some of the
contortions in HADOOP-9370. I ultimately ended up just adhering to the
FileContext-style behavior, but as a result I'm not really testing some
parts of FileSystem.

Are we going to end up with two different sets of validation tests? Or just
choose one API over the other? FileSystem is supposed to eventually be
deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
uptake in practice has been slow.

Best,
Andrew


On Mon, Jun 10, 2013 at 1:49 PM, Stephen Watt sw...@redhat.com wrote:

 For those interested - I posted a recap of this morning's Google Hangout on
 the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress

 On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:

  Hi Folks
 
  Per Roman's recommendation I've created a Wiki Page for organizing the
 work and managing the logistics -
 https://wiki.apache.org/hadoop/HCFS/Progress
 
  I'd like to propose a Google Hangout at 9am PST on Monday June 10th to
 get together and discuss the initiative. Please respond back to me if
 you're interested or would like to propose a different time. I'll update
 our Wiki page with the logistics.
 
  Regards
  Steve Watt
 
  - Original Message -
  From: Roman Shaposhnik shaposh...@gmail.com
  To: Stephen Watt sw...@redhat.com
  Cc: common-dev@hadoop.apache.org, mbhandar...@gopivotal.com, shv
 hadoop shv.had...@gmail.com, ste...@hortonworks.com, erlv5...@gmail.com,
 apurt...@apache.org
  Sent: Friday, May 31, 2013 5:28:58 PM
  Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative
 Hadoop FileSystems + Workshop
 
  On Fri, May 31, 2013 at 1:00 PM, Stephen Watt sw...@redhat.com wrote:
  What is the protocol for organizing the logistics and collaborating? I
 am loath to flood common-dev with "does this time work for you?" emails
 from the interested parties. Do we create a high-level JIRA ticket and
 collaborate and post comments and G+ meetup times on that? Another option
 might be the Wiki; I'd be happy to be responsible for tracking progress on
 https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break
 initiatives down into more granular JIRA tickets.
 
  I'd go with a wiki page and perhaps http://www.doodle.com/
 
  After we've had a few G+ hangouts, for those that would like to meet
 face to face, I have also made an all day reservation for a meeting room
 that can hold up to 20 people at our Red Hat Office in Castro Street,
 Mountain View on Tuesday June 25th (the day before Hadoop Summit and a
 short drive away). We don't have to use the whole day, but it gives us some
 flexibility around the availability of interested parties. I was thinking
 something along the lines of 10am - 3pm. We are happy to cater lunch.
 
  That also would be very much appreciated!
 
  Thanks,
  Roman.



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-14 Thread Andrew Wang
Hey Steve,

I agree that it's confusing. FileSystem and FileContext are essentially two
parallel sets of interfaces for accessing filesystems in Hadoop.
FileContext splits the interface and shared code with AbstractFileSystem,
while FileSystem is all-in-one. If you're looking for the AFS equivalents
to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.
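
To make the layering concrete, here is a minimal sketch in plain Java. It
does not use Hadoop: every Toy* name is hypothetical, and the idea that the
client-side object owns the working directory is a simplifying assumption
made for illustration. ToyAllInOneFs plays the role of FileSystem (client API
and provider base class in one), while ToyClientContext and ToyProviderFs
play the roles of FileContext and AbstractFileSystem.

```java
import java.util.HashSet;
import java.util.Set;

/** Shared path handling used by both styles (hypothetical helper). */
final class ToyPaths {
    static String resolve(String workingDir, String path) {
        if (path.startsWith("/")) return path;
        return workingDir.endsWith("/") ? workingDir + path : workingDir + "/" + path;
    }
}

/** All-in-one style: one class is both the client API and the provider base. */
abstract class ToyAllInOneFs {
    private String workingDir = "/";

    public void setWorkingDir(String dir) { workingDir = dir; }

    /** Client-facing call; shared logic lives in the same class providers extend. */
    public boolean exists(String path) {
        return existsAbsolute(ToyPaths.resolve(workingDir, path));
    }

    /** The part a storage provider overrides. */
    protected abstract boolean existsAbsolute(String absolutePath);
}

/** Split style, provider half: sees absolute paths only. */
abstract class ToyProviderFs {
    abstract boolean existsAbsolute(String absolutePath);
}

/** Split style, client half: holds client state and delegates to the provider. */
final class ToyClientContext {
    private final ToyProviderFs provider;
    private String workingDir = "/";

    ToyClientContext(ToyProviderFs provider) { this.provider = provider; }

    void setWorkingDir(String dir) { workingDir = dir; }

    boolean exists(String path) {
        return provider.existsAbsolute(ToyPaths.resolve(workingDir, path));
    }
}

class ToyLayeringDemo {
    public static void main(String[] args) {
        Set<String> stored = new HashSet<>();
        stored.add("/data/part-0");

        ToyAllInOneFs allInOne = new ToyAllInOneFs() {
            @Override protected boolean existsAbsolute(String p) { return stored.contains(p); }
        };
        ToyClientContext ctx = new ToyClientContext(new ToyProviderFs() {
            @Override boolean existsAbsolute(String p) { return stored.contains(p); }
        });

        allInOne.setWorkingDir("/data");
        ctx.setWorkingDir("/data");
        System.out.println(allInOne.exists("part-0")); // true
        System.out.println(ctx.exists("part-0"));      // true
    }
}
```

In this model a third-party plugin written for one style is not usable from
the other without an adapter, which is roughly why the two sets of classes
exist side by side.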

Realistically, FileSystem isn't going to be deprecated and removed any time
soon. There are lots of 3rd-party FileSystem implementations, and most apps
today use FileSystem (including many HDFS internals, like trash and the
shell).

When I read the wiki page, I figured that the mention of AFS was
essentially a typo, since everyone's been steaming ahead with FileSystem.
Standardizing FileSystem makes total sense to me, I just wanted to confirm
that plan.

Best,
Andrew


On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt sw...@redhat.com wrote:

 This is a good point, Andrew. The hangout was actually the first time I'd
 heard about the AbstractFileSystem class. I've been doing some further
 analysis of the Hadoop 2.0 source, and the Hadoop 2.0 implementations of
 DistributedFileSystem and LocalFileSystem extend the FileSystem class, not
 AbstractFileSystem. I would imagine that if the plan for Hadoop 2.0 were to
 build FileSystem implementations on AbstractFileSystem, then those two
 would use it, so I'm a bit confused.

 Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you
 clarify this for us?

 Regards
 Steve Watt


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-10 Thread Stephen Watt
For those interested - I posted a recap of this morning's Google Hangout on the 
Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-10 Thread Andrew Wang
Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather
than FileSystem. While these are both wrappers for DFSClient, they're
pretty different in terms of the APIs they expose. Furthermore, AFS is not
actually a client-facing API; clients interact with an AFS through
FileContext.

I ask because I did some work trying to unify the symlink tests for both
FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things
like the default mkdir semantics are different; you can see some of the
contortions in HADOOP-9370. I ultimately ended up just adhering to the
FileContext-style behavior, but as a result I'm not really testing some
parts of FileSystem.
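
One such difference, offered as an illustration rather than as the exact case
handled in HADOOP-9370: FileSystem.mkdirs(Path) creates missing parents in
the manner of "mkdir -p", whereas FileContext.mkdir(Path, FsPermission,
boolean createParent) leaves that choice to the caller. The sketch below
models both call shapes on a toy in-memory tree. ToyDirTree and its methods
are hypothetical and use no Hadoop classes.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory directory tree; hypothetical, not a Hadoop class. */
class ToyDirTree {
    private final Set<String> dirs = new HashSet<>();

    ToyDirTree() { dirs.add("/"); }

    boolean exists(String path) { return dirs.contains(path); }

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** FileSystem-style call shape: always creates missing parents. */
    boolean mkdirs(String path) {
        if (!exists(path)) {
            mkdirs(parentOf(path)); // recursion stops at "/", which always exists
            dirs.add(path);
        }
        return true;
    }

    /** FileContext-style call shape: the caller chooses via createParent. */
    void mkdir(String path, boolean createParent) throws FileNotFoundException {
        String parent = parentOf(path);
        if (!exists(parent)) {
            if (!createParent) {
                throw new FileNotFoundException("Parent does not exist: " + parent);
            }
            mkdir(parent, true);
        }
        dirs.add(path);
    }
}

class ToyMkdirDemo {
    public static void main(String[] args) throws FileNotFoundException {
        ToyDirTree tree = new ToyDirTree();
        tree.mkdirs("/a/b/c");
        System.out.println(tree.exists("/a/b")); // true: parents were created
        try {
            tree.mkdir("/x/y", false);
        } catch (FileNotFoundException e) {
            System.out.println("refused: " + e.getMessage());
        }
        tree.mkdir("/x/y", true);
        System.out.println(tree.exists("/x")); // true: the caller opted in
    }
}
```

A test written once for both APIs has to pick one of these behaviors as the
expectation, which is the kind of contortion described above.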

Are we going to end up with two different sets of validation tests? Or just
choose one API over the other? FileSystem is supposed to eventually be
deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual
uptake in practice has been slow.

Best,
Andrew





Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-07 Thread sanjay Radia
I plan to attend.
A 9:30 time is a little better for me.

sanjay




Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-06 Thread Andrew Purtell
The proposed time (9am PST Monday June 10th) is good for me.






-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-06-05 Thread Stephen Watt
Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and 
managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get 
together and discuss the initiative. Please respond back to me if you're 
interested or would like to propose a different time. I'll update our Wiki page 
with the logistics.

Regards
Steve Watt



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-31 Thread Stephen Watt
Hi Folks

I am grateful for the interest and to get so many responses (interested parties 
that responded are on CC).

I like Steve Loughran's idea of having a few G+ hangouts first to get to some 
consensus on how to organize the work as well as hear his thoughts about 
leveraging the Hadoop FileSystem tests he's already developed for the SWIFT 
object store. I am also keen to present/discuss the work we've (Red Hat) done 
around our perception of the state of the art for filesystem semantics and 
their test coverage to validate if the community at least has a shared point of 
view, which I think would be a good starting point.

What is the protocol for organizing the logistics and collaborating? I am 
loath to flood common-dev with "does this time work for you?" emails from the 
interested parties. Do we create a high-level JIRA ticket and collaborate and 
post comments and G+ meetup times on that? Another option might be the Wiki; 
I'd be happy to be responsible for tracking progress on 
https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break 
initiatives down into more granular JIRA tickets.

After we've had a few G+ hangouts, for those that would like to meet face to 
face, I have also made an all day reservation for a meeting room that can hold 
up to 20 people at our Red Hat Office in Castro Street, Mountain View on 
Tuesday June 25th (the day before Hadoop Summit and a short drive away). We 
don't have to use the whole day, but it gives us some flexibility around the 
availability of interested parties. I was thinking something along the lines of 
10am - 3pm. We are happy to cater lunch. 

Regards
Steve Watt

- Original Message -
From: Steve Loughran ste...@hortonworks.com
To: common-dev@hadoop.apache.org
Sent: Friday, May 24, 2013 3:47:04 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop 
FileSystems + Workshop

On 24 May 2013 00:52, Stephen Watt sw...@redhat.com wrote:

 Hi Folks

 Hadoop's pluggable filesystem architecture supports the ability to enable
 an alternate filesystem for use with Hadoop by writing a plugin for it. We
 now have several alternate filesystems that have Hadoop FileSystem plugins
 and because this isn't a very well understood topic, I've been working on a
 page on the project wiki to bring this all together -
 http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
 has been opening up Ambari to support any configured Hadoop FileSystem (as
 opposed to just HDFS) over at
 https://issues.apache.org/jira/browse/AMBARI-1817

 My team (over at Red Hat) have been working on writing a Hadoop FileSystem
 plugin for the glusterfs filesystem and have been finding that some of the
 expected semantics of the operations within the Abstract FileSystem class
 are a little ambiguous. With that said, we've joined Steve Loughran in
 attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

 It seems to me that once we had these semantics defined, it would be good
 for consistency of implementation if we could make sure they are well
 understood and properly implemented by the community of folks writing
 Hadoop FileSystem plugins. To that end, we might work to ensure that those
 semantics are tested within an exhaustive test framework that focuses on
 the abstract Hadoop FileSystem layer. Each FileSystem provider could run
 the tests to ensure their plugin implementation and behavior is consistent
 with the expectation. Perhaps a broader extension of
 https://issues.apache.org/jira/browse/HADOOP-9258.


I have a plan for starting those tests, pulling up the Swift ones when they
are checked in. Big tests that do scale, and that verify the assumptions
that MR, HBase, etc. make, are where we are weakest. The de facto definition
of FS semantics is the apps, and it is the apps that currently find the
problems (e.g. MAPREDUCE-5264).
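
A rough sketch of what a provider-run suite could look like, in plain Java
with no Hadoop dependency. ToyFs, ToyFsContract, and the three rules checked
are hypothetical placeholders, not agreed FileSystem semantics. The point is
only that the checks are written once against the abstract layer and each
plugin supplies its own instance.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal filesystem interface; not a Hadoop type. */
interface ToyFs {
    boolean mkdirs(String path);
    boolean exists(String path);
    boolean delete(String path);
}

/** Checks written once against the abstract layer; providers supply createFs(). */
abstract class ToyFsContract {
    protected abstract ToyFs createFs();

    /** Runs every rule and returns the names of the rules that failed. */
    List<String> run() {
        List<String> failures = new ArrayList<>();
        ToyFs fs = createFs();
        check(failures, "mkdirs makes the directory visible",
              fs.mkdirs("/t/a") && fs.exists("/t/a"));
        check(failures, "mkdirs creates missing parents", fs.exists("/t"));
        check(failures, "delete of a missing path returns false", !fs.delete("/missing"));
        return failures;
    }

    private static void check(List<String> failures, String rule, boolean ok) {
        if (!ok) failures.add(rule);
    }
}

/** A provider that satisfies the placeholder rules. */
class InMemoryToyFs implements ToyFs {
    protected final Set<String> dirs = new HashSet<>();

    public boolean mkdirs(String path) {
        // Register every ancestor ("/t" for "/t/a"), then the path itself.
        for (int i = path.indexOf('/', 1); i != -1; i = path.indexOf('/', i + 1)) {
            dirs.add(path.substring(0, i));
        }
        dirs.add(path);
        return true;
    }

    public boolean exists(String path) { return path.equals("/") || dirs.contains(path); }

    public boolean delete(String path) { return dirs.remove(path); }
}

/** A provider with one deviation: it forgets to create parents. */
class NoParentsToyFs extends InMemoryToyFs {
    @Override public boolean mkdirs(String path) { dirs.add(path); return true; }
}

class ToyContractDemo {
    public static void main(String[] args) {
        ToyFsContract good = new ToyFsContract() {
            @Override protected ToyFs createFs() { return new InMemoryToyFs(); }
        };
        ToyFsContract bad = new ToyFsContract() {
            @Override protected ToyFs createFs() { return new NoParentsToyFs(); }
        };
        System.out.println("conforming provider failures: " + good.run());
        System.out.println("deviating provider failures:  " + bad.run());
    }
}
```

Each provider only writes the small createFs() subclass. A deviation then
shows up as a named rule in the failure list rather than as an application
bug found later.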


 If folks are interested in these goals, I could host a
 workshop/discussion/hackday in Mountain View to get local people together
 (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
 the semantics discussion and test creation. As a side note, I think this
 could also turn out to be quite an effective means of introducing FileSystem
 vendors to the ASF and getting them contributing to these aspects of the
 project.


Can we start with some G+ hangouts to get to know each other and have some
broader participation (myself, the others working on Swift, people who have
done S3 (Tom, some of the Amazon folk), etc.)? Then, when a workshop is
held, it has some clearer objectives, such as how we test this. I would want
the FS semantics to be locked down in some online discussions/JIRA rather
than come back after a night's sleep to discover they had been defined with
tests.

-steve


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-31 Thread Roman Shaposhnik
On Fri, May 31, 2013 at 1:00 PM, Stephen Watt sw...@redhat.com wrote:
 What is the protocol for organizing the logistics and collaborating? I am 
 loath to flood common-dev with "does this time work for you?" emails from 
 the interested parties. Do we create a high-level JIRA ticket and collaborate 
 and post comments and G+ meetup times on that? Another option might be the 
 Wiki; I'd be happy to be responsible for tracking progress on 
 https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break 
 initiatives down into more granular JIRA tickets.

I'd go with a wiki page and perhaps http://www.doodle.com/

 After we've had a few G+ hangouts, for those that would like to meet face to 
 face, I have also made an all day reservation for a meeting room that can 
 hold up to 20 people at our Red Hat Office in Castro Street, Mountain View on 
 Tuesday June 25th (the day before Hadoop Summit and a short drive away). We 
 don't have to use the whole day, but it gives us some flexibility around the 
 availability of interested parties. I was thinking something along the lines 
 of 10am - 3pm. We are happy to cater lunch.

That also would be very much appreciated!

Thanks,
Roman.


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-24 Thread Milind Bhandarkar
Thanks for the initiative, Steve.

A few folks from Pivotal and our partners would be interested in joining
the workshop/discussion.

- milind


---
Milind Bhandarkar
Chief Scientist, Machine Learning Platforms,
Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)


On Thu, May 23, 2013 at 4:52 PM, Stephen Watt sw...@redhat.com wrote:

 Hi Folks

 Hadoop's pluggable filesystem architecture supports the ability to enable
 an alternate filesystem for use with Hadoop by writing a plugin for it. We
 now have several alternate filesystems that have Hadoop FileSystem plugins
 and because this isn't a very well understood topic, I've been working on a
 page on the project wiki to bring this all together -
 http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
 has been opening up Ambari to support any configured Hadoop FileSystem (as
 opposed to just HDFS) over at
 https://issues.apache.org/jira/browse/AMBARI-1817

 My team (over at Red Hat) have been working on writing a Hadoop FileSystem
 plugin for the glusterfs filesystem and have been finding that some of the
 expected semantics of the operations within the Abstract FileSystem class
 are a little ambiguous. With that said, we've joined Steve Loughran in
 attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

 It seems to me that once we had these semantics defined, it would be good
 for consistency of implementation if we could make sure they are well
 understood and properly implemented by the community of folks writing
 Hadoop FileSystem plugins. To that end, we might work to ensure that those
 semantics are tested within an exhaustive test framework that focuses on
 the abstract Hadoop FileSystem layer. Each FileSystem provider could run
 the tests to ensure their plugin implementation and behavior is consistent
 with the expectation. Perhaps a broader extension of
 https://issues.apache.org/jira/browse/HADOOP-9258.

 If folks are interested in these goals, I could host a
 workshop/discussion/hackday in Mountain View to get local people together
 (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
 the semantics discussion and test creation. As a side note, I think this
 could also turn out to be quite an effective means of introducing FileSystem
 vendors to the ASF and getting them contributing to these aspects of the
 project.

 Regards
 Steve Watt



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-24 Thread Brandon Li
Hi Kun,

In case you are looking for NFS support for HDFS, this JIRA might
interest you: HDFS-4750.

Thanks,
Brandon Li


On Thu, May 23, 2013 at 6:43 PM, Kun Ling lkun.e...@gmail.com wrote:

 Hi Stephen Watt,
 I am a new developer trying to add NFS-like FileSystem support to
 Hadoop, and I am also somewhat confused about the FileSystem semantics.

 Since I live in East Asia, I'd like to attend via Google Hangout if
 possible.

Thanks.

 +1 Kun Ling


 yours,
 Kun Ling


 On Fri, May 24, 2013 at 7:52 AM, Stephen Watt sw...@redhat.com wrote:

  Hi Folks

  Hadoop's pluggable filesystem architecture supports the ability to enable
  an alternate filesystem for use with Hadoop by writing a plugin for it. We
  now have several alternate filesystems that have Hadoop FileSystem plugins
  and because this isn't a very well understood topic, I've been working on a
  page on the project wiki to bring this all together -
  http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
  has been opening up Ambari to support any configured Hadoop FileSystem (as
  opposed to just HDFS) over at
  https://issues.apache.org/jira/browse/AMBARI-1817

  My team (over at Red Hat) have been working on writing a Hadoop FileSystem
  plugin for the glusterfs filesystem and have been finding that some of the
  expected semantics of the operations within the Abstract FileSystem class
  are a little ambiguous. With that said, we've joined Steve Loughran in
  attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
  FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

  It seems to me that once we had these semantics defined, it would be good
  for consistency of implementation if we could make sure they are well
  understood and properly implemented by the community of folks writing
  Hadoop FileSystem plugins. To that end, we might work to ensure that those
  semantics are tested within an exhaustive test framework that focuses on
  the abstract Hadoop FileSystem layer. Each FileSystem provider could run
  the tests to ensure their plugin implementation and behavior is consistent
  with the expectation. Perhaps a broader extension of
  https://issues.apache.org/jira/browse/HADOOP-9258.

  If folks are interested in these goals, I could host a
  workshop/discussion/hackday in Mountain View to get local people together
  (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
  the semantics discussion and test creation. As a side note, I think this
  could also turn out to be quite an effective means of introducing FileSystem
  vendors to the ASF and getting them contributing to these aspects of the
  project.

  Regards
  Steve Watt
 



 --
 http://www.lingcc.com



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-24 Thread Steve Loughran
On 24 May 2013 00:52, Stephen Watt sw...@redhat.com wrote:

 Hi Folks

 Hadoop's pluggable filesystem architecture supports the ability to enable
 an alternate filesystem for use with Hadoop by writing a plugin for it. We
 now have several alternate filesystems that have Hadoop FileSystem plugins
 and because this isn't a very well understood topic, I've been working on a
 page on the project wiki to bring this all together -
 http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
 has been opening up Ambari to support any configured Hadoop FileSystem (as
 opposed to just HDFS) over at
 https://issues.apache.org/jira/browse/AMBARI-1817

 My team (over at Red Hat) have been working on writing a Hadoop FileSystem
 plugin for the glusterfs filesystem and have been finding that some of the
 expected semantics of the operations within the Abstract FileSystem class
 are a little ambiguous. With that said, we've joined Steve Loughran in
 attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

 It seems to me that once we had these semantics defined, it would be good
 for consistency of implementation if we could make sure they are well
 understood and properly implemented by the community of folks writing
 Hadoop FileSystem plugins. To that end, we might work to ensure that those
 semantics are tested within an exhaustive test framework that focuses on
 the abstract Hadoop FileSystem layer. Each FileSystem provider could run
 the tests to ensure their plugin implementation and behavior is consistent
 with the expectation. Perhaps a broader extension of
 https://issues.apache.org/jira/browse/HADOOP-9258.


I have a plan for starting those tests, pulling up the Swift ones when they
are checked in. Big tests that run at scale, and that verify the assumptions
made by MR, HBase, etc., are where we are weakest. The de facto definition of
FS semantics is the apps, and it's them that currently find the problems
(e.g. MAPREDUCE-5264).
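
To make the idea of a shared test suite that each FileSystem provider can
run a bit more concrete, here is a minimal sketch in Java. It is an
illustration only: SimpleFs, InMemoryFs, FsContract and ContractDemo are
hypothetical names invented for this example, not Hadoop classes, and the
expectations encoded in the checks (for instance, that a non-recursive
delete of a non-empty directory returns false) are assumptions chosen for
the demo, not the semantics being worked out in HADOOP-9371.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Runs the shared checks against one implementation and reports the result. */
class ContractDemo {
    public static void main(String[] args) {
        List<String> failed = FsContract.violations(new InMemoryFs());
        System.out.println(failed.isEmpty() ? "all checks passed" : "violations: " + failed);
    }
}

/** Hypothetical, minimal stand-in for a filesystem API; NOT Hadoop's FileSystem. */
interface SimpleFs {
    boolean mkdirs(String path);                     // create path and missing parents
    boolean exists(String path);
    boolean delete(String path, boolean recursive);
}

/** Toy in-memory implementation, present only so the checks have something to run on. */
class InMemoryFs implements SimpleFs {
    private final Set<String> dirs = new HashSet<>();

    InMemoryFs() {
        dirs.add("/");
    }

    @Override
    public boolean mkdirs(String path) {
        // Register every ancestor as well as the path itself.
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue;
            }
            current.append('/').append(part);
            dirs.add(current.toString());
        }
        return true;
    }

    @Override
    public boolean exists(String path) {
        return dirs.contains(path);
    }

    @Override
    public boolean delete(String path, boolean recursive) {
        if (!dirs.contains(path)) {
            return false;
        }
        String prefix = path + "/";
        boolean hasChildren = dirs.stream().anyMatch(d -> d.startsWith(prefix));
        if (hasChildren && !recursive) {
            return false; // assumed semantic for this sketch: refuse and change nothing
        }
        dirs.removeIf(d -> d.equals(path) || d.startsWith(prefix));
        return true;
    }
}

/** The shared checks: written once, run by every provider against its own plugin. */
final class FsContract {
    private FsContract() {
    }

    /** Returns a description of each expectation the implementation failed to meet. */
    static List<String> violations(SimpleFs fs) {
        List<String> failed = new ArrayList<>();
        expect(failed, fs.mkdirs("/contract/a/b"), "mkdirs should report success");
        expect(failed, fs.exists("/contract/a"), "mkdirs should create parents");
        expect(failed, !fs.delete("/contract", false),
                "non-recursive delete of a non-empty directory should fail");
        expect(failed, fs.exists("/contract/a/b"), "a failed delete should leave children intact");
        expect(failed, fs.delete("/contract", true), "recursive delete should succeed");
        expect(failed, !fs.exists("/contract/a/b"), "recursive delete should remove descendants");
        expect(failed, !fs.delete("/contract", true), "deleting a missing path should return false");
        return failed;
    }

    private static void expect(List<String> failed, boolean ok, String message) {
        if (!ok) {
            failed.add(message);
        }
    }
}
```

The point of the shape is that the checks depend only on the interface, so
a provider would swap InMemoryFs for a thin adapter over its own plugin and
get a list of the expectations it does not yet meet.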


 If folks are interested in these goals, I could host a
 workshop/discussion/hackday in Mountain View to get local people together
 (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
 the semantics discussion and test creation. As a side note, I think this
 could also turn out to be quite an effective means of introducing FileSystem
 vendors to the ASF and getting them contributing to these aspects of the
 project.


Can we start with some G+ hangouts to get to know each other and have some
broader participation (myself, the others working on Swift, people who have
done S3 (Tom, some of the Amazon folk), etc.)? Then, when a workshop is
held, it has a clearer objective: how do we test this? I would want
the FS semantics to be locked down in online discussions/JIRA rather
than come back after a night's sleep to discover they had been defined with
tests.

-steve


Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-24 Thread Konstantin Shvachko
Makes sense, Steve.
There are a couple of guys here at WANdisco who will be interested in
joining.

Thanks,
--Konstantin

On Fri, May 24, 2013 at 10:15 AM, Milind Bhandarkar 
mbhandar...@gopivotal.com wrote:

 Thanks for the initiative, Steve.

 A few folks from Pivotal and our partners would be interested in joining
 the workshop/discussion.

 - milind


 ---
 Milind Bhandarkar
 Chief Scientist, Machine Learning Platforms,
 Pivotal
 +1-650-523-3858 (W)
 +1-408-666-8483 (C)


 On Thu, May 23, 2013 at 4:52 PM, Stephen Watt sw...@redhat.com wrote:

  Hi Folks

  Hadoop's pluggable filesystem architecture supports the ability to enable
  an alternate filesystem for use with Hadoop by writing a plugin for it. We
  now have several alternate filesystems that have Hadoop FileSystem plugins
  and because this isn't a very well understood topic, I've been working on a
  page on the project wiki to bring this all together -
  http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
  has been opening up Ambari to support any configured Hadoop FileSystem (as
  opposed to just HDFS) over at
  https://issues.apache.org/jira/browse/AMBARI-1817

  My team (over at Red Hat) have been working on writing a Hadoop FileSystem
  plugin for the glusterfs filesystem and have been finding that some of the
  expected semantics of the operations within the Abstract FileSystem class
  are a little ambiguous. With that said, we've joined Steve Loughran in
  attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
  FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

  It seems to me that once we had these semantics defined, it would be good
  for consistency of implementation if we could make sure they are well
  understood and properly implemented by the community of folks writing
  Hadoop FileSystem plugins. To that end, we might work to ensure that those
  semantics are tested within an exhaustive test framework that focuses on
  the abstract Hadoop FileSystem layer. Each FileSystem provider could run
  the tests to ensure their plugin implementation and behavior is consistent
  with the expectation. Perhaps a broader extension of
  https://issues.apache.org/jira/browse/HADOOP-9258.

  If folks are interested in these goals, I could host a
  workshop/discussion/hackday in Mountain View to get local people together
  (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
  the semantics discussion and test creation. As a side note, I think this
  could also turn out to be quite an effective means of introducing FileSystem
  vendors to the ASF and getting them contributing to these aspects of the
  project.

  Regards
  Steve Watt
 



Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

2013-05-23 Thread Kun Ling
Hi Stephen Watt,
I am a new developer trying to add support for an NFS-like FileSystem to
Hadoop, and I am also somewhat confused about the FileSystem semantics.

Since I live in East Asia, I'd like to attend via Google Hangout if
possible.

Thanks.

+1 Kun Ling


yours,
Kun Ling


On Fri, May 24, 2013 at 7:52 AM, Stephen Watt sw...@redhat.com wrote:

 Hi Folks

 Hadoop's pluggable filesystem architecture supports the ability to enable
 an alternate filesystem for use with Hadoop by writing a plugin for it. We
 now have several alternate filesystems that have Hadoop FileSystem plugins
 and because this isn't a very well understood topic, I've been working on a
 page on the project wiki to bring this all together -
 http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project
 has been opening up Ambari to support any configured Hadoop FileSystem (as
 opposed to just HDFS) over at
 https://issues.apache.org/jira/browse/AMBARI-1817

 My team (over at Red Hat) have been working on writing a Hadoop FileSystem
 plugin for the glusterfs filesystem and have been finding that some of the
 expected semantics of the operations within the Abstract FileSystem class
 are a little ambiguous. With that said, we've joined Steve Loughran in
 attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0
 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

 It seems to me that once we had these semantics defined, it would be good
 for consistency of implementation if we could make sure they are well
 understood and properly implemented by the community of folks writing
 Hadoop FileSystem plugins. To that end, we might work to ensure that those
 semantics are tested within an exhaustive test framework that focuses on
 the abstract Hadoop FileSystem layer. Each FileSystem provider could run
 the tests to ensure their plugin implementation and behavior is consistent
 with the expectation. Perhaps a broader extension of
 https://issues.apache.org/jira/browse/HADOOP-9258.

 If folks are interested in these goals, I could host a
 workshop/discussion/hackday in Mountain View to get local people together
 (perhaps a Google Hangout for the remote folks) to keep the ball rolling on
 the semantics discussion and test creation. As a side note, I think this
 could also turn out to be quite an effective means of introducing FileSystem
 vendors to the ASF and getting them contributing to these aspects of the
 project.

 Regards
 Steve Watt




-- 
http://www.lingcc.com