Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
This is a good point, Andrew. The hangout was actually the first time I'd heard about the AbstractFileSystem class. I've been doing some further analysis of the Hadoop 2.0 source, and the Hadoop 2.0 implementations of DistributedFileSystem and LocalFileSystem extend the FileSystem class, not AbstractFileSystem. I would imagine that if the plan for Hadoop 2.0 were to build FileSystem implementations on AbstractFileSystem, those two would use it, so I'm a bit confused. Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you clarify this for us?

Regards
Steve Watt

- Original Message -
From: Andrew Wang andrew.w...@cloudera.com
To: common-dev@hadoop.apache.org
Cc: mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com, ste...@hortonworks.com, erlv5...@gmail.com, shaposh...@gmail.com, apurt...@apache.org, cdoug...@apache.org, jayh...@cs.ucsc.edu, san...@hortonworks.com
Sent: Monday, June 10, 2013 5:14:16 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather than FileSystem. While these are both wrappers for DFSClient, they're pretty different in terms of the APIs they expose. Furthermore, AFS is not actually a client-facing API; clients interact with an AFS through FileContext.

I ask because I did some work trying to unify the symlink tests for both FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things like the default mkdir semantics are different; you can see some of the contortions in HADOOP-9370. I ultimately ended up just adhering to the FileContext-style behavior, but as a result I'm not really testing some parts of FileSystem.

Are we going to end up with two different sets of validation tests? Or just choose one API over the other? FileSystem is supposed to eventually be deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual uptake in practice has been slow.

Best,
Andrew
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Hey Steve,

I agree that it's confusing. FileSystem and FileContext are essentially two parallel sets of interfaces for accessing filesystems in Hadoop. FileContext splits the interface and shared code with AbstractFileSystem, while FileSystem is all-in-one. If you're looking for the AFS equivalents to DistributedFileSystem and LocalFileSystem, see Hdfs and LocalFs.

Realistically, FileSystem isn't going to be deprecated and removed any time soon. There are lots of 3rd-party FileSystem implementations, and most apps today use FileSystem (including many HDFS internals, like trash and the shell).

When I read the wiki page, I figured that the mention of AFS was essentially a typo, since everyone's been steaming ahead with FileSystem. Standardizing FileSystem makes total sense to me; I just wanted to confirm that plan.

Best,
Andrew

On Fri, Jun 14, 2013 at 9:38 AM, Stephen Watt sw...@redhat.com wrote:

This is a good point, Andrew. The hangout was actually the first time I'd heard about the AbstractFileSystem class. I've been doing some further analysis of the Hadoop 2.0 source, and the Hadoop 2.0 implementations of DistributedFileSystem and LocalFileSystem extend the FileSystem class, not AbstractFileSystem. I would imagine that if the plan for Hadoop 2.0 were to build FileSystem implementations on AbstractFileSystem, those two would use it, so I'm a bit confused. Perhaps I'm looking in the wrong place? Sanjay (or anyone else), could you clarify this for us?

Regards
Steve Watt
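To make the distinction Andrew draws above concrete, here is a minimal, self-contained Java sketch of the two shapes: an all-in-one base class (the FileSystem style) versus a client-facing facade over a separate implementor-facing class (the FileContext plus AbstractFileSystem style). Every class name below (AllInOneFs, StoreSpi, ClientFacade, and the in-memory subclasses) is hypothetical and invented for this illustration; none of it is Hadoop code, and the real classes expose far larger APIs.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model only: these are NOT the Hadoop classes.

// Shape 1 (FileSystem style): one class is both the client API and the
// extension point that a plugin subclasses.
abstract class AllInOneFs {
    // Implementor hook: a plugin overrides this.
    protected abstract boolean rawExists(String path);

    // Client-facing method; shared path handling lives in the same class.
    public final boolean exists(String path) {
        return rawExists(normalize(path));
    }

    // Collapse repeated slashes and drop a trailing slash (except for "/").
    static String normalize(String path) {
        String p = path.replaceAll("/+", "/");
        return (p.length() > 1 && p.endsWith("/")) ? p.substring(0, p.length() - 1) : p;
    }
}

// Shape 2 (FileContext + AbstractFileSystem style): plugins extend an
// implementor-facing class...
abstract class StoreSpi {
    abstract boolean rawExists(String path);
}

// ...and applications only ever talk to a separate facade.
final class ClientFacade {
    private final StoreSpi store;

    ClientFacade(StoreSpi store) {
        this.store = store;
    }

    boolean exists(String path) {
        return store.rawExists(AllInOneFs.normalize(path));
    }
}

// Toy in-memory plugins, one per shape.
class InMemoryAllInOne extends AllInOneFs {
    private final Set<String> paths;

    InMemoryAllInOne(Set<String> paths) {
        this.paths = paths;
    }

    @Override
    protected boolean rawExists(String path) {
        return paths.contains(path);
    }
}

class InMemoryStore extends StoreSpi {
    private final Set<String> paths;

    InMemoryStore(Set<String> paths) {
        this.paths = paths;
    }

    @Override
    boolean rawExists(String path) {
        return paths.contains(path);
    }
}

class TwoShapesDemo {
    public static void main(String[] args) {
        Set<String> paths = Collections.singleton("/data/a");
        // Both shapes answer the same question; they differ in who subclasses what.
        System.out.println(new InMemoryAllInOne(paths).exists("/data//a/"));
        System.out.println(new ClientFacade(new InMemoryStore(paths)).exists("/data//a/"));
    }
}
```

In this model a plugin author for the first shape inherits the client API whether they want it or not, while in the second shape the facade can evolve without touching plugins. That is only one reading of the trade-off, offered under the assumptions stated above.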
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
For those interested - I posted a recap of this morning's Google Hangout on the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress

On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:

Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.

Regards
Steve Watt
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Thanks for the summary Steve, very useful.

I'm wondering a bit about the point on testing AbstractFileSystem rather than FileSystem. While these are both wrappers for DFSClient, they're pretty different in terms of the APIs they expose. Furthermore, AFS is not actually a client-facing API; clients interact with an AFS through FileContext.

I ask because I did some work trying to unify the symlink tests for both FileContext and FileSystem (HADOOP-9370 and HADOOP-9355). Subtle things like the default mkdir semantics are different; you can see some of the contortions in HADOOP-9370. I ultimately ended up just adhering to the FileContext-style behavior, but as a result I'm not really testing some parts of FileSystem.

Are we going to end up with two different sets of validation tests? Or just choose one API over the other? FileSystem is supposed to eventually be deprecated in favor of FileContext (HADOOP-6446, filed in 2009), but actual uptake in practice has been slow.

Best,
Andrew

On Mon, Jun 10, 2013 at 1:49 PM, Stephen Watt sw...@redhat.com wrote:

For those interested - I posted a recap of this morning's Google Hangout on the Wiki Page at https://wiki.apache.org/hadoop/HCFS/Progress
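The mkdir difference Andrew mentions can be illustrated with a small self-contained Java model. In Hadoop, FileSystem.mkdirs(Path) creates any missing parent directories, whereas FileContext.mkdir(Path, FsPermission, boolean createParent) leaves that choice to the caller. The DirTree class and its methods below are hypothetical stand-ins written for this sketch, not Hadoop code; they mimic only that one difference and show how a shared test ends up pinned to one style.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory directory tree; NOT a Hadoop class.
class DirTree {
    private final Set<String> dirs = new HashSet<>();

    DirTree() {
        dirs.add("/"); // the root always exists
    }

    boolean isDir(String path) {
        return dirs.contains(path);
    }

    // Parent of "/a/b" is "/a"; parent of "/a" (and of "/") is "/".
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // FileContext-style call: the caller decides whether missing parents
    // are created; with createParent == false a missing parent is an error.
    void mkdir(String path, boolean createParent) {
        String parent = parentOf(path);
        if (!isDir(parent)) {
            if (!createParent) {
                throw new IllegalStateException("parent does not exist: " + parent);
            }
            mkdir(parent, true);
        }
        dirs.add(path);
    }

    // FileSystem-style call: missing parents are always created.
    void mkdirs(String path) {
        mkdir(path, true);
    }
}

class MkdirDemo {
    // A shared check written against the FileContext-style call. It passes
    // createParent explicitly, so the implicit default of mkdirs() is never
    // exercised by this check.
    static boolean createsNested(DirTree tree) {
        tree.mkdir("/a/b/c", true);
        return tree.isDir("/a") && tree.isDir("/a/b") && tree.isDir("/a/b/c");
    }

    public static void main(String[] args) {
        System.out.println("shared check passes: " + createsNested(new DirTree()));

        DirTree strict = new DirTree();
        try {
            strict.mkdir("/x/y", false);
        } catch (IllegalStateException e) {
            System.out.println("strict mkdir refused: " + e.getMessage());
        }
    }
}
```

A suite built only on the explicit-flag call covers both behaviors of that call but says nothing about what a bare mkdirs() does by default, which is the gap Andrew describes when he says he is not really testing some parts of FileSystem.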
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
I plan to attend. A 9:30 time is a little better for me.

sanjay

On Jun 5, 2013, at 8:14 PM, Stephen Watt wrote:

Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.

Regards
Steve Watt
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
The proposed time (9am PST Monday June 10th) is good for me.

On Thu, Jun 6, 2013 at 5:14 AM, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.

Regards
Steve Watt

--
Best regards,
- Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Hi Folks

Per Roman's recommendation I've created a Wiki Page for organizing the work and managing the logistics - https://wiki.apache.org/hadoop/HCFS/Progress

I'd like to propose a Google Hangout at 9am PST on Monday June 10th to get together and discuss the initiative. Please respond back to me if you're interested or would like to propose a different time. I'll update our Wiki page with the logistics.

Regards
Steve Watt

- Original Message -
From: Roman Shaposhnik shaposh...@gmail.com
To: Stephen Watt sw...@redhat.com
Cc: common-dev@hadoop.apache.org, mbhandar...@gopivotal.com, shv hadoop shv.had...@gmail.com, ste...@hortonworks.com, erlv5...@gmail.com, apurt...@apache.org
Sent: Friday, May 31, 2013 5:28:58 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

On Fri, May 31, 2013 at 1:00 PM, Stephen Watt sw...@redhat.com wrote:

What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

I'd go with a wiki page and perhaps http://www.doodle.com/

After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all day reservation for a meeting room that can hold up to 20 people at our Red Hat Office on Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

That also would be very much appreciated!

Thanks,
Roman.
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Hi Folks

I am grateful for the interest and for so many responses (interested parties that responded are on CC). I like Steve Loughran's idea of having a few G+ hangouts first to get to some consensus on how to organize the work, as well as to hear his thoughts about leveraging the Hadoop FileSystem tests he's already developed for the SWIFT object store. I am also keen to present/discuss the work we've (Red Hat) done around our perception of the state of the art for filesystem semantics and their test coverage, to validate whether the community at least has a shared point of view, which I think would be a good starting point.

What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all day reservation for a meeting room that can hold up to 20 people at our Red Hat Office on Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

Regards
Steve Watt

- Original Message -
From: Steve Loughran ste...@hortonworks.com
To: common-dev@hadoop.apache.org
Sent: Friday, May 24, 2013 3:47:04 PM
Subject: Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop

On 24 May 2013 00:52, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817

My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

It seems to me that once we have these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.

I have a plan for starting those tests, pulling up the Swift ones when they are checked in. Big tests that do scale, and that verify the assumptions that MR, HBase &c make, are where we are weakest. The de facto definition of FS semantics is the apps, and it's them that currently find the problems (e.g. MAPREDUCE-5264).

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Can we start with some G+ hangouts to get to know each other and have some broader participation (myself, the others working on Swift, people who have done S3 (Tom, some of the amazon folk), etc...)? Then, when a workshop is held, it's got some clearer objectives: how do we test this. I would want the FS semantics to be locked down in some online discussions/JIRA rather than come back after a night's sleep to discover it had been defined with tests.

-steve
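As a sketch of the idea that each provider runs one shared suite, the Java below defines the rules once in an abstract class and lets a provider plug in through a factory method. The MiniFs interface, the MiniFsContract class, the InMemoryFs provider, and the specific rules checked are all hypothetical and chosen for illustration; they are not the Hadoop FileSystem API, and the real semantics are what HADOOP-9371 sets out to pin down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical minimal filesystem interface, for illustration only.
interface MiniFs {
    void mkdirs(String path);

    boolean exists(String path);

    // Returns true if something was deleted, false if the path is missing.
    // Throws IllegalStateException for a non-empty directory when recursive is false.
    boolean delete(String path, boolean recursive);
}

// The shared contract: written once, run by every provider.
abstract class MiniFsContract {
    // Each provider supplies a fresh, empty filesystem here.
    protected abstract MiniFs createFs();

    // Returns the violated rules; an empty list means the provider conforms.
    List<String> run() {
        List<String> failures = new ArrayList<>();

        MiniFs fs = createFs();
        fs.mkdirs("/a/b");
        if (!fs.exists("/a") || !fs.exists("/a/b")) {
            failures.add("mkdirs must create the path and its parents");
        }

        fs = createFs();
        if (fs.delete("/missing", false)) {
            failures.add("delete of a missing path must return false");
        }

        fs = createFs();
        fs.mkdirs("/a/b");
        try {
            fs.delete("/a", false);
            failures.add("non-recursive delete of a non-empty directory must fail");
        } catch (IllegalStateException expected) {
            // conforming behaviour
        }

        fs = createFs();
        fs.mkdirs("/a/b");
        if (!fs.delete("/a", true) || fs.exists("/a") || fs.exists("/a/b")) {
            failures.add("recursive delete must remove the directory and its children");
        }
        return failures;
    }
}

// One provider: a toy in-memory store.
class InMemoryFs implements MiniFs {
    private final TreeSet<String> paths = new TreeSet<>();

    public void mkdirs(String path) {
        // Register the path and each ancestor below the root.
        String p = path;
        while (p.length() > 1) {
            paths.add(p);
            int slash = p.lastIndexOf('/');
            p = slash <= 0 ? "/" : p.substring(0, slash);
        }
    }

    public boolean exists(String path) {
        return path.equals("/") || paths.contains(path);
    }

    public boolean delete(String path, boolean recursive) {
        if (!paths.contains(path)) {
            return false;
        }
        String prefix = path + "/";
        boolean hasChildren = paths.stream().anyMatch(p -> p.startsWith(prefix));
        if (hasChildren && !recursive) {
            throw new IllegalStateException("directory not empty: " + path);
        }
        paths.removeIf(p -> p.equals(path) || p.startsWith(prefix));
        return true;
    }
}

class ContractDemo {
    public static void main(String[] args) {
        MiniFsContract contract = new MiniFsContract() {
            @Override
            protected MiniFs createFs() {
                return new InMemoryFs();
            }
        };
        System.out.println("violations: " + contract.run());
    }
}
```

Each rule builds its own filesystem so that one violation does not cascade into the next check. A provider for another store would only need to implement MiniFs and override createFs(); whether the rules themselves are right is exactly the question the thread wants settled in discussion before tests encode them.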
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
On Fri, May 31, 2013 at 1:00 PM, Stephen Watt sw...@redhat.com wrote:

What is the protocol for organizing the logistics and collaborating? I am loath to flood common-dev with "does this time work for you?" emails from the interested parties. Do we create a high level JIRA ticket and collaborate and post comments and G+ meetup times on that? Another option might be the Wiki; I'd be happy to be responsible for tracking progress on https://wiki.apache.org/hadoop/HCFS/Progress until we are able to break initiatives down into more granular JIRA tickets.

I'd go with a wiki page and perhaps http://www.doodle.com/

After we've had a few G+ hangouts, for those that would like to meet face to face, I have also made an all day reservation for a meeting room that can hold up to 20 people at our Red Hat Office on Castro Street, Mountain View on Tuesday June 25th (the day before Hadoop Summit and a short drive away). We don't have to use the whole day, but it gives us some flexibility around the availability of interested parties. I was thinking something along the lines of 10am - 3pm. We are happy to cater lunch.

That also would be very much appreciated!

Thanks,
Roman.
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Thanks for the initiative, Steve. A few folks from Pivotal and our partners would be interested in joining the workshop/discussion.

- milind

---
Milind Bhandarkar
Chief Scientist, Machine Learning Platforms, Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)

On Thu, May 23, 2013 at 4:52 PM, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817

My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

It seems to me that once we have these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Regards
Steve Watt
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Hi Kun,

In case you are looking for NFS support for HDFS, this JIRA might interest you: HDFS-4750.

Thanks,
Brandon Li

On Thu, May 23, 2013 at 6:43 PM, Kun Ling lkun.e...@gmail.com wrote:

Hi Stephen Watt,

I am a fresh developer trying to add NFS-like FileSystem support for Hadoop, and I also have some confusion about the FileSystem semantics. Since I live in East Asia, I'd like to attend via Google Hangout if possible. Thanks.

+1 Kun Ling

yours,
Kun Ling

--
http://www.lingcc.com
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
On 24 May 2013 00:52, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817

My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

It seems to me that once we had these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.

I have a plan for starting those tests, pulling up the Swift ones when they are checked in. Big tests that do scale, and that verify the assumptions of MR, HBase, etc., are where we are weakest. The de facto definition of FS semantics is the apps, and it's them that currently find the problems (e.g. MAPREDUCE-5264).

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Can we start with some G+ hangouts to get to know each other and have some broader participation (myself, the others working on Swift, people who have done S3 (Tom, some of the Amazon folk), etc.)? Then when a workshop is held, it's got some clearer objectives: how do we test this? I would want the FS semantics to be locked down in some online discussions/JIRA rather than come back after a night's sleep to discover it had been defined with tests.

-steve
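For readers following the thread, the idea of one shared test suite that every FileSystem provider runs against its own plugin can be sketched roughly as below. This is only an illustration under stated assumptions: MiniFs, InMemoryFs and FsContractSketch are hypothetical names, not Hadoop classes, and the mkdirs/delete behaviour being checked is an assumed example rather than the semantics under discussion in HADOOP-9371.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal stand-in for a filesystem plugin API (NOT Hadoop's FileSystem).
interface MiniFs {
    boolean mkdirs(String path);   // create the directory and any missing parents
    boolean exists(String path);
    boolean delete(String path);   // recursive; returns false if the path did not exist
}

// Toy in-memory implementation, used only to exercise the contract below.
class InMemoryFs implements MiniFs {
    private final Set<String> dirs = new HashSet<>();

    InMemoryFs() { dirs.add("/"); }

    public boolean mkdirs(String path) {
        // Register every ancestor so that parents exist after the call.
        StringBuilder current = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            current.append('/').append(part);
            dirs.add(current.toString());
        }
        return true;
    }

    public boolean exists(String path) { return dirs.contains(path); }

    public boolean delete(String path) {
        if ("/".equals(path)) return false; // this sketch never removes the root
        boolean existed = dirs.remove(path);
        dirs.removeIf(d -> d.startsWith(path + "/")); // drop children as well
        return existed;
    }
}

// The shared contract: one set of checks, run unchanged against any MiniFs.
class FsContractSketch {
    static List<String> check(MiniFs fs) {
        List<String> failures = new ArrayList<>();
        if (!fs.mkdirs("/a/b")) failures.add("mkdirs should report success");
        if (!fs.exists("/a")) failures.add("mkdirs should create parents");
        if (!fs.exists("/a/b")) failures.add("mkdirs should create the leaf");
        if (!fs.delete("/a")) failures.add("delete of an existing path should return true");
        if (fs.exists("/a/b")) failures.add("delete should remove children");
        if (fs.delete("/a")) failures.add("delete of a missing path should return false");
        return failures;
    }

    public static void main(String[] args) {
        List<String> failures = check(new InMemoryFs());
        System.out.println(failures.isEmpty() ? "contract satisfied" : "violations: " + failures);
    }
}
```

A provider would pass its own implementation to check(...) in place of InMemoryFs; any behaviour that differs from the agreed expectations then shows up as a named violation instead of being found later by an application.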
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Makes sense, Steve. There are a couple of guys here at WANdisco who will be interested in joining.

Thanks,
--Konstantin

On Fri, May 24, 2013 at 10:15 AM, Milind Bhandarkar mbhandar...@gopivotal.com wrote:

Thanks for the initiative, Steve. A few folks from Pivotal and our partners would be interested in joining the workshop/discussion.

- milind

---
Milind Bhandarkar
Chief Scientist, Machine Learning Platforms, Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)

On Thu, May 23, 2013 at 4:52 PM, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817

My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

It seems to me that once we had these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Regards
Steve Watt
Re: [DISCUSS] Ensuring Consistent Behavior for Alternative Hadoop FileSystems + Workshop
Hi Stephen Watt,

I am a new developer trying to add NFS-like FileSystem support for Hadoop, and also have some confusion about the FileSystem semantics. Since I live in East Asia, I'd like to attend via Google Hangout if possible.

Thanks.
+1 Kun Ling

yours,
Kun Ling

On Fri, May 24, 2013 at 7:52 AM, Stephen Watt sw...@redhat.com wrote:

Hi Folks

Hadoop's pluggable filesystem architecture supports the ability to enable an alternate filesystem for use with Hadoop by writing a plugin for it. We now have several alternate filesystems that have Hadoop FileSystem plugins and because this isn't a very well understood topic, I've been working on a page on the project wiki to bring this all together - http://wiki.apache.org/hadoop/HCFS. At the same time, the Ambari project has been opening up Ambari to support any configured Hadoop FileSystem (as opposed to just HDFS) over at https://issues.apache.org/jira/browse/AMBARI-1817

My team (over at Red Hat) have been working on writing a Hadoop FileSystem plugin for the glusterfs filesystem and have been finding that some of the expected semantics of the operations within the Abstract FileSystem class are a little ambiguous. With that said, we've joined Steve Loughran in attempting to clarify these for both the Hadoop 1.0 and the Hadoop 2.0 FileSystem class over at https://issues.apache.org/jira/browse/HADOOP-9371

It seems to me that once we had these semantics defined, it would be good for consistency of implementation if we could make sure they are well understood and properly implemented by the community of folks writing Hadoop FileSystem plugins. To that end, we might work to ensure that those semantics are tested within an exhaustive test framework that focuses on the abstract Hadoop FileSystem layer. Each FileSystem provider could run the tests to ensure their plugin implementation and behavior is consistent with the expectation. Perhaps a broader extension of https://issues.apache.org/jira/browse/HADOOP-9258.

If folks are interested in these goals, I could host a workshop/discussion/hackday in Mountain View to get local people together (perhaps a Google Hangout for the remote folks) to keep the ball rolling on the semantics discussion and test creation. As a side note, I think this could also turn out to be quite an effective means of introducing FileSystem vendors to the ASF and getting them contributing to these aspects of the project.

Regards
Steve Watt

-- http://www.lingcc.com