Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:

> Dhruba Borthakur wrote:
>> It is really nice to have wire-compatibility between clients and servers running different versions of hadoop. The reason we would like this is because we can allow the same client (Hive, etc) to submit jobs to two different clusters running different versions of hadoop. But I am not stuck up on the name of the release that supports wire-compatibility, it can be either 1.0 or something later than that.
>
> API compatibility +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0
>
> That's stability of the job submission network protocol you are looking for there.
> * We need a job submission API that is designed to work over long-haul links and versions
> * It does not have to be the same as anything used in-cluster
> * It does not actually need to run in the JobTracker. An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user-rights are a troublespot.

I think you are misinterpreting what Job Q compatibility means. It is about jobs already in the queue surviving an upgrade across a release. See my initial proposal on Jan 16th:
https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691
Doug argued that it is nice to have but not required for 1.0 - can be added later.

sanjay

> Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or webdav. Again, no need to build it into the namenode. See http://www.slideshare.net/steve_l/long-haul-hadoop and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
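Steve's bridge idea above can be sketched as a small, versioned submission interface sitting outside the JobTracker. Everything in this sketch -- the interface name, its methods, and the in-memory stand-in -- is illustrative only, not an actual Hadoop API:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an independent bridge service: a stable,
// versioned long-haul job submission API that would translate onto
// whatever (unstable) in-cluster protocol the target release uses.
interface LongHaulJobService {
    int getProtocolVersion();                        // negotiated up front so old clients detect mismatch
    String submitJob(byte[] jobPayload) throws IOException;   // opaque, versioned job description
    String getJobStatus(String jobId) throws IOException;     // long-haul clients poll rather than hold connections
}

// Trivial in-memory stand-in; a real bridge would forward these calls
// to the cluster and also handle authentication and user rights.
class InMemoryBridge implements LongHaulJobService {
    private final Map<String, String> status = new HashMap<>();
    private int nextId = 0;

    public int getProtocolVersion() { return 1; }

    public String submitJob(byte[] jobPayload) {
        String id = "job_" + (++nextId);
        status.put(id, "QUEUED");   // would hand off to the real cluster here
        return id;
    }

    public String getJobStatus(String jobId) {
        return status.getOrDefault(jobId, "UNKNOWN");
    }
}

public class Bridge {
    public static void main(String[] args) {
        InMemoryBridge svc = new InMemoryBridge();
        String id = svc.submitJob(new byte[0]);
        System.out.println(id + " -> " + svc.getJobStatus(id)); // prints "job_1 -> QUEUED"
    }
}
```

The point of the design is that only LongHaulJobService needs to stay stable across releases; the bridge can be rebuilt against each cluster version.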
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
I think we should not require Job Q compatibility for 1.0 release.

thanks,
dhruba

On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia sra...@yahoo-inc.com wrote:
> I think you are misinterpreting what Job Q compatibility means. It is about jobs already in the queue surviving an upgrade across a release. [...] Doug argued that it is nice to have but not required for 1.0 - can be added later.
>
> sanjay

--
Connect to me at http://www.facebook.com/dhruba
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On 9/25/09 10:13 AM, Dhruba Borthakur dhr...@gmail.com wrote:
> It is really nice to have wire-compatibility between clients and servers running different versions of hadoop. The reason we would like this is because we can allow the same client (Hive, etc) to submit jobs to two different clusters running different versions of hadoop. But I am not stuck up on the name of the release that supports wire-compatibility, it can be either 1.0 or something later than that.

To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
> To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(

My sentiments exactly, though I could learn to live with it.
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On 9/25/09 12:44 PM, Sanjay Radia sra...@yahoo-inc.com wrote:
> On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
>> To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(
>
> My sentiments exactly, though I could learn to live with it.

We just had this discussion today about how to put Hadoop into a production pipeline. I was under the impression that 1.0 was going to be wire compatible too. This is just so disappointing and, quite frankly, makes 1.0 less than useful for Real Work. Great, the APIs don't change, but you still have the same problem of getting data on/off the grid without upgrading your clients every time. To me, without wire compatibility, 1.0 makes me feel pretty meh; who cares--we're still going to be in upgrade hell.
Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Hadoop 1.0's goal was compatibility on several fronts (see https://issues.apache.org/jira/browse/HADOOP-5071 for details). Due to the amount of work involved, it has been necessary to split this work across several releases prior to 1.0.

It turns out that release 0.21 has a number of Jiras targeted towards API and config stability. Further, in 0.21 we are tagging interfaces with a classification of their intended audience (scope) and their stability (see HADOOP-5073 for the classification). Post 1.0, stable interfaces will remain stable (both syntax and semantics) according to the proposed 1.0 rules. Hadoop's pre-1.0 rules allow interfaces to be changed regardless of stability, as long as one allows 2 releases of deprecation (see http://wiki.apache.org/hadoop/Roadmap for the current, i.e. pre-1.0, rules).

So how do we arrange that stable interfaces remain stable (both syntax and semantics) between 0.21 and 1.0? I propose that we honor the compatibility of stable interfaces from release 0.21 onwards; i.e. apply the same post-1.0 rules to pre-1.0 releases.

The actual discussion on what needs to be stable or not belongs inside Jira HADOOP-5073, not in this email thread; I would like to use this email thread to discuss the proposal of honoring compatibility of stable interfaces prior to 1.0.

Feedback?

sanjay
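As a rough sketch of what the audience/stability tagging could look like in code: the annotation names below follow the direction of HADOOP-5073, but they are defined locally here for illustration and are not the committed API.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-ins for the audience/stability classification
// annotations proposed in HADOOP-5073; names and values are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@interface InterfaceAudience { String value(); }   // e.g. "Public", "Private"

@Retention(RetentionPolicy.RUNTIME)
@interface InterfaceStability { String value(); }  // e.g. "Stable", "Evolving"

// A hypothetical user-facing API tagged as public and stable: under the
// proposal, its syntax and semantics must not change from 0.21 onwards.
@InterfaceAudience("Public")
@InterfaceStability("Stable")
class ExampleUserApi { }

public class Classification {
    public static void main(String[] args) {
        // Tools (or reviewers) can read the classification back via reflection
        // to decide which compatibility rules apply to a given interface.
        InterfaceAudience aud = ExampleUserApi.class.getAnnotation(InterfaceAudience.class);
        InterfaceStability stab = ExampleUserApi.class.getAnnotation(InterfaceStability.class);
        System.out.println(aud.value() + "/" + stab.value()); // prints "Public/Stable"
    }
}
```

The value of runtime-retained annotations is that audit tooling can mechanically flag changes to anything tagged Public/Stable, rather than relying on javadoc conventions alone.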
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Sanjay Radia wrote:
> No. The 1.0 proposal was that it included both API and wire compatibility.

The proposal includes a lot of things, but it's so far just a proposal. There's been no vote to formally define what 1.0 will mean. In every discussion I've heard, from the very beginning of the project, it has primarily meant API stability. You've added wire compatibility, data stability, security, restart recovery, etc. These are all very nice features to have, essential perhaps in some contexts, but they may or may not be required for 1.0. I worry that if we keep piling more things on, we'll never get to 1.0.

What would be wrong with calling it 1.0 when we have end-user API stability? Why would that be a bad thing?

Doug