Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-28 Sanjay Radia


On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:


Dhruba Borthakur wrote:
 It is really nice to have wire-compatibility between clients and servers
 running different versions of Hadoop. The reason we would like this is
 that it allows the same client (Hive, etc.) to submit jobs to two
 different clusters running different versions of Hadoop. But I am not
 stuck on the name of the release that supports wire-compatibility; it
 can be either 1.0 or something later than that.
 API compatibility   +1
 Data compatibility  +1
 Job Q compatibility -1
 Wire compatibility  +0


That's the stability of the job submission network protocol you are
looking for there.
  * We need a job submission API that is designed to work over long-haul
    links and across versions
  * It does not have to be the same as anything used in-cluster
  * It does not actually need to run in the JobTracker. An independent
    service bridging the stable long-haul API to an unstable datacentre
    protocol does work, though authentication and user rights are a
    trouble spot

I think you are misinterpreting what Job Q compatibility means.
It is about jobs already in the queue surviving an upgrade across a release.


See my initial proposal on Jan 16th:
https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691


Doug argued that it is nice to have but not required for 1.0; it can be
added later.



sanjay


Similarly, it would be good to have a stable long-haul HDFS protocol, such
as FTP or WebDAV. Again, there is no need to build it into the namenode.

See http://www.slideshare.net/steve_l/long-haul-hadoop
and the commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-28 Dhruba Borthakur
I think we should not require Job Q compatibility for the 1.0 release.

thanks,
dhruba


On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia sra...@yahoo-inc.com wrote:


 On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:

  Dhruba Borthakur wrote:
  It is really nice to have wire-compatibility between clients and servers
  running different versions of Hadoop. The reason we would like this is
  that it allows the same client (Hive, etc.) to submit jobs to two
  different clusters running different versions of Hadoop. But I am not
  stuck on the name of the release that supports wire-compatibility; it
  can be either 1.0 or something later than that.
  API compatibility   +1
  Data compatibility  +1
  Job Q compatibility -1
  Wire compatibility  +0


 That's the stability of the job submission network protocol you are
 looking for there.
  * We need a job submission API that is designed to work over long-haul
    links and across versions
  * It does not have to be the same as anything used in-cluster
  * It does not actually need to run in the JobTracker. An independent
    service bridging the stable long-haul API to an unstable datacentre
    protocol does work, though authentication and user rights are a
    trouble spot

 I think you are misinterpreting what Job Q compatibility means.
 It is about jobs already in the queue surviving an upgrade across a release.

 See my initial proposal on Jan 16th:
 https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691

 Doug argued that it is nice to have but not required for 1.0; it can be
 added later.


 sanjay


 Similarly, it would be good to have a stable long-haul HDFS protocol, such
 as FTP or WebDAV. Again, there is no need to build it into the namenode.

 See http://www.slideshare.net/steve_l/long-haul-hadoop
 and the commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop

-- 
Connect to me at http://www.facebook.com/dhruba


Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-25 Allen Wittenauer
On 9/25/09 10:13 AM, Dhruba Borthakur dhr...@gmail.com wrote:
 It is really nice to have wire-compatibility between clients and servers
 running different versions of Hadoop. The reason we would like this is
 that it allows the same client (Hive, etc.) to submit jobs to two
 different clusters running different versions of Hadoop. But I am not
 stuck on the name of the release that supports wire-compatibility; it
 can be either 1.0 or something later than that.

To me, the lack of wire compatibility will make Hadoop 1.0 in name
only, when in reality it is more like 0.80. :(



Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-25 Sanjay Radia


On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:


On 9/25/09 10:13 AM, Dhruba Borthakur dhr...@gmail.com wrote:
 It is really nice to have wire-compatibility between clients and servers
 running different versions of Hadoop. The reason we would like this is
 that it allows the same client (Hive, etc.) to submit jobs to two
 different clusters running different versions of Hadoop. But I am not
 stuck on the name of the release that supports wire-compatibility; it
 can be either 1.0 or something later than that.

To me, the lack of wire compatibility will make Hadoop 1.0 in name
only, when in reality it is more like 0.80. :(


My sentiments exactly, though I could learn to live with it.

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-25 Allen Wittenauer



On 9/25/09 12:44 PM, Sanjay Radia sra...@yahoo-inc.com wrote:

 
 On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
 
 On 9/25/09 10:13 AM, Dhruba Borthakur dhr...@gmail.com wrote:
 It is really nice to have wire-compatibility between clients and servers
 running different versions of Hadoop. The reason we would like this is
 that it allows the same client (Hive, etc.) to submit jobs to two
 different clusters running different versions of Hadoop. But I am not
 stuck on the name of the release that supports wire-compatibility; it
 can be either 1.0 or something later than that.
 
 To me, the lack of wire compatibility will make Hadoop 1.0 in name
 only, when in reality it is more like 0.80. :(
 
 My sentiments exactly, though I could learn to live with it.

We just had this discussion today about how to put Hadoop into a production
pipeline.  I was under the impression that 1.0 was going to be wire
compatible too.

This is just so disappointing and, quite frankly, makes 1.0 less than useful
for Real Work.  Great, the APIs don't change, but you still have the same
problem of getting data on and off the grid without upgrading your clients
every time.

To me, without wire compatibility, 1.0 makes me feel pretty meh. Who
cares? We're still going to be in upgrade hell.



Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-08-28 Sanjay Radia


Hadoop 1.0's goal was compatibility on several fronts.
(See https://issues.apache.org/jira/browse/HADOOP-5071 for details.)

Due to the amount of work involved, it has been necessary to split
this work across several releases prior to 1.0.


It turns out that release 0.21 has a number of JIRAs targeted towards API
and config stability.
Further, in 0.21 we are tagging interfaces with a classification of
their intended audience (scope) and their stability
(see HADOOP-5073 for the classification).
Post-1.0, stable interfaces will remain stable (in both syntax and
semantics) according to the proposed 1.0 rules.
Hadoop's pre-1.0 rules allow interfaces to be changed regardless of
stability, as long as they are first deprecated for 2 releases.
(See http://wiki.apache.org/hadoop/Roadmap for the current, i.e.
pre-1.0, rules.)


So how do we ensure that stable interfaces remain stable
(in both syntax and semantics) between 0.21 and 1.0?
I propose that we honor the compatibility of stable interfaces from
release 0.21 onwards, i.e. apply the same post-1.0 rules to pre-1.0
releases.

The actual discussion of what needs to be stable or not belongs in
JIRA HADOOP-5073, not in this email thread; I would like to use this
email thread to discuss the proposal of honoring the compatibility of
stable interfaces prior to 1.0.


Feedback?

sanjay




Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-08-28 Doug Cutting

Sanjay Radia wrote:

No. The 1.0 proposal was that it included both API and wire compatibility.


The proposal includes a lot of things, but so far it's just a proposal.
There's been no vote to formally define what 1.0 will mean.  In every
discussion I've heard, from the very beginning of the project, it
primarily meant API stability.  You've added wire compatibility, data
stability, security, restart recovery, etc.  These are all very nice
features to have, essential perhaps in some contexts, but they may or
may not be required for 1.0.  I worry that if we keep piling more things
on, we'll never get to 1.0.


What would be wrong with calling it 1.0 when we have end-user API 
stability?  Why would that be a bad thing?


Doug