Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote:

> Dhruba Borthakur wrote:
>> It is really nice to have wire-compatibility between clients and servers running different versions of hadoop. The reason we would like this is because we can allow the same client (Hive, etc) to submit jobs to two different clusters running different versions of hadoop. But I am not stuck up on the name of the release that supports wire-compatibility, it can be either 1.0 or something later than that.
>
> API compatibility +1
> Data compatibility +1
> Job Q compatibility -1
> Wire compatibility +0
>
> That's stability of the job submission network protocol you are looking for there.
> * We need a job submission API that is designed to work over long-haul links and versions
> * It does not have to be the same as anything used in-cluster
> * It does not actually need to run in the JobTracker. An independent service bridging the stable long-haul API to an unstable datacentre protocol does work, though authentication and user-rights are a troublespot.

I think you are misinterpreting what Job Q compatibility means. It is about jobs already in the queue surviving an upgrade across a release. See my initial proposal on Jan 16th:
https://issues.apache.org/jira/browse/HADOOP-5071?focusedCommentId=12664691&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12664691
Doug argued that it is nice to have but not required for 1.0 - can be added later.

sanjay

> Similarly, it would be good to have a stable long-haul HDFS protocol, such as FTP or webdav. Again, no need to build it into the namenode. See http://www.slideshare.net/steve_l/long-haul-hadoop and commentary under http://wiki.apache.org/hadoop/BristolHadoopWorkshop
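Steve's bridge idea above can be sketched as a small, versioned submission interface sitting outside the JobTracker. Everything in this sketch -- the interface name, its methods, and the in-memory stand-in -- is illustrative only, not an actual Hadoop API:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an independent bridge service: a stable,
// versioned long-haul job submission API that would translate onto
// whatever (unstable) in-cluster protocol the target release uses.
interface LongHaulJobService {
    int getProtocolVersion();                        // negotiated up front so old clients detect mismatch
    String submitJob(byte[] jobPayload) throws IOException;   // opaque, versioned job description
    String getJobStatus(String jobId) throws IOException;     // long-haul clients poll rather than hold connections
}

// Trivial in-memory stand-in; a real bridge would forward these calls
// to the cluster and also handle authentication and user rights.
class InMemoryBridge implements LongHaulJobService {
    private final Map<String, String> status = new HashMap<>();
    private int nextId = 0;

    public int getProtocolVersion() { return 1; }

    public String submitJob(byte[] jobPayload) {
        String id = "job_" + (++nextId);
        status.put(id, "QUEUED");   // would hand off to the real cluster here
        return id;
    }

    public String getJobStatus(String jobId) {
        return status.getOrDefault(jobId, "UNKNOWN");
    }
}

public class Bridge {
    public static void main(String[] args) {
        InMemoryBridge svc = new InMemoryBridge();
        String id = svc.submitJob(new byte[0]);
        System.out.println(id + " -> " + svc.getJobStatus(id)); // prints "job_1 -> QUEUED"
    }
}
```

The point of the design is that only LongHaulJobService needs to stay stable across releases; the bridge can be rebuilt against each cluster version.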
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
I think we should not require Job Q compatibility for 1.0 release.

thanks,
dhruba

On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia sra...@yahoo-inc.com wrote:
> I think you are misinterpreting what Job Q compatibility means. It is about jobs already in the queue surviving an upgrade across a release. [...] Doug argued that it is nice to have but not required for 1.0 - can be added later.
>
> sanjay

--
Connect to me at http://www.facebook.com/dhruba
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On 9/25/09 10:13 AM, Dhruba Borthakur dhr...@gmail.com wrote:
> It is really nice to have wire-compatibility between clients and servers running different versions of hadoop. The reason we would like this is because we can allow the same client (Hive, etc) to submit jobs to two different clusters running different versions of hadoop. But I am not stuck up on the name of the release that supports wire-compatibility, it can be either 1.0 or something later than that.

To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
> To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(

My sentiments exactly, though I could learn to live with it.
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
On 9/25/09 12:44 PM, Sanjay Radia sra...@yahoo-inc.com wrote:
> On Sep 25, 2009, at 12:03 PM, Allen Wittenauer wrote:
>> To me, the lack of wire compatibility will make Hadoop 1.0 a 1.0 in name only, when in reality it is more like 0.80. :(
>
> My sentiments exactly, though I could learn to live with it.

We just had this discussion today about how to put Hadoop into a production pipeline. I was under the impression that 1.0 was going to be wire compatible too. This is just so disappointing and, quite frankly, makes 1.0 less than useful for Real Work. Great, the APIs don't change, but you still have the same problem of getting data on/off the grid without upgrading your clients every time. To me, without wire compatibility, 1.0 makes me feel pretty meh; who cares--we're still going to be in upgrade hell.
Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Hadoop 1.0's goal was compatibility on several fronts (see https://issues.apache.org/jira/browse/HADOOP-5071 for details). Due to the amount of work involved, it has been necessary to split this work across several releases prior to 1.0.

It turns out that release 0.21 has a number of Jiras targeted towards API and config stability. Further, in 0.21 we are tagging interfaces with a classification of their intended audience (scope) and their stability (see HADOOP-5073 for the classification). Post 1.0, stable interfaces will remain stable (both syntax and semantics) according to the proposed 1.0 rules. Hadoop's pre-1.0 rules allow interfaces to be changed regardless of stability, as long as one allows 2 releases of deprecation (see http://wiki.apache.org/hadoop/Roadmap for the current, i.e. pre-1.0, rules).

So how do we arrange that stable interfaces remain stable (both syntax and semantics) between 0.21 and 1.0? I propose that we honor the compatibility of stable interfaces from release 0.21 onwards; i.e. apply the same post-1.0 rules to pre-1.0 releases.

The actual discussion on what needs to be stable or not belongs inside Jira HADOOP-5073, not in this email thread; I would like to use this email thread to discuss the proposal of honoring compatibility of stable interfaces prior to 1.0.

Feedback?

sanjay
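As a rough sketch of what the audience/stability tagging could look like in code: the annotation names below follow the direction of HADOOP-5073, but they are defined locally here for illustration and are not the committed API.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Illustrative stand-ins for the audience/stability classification
// annotations proposed in HADOOP-5073; names and values are assumptions.
@Retention(RetentionPolicy.RUNTIME)
@interface InterfaceAudience { String value(); }   // e.g. "Public", "Private"

@Retention(RetentionPolicy.RUNTIME)
@interface InterfaceStability { String value(); }  // e.g. "Stable", "Evolving"

// A hypothetical user-facing API tagged as public and stable: under the
// proposal, its syntax and semantics must not change from 0.21 onwards.
@InterfaceAudience("Public")
@InterfaceStability("Stable")
class ExampleUserApi { }

public class Classification {
    public static void main(String[] args) {
        // Tools (or reviewers) can read the classification back via reflection
        // to decide which compatibility rules apply to a given interface.
        InterfaceAudience aud = ExampleUserApi.class.getAnnotation(InterfaceAudience.class);
        InterfaceStability stab = ExampleUserApi.class.getAnnotation(InterfaceStability.class);
        System.out.println(aud.value() + "/" + stab.value()); // prints "Public/Stable"
    }
}
```

The value of runtime-retained annotations is that audit tooling can mechanically flag changes to anything tagged Public/Stable, rather than relying on javadoc conventions alone.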
Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards
Sanjay Radia wrote:
> No. The 1.0 proposal was that it included both API and wire compatibility.

The proposal includes a lot of things, but it's so far just a proposal. There's been no vote to formally define what 1.0 will mean. In every discussion I've heard, from the very beginning of the project, it has primarily meant API stability. You've added wire compatibility, data stability, security, restart recovery, etc. These are all very nice features to have, essential perhaps in some contexts, but they may or may not be required for 1.0. I worry that if we keep piling more things on, we'll never get to 1.0.

What would be wrong with calling it 1.0 when we have end-user API stability? Why would that be a bad thing?

Doug