Re: Release numbering for branch-2 releases

2013-01-31 Thread Eli Collins
We also need to spell out what's permissible *before* GA.  The
alpha/beta labels, as I understand them, are not green lights to break
anything we like as long as API compatibility is preserved.  The API
compatibility story has been somewhat fuzzy as well, e.g., MR2 requires users
to recompile all their Hadoop 1.x jobs (ouch).  We've been working on
stabilizing 2.x for a while now and we need to start slating some changes for
3.x if we want to get a 2.x GA release out soon.  To do that we have to
consider what's permissible in the releases between now and GA for end users
(and downstream projects) upgrading from 0.23 releases and older 2.0.x
releases, not just in terms of API compatibility.

Thanks,
Eli

On Wed, Jan 30, 2013 at 5:10 PM, Arun C Murthy a...@hortonworks.com wrote:

 The discussions in HADOOP-9151 were related to wire compatibility. I think
 we all agree that breaking API compatibility is not allowed without first
 deprecating the affected APIs in a prior major release - this is something
 we have followed since hadoop-0.1.
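 As a concrete illustration of that policy, here is a minimal sketch using
 Hadoop's interface classification annotations (the class and methods below
 are hypothetical, invented purely for the example; FileSystem itself carries
 the same class-level annotations):

   import org.apache.hadoop.classification.InterfaceAudience;
   import org.apache.hadoop.classification.InterfaceStability;

   // Hypothetical Public/Stable API, annotated the way Hadoop annotates
   // user-facing classes such as FileSystem.
   @InterfaceAudience.Public
   @InterfaceStability.Stable
   public class ExampleClient {

     /**
      * Deprecated in a 2.x release; under the policy above it can only be
      * removed in the next major release (3.x), never within the 2.x line.
      */
     @Deprecated
     public void openLegacy(String path) {
       open(path, 4096);
     }

     /** Replacement that callers are expected to migrate to before 3.x. */
     public void open(String path, int bufferSize) {
       // Real logic elided; only the deprecation pattern matters here.
     }
   }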

 I agree we need to spell out what changes we can and cannot do *after* we
 go GA, for example:
 # Clearly incompatible *API* changes are *not* allowed in hadoop-2 post-GA.
 # Do we allow incompatible changes to client-server protocols? I would say
 *no*.
 # Do we allow incompatible changes to internal server-server protocols (e.g.,
 NN-DN, NN-NN in an HA setup, or RM-NM in YARN), or do we rule them out so we
 can support rolling upgrades? I would like to not allow them, but I do not
 know how feasible that is. An option is to allow these changes only between
 minor releases, i.e., between hadoop-2.10 and hadoop-2.11.
 # Do we allow changes which force an HDFS metadata upgrade between minor
 releases, i.e., hadoop-2.20 to hadoop-2.21?
 # Clearly *no* incompatible changes (API, client-server, or server-server)
 are allowed in a patch release, i.e., hadoop-2.20.0 and hadoop-2.20.1 have
 to be compatible in all respects.

 What else am I missing?

 I'll make sure we update our Roadmap wiki and other docs post this
 discussion.

 thanks,
 Arun



 On Jan 30, 2013, at 4:21 PM, Eli Collins wrote:

  Thanks for bringing this up Arun.  One of the issues is that we
  haven't been clear about what types of compatibility breakage are
  allowed, and which are not.  For example, renaming FileSystem#open is
  incompatible, and not OK, regardless of the alpha/beta tag.  Breaking
  a server/server API is OK pre-GA but probably not post-GA, at least
  not in a point release, unless it's required for a security fix, etc.
  Changes to configuration, data formats, environment variables, and so on
  can all be similarly incompatible.  The issue we had in HADOOP-9151 was
  that someone claimed a change was not incompatible because it didn't
  break API compatibility, even though it broke wire compatibility.  So
  let's be clear about the types of incompatibility we are or are not
  permitting.  For example, will it be OK to merge a change before
  2.2.0-beta that requires an HDFS metadata upgrade, or one that breaks
  client/server wire compatibility?  I've been assuming that changing an
  API annotated Public/Stable still requires multiple major releases (one
  to deprecate and one to remove); does the alpha label change that?  To
  some people the alpha/beta labels imply instability in terms of
  quality/features, while to others they mean unstable APIs (and to some,
  both), so it would be good to spell that out.  In short, I agree that we
  really need to figure out what changes are permitted in what releases,
  and we should update the docs accordingly (there's a start here:
  http://wiki.apache.org/hadoop/Roadmap).
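  To make the API-versus-wire distinction concrete, here is a minimal sketch
  using Hadoop's Writable interface; the record class is invented purely for
  illustration and is not real Hadoop code.  Reordering the serialized fields
  would leave the Java API (and compiled clients) untouched while breaking
  wire compatibility with older peers:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hypothetical record exchanged over RPC, for illustration only.
    public class ExampleRecord implements Writable {
      private long id;
      private String owner;

      @Override
      public void write(DataOutput out) throws IOException {
        // Wire format: id first, then owner.  Swapping these two calls
        // changes nothing in the API, but older readers would then parse
        // garbage - an incompatible change that no compiler will catch.
        out.writeLong(id);
        out.writeUTF(owner);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        owner = in.readUTF();
      }
    }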
 
  Note that the 2.0.0 alpha release vote thread was clear that we
  thought we were all in agreement that we'd like to keep client/server
  compatibility post-2.0 - and there was no pushback. We pulled a number
  of jiras into the 2.0 release explicitly so that we could preserve
  client/server compatibility going forward.  Here's the relevant part
  of the thread as a refresher: http://s.apache.org/gQ
 
  2) HADOOP-8285 and HADOOP-8366 changed the wire format for the RPC
  envelope in branch-2, but didn't make it into this rc. So, that would
  mean that future alphas would not be protocol-compatible with this
  alpha. Per a discussion a few weeks ago, I think we all were in
  agreement that, if possible, we'd like all 2.x to be compatible for
  client-server communication, at least (even if we don't support
  cross-version for the intra-cluster protocols)
 
  Thanks,
  Eli
 
  On Tue, Jan 29, 2013 at 12:56 PM, Arun C Murthy a...@hortonworks.com
 wrote:
  Folks,
 
  There have been some discussions about incompatible changes in the
 hadoop-2.x.x-alpha releases on HADOOP-9070, HADOOP-9151, HADOOP-9192 and a
 few other jiras. Frankly, I'm surprised by some of them, since the
 'alpha' moniker was precisely meant to let us harden APIs by changing them if
 necessary - as borne out by the fact that every single release in the hadoop-2
 chain has had incompatible changes. This happened since 

Re: Release numbering for branch-2 releases

2013-01-31 Thread Arun C Murthy
Stack,

On Jan 30, 2013, at 9:25 PM, Stack wrote:

 I find the above opaque and written in a cryptic language that I might grok
 if I spent a day or two running over the cited issues trying to make some
 distillation of the esoterica debated therein.  If you want feedback from
 other than the cognoscenti, I would suggest a better summation of what all
 is involved.


I apologize if there was too much technical detail.

The simplified version is that hadoop-2 isn't baked as it stands today, and is
not viable for this community to support in a stable manner. In particular,
this is due to the move to PB for the HDFS protocols and the freshly minted YARN
apis/protocols. As a result, we have been forced to make (incompatible) changes
in every hadoop-2 release so far (2.0.0, 2.0.2, etc.). Since we released the
previous bits we have found security issues, bugs and other issues which will
cause long-term maintenance harm (details are in the HADOOP/HDFS/YARN jiras in
the original email).

My aim, as the RM, is to try to nudge (nay, force) all contributors to spend
time over the next couple of months focusing on fixing known issues and looking
for other surprises - this way I hope to ensure we do not have further
incompatible changes for downstream projects and that we can support hadoop-2
for at least a couple of years. I hope this makes sense to you. I don't think
turning around and calling these releases 3.x or 4.x makes things better, since
no amount of numbering lipstick will make the software better or more viable
for the long term for users and other projects alike. Worse, it would force
HBase and other projects to deal with *even more* major Hadoop releases...
which seems like a royal pita.

I hope that clarifies things. Thanks Stack.

Arun



[jira] [Created] (MAPREDUCE-4970) Child tasks create security audit log files

2013-01-31 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-4970:
-

 Summary: Child tasks create security audit log files
 Key: MAPREDUCE-4970
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4970
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Sandy Ryza


After HADOOP-8552, MR child tasks will attempt to create security audit log
files named after their user names.  On an insecure cluster this has no effect,
but on a secure cluster empty log files with names like SecurityAuth-joeuser.log
will be created for each task's user.

I haven't verified whether this occurs in MR2 yet.
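For context, this behavior comes from the security audit appender in the log4j
configuration. The snippet below is a sketch of the relevant section of a
stock Hadoop conf/log4j.properties; exact property names, appender names, and
the file extension vary by release, so treat it as illustrative only:

  # Security audit appender (illustrative sketch, not copied from any release)
  hadoop.security.logger=INFO,DRFAS
  hadoop.security.log.file=SecurityAuth-${user.name}.log
  log4j.category.SecurityLogger=${hadoop.security.logger}
  log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender
  log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
  log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
  log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

Because ${user.name} is expanded from the JVM's system property, each child
task JVM resolves it to the task's user and opens its own (possibly empty)
file when the appender is initialized.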

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira