Re: Moving to JDK7, JDK8 and new major releases

2014-06-28 Thread Chris Nauroth
Following up on ecosystem, I just took a look at the Apache trunk pom.xml
files for HBase, Flume and Oozie.  All are specifying 1.6 for source and
target in the maven-compiler-plugin configuration, so there may be
additional follow-up required here.  (For example, if HBase has made a
statement that its client will continue to support JDK6, then it wouldn't
be practical for them to link to a JDK7 version of hadoop-common.)
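
For reference, the configuration in question looks roughly like this in those projects' pom.xml files (a sketch; the exact plugin version and surrounding build section vary per project):

```xml
<!-- Sketch of the maven-compiler-plugin configuration observed in the
     downstream poms: source and target pinned to 1.6. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>
```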

+1 for the whole plan though.  We can work through these details.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Fri, Jun 27, 2014 at 3:10 PM, Karthik Kambatla ka...@cloudera.com
wrote:

 +1 to making 2.6 the last JDK6 release.

 If we want, 2.7 could be a parallel release or one soon after 2.6. We could
 upgrade other dependencies that require JDK7 as well.


 On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy a...@hortonworks.com
 wrote:

  Thanks everyone for the discussion. Looks like we have come to a
 pragmatic
  and progressive conclusion.
 
  In terms of execution of the consensus plan, I think a little bit of
  caution is in order.
 
  Let's give downstream projects more of a runway.
 
  I propose we inform HBase, Pig, Hive etc. that we are considering making
  2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they
  are comfortable we can pull the trigger in 2.7.
 
  thanks,
  Arun
 
 
   On Jun 27, 2014, at 11:34 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
  
   As someone else already mentioned, we should announce one future
 release
    (maybe 2.5) as the last JDK6-based release before making the move to
  JDK7.
  
   I am comfortable calling 2.5 the last JDK6 release.
  
  
   On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang 
 andrew.w...@cloudera.com
   wrote:
  
   Hi all, responding to multiple messages here,
  
   Arun, thanks for the clarification regarding MR classpaths. It sounds
  like
   the story there is improved and still improving.
  
   However, I think we still suffer from this at least on the HDFS side.
 We
   have a single JAR for all of HDFS, and our clients need to have all
 the
  fun
   deps like Guava on the classpath. I'm told Spark sticks a newer Guava
 at
   the front of the classpath and the HDFS client still works okay, but
  this
   is more happy coincidence than anything else. While we're leaking
 deps,
   we're in a scary situation.
  
   API compat to me means that an app should be able to run on a new
 minor
   version of Hadoop and not have anything break. MAPREDUCE-4421 sounds
  like
   it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
   should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs
  and
   have nothing break. If we muck with the classpath, my understanding is
  that
   this could break.
  
   Owen, bumping the minimum JDK version in a minor release like this
  should
   be a one-time exception as Tucu stated. A number of people have
 pointed
  out
   how painful a forced JDK upgrade is for end users, and it's not
  something
   we should be springing on them in a minor release unless we're *very*
   confident like in this case.
  
   Chris, thanks for bringing up the ecosystem. For CDH5, we standardized
  on
   JDK7 across the CDH stack, so I think that's an indication that most
   ecosystem projects are ready to make the jump. Is that sufficient in
  your
   mind?
  
   For the record, I'm also +1 on the Tucu plan. Is it too late to do
 this
  for
   2.5? I'll offer to help out with some of the mechanics.
  
   Thanks,
   Andrew
  
   On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth 
  cnaur...@hortonworks.com
   wrote:
  
   I understood the plan for avoiding JDK7-specific features in our
 code,
   and
   your suggestion to add an extra Jenkins job is a great way to guard
   against
   that.  The thing I haven't seen discussed yet is how downstream
  projects
   will continue to consume our built artifacts.  If a downstream
 project
   upgrades to pick up a bug fix, and the jar switches to 1.7 class
 files,
   but
   their project is still building with 1.6, then it would be a nasty
   surprise.
  
   These are the options I see:
  
   1. Make sure all other projects upgrade first.  This doesn't sound
   feasible, unless all other ecosystem projects have moved to JDK7
  already.
   If not, then waiting on a single long pole project would hold up our
   migration indefinitely.
  
   2. We switch to JDK7, but run javac with -target 1.6 until the whole
   ecosystem upgrades.  I find this undesirable, because in a certain
  sense,
   it still leaves a bit of 1.6 lingering in the project.  (I'll assume
  that
   end-of-life for JDK6 also means end-of-life for the 1.6 bytecode
  format.)
  
   3. Just declare a clean break on some version (your earlier email
 said
   2.5)
   and start publishing artifacts built with JDK7 and no -target option.
   Overall, this is my preferred option.  However, as a side effect,
 this
    sets us up for longer-term maintenance and patch releases off of the 2.4
    branch if a downstream project that's still on 1.6 needs to pick up a
    critical bug fix.

Re: Moving to JDK7, JDK8 and new major releases

2014-06-28 Thread Steve Loughran
Guava is a separate problem, and I think we should have a separate
discussion about what to do there. That's more traumatic than a JDK
update, I fear, as Guava releases care a lot less about compatibility.
I don't worry about JDK updates removing classes: StringBuffer is still
around even though StringBuilder is better.
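
One mitigation used elsewhere in the ecosystem (a hypothetical sketch, not something decided in this thread) is to relocate Guava into a private package with maven-shade-plugin, so the published jar stops leaking the dependency onto downstream classpaths; the `org.example.shaded` prefix below is illustrative:

```xml
<!-- Hypothetical sketch: rewrite com.google.common.* into a shaded
     namespace at package time so downstream projects can put their own
     Guava version on the classpath without conflict. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.example.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```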


On 27 June 2014 19:26, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi all, responding to multiple messages here,

 Arun, thanks for the clarification regarding MR classpaths. It sounds like
 the story there is improved and still improving.

 However, I think we still suffer from this at least on the HDFS side. We
 have a single JAR for all of HDFS, and our clients need to have all the fun
 deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
 the front of the classpath and the HDFS client still works okay, but this
 is more happy coincidence than anything else. While we're leaking deps,
 we're in a scary situation.


very good point.



 API compat to me means that an app should be able to run on a new minor
 version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
 it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
 should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
 have nothing break. If we muck with the classpath, my understanding is that
 this could break.


I think this is possible by having the app upload all the JARs...I need to
experiment here myself.



 Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
 JDK7 across the CDH stack, so I think that's an indication that most
 ecosystem projects are ready to make the jump. Is that sufficient in your
 mind?


+1, we've had no complaints about things not working on Java 7. It's been
out a long time. If you look at our own code, the main things that broke
were tests (due to JUnit test-case ordering), and not much else.



 For the record, I'm also +1 on the Tucu plan. Is it too late to do this for
 2.5? I'll offer to help out with some of the mechanics.





Re: Moving to JDK7, JDK8 and new major releases

2014-06-27 Thread Andrew Wang
Hi all, responding to multiple messages here,

Arun, thanks for the clarification regarding MR classpaths. It sounds like
the story there is improved and still improving.

However, I think we still suffer from this at least on the HDFS side. We
have a single JAR for all of HDFS, and our clients need to have all the fun
deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
the front of the classpath and the HDFS client still works okay, but this
is more happy coincidence than anything else. While we're leaking deps,
we're in a scary situation.

API compat to me means that an app should be able to run on a new minor
version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
have nothing break. If we muck with the classpath, my understanding is that
this could break.

Owen, bumping the minimum JDK version in a minor release like this should
be a one-time exception as Tucu stated. A number of people have pointed out
how painful a forced JDK upgrade is for end users, and it's not something
we should be springing on them in a minor release unless we're *very*
confident like in this case.

Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
JDK7 across the CDH stack, so I think that's an indication that most
ecosystem projects are ready to make the jump. Is that sufficient in your
mind?

For the record, I'm also +1 on the Tucu plan. Is it too late to do this for
2.5? I'll offer to help out with some of the mechanics.

Thanks,
Andrew

On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
wrote:

 I understood the plan for avoiding JDK7-specific features in our code, and
 your suggestion to add an extra Jenkins job is a great way to guard against
 that.  The thing I haven't seen discussed yet is how downstream projects
 will continue to consume our built artifacts.  If a downstream project
 upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but
 their project is still building with 1.6, then it would be a nasty
 surprise.

 These are the options I see:

 1. Make sure all other projects upgrade first.  This doesn't sound
 feasible, unless all other ecosystem projects have moved to JDK7 already.
  If not, then waiting on a single long pole project would hold up our
 migration indefinitely.

 2. We switch to JDK7, but run javac with -target 1.6 until the whole
 ecosystem upgrades.  I find this undesirable, because in a certain sense,
 it still leaves a bit of 1.6 lingering in the project.  (I'll assume that
 end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)

 3. Just declare a clean break on some version (your earlier email said 2.5)
 and start publishing artifacts built with JDK7 and no -target option.
  Overall, this is my preferred option.  However, as a side effect, this
 sets us up for longer-term maintenance and patch releases off of the 2.4
 branch if a downstream project that's still on 1.6 needs to pick up a
 critical bug fix.

 Of course, this is all a moot point if all the downstream ecosystem
 projects have already made the switch to JDK7.  I don't know the status of
 that off the top of my head.  Maybe someone else out there knows?  If not,
 then I expect I can free up enough in a few weeks to volunteer for tracking
 down that information.

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:

  Chris,
 
   Compiling with JDK7 and passing javac -target 1.6 is not sufficient: you
   are still compiling against the JDK7 class libraries, so you could use
   new APIs and break JDK6 at both compile time and runtime.
  
   You need to compile with JDK6 to ensure you are not running into that
   scenario. That is why I was suggesting the nightly JDK6 build/test
   Jenkins job.
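
Tucu's point can be demonstrated with a small example (hypothetical class name): the -source/-target flags control only the language level and bytecode version, not which platform classes javac compiles against, so a JDK7-only API slips through -target 1.6 compilation unnoticed.

```java
// Hypothetical demo: under a JDK 7 compiler, this compiles cleanly even with
//   javac -source 1.6 -target 1.6 Jdk7ApiLeak.java
// because -target only lowers the class-file version; it does not restrict
// the visible platform classes. java.nio.file.Files and Paths were added in
// JDK 7, so the resulting class file would fail on a JDK 6 runtime with
// NoClassDefFoundError, exactly the breakage described above.
import java.nio.file.Files;
import java.nio.file.Paths;

public class Jdk7ApiLeak {
    public static boolean tempDirExists() {
        // JDK7-only API calls: fine on JDK 7+, fatal on a JDK 6 runtime.
        return Files.exists(Paths.get(System.getProperty("java.io.tmpdir")));
    }

    public static void main(String[] args) {
        System.out.println("tmpdir exists: " + tempDirExists());
    }
}
```

Only compiling against the JDK 6 class libraries (or a nightly JDK6 Jenkins build, as suggested) catches this class of break.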
 
 
  On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com
 
  wrote:
 
   I'm also +1 for getting us to JDK7 within the 2.x line after reading
 the
   proposals and catching up on the discussion in this thread.
  
   Has anyone yet considered how to coordinate this change with downstream
   projects?  Would we request downstream projects to upgrade to JDK7
 first
   before we make the move?  Would we switch to JDK7, but run javac
 -target
   1.6 to maintain compatibility for downstream projects during an interim
   period?
  
   Chris Nauroth
   Hortonworks
   http://hortonworks.com/
  
  
  
   On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org
  wrote:
  
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur 
 t...@cloudera.com
  
wrote:
   
 After reading this thread and thinking a bit about it, I think it
   should
be
 OK such move up to JDK7 in Hadoop
   
   
I agree with Alejandro. Changing minimum JDKs is not an incompatible
   change
    and is fine in the 2 branch.

Re: Moving to JDK7, JDK8 and new major releases

2014-06-27 Thread Karthik Kambatla
As someone else already mentioned, we should announce one future release
(maybe 2.5) as the last JDK6-based release before making the move to JDK7.

I am comfortable calling 2.5 the last JDK6 release.


On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi all, responding to multiple messages here,

 Arun, thanks for the clarification regarding MR classpaths. It sounds like
 the story there is improved and still improving.

 However, I think we still suffer from this at least on the HDFS side. We
 have a single JAR for all of HDFS, and our clients need to have all the fun
 deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
 the front of the classpath and the HDFS client still works okay, but this
 is more happy coincidence than anything else. While we're leaking deps,
 we're in a scary situation.

 API compat to me means that an app should be able to run on a new minor
 version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
 it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
 should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
 have nothing break. If we muck with the classpath, my understanding is that
 this could break.

 Owen, bumping the minimum JDK version in a minor release like this should
 be a one-time exception as Tucu stated. A number of people have pointed out
 how painful a forced JDK upgrade is for end users, and it's not something
 we should be springing on them in a minor release unless we're *very*
 confident like in this case.

 Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
 JDK7 across the CDH stack, so I think that's an indication that most
 ecosystem projects are ready to make the jump. Is that sufficient in your
 mind?

 For the record, I'm also +1 on the Tucu plan. Is it too late to do this for
 2.5? I'll offer to help out with some of the mechanics.

 Thanks,
 Andrew

 On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  I understood the plan for avoiding JDK7-specific features in our code,
 and
  your suggestion to add an extra Jenkins job is a great way to guard
 against
  that.  The thing I haven't seen discussed yet is how downstream projects
  will continue to consume our built artifacts.  If a downstream project
  upgrades to pick up a bug fix, and the jar switches to 1.7 class files,
 but
  their project is still building with 1.6, then it would be a nasty
  surprise.
 
  These are the options I see:
 
  1. Make sure all other projects upgrade first.  This doesn't sound
  feasible, unless all other ecosystem projects have moved to JDK7 already.
   If not, then waiting on a single long pole project would hold up our
  migration indefinitely.
 
  2. We switch to JDK7, but run javac with -target 1.6 until the whole
  ecosystem upgrades.  I find this undesirable, because in a certain sense,
  it still leaves a bit of 1.6 lingering in the project.  (I'll assume that
  end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)
 
  3. Just declare a clean break on some version (your earlier email said
 2.5)
  and start publishing artifacts built with JDK7 and no -target option.
   Overall, this is my preferred option.  However, as a side effect, this
  sets us up for longer-term maintenance and patch releases off of the 2.4
  branch if a downstream project that's still on 1.6 needs to pick up a
  critical bug fix.
 
  Of course, this is all a moot point if all the downstream ecosystem
  projects have already made the switch to JDK7.  I don't know the status
 of
  that off the top of my head.  Maybe someone else out there knows?  If
 not,
  then I expect I can free up enough in a few weeks to volunteer for
 tracking
  down that information.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
  wrote:
 
   Chris,
  
   Compiling with jdk7 and doing javac -target 1.6 is not sufficient, you
  are
   still using jdk7 libraries and you could use new APIs, thus breaking
 jdk6
   both at compile and runtime.
  
   you need to compile with jdk6 to ensure you are not running into that
   scenario. that is why i was suggesting the nightly jdk6 build/test
  jenkins
   job.
  
  
   On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth 
 cnaur...@hortonworks.com
  
   wrote:
  
I'm also +1 for getting us to JDK7 within the 2.x line after reading
  the
proposals and catching up on the discussion in this thread.
   
Has anyone yet considered how to coordinate this change with
 downstream
projects?  Would we request downstream projects to upgrade to JDK7
  first
before we make the move?  Would we switch to JDK7, but run javac
  -target
1.6 to maintain compatibility for downstream projects during an
 interim
period?
   
Chris Nauroth
Hortonworks
http://hortonworks.com/
   
   
   
On Wed, Jun 25, 

Re: Moving to JDK7, JDK8 and new major releases

2014-06-27 Thread Andrew Wang
FYI I also just updated the wiki page with a Proposal D, aka Tucu plan,
which I think is essentially Proposal C but tabling JDK8 plans for now.

https://wiki.apache.org/hadoop/MovingToJdk7and8

Karthik, thanks for ringing in re: 2.5. I guess there's nothing urgently
required, the Jenkins stuff just needs to happen before 2.6. Still, I'm
happy to help with anything.

Thanks,
Andrew


On Fri, Jun 27, 2014 at 11:34 AM, Karthik Kambatla ka...@cloudera.com
wrote:

 As someone else already mentioned, we should announce one future release
 (may be, 2.5) as the last JDK6-based release before making the move to
 JDK7.

 I am comfortable calling 2.5 the last JDK6 release.


 On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:

  Hi all, responding to multiple messages here,
 
  Arun, thanks for the clarification regarding MR classpaths. It sounds
 like
  the story there is improved and still improving.
 
  However, I think we still suffer from this at least on the HDFS side. We
  have a single JAR for all of HDFS, and our clients need to have all the
 fun
  deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
  the front of the classpath and the HDFS client still works okay, but this
  is more happy coincidence than anything else. While we're leaking deps,
  we're in a scary situation.
 
  API compat to me means that an app should be able to run on a new minor
  version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
  it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
  should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
  have nothing break. If we muck with the classpath, my understanding is
 that
  this could break.
 
  Owen, bumping the minimum JDK version in a minor release like this should
  be a one-time exception as Tucu stated. A number of people have pointed
 out
  how painful a forced JDK upgrade is for end users, and it's not something
  we should be springing on them in a minor release unless we're *very*
  confident like in this case.
 
  Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
  JDK7 across the CDH stack, so I think that's an indication that most
  ecosystem projects are ready to make the jump. Is that sufficient in your
  mind?
 
  For the record, I'm also +1 on the Tucu plan. Is it too late to do this
 for
  2.5? I'll offer to help out with some of the mechanics.
 
  Thanks,
  Andrew
 
  On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
 
  wrote:
 
   I understood the plan for avoiding JDK7-specific features in our code,
  and
   your suggestion to add an extra Jenkins job is a great way to guard
  against
   that.  The thing I haven't seen discussed yet is how downstream
 projects
   will continue to consume our built artifacts.  If a downstream project
   upgrades to pick up a bug fix, and the jar switches to 1.7 class files,
  but
   their project is still building with 1.6, then it would be a nasty
   surprise.
  
   These are the options I see:
  
   1. Make sure all other projects upgrade first.  This doesn't sound
   feasible, unless all other ecosystem projects have moved to JDK7
 already.
If not, then waiting on a single long pole project would hold up our
   migration indefinitely.
  
   2. We switch to JDK7, but run javac with -target 1.6 until the whole
   ecosystem upgrades.  I find this undesirable, because in a certain
 sense,
   it still leaves a bit of 1.6 lingering in the project.  (I'll assume
 that
   end-of-life for JDK6 also means end-of-life for the 1.6 bytecode
 format.)
  
   3. Just declare a clean break on some version (your earlier email said
  2.5)
   and start publishing artifacts built with JDK7 and no -target option.
Overall, this is my preferred option.  However, as a side effect, this
   sets us up for longer-term maintenance and patch releases off of the
 2.4
   branch if a downstream project that's still on 1.6 needs to pick up a
   critical bug fix.
  
   Of course, this is all a moot point if all the downstream ecosystem
   projects have already made the switch to JDK7.  I don't know the status
  of
   that off the top of my head.  Maybe someone else out there knows?  If
  not,
   then I expect I can free up enough in a few weeks to volunteer for
  tracking
   down that information.
  
   Chris Nauroth
   Hortonworks
   http://hortonworks.com/
  
  
  
   On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
 
   wrote:
  
Chris,
   
Compiling with jdk7 and doing javac -target 1.6 is not sufficient,
 you
   are
still using jdk7 libraries and you could use new APIs, thus breaking
  jdk6
both at compile and runtime.
   
you need to compile with jdk6 to ensure you are not running into that
scenario. that is why i was suggesting the nightly jdk6 build/test
   jenkins
job.
   
   
On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth 
  

Re: Moving to JDK7, JDK8 and new major releases

2014-06-27 Thread Arun C. Murthy
Thanks everyone for the discussion. Looks like we have come to a pragmatic and 
progressive conclusion.

In terms of execution of the consensus plan, I think a little bit of caution is 
in order.

Let's give downstream projects more of a runway.

I propose we inform HBase, Pig, Hive etc. that we are considering making 2.6 
(not 2.5) the last JDK6 release and solicit their feedback. Once they are 
comfortable we can pull the trigger in 2.7.

thanks,
Arun


 On Jun 27, 2014, at 11:34 AM, Karthik Kambatla ka...@cloudera.com wrote:
 
 As someone else already mentioned, we should announce one future release
 (may be, 2.5) as the last JDK6-based release before making the move to JDK7.
 
 I am comfortable calling 2.5 the last JDK6 release.
 
 
 On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:
 
 Hi all, responding to multiple messages here,
 
 Arun, thanks for the clarification regarding MR classpaths. It sounds like
 the story there is improved and still improving.
 
 However, I think we still suffer from this at least on the HDFS side. We
 have a single JAR for all of HDFS, and our clients need to have all the fun
 deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
 the front of the classpath and the HDFS client still works okay, but this
 is more happy coincidence than anything else. While we're leaking deps,
 we're in a scary situation.
 
 API compat to me means that an app should be able to run on a new minor
 version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
 it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
 should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
 have nothing break. If we muck with the classpath, my understanding is that
 this could break.
 
 Owen, bumping the minimum JDK version in a minor release like this should
 be a one-time exception as Tucu stated. A number of people have pointed out
 how painful a forced JDK upgrade is for end users, and it's not something
 we should be springing on them in a minor release unless we're *very*
 confident like in this case.
 
 Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
 JDK7 across the CDH stack, so I think that's an indication that most
 ecosystem projects are ready to make the jump. Is that sufficient in your
 mind?
 
 For the record, I'm also +1 on the Tucu plan. Is it too late to do this for
 2.5? I'll offer to help out with some of the mechanics.
 
 Thanks,
 Andrew
 
 On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:
 
 I understood the plan for avoiding JDK7-specific features in our code,
 and
 your suggestion to add an extra Jenkins job is a great way to guard
 against
 that.  The thing I haven't seen discussed yet is how downstream projects
 will continue to consume our built artifacts.  If a downstream project
 upgrades to pick up a bug fix, and the jar switches to 1.7 class files,
 but
 their project is still building with 1.6, then it would be a nasty
 surprise.
 
 These are the options I see:
 
 1. Make sure all other projects upgrade first.  This doesn't sound
 feasible, unless all other ecosystem projects have moved to JDK7 already.
 If not, then waiting on a single long pole project would hold up our
 migration indefinitely.
 
 2. We switch to JDK7, but run javac with -target 1.6 until the whole
 ecosystem upgrades.  I find this undesirable, because in a certain sense,
 it still leaves a bit of 1.6 lingering in the project.  (I'll assume that
 end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)
 
 3. Just declare a clean break on some version (your earlier email said
 2.5)
 and start publishing artifacts built with JDK7 and no -target option.
 Overall, this is my preferred option.  However, as a side effect, this
 sets us up for longer-term maintenance and patch releases off of the 2.4
 branch if a downstream project that's still on 1.6 needs to pick up a
 critical bug fix.
 
 Of course, this is all a moot point if all the downstream ecosystem
 projects have already made the switch to JDK7.  I don't know the status
 of
 that off the top of my head.  Maybe someone else out there knows?  If
 not,
 then I expect I can free up enough in a few weeks to volunteer for
 tracking
 down that information.
 
 Chris Nauroth
 Hortonworks
 http://hortonworks.com/
 
 
 
 On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:
 
 Chris,
 
 Compiling with jdk7 and doing javac -target 1.6 is not sufficient, you
 are
 still using jdk7 libraries and you could use new APIs, thus breaking
 jdk6
 both at compile and runtime.
 
 you need to compile with jdk6 to ensure you are not running into that
 scenario. that is why i was suggesting the nightly jdk6 build/test
 jenkins
 job.
 
 
 On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth 
 cnaur...@hortonworks.com
 
 wrote:
 
 I'm also +1 for getting us to JDK7 within the 2.x line after reading
 the
 

Re: Moving to JDK7, JDK8 and new major releases

2014-06-27 Thread Karthik Kambatla
+1 to making 2.6 the last JDK6 release.

If we want, 2.7 could be a parallel release or one soon after 2.6. We could
upgrade other dependencies that require JDK7 as well.


On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy a...@hortonworks.com wrote:

 Thanks everyone for the discussion. Looks like we have come to a pragmatic
 and progressive conclusion.

 In terms of execution of the consensus plan, I think a little bit of
 caution is in order.

 Let's give downstream projects more of a runway.

 I propose we inform HBase, Pig, Hive etc. that we are considering making
 2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they
 are comfortable we can pull the trigger in 2.7.

 thanks,
 Arun


  On Jun 27, 2014, at 11:34 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
 
  As someone else already mentioned, we should announce one future release
  (may be, 2.5) as the last JDK6-based release before making the move to
 JDK7.
 
  I am comfortable calling 2.5 the last JDK6 release.
 
 
  On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang andrew.w...@cloudera.com
  wrote:
 
  Hi all, responding to multiple messages here,
 
  Arun, thanks for the clarification regarding MR classpaths. It sounds
 like
  the story there is improved and still improving.
 
  However, I think we still suffer from this at least on the HDFS side. We
  have a single JAR for all of HDFS, and our clients need to have all the
 fun
  deps like Guava on the classpath. I'm told Spark sticks a newer Guava at
  the front of the classpath and the HDFS client still works okay, but
 this
  is more happy coincidence than anything else. While we're leaking deps,
  we're in a scary situation.
 
  API compat to me means that an app should be able to run on a new minor
  version of Hadoop and not have anything break. MAPREDUCE-4421 sounds
 like
  it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
  should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs
  and have nothing break. If we muck with the classpath, my understanding
  is that this could break.

  Owen, bumping the minimum JDK version in a minor release like this
  should be a one-time exception, as Tucu stated. A number of people have
  pointed out how painful a forced JDK upgrade is for end users, and it's
  not something we should be springing on them in a minor release unless
  we're *very* confident, like in this case.

  Chris, thanks for bringing up the ecosystem. For CDH5, we standardized
  on JDK7 across the CDH stack, so I think that's an indication that most
  ecosystem projects are ready to make the jump. Is that sufficient in
  your mind?

  For the record, I'm also +1 on the Tucu plan. Is it too late to do this
  for 2.5? I'll offer to help out with some of the mechanics.

  Thanks,
  Andrew

  On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth cnaur...@hortonworks.com
  wrote:

   I understood the plan for avoiding JDK7-specific features in our code,
   and your suggestion to add an extra Jenkins job is a great way to
   guard against that.  The thing I haven't seen discussed yet is how
   downstream projects will continue to consume our built artifacts.  If
   a downstream project upgrades to pick up a bug fix, and the jar
   switches to 1.7 class files, but their project is still building with
   1.6, then it would be a nasty surprise.

   These are the options I see:

   1. Make sure all other projects upgrade first.  This doesn't sound
   feasible, unless all other ecosystem projects have moved to JDK7
   already.  If not, then waiting on a single long pole project would
   hold up our migration indefinitely.

   2. We switch to JDK7, but run javac with -target 1.6 until the whole
   ecosystem upgrades.  I find this undesirable, because in a certain
   sense, it still leaves a bit of 1.6 lingering in the project.  (I'll
   assume that end-of-life for JDK6 also means end-of-life for the 1.6
   bytecode format.)

   3. Just declare a clean break on some version (your earlier email said
   2.5) and start publishing artifacts built with JDK7 and no -target
   option.  Overall, this is my preferred option.  However, as a side
   effect, this sets us up for longer-term maintenance and patch releases
   off of the 2.4 branch if a downstream project that's still on 1.6
   needs to pick up a critical bug fix.

   Of course, this is all a moot point if all the downstream ecosystem
   projects have already made the switch to JDK7.  I don't know the
   status of that off the top of my head.  Maybe someone else out there
   knows?  If not, then I expect I can free up enough in a few weeks to
   volunteer for tracking down that information.

   Chris Nauroth
   Hortonworks
   http://hortonworks.com/



   On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
   wrote:

   Chris,

   Compiling with jdk7 and doing javac -target 1.6 is not sufficient, you
   are still using jdk7 libraries and you could use new APIs, thus
   breaking jdk6 both at compile and runtime.

Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Owen O'Malley
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
wrote:

 After reading this thread and thinking a bit about it, I think it should be
 OK such move up to JDK7 in Hadoop


I agree with Alejandro. Changing minimum JDKs is not an incompatible change
and is fine in the 2 branch. (Although I think it would *not* be
appropriate for a patch release.) Of course we need to do it with
forethought and testing, but moving off of JDK 6, which is EOL'ed, is a
good thing. Moving to Java 8 as a minimum seems much too aggressive and I
would push back on that.

I also think that we need to let the dust settle on the Hadoop 2 line for
a while before we talk about Hadoop 3. It seems that it has only been in
the last 6 months that Hadoop 2 adoption has reached mainstream users.
Our user community needs time to digest the changes in Hadoop 2.x before we
fracture the community by starting to discuss Hadoop 3 releases.

.. Owen


Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Chris Nauroth
I'm also +1 for getting us to JDK7 within the 2.x line after reading the
proposals and catching up on the discussion in this thread.

Has anyone yet considered how to coordinate this change with downstream
projects?  Would we request downstream projects to upgrade to JDK7 first
before we make the move?  Would we switch to JDK7, but run javac -target
1.6 to maintain compatibility for downstream projects during an interim
period?

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote:

 On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
 wrote:

  After reading this thread and thinking a bit about it, I think it should
 be
  OK such move up to JDK7 in Hadoop


 I agree with Alejandro. Changing minimum JDKs is not an incompatible change
 and is fine in the 2 branch. (Although I think it would *not* be
 appropriate for a patch release.) Of course we need to do it with
 forethought and testing, but moving off of JDK 6, which is EOL'ed is a good
 thing. Moving to Java 8 as a minimum seems much too aggressive and I would
 push back on that.

 I also think that we need to let the dust settle on the Hadoop 2 line for
 a while before we talk about Hadoop 3. It seems that it has only been in
 the last 6 months that Hadoop 2 adoption has reached the main stream users.
 Our user community needs time to digest the changes in Hadoop 2.x before we
 fracture the community by starting to discuss Hadoop 3 releases.

 .. Owen


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Alejandro Abdelnur
Chris,

Compiling with jdk7 and doing javac -target 1.6 is not sufficient: you are
still compiling against the jdk7 libraries and you could use new APIs, thus
breaking jdk6 both at compile time and runtime.

You need to compile with jdk6 to ensure you are not running into that
scenario; that is why I was suggesting the nightly jdk6 build/test Jenkins
job.
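To make the failure mode concrete, here is a minimal sketch (the class name is ours, not Hadoop code). Built on JDK7 with "javac -source 1.6 -target 1.6" it compiles cleanly, because -target only controls the bytecode version; yet it references java.nio.file, which the JDK6 class library does not have, so a JDK6 runtime dies with NoClassDefFoundError.

```java
import java.io.IOException;
import java.nio.file.Files;  // package introduced in JDK7
import java.nio.file.Path;

public class Jdk7ApiLeak {

    // Writes and reads back a temp file via the JDK7-only NIO.2 API.
    // Compiles fine on JDK7 even with -source/-target 1.6; fails at
    // runtime on JDK6 with NoClassDefFoundError: java/nio/file/Paths.
    static String roundTrip() {
        try {
            Path tmp = Files.createTempFile("jdk7-leak", ".txt");
            Files.write(tmp, "hello".getBytes("UTF-8"));
            String back = new String(Files.readAllBytes(tmp), "UTF-8");
            Files.delete(tmp);
            return back;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // On JDK7+ this prints "hello"; on a JDK6 runtime it never gets here.
        System.out.println(roundTrip());
    }
}
```

This is exactly the gap the nightly JDK6 build/test job would catch: only compiling against the JDK6 class library makes the reference an error at compile time.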


On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com
wrote:

 I'm also +1 for getting us to JDK7 within the 2.x line after reading the
 proposals and catching up on the discussion in this thread.

 Has anyone yet considered how to coordinate this change with downstream
 projects?  Would we request downstream projects to upgrade to JDK7 first
 before we make the move?  Would we switch to JDK7, but run javac -target
 1.6 to maintain compatibility for downstream projects during an interim
 period?

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



 On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote:

  On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
  wrote:
 
   After reading this thread and thinking a bit about it, I think it
 should
  be
   OK such move up to JDK7 in Hadoop
 
 
  I agree with Alejandro. Changing minimum JDKs is not an incompatible
 change
  and is fine in the 2 branch. (Although I think it is would *not* be
  appropriate for a patch release.) Of course we need to do it with
  forethought and testing, but moving off of JDK 6, which is EOL'ed is a
 good
  thing. Moving to Java 8 as a minimum seems much too aggressive and I
 would
  push back on that.
 
  I'm also think that we need to let the dust settle on the Hadoop 2 line
 for
  a while before we talk about Hadoop 3. It seems that it has only been in
  the last 6 months that Hadoop 2 adoption has reached the main stream
 users.
  Our user community needs time to digest the changes in Hadoop 2.x before
 we
  fracture the community by starting to discuss Hadoop 3 releases.
 
  .. Owen
 





-- 
Alejandro


Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Akira AJISAKA

+1 (non-binding) for 2.5 to be the last release to support JDK6.

 My higher-level goal though is to avoid going through this same pain
 again when JDK7 goes EOL. I'd like to do a JDK8-based release
 before then for this reason. This is why I suggested skipping an
 intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8.

I'm thinking that skipping an intermediate release and leapfrogging to 3.0 
makes it difficult to maintain branch-2. It's only about half a year since 
the 2.2 GA, so we should maintain branch-2 and create bug-fix releases 
for the long term even if 3.0+JDK8 is released.


Thanks,
Akira

(2014/06/24 17:56), Steve Loughran wrote:

+1, though I think 2.5 may be premature if we want to first send a warning
note about the last-ever JDK6 release. That's an issue for follow-on work in
branch-2.

Guava and protobuf.jar are two things we have to leave alone, with the
first being unfortunate, but their attitude to updates is pretty dramatic.
The latter? We all know how traumatic that can be.

-Steve


On 24 June 2014 16:44, Alejandro Abdelnur t...@cloudera.com wrote:


After reading this thread and thinking a bit about it, I think such a move
up to JDK7 should be OK in Hadoop 2 for the following reasons:

* Existing Hadoop 2 releases and related projects are running
   on JDK7 in production.
* Commercial vendors of Hadoop have already done a lot of
   work to ensure Hadoop on JDK7 works while keeping Hadoop
   on JDK6 working.
* Different from many of the 3rd party libraries used by Hadoop,
   JDK is much stricter on backwards compatibility.

IMPORTANT: I take this as an exception and not as a carte blanche for 3rd
party dependencies and for moving from JDK7 to JDK8 (though it could be OK
for the latter if we end up in the same state of affairs).

Even for Hadoop 2.5, I think we could do the move:

* Create the Hadoop 2.5 release branch.
* Have one nightly Jenkins job that builds the Hadoop 2.5 branch
   with JDK6 to ensure no JDK7 language/API feature creeps
   into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
* Sanity tests for the Hadoop 2.5.x releases should be done
   with JDK7.
* Apply Steve’s patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7.
* Starting from Hadoop 2.6 we support JDK7 language/API
   features.

Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests
with both JDK6 & JDK7, and that all testing towards the release is done
with JDK7.

Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or,
if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6
(which would be quite unlikely), they can reactively upgrade to JDK7.

Thoughts?


On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com
wrote:


 Hi all,

 On dependencies, we've bumped library versions when we think it's safe
 and the APIs in the new version are compatible. Or, it's not leaked to
 the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun
 mentioned fall into one of those categories. Steve can do a better job
 explaining this to me, but we haven't bumped things like Jetty or Guava
 because they are on the classpath and are not compatible. There is this
 line in the compat guidelines:

 - Existing MapReduce, YARN & HDFS applications and frameworks should
 work unmodified within a major release, i.e. the Apache Hadoop ABI is
 supported.

 Since Hadoop apps can and do depend on the Hadoop classpath, the
 classpath is effectively part of our API. I'm sure there are user apps
 out there that will break if we make incompatible changes to the
 classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't
 the only YARN app out there.

 Sticking to the theme of "work unmodified", let's think about the user
 effort required to upgrade their JDK. This can be a very expensive task.
 It might need approval up and down the org, meaning lots of
 certification, testing, and signoff. Considering the amount of user
 effort involved here, it really seems like dropping a JDK is something
 that should only happen in a major release. Else, there's the potential
 for nasty surprises in a supposedly minor release.

 That said, we are in an unhappy place right now regarding JDK6, and it's
 true that almost everyone's moved off of JDK6 at this point. So, I'd be
 okay with an intermediate 2.x release that drops JDK6 support (but no
 incompatible changes to the classpath like Guava). This is basically
 free, and we could start using JDK7 idioms like multi-catch and new NIO
 stuff in Hadoop code (a minor draw I guess).

 My higher-level goal though is to avoid going through this same pain
 again when JDK7 goes EOL. I'd like to do a JDK8-based release before
 then for this reason. This is why I suggested skipping an intermediate
 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not
 that far in the future, and it seems like a better place to focus our
 efforts. I was also hoping it'd be realistic to fix our classpath
 leakage by then, since then we'd have a nice, tight, future-proofed new
 major release.

Re: Moving to JDK7, JDK8 and new major releases

2014-06-25 Thread Chris Nauroth
I understood the plan for avoiding JDK7-specific features in our code, and
your suggestion to add an extra Jenkins job is a great way to guard against
that.  The thing I haven't seen discussed yet is how downstream projects
will continue to consume our built artifacts.  If a downstream project
upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but
their project is still building with 1.6, then it would be a nasty surprise.

These are the options I see:

1. Make sure all other projects upgrade first.  This doesn't sound
feasible, unless all other ecosystem projects have moved to JDK7 already.
 If not, then waiting on a single long pole project would hold up our
migration indefinitely.

2. We switch to JDK7, but run javac with -target 1.6 until the whole
ecosystem upgrades.  I find this undesirable, because in a certain sense,
it still leaves a bit of 1.6 lingering in the project.  (I'll assume that
end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)

3. Just declare a clean break on some version (your earlier email said 2.5)
and start publishing artifacts built with JDK7 and no -target option.
 Overall, this is my preferred option.  However, as a side effect, this
sets us up for longer-term maintenance and patch releases off of the 2.4
branch if a downstream project that's still on 1.6 needs to pick up a
critical bug fix.
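For reference, option 2 above corresponds, in Maven terms, to a maven-compiler-plugin configuration along these lines (a sketch; the actual Hadoop pom layout and plugin version management may differ):

```xml
<!-- Sketch: compile with a JDK7 javac but emit 1.6-compatible bytecode.
     Note this constrains language level and class-file version only; it
     does NOT prevent code from referencing JDK7-only library classes. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <source>1.6</source>   <!-- language level -->
    <target>1.6</target>   <!-- bytecode version -->
  </configuration>
</plugin>
```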

Of course, this is all a moot point if all the downstream ecosystem
projects have already made the switch to JDK7.  I don't know the status of
that off the top of my head.  Maybe someone else out there knows?  If not,
then I expect I can free up enough in a few weeks to volunteer for tracking
down that information.

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com
wrote:

 Chris,

 Compiling with jdk7 and doing javac -target 1.6 is not sufficient, you are
 still using jdk7 libraries and you could use new APIs, thus breaking jdk6
 both at compile and runtime.

 you need to compile with jdk6 to ensure you are not running into that
 scenario. that is why i was suggesting the nightly jdk6 build/test jenkins
 job.


 On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  I'm also +1 for getting us to JDK7 within the 2.x line after reading the
  proposals and catching up on the discussion in this thread.
 
  Has anyone yet considered how to coordinate this change with downstream
  projects?  Would we request downstream projects to upgrade to JDK7 first
  before we make the move?  Would we switch to JDK7, but run javac -target
  1.6 to maintain compatibility for downstream projects during an interim
  period?
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org
 wrote:
 
   On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
 
   wrote:
  
After reading this thread and thinking a bit about it, I think it
  should
   be
OK such move up to JDK7 in Hadoop
  
  
   I agree with Alejandro. Changing minimum JDKs is not an incompatible
  change
   and is fine in the 2 branch. (Although I think it is would *not* be
   appropriate for a patch release.) Of course we need to do it with
   forethought and testing, but moving off of JDK 6, which is EOL'ed is a
  good
   thing. Moving to Java 8 as a minimum seems much too aggressive and I
  would
   push back on that.
  
   I'm also think that we need to let the dust settle on the Hadoop 2 line
  for
   a while before we talk about Hadoop 3. It seems that it has only been
 in
   the last 6 months that Hadoop 2 adoption has reached the main stream
  users.
   Our user community needs time to digest the changes in Hadoop 2.x
 before
  we
   fracture the community by starting to discuss Hadoop 3 releases.
  
   .. Owen
  
 
  --
 



 --
 Alejandro



Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Arun C Murthy
Andrew,

 Thanks for starting this thread. I'll edit the wiki to provide more context 
around rolling-upgrades etc. which, as I pointed out in the original thread, 
are key IMHO.

On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 https://wiki.apache.org/hadoop/MovingToJdk7and8
 
 I think based on our current compatibility guidelines, Proposal A is the
 most attractive. We're pretty hamstrung by the requirement to keep the
 classpath the same, which would be solved by either OSGI or shading our
 deps (but that's a different discussion).

I don't see that anywhere in our current compatibility guidelines.

As you can see from 
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
 we do not have such a policy (pasted here for convenience):

Java Classpath

User applications built against Hadoop might add all Hadoop jars (including 
Hadoop's library dependencies) to the application's classpath. Adding new 
dependencies or updating the version of existing dependencies may interfere 
with those in applications' classpaths.

Policy

Currently, there is NO policy on when Hadoop's dependencies can change.

Furthermore, we have *already* changed our classpath in hadoop-2.x. Again, as I 
pointed out in the previous thread, here is the precedent:

On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote:

 Also, this is something we already have done i.e. we updated some of our 
 software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as 
 dramatic as JDK. Here are some examples:
 https://issues.apache.org/jira/browse/HADOOP-9991
 https://issues.apache.org/jira/browse/HADOOP-10102
 https://issues.apache.org/jira/browse/HADOOP-10103
 https://issues.apache.org/jira/browse/HADOOP-10104
 https://issues.apache.org/jira/browse/HADOOP-10503

thanks,
Arun


Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Steve Loughran
That classpath policy was explicitly added because we can't lock down our
dependencies for security/bug fix reasons, and also because if we do update
something explicitly, their transitive dependencies can change -beyond our
control.

https://issues.apache.org/jira/browse/HADOOP-9555 is an example of this: an
update of ZK explicitly to fix an HA problem. Are there changes in its
dependencies? I don't know. But we didn't have a choice to update if we
wanted NN  RM failover to work reliably, so we have to take any other
changes that went in.

JDK upgrades can be viewed as an extension of this: we are changing the
base platform that Hadoop runs on. More precisely, for the Java 6 to Java 7
update, we are reflecting the fact that nobody is running Java 6 in
production.

Do you realise we actually moved to Java 6 in 2008?
https://issues.apache.org/jira/browse/HADOOP-2325 . That was six years ago;
half the names on that list are not active on the project any more.

What we did there was issue a warning in 0.18 that it would be the last
Java 5 version; 0.19 moved up. We can do the same for a Hadoop 2.x release
at some point this year.



On 24 June 2014 11:43, Arun C Murthy a...@hortonworks.com wrote:

 Andrew,

  Thanks for starting this thread. I'll edit the wiki to provide more
 context around rolling-upgrades etc. which, as I pointed out in the
 original thread, are key IMHO.

 On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:
  https://wiki.apache.org/hadoop/MovingToJdk7and8
 
  I think based on our current compatibility guidelines, Proposal A is the
  most attractive. We're pretty hamstrung by the requirement to keep the
  classpath the same, which would be solved by either OSGI or shading our
  deps (but that's a different discussion).

 I don't see that anywhere in our current compatibility guidelines.

 As you can see from
 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
 we do not have such a policy (pasted here for convenience):

 Java Classpath

 User applications built against Hadoop might add all Hadoop jars
 (including Hadoop's library dependencies) to the application's classpath.
 Adding new dependencies or updating the version of existing dependencies
 may interfere with those in applications' classpaths.

 Policy

 Currently, there is NO policy on when Hadoop's dependencies can change.

 Furthermore, we have *already* changed our classpath in hadoop-2.x. Again,
 as I pointed out in the previous thread, here is the precedent:

 On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote:

  Also, this is something we already have done i.e. we updated some of our
 software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as
 dramatic as JDK. Here are some examples:
  https://issues.apache.org/jira/browse/HADOOP-9991
  https://issues.apache.org/jira/browse/HADOOP-10102
  https://issues.apache.org/jira/browse/HADOOP-10103
  https://issues.apache.org/jira/browse/HADOOP-10104
  https://issues.apache.org/jira/browse/HADOOP-10503

 thanks,
 Arun




Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Vinod Kumar Vavilapalli
Tx for the new thread Andrew, hopefully it can attract more eyes.

Here's what I am behind - a modified proposal C.
 - Overall I wouldn't think about EOL of JDK7 and/or JDK8 specifically given 
how long it has taken for JDK6 life-cycle to end. We should try to focus on 
JDK7 only for now.
 - As we have seen, a lot (majority?) of orgs on Hadoop have moved beyond JDK6 
and are already running on JDK7. So upgrading to JDK7 is more of a reflection 
of reality (to quote Steve) than it in itself being a disruptive change.
 - We should try decoupling the discussion of major releases from JDK upgrades. 
We have seen individual libraries getting updated right in the 2.x lines as and 
when necessary. Given the new reality of JDK7, I don't see the 'JDK change' as 
much different from the library upgrades.

We have seen how long it has taken (and is still taking) users and organizations 
to move from Hadoop 1 to Hadoop 2. A Hadoop 3/4 that adds nothing other than a 
JDK upgrade will be a big source of confusion for users. A major version 
update is also seen as an opportunity for devs to break APIs. Unless we have 
groundbreaking 'features' (like YARN or wire-compatibility in Hadoop 2) that a 
majority of users want and that specifically warrant incompatible changes in 
our APIs or wire protocols, we are better off separating the major-version 
update discussion into its own thread.

Irrespective of all this, we should actively get behind better isolation of 
user classes/jars from the MapReduce classpath. This one's been such a 
long-running concern that it's not funny anymore.

Thanks,
+Vinod

On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi all,
 
 Forking this thread as requested by Vinod. To help anyone who's catching up
 with this thread, I've written up a wiki page containing what I think are
 the proposals under discussion. I did my very best to make this as
 fact-based and disinterested as possible; I really appreciate the
 constructive discussion we've had so far. If you believe you have a
 proposal pending, please feel free to edit the wiki.
 
 https://wiki.apache.org/hadoop/MovingToJdk7and8
 
 I think based on our current compatibility guidelines, Proposal A is the
 most attractive. We're pretty hamstrung by the requirement to keep the
 classpath the same, which would be solved by either OSGI or shading our
 deps (but that's a different discussion).
 
 Thanks,
 Andrew






Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Andrew Wang
Hi all,

On dependencies, we've bumped library versions when we think it's safe and
the APIs in the new version are compatible. Or, it's not leaked to the app
classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned
fall into one of those categories. Steve can do a better job explaining
this to me, but we haven't bumped things like Jetty or Guava because they
are on the classpath and are not compatible. There is this line in the
compat guidelines:

   - Existing MapReduce, YARN & HDFS applications and frameworks should
   work unmodified within a major release, i.e. the Apache Hadoop ABI is
   supported.

Since Hadoop apps can and do depend on the Hadoop classpath, the classpath
is effectively part of our API. I'm sure there are user apps out there that
will break if we make incompatible changes to the classpath. I haven't read
up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.

Sticking to the theme of "work unmodified", let's think about the user
effort required to upgrade their JDK. This can be a very expensive task. It
might need approval up and down the org, meaning lots of certification,
testing, and signoff. Considering the amount of user effort involved here,
it really seems like dropping a JDK is something that should only happen in
a major release. Else, there's the potential for nasty surprises in a
supposedly minor release.

That said, we are in an unhappy place right now regarding JDK6, and it's
true that almost everyone's moved off of JDK6 at this point. So, I'd be
okay with an intermediate 2.x release that drops JDK6 support (but no
incompatible changes to the classpath like Guava). This is basically free,
and we could start using JDK7 idioms like multi-catch and new NIO stuff in
Hadoop code (a minor draw I guess).
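A small sketch (a hypothetical class, not Hadoop code) of the JDK7 idioms mentioned here: multi-catch, and the try-with-resources form that arrived alongside the new NIO.2 APIs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class Jdk7Idioms {

    // try-with-resources (JDK7+): the reader is closed automatically,
    // replacing the old try/finally close() boilerplate.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException | RuntimeException e) {  // multi-catch (JDK7+)
            return "<error: " + e.getMessage() + ">";
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));  // prints "first"
    }
}
```

Neither construct compiles under -source 1.6, which is one reason the codebase can only adopt them after the minimum JDK is bumped.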

My higher-level goal though is to avoid going through this same pain again
when JDK7 goes EOL. I'd like to do a JDK8-based release before then for
this reason. This is why I suggested skipping an intermediate 2.x+JDK7
release and leapfrogging to 3.0+JDK8. 10 months is really not that far in
the future, and it seems like a better place to focus our efforts. I was
also hoping it'd be realistic to fix our classpath leakage by then, since
then we'd have a nice, tight, future-proofed new major release.

Thanks,
Andrew




On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy a...@hortonworks.com wrote:

 Andrew,

  Thanks for starting this thread. I'll edit the wiki to provide more
 context around rolling-upgrades etc. which, as I pointed out in the
 original thread, are key IMHO.

 On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:
  https://wiki.apache.org/hadoop/MovingToJdk7and8
 
  I think based on our current compatibility guidelines, Proposal A is the
  most attractive. We're pretty hamstrung by the requirement to keep the
  classpath the same, which would be solved by either OSGI or shading our
  deps (but that's a different discussion).

 I don't see that anywhere in our current compatibility guidelines.

 As you can see from
 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
 we do not have such a policy (pasted here for convenience):

 Java Classpath

 User applications built against Hadoop might add all Hadoop jars
 (including Hadoop's library dependencies) to the application's classpath.
 Adding new dependencies or updating the version of existing dependencies
 may interfere with those in applications' classpaths.

 Policy

 Currently, there is NO policy on when Hadoop's dependencies can change.

 Furthermore, we have *already* changed our classpath in hadoop-2.x. Again,
 as I pointed out in the previous thread, here is the precedent:

 On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote:

  Also, this is something we already have done i.e. we updated some of our
 software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as
 dramatic as JDK. Here are some examples:
  https://issues.apache.org/jira/browse/HADOOP-9991
  https://issues.apache.org/jira/browse/HADOOP-10102
  https://issues.apache.org/jira/browse/HADOOP-10103
  https://issues.apache.org/jira/browse/HADOOP-10104
  https://issues.apache.org/jira/browse/HADOOP-10503

 thanks,
 Arun



Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Alejandro Abdelnur
After reading this thread and thinking a bit about it, I think such a move
up to JDK7 in Hadoop 2 should be OK, for the following reasons:

* Existing Hadoop 2 releases and related projects are running
  on JDK7 in production.
* Commercial vendors of Hadoop have already done a lot of
  work to ensure Hadoop on JDK7 works while keeping Hadoop
  on JDK6 working.
* Unlike many of the 3rd party libraries used by Hadoop,
  the JDK is much stricter about backwards compatibility.

IMPORTANT: I take this as an exception and not as a carte blanche for 3rd
party dependencies, nor for moving from JDK7 to JDK8 (though it could be OK
for the latter if we end up in the same state of affairs).

Even for Hadoop 2.5, I think we could do the move:

* Create the Hadoop 2.5 release branch.
* Have one nightly Jenkins job that builds the Hadoop 2.5 branch
  with JDK6 to ensure no JDK7 language/API features creep
  into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
* Sanity tests for the Hadoop 2.5.x releases should be done
  with JDK7.
* Apply Steve’s patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7.
* Starting from Hadoop 2.6 we support JDK7 language/API
  features.

Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with
JDK6 & JDK7, and that all tests towards the release
are done with JDK7.

Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or,
if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6
(which would be quite unlikely), they can reactively upgrade to JDK7.
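To make the nightly-JDK6-job idea concrete, a fail-fast version guard could look like the sketch below. The real enforcement would live in the Maven build (Steve's patch), not in code like this; the class and method names here are made up purely for illustration:

```java
public class RequireJdk7 {
    // Returns true when the given JVM spec version is at least the required one.
    // "java.specification.version" reads "1.6"/"1.7" on JDK6/7 (and "9", "11",
    // ... on much later JDKs, which still parse as >= 1.7 here).
    static boolean atLeast(String specVersion, double required) {
        return Double.parseDouble(specVersion) >= required;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (!atLeast(spec, 1.7)) {
            System.err.println("Build requires JDK7 or newer, found " + spec);
            System.exit(1);
        }
        System.out.println("JDK OK: " + spec);
    }
}
```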

Thoughts?


On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com
wrote:

 Hi all,

 On dependencies, we've bumped library versions when we think it's safe and
 the APIs in the new version are compatible. Or, it's not leaked to the app
 classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned
 fall into one of those categories. Steve can do a better job explaining
 this to me, but we haven't bumped things like Jetty or Guava because they
 are on the classpath and are not compatible. There is this line in the
 compat guidelines:

- Existing MapReduce, YARN & HDFS applications and frameworks should
work unmodified within a major release i.e. Apache Hadoop ABI is
 supported.

 Since Hadoop apps can and do depend on the Hadoop classpath, the classpath
 is effectively part of our API. I'm sure there are user apps out there that
 will break if we make incompatible changes to the classpath. I haven't read
 up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out
 there.

 Sticking to the theme of "work unmodified", let's think about the user
 effort required to upgrade their JDK. This can be a very expensive task. It
 might need approval up and down the org, meaning lots of certification,
 testing, and signoff. Considering the amount of user effort involved here,
 it really seems like dropping a JDK is something that should only happen in
 a major release. Else, there's the potential for nasty surprises in a
 supposedly minor release.

 That said, we are in an unhappy place right now regarding JDK6, and it's
 true that almost everyone's moved off of JDK6 at this point. So, I'd be
 okay with an intermediate 2.x release that drops JDK6 support (but no
 incompatible changes to the classpath like Guava). This is basically free,
 and we could start using JDK7 idioms like multi-catch and new NIO stuff in
 Hadoop code (a minor draw I guess).
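For reference, the JDK7 idioms mentioned above look like this (a standalone sketch, not Hadoop code; the file path and names are illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Jdk7Idioms {
    // Multi-catch (JDK7 language feature) collapses duplicated catch blocks,
    // and java.nio.file.Files/Paths are part of the "new NIO stuff" (NIO.2).
    static String readOrDefault(String file, String fallback) {
        try {
            Path p = Paths.get(file);
            return new String(Files.readAllBytes(p), "UTF-8");
        } catch (IOException | RuntimeException e) {
            // One handler for both the I/O failure and e.g. an invalid path.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(readOrDefault("/no/such/file", "default"));
    }
}
```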

 My higher-level goal though is to avoid going through this same pain again
 when JDK7 goes EOL. I'd like to do a JDK8-based release before then for
 this reason. This is why I suggested skipping an intermediate 2.x+JDK7
 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in
 the future, and it seems like a better place to focus our efforts. I was
 also hoping it'd be realistic to fix our classpath leakage by then, since
 then we'd have a nice, tight, future-proofed new major release.

 Thanks,
 Andrew




 On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy a...@hortonworks.com
 wrote:

  Andrew,
 
   Thanks for starting this thread. I'll edit the wiki to provide more
  context around rolling-upgrades etc. which, as I pointed out in the
  original thread, are key IMHO.
 
  On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com
  wrote:
   https://wiki.apache.org/hadoop/MovingToJdk7and8
  
   I think based on our current compatibility guidelines, Proposal A is
 the
   most attractive. We're pretty hamstrung by the requirement to keep the
   classpath the same, which would be solved by either OSGI or shading our
   deps (but that's a different discussion).
 
  I don't see that anywhere in our current compatibility guidelines.
 
  As you can see from
 
 http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
  we do not have such a policy (pasted here for convenience):
 
  Java Classpath
 
   User applications built ...

Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Steve Loughran
+1, though I think 2.5 may be premature if we want to first send a warning
note that some release will be the last ever JDK6 one. That's an issue for
follow-on work in branch-2.

Guava and protobuf.jar are two things we have to leave alone, with the
first being unfortunate, as their attitude to updates is pretty dramatic.
The latter? We all know how traumatic that can be.

-Steve



Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Arun Murthy
Alejandro,


On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com
wrote:

 After reading this thread and thinking a bit about it, I think such a move
 up to JDK7 in Hadoop 2 should be OK, for the following reasons:

 * Existing Hadoop 2 releases and related projects are running
   on JDK7 in production.
 * Commercial vendors of Hadoop have already done a lot of
   work to ensure Hadoop on JDK7 works while keeping Hadoop
   on JDK6 working.
 * Unlike many of the 3rd party libraries used by Hadoop,
   the JDK is much stricter about backwards compatibility.


+1 - I think we are all on the same page here. Fully agree.



 IMPORTANT: I take this as an exception and not as a carte blanche for 3rd
 party dependencies, nor for moving from JDK7 to JDK8 (though it could be OK
 for the latter if we end up in the same state of affairs).


+1. Agree again - let's just wait/watch.

From the thread I've become more convinced (as you've noted before) that,
since we are at the bottom of the stack, we need to be more
conservative.

From http://www.oracle.com/technetwork/java/eol-135779.html, it looks like
April 2015 is the *earliest* Java7 will EOL. Java6 EOL was Feb 2011 and we
are still debating whether we can stop supporting it. So, my guess is that
we will support Java7 for at least a year after its EOL, i.e. till sometime
in early 2016. It's just practical.

Net - We really don't have a good idea when a significant portion of users
will actually migrate to Java 8. W.r.t Java7 this took nearly 3 years after
Java6 EOL. So for now, let's just wait & see how things develop in the
field.


 Even for Hadoop 2.5, I think we could do the move:

 * Create the Hadoop 2.5 release branch.
 * Have one nightly Jenkins job that builds the Hadoop 2.5 branch
   with JDK6 to ensure no JDK7 language/API features creep
   into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
 * Sanity tests for the Hadoop 2.5.x releases should be done
   with JDK7.
 * Apply Steve’s patch to require JDK7 on trunk and branch-2.
 * Move all Apache Jenkins jobs to build/test using JDK7.
 * Starting from Hadoop 2.6 we support JDK7 language/API
   features.


The mechanics make perfect sense to me. I think we should probably
think a bit more about whether we drop support for JDK6 in hadoop-2.6 or
hadoop-2.7.

I'd like to add one more:
* Sometime soon (within a release or two) after we actually drop support
for Java6 and move branch-2 to JDK7, let's also start testing on Java8.

This way we will be ready for Java8 early regardless of when we stop
support for Java7. Dropping Java7 is a bridge we can cross when we come to
it.


thanks,
Arun



Re: Moving to JDK7, JDK8 and new major releases

2014-06-24 Thread Arun C Murthy
On Jun 24, 2014, at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote:


 Since Hadoop apps can and do depend on the Hadoop classpath, the classpath
 is effectively part of our API. I'm sure there are user apps out there that
 will break if we make incompatible changes to the classpath. I haven't read
 up on the MR JIRA Arun mentioned, but there MR isn't the only YARN app out
 there.

I think there is some confusion/misunderstanding here.

With hadoop-2 the user is completely in control of his own classpath (we had a 
similar, but limited capability in hadoop-1 w/ 
https://issues.apache.org/jira/browse/MAPREDUCE-1938).

Furthermore, it's probably not well known that in hadoop-2 the user application 
(MR or otherwise) can also pick the JDK version by using the JAVA_HOME env for the 
container. So, in effect, MR applications can continue to use java6 while YARN 
is running java7 - this hasn't been tested extensively though. This capability 
did not exist in hadoop-1. We've also made some progress with 
https://issues.apache.org/jira/browse/MAPREDUCE-1700 to decouple user jar-deps 
from MR system jars. https://issues.apache.org/jira/browse/MAPREDUCE-4421 also 
helps by ensuring MR applications can pick the exact version of the MR jars they 
were compiled against, and not rely on cluster installs.
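As a sketch of that JAVA_HOME point: a job supplies the environment its task containers launch with, so it can point them at a different JDK than the one the NodeManager runs on. The snippet below only builds the relevant MR2 per-task env properties (the JDK install path and class name are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PerContainerJdkSketch {
    // Builds the MR2 job properties that point task containers at a given JDK.
    // "mapreduce.map.env"/"mapreduce.reduce.env" take a comma-separated K=V
    // list that MR applies to the launched task's environment.
    static Map<String, String> taskEnvProps(String javaHome) {
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("mapreduce.map.env", "JAVA_HOME=" + javaHome);
        props.put("mapreduce.reduce.env", "JAVA_HOME=" + javaHome);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical JDK6 install path, for illustration only.
        for (Map.Entry<String, String> e
                : taskEnvProps("/usr/lib/jvm/jdk1.6.0").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```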

Hope that helps somewhat.

thanks,
Arun

