Re: Plans of moving towards JDK7 in trunk

2014-06-23 Thread Sandy Ryza
Andrew, correct me if I'm misunderstanding, but the incompatible change
that would require a major version bump is dropping support for JDK6.


On Mon, Jun 23, 2014 at 1:53 PM, sanjay Radia san...@hortonworks.com
wrote:


 On Jun 21, 2014, at 8:01 AM, Andrew Wang andrew.w...@cloudera.com wrote:

  This is why I'd like to keep my original proposal on the table: keep going
  with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
  by April next year. It doesn't need to be a big bang release either. I'd be
  delighted if we could rolling upgrade from one to the other. I just didn't
  want to rule out the inclusion of some very compelling feature outright.
  Trust me though, I'd be the first person to ask about compatibility if such
  a feature does come up.


 Given your above statement  on compatibility (such as rolling upgrades),
  it should be fine for the JDK8-based-Hadoop-release to not be 3.0 and
 instead merely be 2.x? Or do you have any incompatible changes to Hadoop
 protocol or APIs in mind during the same time period?

 sanjay
 --
 CONFIDENTIALITY NOTICE
 NOTICE: This message is intended for the use of the individual or entity to
 which it is addressed and may contain information that is confidential,
 privileged and exempt from disclosure under applicable law. If the reader
 of this message is not the intended recipient, you are hereby notified that
 any printing, copying, dissemination, distribution, disclosure or
 forwarding of this communication is strictly prohibited. If you have
 received this communication in error, please contact the sender immediately
 and delete it from your system. Thank You.



Re: Plans of moving towards JDK7 in trunk

2014-06-23 Thread Vinod Kumar Vavilapalli
Hey all,

This one started as an innocuous thread about enabling JDK7 on trunk, and now it 
seems (I still haven't finished reading the entire thing, and I started a 
while ago) to have become a full-blown proposal on 2.x, 3.x and 4.x releases. 
Some of us haven't been tracking this (at least me and a few others who 
indicated as much offline), assuming it was only about letting Jenkins run 
JDK7, but it has the potential to impact all future work.

I propose we fork this thread into a new one that states the topic clearly 
so that others can follow too.

Thanks,
+Vinod

On Jun 23, 2014, at 1:53 PM, sanjay Radia san...@hortonworks.com wrote:

 
 On Jun 21, 2014, at 8:01 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 
 This is why I'd like to keep my original proposal on the table: keep going
 with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
 by April next year. It doesn't need to be a big bang release either. I'd be
 delighted if we could rolling upgrade from one to the other. I just didn't
 want to rule out the inclusion of some very compelling feature outright.
 Trust me though, I'd be the first person to ask about compatibility if such
 a feature does come up.
 
 
 Given your above statement  on compatibility (such as rolling upgrades),  it 
 should be fine for the JDK8-based-Hadoop-release to not be 3.0 and instead 
 merely be 2.x? Or do you have any incompatible changes to Hadoop protocol or 
 APIs in mind during the same time period?
 
 sanjay




Re: Plans of moving towards JDK7 in trunk

2014-06-21 Thread Andrew Wang
Hi Steve, let me confirm that I understand your proposal correctly:

- Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
bumped library versions
- Release a Hadoop 4 mid next year, based on JDK8

I question the utility of an intermediate Hadoop 3 like this. Assuming that
it gets out in September (i.e. roughly when a 2.6 would land), we're
looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
If this release also breaks compatibility by changing library versions,
then it looks less and less appealing from a user perspective. I suspect it
would end up seeing low adoption as everyone waits (at most) 7 months for
the JDK8-based release to emerge.

I'd be more okay with an intermediate release with no incompatible changes
whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
be a weak release considering that branch-2 already runs fine on JDK7, and
it looks somewhat bad publicly as we burn another major release number less
than a year since 2.x going GA.

This is why I'd like to keep my original proposal on the table: keep going
with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
by April next year. It doesn't need to be a big bang release either. I'd be
delighted if we could rolling upgrade from one to the other. I just didn't
want to rule out the inclusion of some very compelling feature outright.
Trust me though, I'd be the first person to ask about compatibility if such
a feature does come up.

I'll also posit that people will shy away from using JDK8 features while
branch-2 remains in active use. There's definitely some new shiny there,
but nothing compelling enough to me personally when weighed against the
pain of harder branch-2 backports.

Let's try to keep this thread focused on the planning side of things
though, deferring JDK-feature-related discussion to a different thread.
We'd need to draw up a code-style doc on the wiki, but it sounds like
something Steve and/or I could draft initially.

Thanks,
Andrew


On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy a...@hortonworks.com wrote:


 On Jun 20, 2014, at 9:51 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  On 20 June 2014 21:35, Steve Loughran ste...@hortonworks.com wrote:
 
 
  This actually argues in favour of
 
  -renaming branch-2 branch-3 after a release
  -making trunk hadoop-4
 
  -getting hadoop 3 released off the new branch-3 out in 2014, effectively
  being an iteration of branch-2 with updated java , moves of (off?)
 guava,
  off jetty, lib changes, but no other significant big bang features
 
 
  Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
  particular, anything that goes into Hadoop 4 for which there's no intent to
  support in hadoop 2 & 3, can use the java 8 language features sooner rather
  than later.
 
 
 
  I should add that I'm willing to be the person who gets the Java-7 based
  Hadoop  3.x out the door later this year

 +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
 share the pain… ;-)

 Arun



Re: Plans of moving towards JDK7 in trunk

2014-06-21 Thread Steve Loughran
On 21 June 2014 08:01, Andrew Wang andrew.w...@cloudera.com wrote:

 Hi Steve, let me confirm that I understand your proposal correctly:

 - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
 bumped library versions
 - Release a Hadoop 4 mid next year, based on JDK8

 I question the utility of an intermediate Hadoop 3 like this. Assuming that
 it gets out in September (i.e. roughly when a 2.6 would land), we're
 looking at a valid lifespan of about 7 months before JDK7 is EOL in April.
 If this release also breaks compatibility by changing library versions,
 then it looks less and less appealing from a user perspective. I suspect it
 would end up seeing low adoption as everyone waits (at most) 7 months for
 the JDK8-based release to emerge.



I'm saying that we'd replace hadoop 2.6 with a 3.x release that, along
with the 2.6 changes, ups the java version and the JARs and dependencies
which we are frozen with in Hadoop 2.x

this issue of dependencies may not be so visible in hadoop's own codebase,
but when you write any downstream project, the majority of the xml
clauses in your POM file are about excluding stuff Hadoop pulls in. I've
been quietly trying to address this at HADOOP-9991, but we've reached the
limit of what can get in.

I'd be happy enough with the original Stata Plan: a release of Hadoop 2.x
that says java 7 + new libs, but given we've committed to not doing that,
releasing a Hadoop 3 stating that lets us get a hadoop with a modern set of
underpinnings out in 2014



 I'd be more okay with an intermediate release with no incompatible changes
 whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
 be a weak release considering that branch-2 already runs fine on JDK7, and
 it looks somewhat bad publicly as we burn another major release number less
 than a year since 2.x going GA.



it'll be  1 year for 2.x to 3,

And to be realistic, the move to java 8+ across the entire hadoop stack
will probably take 1y too.



 This is why I'd like to keep my original proposal on the table: keep going
 with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
 by April next year. It doesn't need to be a big bang release either. I'd be
 delighted if we could rolling upgrade from one to the other. I just didn't
 want to rule out the inclusion of some very compelling feature outright.
 Trust me though, I'd be the first person to ask about compatibility if such
 a feature does come up.

 I'll also posit that people will shy away from using JDK8 features while
 branch-2 remains in active use. There's definitely some new shiny there,
 but nothing compelling enough to me personally when weighed against the
 pain of harder branch-2 backports.



branch-2 would be frozen and we'd tell everyone to move to java 7+; everything
downstream gets updated binaries and a chance to move forwards.

There's another issue, which is one Alejandro highlighted:

-- Forwarded message --
From: Alejandro Abdelnur t...@cloudera.com
Date: 10 April 2014 10:30
Subject: Re: Plans of moving towards JDK7 in trunk
To: common-dev@hadoop.apache.org common-dev@hadoop.apache.org


A bit of a different angle.

As the bottom of the stack Hadoop has to be conservative in adopting
things, but it should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) to have additional requirements such as
a higher JDK API than JDK6.

Hadoop 2.x should stick to using JDK6  API
Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7 and
eventually JDK8
Downstream projects and Hadoop application developers are free to require
any JDK6+ version for development and runtime.

Hadoop 3.x should allow using JDK7 API, bumping the minimum runtime
requirement to JDK7 and be tested with JDK7 and JDK8 runtimes.
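
To make the "minimum runtime requirement" idea concrete, here is a minimal sketch of a startup guard that reads the JVM's specification version and compares it with a required major version. The class and method names (MinJdkCheck, majorVersion, isSupported) are hypothetical and not part of Hadoop; the only JDK facility assumed is the standard java.specification.version system property.

```java
// Hypothetical helper, not part of Hadoop: reports the major Java version of
// the running JVM so a launcher could refuse to start on an unsupported JDK.
final class MinJdkCheck {

    // Accepts the legacy "1.x" scheme (1.6, 1.7, 1.8) as well as the
    // single-number scheme used by later releases ("9", "11", "17").
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True when the given specification version meets the required minimum.
    static boolean isSupported(String specVersion, int minimumMajor) {
        return majorVersion(specVersion) >= minimumMajor;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        int minimum = 7; // assumption: the Hadoop 3.x minimum suggested above
        if (!isSupported(running, minimum)) {
            throw new IllegalStateException(
                "Java " + minimum + "+ required, found " + running);
        }
        System.out.println("Running on Java " + majorVersion(running));
    }
}
```

A guard like this only enforces the runtime floor; testing against several runtimes (JDK6, JDK7, JDK8) would still be a matter of running the same build under each of them.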

-- Forwarded message --

The minimum version of Java that Hadoop mandates is going to be the minimum
version of Java that the entire stack has to adopt, and the minimum version
of Java that has to be run in the datacentre.

I wonder about how easy it will be for us all to go to the big hadoop
sites and say java 8+ only, as well as to all those Hadoop projects that
want to run on java 7 and say upgrade time. I think we'll hit a lot of
inertia - and, to be fair, it's due to Hadoop core's long-standing support
for Java 6. If Hadoop 2.x had always been java7+ it would be simpler, but
we all know the trauma of getting hadoop 2.2 out the door and our lack of
enthusiasm for any major dependency updates apart from the protobuf one.


Re: Plans of moving towards JDK7 in trunk

2014-06-21 Thread Arun C. Murthy
Andrew,


 On Jun 21, 2014, at 8:01 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 
 Hi Steve, let me confirm that I understand your proposal correctly:
 
 - Release an intermediate Hadoop 3 a few months out, based on JDK7 and with
 bumped library versions
 - Release a Hadoop 4 mid next year, based on JDK8
 
 I question the utility of an intermediate Hadoop 3 like this. Assuming that
 it gets out in September (i.e. roughly when a 2.6 would land), we're
 looking at a valid lifespan of about 7 months before JDK7 is EOL i

JDK6 eol was Feb 2013 and, a year later, we still have customers using it - 
which means we can't drop it yet.

http://www.oracle.com/technetwork/java/eol-135779.html

Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by 
April of next year... I suspect this means we'd have to support JDK7 at least 
till late 2015. I think that is really key regardless of version numbers.

Furthermore, if we, as a community, maintain discipline in terms of 
wire-compat, rolling-upgrades etc. we are better off making a major release 
every year - as you put it, no more 'Big Bang' releases.

We have to, as a development community, ourselves get over the 'trauma' of 
major releases - I do realize the irony here - but it's requisite to help our 
users feel confident in upgrading at a reasonable rate.

So, something like this could work:
# hadoop-2 / jdk6 - Oct 2013
# hadoop-3 / jdk7 - Oct 2014
# hadoop-4 / jdk8 - Oct 2015

Having said that, it would also be prudent to co-release hadoop-2/hadoop-3 & 
hadoop-3/hadoop-4 with requisite jdk versions. Maybe even hadoop-4 beta by 
middle of 2015. As such, it is a good idea to allow trunk to move to jdk7 now - 
it's good practice as we will have to do the same for jdk8.

It does help, a lot, that we have now de-coupled user dependencies from the 
system with YARN. For example, we could run hadoop-2 MR on hadoop-3 YARN, even if 
there is some work remaining... see MAPREDUCE-4551. Future reliance on 
technologies like Docker will help further.

Thoughts?

Arun

 If this release also breaks compatibility by changing library versions,
 then it looks less and less appealing from a user perspective. I suspect it
 would end up seeing low adoption as everyone waits (at most) 7 months for
 the JDK8-based release to emerge.
 
 I'd be more okay with an intermediate release with no incompatible changes
 whatsoever besides bumping the JDK requirement to JDK7. However, it'd still
 be a weak release considering that branch-2 already runs fine on JDK7, and
 it looks somewhat bad publicly as we burn another major release number less
 than a year since 2.x going GA.
 
 This is why I'd like to keep my original proposal on the table: keep going
 with branch-2 in the near term, while working towards a JDK8-based Hadoop 3
 by April next year. It doesn't need to be a big bang release either. I'd be
 delighted if we could rolling upgrade from one to the other. I just didn't
 want to rule out the inclusion of some very compelling feature outright.
 Trust me though, I'd be the first person to ask about compatibility if such
 a feature does come up.
 
 I'll also posit that people will shy away from using JDK8 features while
 branch-2 remains in active use. There's definitely some new shiny there,
 but nothing compelling enough to me personally when weighed against the
 pain of harder branch-2 backports.
 
 Let's try to keep this thread focused on the planning side of things
 though, deferring JDK-feature-related discussion to a different thread.
 We'd need to draw up a code-style doc on the wiki, but it sounds like
 something Steve and/or I could draft initially.
 
 Thanks,
 Andrew
 
 
 On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy a...@hortonworks.com wrote:
 
 
 On Jun 20, 2014, at 9:51 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 On 20 June 2014 21:35, Steve Loughran ste...@hortonworks.com wrote:
 
 
 This actually argues in favour of
 
 -renaming branch-2 branch-3 after a release
 -making trunk hadoop-4
 
 -getting hadoop 3 released off the new branch-3 out in 2014, effectively
 being an iteration of branch-2 with updated java , moves of (off?)
 guava,
 off jetty, lib changes, but no other significant big bang features
 
 
 Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
 particular, anything that goes into Hadoop 4 for which there's no intent to
 support in hadoop 2 & 3, can use the java 8 language features sooner rather
 than later.
 
 I should add that I'm willing to be the person who gets the Java-7 based
 Hadoop  3.x out the door later this year
 
 +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
 share the pain… ;-)
 
 Arun

Re: Plans of moving towards JDK7 in trunk

2014-06-21 Thread Arun C Murthy

After further consideration, here is an alternate.

On Jun 21, 2014, at 11:14 AM, Arun C. Murthy a...@hortonworks.com wrote:
 
 JDK6 eol was Feb 2013 and, a year later, we still have customers using it 
 - which means we can't drop it yet.
 
 http://www.oracle.com/technetwork/java/eol-135779.html
 
 Given that, it seems highly unlikely everyone will suddenly jump to JDK8 by 
 April of next year... I suspect this means we'd have to support JDK7 at least 
 till late 2015. I think that is really key regardless of version numbers.
 
 Furthermore, if we, as a community, maintain discipline in terms of 
 wire-compat, rolling-upgrades etc. we are better off making a major release 
 every year - as you put it, no more 'Big Bang' releases.


Looking at the big picture, I believe the users of Apache Hadoop would be 
better served by us if we prioritized operational aspects such as rolling 
upgrades, wire-compatibility, binary etc. for a couple of years.

Since not everyone has moved to hadoop-2 yet, talk of more incompatibility 
between hadoop-2/hadoop-3 or between hadoop-3/hadoop-4 within the next 12 
months would certainly be a big issue for users - especially w.r.t rolling 
upgrades, wire-compat etc.

So, I think we should prioritize these operational aspects for users above 
everything else. Sure, jdk versions, features etc. are important, but lower in 
priority.

I'd also like to reiterate my concern on *dropping* support for JDK7 - we 
need to support it till end of 2015 at the very least; happy to ship a version 
of Hadoop which is JDK8 only in 2015 - it just needs to support 
rolling-upgrades from the JDK7 Hadoop till end of 2015.

With that in mind... I actually like Andrew's suggestion below:

  On Jun 21, 2014, at 8:01 AM, Andrew Wang andrew.w...@cloudera.com wrote:
 
  I'd be more okay with an intermediate release with no incompatible changes
  whatsoever besides bumping the JDK requirement to JDK7.

Taking that thought to its logical conclusion, we can de-couple the dual 
concerns of JDK versions and major releases by bumping up our software 
dependencies (JDK, guice etc.) at well-defined and well-articulated releases.

The reason to do so would be to ensure we *do not* sneak in operational 
incompatibilities in the guise of bumping JDK versions.

So, we could do something like:
# hadoop-2.30+ is JDK7, but provides rolling upgrades and wire-compat with 
hadoop-2.2+; say in Oct 2014
# hadoop-2.50+ is JDK8, but provides rolling upgrades and wire-compat with 
hadoop-2.2+; say in June 2015 (or even earlier).

This scheme certainly has some disadvantages; however, it has the significant 
advantage of making it *very* clear to end-users and administrators that we 
take operational aspects seriously.

Also, this is something we have already done, i.e. we updated some of our 
software deps in hadoop-2.4 vs. hadoop-2.2 - clearly nothing as dramatic 
as JDK. Here are some examples:
https://issues.apache.org/jira/browse/HADOOP-9991
https://issues.apache.org/jira/browse/HADOOP-10102
https://issues.apache.org/jira/browse/HADOOP-10103
https://issues.apache.org/jira/browse/HADOOP-10104
https://issues.apache.org/jira/browse/HADOOP-10503

In summary, the key goals we should keep in mind are:
# Operational aspects such as rolling upgrades, wire-compat etc. for the next 
couple of years.
# Support JDK7 till end of 2015 at least, even if we decide to support JDK8 
sometime in 2015. Just ensure wire-compat, rolling-upgrades etc.

Thoughts?

thanks,
Arun


Re: Plans of moving towards JDK7 in trunk

2014-06-21 Thread Alejandro Abdelnur
On Fri, Jun 20, 2014 at 10:02 PM, Arun C Murthy a...@hortonworks.com wrote:

  Hadoop  3.x out the door later this year

 +1 that makes sense to me. Thanks for volunteering Steve - I'm glad to
 share the pain… ;-)


Hey Arun, you may have missed that Andrew volunteered for doing this as
well (the thread is long, so easy to miss).

Cheers

-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Colin McCabe
) was more than a year ago
 
  - premier support ended more than six months ago
 
  - extended support may get critical security fixes until the end of 2016
 
 
 
  Given this timeline, does Cloudera officially take responsibility for
  Hadoop customer safety? Are you going to be releasing critical security
  fixes to a known unsafe JDK?
 
 
 
  Davi
 
 
 
 
 
 
 
   -Original Message-
 
   From: Andrew Wang [mailto:andrew.w...@cloudera.com]
 
   Sent: Wednesday, June 18, 2014 12:33 PM
 
   To: common-dev@hadoop.apache.org
 
   Subject: Re: Plans of moving towards JDK7 in trunk
 
  
 
   Actually, a lot of our customers are still on JDK6, so if anything, its
   popularity hasn't significantly decreased. We still test and support JDK6
   for CDH4 and CDH5. The claim that branch-2 is effectively JDK7 because no
   one supports JDK6 is untrue.

   One issue with your proposal is that java 7+ libraries can have incompatible
   APIs compared to their java 6 versions. Guava moves very quickly with regard
   to the deprecate+remove cycle. This means branch-2 and trunk divergence,
   as we're stuck using different Guava APIs to do the same thing.

   No one's arguing against moving to Java 7+ in trunk eventually, but there
   isn't a clear plan for a trunk-based release. I don't see any point to
   switching trunk over until that's true, for the aforementioned reasons.
 
  
 
   Best,
 
   Andrew
 
  
 
  
 
   On Wed, Jun 18, 2014 at 12:08 PM, Steve Loughran ste...@hortonworks.com
   wrote:
 
  
 
I also think we need to recognise that it's been three months since
that last discussion, and Java 6 has not suddenly burst back into
popularity
 
   
 
   
 
   - nobody providing commercial support for Hadoop is offering branch-2
     support on Java 6 AFAIK
   - therefore, nobody is testing it at scale except privately, and they
     aren't reporting bugs if they are
   - if someone actually did file a bug on something on branch-2 which
     didn't work on Java 6 but went away on Java7+, we'd probably close
     it as a WORKSFORME
 
   
 
   
 
whether we acknowledge it or not, Hadoop 2.x is now really Java 7+.

We do all agree that hadoop 3 will not be java 6, so the only issue is
when and how to make that transition.

That patch of mine just makes it possible to do today.
 
   
 
I have actually jumped to Java7 in the slider project, and have actually
been using Java 8 and twill; the new language features there are
significant and would be great to use in Hadoop *at some point in the
future*

For Java 7 though, based on that experience, the language changes are
convenient but not essential
 
   
 
   - try-with-resources simply swallows close failures without the log
     integration we have with IOUtils.closeStream(), so shouldn't be used in
     hadoop core anyway.
   - string based switching: convenient, but not critical
   - type inference on template constructors. Modern IDEs handle the pain
     anyway
   
 
The only feature I like is multi-catch and typed rethrow

catch(IOException | ExitException e) { log.warn(e.toString()); throw e; }

this would make e look like Exception, but when rethrown go back to
its original type.

This reduces duplicate work, and is the bit I actually value. Is it
enough to justify making code incompatible across branches? No.
   
 
So I'm going to propose this, and would like to start a vote on it
soon

   1. we parameterize java versions in the POMs on all branches, with
      separate JDK versions and Java language
   2. branch-2: java-6-language and JDK-6 minimum JDK
   3. trunk: java-6-language and JDK-7 minimum JDK

This would guarantee that none of the java 7 language features went
in, but we could move trunk up to java 7+ only libraries (jersey,
guava). Adopting JDK7 features then becomes no different from adopting
java7+ libraries: those bits of code that have moved can't be backported.
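
To illustrate the distinction in the proposal between language level and minimum JDK, here is a sketch of code that sticks to Java 6 syntax while depending on an API that only exists from JDK 7 onwards (java.nio.file). The class name and the method are made up for the example. Code like this would fit "java-6-language and JDK-7 minimum JDK", and it could not be backported to a JDK-6 branch without replacing the java.nio.file calls.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Jdk7ApiJava6Syntax {

    // Java 6 syntax only: explicit generic arguments instead of the diamond
    // operator, no try-with-resources, no multi-catch. The JDK 7 dependency
    // is the java.nio.file API (Path, Files.readAllLines).
    static List<String> nonEmptyLines(Path file) throws IOException {
        List<String> result = new ArrayList<String>();
        for (String line : Files.readAllLines(file, Charset.forName("UTF-8"))) {
            if (line.trim().length() > 0) {
                result.add(line);
            }
        }
        return result;
    }
}
```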
 
   
 
-Steve
 
   
 
   
 
   
 
   
 
   
 
On 17 June 2014 22:08, Andrew Wang andrew.w...@cloudera.com wrote:
 
   
 
 Reviving this thread, I noticed there's been a patch and +1 on
 HADOOP-10530, and I don't think we actually reached a conclusion.

 I (and others) have expressed concerns about moving to JDK7 for trunk.
 Summarizing a few points:

 - We can't move to JDK7 in branch-2 because of compatibility
 - branch-2 is currently the only Hadoop release vehicle, there are no
   plans for a trunk-based Hadoop 3
 - Introducing JDK7-only APIs in trunk will increase divergence

Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Colin McCabe
 insecure and deprecated/unpatched JDK?
 
 
 
  I mentioned before in this thread the Oracle support timeline:
 
 
 
  - official public EOL (end of life) was more than a year ago
 
  - premier support ended more than six months ago
 
  - extended support may get critical security fixes until the end of 2016
 
 
 
  Given this timeline, does Cloudera officially take responsibility for
  Hadoop customer safety? Are you going to be releasing critical security
  fixes to a known unsafe JDK?
 
 
 
  Davi
 
 
 
 
 
 
 
   -Original Message-
 
   From: Andrew Wang [mailto:andrew.w...@cloudera.com]
 
   Sent: Wednesday, June 18, 2014 12:33 PM
 
   To: common-dev@hadoop.apache.org
 
   Subject: Re: Plans of moving towards JDK7 in trunk
 
  
 
   Actually, a lot of our customers are still on JDK6, so if anything,
 its
  popularity
 
   hasn't significantly decreased. We still test and support JDK6 for
 CDH4
  and
 
   CDH5. The claim that branch-2 is effectively JDK7 because no one
 supports
 
   JDK6 is untrue.
 
  
 
   One issue with your proposal is that java 7+ libraries can have
  incompatible
 
   APIs compared to their java 6 versions. Guava moves very quickly with
  regard
 
   to the deprecate+remove cycle. This means branch-2 and trunk
 divergence,
 
   as we're stuck using different Guava APIs to do the same thing.
 
  
 
   No one's arguing against moving to Java 7+ in trunk eventually, but
  there isn't
 
   a clear plan for a trunk-based release. I don't see any point to
  switching trunk
 
   over until that's true, for the aforementioned reasons.
 
  
 
   Best,
 
   Andrew
 
  
 
  
 
   On Wed, Jun 18, 2014 at 12:08 PM, Steve Loughran
 
   ste...@hortonworks.commailto:ste...@hortonworks.com
 
   wrote:
 
  
 
I also think we need to recognise that its been three months since
 
that last discussion, and Java 6 has not suddenly burst back into
 
popularity
 
   
 
   
 
   - nobody providing commercial support for Hadoop is offering
  branch-2
 
   support on Java 6 AFAIK
 
   - therefore, nobody is testing it at scale except privately, and
  they
 
   aren't reporting bugs if they are
   - if someone actually did file a bug on something on branch-2 which
   didn't work on Java 6 but went away on Java7+, we'd probably close it as a
   WORKSFORME

whether we acknowledge it or not, Hadoop 2.x is now really Java 7+.

We do all agree that hadoop 3 will not be java 6, so the only issue is
when and how to make that transition.

That patch of mine just makes it possible to do today.

I have actually jumped to Java7 in the slider project, and have actually been
using Java 8 and twill; the new language features there are significant and
would be great to use in Hadoop *at some point in the future*

For Java 7 though, based on that experience, the language changes are
convenient but not essential

   - try-with-resources simply swallows close failures without the log
   integration we have with IOUtils.closeStream(), so shouldn't be used in
   hadoop core anyway.
   - string based switching: convenient, but not critical
   - type inference on template constructors. Modern IDEs handle the pain
   anyway

The only feature I like is multi-catch and typed rethrow

catch(IOException | ExitException e) {
 log.warn(e.toString());
 throw e;
}

this would make e look like Exception, but when rethrown go back to its
original type.

This reduces duplicate work, and is the bit I actually value. Is it enough
to justify making code incompatible across branches? No.

So I'm going to propose this, and would like to start a vote on it soon

   1. we parameterize java versions in the POMs on all branches, with
   separate JDK versions and Java language
   2. branch-2: java-6-language and JDK-6 minimum JDK
   3. trunk: java-6-language and JDK-7 minimum JDK

This would guarantee that none of the java 7 language features went in, but
we could move trunk up to java 7+ only libraries (jersey, guava). Adopting
JDK7 features then becomes no more different from adopting java7+
libraries: those bits of code that have moved can't be backported.

-Steve

On 17 June 2014 22:08, Andrew Wang andrew.w...@cloudera.com wrote:

 Reviving this thread, I noticed there's been a patch and +1 on
 HADOOP-10530, and I don't think we actually reached a conclusion.

 I (and others) have expressed concerns about moving to JDK7 for trunk.
 Summarizing a few points:

 - We can't move to JDK7 in branch-2 because of compatibility
 - branch-2 is currently the only Hadoop release

Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Bryan Beaudreault
  of running unsafe software. Clearly customer best
  interest is stability. JDK6 is in a known unsafe state. The longer anyone
  delays the necessary transition to safety the longer the door is left open
  to predictable disaster.

  You also said we still test and support JDK6. I searched but have not
  been able to find Cloudera critical security fixes for JDK6.

  Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In other
  words, did you release to your customers any kind of public alert or
  warning of this CVSS 10.0 event as part of your JDK6 support?

  http://www.cvedetails.com/cve/CVE-2013-2465/

  If you are not releasing your own security fixes for JDK6 post-EOL would
  it perhaps be safer to say Cloudera is hands-off; neither supports, nor
  opposes the known insecure and deprecated/unpatched JDK?

  I mentioned before in this thread the Oracle support timeline:

  - official public EOL (end of life) was more than a year ago
  - premier support ended more than six months ago
  - extended support may get critical security fixes until the end of 2016

  Given this timeline, does Cloudera officially take responsibility for
  Hadoop customer safety? Are you going to be releasing critical security
  fixes to a known unsafe JDK?

  Davi

   -Original Message-
   From: Andrew Wang [mailto:andrew.w...@cloudera.com]
   Sent: Wednesday, June 18, 2014 12:33 PM
   To: common-dev@hadoop.apache.org
   Subject: Re: Plans of moving towards JDK7 in trunk

   Actually, a lot of our customers are still on JDK6, so if anything, its
   popularity hasn't significantly decreased. We still test and support JDK6
   for CDH4 and CDH5. The claim that branch-2 is effectively JDK7 because no
   one supports JDK6 is untrue.

   One issue with your proposal is that java 7+ libraries can have
   incompatible APIs compared to their java 6 versions. Guava moves very
   quickly with regard to the deprecate+remove cycle. This means branch-2 and
   trunk divergence, as we're stuck using different Guava APIs to do the same
   thing.

   No one's arguing against moving to Java 7+ in trunk eventually, but there
   isn't a clear plan for a trunk-based release. I don't see any point to
   switching trunk over until that's true, for the aforementioned reasons.

   Best,
   Andrew

   On Wed, Jun 18, 2014 at 12:08 PM, Steve Loughran ste...@hortonworks.com
   wrote:

    I also think we need to recognise that it's been three months since
    that last discussion, and Java 6 has not suddenly burst back into
    popularity

       - nobody providing commercial support for Hadoop is offering branch-2
       support on Java 6 AFAIK
       - therefore, nobody is testing it at scale except privately, and they
       aren't reporting bugs if they are
       - if someone actually did file a bug on something on branch-2 which
       didn't work on Java 6 but went away on Java7+, we'd probably close it
       as a WORKSFORME

    whether we acknowledge it or not, Hadoop 2.x is now really Java 7+.

    We do all agree that hadoop 3 will not be java 6, so the only issue is
    when and how to make that transition.

    That patch of mine just makes it possible to do today.

    I have actually jumped to Java7 in the slider project, and have actually
    been using Java 8 and twill; the new language features there are
    significant and would be great to use in Hadoop *at some point in the
    future*

    For Java 7 though, based on that experience, the language changes are
    convenient but not essential

       - try-with-resources simply swallows close failures without the log
       integration we have with IOUtils.closeStream(), so shouldn't be used
       in hadoop core anyway.
       - string based switching: convenient, but not critical
       - type inference on template constructors. Modern IDEs handle the pain
       anyway

    The only feature I like is multi-catch and typed rethrow

    catch(IOException | ExitException e) {
     log.warn(e.toString());
     throw e;
    }

    this would make e look like Exception, but when rethrown go back to
    its original type.

    This reduces duplicate work, and is the bit I actually value. Is it
    enough to justify making code incompatible across branches? No.

    So I'm going to propose this, and would like to start a vote on it soon

       1. we parameterize java

Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Aaron T. Myers
, we are
fragmenting the project.  People will start writing unreleaseable code
(because it doesn't work on branch-2) and we'll be back to the bad old
days of Hadoop version fragmentation that branch-2 was intended to
solve.  Backports will become harder.  The biggest problem is that
trunk will start to depend on libraries or Maven plugins that branch-2
can't even use, because they're JDK7+-only.

Steve wrote: if someone actually did file a bug on something on
branch-2 which didn't work on Java 6 but went away on Java7+, we'd
probably close it as a WORKSFORME.

Steve, if this is true, we should just bump the minimum supported
version for branch-2 to 1.7 today and resolve this.  If we truly
believe that there are no issues here, then let's just decide to drop
1.6 in a specific future release of Hadoop 2.  If there are issues
with releasing JDK1.7+ only code, then let's figure out what they are
before proceeding.

best,
Colin

On Wed, Jun 18, 2014 at 1:41 PM, Sandy Ryza sandy.r...@cloudera.com
wrote:
 We do release warnings when we are aware of vulnerabilities in our
 dependencies.

 However, unless I'm grossly misunderstanding, the vulnerability that you
 point out is not a vulnerability within the context of our software.
 Hadoop doesn't try to sandbox within JVMs.  In a secure setup, any JVM
 running non-trusted user code is running as that user, so breaking out
 doesn't offer the ability to do anything malicious.

 -Sandy

 On Wed, Jun 18, 2014 at 1:30 PM, Ottenheimer, Davi davi.ottenhei...@emc.com
 wrote:

 Andrew,

 “I don't see any point to switching” is an interesting perspective, given
 the well-known risks of running unsafe software. Clearly customer best
 interest is stability. JDK6 is in a known unsafe state. The longer anyone
 delays the necessary transition to safety the longer the door is left open
 to predictable disaster.

 You also said we still test and support JDK6. I searched but have not
 been able to find Cloudera critical security fixes for JDK6.

 Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In other
 words, did you release to your customers any kind of public alert or
 warning of this CVSS 10.0 event as part of your JDK6 support?

 http://www.cvedetails.com/cve/CVE-2013-2465/

 If you are not releasing your own security fixes for JDK6 post-EOL would
 it perhaps be safer to say Cloudera is hands-off; neither supports, nor
 opposes the known insecure and deprecated/unpatched JDK?

 I mentioned before in this thread the Oracle support timeline:

 - official public EOL (end of life) was more than a year ago
 - premier support ended more than six months ago
 - extended support may get critical security fixes until the end of 2016

 Given this timeline, does Cloudera officially take responsibility for
 Hadoop customer safety? Are you going to be releasing critical security
 fixes to a known unsafe JDK?

 Davi

  -Original Message-
  From: Andrew Wang [mailto:andrew.w...@cloudera.com]
  Sent: Wednesday, June 18, 2014 12:33 PM
  To: common-dev@hadoop.apache.org
  Subject: Re: Plans of moving towards JDK7 in trunk

  Actually, a lot of our customers are still on JDK6, so if anything, its
  popularity hasn't significantly decreased. We still test and support JDK6
  for CDH4 and CDH5. The claim that branch-2 is effectively JDK7 because no
  one supports JDK6 is untrue.

  One issue with your proposal is that java 7+ libraries can have
  incompatible APIs compared to their java 6 versions. Guava moves very
  quickly with regard to the deprecate+remove cycle. This means branch-2 and
  trunk divergence, as we're stuck using different Guava APIs to do the same
  thing.

  No one's arguing against moving to Java 7+ in trunk eventually, but there
  isn't a clear plan for a trunk-based release. I don't see any point to
  switching trunk over until that's true, for the aforementioned reasons.

  Best,
  Andrew

  On Wed, Jun 18, 2014 at 12:08 PM, Steve Loughran ste...@hortonworks.com
  wrote:

   I also think we need to recognise that it's been three months since
   that last discussion, and Java 6 has not suddenly burst

Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Steve Loughran
On 20 June 2014 17:01, Andrew Wang andrew.w...@cloudera.com wrote:

 Thanks everyone for the discussion so far. I talked with some of our other
 teams and thought about the issue some more.

 Regarding branch-2, we can't do much because of compatibility. Dropping
 support for a JDK is supposed to happen in a major release. I think we all
 understand this though, so it's not really under discussion.


...although we can just rename hadoop 2.6 as hadoop 3.0 and make that
the java 7 switch.



 Regarding trunk, I think that leapfrogging to JDK8 is the right move. JDK7
 is EOL April next year, so it'd be better to avoid going through this pain
 twice so soon. Developer momentum also seems very strong behind JDK8
 because of all the shiny new features, so I think we'll see quick adoption.
 We also need some time to clean up APIs and I'm sure people have big,
 incompatible project ideas floating around they'd like to get in.



 With the JDK7 EOL in mind, we need a JDK8-based 3.0 release by mid next
 year. Since I have a strong interest in all these things, I'd like to
 volunteer as release manager for this beast. This means, yep, I'll wrangle
 the builds, worry about compat, bump lib versions, and all those other fun
 tasks. There's clearly a lot to discuss logistically (let's take that to a
 different thread), but this feels like the right way forward to me.

 Best,
 Andrew



I feel the appeal of a jump to java 8, but also fear that it will postpone
that release even more.

If we had a java 7 flag today, we could think -as Raymie proposed- about
having a hadoop-only-runs-on-java7 release relatively easily. There's no
technical cost to migrate to java7, as it is effectively the java version
hadoop is running on. All we would be doing is documenting the fact.

In contrast, even making sure the entire Hadoop stack runs on Java 8 is a
major undertaking -which I know, as TWILL-82 shows that it isn't widely
tested. That's making sure it works -not even the big project ideas and any
java 8 migration.

which is something that worries me here: "big, incompatible project ideas
floating around they'd like to get in."

There's a risk that this becomes an opportunity for everything to go in, it
ends up taking too long and being pushed out, Hadoop 2.x frozen in java 6
mode for its code and all its dependencies, for at least another year
-which is what is being proposed here.

-steve

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Steve Loughran
​Having gone back through the entire thread I can see we've made progress
here, as the discussion has moved on from when to move to java 7 to when to
move to java 8... which I've always felt the appeal of from the
coding-side. Java 8 tomorrow is the most compelling reason to move to java
7 today.

It also sets us up to thinking about java 9, where there are already early
access releases available on java.net.

But, as Colin McCabe cmcc...@alumni.cmu.edu wrote on 14 April 2014 09:22,
 I think the bottom line here is that as long as our stable release
 uses JDK6, there is going to be a very, very strong disincentive to
 put any code which can't run on JDK6 into trunk.

and there's a problem. I'm seeing push back now on flipping the java7 bit,
I can imagine how a patch that went to java 8 and added some of the new
streams operations would go down? Java 8 is radically different enough
code-wise from java 6 that if you embrace those new features, you don't
stand a chance of backporting.
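
To make the backporting concern concrete, here is a rough sketch of the same small helper written twice: once with Java 8 streams and lambdas, once in Java 6-compatible style. The class, method names, and the "dead:" prefix convention are invented for illustration and are not taken from Hadoop. The first variant cannot be compiled for a Java 6 or Java 7 target, so a patch built around it would have to be rewritten into something like the second variant before it could go into an older branch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example; none of these names come from the Hadoop codebase.
class BackportDemo {

    // Java 8 style: streams, a lambda, and a method reference.
    static List<String> liveNodesJava8(List<String> nodes) {
        return nodes.stream()
                .filter(n -> !n.startsWith("dead:"))
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Java 6 style: explicit loop and explicit generic type arguments.
    static List<String> liveNodesJava6(List<String> nodes) {
        List<String> result = new ArrayList<String>();
        for (String n : nodes) {
            if (!n.startsWith("dead:")) {
                result.add(n.toUpperCase());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("nn1", "dead:dn2", "dn3");
        System.out.println(liveNodesJava8(nodes));
        System.out.println(liveNodesJava6(nodes));
    }
}
```

Both methods return the same list; the difference is purely in which compiler target can accept the source.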

We need to move to a more recent java version in release hadoop, so that
trunk and backported code can use java 7 code and libraries. Then trunk can
flip the java 8 jvm bit -while still using java7 language- for as long as
we plan to be able to move code/patches from trunk to release.

This actually argues in favour of

-renaming branch-2 to branch-3 after a release
-making trunk hadoop-4

-getting hadoop 3 released off the new branch-3 out in 2014, effectively
being an iteration of branch-2 with updated java, moves of (off?) guava,
off jetty, lib changes, but no other significant big bang features


Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
particular, anything that goes into Hadoop 4 for which there's no intent to
support in hadoop 2 & 3, can use the java 8 language features sooner rather
than later.

-Steve



Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Steve Loughran
On 20 June 2014 21:35, Steve Loughran ste...@hortonworks.com wrote:


 This actually argues in favour of

 -renaming branch-2 branch-3 after a release
 -making trunk hadoop-4

 -getting hadoop 3 released off the new branch-3 out in 2014, effectively
 being an iteration of branch-2 with updated java , moves of (off?) guava,
 off jetty, lib changes, but no other significant big bang features


 Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
 particular, anything that goes into Hadoop 4 for which there's no intent to
 support in hadoop 2  3, can use the java 8 language features sooner rather
 than later.



I should add that I'm willing to be the person who gets the Java-7 based
Hadoop 3.x out the door later this year



Re: Plans of moving towards JDK7 in trunk

2014-06-20 Thread Arun C Murthy

On Jun 20, 2014, at 9:51 PM, Steve Loughran ste...@hortonworks.com wrote:

 On 20 June 2014 21:35, Steve Loughran ste...@hortonworks.com wrote:
 
 
 This actually argues in favour of
 
 -renaming branch-2 branch-3 after a release
 -making trunk hadoop-4
 
 -getting hadoop 3 released off the new branch-3 out in 2014, effectively
 being an iteration of branch-2 with updated java , moves of (off?) guava,
 off jetty, lib changes, but no other significant big bang features
 
 
 Hadoop 4.x then becomes the 2015 release, which can add more stuff. In
 particular, anything that goes into Hadoop 4 for which there's no intent to
 support in hadoop 2  3, can use the java 8 language features sooner rather
 than later.
 
 
 
 I should add that I'm willing to be the person who gets the Java-7 based
 Hadoop  3.x out the door later this year

+1 that makes sense to me. Thanks for volunteering Steve - I'm glad to share 
the pain… ;-)

Arun


Re: Plans of moving towards JDK7 in trunk

2014-06-19 Thread Andrew Purtell
There are a number of security (algorithm, not vulnerability) and
performance improvements that landed in 8, not 7. As a runtime for the
performance conscious, it might be recommendable. I've come across GC
issues in 6 or 7 where, talking with some Java platform folks, the first
suggested course of action is try again with 8. Would it be more of the
current moment if this discussion was about setting guidelines that
prescribe when and when not to use 8+ language features, or concurrency
library improvements that rely on intrinsics only available in the 8
runtime? Has the Java 6 ship sailed? Just set the minimum supported JDK and
runtime at 7 at next release? Use of the diamond operator (<>) or multicatch
wouldn't and shouldn't need to be debated, they are quite minor. On the other hand, I
would imagine discussion and debate on what 8+ language features might be
useful to use at some future time could be a lively one.



On Wed, Jun 18, 2014 at 3:03 PM, Colin McCabe cmcc...@alumni.cmu.edu
wrote:

 In CDH5, Cloudera encourages people to use JDK7.  JDK6 has been EOL
 for a while now and is not something we recommend.

 As we discussed before, everyone is in favor of upgrading to JDK7.
 Every cluster operator of a reasonably modern Hadoop should do it
 whatever distro or release you run.  As developers, we run JDK7 as
 well.

 I'd just like to see a plan for when branch-2 (or some other branch)
 will create a stable release that drops support for JDK1.6.  If we
 don't have such a plan, I feel like it's too early to talk about this
 stuff.

 If we drop support for 1.6 in trunk but not in branch-2, we are
 fragmenting the project.  People will start writing unreleaseable code
 (because it doesn't work on branch-2) and we'll be back to the bad old
 days of Hadoop version fragmentation that branch-2 was intended to
 solve.  Backports will become harder.  The biggest problem is that
 trunk will start to depend on libraries or Maven plugins that branch-2
 can't even use, because they're JDK7+-only.

 Steve wrote: if someone actually did file a bug on something on
 branch-2 which didn't work on Java 6 but went away on Java7+, we'd
 probably close it as a WORKSFORME.

 Steve, if this is true, we should just bump the minimum supported
 version for branch-2 to 1.7 today and resolve this.  If we truly
 believe that there are no issues here, then let's just decide to drop
 1.6 in a specific future release of Hadoop 2.  If there are issues
 with releasing JDK1.7+ only code, then let's figure out what they are
 before proceeding.

 best,
 Colin


 On Wed, Jun 18, 2014 at 1:41 PM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  We do release warnings when we are aware of vulnerabilities in our
  dependencies.
 
  However, unless I'm grossly misunderstanding, the vulnerability that you
  point out is not a vulnerability within the context of our software.
   Hadoop doesn't try to sandbox within JVMs.  In a secure setup, any JVM
  running non-trusted user code is running as that user, so breaking out
  doesn't offer the ability to do anything malicious.
 
  -Sandy
 
  On Wed, Jun 18, 2014 at 1:30 PM, Ottenheimer, Davi 
 davi.ottenhei...@emc.com
  wrote:
 
  Andrew,
 
 
 
  “I don't see any point to switching” is an interesting perspective,
 given
  the well-known risks of running unsafe software. Clearly customer best
  interest is stability. JDK6 is in a known unsafe state. The longer
 anyone
  delays the necessary transition to safety the longer the door is left
 open
  to predictable disaster.
 
 
 
  You also said we still test and support JDK6. I searched but have not
  been able to find Cloudera critical security fixes for JDK6.
 
 
 
  Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In
 other
  words, did you release to your customers any kind of public alert or
  warning of this CVSS 10.0 event as part of your JDK6 support?
 
 
 
  http://www.cvedetails.com/cve/CVE-2013-2465/
 
 
 
  If you are not releasing your own security fixes for JDK6 post-EOL would
  it perhaps be safer to say Cloudera is hands-off; neither supports, nor
  opposes the known insecure and deprecated/unpatched JDK?
 
 
 
  I mentioned before in this thread the Oracle support timeline:
 
 
 
  - official public EOL (end of life) was more than a year ago
 
  - premier support ended more than six months ago
 
  - extended support may get critical security fixes until the end of 2016
 
 
 
  Given this timeline, does Cloudera officially take responsibility for
  Hadoop customer safety? Are you going to be releasing critical security
  fixes to a known unsafe JDK?
 
 
 
  Davi
 
 
 
 
 
 
 
   -Original Message-
 
   From: Andrew Wang [mailto:andrew.w...@cloudera.com]
 
   Sent: Wednesday, June 18, 2014 12:33 PM
 
   To: common-dev@hadoop.apache.org
 
   Subject: Re: Plans of moving towards JDK7 in trunk
 
  
 
   Actually, a lot of our customers are still on JDK6, so if anything,
 its
  popularity
 
   hasn't significantly decreased. We

Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Colin McCabe
I think we should come up with a plan for when the next Hadoop release
will drop support for JDK6.  We all know that day needs to come... the
only question is when.  I agree that writing the JDK7-only code
doesn't seem very productive unless we have a plan for when it will be
released and usable.

best,
Colin

On Tue, Jun 17, 2014 at 10:08 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Reviving this thread, I noticed there's been a patch and +1 on
 HADOOP-10530, and I don't think we actually reached a conclusion.

 I (and others) have expressed concerns about moving to JDK7 for trunk.
 Summarizing a few points:

 - We can't move to JDK7 in branch-2 because of compatibility
 - branch-2 is currently the only Hadoop release vehicle, there are no plans
 for a trunk-based Hadoop 3
 - Introducing JDK7-only APIs in trunk will increase divergence with
 branch-2 and make backports harder
 - Almost all developers care only about branch-2, since it is the only
 release vehicle

 With this in mind, I struggle to see any upsides to introducing JDK7-only
 APIs to trunk. Please let's not do anything on HADOOP-10530 or related
 until we agree on this.

 Thanks,
 Andrew


 On Mon, Apr 14, 2014 at 3:31 PM, Steve Loughran ste...@hortonworks.com
 wrote:

 On 14 April 2014 17:46, Andrew Purtell apurt...@apache.org wrote:

  How well is trunk tested? Does anyone deploy it with real applications
  running on top? When will the trunk codebase next be the basis for a
  production release? An impromptu diff of hadoop-common trunk against
  branch-2 as of today is 38,625 lines. Can they be said to be the same
  animal? I ask because any disincentive toward putting code in trunk is
  beside the point, if the only target worth pursuing today is branch-2
  unless one doesn't care if the code is released for production use.
  Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter for
  the vast majority of Hadoopers if talking about branch-2.
 
 
 I think its partly a timescale issue; its also because the 1-2 transition
 was so significant, especially at the YARN layer, that it's still taking
 time to trickle through.

 If you do want code to ship this year, branch-2 is where you are going to
 try and get it in -and like you say, that's where things get tried in the
 field. At the same time, the constraints of stability are holding us back
 -already-.

 I don't see why we should have such another major 1-2 transition in future;
 the rate that Arun is pushing out 2.x releases its almost back to the 0.1x
 timescale -though at that point most people were fending for themselves and
 expectations of stability were less. We do want smaller version increments
 in future, which branch-2 is -mostly- delivering.

 While Java 7 doesn't have some must-have features, Java 8 is a significant
 improvement in the language, and we should be looking ahead to that, maybe
 even doing some leading-edge work on the side, so the same discussion
 doesn't come up in two years time when java 7 goes EOL.


 -steve

 (personal opinions only, etc, )


 
  On Mon, Apr 14, 2014 at 9:22 AM, Colin McCabe cmcc...@alumni.cmu.edu
  wrote:
 
   I think the bottom line here is that as long as our stable release
   uses JDK6, there is going to be a very, very strong disincentive to
   put any code which can't run on JDK6 into trunk.
  
   Like I said earlier, the traditional reason for putting something in
   trunk but not the stable release is that it needs more testing.  If a
   stable release that drops support for JDK6 is more than a year away,
   does it make sense to put anything in trunk like that?  What might
   need more than a year of testing?  Certainly not changes to
   LocalFileSystem to use the new APIs.  I also don't think an upgrade to
   various libraries qualifies.
  
   It might be best to shelve this for now, like we've done in the past,
   until we're ready to talk about a stable release that requires JDK7+.
   At least that's my feeling.
  
   If we're really desperate for the new file APIs JDK7 provides, we
   could consider using loadable modules for it in branch-2.  This is
   similar to how we provide JNI versions of certain things on certain
   platforms, without dropping support for the other platforms.
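
A minimal sketch of what such a loadable module could look like, assuming a small interface with a Java 6-safe fallback and a JDK7-only implementation that is looked up by name at runtime. Every name here (FileProbe, LegacyProbe, NioProbe, load) is hypothetical and not part of Hadoop. In a real build the JDK7-only class would live in a separately compiled module so that the Java 6 code never links against it directly; here both sit in one file only to keep the example self-contained.

```java
import java.io.File;

// Hypothetical example; these types are not Hadoop APIs.
class LoadableModuleDemo {

    interface FileProbe {
        String describe(File f);
    }

    // Works on any JDK: uses only java.io.
    static class LegacyProbe implements FileProbe {
        public String describe(File f) {
            return "legacy:" + f.getName();
        }
    }

    // Needs JDK7+: File.toPath() and java.nio.file.Path.
    static class NioProbe implements FileProbe {
        public String describe(File f) {
            return "nio:" + f.toPath().getFileName();
        }
    }

    // Picks the newer implementation only if markerClass can be loaded,
    // and falls back to the legacy one on any reflective or linkage failure.
    static FileProbe load(String markerClass, String implClass) {
        try {
            Class.forName(markerClass);
            return (FileProbe) Class.forName(implClass)
                    .getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            return new LegacyProbe();
        } catch (LinkageError e) {
            return new LegacyProbe();
        }
    }

    public static void main(String[] args) {
        FileProbe probe = load("java.nio.file.Files", NioProbe.class.getName());
        System.out.println(probe.describe(new File("data/a.txt")));
    }
}
```

The catch blocks are kept separate on purpose, since the selection code itself would have to stay compilable at the Java 6 language level.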
  
   best,
   Colin
  
   On Sun, Apr 13, 2014 at 10:39 AM, Raymie Stata rst...@altiscale.com
   wrote:
There's an outstanding question addressed to me: Are there
 particular
features or new dependencies that you would like to contribute (or
 see
contributed) that require using the Java 1.7 APIs?  The question
misses the point: We'd figure out how to write something we wanted to
contribute to Hadoop against the APIs of Java4 if that's what it took
to get them into a stable release.  And at current course and speed,
that's how ridiculous things could get.
   
To summarize, it seems like there's a vague consensus that it might
 be
okay to eventually allow the use of Java7 in trunk, but there's no

Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Steve Loughran
I also think we need to recognise that it's been three months since that
last discussion, and Java 6 has not suddenly burst back into popularity


   - nobody providing commercial support for Hadoop is offering branch-2
   support on Java 6 AFAIK
   - therefore, nobody is testing it at scale except privately, and they
   aren't reporting bugs if they are
   - if someone actually did file a bug on something on branch-2 which
   didn't work on Java 6 but went away on Java7+, we'd probably close it as a
   WORKSFORME


whether we acknowledge it or not, Hadoop 2.x is now really Java 7+.

We do all agree that hadoop 3 will not be java 6, so the only issue is
when and how to make that transition.

That patch of mine just makes it possible to do today.

I have actually jumped to Java7 in the slider project, and have actually been
using Java 8 and twill; the new language features there are significant and
would be great to use in Hadoop *at some point in the future*

For Java 7 though, based on that experience, the language changes are
convenient but not essential

   - try-with-resources simply swallows close failures without the log
   integration we have with IOUtils.closeStream(), so shouldn't be used in
   hadoop core anyway.
   - string based switching: convenient, but not critical
   - type inference on template constructors. Modern IDEs handle the pain
   anyway
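
For readers weighing the try-with-resources point, here is a small self-contained sketch of how Java 7 treats a failing close(). As far as the language rules go, a close() failure is not silently dropped: if the body also failed, the close() exception is attached to the primary one as a suppressed exception, and if only close() failed, that exception propagates to the caller. What is missing is any automatic logging, which seems to be the concern above. FailingResource and closeAndLog are invented for this example; closeAndLog is only a stand-in for a log-and-continue close helper and is not Hadoop's IOUtils.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical example; not Hadoop code.
class CloseFailureDemo {

    // A resource whose close() always fails, to make the behaviour visible.
    static class FailingResource implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    // Body and close() both fail: the body's exception wins and the
    // close() failure is recorded via Throwable.getSuppressed().
    static Throwable bodyAndCloseFail() {
        Throwable seen = null;
        try (FailingResource r = new FailingResource()) {
            throw new IllegalStateException("body failed");
        } catch (Exception e) {
            seen = e;
        }
        return seen;
    }

    // Only close() fails: the IOException from close() reaches the caller.
    static Throwable onlyCloseFails() {
        Throwable seen = null;
        try (FailingResource r = new FailingResource()) {
            // body completes normally
        } catch (IOException e) {
            seen = e;
        }
        return seen;
    }

    // Stand-in for a log-and-continue close: returns the warning text
    // that a real helper would hand to its logger, or null on success.
    static String closeAndLog(Closeable c) {
        try {
            c.close();
            return null;
        } catch (IOException e) {
            return "WARN close failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Throwable both = bodyAndCloseFail();
        System.out.println(both + " suppressed=" + both.getSuppressed().length);
        System.out.println(onlyCloseFails());
        System.out.println(closeAndLog(new FailingResource()));
    }
}
```

Whether propagating or logging is the better policy for a given stream is a design choice; the sketch only shows what each approach does with the failure.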

The only feature I like is multi-catch and typed rethrow

catch(IOException | ExitException e) {
 log.warn(e.toString());
 throw e;
}

this would make e look like Exception, but when rethrown go back to its
original type.
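
A compilable version of that idea, as a sketch: the ExitException below is a local checked exception defined only for this example (it is not the Hadoop class of that name), and a plain list stands in for the logger. The point it illustrates is that the method can declare exactly the two exception types, and the rethrown exception keeps its precise type for the caller even though both are handled by a single catch block.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not Hadoop code.
class MultiCatchDemo {

    // Local stand-in for the ExitException named in the snippet above.
    static class ExitException extends Exception {
        ExitException(String msg) {
            super(msg);
        }
    }

    // Stand-in for log.warn(...): collects the warning text.
    static final List<String> WARNINGS = new ArrayList<String>();

    // One handler for both types; "throw e" rethrows with the precise type,
    // so the throws clause does not need to widen to Exception.
    static void run(int mode) throws IOException, ExitException {
        try {
            if (mode == 0) {
                throw new IOException("disk error");
            }
            if (mode == 1) {
                throw new ExitException("exit requested");
            }
        } catch (IOException | ExitException e) {
            WARNINGS.add(e.toString());
            throw e;
        }
    }

    // Shows that callers can still catch each type separately.
    static String outcome(int mode) {
        try {
            run(mode);
            return "ok";
        } catch (IOException e) {
            return "IOException";
        } catch (ExitException e) {
            return "ExitException";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(0));
        System.out.println(outcome(1));
        System.out.println(outcome(2));
    }
}
```

On Java 6 the same behaviour needs two near-identical catch blocks, which is the duplicate work referred to here.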

This reduces duplicate work, and is the bit I actually value. Is it enough
to justify making code incompatible across branches? No.

So I'm going to propose this, and would like to start a vote on it soon


   1. we parameterize java versions in the POMs on all branches, with
   separate JDK versions and Java language
   2. branch-2: java-6-language and JDK-6 minimum JDK
   3. trunk: java-6-language and JDK-7 minimum JDK

This would guarantee that none of the java 7 language features went in, but
we could move trunk up to java 7+ only libraries (jersey, guava). Adopting
JDK7 features then becomes no more different from adopting java7+
libraries: those bits of code that have moved can't be backported.
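
The proposal itself is about build settings in the POMs, which are not reproduced here. As a rough runtime-side illustration only of what "minimum JDK" means independently of the language level, the sketch below reads the standard java.specification.version system property ("1.6", "1.7", "1.8", then "9", "11", and so on) and rejects anything below a chosen minimum. The class and method names are invented for this example, and this is not how the proposal says the check would be implemented.

```java
// Hypothetical example; not part of Hadoop or its build.
class MinimumJdkCheck {

    // Maps "1.6" -> 6, "1.7" -> 7, "1.8" -> 8, "9" -> 9, "11" -> 11.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Fails fast when the running JDK is older than the required minimum.
    static void requireAtLeast(int minimum, String specVersion) {
        int actual = majorVersion(specVersion);
        if (actual < minimum) {
            throw new IllegalStateException(
                    "Requires Java " + minimum + "+ but found Java " + actual);
        }
    }

    public static void main(String[] args) {
        // Trunk in the proposal: JDK-7 minimum, whatever the source level is.
        requireAtLeast(7, System.getProperty("java.specification.version"));
        System.out.println("JDK check passed");
    }
}
```

Note that the source of this class uses only Java 6 language features, which mirrors the split in the proposal: the language level and the minimum JDK are two separate knobs.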

-Steve





On 17 June 2014 22:08, Andrew Wang andrew.w...@cloudera.com wrote:

 Reviving this thread, I noticed there's been a patch and +1 on
 HADOOP-10530, and I don't think we actually reached a conclusion.

 I (and others) have expressed concerns about moving to JDK7 for trunk.
 Summarizing a few points:

 - We can't move to JDK7 in branch-2 because of compatibility
 - branch-2 is currently the only Hadoop release vehicle, there are no plans
 for a trunk-based Hadoop 3
 - Introducing JDK7-only APIs in trunk will increase divergence with
 branch-2 and make backports harder
 - Almost all developers care only about branch-2, since it is the only
 release vehicle

 With this in mind, I struggle to see any upsides to introducing JDK7-only
 APIs to trunk. Please let's not do anything on HADOOP-10530 or related
 until we agree on this.

 Thanks,
 Andrew


 On Mon, Apr 14, 2014 at 3:31 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  On 14 April 2014 17:46, Andrew Purtell apurt...@apache.org wrote:
 
   How well is trunk tested? Does anyone deploy it with real applications
   running on top? When will the trunk codebase next be the basis for a
   production release? An impromptu diff of hadoop-common trunk against
   branch-2 as of today is 38,625 lines. Can they be said to be the same
   animal? I ask because any disincentive toward putting code in trunk is
   beside the point, if the only target worth pursuing today is branch-2
   unless one doesn't care if the code is released for production use.
   Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter
 for
   the vast majority of Hadoopers if talking about branch-2.
  
  
  I think it's partly a timescale issue; it's also because the 1-2 transition
  was so significant, especially at the YARN layer, that it's still taking
  time to trickle through.
 
  If you do want code to ship this year, branch-2 is where you are going to
  try and get it in -and like you say, that's where things get tried in the
  field. At the same time, the constraints of stability are holding us back
  -already-.
 
  I don't see why we should have such another major 1-2 transition in
 future;
  the rate that Arun is pushing out 2.x releases its almost back to the
 0.1x
  timescale -though at that point most people were fending for themselves
 and
  expectations of stability were less. We do want smaller version
 increments
  in future, which branch-2 is -mostly- delivering.
 
  While Java 7 doesn't have some must-have features, Java 8 is a
 significant
  improvement in the language, and we should be looking ahead to that,
 maybe
  even doing some leading-edge work on the side, so the same 

Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Andrew Wang
Actually, a lot of our customers are still on JDK6, so if anything, its
popularity hasn't significantly decreased. We still test and support JDK6
for CDH4 and CDH5. The claim that branch-2 is effectively JDK7 because no
one supports JDK6 is untrue.

One issue with your proposal is that java 7+ libraries can have
incompatible APIs compared to their java 6 versions. Guava moves very
quickly with regard to the deprecate+remove cycle. This means branch-2 and
trunk divergence, as we're stuck using different Guava APIs to do the same
thing.

No one's arguing against moving to Java 7+ in trunk eventually, but there
isn't a clear plan for a trunk-based release. I don't see any point to
switching trunk over until that's true, for the aforementioned reasons.

Best,
Andrew


On Wed, Jun 18, 2014 at 12:08 PM, Steve Loughran ste...@hortonworks.com
wrote:

 I also think we need to recognise that it's been three months since that
 last discussion, and Java 6 has not suddenly burst back into popularity


- nobody providing commercial support for Hadoop is offering branch-2
support on Java 6 AFAIK
- therefore, nobody is testing it at scale except privately, and they
aren't reporting bugs if they are
- if someone actually did file a bug on something on branch-2 which
didn't work on Java 6 but went away on Java7+, we'd probably close it
 as a
WORKSFORME


 whether we acknowledge it or not, Hadoop 2.x is now really Java 7+.

 We do all agree that hadoop 3 will not be java 6, so the only issue is
 when and how to make that transition.

 That patch of mine just makes it possible to do today.

 I have actually jumped to Java7 in the slider project, and have actually been
 using Java 8 and twill; the new language features there are significant and
 would be great to use in Hadoop *at some point in the future*

 For Java 7 though, based on that experience, the language changes are
 convenient but not essential

 - try-with-resources simply swallows close failures without the log
   integration we have with IOUtils.closeStream(), so shouldn't be used in
   hadoop core anyway.
 - string based switching: convenient, but not critical
 - type inference on template constructors. Modern IDEs handle the pain
   anyway
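
A note on the try-with-resources item, offered as a sketch rather than a ruling on Hadoop style. As far as the Java 7 language rules go, a close() failure is not dropped outright: if the body succeeds, the close() exception propagates; if the body also fails, the close() exception is attached to the primary exception as a suppressed one. What is missing is any automatic logging of that failure. FailingResource and the method names below are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;

class TryWithResourcesSketch {

    /** Hypothetical resource whose close() always fails. */
    static class FailingResource implements Closeable {
        @Override
        public void close() throws IOException {
            throw new IOException("close failed");
        }
    }

    /** Body succeeds, so the close() failure is what the caller sees. */
    static String closeFailureOnly() {
        try (FailingResource r = new FailingResource()) {
            // body completes normally
        } catch (IOException e) {
            return e.getMessage();
        }
        return "nothing thrown";
    }

    /** Body fails too, so the close() failure is recorded as suppressed. */
    static String bodyAndCloseFailure() {
        try (FailingResource r = new FailingResource()) {
            throw new IOException("body failed");
        } catch (IOException e) {
            Throwable[] suppressed = e.getSuppressed();
            return e.getMessage() + " / suppressed: " + suppressed[0].getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(closeFailureOnly());
        System.out.println(bodyAndCloseFailure());
    }
}
```

Nothing is logged in either case unless the caller does it, so code that relies on close failures showing up in the logs would still need its own handling.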

 The only feature I like is multi-catch and typed rethrow

 catch(IOException | ExitException e) {
  log.warn(e.toString());
  throw e;
 }

 this would make e look like Exception, but when rethrown go back to its
 original type.

 This reduces duplicate work, and is the bit I actually value. Is it enough
 to justify making code incompatible across branches? No.

 So I'm going to propose this, and would like to start a vote on it soon


1. we parameterize java versions in the POMs on all branches, with
separate JDK versions and Java language
2. branch-2: java-6-language and JDK-6 minimum JDK
3. trunk: java-6-language and JDK-7 minimum JDK

 This would guarantee that none of the java 7 language features went in, but
 we could move trunk up to java 7+ only libraries (jersey, guava). Adopting
 JDK7 features then becomes no more different from adopting java7+
 libraries: those bits of code that have moved can't be backported.

 -Steve





 On 17 June 2014 22:08, Andrew Wang andrew.w...@cloudera.com wrote:

  Reviving this thread, I noticed there's been a patch and +1 on
  HADOOP-10530, and I don't think we actually reached a conclusion.
 
  I (and others) have expressed concerns about moving to JDK7 for trunk.
  Summarizing a few points:
 
  - We can't move to JDK7 in branch-2 because of compatibility
  - branch-2 is currently the only Hadoop release vehicle, there are no
 plans
  for a trunk-based Hadoop 3
  - Introducing JDK7-only APIs in trunk will increase divergence with
  branch-2 and make backports harder
  - Almost all developers care only about branch-2, since it is the only
  release vehicle
 
  With this in mind, I struggle to see any upsides to introducing JDK7-only
  APIs to trunk. Please let's not do anything on HADOOP-10530 or related
  until we agree on this.
 
  Thanks,
  Andrew
 
 
  On Mon, Apr 14, 2014 at 3:31 PM, Steve Loughran ste...@hortonworks.com
  wrote:
 
   On 14 April 2014 17:46, Andrew Purtell apurt...@apache.org wrote:
  
How well is trunk tested? Does anyone deploy it with real
 applications
running on top? When will the trunk codebase next be the basis for a
production release? An impromptu diff of hadoop-common trunk against
branch-2 as of today is 38,625 lines. Can they be said to be the same
animal? I ask because any disincentive toward putting code in trunk
 is
beside the point, if the only target worth pursuing today is branch-2
unless one doesn't care if the code is released for production use.
Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter
  for
the vast majority of Hadoopers if talking about branch-2.
   
   
   I think its partly a timescale issue; its also because the 1-2
 

Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Steve Loughran
On 18 June 2014 12:32, Andrew Wang andrew.w...@cloudera.com wrote:

 Actually, a lot of our customers are still on JDK6, so if anything, its
 popularity hasn't significantly decreased. We still test and support JDK6
 for CDH4 and CDH5. The claim that branch-2 is effectively JDK7 because no
 one supports JDK6 is untrue.


Really?  I was misinformed
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Requirements-and-Supported-Versions/cdhrsv_jdk.html



RE: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Ottenheimer, Davi
Andrew,

“I don't see any point to switching” is an interesting perspective, given the
well-known risks of running unsafe software. Clearly customer best interest is
stability. JDK6 is in a known unsafe state. The longer anyone delays the
necessary transition to safety the longer the door is left open to predictable
disaster.

You also said we still test and support JDK6. I searched but have not been
able to find Cloudera critical security fixes for JDK6.

Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In other
words, did you release to your customers any kind of public alert or warning of
this CVSS 10.0 event as part of your JDK6 support?

http://www.cvedetails.com/cve/CVE-2013-2465/

If you are not releasing your own security fixes for JDK6 post-EOL would it
perhaps be safer to say Cloudera is hands-off; neither supports, nor opposes
the known insecure and deprecated/unpatched JDK?

I mentioned before in this thread the Oracle support timeline:

- official public EOL (end of life) was more than a year ago
- premier support ended more than six months ago
- extended support may get critical security fixes until the end of 2016

Given this timeline, does Cloudera officially take responsibility for Hadoop
customer safety? Are you going to be releasing critical security fixes to a
known unsafe JDK?

Davi








Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Steve Loughran
Most of the security problems in Java are sandbox jailbreaking and not
relevant. Anything related to kerberos, HTTPS or other in-cluster security
issues would be a different story...I haven't heard anything. It's a
different matter client-side, but anyone who enables Java in their web
browsers is doomed already.

Java security issues may matter developer-side: if you really want to
support java6, you need a java6 JVM to hand. There's a risk there...but if
you run an OS X box, Apple keeps them around for you even after you upgrade
(try /usr/libexec/java_home -V to see this).


On 18 June 2014 13:41, Sandy Ryza sandy.r...@cloudera.com wrote:

 We do release warnings when we are aware of vulnerabilities in our
 dependencies.

 However, unless I'm grossly misunderstanding, the vulnerability that you
 point out is not a vulnerability within the context of our software.
  Hadoop doesn't try to sandbox within JVMs.  In a secure setup, any JVM
 running non-trusted user code is running as that user, so breaking out
 doesn't offer the ability to do anything malicious.

 -Sandy

 On Wed, Jun 18, 2014 at 1:30 PM, Ottenheimer, Davi 
 davi.ottenhei...@emc.com
  wrote:

  Andrew,
 
 
 
  “I don't see any point to switching” is an interesting perspective, given
  the well-known risks of running unsafe software. Clearly customer best
  interest is stability. JDK6 is in a known unsafe state. The longer anyone
  delays the necessary transition to safety the longer the door is left
 open
  to predictable disaster.
 
 
 
  You also said we still test and support JDK6. I searched but have not
  been able to find Cloudera critical security fixes for JDK6.
 
 
 
  Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In
 other
  words, did you release to your customers any kind of public alert or
  warning of this CVSS 10.0 event as part of your JDK6 support?
 
 
 
  http://www.cvedetails.com/cve/CVE-2013-2465/
 
 
 
  If you are not releasing your own security fixes for JDK6 post-EOL would
  it perhaps be safer to say Cloudera is hands-off; neither supports, nor
  opposes the known insecure and deprecated/unpatched JDK?
 
 
 
  I mentioned before in this thread the Oracle support timeline:
 
 
 
  - official public EOL (end of life) was more than a year ago
 
  - premier support ended more than six months ago
 
  - extended support may get critical security fixes until the end of 2016
 
 
 
  Given this timeline, does Cloudera officially take responsibility for
  Hadoop customer safety? Are you going to be releasing critical security
  fixes to a known unsafe JDK?
 
 
 
  Davi
 
 
 
 
 
 
 

Re: Plans of moving towards JDK7 in trunk

2014-06-18 Thread Colin McCabe
In CDH5, Cloudera encourages people to use JDK7.  JDK6 has been EOL
for a while now and is not something we recommend.

As we discussed before, everyone is in favor of upgrading to JDK7.
Every cluster operator of a reasonably modern Hadoop should do it
whatever distro or release you run.  As developers, we run JDK7 as
well.

I'd just like to see a plan for when branch-2 (or some other branch)
will create a stable release that drops support for JDK1.6.  If we
don't have such a plan, I feel like it's too early to talk about this
stuff.

If we drop support for 1.6 in trunk but not in branch-2, we are
fragmenting the project.  People will start writing unreleasable code
(because it doesn't work on branch-2) and we'll be back to the bad old
days of Hadoop version fragmentation that branch-2 was intended to
solve.  Backports will become harder.  The biggest problem is that
trunk will start to depend on libraries or Maven plugins that branch-2
can't even use, because they're JDK7+-only.

Steve wrote: if someone actually did file a bug on something on
branch-2 which didn't work on Java 6 but went away on Java7+, we'd
probably close it as a WORKSFORME.

Steve, if this is true, we should just bump the minimum supported
version for branch-2 to 1.7 today and resolve this.  If we truly
believe that there are no issues here, then let's just decide to drop
1.6 in a specific future release of Hadoop 2.  If there are issues
with releasing JDK1.7+ only code, then let's figure out what they are
before proceeding.

best,
Colin


On Wed, Jun 18, 2014 at 1:41 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
 We do release warnings when we are aware of vulnerabilities in our
 dependencies.

 However, unless I'm grossly misunderstanding, the vulnerability that you
 point out is not a vulnerability within the context of our software.
  Hadoop doesn't try to sandbox within JVMs.  In a secure setup, any JVM
 running non-trusted user code is running as that user, so breaking out
 doesn't offer the ability to do anything malicious.

 -Sandy

 On Wed, Jun 18, 2014 at 1:30 PM, Ottenheimer, Davi davi.ottenhei...@emc.com
 wrote:

 Andrew,



 “I don't see any point to switching” is an interesting perspective, given
 the well-known risks of running unsafe software. Clearly customer best
 interest is stability. JDK6 is in a known unsafe state. The longer anyone
 delays the necessary transition to safety the longer the door is left open
 to predictable disaster.



 You also said we still test and support JDK6. I searched but have not
 been able to find Cloudera critical security fixes for JDK6.



 Can you clarify, for example, Java 6 Update 51 for CVE-2013-2465? In other
 words, did you release to your customers any kind of public alert or
 warning of this CVSS 10.0 event as part of your JDK6 support?



 http://www.cvedetails.com/cve/CVE-2013-2465/



 If you are not releasing your own security fixes for JDK6 post-EOL would
 it perhaps be safer to say Cloudera is hands-off; neither supports, nor
 opposes the known insecure and deprecated/unpatched JDK?



 I mentioned before in this thread the Oracle support timeline:



 - official public EOL (end of life) was more than a year ago

 - premier support ended more than six months ago

 - extended support may get critical security fixes until the end of 2016



 Given this timeline, does Cloudera officially take responsibility for
 Hadoop customer safety? Are you going to be releasing critical security
 fixes to a known unsafe JDK?



 Davi








Re: Plans of moving towards JDK7 in trunk

2014-06-17 Thread Andrew Wang
Reviving this thread, I noticed there's been a patch and +1 on
HADOOP-10530, and I don't think we actually reached a conclusion.

I (and others) have expressed concerns about moving to JDK7 for trunk.
Summarizing a few points:

- We can't move to JDK7 in branch-2 because of compatibility
- branch-2 is currently the only Hadoop release vehicle, there are no plans
for a trunk-based Hadoop 3
- Introducing JDK7-only APIs in trunk will increase divergence with
branch-2 and make backports harder
- Almost all developers care only about branch-2, since it is the only
release vehicle

With this in mind, I struggle to see any upsides to introducing JDK7-only
APIs to trunk. Please let's not do anything on HADOOP-10530 or related
until we agree on this.

Thanks,
Andrew


On Mon, Apr 14, 2014 at 3:31 PM, Steve Loughran ste...@hortonworks.com
wrote:

 On 14 April 2014 17:46, Andrew Purtell apurt...@apache.org wrote:

  How well is trunk tested? Does anyone deploy it with real applications
  running on top? When will the trunk codebase next be the basis for a
  production release? An impromptu diff of hadoop-common trunk against
  branch-2 as of today is 38,625 lines. Can they be said to be the same
  animal? I ask because any disincentive toward putting code in trunk is
  beside the point, if the only target worth pursuing today is branch-2
  unless one doesn't care if the code is released for production use.
  Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter for
  the vast majority of Hadoopers if talking about branch-2.
 
 
 I think it's partly a timescale issue; it's also because the 1-2 transition
 was so significant, especially at the YARN layer, that it's still taking
 time to trickle through.

 If you do want code to ship this year, branch-2 is where you are going to
 try and get it in -and like you say, that's where things get tried in the
 field. At the same time, the constraints of stability are holding us back
 -already-.

 I don't see why we should have another such major 1-2 transition in future;
 at the rate that Arun is pushing out 2.x releases it's almost back to the 0.1x
 timescale -though at that point most people were fending for themselves and
 expectations of stability were less. We do want smaller version increments
 in future, which branch-2 is -mostly- delivering.

 While Java 7 doesn't have some must-have features, Java 8 is a significant
 improvement in the language, and we should be looking ahead to that, maybe
 even doing some leading-edge work on the side, so the same discussion
 doesn't come up in two years time when java 7 goes EOL.


 -steve

 (personal opinions only, etc, )


 
  On Mon, Apr 14, 2014 at 9:22 AM, Colin McCabe cmcc...@alumni.cmu.edu
  wrote:
 
   I think the bottom line here is that as long as our stable release
   uses JDK6, there is going to be a very, very strong disincentive to
   put any code which can't run on JDK6 into trunk.
  
   Like I said earlier, the traditional reason for putting something in
   trunk but not the stable release is that it needs more testing.  If a
   stable release that drops support for JDK6 is more than a year away,
   does it make sense to put anything in trunk like that?  What might
   need more than a year of testing?  Certainly not changes to
   LocalFileSystem to use the new APIs.  I also don't think an upgrade to
   various libraries qualifies.
  
   It might be best to shelve this for now, like we've done in the past,
   until we're ready to talk about a stable release that requires JDK7+.
   At least that's my feeling.
  
   If we're really desperate for the new file APIs JDK7 provides, we
   could consider using loadable modules for it in branch-2.  This is
   similar to how we provide JNI versions of certain things on certain
   platforms, without dropping support for the other platforms.
  
   best,
   Colin
  
   On Sun, Apr 13, 2014 at 10:39 AM, Raymie Stata rst...@altiscale.com
   wrote:
There's an outstanding question addressed to me: Are there
 particular
features or new dependencies that you would like to contribute (or
 see
contributed) that require using the Java 1.7 APIs?  The question
misses the point: We'd figure out how to write something we wanted to
contribute to Hadoop against the APIs of Java4 if that's what it took
to get them into a stable release.  And at current course and speed,
that's how ridiculous things could get.
   
To summarize, it seems like there's a vague consensus that it might
 be
okay to eventually allow the use of Java7 in trunk, but there's no
decision.  And there's been no answer to the concern that even if
 such
dependencies were allowed in Java7, the only people using them would
be people who are uninterested in getting their patches into a stable
release of Hadoop on any knowable timeframe, which doesn't bode well
for the ability to stabilize that Java7 code when it comes time to
attempt to.
   
I don't have more to 

Re: Plans of moving towards JDK7 in trunk

2014-04-14 Thread Colin McCabe
I think the bottom line here is that as long as our stable release
uses JDK6, there is going to be a very, very strong disincentive to
put any code which can't run on JDK6 into trunk.

Like I said earlier, the traditional reason for putting something in
trunk but not the stable release is that it needs more testing.  If a
stable release that drops support for JDK6 is more than a year away,
does it make sense to put anything in trunk like that?  What might
need more than a year of testing?  Certainly not changes to
LocalFileSystem to use the new APIs.  I also don't think an upgrade to
various libraries qualifies.

It might be best to shelve this for now, like we've done in the past,
until we're ready to talk about a stable release that requires JDK7+.
At least that's my feeling.

If we're really desperate for the new file APIs JDK7 provides, we
could consider using loadable modules for it in branch-2.  This is
similar to how we provide JNI versions of certain things on certain
platforms, without dropping support for the other platforms.
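
One way the loadable-module idea could look, as a sketch only: a small interface with a JDK 6 baseline implementation, plus a JDK 7 implementation that is looked up by name at runtime and used only when java.nio.file is present. FileOps, LegacyOps, Nio2Ops and load are hypothetical names, not existing Hadoop classes. Both implementations sit in one file here so the example runs as is; in a real branch-2 build the JDK 7 class would presumably live in a separately compiled module.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Paths;

class LoadableFileOpsSketch {

    /** Hypothetical abstraction; a real one would cover far more operations. */
    interface FileOps {
        String name();
        boolean exists(String path);
    }

    /** Baseline implementation that only needs JDK 6 APIs. */
    static class LegacyOps implements FileOps {
        public String name() { return "legacy"; }
        public boolean exists(String path) { return new File(path).exists(); }
    }

    /** JDK 7 implementation; only ever loaded by name, never referenced directly. */
    static class Nio2Ops implements FileOps {
        public String name() { return "nio2"; }
        public boolean exists(String path) { return Files.exists(Paths.get(path)); }
    }

    /** Picks the JDK 7 implementation when the runtime offers java.nio.file. */
    static FileOps load() {
        try {
            Class.forName("java.nio.file.Files");
            String impl = LoadableFileOpsSketch.class.getName() + "$Nio2Ops";
            return (FileOps) Class.forName(impl).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            return new LegacyOps(); // class missing or not instantiable
        } catch (LinkageError e) {
            return new LegacyOps(); // class present but unusable on this JVM
        }
    }

    public static void main(String[] args) {
        FileOps ops = load();
        System.out.println(ops.name() + " exists(.)=" + ops.exists("."));
    }
}
```

The fallback keeps JDK 6 users working, at the cost of maintaining and testing two code paths, much like the JNI case.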

best,
Colin

On Sun, Apr 13, 2014 at 10:39 AM, Raymie Stata rst...@altiscale.com wrote:
 There's an outstanding question addressed to me: Are there particular
 features or new dependencies that you would like to contribute (or see
 contributed) that require using the Java 1.7 APIs?  The question
 misses the point: We'd figure out how to write something we wanted to
 contribute to Hadoop against the APIs of Java4 if that's what it took
 to get them into a stable release.  And at current course and speed,
 that's how ridiculous things could get.

 To summarize, it seems like there's a vague consensus that it might be
 okay to eventually allow the use of Java7 in trunk, but there's no
 decision.  And there's been no answer to the concern that even if such
 dependencies were allowed in Java7, the only people using them would
 be people who are uninterested in getting their patches into a stable
 release of Hadoop on any knowable timeframe, which doesn't bode well
 for the ability to stabilize that Java7 code when it comes time to
 attempt to.

 I don't have more to add, so I'll go back to lurking.  It'll be
 interesting to see where we'll be standing a year from now.

 On Sun, Apr 13, 2014 at 2:09 AM, Tsuyoshi OZAWA
 ozawa.tsuyo...@gmail.com wrote:
 Hi,

 +1 for Karthik's idea(non-binding).

 IMO, we should keep compatibility with both JDK 6 and JDK 7 on branch-1
 and branch-2, because users may be running either. For future releases where
 we can declare a compatibility break (e.g. a 3.0.0 release), we can use JDK 7
 features if they bring benefits. However, that can increase maintenance costs
 and spread contributors' effort across more branches. So I think a reasonable
 approach is to use a limited, minimal set of JDK 7 APIs, and only when we
 have a concrete reason to need them.
 By the way, if we start to use JDK 7 APIs, we should document the criteria
 for when to use them on the Wiki so as not to confuse contributors.

 Thanks,
 - Tsuyoshi

 On Wed, Apr 9, 2014 at 11:44 AM, Raymie Stata rst...@altiscale.com wrote:
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 It might make sense to try to enumerate the benefits of switching to Java7
 APIs and dependencies.  IMO, the ones listed so far on this thread don't
 make a compelling enough case to drop Java6 in branch-2 on any time frame,
 even if this means supporting Java6 through 2015.  For example, the change
 in RawLocalFileSystem semantics might be an incompatible change for
 branch-2 any way.


 On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com wrote:

 +1 to NOT breaking compatibility in branch-2.

 I think it is reasonable to require JDK7 for trunk, if we limit use of
 JDK7-only API to security fixes etc. If we make other optimizations (like
 IO), it would be a pain to backport things to branch-2. I guess this all
 depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


 On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

  On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
 

Re: Plans of moving towards JDK7 in trunk

2014-04-14 Thread Andrew Purtell
How well is trunk tested? Does anyone deploy it with real applications
running on top? When will the trunk codebase next be the basis for a
production release? An impromptu diff of hadoop-common trunk against
branch-2 as of today is 38,625 lines. Can they be said to be the same
animal? I ask because any disincentive toward putting code in trunk is
beside the point, if the only target worth pursuing today is branch-2
unless one doesn't care if the code is released for production use.
Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter for
the vast majority of Hadoopers if talking about branch-2.


On Mon, Apr 14, 2014 at 9:22 AM, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 I think the bottom line here is that as long as our stable release
 uses JDK6, there is going to be a very, very strong disincentive to
 put any code which can't run on JDK6 into trunk.

 Like I said earlier, the traditional reason for putting something in
 trunk but not the stable release is that it needs more testing.  If a
 stable release that drops support for JDK6 is more than a year away,
 does it make sense to put anything in trunk like that?  What might
 need more than a year of testing?  Certainly not changes to
 LocalFileSystem to use the new APIs.  I also don't think an upgrade to
 various libraries qualifies.

 It might be best to shelve this for now, like we've done in the past,
 until we're ready to talk about a stable release that requires JDK7+.
 At least that's my feeling.

 If we're really desperate for the new file APIs JDK7 provides, we
 could consider using loadable modules for it in branch-2.  This is
 similar to how we provide JNI versions of certain things on certain
 platforms, without dropping support for the other platforms.
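
One way to picture the loadable-module idea above is a small sketch that picks an implementation at runtime depending on whether the JDK7 file API is present. Everything here (FileOpsLoader, FileOps, the two implementations) is hypothetical and only illustrates the pattern; it is not how Hadoop is structured. In a real JDK6-compatible build the NIO-backed class would have to be compiled separately against JDK7, whereas this single-file sketch itself needs Java 7 or later to compile.

```java
import java.io.File;

// Hypothetical sketch of runtime selection between a JDK6-safe fallback
// and an implementation that relies on the JDK7 java.nio.file API.
class FileOpsLoader {

    interface FileOps {
        boolean exists(String path);
        String impl();
    }

    // Fallback that only uses classes available on JDK6.
    static final class LegacyFileOps implements FileOps {
        public boolean exists(String path) { return new File(path).exists(); }
        public String impl() { return "java.io.File"; }
    }

    // Implementation that depends on JDK7-only classes.
    static final class NioFileOps implements FileOps {
        public boolean exists(String path) {
            return java.nio.file.Files.exists(java.nio.file.Paths.get(path));
        }
        public String impl() { return "java.nio.file"; }
    }

    // Returns true if the named class can be loaded in this JVM.
    static boolean classAvailable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Loads the NIO-backed implementation by name when the JDK7 API is
    // present, and falls back to the legacy one otherwise.
    static FileOps load() {
        if (classAvailable("java.nio.file.Files")) {
            try {
                String implName = FileOpsLoader.class.getName() + "$NioFileOps";
                return (FileOps) Class.forName(implName)
                        .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | LinkageError e) {
                // fall through to the legacy implementation
            }
        }
        return new LegacyFileOps();
    }

    public static void main(String[] args) {
        FileOps ops = load();
        System.out.println("using " + ops.impl() + ", '.' exists: " + ops.exists("."));
    }
}
```

The design point is that callers only ever see the FileOps interface, so the JDK7-specific class is touched only on runtimes where it can actually be loaded.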

 best,
 Colin

 On Sun, Apr 13, 2014 at 10:39 AM, Raymie Stata rst...@altiscale.com
 wrote:
  There's an outstanding question addressed to me: Are there particular
  features or new dependencies that you would like to contribute (or see
  contributed) that require using the Java 1.7 APIs?  The question
  misses the point: We'd figure out how to write something we wanted to
  contribute to Hadoop against the APIs of Java4 if that's what it took
  to get them into a stable release.  And at current course and speed,
  that's how ridiculous things could get.
 
  To summarize, it seems like there's a vague consensus that it might be
  okay to eventually allow the use of Java7 in trunk, but there's no
  decision.  And there's been no answer to the concern that even if such
  dependencies were allowed in Java7, the only people using them would
  be people who are uninterested in getting their patches into a stable
  release of Hadoop on any knowable timeframe, which doesn't bode well
  for the ability to stabilize that Java7 code when it comes time to
  attempt to.
 
  I don't have more to add, so I'll go back to lurking.  It'll be
  interesting to see where we'll be standing a year from now.
 
  On Sun, Apr 13, 2014 at 2:09 AM, Tsuyoshi OZAWA
  ozawa.tsuyo...@gmail.com wrote:
  Hi,
 
  +1 for Karthik's idea(non-binding).
 
  IMO, we should keep the compatibility between JDK 6 and JDK 7 on both branch-1
  and branch-2, because users can be using them. For future releases that we can
  declare breaking compatibility(e.g. 3.0.0 release), we can use JDK 7 features
  if we can get benefits. However, it can increase maintenance costs and
  distributes the efforts of contributions to maintain branches. Then, I think
  it is reasonable approach that we use limited and minimum JDK-7 APIs when we
  have reasons we need to use the features.
  By the way, if we start to use JDK 7 APIs, we should declare the basis when
  to use JDK 7 APIs on Wiki not to confuse contributors.
 
  Thanks,
  - Tsuyoshi
 
  On Wed, Apr 9, 2014 at 11:44 AM, Raymie Stata rst...@altiscale.com
 wrote:
  It might make sense to try to enumerate the benefits of switching to
  Java7 APIs and dependencies.
 
- Java7 introduced a huge number of language, byte-code, API, and
  tooling enhancements!  Just to name a few: try-with-resources, newer
  and stronger encryption methods, more scalable concurrency primitives.
   See http://www.slideshare.net/boulderjug/55-things-in-java-7
 
- We can't update current dependencies, and we can't add cool new ones.
 
- Putting language/APIs aside, don't forget that a huge amount of effort
  goes into qualifying for Java6 (at least, I hope the folks claiming to
  support Java6 are putting in such an effort :-).  Wouldn't Hadoop
  users/customers be better served if qualification effort went into
  Java7/8 versus Java6/7?
 
  Getting to Java7 as a development env (and Java8 as a runtime env)
  seems like a no-brainer.  Question is: How?
 
  On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  It might make sense to try to enumerate the benefits of switching to
 Java7
  APIs and dependencies.  IMO, the ones listed so far on this thread
 don't
  

Re: Plans of moving towards JDK7 in trunk

2014-04-14 Thread Sangjin Lee
I would say, to an extent. The current state of the jetty version is
*severe*. We're 3 major versions behind, and if my understanding is
correct, they EOL'ed version 6.x a long time ago.

Yes, upgrading jetty could break some customers. However, we need to view
it in balance. We're constantly making customers scramble to work around
this stale dependency (and its transitive dependencies).

Sangjin

On Sat, Apr 12, 2014 at 7:29 AM, Alejandro Abdelnur t...@cloudera.com wrote:

 i disagree, mustn't break downstream

 Alejandro
 (phone typing)

  On Apr 12, 2014, at 3:15, Steve Loughran ste...@hortonworks.com wrote:
 
  1. I wasn't thinking of sticking of jetty in in the web ui or webhdfs at
  all.
  2. the later jetties change their packaging, so should be able to co-exist
  anyway.
 
  Jetty is a fundamental cause of problems, especially on things like
  webhdfs. We can't use the excuse of mustn't break downstream app classpath
  compatibility to avoid fixing significant problems
 
 
  On 11 April 2014 23:05, Alejandro Abdelnur t...@cloudera.com wrote:
 
  newer jetties have non backwards compat APIs, we would break any user app
  using jetty (consumed via hadoop classpath)
 
 
 
  On Fri, Apr 11, 2014 at 2:16 PM, Steve Loughran ste...@hortonworks.com
  wrote:
 
  that doesn't actually stop us from switching in our own code to alternate
  web servers,  only that jetty can remain a published artifact in the
  hadoop/lib dir
 
 
  On 11 April 2014 21:16, Alejandro Abdelnur t...@cloudera.com wrote:
 
  because it is exposed as classpath dependency, changing it breaks backward
  compatibility.
 
 
  On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com
  wrote:
 
  Jetty's a big change, it's fairly intimately involved in bits of the code
 
  but: it's a source of grief, currently webhdfs is an example
  https://issues.apache.org/jira/browse/HDFS-6221
 
  all YARN apps seem to get hosted by it too
 
 
  On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:
 
  I don't mean to be dense, but can you expand on why jetty 8 can't go into
  branch2?  What is the concern?
 
  Rob
 
 
  On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:
 
  if you mean updating jetty on branch2, we cannot do that. it has to be
  done in trunk.
 
  thx
 
  Alejandro
  (phone typing)
 
  On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
 
  Just an FYI, but I'm working on updating that jetty patch for the
  current 2.4.0 release.  The one that is there won't cleanly apply because
  so much has changed since it was posted.  I'll post a new patch when it's
  done.
 
  Rob
 
  On 04/11/2014 04:24 AM, Steve Loughran wrote:
 
  On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
 
  Let's speak less abstractly, are there particular features or new
  dependencies that you would like to contribute (or see contributed) that
  require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
  release are both non-trivial, not something I suspect we'd want to do just
  because it would be, for example, nicer to have a newer version of Jetty.
 
  Oddly enough, rolling the web framework is something I'd like to see in a
  v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also lets us
  reliably switch to servlet API v3
 
  But.. I think we may be able to increment Jetty more without going to
  java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
 
 
 
 
  --
  Alejandro
 
 
 
 
  --
  Alejandro
 

Re: Plans of moving towards JDK7 in trunk

2014-04-14 Thread Steve Loughran
On 14 April 2014 17:46, Andrew Purtell apurt...@apache.org wrote:

 How well is trunk tested? Does anyone deploy it with real applications
 running on top? When will the trunk codebase next be the basis for a
 production release? An impromptu diff of hadoop-common trunk against
 branch-2 as of today is 38,625 lines. Can they be said to be the same
 animal? I ask because any disincentive toward putting code in trunk is
 beside the point, if the only target worth pursuing today is branch-2
 unless one doesn't care if the code is released for production use.
 Questions on whither JDK6 or JDK7+ (or JRE6 versus JRE7+) only matter for
 the vast majority of Hadoopers if talking about branch-2.


I think it's partly a timescale issue; it's also because the 1-2 transition
was so significant, especially at the YARN layer, that it's still taking
time to trickle through.

If you do want code to ship this year, branch-2 is where you are going to
try and get it in -and like you say, that's where things get tried in the
field. At the same time, the constraints of stability are holding us back
-already-.

I don't see why we should have another such major 1-2 transition in future;
at the rate that Arun is pushing out 2.x releases, it's almost back to the 0.1x
timescale -though at that point most people were fending for themselves and
expectations of stability were lower. We do want smaller version increments
in future, which branch-2 is -mostly- delivering.

While Java 7 doesn't have some must-have features, Java 8 is a significant
improvement in the language, and we should be looking ahead to that, maybe
even doing some leading-edge work on the side, so the same discussion
doesn't come up in two years' time when Java 7 goes EOL.


-steve

(personal opinions only, etc, )



 On Mon, Apr 14, 2014 at 9:22 AM, Colin McCabe cmcc...@alumni.cmu.edu
 wrote:

  I think the bottom line here is that as long as our stable release
  uses JDK6, there is going to be a very, very strong disincentive to
  put any code which can't run on JDK6 into trunk.
 
  Like I said earlier, the traditional reason for putting something in
  trunk but not the stable release is that it needs more testing.  If a
  stable release that drops support for JDK6 is more than a year away,
  does it make sense to put anything in trunk like that?  What might
  need more than a year of testing?  Certainly not changes to
  LocalFileSystem to use the new APIs.  I also don't think an upgrade to
  various libraries qualifies.
 
  It might be best to shelve this for now, like we've done in the past,
  until we're ready to talk about a stable release that requires JDK7+.
  At least that's my feeling.
 
  If we're really desperate for the new file APIs JDK7 provides, we
  could consider using loadable modules for it in branch-2.  This is
  similar to how we provide JNI versions of certain things on certain
  platforms, without dropping support for the other platforms.
 
  best,
  Colin
 
  On Sun, Apr 13, 2014 at 10:39 AM, Raymie Stata rst...@altiscale.com
  wrote:
   There's an outstanding question addressed to me: Are there particular
   features or new dependencies that you would like to contribute (or see
   contributed) that require using the Java 1.7 APIs?  The question
   misses the point: We'd figure out how to write something we wanted to
   contribute to Hadoop against the APIs of Java4 if that's what it took
   to get them into a stable release.  And at current course and speed,
   that's how ridiculous things could get.
  
   To summarize, it seems like there's a vague consensus that it might be
   okay to eventually allow the use of Java7 in trunk, but there's no
   decision.  And there's been no answer to the concern that even if such
   dependencies were allowed in Java7, the only people using them would
   be people who are uninterested in getting their patches into a stable
   release of Hadoop on any knowable timeframe, which doesn't bode well
   for the ability to stabilize that Java7 code when it comes time to
   attempt to.
  
   I don't have more to add, so I'll go back to lurking.  It'll be
   interesting to see where we'll be standing a year from now.
  
   On Sun, Apr 13, 2014 at 2:09 AM, Tsuyoshi OZAWA
   ozawa.tsuyo...@gmail.com wrote:
   Hi,
  
   +1 for Karthik's idea(non-binding).
  
   IMO, we should keep the compatibility between JDK 6 and JDK 7 on both branch-1
   and branch-2, because users can be using them. For future releases that we can
   declare breaking compatibility(e.g. 3.0.0 release), we can use JDK 7 features
   if we can get benefits. However, it can increase maintenance costs and
   distributes the efforts of contributions to maintain branches. Then, I think
   it is reasonable approach that we use limited and minimum JDK-7 APIs when we
   have reasons we need to use the features.
   By the way, if we start to use JDK 7 APIs, we should declare the basis when
   to use JDK 7 APIs on Wiki not to confuse contributors.
  
 

Re: Plans of moving towards JDK7 in trunk

2014-04-13 Thread Tsuyoshi OZAWA
Hi,

+1 for Karthik's idea (non-binding).

IMO, we should keep compatibility with both JDK 6 and JDK 7 on branch-1
and branch-2, because users may be running either. For future releases where
we can declare a compatibility break (e.g. the 3.0.0 release), we can use
JDK 7 features if they bring benefits. However, that can increase maintenance
costs and spread contribution effort across more branches. So I think a
reasonable approach is to use a limited, minimal set of JDK 7 APIs, and only
when we have a concrete reason to need them.
By the way, if we start to use JDK 7 APIs, we should document the criteria
for when to use them on the Wiki, so as not to confuse contributors.

Thanks,
- Tsuyoshi

On Wed, Apr 9, 2014 at 11:44 AM, Raymie Stata rst...@altiscale.com wrote:
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 It might make sense to try to enumerate the benefits of switching to Java7
 APIs and dependencies.  IMO, the ones listed so far on this thread don't
 make a compelling enough case to drop Java6 in branch-2 on any time frame,
 even if this means supporting Java6 through 2015.  For example, the change
 in RawLocalFileSystem semantics might be an incompatible change for
 branch-2 any way.


 On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com wrote:

 +1 to NOT breaking compatibility in branch-2.

 I think it is reasonable to require JDK7 for trunk, if we limit use of
 JDK7-only API to security fixes etc. If we make other optimizations (like
 IO), it would be a pain to backport things to branch-2. I guess this all
 depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


 On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

  On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
  davi.ottenhei...@emc.com wrote:
   From: Eli Collins [mailto:e...@cloudera.com]
   Sent: Monday, April 07, 2014 11:54 AM
  
  
   IMO we should not drop support for Java 6 in a minor update of a stable
   release (v2).  I don't think the larger Hadoop user base would find it
   acceptable that upgrading to a minor update caused their systems to stop
   working because they didn't upgrade Java. There are people still getting
   support for Java 6. ...
  
   Thanks,
   Eli
  
   Hi Eli,
  
   Technically you are correct those with extended support get critical
   security fixes for 6 until the end of 2016. I am curious whether many of
   those are in the Hadoop user base. Do you know? My guess is the vast
   majority are within Oracle's official public end of life, which was over 12
   months ago. Even Premier support ended Dec 2013:
  
   http://www.oracle.com/technetwork/java/eol-135779.html
  
   The end of Java 6 support carries much risk. It has to be considered in
  terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
  score 10.0.
  
   http://www.cvedetails.com/cve/CVE-2013-2465/
  
   Since you mentioned "caused systems to stop" as an example of what would
   be a concern to Hadoop users, please note the CVE-2013-2465 availability
   impact:
  
   Complete (There is a total shutdown of the affected resource. The
  attacker can render the resource completely unavailable.)
  
   This vulnerability was patched in Java 6 Update 51, but post end of
  life. Apple pushed out the update specifically because of this
  vulnerability (http://support.apple.com/kb/HT5717) as did some other
  vendors privately, but for the majority of people using Java 6 means they
  have a ticking time bomb.
  
   Allowing it to stay should be considered in terms of accepting the whole
   risk posture.
  
 
  There are some who get extended support, but I suspect many just have
  an if-it's-not-broke mentality when it comes to production deployments.
  The current code supports both java6 and java7 and so allows these
  people to remain compatible, while enabling others to upgrade to the
  java7 runtime. This seems like the right compromise for a stable
  release series. Again, absolutely makes sense for trunk (ie v3) to
  require java7 or greater.
 




-- 
- Tsuyoshi


Re: Plans of moving towards JDK7 in trunk

2014-04-13 Thread Raymie Stata
There's an outstanding question addressed to me: Are there particular
features or new dependencies that you would like to contribute (or see
contributed) that require using the Java 1.7 APIs?  The question
misses the point: We'd figure out how to write something we wanted to
contribute to Hadoop against the APIs of Java4 if that's what it took
to get them into a stable release.  And at current course and speed,
that's how ridiculous things could get.

To summarize, it seems like there's a vague consensus that it might be
okay to eventually allow the use of Java7 in trunk, but there's no
decision.  And there's been no answer to the concern that even if such
dependencies were allowed in Java7, the only people using them would
be people who are uninterested in getting their patches into a stable
release of Hadoop on any knowable timeframe, which doesn't bode well
for the ability to stabilize that Java7 code when it comes time to
attempt to.

I don't have more to add, so I'll go back to lurking.  It'll be
interesting to see where we'll be standing a year from now.

On Sun, Apr 13, 2014 at 2:09 AM, Tsuyoshi OZAWA
ozawa.tsuyo...@gmail.com wrote:
 Hi,

 +1 for Karthik's idea(non-binding).

 IMO, we should keep the compatibility between JDK 6 and JDK 7 on both branch-1
 and branch-2, because users can be using them. For future releases that we can
 declare breaking compatibility(e.g. 3.0.0 release), we can use JDK 7 features
 if we can get benefits. However, it can increase maintenance costs and
 distributes the efforts of contributions to maintain branches. Then, I think
 it is reasonable approach that we use limited and minimum JDK-7 APIs when we
 have reasons we need to use the features.
 By the way, if we start to use JDK 7 APIs, we should declare the basis when
 to use JDK 7 APIs on Wiki not to confuse contributors.

 Thanks,
 - Tsuyoshi

 On Wed, Apr 9, 2014 at 11:44 AM, Raymie Stata rst...@altiscale.com wrote:
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 It might make sense to try to enumerate the benefits of switching to Java7
 APIs and dependencies.  IMO, the ones listed so far on this thread don't
 make a compelling enough case to drop Java6 in branch-2 on any time frame,
 even if this means supporting Java6 through 2015.  For example, the change
 in RawLocalFileSystem semantics might be an incompatible change for
 branch-2 any way.


 On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com wrote:

 +1 to NOT breaking compatibility in branch-2.

 I think it is reasonable to require JDK7 for trunk, if we limit use of
 JDK7-only API to security fixes etc. If we make other optimizations (like
 IO), it would be a pain to backport things to branch-2. I guess this all
 depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


 On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

  On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
  davi.ottenhei...@emc.com wrote:
   From: Eli Collins [mailto:e...@cloudera.com]
   Sent: Monday, April 07, 2014 11:54 AM
  
  
   IMO we should not drop support for Java 6 in a minor update of a stable
   release (v2).  I don't think the larger Hadoop user base would find it
   acceptable that upgrading to a minor update caused their systems to stop
   working because they didn't upgrade Java. There are people still getting
   support for Java 6. ...
  
   Thanks,
   Eli
  
   Hi Eli,
  
   Technically you are correct those with extended support get critical
   security fixes for 6 until the end of 2016. I am curious whether many of
   those are in the Hadoop user base. Do you know? My guess is the vast
   majority are within Oracle's official public end of life, which was over 12
   months ago. Even Premier support ended Dec 2013:
  
   http://www.oracle.com/technetwork/java/eol-135779.html
  
   The end of Java 6 support carries much risk. It has to be considered in
  terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
  score 10.0.
  
   http://www.cvedetails.com/cve/CVE-2013-2465/
  
   Since you mentioned caused systems 

Re: Plans of moving towards JDK7 in trunk

2014-04-12 Thread Steve Loughran
1. I wasn't thinking of sticking with jetty in the web ui or webhdfs at
all.
2. the later jetties change their packaging, so should be able to co-exist
anyway.

Jetty is a fundamental cause of problems, especially on things like
webhdfs. We can't use the excuse of mustn't break downstream app classpath
compatibility to avoid fixing significant problems
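
To make the co-existence point concrete: Jetty 6 classes live under org.mortbay.jetty, while Jetty 7 and later moved to org.eclipse.jetty, so both generations can in principle sit on one classpath. A minimal sketch of a probe that reports which generation is visible follows; JettyClasspathProbe is a made-up name for this example, and the result depends entirely on what happens to be on the classpath when it is run.

```java
// Hypothetical helper, not part of Hadoop: reports which Jetty generations
// are visible on the current classpath by probing one class from each.
class JettyClasspathProbe {

    // Returns true if the named class can be found, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false,
                    JettyClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Jetty 6 and earlier used the org.mortbay package prefix.
        System.out.println("Jetty 6 (org.mortbay.jetty.Server): "
                + isPresent("org.mortbay.jetty.Server"));
        // Jetty 7 and later use the org.eclipse.jetty package prefix.
        System.out.println("Jetty 7+ (org.eclipse.jetty.server.Server): "
                + isPresent("org.eclipse.jetty.server.Server"));
    }
}
```

Distinct package names avoid class-name clashes between the two Jetty generations; they do not by themselves settle conflicts in shared transitive dependencies such as the servlet API.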


On 11 April 2014 23:05, Alejandro Abdelnur t...@cloudera.com wrote:

 newer jetties have non backwards compat APIs, we would break any user app
 using jetty (consumed via hadoop classpath)



 On Fri, Apr 11, 2014 at 2:16 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  that doesn't actually stop us from switching in our own code to alternate
  web servers,  only that jetty can remain a published artifact in the
  hadoop/lib dir
 
 
  On 11 April 2014 21:16, Alejandro Abdelnur t...@cloudera.com wrote:
 
   because it is exposed as classpath dependency, changing it breaks backward
   compatibility.
  
  
   On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com
   wrote:
  
Jetty's a big change, it's fairly intimately involved in bits of the code
   
but: it's a source of grief, currently webhdfs is an example
https://issues.apache.org/jira/browse/HDFS-6221
   
all YARN apps seem to get hosted by it too
   
   
On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:
   
 I don't mean to be dense, but can you expand on why jetty 8 can't go into
 branch2?  What is the concern?

 Rob


 On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:

 if you mean updating jetty on branch2, we cannot do that. it has to be
 done in trunk.

 thx

 Alejandro
 (phone typing)

  On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:

 Just an FYI, but I'm working on updating that jetty patch for the
 current 2.4.0 release.  The one that is there won't cleanly apply because
 so much has changed since it was posted.  I'll post a new patch when it's
 done.

 Rob

  On 04/11/2014 04:24 AM, Steve Loughran wrote:

 On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:

 Let's speak less abstractly, are there particular features or new
 dependencies that you would like to contribute (or see contributed) that
 require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
 release are both non-trivial, not something I suspect we'd want to do just
 because it would be, for example, nicer to have a newer version of Jetty.


 Oddly enough, rolling the web framework is something I'd like to see in a
 v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also lets us
 reliably switch to servlet API v3

 But.. I think we may be able to increment Jetty more without going to
 java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .


   
   
  
  
  
   --
   Alejandro
  
 
 



 --
 Alejandro


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Plans of moving towards JDK7 in trunk

2014-04-12 Thread Alejandro Abdelnur
i disagree, mustn't break downstream

Alejandro
(phone typing)

 On Apr 12, 2014, at 3:15, Steve Loughran ste...@hortonworks.com wrote:
 
 1. I wasn't thinking of sticking of jetty in in the web ui or webhdfs at
 all.
 2. the later jetties change their packaging, so should be able to co-exist
 anyway.
 
 Jetty is a fundamental cause of problems, especially on things like
 webhdfs. We can't use the excuse of mustn't break downstream app classpath
 compatibility to avoid fixing significant problems
 
 
 On 11 April 2014 23:05, Alejandro Abdelnur t...@cloudera.com wrote:
 
 newer jetties have non backwards compat APIs, we would break any user app
 using jetty (consumed via hadoop classpath)
 
 
 
 On Fri, Apr 11, 2014 at 2:16 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 that doesn't actually stop us from switching in our own code to alternate
 web servers,  only that jetty can remain a published artifact in the
 hadoop/lib dir
 
 
 On 11 April 2014 21:16, Alejandro Abdelnur t...@cloudera.com wrote:
 
 because it is exposed as classpath dependency, changing it breaks backward
 compatibility.
 
 
 On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com
 wrote:
 
 Jetty's a big change, it's fairly intimately involved in bits of the
 code
 
 but: it's a source of grief, currently webhdfs is an example
 https://issues.apache.org/jira/browse/HDFS-6221
 
 all YARN apps seem to get hosted by it too
 
 
 On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:
 
 I don't mean to be dense, but can you expand on why jetty 8 can't
 go
 into
 branch2?  What is the concern?
 
 Rob
 
 
 On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:
 
 if you mean updating jetty on branch2, we cannot do that. it has
 to
 be
 done in trunk.
 
 thx
 
 Alejandro
 (phone typing)
 
 On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
 
 Just an FYI, but I'm working on updating that jetty patch for the
 current 2.4.0 release.  The one that is there won't cleanly apply
 because
 so much has changed since it was posted.  I'll post a new patch
 when
 it's
 done.
 
 Rob
 
 On 04/11/2014 04:24 AM, Steve Loughran wrote:
 
 On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
 
 Let's speak less abstractly, are there particular features or
 new
 dependencies that you would like to contribute (or see
 contributed)
 that
 require using the Java 1.7 APIs?  Breaking compat in v2 or
 rolling
 a
 v3
 release are both non-trivial, not something I suspect we'd want
 to
 do
 just
 because it would be, for example, nicer to have a newer version
 of
 Jetty.
 
 Oddly enough, rolling the web framework is something I'd like to
 see
 in
 a
 v3. the shuffle may be off jetty, but webhdfs isn't. Moving up
 also
 lets is
 reliably switch to servlet API v3
 
 But.. I think we may be able to increment Jetty more without
 going
 to
 java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
 
 
 
 
 --
 Alejandro
 
 
 
 
 --
 Alejandro
 


Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Steve Loughran
On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:

 Let's speak less abstractly, are there particular features or new
 dependencies that you would like to contribute (or see contributed) that
 require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
 release are both non-trivial, not something I suspect we'd want to do just
 because it would be, for example, nicer to have a newer version of Jetty.


Oddly enough, rolling the web framework is something I'd like to see in a
v3. The shuffle may be off jetty, but webhdfs isn't. Moving up also lets us
reliably switch to servlet API v3.

But.. I think we may be able to increment Jetty more without going to
java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .



Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Robert Rati
Just an FYI, but I'm working on updating that jetty patch for the 
current 2.4.0 release.  The one that is there won't cleanly apply 
because so much has changed since it was posted.  I'll post a new patch 
when it's done.


Rob

On 04/11/2014 04:24 AM, Steve Loughran wrote:

On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:


Let's speak less abstractly, are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
release are both non-trivial, not something I suspect we'd want to do just
because it would be, for example, nicer to have a newer version of Jetty.



Oddly enough, rolling the web framework is something I'd like to see in a
v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also lets is
reliably switch to servlet API v3

But.. I think we may be able to increment Jetty more without going to
java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .



Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Alejandro Abdelnur
if you mean updating jetty on branch2, we cannot do that. it has to be done in 
trunk. 

thx

Alejandro
(phone typing)

 On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
 
 Just an FYI, but I'm working on updating that jetty patch for the current 
 2.4.0 release.  The one that is there won't cleanly apply because so much has 
 changed since it was posted.  I'll post a new patch when it's done.
 
 Rob
 
 On 04/11/2014 04:24 AM, Steve Loughran wrote:
 On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
 
 Let's speak less abstractly, are there particular features or new
 dependencies that you would like to contribute (or see contributed) that
 require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
 release are both non-trivial, not something I suspect we'd want to do just
 because it would be, for example, nicer to have a newer version of Jetty.
 
 Oddly enough, rolling the web framework is something I'd like to see in a
 v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also lets is
 reliably switch to servlet API v3
 
 But.. I think we may be able to increment Jetty more without going to
 java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
 


Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Robert Rati
I don't mean to be dense, but can you expand on why jetty 8 can't go 
into branch2?  What is the concern?


Rob

On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:

if you mean updating jetty on branch2, we cannot do that. it has to be done in 
trunk.

thx

Alejandro
(phone typing)


On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:

Just an FYI, but I'm working on updating that jetty patch for the current 2.4.0 
release.  The one that is there won't cleanly apply because so much has changed 
since it was posted.  I'll post a new patch when it's done.

Rob


On 04/11/2014 04:24 AM, Steve Loughran wrote:

On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:

Let's speak less abstractly, are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
release are both non-trivial, not something I suspect we'd want to do just
because it would be, for example, nicer to have a newer version of Jetty.


Oddly enough, rolling the web framework is something I'd like to see in a
v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also lets is
reliably switch to servlet API v3

But.. I think we may be able to increment Jetty more without going to
java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .



Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Alejandro Abdelnur
because it is exposed as classpath dependency, changing it breaks backward
compatibility.


On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com wrote:

 Jetty's a big change, it's fairly intimately involved in bits of the code

 but: it's a source of grief, currently webhdfs is an example
 https://issues.apache.org/jira/browse/HDFS-6221

 all YARN apps seem to get hosted by it too


 On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:

  I don't mean to be dense, but can you expand on why jetty 8 can't go into
  branch2?  What is the concern?
 
  Rob
 
 
  On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:
 
  if you mean updating jetty on branch2, we cannot do that. it has to be
  done in trunk.
 
  thx
 
  Alejandro
  (phone typing)
 
   On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
 
  Just an FYI, but I'm working on updating that jetty patch for the
  current 2.4.0 release.  The one that is there won't cleanly apply
 because
  so much has changed since it was posted.  I'll post a new patch when
 it's
  done.
 
  Rob
 
   On 04/11/2014 04:24 AM, Steve Loughran wrote:
 
  On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
 
  Let's speak less abstractly, are there particular features or new
  dependencies that you would like to contribute (or see contributed)
  that
  require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a
 v3
  release are both non-trivial, not something I suspect we'd want to do
  just
  because it would be, for example, nicer to have a newer version of
  Jetty.
 
 
  Oddly enough, rolling the web framework is something I'd like to see
 in
  a
  v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also
  lets is
  reliably switch to servlet API v3
 
  But.. I think we may be able to increment Jetty more without going to
  java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
 
 





-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Steve Loughran
that doesn't actually stop us from switching in our own code to alternate
web servers, only that jetty can remain a published artifact in the
hadoop/lib dir


On 11 April 2014 21:16, Alejandro Abdelnur t...@cloudera.com wrote:

 because it is exposed as classpath dependency, changing it breaks backward
 compatibility.


 On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com
 wrote:

  Jetty's a big change, it's fairly intimately involved in bits of the code
 
  but: it's a source of grief, currently webhdfs is an example
  https://issues.apache.org/jira/browse/HDFS-6221
 
  all YARN apps seem to get hosted by it too
 
 
  On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:
 
   I don't mean to be dense, but can you expand on why jetty 8 can't go
 into
   branch2?  What is the concern?
  
   Rob
  
  
   On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:
  
   if you mean updating jetty on branch2, we cannot do that. it has to be
   done in trunk.
  
   thx
  
   Alejandro
   (phone typing)
  
On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
  
   Just an FYI, but I'm working on updating that jetty patch for the
   current 2.4.0 release.  The one that is there won't cleanly apply
  because
   so much has changed since it was posted.  I'll post a new patch when
  it's
   done.
  
   Rob
  
On 04/11/2014 04:24 AM, Steve Loughran wrote:
  
   On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
  
   Let's speak less abstractly, are there particular features or new
   dependencies that you would like to contribute (or see contributed)
   that
   require using the Java 1.7 APIs?  Breaking compat in v2 or rolling
 a
  v3
   release are both non-trivial, not something I suspect we'd want to
 do
   just
   because it would be, for example, nicer to have a newer version of
   Jetty.
  
  
   Oddly enough, rolling the web framework is something I'd like to see
  in
   a
   v3. the shuffle may be off jetty, but webhdfs isn't. Moving up also
   lets is
   reliably switch to servlet API v3
  
   But.. I think we may be able to increment Jetty more without going
 to
   java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
  
  
 
 



 --
 Alejandro




Re: Plans of moving towards JDK7 in trunk

2014-04-11 Thread Alejandro Abdelnur
newer jetties have non backwards compat APIs, we would break any user app
using jetty (consumed via hadoop classpath)
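
To make the classpath concern concrete: a downstream app can check which
Jetty line it is actually picking up from the Hadoop classpath. The sketch
below is only an illustration, not anything from Hadoop itself. The class
name JettyProbe is made up, and it assumes the known package split (Jetty 6
under org.mortbay.jetty, Jetty 7 and later under org.eclipse.jetty).

```java
import java.security.CodeSource;

/** Hypothetical helper: reports which Jetty server classes are visible. */
public class JettyProbe {
    // Assumption: Jetty 6 ships org.mortbay.jetty.Server,
    // Jetty 7+ ships org.eclipse.jetty.server.Server.
    static final String JETTY6_SERVER = "org.mortbay.jetty.Server";
    static final String JETTY7PLUS_SERVER = "org.eclipse.jetty.server.Server";

    /**
     * Returns the location a class was loaded from, a placeholder string
     * if the JVM does not expose one, or null if the class is not visible.
     */
    public static String locate(String className) {
        try {
            // initialize=false: only resolve the class, run no static init
            Class<?> c = Class.forName(className, false,
                    JettyProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {JETTY6_SERVER, JETTY7PLUS_SERVER}) {
            String where = locate(name);
            System.out.println(name + " -> "
                    + (where == null ? "not on classpath" : where));
        }
    }
}
```

Run it with the same classpath the app gets (for example the output of
`hadoop classpath`). Because the two Jetty lines use different package
names, both can show up at once, which is the co-existence point made
elsewhere in this thread; an app compiled against one of them breaks only
if that one disappears.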



On Fri, Apr 11, 2014 at 2:16 PM, Steve Loughran ste...@hortonworks.com wrote:

 that doesn't actually stop is from switching in our own code to alternate
 web servers,  only that jetty can remain a published artifact in the
 hadoop/lib dir


 On 11 April 2014 21:16, Alejandro Abdelnur t...@cloudera.com wrote:

  because it is exposed as classpath dependency, changing it breaks
 backward
  compatibility.
 
 
  On Fri, Apr 11, 2014 at 1:02 PM, Steve Loughran ste...@hortonworks.com
  wrote:
 
   Jetty's a big change, it's fairly intimately involved in bits of the
 code
  
   but: it's a source of grief, currently webhdfs is an example
   https://issues.apache.org/jira/browse/HDFS-6221
  
   all YARN apps seem to get hosted by it too
  
  
   On 11 April 2014 20:56, Robert Rati rr...@redhat.com wrote:
  
I don't mean to be dense, but can you expand on why jetty 8 can't go
  into
branch2?  What is the concern?
   
Rob
   
   
On 04/11/2014 10:55 AM, Alejandro Abdelnur wrote:
   
if you mean updating jetty on branch2, we cannot do that. it has to
 be
done in trunk.
   
thx
   
Alejandro
(phone typing)
   
 On Apr 11, 2014, at 4:46, Robert Rati rr...@redhat.com wrote:
   
Just an FYI, but I'm working on updating that jetty patch for the
current 2.4.0 release.  The one that is there won't cleanly apply
   because
so much has changed since it was posted.  I'll post a new patch
 when
   it's
done.
   
Rob
   
 On 04/11/2014 04:24 AM, Steve Loughran wrote:
   
On 10 April 2014 18:12, Eli Collins e...@cloudera.com wrote:
   
Let's speak less abstractly, are there particular features or new
dependencies that you would like to contribute (or see
 contributed)
that
require using the Java 1.7 APIs?  Breaking compat in v2 or
 rolling
  a
   v3
release are both non-trivial, not something I suspect we'd want
 to
  do
just
because it would be, for example, nicer to have a newer version
 of
Jetty.
   
   
Oddly enough, rolling the web framework is something I'd like to
 see
   in
a
v3. the shuffle may be off jetty, but webhdfs isn't. Moving up
 also
lets is
reliably switch to servlet API v3
   
But.. I think we may be able to increment Jetty more without going
  to
java7, see https://issues.apache.org/jira/browse/HADOOP-9650 .
   
   
  
  
 
 
 
  --
  Alejandro
 





-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Steve Loughran
On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:



 For the sake of this discussion we should separate the runtime from
 the programming APIs. Users are already migrating to the java7 runtime
 for most of the reasons listed below (support, performance, bugs,
 etc), and the various distributions cert their Hadoop 2 based
 distributions on java7.  This gives users many of the benefits of
 java7, without forcing users off java6. Ie Hadoop does not need to
 switch to the java7 programming APIs to make sure everyone has a
 supported runtime.


+1: you can use Java 7 today; I'm not sure how tested Java 8 is


 The question here is really about when Hadoop, and the Hadoop
 ecosystem (since adjacent projects often end up in the same classpath)
 start using the java7 programming APIs and therefore break
 compatibility with java6 runtimes. I think our java6 runtime users
 would consider dropping support for their java runtime in an update of
 a major release to be an incompatible change (the binaries stop
 working on their current jvm).


do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1) here?


 That may be worth it if we can
 articulate sufficient value to offset the cost (they have to upgrade
 their environment, might make rolling upgrades stop working, etc), but
 I've not yet heard an argument that articulates the value relative to
 the cost.  Eg upgrading to the java7 APIs allows us to pull in
 dependencies with new major versions, but only if those dependencies
 don't break compatibility (which is likely given that our classpaths
 aren't so isolated), and, realistically, only if the entire Hadoop
 stack moves to java7 as well




 (eg we have to recompile HBase to
 generate v1.7 binaries even if they stick on API v1.6). I'm not aware
 of a feature, bug etc that really motivates this.

I don't see that being needed unless we move up to new java7+ only
libraries and HBase needs to track this.

The big "recompile to work" issue is google guava, which is troublesome
enough I'd be tempted to say "can we drop it entirely"



 An alternate approach is to keep the current stable release series
 (v2.x) as is, and start using new APIs in trunk (for v3). This will be
 a major upgrade for Hadoop and therefore an incompatible change like
 this is to be expected (it would be great if this came with additional
 changes to better isolate classpaths and dependencies from each
 other). It allows us to continue to support multiple types of users
 with different branches, vs forcing all users onto a new version. It
 of course means that 2.x users will not get the benefits of the new
 API, but it's unclear what those benefits are given they can already
 get the benefits of adopting the newer java runtimes today.



I'm (personally) +1 to this, I also think we should plan to do the switch
some time this year to not only get the benefits, but discover the costs
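
As a side note on "the binaries stop working on their current jvm": that
comes from the class-file version stamped into every .class file. A JVM
rejects classes whose major version is newer than it supports (50 for
Java 6, 51 for Java 7, 52 for Java 8) with UnsupportedClassVersionError.
A minimal sketch for checking what a given class was compiled for; the
class name ClassFileVersion is hypothetical and not part of Hadoop:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: reads the target version out of a .class file. */
public class ClassFileVersion {

    /** Returns the major version stored in the class-file header. */
    public static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic, not a class file");
        }
        buf.getShort();                 // minor version, not needed here
        return buf.getShort() & 0xFFFF; // major version as unsigned 16-bit
    }

    /** Maps a major version to the Java release that introduced it. */
    public static String releaseFor(int major) {
        switch (major) {
            case 50: return "Java 6";
            case 51: return "Java 7";
            case 52: return "Java 8";
            default: return "unmapped (major " + major + ")";
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileVersion <path/to/Some.class>");
            return;
        }
        int major = majorVersion(Files.readAllBytes(Paths.get(args[0])));
        System.out.println(args[0] + ": major " + major
                + " (" + releaseFor(major) + ")");
    }
}
```

The sketch itself uses java.nio.file, which only exists from Java 7 on, so
it will not compile or run on a java6 runtime. That is the same trade-off
being discussed here: once code depends on a java7 API, java6 users are cut
off regardless of how the class files are stamped.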



Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Raymie Stata
I think the problem to be solved here is to define a point in time
when the average Hadoop contributor can start using Java7 dependencies
in their code.

The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
not solve this problem.  The average Hadoop contributor wants to see
their contributions make it into a stable release in a predictable
amount of time.  Putting code with a Java7 dependency into trunk means
the exact opposite: there is no timeline to a stable release.  So most
contributors will stay away from Java7 dependencies, despite the
nominal policy that they're allowed in trunk.  (And the few that do
use Java7 dependencies are people who do not value releasing code into
stable releases, which arguably could lead to a situation that the
Java7-dependent code in trunk is, on average, on the buggy side.)

I'm not saying the branch2-in-the-future plan is the only way to
solve the problem of putting Java7 dependencies on a known time-table,
but at least it solves it.  Is there another solution?

On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:
 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:



 For the sake of this discussion we should separate the runtime from
 the programming APIs. Users are already migrating to the java7 runtime
 for most of the reasons listed below (support, performance, bugs,
 etc), and the various distributions cert their Hadoop 2 based
 distributions on java7.  This gives users many of the benefits of
 java7, without forcing users off java6. Ie Hadoop does not need to
 switch to the java7 programming APIs to make sure everyone has a
 supported runtime.


 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


 The question here is really about when Hadoop, and the Hadoop
 ecosystem (since adjacent projects often end up in the same classpath)
 start using the java7 programming APIs and therefore break
 compatibility with java6 runtimes. I think our java6 runtime users
 would consider dropping support for their java runtime in an update of
 a major release to be an incompatible change (the binaries stop
 working on their current jvm).


 do you mean major 2.x - 3.y or minor 2.x - 2.(x+1)  here?


 That may be worth it if we can
 articulate sufficient value to offset the cost (they have to upgrade
 their environment, might make rolling upgrades stop working, etc), but
 I've not yet heard an argument that articulates the value relative to
 the cost.  Eg upgrading to the java7 APIs allows us to pull in
 dependencies with new major versions, but only if those dependencies
 don't break compatibility (which is likely given that our classpaths
 aren't so isolated), and, realistically, only if the entire Hadoop
 stack moves to java7 as well




 (eg we have to recompile HBase to
 generate v1.7 binaries even if they stick on API v1.6). I'm not aware
 of a feature, bug etc that really motivates this.

 I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

  The big recompile to work issue is google guava, which is troublesome
 enough I'd be tempted to say can we drop it entirely



 An alternate approach is to keep the current stable release series
 (v2.x) as is, and start using new APIs in trunk (for v3). This will be
 a major upgrade for Hadoop and therefore an incompatible change like
 this is to be expected (it would be great if this came with additional
 changes to better isolate classpaths and dependencies from each
 other). It allows us to continue to support multiple types of users
 with different branches, vs forcing all users onto a new version. It
 of course means that 2.x users will not get the benefits of the new
 API, but it's unclear what those benefits are given they can already
 get the benefits of adopting the newer java runtimes today.



 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:

 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
 +1: you can use Java 7 today; I'm not sure how tested Java 8 is


  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).


 do you mean major 2.x - 3.y or minor 2.x - 2.(x+1)  here?


I mean 2.x -> 2.(x+1), i.e. I'm running the 2.4 stable and upgrading to 2.5.




  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well




  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
 libraries and HBase needs to track this.

  The big recompile to work issue is google guava, which is troublesome
 enough I'd be tempted to say can we drop it entirely



  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
  API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
 I'm (personally) +1 to this, I also think we should plan to do the switch
 some time this year to not only get the benefits, but discover the costs



Agree






Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata rst...@altiscale.com wrote:

 I think the problem to be solved here is to define a point in time
 when the average Hadoop contributor can start using Java7 dependencies
 in their code.

 The use Java7 dependencies in trunk(/branch3) plan, by itself, does
 not solve this problem.  The average Hadoop contributor wants to see
 their contributions make it into a stable release in a predictable
 amount of time.  Putting code with a Java7 dependency into trunk means
 the exact opposite: there is no timeline to a stable release.  So most
 contributors will stay away from Java7 dependencies, despite the
 nominal policy that they're allowed in trunk.  (And the few that do
 use Java7 dependencies are people who do not value releasing code into
 stable releases, which arguably could lead to a situation that the
 Java7-dependent code in trunk is, on average, on the buggy side.)

 I'm not saying the branch2-in-the-future plan is the only way to
 solve the problem of putting Java7 dependencies on a known time-table,
 but at least it solves it.  Is there another solution?


All good reasons for why we should start thinking about a plan for v3. The
points above pertain to any features for trunk that break compatibility,
not just ones that use new Java APIs.  We shouldn't permit incompatible
changes to merge to v2 just because we don't yet have a timeline for v3; we
should figure out the latter. This also motivates finishing the work to isolate
dependencies between Hadoop code, other framework code, and user code.

Let's speak less abstractly: are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compat in v2 or rolling a v3
release are both non-trivial, not something I suspect we'd want to do just
because it would be, for example, nicer to have a newer version of Jetty.

Thanks,
Eli







 On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com
 wrote:
  On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:
 
 
 
  For the sake of this discussion we should separate the runtime from
  the programming APIs. Users are already migrating to the java7 runtime
  for most of the reasons listed below (support, performance, bugs,
  etc), and the various distributions cert their Hadoop 2 based
  distributions on java7.  This gives users many of the benefits of
  java7, without forcing users off java6. Ie Hadoop does not need to
  switch to the java7 programming APIs to make sure everyone has a
  supported runtime.
 
 
  +1: you can use Java 7 today; I'm not sure how tested Java 8 is
 
 
  The question here is really about when Hadoop, and the Hadoop
  ecosystem (since adjacent projects often end up in the same classpath)
  start using the java7 programming APIs and therefore break
  compatibility with java6 runtimes. I think our java6 runtime users
  would consider dropping support for their java runtime in an update of
  a major release to be an incompatible change (the binaries stop
  working on their current jvm).
 
 
  do you mean major 2.x - 3.y or minor 2.x - 2.(x+1)  here?
 
 
  That may be worth it if we can
  articulate sufficient value to offset the cost (they have to upgrade
  their environment, might make rolling upgrades stop working, etc), but
  I've not yet heard an argument that articulates the value relative to
  the cost.  Eg upgrading to the java7 APIs allows us to pull in
  dependencies with new major versions, but only if those dependencies
  don't break compatibility (which is likely given that our classpaths
  aren't so isolated), and, realistically, only if the entire Hadoop
  stack moves to java7 as well
 
 
 
 
  (eg we have to recompile HBase to
  generate v1.7 binaries even if they stick on API v1.6). I'm not aware
  of a feature, bug etc that really motivates this.
 
  I don't see that being needed unless we move up to new java7+ only
  libraries and HBase needs to track this.
 
   The big recompile to work issue is google guava, which is troublesome
  enough I'd be tempted to say can we drop it entirely
 
 
 
  An alternate approach is to keep the current stable release series
  (v2.x) as is, and start using new APIs in trunk (for v3). This will be
  a major upgrade for Hadoop and therefore an incompatible change like
  this is to be expected (it would be great if this came with additional
  changes to better isolate classpaths and dependencies from each
  other). It allows us to continue to support multiple types of users
  with different branches, vs forcing all users onto a new version. It
  of course means that 2.x users will not get the benefits of the new
  API, but it's unclear what those benefits are given they can already
  get the benefits of adopting the newer java runtimes today.
 
 
 
  I'm (personally) +1 to this, I also think we should plan to do the switch
  some time this year to not only get the benefits, but discover the 

Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Alejandro Abdelnur
A bit of a different angle.

As the bottom of the stack, Hadoop has to be conservative in adopting
things, but it should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) from having additional requirements such as
a higher JDK API than JDK6.

 - Hadoop 2.x should stick to using the JDK6 API.
 - Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7 and
   eventually JDK8.
 - Downstream projects and Hadoop application developers are free to require
   any JDK6+ version for development and runtime.

 - Hadoop 3.x should allow using the JDK7 API, bumping the minimum runtime
   requirement to JDK7, and be tested with JDK7 and JDK8 runtimes.

Thanks.



On Thu, Apr 10, 2014 at 10:04 AM, Eli Collins e...@cloudera.com wrote:

 On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran ste...@hortonworks.com
 wrote:

  On 9 April 2014 23:52, Eli Collins e...@cloudera.com wrote:
 
  
  
   For the sake of this discussion we should separate the runtime from
   the programming APIs. Users are already migrating to the java7 runtime
   for most of the reasons listed below (support, performance, bugs,
   etc), and the various distributions cert their Hadoop 2 based
   distributions on java7.  This gives users many of the benefits of
   java7, without forcing users off java6. Ie Hadoop does not need to
   switch to the java7 programming APIs to make sure everyone has a
   supported runtime.
  
  
  +1: you can use Java 7 today; I'm not sure how tested Java 8 is
 
 
   The question here is really about when Hadoop, and the Hadoop
   ecosystem (since adjacent projects often end up in the same classpath)
   start using the java7 programming APIs and therefore break
   compatibility with java6 runtimes. I think our java6 runtime users
   would consider dropping support for their java runtime in an update of
   a major release to be an incompatible change (the binaries stop
   working on their current jvm).
 
 
  do you mean major 2.x - 3.y or minor 2.x - 2.(x+1)  here?
 

 I mean 2.x -- 2.(x+1).  Ie I'm running the 2.4 stable and upgrading to
 2.5.


 
 
   That may be worth it if we can
   articulate sufficient value to offset the cost (they have to upgrade
   their environment, might make rolling upgrades stop working, etc), but
   I've not yet heard an argument that articulates the value relative to
   the cost.  Eg upgrading to the java7 APIs allows us to pull in
   dependencies with new major versions, but only if those dependencies
   don't break compatibility (which is likely given that our classpaths
   aren't so isolated), and, realistically, only if the entire Hadoop
   stack moves to java7 as well
 
 
 
 
   (eg we have to recompile HBase to
   generate v1.7 binaries even if they stick on API v1.6). I'm not aware
   of a feature, bug etc that really motivates this.
  
   I don't see that being needed unless we move up to new java7+ only
  libraries and HBase needs to track this.
 
   The big recompile to work issue is google guava, which is troublesome
  enough I'd be tempted to say can we drop it entirely
 
 
 
   An alternate approach is to keep the current stable release series
   (v2.x) as is, and start using new APIs in trunk (for v3). This will be
   a major upgrade for Hadoop and therefore an incompatible change like
   this is to be expected (it would be great if this came with additional
   changes to better isolate classpaths and dependencies from each
   other). It allows us to continue to support multiple types of users
   with different branches, vs forcing all users onto a new version. It
   of course means that 2.x users will not get the benefits of the new
   API, but it's unclear what those benefits are given they can already
   get the benefits of adopting the newer java runtimes today.
  
  
  
  I'm (personally) +1 to this, I also think we should plan to do the switch
  some time this year to not only get the benefits, but discover the costs
 


 Agree



 




-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-04-09 Thread Andrew Purtell
A Java 8 runtime would also offer transparent performance improvements like
a reimplementation of ConcurrentSkipListMap, C2 support for AES cipher
acceleration with native CPU instructions, perf improvements for going from
String to byte[] or vice versa, and IIRC after 8u20 monitor lock elision
using restricted transactional memory with hardware support (if available).
Getting away from fully transparent changes but tractable to deal with
using reflection, removal of the permanent generation, support for AEAD
cipher modes like AES-GCM, stronger cipher and key exchange algorithms, TLS
1.2, support for some krb 5 features not handled previously.
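
To make the "tractable to deal with using reflection" point concrete, here is a
minimal sketch of probing for a newer-JDK feature at runtime while still
compiling against the Java 6 API. The class name FeatureProbe, its methods, and
the AES/CTR fallback are hypothetical and only for illustration; only
Class.forName and Cipher.getInstance are real JDK calls.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;
import javax.crypto.NoSuchPaddingException;

/** Hypothetical helper: detect optional runtime features without a compile-time dependency. */
public class FeatureProbe {

    /** Returns true if the named class can be loaded by the current runtime. */
    static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // Class exists but cannot be linked on this runtime.
            return false;
        }
    }

    /** Returns true if some installed provider supports the given cipher transformation. */
    static boolean cipherAvailable(String transformation) {
        try {
            Cipher.getInstance(transformation);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        } catch (NoSuchPaddingException e) {
            return false;
        }
    }

    /** Prefers an AEAD mode when the runtime offers it; the fallback is an assumption. */
    static String chooseTransformation() {
        return cipherAvailable("AES/GCM/NoPadding")
                ? "AES/GCM/NoPadding"
                : "AES/CTR/NoPadding";
    }

    public static void main(String[] args) {
        System.out.println("GCMParameterSpec present: "
                + classPresent("javax.crypto.spec.GCMParameterSpec"));
        System.out.println("chosen transformation: " + chooseTransformation());
    }
}
```

The same binary runs on an older JVM and simply takes the fallback path, which
is what keeps this kind of change compatible with existing runtimes.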



On Tue, Apr 8, 2014 at 7:44 PM, Raymie Stata rst...@altiscale.com wrote:

  It might make sense to try to enumerate the benefits of switching to
  Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  It might make sense to try to enumerate the benefits of switching to
 Java7
  APIs and dependencies.  IMO, the ones listed so far on this thread don't
  make a compelling enough case to drop Java6 in branch-2 on any time
 frame,
  even if this means supporting Java6 through 2015.  For example, the
 change
  in RawLocalFileSystem semantics might be an incompatible change for
  branch-2 any way.
 
 
  On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
 
  +1 to NOT breaking compatibility in branch-2.
 
  I think it is reasonable to require JDK7 for trunk, if we limit use of
  JDK7-only API to security fixes etc. If we make other optimizations
 (like
  IO), it would be a pain to backport things to branch-2. I guess this all
  depends on when we see ourselves shipping Hadoop-3. Any ideas on that?
 
 
  On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:
 
   On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
   davi.ottenhei...@emc.com wrote:
From: Eli Collins [mailto:e...@cloudera.com]
Sent: Monday, April 07, 2014 11:54 AM
   
   
IMO we should not drop support for Java 6 in a minor update of a
  stable
release (v2).  I don't think the larger Hadoop user base would
 find it
acceptable that upgrading to a minor update caused their systems to
  stop
working because they didn't upgrade Java. There are people still
  getting
support for Java 6. ...
   
Thanks,
Eli
   
Hi Eli,
   
Technically you are correct those with extended support get critical
   security fixes for 6 until the end of 2016. I am curious whether many
 of
   those are in the Hadoop user base. Do you know? My guess is the vast
   majority are within Oracle's official public end of life, which was
 over
  12
   months ago. Even Premier support ended Dec 2013:
   
http://www.oracle.com/technetwork/java/eol-135779.html
   
The end of Java 6 support carries much risk. It has to be
 considered in
   terms of serious security vulnerabilities such as CVE-2013-2465 with
 CVSS
   score 10.0.
   
http://www.cvedetails.com/cve/CVE-2013-2465/
   
Since you mentioned caused systems to stop as an example of what
  would
   be a concern to Hadoop users, please note the CVE-2013-2465
 availability
   impact:
   
Complete (There is a total shutdown of the affected resource. The
   attacker can render the resource completely unavailable.)
   
This vulnerability was patched in Java 6 Update 51, but post end of
   life. Apple pushed out the update specifically because of this
   vulnerability (http://support.apple.com/kb/HT5717) as did some other
   vendors privately, but for the majority of people using Java 6 means
 they
   have a ticking time bomb.
   
Allowing it to stay should be considered in terms of accepting the
  whole
   risk posture.
   
  
   There are some who get extended support, but I suspect many just have
   a if-it's-not-broke mentality when it comes to production deployments.
   The current code supports both java6 and java7 and so allows these
   people to remain compatible, while enabling others to upgrade to the
   java7 runtime. This seems like the right compromise for a stable
   release series. Again, absolutely makes sense for trunk (ie v3) to
   require java7 or greater.
  

Re: Plans of moving towards JDK7 in trunk

2014-04-09 Thread Eli Collins
I think this thread isn't so much about whether java7, 8 etc features
are valuable (they are useful of course, and we'll want to adopt them);
it's a question of how we adopt them and in which releases.

For the sake of this discussion we should separate the runtime from
the programming APIs. Users are already migrating to the java7 runtime
for most of the reasons listed below (support, performance, bugs,
etc), and the various distributions cert their Hadoop 2 based
distributions on java7.  This gives users many of the benefits of
java7, without forcing users off java6. Ie Hadoop does not need to
switch to the java7 programming APIs to make sure everyone has a
supported runtime.

The question here is really about when Hadoop, and the Hadoop
ecosystem (since adjacent projects often end up in the same classpath)
start using the java7 programming APIs and therefore break
compatibility with java6 runtimes. I think our java6 runtime users
would consider dropping support for their java runtime in an update of
a major release to be an incompatible change (the binaries stop
working on their current jvm). That may be worth it if we can
articulate sufficient value to offset the cost (they have to upgrade
their environment, might make rolling upgrades stop working, etc), but
I've not yet heard an argument that articulates the value relative to
the cost.  Eg upgrading to the java7 APIs allows us to pull in
dependencies with new major versions, but only if those dependencies
don't break compatibility (a break is likely given that our classpaths
aren't so isolated), and, realistically, only if the entire Hadoop
stack moves to java7 as well (eg we have to recompile HBase to
generate v1.7 binaries even if they stick on API v1.6). I'm not aware
of a feature, bug etc that really motivates this.
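
As a rough illustration of why binaries generated as v1.7 "stop working" on a
java6 runtime: every class file carries a major version (50 for Java 6, 51 for
Java 7), and a JVM refuses to load a class whose major version is higher than
the one it supports. The sketch below reads that number from a class-file
header. The class name ClassVersionCheck and its helper methods are
hypothetical, and main uses a hand-built 8-byte header rather than a real
.class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: inspect the class-file version that decides JVM compatibility. */
public class ClassVersionCheck {
    static final int JAVA6_MAJOR = 50; // class files produced with -target 1.6
    static final int JAVA7_MAJOR = 51; // class files produced with -target 1.7

    /** Reads the major version from the first 8 bytes of a class-file stream. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version
    }

    /** True if a JVM supporting class files up to jvmMaxMajor can load the class. */
    static boolean loadable(int classMajor, int jvmMaxMajor) {
        return classMajor <= jvmMaxMajor;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic header: magic number, minor version 0, major version 51.
        byte[] java7Header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 51
        };
        int major = majorVersion(new ByteArrayInputStream(java7Header));
        System.out.println("major=" + major
                + " loadable on Java 6: " + loadable(major, JAVA6_MAJOR));
    }
}
```

This is the runtime/API split in miniature: a Java 7 JVM loads version-50
classes fine, so staying on the Java 6 API costs java7 users nothing, while the
reverse fails at class-load time.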

An alternate approach is to keep the current stable release series
(v2.x) as is, and start using new APIs in trunk (for v3). This will be
a major upgrade for Hadoop and therefore an incompatible change like
this is to be expected (it would be great if this came with additional
changes to better isolate classpaths and dependencies from each
other). It allows us to continue to support multiple types of users
with different branches, vs forcing all users onto a new version. It
of course means that 2.x users will not get the benefits of the new
API, but it's unclear what those benefits are given they can already
get the benefits of adopting the newer java runtimes today.

Thanks,
Eli


On Wed, Apr 9, 2014 at 9:38 AM, Andrew Purtell apurt...@apache.org wrote:
 A Java 8 runtime would also offer transparent performance improvements like
 a reimplementation of ConcurrentSkipListMap, C2 support for AES cipher
 acceleration with native CPU instructions, perf improvements for going from
 String to byte[] or vice versa, and IIRC after 8u20 monitor lock elision
 using restricted transactional memory with hardware support (if available).
 Getting away from fully transparent changes but tractable to deal with
 using reflection, removal of the permanent generation, support for AEAD
 cipher modes like AES-GCM, stronger cipher and key exchange algorithms, TLS
 1.2, support for some krb 5 features not handled previously.



 On Tue, Apr 8, 2014 at 7:44 PM, Raymie Stata rst...@altiscale.com wrote:

  It might make sense to try to enumerate the benefits of switching to
  Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  It might make sense to try to enumerate the benefits of switching to
 Java7
  APIs and dependencies.  IMO, the ones listed so far on this thread don't
  make a compelling enough case to drop Java6 in branch-2 on any time
 frame,
  even if this means supporting Java6 through 2015.  For example, the
 change
  in RawLocalFileSystem semantics might be an incompatible change for
  branch-2 any way.
 
 
  On Tue, Apr 8, 2014 at 10:05 AM, Karthik Kambatla ka...@cloudera.com
 wrote:
 
  +1 to NOT breaking compatibility in branch-2.
 
  I think it is reasonable to require JDK7 for trunk, if we limit use of
  JDK7-only API to security fixes etc. If we make other optimizations
 (like
  IO), it would be a pain to backport things to 

Re: Plans of moving towards JDK7 in trunk

2014-04-09 Thread Vinayakumar B
+1 for keeping jdk 6 support in branch-2 and starting to use jdk 7 in trunk.

I agree that this approach makes patch generation difficult for branch-2
and trunk.

Also, the actual benefits and real issues of using jdk7 will be known
only once at least one release is out from trunk.

Regards,
Vinay
I think this thread isn't so much about whether java7, 8 etc features
are valuable, they are useful of course, and we'll want to adopt them,
it's a question of how we adopt them and in which releases.

For the sake of this discussion we should separate the runtime from
the programming APIs. Users are already migrating to the java7 runtime
for most of the reasons listed below (support, performance, bugs,
etc), and the various distributions cert their Hadoop 2 based
distributions on java7.  This gives users many of the benefits of
java7, without forcing users off java6. Ie Hadoop does not need to
switch to the java7 programming APIs to make sure everyone has a
supported runtime.

The question here is really about when Hadoop, and the Hadoop
ecosystem (since adjacent projects often end up in the same classpath)
start using the java7 programming APIs and therefore break
compatibility with java6 runtimes. I think our java6 runtime users
would consider dropping support for their java runtime in an update of
a major release to be an incompatible change (the binaries stop
working on their current jvm). That may be worth it if we can
articulate sufficient value to offset the cost (they have to upgrade
their environment, might make rolling upgrades stop working, etc), but
I've not yet heard an argument that articulates the value relative to
the cost.  Eg upgrading to the java7 APIs allows us to pull in
dependencies with new major versions, but only if those dependencies
don't break compatibility (which is likely given that our classpaths
aren't so isolated), and, realistically, only if the entire Hadoop
stack moves to java7 as well (eg we have to recompile HBase to
generate v1.7 binaries even if they stick on API v1.6). I'm not aware
of a feature, bug etc that really motivates this.

An alternate approach is to keep the current stable release series
(v2.x) as is, and start using new APIs in trunk (for v3). This will be
a major upgrade for Hadoop and therefore an incompatible change like
this is to be expected (it would be great if this came with additional
changes to better isolate classpaths and dependencies from each
other). It allows us to continue to support multiple types of users
with different branches, vs forcing all users onto a new version. It
of course means that 2.x users will not get the benefits of the new
API, but it's unclear what those benefits are given they can already
get the benefits of adopting the newer java runtimes today.

Thanks,
Eli


On Wed, Apr 9, 2014 at 9:38 AM, Andrew Purtell apurt...@apache.org wrote:
 A Java 8 runtime would also offer transparent performance improvements
like
 a reimplementation of ConcurrentSkipListMap, C2 support for AES cipher
 acceleration with native CPU instructions, perf improvements for going
from
 String to byte[] or vice versa, and IIRC after 8u20 monitor lock elision
 using restricted transactional memory with hardware support (if
available).
 Getting away from fully transparent changes but tractable to deal with
 using reflection, removal of the permanent generation, support for AEAD
 cipher modes like AES-GCM, stronger cipher and key exchange algorithms,
TLS
 1.2, support for some krb 5 features not handled previously.



 On Tue, Apr 8, 2014 at 7:44 PM, Raymie Stata rst...@altiscale.com wrote:

  It might make sense to try to enumerate the benefits of switching to
  Java7 APIs and dependencies.

   - Java7 introduced a huge number of language, byte-code, API, and
 tooling enhancements!  Just to name a few: try-with-resources, newer
 and stronger encryption methods, more scalable concurrency primitives.
  See http://www.slideshare.net/boulderjug/55-things-in-java-7

   - We can't update current dependencies, and we can't add cool new ones.

   - Putting language/APIs aside, don't forget that a huge amount of
effort
 goes into qualifying for Java6 (at least, I hope the folks claiming to
 support Java6 are putting in such an effort :-).  Wouldn't Hadoop
 users/customers be better served if qualification effort went into
 Java7/8 versus Java6/7?

 Getting to Java7 as a development env (and Java8 as a runtime env)
 seems like a no-brainer.  Question is: How?

 On Tue, Apr 8, 2014 at 10:21 AM, Sandy Ryza sandy.r...@cloudera.com
 wrote:
  It might make sense to try to enumerate the benefits of switching to
 Java7
  APIs and dependencies.  IMO, the ones listed so far on this thread
don't
  make a compelling enough case to drop Java6 in branch-2 on any time
 frame,
  even if this means supporting Java6 through 2015.  For example, the
 change
  in RawLocalFileSystem semantics might be an incompatible change for
  branch-2 any way.
 
 
  On Tue, Apr 8, 2014 

RE: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Ottenheimer, Davi
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, April 07, 2014 11:54 AM
 
 
 IMO we should not drop support for Java 6 in a minor update of a stable
 release (v2).  I don't think the larger Hadoop user base would find it
 acceptable that upgrading to a minor update caused their systems to stop
 working because they didn't upgrade Java. There are people still getting
 support for Java 6. ...
 
 Thanks,
 Eli

Hi Eli, 

Technically you are correct: those with extended support get critical security
fixes for 6 until the end of 2016. I am curious whether many of those are in 
the Hadoop user base. Do you know? My guess is the vast majority are within 
Oracle's official public end of life, which was over 12 months ago. Even 
Premier support ended Dec 2013:

http://www.oracle.com/technetwork/java/eol-135779.html

The end of Java 6 support carries much risk. It has to be considered in terms 
of serious security vulnerabilities such as CVE-2013-2465 with CVSS score 10.0. 

http://www.cvedetails.com/cve/CVE-2013-2465/

Since you mentioned "caused systems to stop" as an example of what would be a
concern to Hadoop users, please note the CVE-2013-2465 availability impact:

Complete (There is a total shutdown of the affected resource. The attacker can 
render the resource completely unavailable.)

This vulnerability was patched in Java 6 Update 51, but post end of life. Apple 
pushed out the update specifically because of this vulnerability 
(http://support.apple.com/kb/HT5717) as did some other vendors privately, but 
for the majority of people, using Java 6 means they have a ticking time bomb.

Allowing it to stay should be considered in terms of accepting the whole risk 
posture.

Davi


Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Sandy Ryza
+1 for maintaining Java 6 support in branch-2.

Hadoop continuing to support Java 6 is not an endorsement of Java 6.  It's
an acknowledgement that many users of Hadoop 2 have Java 6 embedded in
their stack, and that upgrading is costly for some users and simply not an
option for others.  If a similar vulnerability were to be discovered in a
recent version of RHEL, I don't think it would make sense for Hadoop to
drop that version as a supported platform.

Assuming that we want to maintain Java 6 compatibility in branch-2, it
seems to me that we should do the same in trunk until we start seriously
planning a release of Hadoop 3.  Since we released 2.2 GA, trunk has mainly
been used as a staging area for changes that will go into branch-2.  The
larger the divergence between trunk and branch-2, the higher the overhead
for developers writing patches that need to go into both.  Eventually we'll
need to stomach this, but is there an advantage to doing so while Hadoop 3
is still remote?

-Sandy

On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
davi.ottenhei...@emc.com wrote:

  From: Eli Collins [mailto:e...@cloudera.com]
  Sent: Monday, April 07, 2014 11:54 AM
 
 
  IMO we should not drop support for Java 6 in a minor update of a stable
  release (v2).  I don't think the larger Hadoop user base would find it
  acceptable that upgrading to a minor update caused their systems to stop
  working because they didn't upgrade Java. There are people still getting
  support for Java 6. ...
 
  Thanks,
  Eli

 Hi Eli,

 Technically you are correct those with extended support get critical
 security fixes for 6 until the end of 2016. I am curious whether many of
 those are in the Hadoop user base. Do you know? My guess is the vast
 majority are within Oracle's official public end of life, which was over 12
 months ago. Even Premier support ended Dec 2013:

 http://www.oracle.com/technetwork/java/eol-135779.html

 The end of Java 6 support carries much risk. It has to be considered in
 terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
 score 10.0.

 http://www.cvedetails.com/cve/CVE-2013-2465/

 Since you mentioned caused systems to stop as an example of what would
 be a concern to Hadoop users, please note the CVE-2013-2465 availability
 impact:

 Complete (There is a total shutdown of the affected resource. The
 attacker can render the resource completely unavailable.)

 This vulnerability was patched in Java 6 Update 51, but post end of life.
 Apple pushed out the update specifically because of this vulnerability (
 http://support.apple.com/kb/HT5717) as did some other vendors privately,
 but for the majority of people using Java 6 means they have a ticking time
 bomb.

 Allowing it to stay should be considered in terms of accepting the whole
 risk posture.

 Davi



Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Eli Collins
On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
davi.ottenhei...@emc.com wrote:
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, April 07, 2014 11:54 AM


 IMO we should not drop support for Java 6 in a minor update of a stable
 release (v2).  I don't think the larger Hadoop user base would find it
 acceptable that upgrading to a minor update caused their systems to stop
 working because they didn't upgrade Java. There are people still getting
 support for Java 6. ...

 Thanks,
 Eli

 Hi Eli,

 Technically you are correct those with extended support get critical security 
 fixes for 6 until the end of 2016. I am curious whether many of those are in 
 the Hadoop user base. Do you know? My guess is the vast majority are within 
 Oracle's official public end of life, which was over 12 months ago. Even 
 Premier support ended Dec 2013:

 http://www.oracle.com/technetwork/java/eol-135779.html

 The end of Java 6 support carries much risk. It has to be considered in terms 
 of serious security vulnerabilities such as CVE-2013-2465 with CVSS score 
 10.0.

 http://www.cvedetails.com/cve/CVE-2013-2465/

 Since you mentioned caused systems to stop as an example of what would be a 
 concern to Hadoop users, please note the CVE-2013-2465 availability impact:

 Complete (There is a total shutdown of the affected resource. The attacker 
 can render the resource completely unavailable.)

 This vulnerability was patched in Java 6 Update 51, but post end of life. 
 Apple pushed out the update specifically because of this vulnerability 
 (http://support.apple.com/kb/HT5717) as did some other vendors privately, but 
 for the majority of people using Java 6 means they have a ticking time bomb.

 Allowing it to stay should be considered in terms of accepting the whole risk 
 posture.


There are some who get extended support, but I suspect many just have
an if-it's-not-broke mentality when it comes to production deployments.
The current code supports both java6 and java7 and so allows these
people to remain compatible, while enabling others to upgrade to the
java7 runtime. This seems like the right compromise for a stable
release series. Again, it absolutely makes sense for trunk (ie v3) to
require java7 or greater.


Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Raymie Stata
Is there broad consensus that, by end of 3Q2014 at the latest, the
average contributor to Hadoop should be free to use Java7 features?
And start pulling in libraries that have a Java7 dependency?  And
start doing the janitorial work of taking advantage of the Java7
APIs?  Or do we think that the bulk of Hadoop work will be done
against Java6 APIs (and avoiding Java7-dependent libraries) through
the end of the year?

If the consensus is that we introduce Java7 into the bulk of Hadoop
coding, what's the plan for getting there?  The answer can't be "right
now, in trunk."  Even if we agreed to start allowing Java7
dependencies into trunk, as a practical matter this isn't enough.
Right now, if I'm a random Hadoop contributor, I'd be stupid to
contribute to trunk: I know that any stable release in the near term
will be from branch2, so if I want a prayer of seeing my change in a
stable release, I'd better contribute to branch2.

If we want a path to allowing Java7 dependencies by Q4, then we need
one of the following:

1) branch3 plan: The major Hadoop vendors (you know who you are)
commit to shipping a v3 of Hadoop in Q4 that allows Java7
dependencies and show signs of living up to that commitment (e.g., a
branch3 is created sometime soon).  This puts us all on a path towards
a real release of Hadoop that allows Java7 dependencies.

2) branch2 plan: deprecate Java6 as a runtime environment now,
publicly declare a time frame (e.g., 4Q2014) when _future development_
stops supporting Java6 runtime, and work with our customers in the
meantime to get them off a crazy-old version of Java (that's what
we're doing right now).
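
As a rough illustration of what "deprecate Java6 as a runtime environment
now" could look like in code, here is a minimal sketch of a startup check
that warns when the JVM is older than Java 7. The class name, method names,
and warning text are hypothetical and are not anything in Hadoop; the only
real API used is the standard java.specification.version system property.

```java
// Hypothetical sketch, not part of Hadoop: warn when running on a pre-Java7 JVM.
class Java6DeprecationCheckSketch {

    // Maps a java.specification.version value to a major version number:
    // "1.6" -> 6, "1.7" -> 7, "1.8" -> 8; newer values such as "9" or "11"
    // map to themselves.
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // True when the runtime is older than Java 7, i.e. deprecated under the
    // branch2 plan described above.
    static boolean isDeprecatedRuntime(String specVersion) {
        return majorVersion(specVersion) < 7;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (isDeprecatedRuntime(spec)) {
            // Assumed wording; a real deprecation notice would name a date.
            System.err.println("WARNING: Java " + spec
                + " is deprecated as a runtime; please move to Java 7 or later.");
        } else {
            System.out.println("Java " + spec + " is a supported runtime.");
        }
    }
}
```

A check like this only announces the deprecation; it keeps running on Java6,
which is the point of a deprecation period as opposed to a hard cutoff.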

I don't see another path to allowing Java7 dependencies.  In the
current state of indecision, the smart programmer would be assuming no
Java7 dependencies into 2015.

On the one hand, I don't see the branch3 plan actually happening.
This is a big decision involving marketing, engineering, customer
support.  Plus it creates a problem for sales: Come summertime,
they'll have a hard time selling 2.x-based releases because they've
pre-announced support for 3.x.  It's just not going to happen.

On the other hand, I don't see the problem with the branch2 plan.  The
branch2 plan also requires the commitment from the major vendors, but
this decision is not nearly as galactic.  By the time 3Q2014 comes
along, this problem will be very rarified.  Also, don't forget that it
typically takes a customer 3-6 months to upgrade their Hadoop -- and a
customer who's afraid to shift off Java6 in 3Q2014 will probably take
a year to upgrade.  The branch2 plan implies a last Java6 release of
Hadoop in 3Q2014.  If we assume a Java7-averse customer will take a
year to upgrade to this release -- and then will take another year to
upgrade their cluster after that -- then they can be happily using
Java6 all the way into 2016.  (Another point, if 3Q2014 comes along
and vendors find they have so many customers still on Java6 that they
can't afford the discontinuity, then they can shift their MAJOR
version number of their product to communicate the discontinuity --
there's nothing that says that a vendor's versioning scheme must agree
exactly with Hadoop's.)

In short, we don't currently have a realistic path for introducing
Java7 dependencies into Hadoop.  Simply allowing them into trunk will
NOT solve this problem: any contributor who wants to see their code in
a stable release knows it'll have to flow through branch2 -- and thus
they'll have to avoid Java7 dependencies.  The branch2 plan is the
only plan proposed so far that gets us to Java7 dependencies by Q4.
And the important part of the branch2 plan is we make the decision
soon -- so we have time to notify folks and otherwise work that
decision out into the field.

  Raymie



On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:
 On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
 davi.ottenhei...@emc.com wrote:
 From: Eli Collins [mailto:e...@cloudera.com]
 Sent: Monday, April 07, 2014 11:54 AM


 IMO we should not drop support for Java 6 in a minor update of a stable
 release (v2).  I don't think the larger Hadoop user base would find it
 acceptable that upgrading to a minor update caused their systems to stop
 working because they didn't upgrade Java. There are people still getting
 support for Java 6. ...

 Thanks,
 Eli

 Hi Eli,

 Technically you are correct those with extended support get critical 
 security fixes for 6 until the end of 2016. I am curious whether many of 
 those are in the Hadoop user base. Do you know? My guess is the vast 
 majority are within Oracle's official public end of life, which was over 12 
 months ago. Even Premier support ended Dec 2013:

 http://www.oracle.com/technetwork/java/eol-135779.html

 The end of Java 6 support carries much risk. It has to be considered in 
 terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS 
 score 10.0.

 http://www.cvedetails.com/cve/CVE-2013-2465/

 Since you 

Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Karthik Kambatla
+1 to NOT breaking compatibility in branch-2.

I think it is reasonable to require JDK7 for trunk, if we limit use of
JDK7-only API to security fixes etc. If we make other optimizations (like
IO), it would be a pain to backport things to branch-2. I guess this all
depends on when we see ourselves shipping Hadoop-3. Any ideas on that?


On Tue, Apr 8, 2014 at 9:19 AM, Eli Collins e...@cloudera.com wrote:

 On Tue, Apr 8, 2014 at 2:00 AM, Ottenheimer, Davi
 davi.ottenhei...@emc.com wrote:
  From: Eli Collins [mailto:e...@cloudera.com]
  Sent: Monday, April 07, 2014 11:54 AM
 
 
  IMO we should not drop support for Java 6 in a minor update of a stable
  release (v2).  I don't think the larger Hadoop user base would find it
  acceptable that upgrading to a minor update caused their systems to stop
  working because they didn't upgrade Java. There are people still getting
  support for Java 6. ...
 
  Thanks,
  Eli
 
  Hi Eli,
 
  Technically you are correct those with extended support get critical
 security fixes for 6 until the end of 2016. I am curious whether many of
 those are in the Hadoop user base. Do you know? My guess is the vast
 majority are within Oracle's official public end of life, which was over 12
 months ago. Even Premier support ended Dec 2013:
 
  http://www.oracle.com/technetwork/java/eol-135779.html
 
  The end of Java 6 support carries much risk. It has to be considered in
 terms of serious security vulnerabilities such as CVE-2013-2465 with CVSS
 score 10.0.
 
  http://www.cvedetails.com/cve/CVE-2013-2465/
 
  Since you mentioned caused systems to stop as an example of what would
 be a concern to Hadoop users, please note the CVE-2013-2465 availability
 impact:
 
  Complete (There is a total shutdown of the affected resource. The
 attacker can render the resource completely unavailable.)
 
  This vulnerability was patched in Java 6 Update 51, but post end of
 life. Apple pushed out the update specifically because of this
 vulnerability (http://support.apple.com/kb/HT5717) as did some other
 vendors privately, but for the majority of people using Java 6 means they
 have a ticking time bomb.
 
  Allowing it to stay should be considered in terms of accepting the whole
 risk posture.
 

 There are some who get extended support, but I suspect many just have
 an if-it's-not-broke mentality when it comes to production deployments.
 The current code supports both java6 and java7 and so allows these
 people to remain compatible, while enabling others to upgrade to the
 java7 runtime. This seems like the right compromise for a stable
 release series. Again, absolutely makes sense for trunk (ie v3) to
 require java7 or greater.



Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Sandy Ryza
It might make sense to try to enumerate the benefits of switching to Java7
APIs and dependencies.  IMO, the ones listed so far on this thread don't
make a compelling enough case to drop Java6 in branch-2 on any time frame,
even if this means supporting Java6 through 2015.  For example, the change
in RawLocalFileSystem semantics might be an incompatible change for
branch-2 anyway.





Re: Plans of moving towards JDK7 in trunk

2014-04-08 Thread Raymie Stata
 It might make sense to try to enumerate the benefits of switching to
 Java7 APIs and dependencies.

  - Java7 introduced a huge number of language, byte-code, API, and
tooling enhancements!  Just to name a few: try-with-resources, newer
and stronger encryption methods, more scalable concurrency primitives.
 See http://www.slideshare.net/boulderjug/55-things-in-java-7

  - We can't update current dependencies, and we can't add cool new ones.

  - Putting language/APIs aside, don't forget that a huge amount of effort
goes into qualifying for Java6 (at least, I hope the folks claiming to
support Java6 are putting in such an effort :-).  Wouldn't Hadoop
users/customers be better served if qualification effort went into
Java7/8 versus Java6/7?
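
To make the try-with-resources point concrete, here is a small sketch
comparing the Java 6 idiom with the Java 7 one. The class and method names
are made up for illustration, and a StringReader stands in for a real
stream so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Illustrative only: class and method names are hypothetical.
class TryWithResourcesSketch {

    // Java 6 style: the reader has to be closed by hand in a finally block,
    // and close() itself can throw, which needs its own handling.
    static String firstLineJava6(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } finally {
            try {
                reader.close();
            } catch (IOException ignored) {
                // nothing sensible to do if close() fails here
            }
        }
    }

    // Java 7 style: the reader is closed automatically when the try block
    // exits, whether normally or through an exception.
    static String firstLineJava7(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLineJava6("alpha\nbeta"));
        System.out.println(firstLineJava7("alpha\nbeta"));
    }
}
```

Both methods return the same result; the second version only compiles with
a Java 7 or later source level, which is exactly the constraint under
discussion.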

Getting to Java7 as a development env (and Java8 as a runtime env)
seems like a no-brainer.  Question is: How?




Re: Plans of moving towards JDK7 in trunk

2014-04-07 Thread Haohui Mai
It looks to me like the majority of this thread welcomes JDK7. Just to
reiterate, there are two separate questions here:

1. When should hadoop-trunk require JDK7 to build?
2. When should hadoop-branch-2 require JDK7 to build?

The answers to these questions directly determine when and how Hadoop can
break compatibility with the JDK6 runtime.

There appear to be quite a few compatibility concerns around question
(2). Should we focus on question (1) and come up with a plan? Personally
I'd love to see (1) happen as soon as possible.

~Haohui

On Sun, Apr 6, 2014 at 11:37 AM, Steve Loughran ste...@hortonworks.com wrote:

 On 5 April 2014 20:54, Raymie Stata rst...@altiscale.com wrote:

  To summarize the thread so far:
 
  a) Java7 is already a supported compile- and runtime environment for
  Hadoop branch2 and trunk
  b) Java6 must remain a supported compile- and runtime environment for
  Hadoop branch2
  c) (b) implies that branch2 must stick to Java6 APIs
 
  I wonder if point (b) should be revised.  We could immediately
  deprecate Java6 as a runtime (and thus compile-time) environment for
  Hadoop.  We could end support for it in some published time frame
  (perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
  some date would not be guaranteed to run on Java6.  This would set us
  up for using Java7 APIs into branch2.
 

 I'll let others deal with that question.


 
  An alternative might be to keep branch2 on Java6 APIs forever, and to
  start using Java7 APIs in trunk relatively soon.  The concern here
  would be that trunk isn't getting the kind of production torture
  testing that branch2 is subjected to, and won't be for a while.  If
  trunk and branch2 diverge too much too quickly, trunk could become a
  nest of bugs, endangering the timeline and quality of Hadoop 3.  This
  would argue for keeping trunk and branch2 in closer sync (maybe until
  a branch3 is created and starts getting used by bleeding-edge users).
  However, as just suggested, keeping them in closer sync need _not_
  imply that Java7 features be avoided indefinitely: again, with
  sufficient warning, Java6 support could be sunset within branch2.
 

 One thing we could do is have a policy towards new features where there's
 consensus that they won't go into branch-2, especially things in their own
 JARs.

 Here we could consider a policy of build set up to be Java 7 only, with
 Java7 APIs.

 That would be justified if there was some special java 7 feature -such as
 its infiniband support. Add a feature like that in its own module (under
 hadoop-tools, presumably), and use Java7 and Java 7 libraries. If someone
 really did want to use the feature in hadoop-2, they'd be able to, in a
 java7+ only backport.


 
  On a related note, Steve points out that we need to start thinking
  about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
  Java6 in a few quarters, maybe we can add Java8 compile and runtime
  (but not API) support about the same time.  This does NOT imply
  bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
  branch2 in the future, I doubt that bringing Java8 APIs into it will
  ever make sense.  However, if Java8 is a supported runtime environment
  for Hadoop 2, that sets us up for using Java8 APIs for the eventual
  branch3 sometime in 2015.
 
 
 Testing Hadoop on Java 8 would let the rest of the stack move forward.

 In the meantime, can I point out that both Scala-on-Java7 and
 Groovy-on-Java 7 offer closures quite nicely, with performance by way of
 INVOKEDYNAMIC opcodes.

 -steve



Re: Plans of moving towards JDK7 in trunk

2014-04-07 Thread Eli Collins
On Sat, Apr 5, 2014 at 12:54 PM, Raymie Stata rst...@altiscale.com wrote:
 To summarize the thread so far:

 a) Java7 is already a supported compile- and runtime environment for
 Hadoop branch2 and trunk
 b) Java6 must remain a supported compile- and runtime environment for
 Hadoop branch2
 c) (b) implies that branch2 must stick to Java6 APIs

 I wonder if point (b) should be revised.  We could immediately
 deprecate Java6 as a runtime (and thus compile-time) environment for
 Hadoop.  We could end support for it in some published time frame
 (perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
 some date would not be guaranteed to run on Java6.  This would set us
 up for using Java7 APIs into branch2.

IMO we should not drop support for Java 6 in a minor update of a
stable release (v2).  I don't think the larger Hadoop user base would
find it acceptable that upgrading to a minor update caused their
systems to stop working because they didn't upgrade Java. There are
people still getting support for Java 6. For the same reason, the
various distributions will not want to drop support in a minor update
of their products also, and since distros are using the Apache v2.x
update releases as the basis for their updates it would mean they have
to stop shipping v2.x updates, which makes it harder to collaborate
upstream.

Your point with regard to testing and releasing trunk is valid, though
we need to address that anyway, outside the context of Java versions.

Thanks,
Eli


Re: Plans of moving towards JDK7 in trunk

2014-04-06 Thread Steve Loughran
On 5 April 2014 20:54, Raymie Stata rst...@altiscale.com wrote:

 To summarize the thread so far:

 a) Java7 is already a supported compile- and runtime environment for
 Hadoop branch2 and trunk
 b) Java6 must remain a supported compile- and runtime environment for
 Hadoop branch2
 c) (b) implies that branch2 must stick to Java6 APIs

 I wonder if point (b) should be revised.  We could immediately
 deprecate Java6 as a runtime (and thus compile-time) environment for
 Hadoop.  We could end support for it in some published time frame
 (perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
 some date would not be guaranteed to run on Java6.  This would set us
 up for using Java7 APIs into branch2.


I'll let others deal with that question.



 An alternative might be to keep branch2 on Java6 APIs forever, and to
 start using Java7 APIs in trunk relatively soon.  The concern here
 would be that trunk isn't getting the kind of production torture
 testing that branch2 is subjected to, and won't be for a while.  If
 trunk and branch2 diverge too much too quickly, trunk could become a
 nest of bugs, endangering the timeline and quality of Hadoop 3.  This
 would argue for keeping trunk and branch2 in closer sync (maybe until
 a branch3 is created and starts getting used by bleeding-edge users).
 However, as just suggested, keeping them in closer sync need _not_
 imply that Java7 features be avoided indefinitely: again, with
 sufficient warning, Java6 support could be sunset within branch2.


One thing we could do is have a policy towards new features where there's
consensus that they won't go into branch-2, especially things in their own
JARs.

Here we could consider a policy of build set up to be Java 7 only, with
Java7 APIs.

That would be justified if there was some special java 7 feature -such as
its infiniband support. Add a feature like that in its own module (under
hadoop-tools, presumably), and use Java7 and Java 7 libraries. If someone
really did want to use the feature in hadoop-2, they'd be able to, in a
java7+ only backport.



 On a related note, Steve points out that we need to start thinking
 about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
 Java6 in a few quarters, maybe we can add Java8 compile and runtime
 (but not API) support about the same time.  This does NOT imply
 bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
 branch2 in the future, I doubt that bringing Java8 APIs into it will
 ever make sense.  However, if Java8 is a supported runtime environment
 for Hadoop 2, that sets us up for using Java8 APIs for the eventual
 branch3 sometime in 2015.


Testing Hadoop on Java 8 would let the rest of the stack move forward.

In the meantime, can I point out that both Scala-on-Java7 and
Groovy-on-Java 7 offer closures quite nicely, with performance by way of
INVOKEDYNAMIC opcodes.
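
For anyone who has not looked at Java 8 yet, a short sketch of what the
lambda syntax buys over the anonymous inner classes we write today. The
class and method names are hypothetical; the second method needs a Java 8
compiler, the first compiles on Java 6.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: class and method names are hypothetical.
class LambdaSketch {

    // Java 6/7: the comparator is an anonymous inner class.
    static List<String> sortByLengthAnonymousClass(List<String> input) {
        List<String> out = new ArrayList<String>(input);
        Collections.sort(out, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                // lengths are non-negative ints, so the subtraction cannot overflow
                return a.length() - b.length();
            }
        });
        return out;
    }

    // Java 8: the same comparator written as a lambda expression.
    static List<String> sortByLengthLambda(List<String> input) {
        List<String> out = new ArrayList<String>(input);
        Collections.sort(out, (a, b) -> a.length() - b.length());
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ccc", "a", "bb");
        System.out.println(sortByLengthAnonymousClass(words));
        System.out.println(sortByLengthLambda(words));
    }
}
```

Both methods sort a copy of the input by string length and leave the
original list untouched.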

-steve



Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Colin McCabe
I've been using JDK7 for Hadoop development for a while now, and I
know a lot of other folks have as well.  Correct me if I'm wrong, but
what we're talking about here is not moving towards JDK7 but
breaking compatibility with JDK6.

There are a lot of good reasons to ditch JDK6.  It would let us use
new APIs in JDK7, especially the new file APIs.  It would let us
update a few dependencies to newer versions.

I don't like the idea of breaking compatibility with JDK6 in trunk,
but not in branch-2.  The traditional reason for putting something in
trunk but not in branch-2 is that it is new code that needs some time
to prove itself.  This doesn't really apply to incrementing min.jdk--
we could do that easily whenever we like.  Meanwhile, if trunk starts
accumulating jdk7-only code and dependencies, backports from trunk to
branch-2 will become harder and harder over time.

Since we make stable releases off of branch-2, and not trunk, I don't
see any upside to this.  To be honest, I see only negatives here.
More time backporting, more issues that show up only in production
(branch-2) and not on dev machines (trunk).

Maybe it's time to start thinking about what version of branch-2 will
drop jdk6 support.  But until there is such a version, I don't think
trunk should do it.

best,
Colin


On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai h...@hortonworks.com wrote:
 I'm referring to the latter case. Indeed, migrating branch-2 to JDK7 is more
 difficult.

 I think one reasonable approach is to put the hdfs / yarn clients into
 separate jars. The client-side jars can only use JDK6 APIs, so that
 downstream projects running on top of JDK6 continue to work.

 The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
 use JDK7 APIs inside them. Given that there is far more code on the
 server side than on the client side, being able to use JDK7 on the
 server side only might still be a win.

 The downside I can think of is that it might complicate the effort of
 publishing maven jars, but this should be a one-time issue.

 ~Haohui


 On Fri, Apr 4, 2014 at 2:37 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Haohui,

 Is the idea to compile/test with JDK7 and recommend it for runtime and stop
 there? Or to start using JDK7 API stuff as well? If the latter is the case,
 then backporting stuff to branch-2 may break and patches may have to be
 refactored for JDK6. Given that branch-2 got GA status not so long ago, I
 assume it will be active for a while.

 What are your thoughts in this regard?

 Thanks


 On Fri, Apr 4, 2014 at 2:29 PM, Haohui Mai h...@hortonworks.com wrote:

  Hi,
 
  There have been multiple discussions on deprecating support for JDK6 and
  moving towards JDK7. It looks to me that the consensus is that Hadoop is
  now ready to drop support for JDK6 and move towards JDK7. Based on that
  consensus, I wonder whether it is a good time to start the migration.
 
  Here is my understanding of the current status:
 
  1. There have been no public updates of JDK6 since Feb 2013. Users no
  longer get fixes for security vulnerabilities through official public
  updates.
  2. Hadoop core is stuck with out-of-date dependencies unless it moves
  towards JDK7. (see
  http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
  The implementation can also benefit from the new functionality in JDK7.
  3. The code is ready for JDK7. Cloudera and Hortonworks have success
  stories of supporting Hadoop on JDK7.
 
 
  It seems that the real work of moving to JDK7 is minimal. We only need to
  (1) make sure the Jenkins builds run on top of JDK7, and (2) update the
  minimum required Java version from 6 to 7. Therefore I propose that we
  move towards JDK7 in trunk in the short term.
 
  Your feedback is appreciated.
 
  Regards,
  Haohui
 
 



 --
 Alejandro



Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Steve Loughran
On 5 April 2014 11:53, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 I've been using JDK7 for Hadoop development for a while now, and I
 know a lot of other folks have as well.  Correct me if I'm wrong, but
 what we're talking about here is not moving towards JDK7 but
 breaking compatibility with JDK6.


+1


 There are a lot of good reasons to ditch JDK6.  It would let us use
 new APIs in JDK7, especially the new file APIs.  It would let us
 update a few dependencies to newer versions.


+1



 I don't like the idea of breaking compatibility with JDK6 in trunk,
 but not in branch-2.  The traditional reason for putting something in
 trunk but not in branch-2 is that it is new code that needs some time
 to prove itself.


+1. branch-2 must continue to run on JDK6


 This doesn't really apply to incrementing min.jdk--
 we could do that easily whenever we like.  Meanwhile, if trunk starts
 accumulating jdk7-only code and dependencies, backports from trunk to
 branch-2 will become harder and harder over time.



I agree, but note that trunk diverges from branch-2 over time anyway -it's
happening.



 Since we make stable releases off of branch-2, and not trunk, I don't
 see any upside to this.  To be honest, I see only negatives here.
 More time backporting, more issues that show up only in production
 (branch-2) and not on dev machines (trunk).


 Maybe it's time to start thinking about what version of branch-2 will
 drop jdk6 support.  But until there is such a version, I don't think
 trunk should do it.




   1. Let's assume that branch-2 will never drop JDK6 -clusters are
   committed to it, and saying "JDK update needed" will simply stop updates.
   2. By the time Hadoop 3.0 ships -2015?- JDK6 will be EOL, Java 8 will be in
   common use, and even JDK7 seen as trailing edge.
   3. JDK7 improves JVM performance: NUMA, nativeIO etc. -which you get for
   free- and as we're confident it's stable there's no reason not to move to
   it in production.
   4. As we update the dependencies on hadoop 3, we'll end up upgrading to
   libraries that are JDK7+ only (jetty!), so JDK6 is implicitly abandoned.
   5. There are new packages and APIs in Java7 which we can adopt to make
   our lives better and development more productive -as well as improving the
   user experience.

As a case in point, java.io.File.mkdirs() says "true if and only if the
directory was created; false otherwise", and returns false in either of
the two cases:
 -the path resolves to a directory that exists
 -the path resolves to a file
Think about that: anyone using local filesystems could write code that
assumes that mkdirs() returning false is harmless, because if you apply it
more than once on a directory it is. But call it on a file and you don't get
told it's only a file until you try to do something under it, and then
things stop behaving.
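
A minimal sketch of that ambiguity, using only java.io.File so it runs on
Java 6 as well. The class and helper names are invented for the example,
and it works on temporary paths that it creates and cleans up itself.

```java
import java.io.File;
import java.io.IOException;

// Illustrative only: class and method names are hypothetical.
class MkdirsAmbiguitySketch {

    // Java 6 has no createTempDirectory, so create a temp file and swap it
    // for a directory of the same name.
    static File newTempDir() {
        try {
            File f = File.createTempFile("mkdirs-sketch", null);
            if (!f.delete() || !f.mkdir()) {
                throw new IllegalStateException("could not create temp dir " + f);
            }
            return f;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // mkdirs() on a directory that already exists.
    static boolean mkdirsOnExistingDirectory() {
        File dir = newTempDir();
        try {
            return dir.mkdirs();
        } finally {
            dir.delete();
        }
    }

    // mkdirs() on a path that is a regular file.
    static boolean mkdirsOnExistingFile() {
        File file;
        try {
            file = File.createTempFile("mkdirs-sketch", null);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        try {
            return file.mkdirs();
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) {
        // Both calls report the same value, so the caller cannot tell the
        // harmless case from the broken one.
        System.out.println("existing directory: " + mkdirsOnExistingDirectory());
        System.out.println("existing file:      " + mkdirsOnExistingFile());
    }
}
```

Both methods return false, which is why a bare boolean is not enough to
build HDFS-like semantics on.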

In comparison, java.nio.file.Files differentiates this case by declaring
FileAlreadyExistsException - "if dir exists but is not a directory". Which
is the kind of thing that would make RawLocalFS behave a lot more like
HDFS. Similarly, if we could switch to Files.move(), then the destination
file would stop being overwritten if it existed, so RawLocalFS's rename()
semantics would come closer to HDFS.
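
A rough sketch of the Java 7 behaviour, assuming the default local filesystem
(the Javadoc marks FileAlreadyExistsException as an "optional specific
exception", so other filesystem providers may differ). NioFilesDemo and its
method names are hypothetical and not part of Hadoop; the calls shown are
Files.createDirectories() and Files.move() without REPLACE_EXISTING.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class NioFilesDemo {

    // createDirectories() on an existing directory completes without error.
    static boolean createDirectoriesOnExistingDirectory() {
        try {
            Path dir = Files.createTempDirectory("nio-demo");
            try {
                Files.createDirectories(dir);
                return true;
            } finally {
                Files.delete(dir);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // createDirectories() on a regular file is rejected with an exception.
    static boolean createDirectoriesRejectsRegularFile() {
        try {
            Path file = Files.createTempFile("nio-demo", ".txt");
            try {
                Files.createDirectories(file);
                return false; // not expected on the default filesystem
            } catch (FileAlreadyExistsException expected) {
                return true;
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // move() without REPLACE_EXISTING refuses to clobber an existing target.
    static boolean moveRefusesToOverwrite() {
        try {
            Path src = Files.createTempFile("nio-demo-src", ".txt");
            Path dst = Files.createTempFile("nio-demo-dst", ".txt");
            try {
                Files.move(src, dst);
                return false; // not expected on the default filesystem
            } catch (FileAlreadyExistsException expected) {
                return true;
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("existing dir accepted:  " + createDirectoriesOnExistingDirectory());
        System.out.println("regular file rejected:  " + createDirectoriesRejectsRegularFile());
        System.out.println("overwrite refused:      " + moveRefusesToOverwrite());
    }
}
```

Unlike the boolean from mkdirs(), the exception type tells the caller exactly
which case was hit, which is what a local filesystem wrapper would need in
order to mimic HDFS-style errors.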

These are things we just can't do while retaining Java 6 compatibility,
and that is why I am looking forward to the time when we can stop caring
about Java 6.

Now, assuming that Hadoop 3.x will be Java7+ only, we have the option
between now and its future ship date to move to those Java7 APIs. So when
to make the move?

   1. It can be done late -in which case few changes will happen, nobody
   sees much benefit.
   2. We can do it now, and have 12+ months to adopt the new features, make
   the move -and be set up for Java 8 migration in later versions.

Yes, code that uses the new APIs won't work on Java6, but that doesn't mean
it shouldn't happen. Hadoop made the jump from Java 5 to Java 6, after all.

-Steve



Re: Plans of moving towards JDK7 in trunk

2014-04-05 Thread Raymie Stata
To summarize the thread so far:

a) Java7 is already a supported compile- and runtime environment for
Hadoop branch2 and trunk
b) Java6 must remain a supported compile- and runtime environment for
Hadoop branch2
c) (b) implies that branch2 must stick to Java6 APIs

I wonder if point (b) should be revised.  We could immediately
deprecate Java6 as a runtime (and thus compile-time) environment for
Hadoop.  We could end support for it in some published time frame
(perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
some date would not be guaranteed to run on Java6.  This would set us
up for using Java7 APIs in branch2.

An alternative might be to keep branch2 on Java6 APIs forever, and to
start using Java7 APIs in trunk relatively soon.  The concern here
would be that trunk isn't getting the kind of production torture
testing that branch2 is subjected to, and won't be for a while.  If
trunk and branch2 diverge too much too quickly, trunk could become a
nest of bugs, endangering the timeline and quality of Hadoop 3.  This
would argue for keeping trunk and branch2 in closer sync (maybe until
a branch3 is created and starts getting used by bleeding-edge users).
However, as just suggested, keeping them in closer sync need _not_
imply that Java7 features be avoided indefinitely: again, with
sufficient warning, Java6 support could be sunset within branch2.

On a related note, Steve points out that we need to start thinking
about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
Java6 in a few quarters, maybe we can add Java8 compile and runtime
(but not API) support about the same time.  This does NOT imply
bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
branch2 in the future, I doubt that bringing Java8 APIs into it will
ever make sense.  However, if Java8 is a supported runtime environment
for Hadoop 2, that sets us up for using Java8 APIs for the eventual
branch3 sometime in 2015.


On Sat, Apr 5, 2014 at 10:52 AM, Steve Loughran ste...@hortonworks.com wrote:
 On 5 April 2014 11:53, Colin McCabe cmcc...@alumni.cmu.edu wrote:

 I've been using JDK7 for Hadoop development for a while now, and I
 know a lot of other folks have as well.  Correct me if I'm wrong, but
 what we're talking about here is not moving towards JDK7 but
 breaking compatibility with JDK6.


 +1


 There are a lot of good reasons to ditch JDK6.  It would let us use
 new APIs in JDK7, especially the new file APIs.  It would let us
 update a few dependencies to newer versions.


 +1



 I don't like the idea of breaking compatibility with JDK6 in trunk,
 but not in branch-2.  The traditional reason for putting something in
 trunk but not in branch-2 is that it is new code that needs some time
 to prove itself.


 +1. branch-2 must continue to run on JDK6


 This doesn't really apply to incrementing min.jdk--
 we could do that easily whenever we like.  Meanwhile, if trunk starts
 accumulating jdk7-only code and dependencies, backports from trunk to
 branch-2 will become harder and harder over time.



 I agree, but note that trunk diverges from branch-2 over time anyway -it's
 happening.



 Since we make stable releases off of branch-2, and not trunk, I don't
 see any upside to this.  To be honest, I see only negatives here.
 More time backporting, more issues that show up only in production
 (branch-2) and not on dev machines (trunk).


 Maybe it's time to start thinking about what version of branch-2 will
 drop jdk6 support.  But until there is such a version, I don't think
 trunk should do it.




1. Let's assume that branch-2 will never drop JDK6 -clusters are
committed to it, and saying JDK updated needed will simply stop updates.
2. By the hadoop 3.0 ships -2015?- JDK6 will be EOL, java 8 will be in
common use, and even JDK7 seen as trailing edge.
3. JDK7  improves JVM performance: NUMA, nativeIO c -which you get for
free -as we're confident its stable there's no reason to not move to it in
production.
4. As we update the dependencies on hadoop 3, we'll end up upgrading to
libraries that are JDK7+ only (jetty!), so JDK6 is implicitly abandoned.
5. There are new packages and APIs in Java7 which we can adopt to make
our lives better and development more productive -as well as improving the
user experience.

 as a case in point, java.io.File.mkdirs() says true if and only if the
 directory was created; false otherwise  , and returns false in either of
 the two cases:
  -the path resolves to a directory that exists
  -the path resolves to a file
 think about that, anyone using local filesystems could write code that
 assumes that mkdir()==0 is harmless, because if you apply it more than once
 on a directory it is. But call it on a file and you don't get told its only
 a file until you try to do something under it, and then things stop
 behaving.

 In comparison, java.nio.files.Files differentiates this case by declaring
 FileAlreadyExistsException - 

Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Alejandro Abdelnur
Haohui,

Is the idea to compile/test with JDK7 and recommend it for runtime and stop
there? Or to start using JDK7 API stuff as well? If the latter is the case,
then backporting stuff to branch-2 may break and patches may have to be
refactored for JDK6. Given that branch-2 got GA status not so long ago, I
assume it will be active for a while.

What are your thoughts in this regard?

Thanks


On Fri, Apr 4, 2014 at 2:29 PM, Haohui Mai h...@hortonworks.com wrote:

 Hi,

 There have been multiple discussions on deprecating support for JDK6 and
 moving towards JDK7. It looks to me that the consensus is that Hadoop is
 now ready to drop support for JDK6 and to move towards JDK7. Based on that
 consensus, I wonder whether it is a good time to start the migration.

 Here is my understanding of the current status:

 1. There have been no public updates of JDK6 since Feb 2013. Users no longer
 get fixes for security vulnerabilities through official public updates.
 2. Hadoop core is stuck with out-of-date dependencies unless it moves towards
 JDK7. (see
 http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
 The implementation can also benefit from the new functionality in JDK7.
 3. The code is ready for JDK7. Cloudera and Hortonworks have success
 stories of supporting Hadoop on JDK7.


 It seems that the real work of moving to JDK7 is minimal. We only need to
 (1) make sure the Jenkins builds are running on top of JDK7, and (2) update
 the minimum required Java version from 6 to 7. Therefore I propose that we
 move towards JDK7 in trunk in the short term.

 Your feedback is appreciated.

 Regards,
 Haohui





-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Haohui Mai
I'm referring to the latter case. Indeed, migrating to JDK7 for branch-2 is
more difficult.

I think one reasonable approach is to put the hdfs / yarn clients into
separate jars. The client-side jars would use only JDK6 APIs, so that
downstream projects running on top of JDK6 continue to work.

The HDFS/YARN/MR servers would need to run on top of JDK7, and we'd be free to
use JDK7 APIs inside them. Given that there is far more code on the server
side than on the client side, being able to use JDK7 on the server side
only might still be a win.

The downside I can think of is that it might complicate the effort of
publishing maven jars, but this should be a one-time issue.

~Haohui


On Fri, Apr 4, 2014 at 2:37 PM, Alejandro Abdelnur t...@cloudera.com wrote:

 Haohui,

 Is the idea to compile/test with JDK7 and recommend it for runtime and stop
 there? Or to start using JDK7 API stuff as well? If the later is the case,
 then backporting stuff to branch-2 may break and patches may have to be
 refactored for JDK6. Given that branch-2 got GA status not so long ago, I
 assume it will be active for a while.

 What are your thoughts on this regard?

 Thanks






Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Sangjin Lee
Please don't forget the mac os build on JDK 7. :)


On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai h...@hortonworks.com wrote:

 I'm referring to the later case. Indeed migrating JDK7 for branch-2 is more
 difficult.

 I think one reasonable approach is to put the hdfs / yarn clients into
 separate jars. The client-side jars can only use JDK6 APIs, so that
 downstream projects running on top of JDK6 continue to work.


It might not be as clear cut. For clients to run clean on JDK 6, not only
the client projects/artifacts but also any of their dependencies must be
free of JDK 7 code. And this obviously includes things like hadoop-common
(or any downstream dependencies for that matter).



 The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
 use JDK7 APIs inside them. Given the fact that there're way more code in
 the server-side compared to the client-side, having the ability to use JDK7
 in the server-side only might still be a win.

 The downside I can think of is that it might complicate the effort of
 publishing maven jars, but this should be an one-time issue.


Could you elaborate on why it would complicate maven jar publication?
Perhaps I'm over-simplifying things, but I would have thought it could be
easily achieved by marking certain project poms with source/target 1.6 in
their maven compiler plugin configuration while upgrading the default
setting to 1.7. Do you anticipate more issues?



 ~Haohui






Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Haohui Mai
bq. It might not be as clear cut...

Totally agree. I think the key is that we can do the work in an incremental
way. We can introduce the JDK7 dependency on the server side only. In order to
do this we need to split the client-side code out into separate jars. I've
already proposed creating an hdfs-client jar on the hdfs-dev mailing list.

bq.  I would have thought it could be easily achieved by marking certain
project poms with source/target 1.6 in their maven compiler plugin
configuration while upgrading the default setting to 1.7. Do you anticipate
more issues?

Correct me if I'm wrong, but I think that's enough. The work should be
minimal.

~Haohui

On Fri, Apr 4, 2014 at 3:43 PM, Sangjin Lee sj...@apache.org wrote:

 Please don't forget the mac os build on JDK 7. :)


 On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai h...@hortonworks.com wrote:

  I'm referring to the later case. Indeed migrating JDK7 for branch-2 is
 more
  difficult.
 
  I think one reasonable approach is to put the hdfs / yarn clients into
  separate jars. The client-side jars can only use JDK6 APIs, so that
  downstream projects running on top of JDK6 continue to work.
 

 It might not be as clear cut. For clients to run clean on JDK 6, not only
 the client projects/artifacts but also any of their dependencies must be
 free of JDK 7 code. And this obviously includes things like hadoop-common
 (or any downstream dependencies for that matter).


 
  The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
  use JDK7 APIs inside them. Given the fact that there're way more code in
  the server-side compared to the client-side, having the ability to use
 JDK7
  in the server-side only might still be a win.
 
  The downside I can think of is that it might complicate the effort of
  publishing maven jars, but this should be an one-time issue.
 

 Could you elaborate on why it would complicate maven jar publication?
 Perhaps I'm over-simplifying things, but I would have thought it could be
 easily achieved by marking certain project poms with source/target 1.6 in
 their maven compiler plugin configuration while upgrading the default
 setting to 1.7. Do you anticipate more issues?


 
  ~Haohui
 
 

Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Alejandro Abdelnur
So, you want to compile hdfs/yarn/mapred clients (and hadoop-common and
hadoop-auth) with JDK6 and the rest with JDK7?


On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai h...@hortonworks.com wrote:

 I'm referring to the later case. Indeed migrating JDK7 for branch-2 is more
 difficult.

 I think one reasonable approach is to put the hdfs / yarn clients into
 separate jars. The client-side jars can only use JDK6 APIs, so that
 downstream projects running on top of JDK6 continue to work.

 The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
 use JDK7 APIs inside them. Given the fact that there're way more code in
 the server-side compared to the client-side, having the ability to use JDK7
 in the server-side only might still be a win.

 The downside I can think of is that it might complicate the effort of
 publishing maven jars, but this should be an one-time issue.

 ~Haohui







-- 
Alejandro