Re: Moving to JDK7, JDK8 and new major releases
Following up on ecosystem: I just took a look at the Apache trunk pom.xml files for HBase, Flume, and Oozie. All specify 1.6 for source and target in the maven-compiler-plugin configuration, so there may be additional follow-up required here. (For example, if HBase has stated that its client will continue to support JDK6, then it wouldn't be practical for them to link against a JDK7 build of hadoop-common.)

+1 for the whole plan though. We can work through these details.

Chris Nauroth
Hortonworks
http://hortonworks.com/

On Fri, Jun 27, 2014 at 3:10 PM, Karthik Kambatla ka...@cloudera.com wrote:

> +1 to making 2.6 the last JDK6 release. If we want, 2.7 could be a
> parallel release or one soon after 2.6. We could upgrade other
> dependencies that require JDK7 as well.
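For reference, the maven-compiler-plugin configuration Chris describes takes roughly this shape. This is a sketch of the stanza pattern, not copied from any of the projects' actual pom.xml files:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Pinning source/target to 1.6 is what would clash with
         consuming artifacts built as 1.7 class files. -->
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>
```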
Re: Moving to JDK7, JDK8 and new major releases
Guava is a separate problem, and I think we should have a separate discussion: what can we do about Guava? That's more traumatic than a JDK update, I fear, as the Guava releases care a lot less about compatibility. I don't worry about JDK updates removing classes like StringBuffer just because StringBuilder is better.

On 27 June 2014 19:26, Andrew Wang andrew.w...@cloudera.com wrote:

> Hi all, responding to multiple messages here,
>
> Arun, thanks for the clarification regarding MR classpaths. It sounds like
> the story there is improved and still improving. However, I think we still
> suffer from this at least on the HDFS side. We have a single JAR for all
> of HDFS, and our clients need to have all the fun deps like Guava on the
> classpath. I'm told Spark sticks a newer Guava at the front of the
> classpath and the HDFS client still works okay, but this is more happy
> coincidence than anything else. While we're leaking deps, we're in a
> scary situation.

Very good point.

> API compat to me means that an app should be able to run on a new minor
> version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like
> it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what
> should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and
> have nothing break. If we muck with the classpath, my understanding is
> that this could break.

I think this is possible by having the app upload all the JARs... I need to experiment here myself.

> Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on
> JDK7 across the CDH stack, so I think that's an indication that most
> ecosystem projects are ready to make the jump. Is that sufficient in your
> mind?

+1, we've had no complaints about things not working on Java 7. It's been out a long time. If you look at our own code, the main thing that broke was tests (due to JUnit test case ordering) and not much else.

> For the record, I'm also +1 on the Tucu plan. Is it too late to do this
> for 2.5? I'll offer to help out with some of the mechanics.

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Moving to JDK7, JDK8 and new major releases
As someone else already mentioned, we should announce one future release (maybe 2.5) as the last JDK6-based release before making the move to JDK7. I am comfortable calling 2.5 the last JDK6 release.

On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang andrew.w...@cloudera.com wrote:
Re: Moving to JDK7, JDK8 and new major releases
FYI, I also just updated the wiki page with a Proposal D, aka the Tucu plan, which I think is essentially Proposal C but tabling JDK8 plans for now.

https://wiki.apache.org/hadoop/MovingToJdk7and8

Karthik, thanks for chiming in re: 2.5. I guess there's nothing urgently required; the Jenkins stuff just needs to happen before 2.6. Still, I'm happy to help with anything.

Thanks,
Andrew

On Fri, Jun 27, 2014 at 11:34 AM, Karthik Kambatla ka...@cloudera.com wrote:

> As someone else already mentioned, we should announce one future release
> (maybe 2.5) as the last JDK6-based release before making the move to
> JDK7. I am comfortable calling 2.5 the last JDK6 release.
Re: Moving to JDK7, JDK8 and new major releases
Thanks everyone for the discussion. Looks like we have come to a pragmatic and progressive conclusion.

In terms of execution of the consensus plan, I think a little bit of caution is in order. Let's give downstream projects more of a runway. I propose we inform HBase, Pig, Hive, etc. that we are considering making 2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they are comfortable, we can pull the trigger in 2.7.

thanks,
Arun

On Jun 27, 2014, at 11:34 AM, Karthik Kambatla ka...@cloudera.com wrote:

> As someone else already mentioned, we should announce one future release
> (maybe 2.5) as the last JDK6-based release before making the move to
> JDK7. I am comfortable calling 2.5 the last JDK6 release.
Re: Moving to JDK7, JDK8 and new major releases
+1 to making 2.6 the last JDK6 release. If we want, 2.7 could be a parallel release or one soon after 2.6. We could upgrade other dependencies that require JDK7 as well.

On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy a...@hortonworks.com wrote:

> Thanks everyone for the discussion. Looks like we have come to a pragmatic
> and progressive conclusion.
>
> In terms of execution of the consensus plan, I think a little bit of
> caution is in order. Let's give downstream projects more of a runway. I
> propose we inform HBase, Pig, Hive etc. that we are considering making
> 2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they
> are comfortable we can pull the trigger in 2.7.
Re: Moving to JDK7, JDK8 and new major releases
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote:

> After reading this thread and thinking a bit about it, I think it should
> be OK to make such a move up to JDK7 in Hadoop.

I agree with Alejandro. Changing the minimum JDK is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'ed, is a good thing. Moving to Java 8 as a minimum seems much too aggressive, and I would push back on that.

I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached the mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases.

.. Owen
Re: Moving to JDK7, JDK8 and new major releases
I'm also +1 for getting us to JDK7 within the 2.x line, after reading the proposals and catching up on the discussion in this thread.

Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first, before we make the move? Would we switch to JDK7, but run javac with -target 1.6 to maintain compatibility for downstream projects during an interim period?

Chris Nauroth
Hortonworks
http://hortonworks.com/

On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote:
Re: Moving to JDK7, JDK8 and new major releases
I understood the plan for avoiding JDK7-specific features in our code, and your suggestion to add an extra Jenkins job is a great way to guard against that. The thing I haven't seen discussed yet is how downstream projects will continue to consume our built artifacts. If a downstream project upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but their project is still building with 1.6, then it would be a nasty surprise. These are the options I see: 1. Make sure all other projects upgrade first. This doesn't sound feasible, unless all other ecosystem projects have moved to JDK7 already. If not, then waiting on a single long pole project would hold up our migration indefinitely. 2. We switch to JDK7, but run javac with -target 1.6 until the whole ecosystem upgrades. I find this undesirable, because in a certain sense, it still leaves a bit of 1.6 lingering in the project. (I'll assume that end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.) 3. Just declare a clean break on some version (your earlier email said 2.5) and start publishing artifacts built with JDK7 and no -target option. Overall, this is my preferred option. However, as a side effect, this sets us up for longer-term maintenance and patch releases off of the 2.4 branch if a downstream project that's still on 1.6 needs to pick up a critical bug fix. Of course, this is all a moot point if all the downstream ecosystem projects have already made the switch to JDK7. I don't know the status of that off the top of my head. Maybe someone else out there knows? If not, then I expect I can free up enough in a few weeks to volunteer for tracking down that information. Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com wrote: Chris, Compiling with jdk7 and doing javac -target 1.6 is not sufficient, you are still using jdk7 libraries and you could use new APIs, thus breaking jdk6 both at compile and runtime. 
you need to compile with jdk6 to ensure you are not running into that scenario. that is why i was suggesting the nightly jdk6 build/test jenkins job. On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com wrote: I'm also +1 for getting us to JDK7 within the 2.x line after reading the proposals and catching up on the discussion in this thread. Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first before we make the move? Would we switch to JDK7, but run javac -target 1.6 to maintain compatibility for downstream projects during an interim period? Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote: On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK such move up to JDK7 in Hadoop I agree with Alejandro. Changing minimum JDKs is not an incompatible change and is fine in the 2 branch. (Although I think it is would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'ed is a good thing. Moving to Java 8 as a minimum seems much too aggressive and I would push back on that. I'm also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached the main stream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases. .. Owen -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- Alejandro
Re: Moving to JDK7, JDK8 and new major releases
Andrew, thanks for writing the proposal. In the proposal you mention: Dropping support for a JDK in a minor release is incompatible, so this would require a change to our compatibility guidelines. Why is dropping a JDK incompatible? sanjay On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, Forking this thread as requested by Vinod. To help anyone who's catching up with this thread, I've written up a wiki page containing what I think are the proposals under discussion. I did my very best to make this as fact-based and disinterested as possible; I really appreciate the constructive discussion we've had so far. If you believe you have a proposal pending, please feel free to edit the wiki. https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). Thanks, Andrew
Re: Moving to JDK7, JDK8 and new major releases
Tx for the new thread Andrew, hopefully it can attract more eyes. Here's what I am behind - a modified proposal C. - Overall I wouldn't think about EOL of JDK7 and/or JDK8 specifically given how long it has taken for the JDK6 life-cycle to end. We should try to focus on JDK7 only for now. - As we have seen, a lot (majority?) of orgs on Hadoop have moved beyond JDK6 and are already running on JDK7. So upgrading to JDK7 is more of a reflection of reality (to quote Steve) than it in itself being a disruptive change. - We should try decoupling the discussion of major releases from JDK upgrades. We have seen individual libraries getting updated right in the 2.x lines as and when necessary. Given the new reality of JDK7, I don't see the 'JDK change' as much different from the library upgrades. We have seen how long it has taken (and is still taking) users and organizations to move from Hadoop 1 to Hadoop 2. A Hadoop 3/4 that adds nothing other than JDK upgrades will be a big source of confusion for users. A major version update is also seen as an opportunity for devs to break APIs. Unless we have groundbreaking 'features' (like YARN or wire-compatibility in Hadoop-2) that a majority of users want and that specifically warrant incompatible changes in our APIs or wire protocols, we are better off separating the major-version update discussion into its own thread. Irrespective of all this, we should actively get behind better isolation of user classes/jars from the MapReduce classpath. This one's been such a long-running concern, it's not funny anymore. Thanks, +Vinod On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, Forking this thread as requested by Vinod. To help anyone who's catching up with this thread, I've written up a wiki page containing what I think are the proposals under discussion. I did my very best to make this as fact-based and disinterested as possible; I really appreciate the constructive discussion we've had so far.
If you believe you have a proposal pending, please feel free to edit the wiki. https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). Thanks, Andrew
Re: Moving to JDK7, JDK8 and new major releases
While we haven't codified this in our compatibility guidelines, dropping a Java version seems to me like a change that needs to happen alongside a major release. In plain talk, it has the ability to break everything for users who aren't doing anything particularly unreasonable. I don't think we should accept Hadoop's compatibility behavior 6 years ago as precedent for what we can do now. That was before Hadoop 1.0, and we probably have several orders of magnitude more production users. I also don't think we should accept library upgrades as precedent. While this may make sense in specific situations, I definitely don't think this is OK in general. I'd be very nervous about updating Guava outside of a major version upgrade. Lastly, I think the claim that nobody is running in production on Java 6 is unsubstantiated. We need to think about a JDK upgrade in terms of what its implications are for users, not in terms of what other kinds of compatibility we've broken that's loosely analogous. -Sandy On Tue, Jun 24, 2014 at 3:42 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Tx for the new thread Andrew, hopefully it can attract more eyes. Here's what I am behind - a modified proposal C. - Overall I wouldn't think about EOL of JDK7 and/or JDK8 specifically given how long it has taken for the JDK6 life-cycle to end. We should try to focus on JDK7 only for now. - As we have seen, a lot (majority?) of orgs on Hadoop have moved beyond JDK6 and are already running on JDK7. So upgrading to JDK7 is more of a reflection of reality (to quote Steve) than it in itself being a disruptive change. - We should try decoupling the discussion of major releases from JDK upgrades. We have seen individual libraries getting updated right in the 2.x lines as and when necessary. Given the new reality of JDK7, I don't see the 'JDK change' as much different from the library upgrades. We have seen how long it has taken (and is still taking) users and organizations to move from Hadoop 1 to Hadoop 2.
A Hadoop 3/4 that adds nothing other than JDK upgrades will be a big source of confusion for users. A major version update is also seen as an opportunity for devs to break APIs. Unless we have groundbreaking 'features' (like YARN or wire-compatibility in Hadoop-2) that a majority of users want and that specifically warrant incompatible changes in our APIs or wire protocols, we are better off separating the major-version update discussion into its own thread. Irrespective of all this, we should actively get behind better isolation of user classes/jars from the MapReduce classpath. This one's been such a long-running concern, it's not funny anymore. Thanks, +Vinod On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, Forking this thread as requested by Vinod. To help anyone who's catching up with this thread, I've written up a wiki page containing what I think are the proposals under discussion. I did my very best to make this as fact-based and disinterested as possible; I really appreciate the constructive discussion we've had so far. If you believe you have a proposal pending, please feel free to edit the wiki. https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). Thanks, Andrew
Re: Moving to JDK7, JDK8 and new major releases
Hello, I do know of scenarios that involve sticking with Java 6, mostly related to supporting application servers from the (big) three-letter companies (both of them). I don't see that in a Hadoop cluster, especially given that the expressiveness of Java 8 and the performance of Java 7 are both important in big-data analytics and ad hoc job platforms. So I would really be surprised if any user organisation wanted a major Hadoop upgrade while sticking with an unsupported/outdated JVM. Greetings, Bernd (who still ships 1.4 products)
Re: Moving to JDK7, JDK8 and new major releases
Alejandro, On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK to make such a move up to JDK7 in Hadoop 2 for the following reasons: * Existing Hadoop 2 releases and related projects are running on JDK7 in production. * Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working. * Different from many of the 3rd party libraries used by Hadoop, the JDK is much stricter about backwards compatibility. +1 - I think we are all on the same page here. Fully agree. IMPORTANT: I take this as an exception and not as a carte blanche for 3rd party dependencies and for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs) +1. Agree again - let's just wait/watch. From the thread I've become more convinced (as you've noted before) that since we are at the bottom of the stack, we need to be more conservative. From http://www.oracle.com/technetwork/java/eol-135779.html, it looks like April 2015 is the *earliest* Java7 will EOL. Java6 EOL was Feb 2011 and we are still debating whether we can stop supporting it. So, my guess is that we will support Java7 at least for a year after its EOL, i.e. till sometime in early 2016. It's just practical. Net - We really don't have a good idea when a significant portion of users will actually migrate to Java 8. W.r.t Java7 this took nearly 3 years after Java6 EOL. So for now, let's just wait and see how things develop in the field. Even for Hadoop 2.5, I think we could do the move: * Create the Hadoop 2.5 release branch. * Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6 to ensure no JDK7 language/API features creep into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases. * Sanity tests for the Hadoop 2.5.x releases should be done with JDK7. * Apply Steve’s patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7. * Starting from Hadoop 2.6 we support JDK7 language/API features. The mechanics make perfect sense to me. I think we should probably think a bit more on whether we drop support for JDK6 in hadoop-2.6 or hadoop-2.7. I'd like to add one more: * Sometime soon (within a release or two) after we actually drop support for Java6 and move branch-2 to JDK7, let's also start testing on Java8. This way we will be ready for Java8 early regardless of when we stop support for Java7. Dropping Java7 is a bridge we can cross when we come to it. thanks, Arun Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with both JDK6 and JDK7, and that all testing towards the release is done with JDK7. Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7. Thoughts? On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this than me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines: - Existing MapReduce, YARN, and HDFS applications and frameworks should work unmodified within a major release, i.e. the Apache Hadoop ABI is supported. Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.
Sticking to the theme of work unmodified, let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Else, there's the potential for nasty surprises in a supposedly minor release. That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone's moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and new NIO stuff in Hadoop code (a minor draw I guess). My
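For concreteness, the JDK7 idioms Andrew mentions (multi-catch, try-with-resources, and the NIO.2 file API) look like this in a standalone sketch; the class name and file contents are invented for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Jdk7Idioms {
    public static void main(String[] args) throws IOException {
        // java.nio.file and StandardCharsets are both new in JDK7.
        Path tmp = Files.createTempFile("idioms", ".txt");
        Files.write(tmp, "line1\nline2\n".getBytes(StandardCharsets.UTF_8));

        // try-with-resources: the reader is closed automatically.
        int lines = 0;
        try (BufferedReader r = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) {
                lines++;
            }
        }
        System.out.println(lines); // prints 2

        // Multi-catch: one handler covers several unrelated exception types.
        try {
            Integer.parseInt("not-a-number");
        } catch (NumberFormatException | NullPointerException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
        Files.delete(tmp);
    }
}
```

None of this compiles at -source 1.6, which is why adopting such idioms in Hadoop code is only possible once JDK6 support is explicitly dropped.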
Re: Moving to JDK7, JDK8 and new major releases
On Jun 24, 2014, at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote: Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there. I think there is some confusion/misunderstanding here. With hadoop-2 the user is completely in control of his own classpath (we had a similar, but limited capability in hadoop-1 w/ https://issues.apache.org/jira/browse/MAPREDUCE-1938). Furthermore, it's probably not well known that in hadoop-2 the user application (MR or otherwise) can also pick the JDK version by using the JAVA_HOME env for the container. So, in effect, MR applications can continue to use java6 while YARN is running java7 - this hasn't been tested extensively though. This capability did not exist in hadoop-1. We've also made some progress with https://issues.apache.org/jira/browse/MAPREDUCE-1700 to defuse user jar-deps from MR system jars. https://issues.apache.org/jira/browse/MAPREDUCE-4421 also helps by ensuring MR applications can pick the exact version of MR jars they were compiled against, and not rely on cluster installs. Hope that helps somewhat. thanks, Arun