Re: Moving to JDK7, JDK8 and new major releases
Following up on ecosystem: I just took a look at the Apache trunk pom.xml files for HBase, Flume, and Oozie. All specify 1.6 for source and target in their maven-compiler-plugin configuration, so there may be additional follow-up required here. (For example, if HBase has made a statement that its client will continue to support JDK6, then it wouldn't be practical for it to link against a JDK7 build of hadoop-common.)

+1 for the whole plan though. We can work through these details.

Chris Nauroth
Hortonworks
http://hortonworks.com/

On Fri, Jun 27, 2014 at 3:10 PM, Karthik Kambatla <ka...@cloudera.com> wrote:
> +1 to making 2.6 the last JDK6 release. If we want, 2.7 could be a parallel
> release or one soon after 2.6. We could upgrade other dependencies that
> require JDK7 as well.
>
> [earlier quoted messages trimmed; they appear in full later in this thread]
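The build setting Chris describes is the compiler level pinned in each project's pom.xml. A representative fragment is shown below (illustrative only, not copied from HBase, Flume, or Oozie; their exact plugin versions and surrounding configuration will differ):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <!-- Pins both the accepted language level and the emitted
             bytecode version to Java 6. A move to JDK7 would bump
             these to 1.7, which is exactly the change downstream
             1.6-based builds would notice. -->
        <source>1.6</source>
        <target>1.6</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```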
Re: Moving to JDK7, JDK8 and new major releases
Guava is a separate problem, and I think we should have a separate discussion: what can we do about Guava? That's more traumatic than a JDK update, I fear, as the Guava releases care a lot less about compatibility. I don't worry about JDK updates removing classes like StringBuffer, because StringBuilder is better.

On 27 June 2014 19:26, Andrew Wang <andrew.w...@cloudera.com> wrote:

> We have a single JAR for all of HDFS, and our clients need to have all the
> fun deps like Guava on the classpath. ... While we're leaking deps, we're
> in a scary situation.

Very good point.

> ... what should also be possible is running an HDFS 2.3 app with HDFS 2.4
> JARs and have nothing break. If we muck with the classpath, my
> understanding is that this could break.

I think this is possible by having the app upload all the JARs...I need to experiment here myself.

> For CDH5, we standardized on JDK7 across the CDH stack, so I think that's
> an indication that most ecosystem projects are ready to make the jump. Is
> that sufficient in your mind?

+1, we've had no complaints about things not working on Java 7. It's been out a long time. If you look at our own code, the main thing that broke was tests (due to JUnit test-case ordering) and not much else.

> [remainder of Andrew's message trimmed; it appears in full later in this
> thread]
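The Guava worry above is a classpath-visibility problem rather than a language-level one: the application class loader resolves a class name to whichever JAR appears first on the classpath. A minimal sketch of how to see where a class actually came from (hypothetical class name, standard reflection APIs only; Guava itself is not involved here):

```java
// Sketch: the JVM resolves a class name to whichever JAR is first on the
// classpath, so an app that puts its own (newer) Guava ahead of Hadoop's
// copy silently wins -- or silently breaks, if that release removed a
// method Hadoop still calls. CodeSource tells you which copy actually won.
public class WhoLoadedIt {
    public static String sourceOf(Class<?> c) {
        // getCodeSource() is null for bootstrap classes (java.lang.String,
        // etc.); for application classes it names the JAR or directory.
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        return (src == null) ? "bootstrap" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(sourceOf(String.class));       // "bootstrap"
        System.out.println(sourceOf(WhoLoadedIt.class));  // this class's own location
    }
}
```

Diagnostics like this are how the "Spark sticks a newer Guava at the front of the classpath" situation gets debugged in practice.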
Re: Moving to JDK7, JDK8 and new major releases
Hi all, responding to multiple messages here.

Arun, thanks for the clarification regarding MR classpaths. It sounds like the story there is improved and still improving. However, I think we still suffer from this at least on the HDFS side. We have a single JAR for all of HDFS, and our clients need to have all the fun deps like Guava on the classpath. I'm told Spark sticks a newer Guava at the front of the classpath and the HDFS client still works okay, but this is more happy coincidence than anything else. While we're leaking deps, we're in a scary situation.

API compat to me means that an app should be able to run on a new minor version of Hadoop and not have anything break. MAPREDUCE-4421 sounds like it allows you to run e.g. 2.3 MR jobs on a 2.4 YARN cluster, but what should also be possible is running an HDFS 2.3 app with HDFS 2.4 JARs and have nothing break. If we muck with the classpath, my understanding is that this could break.

Owen, bumping the minimum JDK version in a minor release like this should be a one-time exception, as Tucu stated. A number of people have pointed out how painful a forced JDK upgrade is for end users, and it's not something we should be springing on them in a minor release unless we're *very* confident, like in this case.

Chris, thanks for bringing up the ecosystem. For CDH5, we standardized on JDK7 across the CDH stack, so I think that's an indication that most ecosystem projects are ready to make the jump. Is that sufficient in your mind?

For the record, I'm also +1 on the Tucu plan. Is it too late to do this for 2.5? I'll offer to help out with some of the mechanics.

Thanks,
Andrew

On Wed, Jun 25, 2014 at 4:18 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> I understood the plan for avoiding JDK7-specific features in our code, and
> your suggestion to add an extra Jenkins job is a great way to guard against
> that. The thing I haven't seen discussed yet is how downstream projects
> will continue to consume our built artifacts. If a downstream project
> upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but
> their project is still building with 1.6, then it would be a nasty
> surprise. These are the options I see:
>
> 1. Make sure all other projects upgrade first. This doesn't sound feasible,
> unless all other ecosystem projects have moved to JDK7 already. If not,
> then waiting on a single long-pole project would hold up our migration
> indefinitely.
>
> 2. We switch to JDK7, but run javac with -target 1.6 until the whole
> ecosystem upgrades. I find this undesirable, because in a certain sense, it
> still leaves a bit of 1.6 lingering in the project. (I'll assume that
> end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.)
>
> 3. Just declare a clean break on some version (your earlier email said 2.5)
> and start publishing artifacts built with JDK7 and no -target option.
> Overall, this is my preferred option. However, as a side effect, this sets
> us up for longer-term maintenance and patch releases off of the 2.4 branch
> if a downstream project that's still on 1.6 needs to pick up a critical bug
> fix.
>
> Of course, this is all a moot point if all the downstream ecosystem
> projects have already made the switch to JDK7. I don't know the status of
> that off the top of my head. Maybe someone else out there knows? If not,
> then I expect I can free up enough in a few weeks to volunteer for tracking
> down that information.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
>> Chris, compiling with JDK7 and doing javac -target 1.6 is not sufficient:
>> you are still compiling against the JDK7 class libraries, and you could
>> use new APIs, thus breaking JDK6 both at compile time and at runtime. You
>> need to compile with JDK6 to ensure you are not running into that
>> scenario. That is why I was suggesting the nightly JDK6 build/test
>> Jenkins job.
>>
>> [earlier quoted messages trimmed; Chris's and Owen's messages appear in
>> full later in this thread]
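Alejandro's caveat is the core hazard in option 2. A minimal sketch (hypothetical class name): this compiles cleanly on JDK7 even with `-target 1.6`, because `-target` only controls the emitted class-file version, not which platform classes the compiler resolves against. A JDK6 JVM would fail with `NoClassDefFoundError` the first time `java.nio.file` is touched. (The usual mitigations were `-bootclasspath` pointing at a JDK6 `rt.jar`, or simply building on JDK6, as Alejandro suggests.)

```java
// Compiles under JDK7 with "javac -target 1.6" because -target only sets
// the bytecode version; the compiler still resolves classes against the
// JDK7 libraries. java.nio.file.* does not exist in JDK6, so a JDK6 JVM
// would throw NoClassDefFoundError at runtime despite the 1.6 class files.
import java.nio.file.Path;   // JDK7-only API
import java.nio.file.Paths;

public class Jdk7Leak {
    public static String firstElement(String pathname) {
        Path p = Paths.get(pathname);    // resolved against JDK7 libraries
        return p.getName(0).toString();  // fine on Java 7+, fatal on a 6 JVM
    }

    public static void main(String[] args) {
        System.out.println(firstElement("a/b/c")); // prints "a"
    }
}
```

This is exactly why the thread settles on a nightly JDK6 Jenkins job for as long as JDK6 support is claimed: only compiling (and testing) on a real JDK6 catches accidental use of newer platform APIs.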
Re: Moving to JDK7, JDK8 and new major releases
As someone else already mentioned, we should announce one future release (maybe 2.5) as the last JDK6-based release before making the move to JDK7. I am comfortable calling 2.5 the last JDK6 release.

On Fri, Jun 27, 2014 at 11:26 AM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> [quoted text trimmed; Andrew's message and the earlier thread appear in
> full elsewhere in this thread]
Re: Moving to JDK7, JDK8 and new major releases
FYI, I also just updated the wiki page with a Proposal D, aka the Tucu plan, which I think is essentially Proposal C but tabling JDK8 plans for now: https://wiki.apache.org/hadoop/MovingToJdk7and8

Karthik, thanks for weighing in re: 2.5. I guess there's nothing urgently required; the Jenkins stuff just needs to happen before 2.6. Still, I'm happy to help with anything.

Thanks,
Andrew

On Fri, Jun 27, 2014 at 11:34 AM, Karthik Kambatla <ka...@cloudera.com> wrote:
> [quoted text trimmed; Karthik's message and the earlier thread appear in
> full elsewhere in this thread]
Re: Moving to JDK7, JDK8 and new major releases
Thanks everyone for the discussion. Looks like we have come to a pragmatic and progressive conclusion.

In terms of execution of the consensus plan, I think a little bit of caution is in order: let's give downstream projects more of a runway. I propose we inform HBase, Pig, Hive, etc. that we are considering making 2.6 (not 2.5) the last JDK6 release and solicit their feedback. Once they are comfortable, we can pull the trigger in 2.7.

thanks,
Arun

On Jun 27, 2014, at 11:34 AM, Karthik Kambatla <ka...@cloudera.com> wrote:
> [quoted text trimmed; the earlier messages appear in full elsewhere in
> this thread]
Re: Moving to JDK7, JDK8 and new major releases
+1 to making 2.6 the last JDK6 release. If we want, 2.7 could be a parallel release or one soon after 2.6. We could upgrade other dependencies that require JDK7 as well.

On Fri, Jun 27, 2014 at 3:01 PM, Arun C. Murthy <a...@hortonworks.com> wrote:
> [quoted text trimmed; Arun's message and the earlier thread appear in full
> elsewhere in this thread]
Re: Moving to JDK7, JDK8 and new major releases
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur <t...@cloudera.com> wrote:
> After reading this thread and thinking a bit about it, I think such a move
> up to JDK7 should be OK in Hadoop.

I agree with Alejandro. Changing the minimum JDK is not an incompatible change and is fine in the 2.x branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK6, which is EOL'ed, is a good thing. Moving to Java 8 as a minimum seems much too aggressive, and I would push back on that.

I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases.

.. Owen
Re: Moving to JDK7, JDK8 and new major releases
I'm also +1 for getting us to JDK7 within the 2.x line after reading the proposals and catching up on the discussion in this thread. Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first before we make the move? Would we switch to JDK7, but run javac -target 1.6 to maintain compatibility for downstream projects during an interim period? Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote: On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop I agree with Alejandro. Changing minimum JDKs is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'd, is a good thing. Moving to Java 8 as a minimum seems much too aggressive and I would push back on that. I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases. .. Owen -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. 
If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Moving to JDK7, JDK8 and new major releases
Chris, Compiling with jdk7 and doing javac -target 1.6 is not sufficient; you are still using jdk7 libraries and could use new APIs, thus breaking jdk6 both at compile and runtime. You need to compile with jdk6 to ensure you are not running into that scenario; that is why I was suggesting the nightly jdk6 build/test Jenkins job. On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com wrote: I'm also +1 for getting us to JDK7 within the 2.x line after reading the proposals and catching up on the discussion in this thread. Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first before we make the move? Would we switch to JDK7, but run javac -target 1.6 to maintain compatibility for downstream projects during an interim period? Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote: On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop I agree with Alejandro. Changing minimum JDKs is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'd, is a good thing. Moving to Java 8 as a minimum seems much too aggressive and I would push back on that. I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases. .. 
Owen -- Alejandro
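To make Alejandro's point concrete, here is a hypothetical illustration (not Hadoop code): the class below compiles cleanly under a JDK7 javac even with -source 1.6 -target 1.6, because those flags constrain only the language level and the emitted bytecode version, not which class library you compile against. On a JDK6 runtime it fails with NoClassDefFoundError, since java.util.Objects was only added in JDK7. Safe cross-compilation would additionally need -bootclasspath pointing at a JDK6 rt.jar, which is exactly why a dedicated JDK6 build job is the more reliable guard.

```java
// Compiles fine with "javac -source 1.6 -target 1.6" under JDK7,
// because -source/-target do not restrict the class library in use.
// java.util.Objects does not exist in the JDK6 class library, so the
// resulting class throws NoClassDefFoundError when run on JDK6.
import java.util.Objects;

public class Jdk7ApiLeak {
    public static void main(String[] args) {
        // Objects.toString(Object, String) is a JDK7-only API.
        String s = Objects.toString(null, "default");
        System.out.println(s); // prints "default" on JDK7 and later
    }
}
```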
Re: Moving to JDK7, JDK8 and new major releases
+1 (non-binding) for 2.5 to be the last release to support JDK6. My higher-level goal though is to avoid going through this same pain again when JDK7 goes EOL. I'd like to do a JDK8-based release before then for this reason. This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. I'm thinking that skipping an intermediate release and leapfrogging to 3.0 makes it difficult to maintain branch-2. It's only about half a year from 2.2 GA, so we should maintain branch-2 and create bug-fix releases for the long term even if 3.0+JDK8 is released. Thanks, Akira (2014/06/24 17:56), Steve Loughran wrote: +1, though I think 2.5 may be premature if we want to send a warning note that it's the last ever JDK6 release. That's an issue for follow-on work in branch-2. Guava and protobuf.jar are two things we have to leave alone, with the first being unfortunate, but their attitude to updates is pretty dramatic. The latter? We all know how traumatic that can be. -Steve On 24 June 2014 16:44, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop 2 for the following reasons: * Existing Hadoop 2 releases and related projects are running on JDK7 in production. * Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working. * Different from many of the 3rd party libraries used by Hadoop, the JDK is much stricter on backwards compatibility. IMPORTANT: I take this as an exception and not as a carte blanche for 3rd party dependencies and for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs). Even for Hadoop 2.5, I think we could do the move: * Create the Hadoop 2.5 release branch. * Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6 to ensure no JDK7 language/API features creep into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases. 
* Sanity tests for the Hadoop 2.5.x releases should be done with JDK7. * Apply Steve’s patch to require JDK7 on trunk and branch-2. * Move all Apache Jenkins jobs to build/test using JDK7. * Starting from Hadoop 2.6 we support JDK7 language/API features. Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with both JDK6 and JDK7, and that all tests towards the release are done with JDK7. Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7. Thoughts? On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this to me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines: - Existing MapReduce, YARN and HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported. Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there. Sticking to the theme of 'work unmodified', let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. 
Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Else, there's the potential for nasty surprises in a supposedly minor release. That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone's moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and new NIO stuff in Hadoop code (a minor draw I guess). My higher-level goal though is to avoid going through this same pain again when JDK7 goes EOL. I'd like to do a JDK8-based release before then for this reason. This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in the future, and it seems like a better place to focus our efforts. I was also hoping it'd be realistic to fix our classpath leakage by then, since then
Re: Moving to JDK7, JDK8 and new major releases
I understood the plan for avoiding JDK7-specific features in our code, and your suggestion to add an extra Jenkins job is a great way to guard against that. The thing I haven't seen discussed yet is how downstream projects will continue to consume our built artifacts. If a downstream project upgrades to pick up a bug fix, and the jar switches to 1.7 class files, but their project is still building with 1.6, that would be a nasty surprise. These are the options I see: 1. Make sure all other projects upgrade first. This doesn't sound feasible, unless all other ecosystem projects have moved to JDK7 already. If not, then waiting on a single long-pole project would hold up our migration indefinitely. 2. We switch to JDK7, but run javac with -target 1.6 until the whole ecosystem upgrades. I find this undesirable, because in a certain sense, it still leaves a bit of 1.6 lingering in the project. (I'll assume that end-of-life for JDK6 also means end-of-life for the 1.6 bytecode format.) 3. Just declare a clean break on some version (your earlier email said 2.5) and start publishing artifacts built with JDK7 and no -target option. Overall, this is my preferred option. However, as a side effect, this sets us up for longer-term maintenance and patch releases off of the 2.4 branch if a downstream project that's still on 1.6 needs to pick up a critical bug fix. Of course, this is all a moot point if all the downstream ecosystem projects have already made the switch to JDK7. I don't know the status of that off the top of my head. Maybe someone else out there knows? If not, then I expect I can free up enough time in a few weeks to volunteer for tracking down that information. Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 3:12 PM, Alejandro Abdelnur t...@cloudera.com wrote: Chris, Compiling with jdk7 and doing javac -target 1.6 is not sufficient; you are still using jdk7 libraries and could use new APIs, thus breaking jdk6 both at compile and runtime. 
You need to compile with jdk6 to ensure you are not running into that scenario; that is why I was suggesting the nightly jdk6 build/test Jenkins job. On Wed, Jun 25, 2014 at 2:04 PM, Chris Nauroth cnaur...@hortonworks.com wrote: I'm also +1 for getting us to JDK7 within the 2.x line after reading the proposals and catching up on the discussion in this thread. Has anyone yet considered how to coordinate this change with downstream projects? Would we request downstream projects to upgrade to JDK7 first before we make the move? Would we switch to JDK7, but run javac -target 1.6 to maintain compatibility for downstream projects during an interim period? Chris Nauroth Hortonworks http://hortonworks.com/ On Wed, Jun 25, 2014 at 9:48 AM, Owen O'Malley omal...@apache.org wrote: On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote: After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop I agree with Alejandro. Changing minimum JDKs is not an incompatible change and is fine in the 2 branch. (Although I think it would *not* be appropriate for a patch release.) Of course we need to do it with forethought and testing, but moving off of JDK 6, which is EOL'd, is a good thing. Moving to Java 8 as a minimum seems much too aggressive and I would push back on that. I also think that we need to let the dust settle on the Hadoop 2 line for a while before we talk about Hadoop 3. It seems that it has only been in the last 6 months that Hadoop 2 adoption has reached mainstream users. Our user community needs time to digest the changes in Hadoop 2.x before we fracture the community by starting to discuss Hadoop 3 releases. .. Owen -- Alejandro
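On the "jar switches to 1.7 class files" concern: the bytecode target of a consumed artifact is easy to check mechanically, because every .class file records its major version in its header (javac -target 1.6 emits 50; -target 1.7 emits 51). The helper below is a hypothetical sketch of such a check, not part of any Hadoop tooling; a downstream build worried about a surprise switch could run something like it against the jars it consumes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: report the class-file major version of compiled
// bytecode. This is the field that changes when artifacts move from
// -target 1.6 (major version 50) to -target 1.7 (major version 51).
public class ClassVersion {

    public static int majorVersion(InputStream classBytes) throws IOException {
        DataInputStream in = new DataInputStream(classBytes);
        if (in.readInt() != 0xCAFEBABE) {   // class-file magic number
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();             // minor version (skipped)
        return in.readUnsignedShort();      // major version: 50 = Java 6, 51 = Java 7
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own bytecode, if loadable as a resource.
        InputStream self = ClassVersion.class.getResourceAsStream("ClassVersion.class");
        if (self != null) {
            System.out.println("major version: " + majorVersion(self));
        }
    }
}
```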
Re: Moving to JDK7, JDK8 and new major releases
Andrew, Thanks for starting this thread. I'll edit the wiki to provide more context around rolling-upgrades etc. which, as I pointed out in the original thread, are key IMHO. On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). I don't see that anywhere in our current compatibility guidelines. As you can see from http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html we do not have such a policy (pasted here for convenience): Java Classpath User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' classpaths. Policy Currently, there is NO policy on when Hadoop's dependencies can change. Furthermore, we have *already* changed our classpath in hadoop-2.x. Again, as I pointed out in the previous thread, here is the precedent: On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote: Also, this is something we already have done i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as dramatic as JDK. 
Here are some examples: https://issues.apache.org/jira/browse/HADOOP-9991 https://issues.apache.org/jira/browse/HADOOP-10102 https://issues.apache.org/jira/browse/HADOOP-10103 https://issues.apache.org/jira/browse/HADOOP-10104 https://issues.apache.org/jira/browse/HADOOP-10503 thanks, Arun
Re: Moving to JDK7, JDK8 and new major releases
That classpath policy was explicitly added because we can't lock down our dependencies for security/bug fix reasons, and also because if we do update something explicitly, their transitive dependencies can change - beyond our control. https://issues.apache.org/jira/browse/HADOOP-9555 is an example of this: an update of ZK explicitly to fix an HA problem. Are there changes in its dependencies? I don't know. But we didn't have a choice to update if we wanted NN/RM failover to work reliably, so we have to take any other changes that went in. JDK upgrades can be viewed as an extension of this - we are changing the base platform that Hadoop runs on. More precisely, for the Java 6 to Java 7 update, we are reflecting the fact that nobody is running in production on Java 6. Do you realise we actually moved to Java 6 in 2008? https://issues.apache.org/jira/browse/HADOOP-2325 . That was six years ago - half the names on that list are not active on the project any more. What we did there was issue a warning in 0.18 that it would be the last Java 5 version; 0.19 moved up - we can do the same for a Hadoop 2.x release at some point this year. On 24 June 2014 11:43, Arun C Murthy a...@hortonworks.com wrote: Andrew, Thanks for starting this thread. I'll edit the wiki to provide more context around rolling-upgrades etc. which, as I pointed out in the original thread, are key IMHO. On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). I don't see that anywhere in our current compatibility guidelines. 
As you can see from http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html we do not have such a policy (pasted here for convenience): Java Classpath User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' classpaths. Policy Currently, there is NO policy on when Hadoop's dependencies can change. Furthermore, we have *already* changed our classpath in hadoop-2.x. Again, as I pointed out in the previous thread, here is the precedent: On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote: Also, this is something we already have done i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as dramatic as JDK. Here are some examples: https://issues.apache.org/jira/browse/HADOOP-9991 https://issues.apache.org/jira/browse/HADOOP-10102 https://issues.apache.org/jira/browse/HADOOP-10103 https://issues.apache.org/jira/browse/HADOOP-10104 https://issues.apache.org/jira/browse/HADOOP-10503 thanks, Arun
Re: Moving to JDK7, JDK8 and new major releases
Tx for the new thread Andrew, hopefully it can attract more eyes. Here's what I am behind - a modified proposal C. - Overall I wouldn't think about EOL of JDK7 and/or JDK8 specifically, given how long it has taken for the JDK6 life-cycle to end. We should try to focus on JDK7 only for now. - As we have seen, a lot (majority?) of orgs on Hadoop have moved beyond JDK6 and are already running on JDK7. So upgrading to JDK7 is more of a reflection of reality (to quote Steve) than a disruptive change in itself. - We should try decoupling the discussion of major releases from JDK upgrades. We have seen individual libraries getting updated right in the 2.x lines as and when necessary. Given the new reality of JDK7, I don't see the 'JDK change' as much different from the library upgrades. We have seen how long it has taken (and is still taking) users and organizations to move from Hadoop 1 to Hadoop 2. A Hadoop 3/4 that adds nothing other than JDK upgrades will be a big source of confusion for users. A major version update is also seen as an opportunity for devs to break APIs. Unless we have groundbreaking 'features' (like YARN or wire-compatibility in Hadoop-2) that a majority of users want and that specifically warrant incompatible changes in our APIs or wire protocols, we are better off separating the major-version update discussion into its own thread. Irrespective of all this, we should actively get behind better isolation of user classes/jars from the MapReduce classpath. This one's been such a long-running concern, it's not funny anymore. Thanks, +Vinod On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, Forking this thread as requested by Vinod. To help anyone who's catching up with this thread, I've written up a wiki page containing what I think are the proposals under discussion. I did my very best to make this as fact-based and disinterested as possible; I really appreciate the constructive discussion we've had so far. 
If you believe you have a proposal pending, please feel free to edit the wiki. https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). Thanks, Andrew
Re: Moving to JDK7, JDK8 and new major releases
Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this to me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines: - Existing MapReduce, YARN and HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported. Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there. Sticking to the theme of 'work unmodified', let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Else, there's the potential for nasty surprises in a supposedly minor release. That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone's moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and new NIO stuff in Hadoop code (a minor draw I guess). My higher-level goal though is to avoid going through this same pain again when JDK7 goes EOL. I'd like to do a JDK8-based release before then for this reason. 
This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in the future, and it seems like a better place to focus our efforts. I was also hoping it'd be realistic to fix our classpath leakage by then, since then we'd have a nice, tight, future-proofed new major release. Thanks, Andrew On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy a...@hortonworks.com wrote: Andrew, Thanks for starting this thread. I'll edit the wiki to provide more context around rolling-upgrades etc. which, as I pointed out in the original thread, are key IMHO. On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). I don't see that anywhere in our current compatibility guidelines. As you can see from http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html we do not have such a policy (pasted here for convenience): Java Classpath User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' classpaths. Policy Currently, there is NO policy on when Hadoop's dependencies can change. Furthermore, we have *already* changed our classpath in hadoop-2.x. Again, as I pointed out in the previous thread, here is the precedent: On Jun 21, 2014, at 5:59 PM, Arun C Murthy a...@hortonworks.com wrote: Also, this is something we already have done i.e. we updated some of our software deps in hadoop-2.4 v/s hadoop-2.2 - clearly not something as dramatic as JDK. 
Here are some examples: https://issues.apache.org/jira/browse/HADOOP-9991 https://issues.apache.org/jira/browse/HADOOP-10102 https://issues.apache.org/jira/browse/HADOOP-10103 https://issues.apache.org/jira/browse/HADOOP-10104 https://issues.apache.org/jira/browse/HADOOP-10503 thanks, Arun
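The "shading our deps" idea Andrew mentions would look roughly like the following with the maven-shade-plugin: Guava's classes are rewritten into a private namespace inside the Hadoop jar, so an application can put its own Guava on the classpath without conflict. This is only an illustrative sketch (the relocated package name is made up, not an actual Hadoop artifact):

```xml
<!-- Sketch of dependency shading: relocate Guava into a Hadoop-private
     package so it no longer leaks onto the application classpath.
     Package names here are illustrative assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.apache.hadoop.shaded.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The trade-off is larger artifacts and harder debugging, but it would make "stick a newer Guava at the front of the classpath" unnecessary rather than a happy coincidence.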
Re: Moving to JDK7, JDK8 and new major releases
After reading this thread and thinking a bit about it, I think it should be OK to move up to JDK7 in Hadoop 2 for the following reasons: * Existing Hadoop 2 releases and related projects are running on JDK7 in production. * Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working. * Different from many of the 3rd party libraries used by Hadoop, the JDK is much stricter on backwards compatibility. IMPORTANT: I take this as an exception and not as a carte blanche for 3rd party dependencies and for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs). Even for Hadoop 2.5, I think we could do the move: * Create the Hadoop 2.5 release branch. * Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6 to ensure no JDK7 language/API features creep into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases. * Sanity tests for the Hadoop 2.5.x releases should be done with JDK7. * Apply Steve’s patch to require JDK7 on trunk and branch-2. * Move all Apache Jenkins jobs to build/test using JDK7. * Starting from Hadoop 2.6 we support JDK7 language/API features. Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with both JDK6 and JDK7, and that all tests towards the release are done with JDK7. Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into any issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7. Thoughts? On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. 
Steve can do a better job explaining this to me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines: - Existing MapReduce, YARN and HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported. Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there. Sticking to the theme of 'work unmodified', let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Else, there's the potential for nasty surprises in a supposedly minor release. That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone's moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and new NIO stuff in Hadoop code (a minor draw I guess). My higher-level goal though is to avoid going through this same pain again when JDK7 goes EOL. I'd like to do a JDK8-based release before then for this reason. This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in the future, and it seems like a better place to focus our efforts. 
I was also hoping it'd be realistic to fix our classpath leakage by then, since then we'd have a nice, tight, future-proofed new major release. Thanks, Andrew On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy a...@hortonworks.com wrote: Andrew, Thanks for starting this thread. I'll edit the wiki to provide more context around rolling-upgrades etc. which, as I pointed out in the original thread, are key IMHO. On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines, Proposal A is the most attractive. We're pretty hamstrung by the requirement to keep the classpath the same, which would be solved by either OSGI or shading our deps (but that's a different discussion). I don't see that anywhere in our current compatibility guidelines. As you can see from http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html we do not have such a policy (pasted here for convenience): Java Classpath User applications built
+1, though I think 2.5 may be premature if we want to send out a warning note that it will be the last-ever JDK6 release. That's an issue to follow up on in branch-2. Guava and protobuf.jar are two things we have to leave alone: the first is unfortunate, but Guava's attitude to updates is pretty dramatic. The latter? We all know how traumatic that can be. -Steve
On 24 June 2014 16:44, Alejandro Abdelnur t...@cloudera.com wrote:
After reading this thread and thinking a bit about it, I think such a move up to JDK7 in Hadoop 2 should be OK, for the following reasons:
* Existing Hadoop 2 releases and related projects are running on JDK7 in production.
* Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working.
* Unlike many of the 3rd party libraries used by Hadoop, the JDK is much stricter about backwards compatibility.
IMPORTANT: I take this as an exception and not as carte blanche for 3rd party dependencies or for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs).
Even for Hadoop 2.5, I think we could do the move:
* Create the Hadoop 2.5 release branch.
* Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6, to ensure no JDK7 language/API features creep into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
* Sanity tests for the Hadoop 2.5.x releases should be done with JDK7.
* Apply Steve's patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7.
* Starting from Hadoop 2.6, we support JDK7 language/API features.
Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with both JDK6 and JDK7, and that all tests towards the release are done with JDK7. Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into an issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7. Thoughts?
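For reference, the compile-level guard the nightly JDK6 job relies on is normally expressed through the maven-compiler-plugin. A hypothetical pom.xml fragment (illustrative only, not taken from the actual Hadoop build):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Keeping source/target at 1.6 makes javac reject JDK7-only language
         features at build time. Note this does NOT catch use of JDK7-only
         *APIs* when compiling on a JDK7 javac, which is exactly why the
         proposal above wants a nightly build on a real JDK6. -->
    <source>1.6</source>
    <target>1.6</target>
  </configuration>
</plugin>
```

This matches the 1.6 source/target settings Chris observed in the HBase, Flume, and Oozie poms earlier in the thread.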
On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote:
Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this than me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines:
- Existing MapReduce, YARN, and HDFS applications and frameworks should work unmodified within a major release, i.e. the Apache Hadoop ABI is supported.
Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.
Sticking to the theme of "work unmodified", let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Otherwise, there's the potential for nasty surprises in a supposedly minor release.
That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone has moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and the new NIO stuff in Hadoop code (a minor draw, I guess). My higher-level goal, though, is to avoid going through this same pain again when JDK7 goes EOL.
I'd like to do a JDK8-based release before then for this reason. This is why I suggested skipping an intermediate 2.x+JDK7 release and leapfrogging to 3.0+JDK8. 10 months is really not that far in the future, and it seems like a better place to focus our efforts. I was also hoping it'd be realistic to fix our classpath leakage by then, since then we'd have a nice, tight, future-proofed new major release. Thanks, Andrew On Tue, Jun 24, 2014 at 11:43 AM, Arun C Murthy a...@hortonworks.com wrote: Andrew, Thanks for starting this thread. I'll edit the wiki to provide more context around rolling-upgrades etc. which, as I pointed out in the original thread, are key IMHO. On Jun 24, 2014, at 11:17 AM, Andrew Wang andrew.w...@cloudera.com wrote: https://wiki.apache.org/hadoop/MovingToJdk7and8 I think based on our current compatibility guidelines,
Alejandro,
On Tue, Jun 24, 2014 at 4:44 PM, Alejandro Abdelnur t...@cloudera.com wrote:
After reading this thread and thinking a bit about it, I think such a move up to JDK7 in Hadoop 2 should be OK, for the following reasons:
* Existing Hadoop 2 releases and related projects are running on JDK7 in production.
* Commercial vendors of Hadoop have already done a lot of work to ensure Hadoop on JDK7 works while keeping Hadoop on JDK6 working.
* Unlike many of the 3rd party libraries used by Hadoop, the JDK is much stricter about backwards compatibility.
+1 - I think we are all on the same page here. Fully agree.
IMPORTANT: I take this as an exception and not as carte blanche for 3rd party dependencies or for moving from JDK7 to JDK8 (though it could be OK for the latter if we end up in the same state of affairs).
+1. Agree again - let's just wait/watch. From the thread I've become more convinced (as you've noted before) that since we are at the bottom of the stack, we need to be more conservative. From http://www.oracle.com/technetwork/java/eol-135779.html, it looks like April 2015 is the *earliest* Java7 will EOL. Java6 EOL was Feb 2011 and we are still debating whether we can stop supporting it. So, my guess is that we will support Java7 for at least a year after its EOL, i.e. till sometime in early 2016. It's just practical. Net: we really don't have a good idea when a significant portion of users will actually migrate to Java 8. W.r.t. Java7, this took nearly 3 years after Java6 EOL. So for now, let's just wait and see how things develop in the field.
Even for Hadoop 2.5, I think we could do the move:
* Create the Hadoop 2.5 release branch.
* Have one nightly Jenkins job that builds the Hadoop 2.5 branch with JDK6, to ensure no JDK7 language/API features creep into Hadoop 2.5. Keep this for all Hadoop 2.5.x releases.
* Sanity tests for the Hadoop 2.5.x releases should be done with JDK7.
* Apply Steve's patch to require JDK7 on trunk and branch-2.
* Move all Apache Jenkins jobs to build/test using JDK7.
* Starting from Hadoop 2.6, we support JDK7 language/API features.
The mechanics make perfect sense to me. I think we should probably think a bit more on whether we drop support for JDK6 in hadoop-2.6 or hadoop-2.7. I'd like to add one more:
* Sometime soon (within a release or two) after we actually drop support for Java6 and move branch-2 to JDK7, let's also start testing on Java8. This way we will be ready for Java8 early, regardless of when we stop supporting Java7. Dropping Java7 is a bridge we can cross when we come to it.
thanks, Arun
Effectively, what we are ensuring is that Hadoop 2.5.x builds and tests with both JDK6 and JDK7, and that all tests towards the release are done with JDK7. Users can proactively upgrade to JDK7 before upgrading to Hadoop 2.5.x, or, if they upgrade to Hadoop 2.5.x and run into an issue because of JDK6 (which would be quite unlikely), they can reactively upgrade to JDK7. Thoughts?
On Tue, Jun 24, 2014 at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote:
Hi all, On dependencies, we've bumped library versions when we think it's safe and the APIs in the new version are compatible. Or, it's not leaked to the app classpath (e.g. the JUnit version bump). I think the JIRAs Arun mentioned fall into one of those categories. Steve can do a better job explaining this than me, but we haven't bumped things like Jetty or Guava because they are on the classpath and are not compatible. There is this line in the compat guidelines:
- Existing MapReduce, YARN, and HDFS applications and frameworks should work unmodified within a major release, i.e. the Apache Hadoop ABI is supported.
Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.
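Steve's patch itself isn't shown in this thread, but a common way to implement "require JDK7 to build" in a Maven project is the enforcer plugin. A hypothetical sketch, not the actual Hadoop change:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-jdk7</id>
      <goals><goal>enforce</goal></goals>
      <configuration>
        <rules>
          <!-- Fail the build when run on any JDK older than Java 7. -->
          <requireJavaVersion>
            <version>[1.7,)</version>
          </requireJavaVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Combined with a JDK6 source/target level during the 2.5.x transition, this would let the build run only on JDK7 while still producing JDK6-compatible class files.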
Sticking to the theme of "work unmodified", let's think about the user effort required to upgrade their JDK. This can be a very expensive task. It might need approval up and down the org, meaning lots of certification, testing, and signoff. Considering the amount of user effort involved here, it really seems like dropping a JDK is something that should only happen in a major release. Otherwise, there's the potential for nasty surprises in a supposedly minor release. That said, we are in an unhappy place right now regarding JDK6, and it's true that almost everyone has moved off of JDK6 at this point. So, I'd be okay with an intermediate 2.x release that drops JDK6 support (but no incompatible changes to the classpath like Guava). This is basically free, and we could start using JDK7 idioms like multi-catch and the new NIO stuff in Hadoop code (a minor draw, I guess). My
On Jun 24, 2014, at 4:22 PM, Andrew Wang andrew.w...@cloudera.com wrote:
Since Hadoop apps can and do depend on the Hadoop classpath, the classpath is effectively part of our API. I'm sure there are user apps out there that will break if we make incompatible changes to the classpath. I haven't read up on the MR JIRA Arun mentioned, but MR isn't the only YARN app out there.
I think there is some confusion/misunderstanding here. With hadoop-2 the user is completely in control of their own classpath (we had a similar, but limited, capability in hadoop-1 w/ https://issues.apache.org/jira/browse/MAPREDUCE-1938). Furthermore, it's probably not well known that in hadoop-2 the user application (MR or otherwise) can also pick the JDK version by setting the JAVA_HOME env for the container. So, in effect, MR applications can continue to use Java6 while YARN is running Java7 - this hasn't been tested extensively, though. This capability did not exist in hadoop-1. We've also made some progress with https://issues.apache.org/jira/browse/MAPREDUCE-1700 to decouple user jar dependencies from MR system jars. https://issues.apache.org/jira/browse/MAPREDUCE-4421 also helps by ensuring MR applications can pick the exact version of the MR jars they were compiled against, and not rely on cluster installs. Hope that helps somewhat. thanks, Arun
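To illustrate Arun's point about per-container JDK selection: in Hadoop 2, an MR job can point its task containers at a specific JDK through environment settings in the job configuration. A hedged sketch — `mapreduce.map.env`, `mapreduce.reduce.env`, and `yarn.app.mapreduce.am.env` are standard Hadoop 2 MR properties, but the JDK path below is purely illustrative:

```xml
<!-- mapred-site.xml (or per-job -D overrides): run MR task containers on a
     JDK6 install even when the cluster daemons themselves run on JDK7. -->
<property>
  <name>mapreduce.map.env</name>
  <value>JAVA_HOME=/usr/lib/jvm/java-6-openjdk</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>JAVA_HOME=/usr/lib/jvm/java-6-openjdk</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>JAVA_HOME=/usr/lib/jvm/java-6-openjdk</value>
</property>
```

As Arun notes, this decoupling of container JDK from cluster JDK hasn't been tested extensively, so treat it as a capability to verify rather than a guarantee.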