[jira] [Created] (HADOOP-11691) X86 build of libwinutils is broken
Remus Rusanu created HADOOP-11691: - Summary: X86 build of libwinutils is broken Key: HADOOP-11691 URL: https://issues.apache.org/jira/browse/HADOOP-11691 Project: Hadoop Common Issue Type: Bug Components: build, native Affects Versions: 3.0.0 Reporter: Remus Rusanu HADOOP-9922 recently fixed the x86 build. After YARN-2190, compiling for x86 fails with: {code} (Link target) - E:\HW\project\hadoop-common\hadoop-common-project\hadoop-common\target/winutils/hadoopwinutilsvc_s.obj : fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'X86' [E:\HW\project\hadoop-common\hadoop-common-project\hadoop-common\src\main\winutils\winutils.vcxproj] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HADOOP-6228) Configuration should allow storage of null values.
[ https://issues.apache.org/jira/browse/HADOOP-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reopened HADOOP-6228: -- argh wrong jira. Configuration should allow storage of null values. -- Key: HADOOP-6228 URL: https://issues.apache.org/jira/browse/HADOOP-6228 Project: Hadoop Common Issue Type: Bug Components: conf Reporter: Hemanth Yamijala Currently the configuration class does not allow null keys and values. Null keys don't make sense, but null values may have semantic meaning for some features. Not storing these values in configuration causes some arguable side effects. For instance, if a value is defined in the defaults but needs to be disabled in the site configuration by setting it to null, there's no way to do this currently. Also, no record of keys with null values is kept, so tools like dump configuration (HADOOP-6184) would not display these properties. Does this seem like a sensible use case? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
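A minimal sketch (a hypothetical test, not from the JIRA) of the limitation described above; the property name is made up, and it assumes the precondition in current Configuration.set that rejects null values:
{code:java}
// Hypothetical demo of the HADOOP-6228 limitation; assumes Configuration.set
// rejects null values with IllegalArgumentException, as recent branches do.
import org.apache.hadoop.conf.Configuration;

public class NullValueDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);                 // no default resources
    conf.set("feature.x.handler", "com.example.DefaultHandler");   // stands in for a default

    try {
      // There is no way to "disable" the default by overriding it with null:
      conf.set("feature.x.handler", null);
    } catch (IllegalArgumentException expected) {
      System.out.println("null value rejected: " + expected.getMessage());
    }
    // The key also leaves no trace, so a config dump cannot show it as nulled.
  }
}
{code}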
[jira] [Resolved] (HADOOP-5606) Final parameters in Configuration doesnt get serialized
[ https://issues.apache.org/jira/browse/HADOOP-5606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-5606. -- Resolution: Duplicate Closing as a dupe of HADOOP-6317. Final parameters in Configuration doesnt get serialized --- Key: HADOOP-5606 URL: https://issues.apache.org/jira/browse/HADOOP-5606 Project: Hadoop Common Issue Type: Bug Components: conf Reporter: Amar Kamat Attachments: final.patch Here are the steps to reproduce the bug: # Mark a parameter as _final_ in hadoop-site.xml # Load the conf in some job # Change the final parameter # Write the conf to a file. The final parameter gets overridden upon serialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
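A minimal sketch of the four reproduction steps above; the property name is hypothetical, and a hadoop-site.xml on the classpath marking it <final>true</final> is assumed:
{code:java}
// Hypothetical reproduction of HADOOP-5606. Assumes "my.final.param" is
// declared final in a site file on the classpath.
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;

public class FinalParamRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // step 2: load the conf
    conf.set("my.final.param", "changed");         // step 3: change the final parameter
    try (OutputStream out = new FileOutputStream("serialized-conf.xml")) {
      conf.writeXml(out);                          // step 4: write the conf to a file
    }
    // The written XML records the changed value and carries no <final>
    // marker, so the "final" protection is lost on the round trip.
  }
}
{code}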
[jira] [Resolved] (HADOOP-9529) It looks like hadoop.tmp.dir is being used both for local and hdfs directories
[ https://issues.apache.org/jira/browse/HADOOP-9529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-9529. -- Resolution: Duplicate Fix Version/s: HADOOP-8970 Closing this as a dupe of HADOOP-8970 It looks like hadoop.tmp.dir is being used both for local and hdfs directories -- Key: HADOOP-9529 URL: https://issues.apache.org/jira/browse/HADOOP-9529 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.20.205.0 Environment: Ubuntu Server 12.04 Reporter: Ronald Kevin Burton Assignee: Arpit Gupta Fix For: HADOOP-8970 Original Estimate: 48h Remaining Estimate: 48h I would like to separate out the files that are written to /tmp, so I added a definition for hadoop.tmp.dir, whose value I understood to be a local folder. It apparently also specifies an HDFS folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-6228) Configuration should allow storage of null values.
[ https://issues.apache.org/jira/browse/HADOOP-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-6228. -- Resolution: Duplicate Closing as a dupe since the other jira has code attached. Configuration should allow storage of null values. -- Key: HADOOP-6228 URL: https://issues.apache.org/jira/browse/HADOOP-6228 Project: Hadoop Common Issue Type: Bug Components: conf Reporter: Hemanth Yamijala Currently the configuration class does not allow null keys and values. Null keys don't make sense, but null values may have semantic meaning for some features. Not storing these values in configuration causes some arguable side effects. For instance, if a value is defined in the defaults but needs to be disabled in the site configuration by setting it to null, there's no way to do this currently. Also, no record of keys with null values is kept, so tools like dump configuration (HADOOP-6184) would not display these properties. Does this seem like a sensible use case? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-8056) Configuration doesn't pass empty string values to tasks
[ https://issues.apache.org/jira/browse/HADOOP-8056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-8056. -- Resolution: Duplicate Duping this to HADOOP-6228 since there are more people on that JIRA. Configuration doesn't pass empty string values to tasks --- Key: HADOOP-8056 URL: https://issues.apache.org/jira/browse/HADOOP-8056 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 0.20.2, 1.0.0 Reporter: Luca Pireddu If I assign an *empty string* as a value to a property in a JobConf 'job' while I'm preparing it to run, the Configuration does store that value. I can retrieve it later in the same process and the value is maintained. However, if I then call JobClient.runJob(job), the Configuration that is received by the Map and Reduce tasks doesn't contain the property, and calling JobConf.get with that property name returns null (instead of an empty string). Further, if I inspect the job's configuration via Hadoop's web interface, the property isn't present. It seems as if whatever serialization mechanism is used to transmit the Configuration from the job client to the tasks discards properties with an empty value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
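A hedged sketch of the kind of XML round trip the reporter describes; job submission serializes the Configuration in roughly this way, though whether this exact path drops the value is the question the JIRA raises:
{code:java}
// Hypothetical round-trip demo for HADOOP-8056: set an empty value, write
// the conf to XML (as job submission does), reload it as a "task" would.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.apache.hadoop.conf.Configuration;

public class EmptyValueDemo {
  public static void main(String[] args) throws Exception {
    Configuration job = new Configuration(false);
    job.set("my.empty.prop", "");
    System.out.println("client side: [" + job.get("my.empty.prop") + "]"); // []

    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    job.writeXml(buf);                                  // serialize

    Configuration task = new Configuration(false);
    task.addResource(new ByteArrayInputStream(buf.toByteArray()));
    // Per the report, the task side sees null here instead of "".
    System.out.println("task side: " + task.get("my.empty.prop"));
  }
}
{code}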
[jira] [Resolved] (HADOOP-11685) StorageException complaining no lease ID during HBase distributed log splitting
[ https://issues.apache.org/jira/browse/HADOOP-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Xu resolved HADOOP-11685. - Resolution: Cannot Reproduce This exception occurred before HADOOP-11523 was patched; we lack storage logs for further debugging and cannot reproduce it. I will close this for now. If this error appears again in PROD after the HADOOP-11523 fix, I will reopen this JIRA. StorageException complaining no lease ID during HBase distributed log splitting -- Key: HADOOP-11685 URL: https://issues.apache.org/jira/browse/HADOOP-11685 Project: Hadoop Common Issue Type: Bug Components: tools Reporter: Duo Xu Assignee: Duo Xu This is similar to HADOOP-11523, but in a different place. During HBase distributed log splitting, multiple threads will access the same folder called recovered.edits. However, many places in our WASB code did not acquire a lease and simply passed null to Azure storage, which caused this issue. {code} 2015-02-26 03:21:28,871 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of WALs/workernode4.hbaseproddm2001.g6.internal.cloudapp.net,60020,1422071058425-splitting/workernode4.hbaseproddm2001.g6.internal.cloudapp.net%2C60020%2C1422071058425.1424914216773 failed, returning error java.io.IOException: org.apache.hadoop.fs.azure.AzureException: java.io.IOException at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:633) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$000(HLogSplitter.java:121) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWriting(HLogSplitter.java:964) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1019) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:359) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:223) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:142) at org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:79) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.fs.azure.AzureException: java.io.IOException at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1477) at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1862) at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1812) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getRegionSplitEditsPath(HLogSplitter.java:502) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.createWAP(HLogSplitter.java:1211) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(HLogSplitter.java:1200) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.append(HLogSplitter.java:1243) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:851) at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:843) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:813) Caused by: java.io.IOException at com.microsoft.windowsazure.storage.core.Utility.initIOException(Utility.java:493) at com.microsoft.windowsazure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:282) at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1472) ... 10 more Caused by: com.microsoft.windowsazure.storage.StorageException: There is currently a lease on the blob and no lease ID was specified in the request. at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:163) at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306) at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229) at
Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
If 3.x is going to be Java 8 and not backwards compatible, I don't expect anyone wanting to use this in production until some time deep into 2016. Issue: JDK 8 vs 7 It will require Hadoop clusters to move up to Java 8. While there's dev pull for this, there's ops pull against this: people are still in the moving-off-Java-6 phase due to that "it's working, don't update it" philosophy. Java 8 is compelling to us coders, but that doesn't mean ops want it. You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*; the main thing is setting up JAVA_HOME. That's something we could make easier somehow (maybe some min-Java-version field in resource requests that would let apps say java 8, java 9, ...). YARN could not only set up JVM paths, it could fail fast if a Java version wasn't available. What we can't do in hadoop core today is set javac.version=1.8 and use java 8 code. Downstream code can do that (Hive, etc); they just need to accept that they don't get to play on JDK7 clusters if they embrace lambda expressions. So... we need to stay on java 7 for some time due to ops pull; downstream apps get to choose what they want. We can/could enhance YARN to make JVM choice more declarative. Issue: Incompatible changes Without knowing what is proposed for an incompatible classpath change, I can't say whether this is something that could be made optional. If it isn't, then it is a python-3-class "rewrite your code" event, which is going to be particularly traumatic to things like Hive that already do complex CP games. I'm currently against any mandatory change here, though I would love to see an optional one. And if optional, it ceases to be an incompatible change... Issue: Getting trunk out the door The main diff between branch-2 and trunk is currently the bash script changes. These don't break client apps. They may or may not break bigtop and other downstream hadoop stacks, but developers don't need to worry about this: no recompilation necessary. Proposed: ship trunk as a 2.x release, compatible with JDK7 Java code. It seems to me that I could go: git checkout trunk; mvn versions:set -DnewVersion=2.8.0-SNAPSHOT. We'd then have a version of Hadoop-trunk we could ship later this year, compatible at the JDK and API level with the existing java code and JDK7+ clusters. A classpath fix that is optional/compatible can then go out on the 2.x line, saving the 3.x tag for something that really breaks things, forces all downstream apps to set up new hadoop profiles and separate modules, and generally makes them hate the hadoop dev team. This lets us tick off the "recent trunk release" and "fixed shell scripts" items, pushing those benefits out to people sooner rather than later, and puts off the "Hello, we've just broken your code" event for another 12+ months. Comments? -Steve
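A minimal sketch of the "setting up JAVA_HOME" point above: a YARN application can already choose its container JVM through the launch environment. The JDK path below is an assumption for illustration:
{code:java}
// Hypothetical helper showing how an app master can pin containers to a
// specific JVM today by setting JAVA_HOME in the ContainerLaunchContext.
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

public class Jdk8ContainerEnv {
  static ContainerLaunchContext withJava8(ContainerLaunchContext ctx) {
    Map<String, String> env = new HashMap<String, String>();
    env.put("JAVA_HOME", "/usr/lib/jvm/java-8-oracle"); // assumed install path
    ctx.setEnvironment(env);  // the container's JVM is resolved from this
    return ctx;
  }
}
{code}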
[jira] [Created] (HADOOP-11694) Über-jira: S3a stabilisation phase II
Steve Loughran created HADOOP-11694: --- Summary: Über-jira: S3a stabilisation phase II Key: HADOOP-11694 URL: https://issues.apache.org/jira/browse/HADOOP-11694 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 2.7.0 Reporter: Steve Loughran Fix For: 2.8.0 HADOOP-11571 covered the core s3a bugs surfacing in Hadoop 2.6, plus other enhancements to improve S3 support (performance, proxy, custom endpoints). This JIRA covers post-2.7 issues and enhancements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11696) update compatibility documentation to reflect only API changes matter
Allen Wittenauer created HADOOP-11696: - Summary: update compatibility documentation to reflect only API changes matter Key: HADOOP-11696 URL: https://issues.apache.org/jira/browse/HADOOP-11696 Project: Hadoop Common Issue Type: Bug Reporter: Allen Wittenauer Given the changes file generated by processing JIRA and current discussion in common-dev, we should update the compatibility documents to reflect reality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
Java 7 will be end-of-lifed in April 2015. I think it would be unwise to plan a new Hadoop release against a version of Java that is almost obsolete and (soon) no longer receiving security updates. I think people will be willing to roll out a new version of Java for Hadoop 3.x. Similarly, the whole point of bumping the major version number is the ability to make incompatible changes. There are already a bunch of incompatible changes in the trunk branch. Are you proposing to revert those? Or push them into newly created feature branches? This doesn't seem like a good idea to me. I would be in favor of backporting targeted incompatible changes from trunk to branch-2. For example, we could consider pulling in Allen's shell script rewrite. But pulling in all of trunk seems like a bad idea at this point, if we want a 2.x release. best, Colin On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran ste...@hortonworks.com wrote: [...]
Re: Hadoop - Major releases
Hi Guys, From my perspective @ ebay, we are not going to upgrade to JDK 8 any time soon; we just upgraded to 7 and do not want to move further, at least this year, so I will request you guys not to drop support for JDK 7, as that would be very crucial for us to move forward. We also just completed our Hadoop 2 migration for all clusters this year, which we started early last year, so I don't think we can do major upgrades again this year. Stabilizing major releases takes lots of effort and time; I think Hadoop 3.x makes sense for us next year at the earliest. Thanks, Mayank On Mon, Mar 9, 2015 at 12:29 AM, Arun Murthy a...@hortonworks.com wrote: Over the last few days, we have had lots of discussions that have intertwined several major themes: # When/why do we make major Hadoop releases? # When/how do we move to major JDK versions? # To a lesser extent, we have debated another theme: what do we do about trunk? For now, let's park JDK and trunk to treat them in separate thread(s). For a while now, I've had a couple of lampposts in my head which I used for guidance - apologize for not sharing this broadly prior to this discussion; maybe putting it out here will help - certainly hope so. Major Releases Hadoop continues to benefit tremendously from the investment in stability, validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter, eBay, LinkedIn etc. A historical perspective... In its lifetime, Apache Hadoop went from monthly to quarterly releases because, as Hadoop became more and more of a production system (starting with hadoop-0.16 and more so with hadoop-0.18), users could not absorb the torrid pace of change. IMHO, we didn't go far enough in addressing the competing pressures of stability v/s rapid innovation. We paid for it by losing one of our anchor users - Facebook - around the time of hadoop-0.19 - they just forked. Around the same time, Yahoo hit the same problem (I know, I lived through it painfully) and got stuck with hadoop-0.20 for a *very* long time and forked to add Security rather than deal with the next major release (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for the community, is stuck - probably forever - on their fork of hadoop-0.20. Overall, these were dark days for the community: every anchor user was on their own fork, and it took a toll on the project. Recently, thankfully for Hadoop, we have had a period of relative stability with hadoop-1.x and hadoop-2.x. Even so, there were close shaves: Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are only just now finishing their migration to hadoop-2.x. I think the major lessons here are the obvious ones: # Compatibility matters # Maintaining multiple major releases, in parallel, is a big problem - it leads to an unproductive, and risky, split in community investment along different lines. Looking Ahead Given the above, here are some thoughts for looking ahead: # Be very conservative about major releases - a major benefit is required (features) for the cost. Let's not compel our anchor users like Yahoo, Twitter, eBay, and LinkedIn to invest in previous releases rather than the latest one. Let's hear more from them - and let's be very accommodating to them - for they play a key role in keeping Hadoop healthy and stable. # Be conservative about dropping support for JDKs. In particular, let's hear from our anchor users on their plans for adopting jdk-1.8.
LinkedIn has already moved to jdk-1.8, which is great for the validation, but let's wait for the rest of our anchor users to move before we drop jdk-1.7. We did the same thing with jdk-1.6 - waited for them to move before we dropped support for it. Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor users on their plans for jdk-1.8 specifically, and on their overall appetite for hadoop-3. Let's not finalize our plans for moving forward until this input has been considered. Thoughts? thanks, Arun Unfortunate-but-necessary disclaimers: # Before people point out vendor affiliations to lend unnecessary color to my opinions, let me state that hadoop-2 v/s hadoop-3 is a non-issue for us. For major HDP versions the key is, just, compatibility... e.g. we ship major, but compatible, community releases such as hive-0.13/hive-0.14 in HDP-2.x/HDP-2.x+1 etc. # Also, release management is a similar non-issue - we have already had several individuals step up in the hadoop-2.x line. Expect more of the same from folks like Andrew, Karthik, Vinod, Steve etc. -- Thanks and Regards, Mayank Cell: 408-718-9370
Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
Steve, From: Steve Loughran ste...@hortonworks.com Sent: Monday, March 09, 2015 2:15 PM To: mapreduce-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org Subject: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015? [...] Proposed: ship trunk as a 2.x release, compatible with JDK7 Java code. [...] This seems like a great idea, something I hadn't considered before since most patches were flowing into branch-2 anyway - makes a lot of sense. We could just drop branch-2 while we are at it too. It's just a pain to maintain an extra branch. Also, we should formalize that major features should always come via feature branches - allows for some oversight on compatibility etc. as a whole (not piecemeal) when the feature branch is merged. In particular, let's also make sure we ship the script changes in a compatible manner. Happy to help. Given that Vinod has stepped up for 2.7, would you like to drive 2.8? Practically, this is reality already, but something to formalize: having RMs per dot release (Karthik for 2.5, Vinod for 2.7, Steve for 2.8 etc.). thanks, Arun
Re: Looking to a Hadoop 3 release
On Mar 6, 2015, at 5:20 PM, Chris Douglas cdoug...@apache.org wrote: On Fri, Mar 6, 2015 at 4:32 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: I'd encourage everyone to post their wish list on the Roadmap wiki that *warrants* making incompatible changes forcing us to go 3.x. This is a useful exercise, but not a prerequisite to releasing 3.0.0 as an alpha off of trunk, right? Andrew summarized the operating assumptions for anyone working on it: rolling upgrades still work, wire compat is preserved, breaking changes may get rolled back when branch-3 is in beta (so be very conservative, notify others loudly). This applies to branches merged to trunk, also. Not a prerequisite for alpha releases, yes. But it will be for a 'GA' release, because after that we will be back to restricting incompatible changes on the 3.x line, and we will have to say no to features that need API breakage after that. If others feel there are features that warrant incompatibility, we should hear about them for inclusion in such a 3.x release. Till now, the operating assumption was to not break anything as much as possible. If we are opening the window on incompatibilities in 3.x, we might as well get everyone to think about the stuff they want. +1 to Jason's comments in general. We can keep rolling alphas that downstream can pick up, but I'd also like us to clarify the exit criterion for a GA release of 3.0 and its relation to the life of 2.x if we are going this route. This brings us back to the roadmap discussion, and a collective agreement about a logical step at a future point in time where we say we have enough incompatible features in 3.x that we can stop putting more of them in and start stabilizing it. We'll have this discussion again. We don't need to reach consensus on the roadmap, just that each artifact reflects the output of the project. Agreed. I wasn't requesting us to reach a consensus on the roadmap, just requesting others to put their wish lists up. Irrespective of that, here is my proposal in the interim: - Run JDK7 + JDK8 first in a compatible manner, like I mentioned before, for at least two releases in branch-2 - say 2.8 and 2.9 - before we consider taking up the gauntlet on 3.0. - Continue working on the classpath isolation effort and try making it as compatible as possible for users to opt in and migrate easily. +1 for 2.x, but again I don't understand the sequencing. -C There isn't. I was saying "irrespective of that"... Thanks, +Vinod
[jira] [Resolved] (HADOOP-8604) conf/* files overwritten at Hadoop compilation
[ https://issues.apache.org/jira/browse/HADOOP-8604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-8604. -- Resolution: Won't Fix Closing as won't fix. conf/* files overwritten at Hadoop compilation -- Key: HADOOP-8604 URL: https://issues.apache.org/jira/browse/HADOOP-8604 Project: Hadoop Common Issue Type: Bug Components: conf Affects Versions: 1.0.3 Reporter: Robert Grandl Priority: Minor Whenever I compile hadoop from the terminal as: ant compile jar run, all the conf/* files are overwritten. I am not sure whether some of them should be overwritten, but at least hadoop-env.sh, mapred-site.xml, core-site.xml, hdfs-site.xml, masters, and slaves should remain. Otherwise I am forced to back up and restore the content again after compilation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Looking to a Hadoop 3 release
On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. + 1 on this. sanjay
Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
On 09/03/2015 15:56, Andrew Wang andrew.w...@cloudera.com wrote: I find this proposal very surprising. We've intentionally deferred incompatible changes to trunk, because they are incompatible and do not belong in a minor release. Now we are supposed to blur our eyes and release these changes anyway? I don't see this ending well. I'm staring at CHANGES.TXT thinking 'how can we ship something off trunk that has as many of these as we can get out - especially those shell script bits - in a way that doesn't break everything?' Because there's a lot of improvements and bug fixes there which aren't going to be in anyone's hands for a long time otherwise, not just due to any proposed 3.x release schedule, but because of the java 8 requirements as well as the classloader stuff. One higher-level goal we should be working towards is tightening our compatibility guarantees, not loosening them. This is why I've been highlighting classpath isolation as a 3.0 feature, since this is one of the biggest issues faced by our users and downstreams. I think a 3.0 with an improved compatibility story will make operators and downstreams much happier than releasing trunk as 2.8. Best, Andrew I still want to see what's being proposed here. Having classpath isolation will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to every app that imports hadoop-hdfs-client and say "your code just broke", not if they want their apps to continue to run on Hadoop 2 and/or Java 7. Which, given that Java 7 is still something cluster ops teams are coming to terms with, is going to be a while.
[jira] [Resolved] (HADOOP-11571) Über-jira: S3a stabilisation phase I
[ https://issues.apache.org/jira/browse/HADOOP-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-11571. - Resolution: Fixed Über-jira: S3a stabilisation phase I Key: HADOOP-11571 URL: https://issues.apache.org/jira/browse/HADOOP-11571 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.6.0 Reporter: Steve Loughran Assignee: Steve Loughran Priority: Blocker Fix For: 2.7.0 s3a shipped in 2.6; now that it's out, various corner-case, scale, and error-handling issues are surfacing. Fix them before 2.7 ships. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-7220) documentation lists options in wrong order
[ https://issues.apache.org/jira/browse/HADOOP-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-7220. -- Resolution: Won't Fix stale documentation lists options in wrong order -- Key: HADOOP-7220 URL: https://issues.apache.org/jira/browse/HADOOP-7220 Project: Hadoop Common Issue Type: Bug Reporter: Dieter Plaetinck Priority: Minor Original Estimate: 1h Remaining Estimate: 1h On http://hadoop.apache.org/common/docs/r0.20.2/streaming.html various examples use -D flags. I noticed if you invoke hadoop this way, it won't work. dplaetin@n-0:/usr/local/hadoop/bin$ ./hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar -file /proj/Search/wall/experiment/ -mapper './build-models.py --mapper' -reducer './build-models.py --reducer' -input sim-input -output sim-output -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator -D mapred.text.key.comparator.options=-k1,2n 11/04/12 10:39:28 ERROR streaming.StreamJob: Unrecognized option: -D Usage: $HADOOP_HOME/bin/hadoop jar \ $HADOOP_HOME/hadoop-streaming.jar [options] Options: -input path DFS input file(s) for the Map step -output path DFS output directory for the Reduce step -mapper cmd|JavaClassName The streaming command to run -combiner JavaClassName Combiner has to be a Java class -reducer cmd|JavaClassName The streaming command to run -file file File/dir to be shipped in the Job jar file -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional. -outputformat TextOutputFormat(default)|JavaClassName Optional. -partitioner JavaClassName Optional. -numReduceTasks num Optional. -inputreader spec Optional. -cmdenv n=v Optional. Pass env.var to streaming commands -mapdebug path Optional. To run this script when a map task fails -reducedebug path Optional. To run this script when a reduce task fails -verbose Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath. -archives comma separated list of archives specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] For more details about these options: Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info Streaming Job Failed! I could only make it work by moving the -D flags to the front (right after the streaming.jar part), maybe because it's a generic option and it needs to be in front or something. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
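A hedged sketch of why the ordering matters: generic options such as -D are consumed by GenericOptionsParser, which ToolRunner applies before the tool sees its own arguments, so they must precede the application-specific options:
{code:java}
// Minimal Tool showing generic-option handling; the property name is taken
// from the report above, the class name is hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class OrderingDemo extends Configured implements Tool {
  @Override
  public int run(String[] remainingArgs) throws Exception {
    // Any leading -D key=value pairs were already folded into getConf();
    // remainingArgs holds only what followed them.
    System.out.println(getConf().get("mapred.text.key.comparator.options"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // works: hadoop jar app.jar OrderingDemo -D mapred.text.key.comparator.options=-k1,2n -input ...
    // fails: putting -D after -input, because the tool's own parser then sees it
    System.exit(ToolRunner.run(new Configuration(), new OrderingDemo(), args));
  }
}
{code}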
[jira] [Created] (HADOOP-11698) remove distcpv1 from hadoop-extras
Allen Wittenauer created HADOOP-11698: - Summary: remove distcpv1 from hadoop-extras Key: HADOOP-11698 URL: https://issues.apache.org/jira/browse/HADOOP-11698 Project: Hadoop Common Issue Type: Bug Components: tools/distcp Affects Versions: 3.0.0 Reporter: Allen Wittenauer distcpv1 is pretty much unsupported; we should just remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Hadoop - Major releases
Hi Mayank, Note that Hadoop 3 does not mean the end of updates for Hadoop 2.x, which will keep supporting JDK7 for a while yet. Someone on the original thread also proposed keeping Hadoop 3 JDK7-source compatible to make backports to 2.x easier. I support this. Note also that the jump from Hadoop 1 to Hadoop 2 (which is what I assume was your previous migration) is a far, far more impactful change than what is being proposed for Hadoop 3. Hadoop 3 will look basically like a 2.x release except for the JDK8 bump and classpath isolation. The intent is to otherwise maintain wire and API compatibility. Overall your timeline sounds like it fits the schedule I proposed. If we release a 3.0 GA this year, it means you can upgrade to a baked 3.1 or 3.2 next year. Seems like a sound upgrade procedure for a large cluster. Best, Andrew On Mon, Mar 9, 2015 at 2:24 PM, Mayank Bansal maban...@gmail.com wrote: [...]
Re: Hadoop - Major releases
Hi Andrew, I wish things were as simple as you are pointing out. At least they are not for us so far. A couple of things: 1. We would be moving to Hadoop 3 (not this year though); however, I don't see how we can do another JDK upgrade so soon. So the point I am trying to make is that we should be supporting JDK 7 as well for Hadoop 3. 2. For the sake of JDK 8 and classpath isolation we shouldn't be making another release, as those can be supported in Hadoop 2 as well - so what is the motivation for making Hadoop 3 so soon? Thanks, Mayank On Mon, Mar 9, 2015 at 3:34 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
[jira] [Created] (HADOOP-11699) _HOST not consistently resolving to lowercase fully qualified hostname
Kevin Minder created HADOOP-11699: - Summary: _HOST not consistently resolving to lowercase fully qualified hostname Key: HADOOP-11699 URL: https://issues.apache.org/jira/browse/HADOOP-11699 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Kevin Minder The _HOST marker used for Kerberos principals in various configuration files does not always resolve to a lowercase fully qualified hostname. For example, this setting in hdfs-site.xml: {code}
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@YOURREALM.COM</value>
</property>
{code} In particular, this is impeding our work to have Hadoop work with equivalent security on Windows as on Linux. In the Windows env in which I'm having the issue, I was able to get a fully qualified host name using this version of the method getLocalHostName() in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java: {code:java}
public static String getLocalHostName() throws UnknownHostException {
  String hostname = InetAddress.getLocalHost().getCanonicalHostName();
  if ( !hostname.contains( "." ) ) {
    final String os = System.getProperties().getProperty( "os.name", "?" ).toLowerCase();
    if ( os.startsWith( "windows" ) ) {
      String domain = System.getenv( "USERDNSDOMAIN" );
      if ( domain != null ) {
        hostname += "." + domain.trim();
      }
    }
  }
  return hostname == null ? "localhost" : hostname.toLowerCase();
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
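For context, a minimal sketch of where the substitution happens: SecurityUtil.getServerPrincipal expands _HOST from the local hostname, which is why a mixed-case or unqualified name yields a principal that does not match the one registered in the KDC. The realm below is a placeholder:
{code:java}
// Hypothetical demo of _HOST expansion via SecurityUtil.getServerPrincipal;
// a null hostname makes it fall back to the local host name.
import org.apache.hadoop.security.SecurityUtil;

public class HostSubstitutionDemo {
  public static void main(String[] args) throws Exception {
    String principal =
        SecurityUtil.getServerPrincipal("hdfs/_HOST@YOURREALM.COM", (String) null);
    // Expected: hdfs/host.example.com@YOURREALM.COM (lowercase, fully
    // qualified); the report says the hostname part is not always either.
    System.out.println(principal);
  }
}
{code}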
Re: Hadoop - Major releases
Thichphuoctien On Mar 9, 2015 3:35 PM, Andrew Wang andrew.w...@cloudera.com wrote: [...]
[jira] [Resolved] (HADOOP-9086) Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0,
[ https://issues.apache.org/jira/browse/HADOOP-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-9086. -- Resolution: Won't Fix I'm going to set this as won't fix. Introducing more dependencies at this level sounds like a bad thing, esp given that every ops person has their own preferences as to what to use here. Enforce process singleton rules through an exclusive write lock on a file, not a pid file +kill -0, --- Key: HADOOP-9086 URL: https://issues.apache.org/jira/browse/HADOOP-9086 Project: Hadoop Common Issue Type: Improvement Components: scripts, util Affects Versions: 1.1.1, 2.0.3-alpha Environment: Unix/Linux. Reporter: Steve Loughran the {{hadoop-daemon.sh}} script (and other liveness monitors) probe the existence of a daemon service by a {{kill -0}} of a process id picked up from a pid file. This is flawed: # pid file locations may change with installations. # Linux and Unix recycle pids, leading to false positives - the scripts think the process is running when another process is. # it doesn't work on Windows. Having the processes acquire an exclusive write-lock on a known file would delegate lock management - and implicitly liveness - to the OS itself. When the process dies, the lock is released (on Unixes). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
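A hedged sketch of the proposed alternative: hold an exclusive, OS-managed lock on a well-known file instead of trusting a pid file plus kill -0. The lock path is an assumption for illustration:
{code:java}
// Hypothetical singleton guard using java.nio file locking; the OS releases
// the lock when the process dies, so recycled pids cause no false positives.
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class SingletonLock {
  public static void main(String[] args) throws Exception {
    RandomAccessFile f = new RandomAccessFile("/var/run/hadoop/daemon.lock", "rw");
    FileLock lock = f.getChannel().tryLock();  // null if another process holds it
    if (lock == null) {
      System.err.println("another instance holds the lock; exiting");
      System.exit(1);
    }
    // ... run the daemon; a liveness probe can simply try (and fail) to
    // acquire the same lock while the daemon is alive.
  }
}
{code}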
[jira] [Created] (HADOOP-11700) JDiff package is not found while building the project of hadoop-common from hadoop source code.
Radhanpura Aashish created HADOOP-11700: --- Summary: JDiff package is not found while building the project of hadoop-common from hadoop source code. Key: HADOOP-11700 URL: https://issues.apache.org/jira/browse/HADOOP-11700 Project: Hadoop Common Issue Type: Bug Reporter: Radhanpura Aashish {code} hadoop-trunk/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsJDiffDoclet.java:[24,13] package jdiff does not exist [ERROR] /home/thelionheart/Downloads/hadoop-trunk/hadoop-common-project/hadoop-annotations/src/main/java/org/apache/hadoop/classification/tools/ExcludePrivateAnnotationsJDiffDoclet.java:[42,12] cannot find symbol symbol: variable JDiff location: class org.apache.hadoop.classification.tools.ExcludePrivateAnnotationsJDiffDoclet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11680) Deduplicate jars in convenience binary distribution
[ https://issues.apache.org/jira/browse/HADOOP-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-11680. --- Resolution: Duplicate I'm going to close this as a dupe of HADOOP-10115, especially since that was just committed. Deduplicate jars in convenience binary distribution --- Key: HADOOP-11680 URL: https://issues.apache.org/jira/browse/HADOOP-11680 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Sean Busbey Assignee: Sean Busbey Pulled from discussion on HADOOP-11656 Colin wrote: {quote} bq. Andrew wrote: One additional note related to this, we can spend a lot of time right now distributing 100s of MBs of jar dependencies when launching a YARN job. Maybe this is ameliorated by the new shared distributed cache, but I've heard this come up quite a bit as a complaint. If we could meaningfully slim down our client, it could lead to a nice win. I'm frustrated that nobody responded to my earlier suggestion that we de-duplicate jars. This would drastically reduce the size of our install, and without rearchitecting anything. In fact I was so frustrated that I decided to write a program to do it myself and measure the delta. Here it is: Before: {code} du -h /h 249M /h {code} After: {code} du -h /h 140M /h {code} Seems like deduplicating jars would be a much better project than splitting into a client jar, if we really cared about this. snip {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
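The de-duplication program itself isn't shown above; a hedged sketch of the general approach (hash every jar under a root, keep the first copy of each digest, replace later copies with symlinks) might look like this:
{code:java}
// Hypothetical jar de-duplicator; paths and behaviour are assumptions, not
// the program Colin describes.
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class JarDedup {
  public static void main(String[] args) throws IOException {
    final Map<String, Path> seen = new HashMap<String, Path>();
    Files.walkFileTree(Paths.get(args[0]), new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        if (file.toString().endsWith(".jar")) {
          String key = sha256(file);
          Path first = seen.get(key);
          if (first == null) {
            seen.put(key, file);               // first copy: keep it
          } else {                             // duplicate: swap for a symlink
            Files.delete(file);
            Files.createSymbolicLink(file, first.toAbsolutePath());
          }
        }
        return FileVisitResult.CONTINUE;
      }
    });
  }

  static String sha256(Path file) throws IOException {
    try {
      byte[] d = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
      StringBuilder sb = new StringBuilder();
      for (byte b : d) sb.append(String.format("%02x", b));
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IOException(e);
    }
  }
}
{code}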
[jira] [Resolved] (HADOOP-7332) Deadlock in IPC
[ https://issues.apache.org/jira/browse/HADOOP-7332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-7332. -- Resolution: Won't Fix stale Deadlock in IPC --- Key: HADOOP-7332 URL: https://issues.apache.org/jira/browse/HADOOP-7332 Project: Hadoop Common Issue Type: Bug Components: ipc Affects Versions: 0.22.0 Reporter: Todd Lipcon Saw this during a run of TestIPC on 0.22 branch: [junit] Java stack information for the threads listed above: [junit] === [junit] IPC Client (47) connection to /0:0:0:0:0:0:0:0:48853 from an unknown user: [junit] at org.apache.hadoop.ipc.Client$ParallelResults.callComplete(Client.java:879) [junit] - waiting to lock 0xf599ef88 (a org.apache.hadoop.ipc.Client$ParallelResults) [junit] at org.apache.hadoop.ipc.Client$ParallelCall.callComplete(Client.java:862) [junit] at org.apache.hadoop.ipc.Client$Call.setException(Client.java:185) [junit] - locked 0xf59e2818 (a org.apache.hadoop.ipc.Client$ParallelCall) [junit] at org.apache.hadoop.ipc.Client$Connection.cleanupCalls(Client.java:843) [junit] at org.apache.hadoop.ipc.Client$Connection.close(Client.java:832) [junit] - locked 0xf59d8a90 (a org.apache.hadoop.ipc.Client$Connection) [junit] at org.apache.hadoop.ipc.Client$Connection.run(Client.java:708) [junit] Thread-242: [junit] at org.apache.hadoop.ipc.Client$Connection.markClosed(Client.java:788) [junit] - waiting to lock 0xf59d8a90 (a org.apache.hadoop.ipc.Client$Connection) [junit] at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:742) [junit] at org.apache.hadoop.ipc.Client.call(Client.java:1109) [junit] - locked 0xf599ef88 (a org.apache.hadoop.ipc.Client$ParallelResults) [junit] at org.apache.hadoop.ipc.TestIPC$ParallelCaller.run(TestIPC.java:135) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
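For readers of the trace above: the two threads acquire the Connection and ParallelResults monitors in opposite orders. A minimal sketch of that pattern (not the Hadoop IPC code itself):
{code:java}
// Illustrative lock-ordering deadlock, mirroring the trace: one thread locks
// "connection" then wants "parallelResults"; the other does the reverse.
public class LockOrderDeadlock {
  static final Object connection = new Object();
  static final Object parallelResults = new Object();

  public static void main(String[] args) {
    new Thread(new Runnable() {        // mirrors Connection.close -> cleanupCalls
      public void run() {
        synchronized (connection) {
          pause();
          synchronized (parallelResults) { } // blocks: other thread holds it
        }
      }
    }).start();
    new Thread(new Runnable() {        // mirrors Client.call -> sendParam -> markClosed
      public void run() {
        synchronized (parallelResults) {
          pause();
          synchronized (connection) { }      // blocks: other thread holds it
        }
      }
    }).start();
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}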
Re: Hadoop - Major releases
Hi Mayank, 1. We would be moving to Hadoop 3 (not this year though); however, I don't see how we can do another JDK upgrade so soon. So the point I am trying to make is that we should be supporting JDK 7 as well for Hadoop 3. We'll still be releasing 2.x releases for a while, with similar featuresets as 3.x. You can keep using 2.x until you feel ready to jump to JDK8. 2. For the sake of JDK 8 and classpath isolation we shouldn't be making another release, as those can be supported in Hadoop 2 as well - so what is the motivation for making Hadoop 3 so soon? So already you can run 2.x with JDK8 and some degree of classpath isolation, but I've discussed the motivation for a 3.0 on the previous thread. We had issues in the JDK6 days with our dependencies not supporting JDK6 and thus not releasing security or bug fixes, which in turn put us in a bad spot. Classpath isolation we are still discussing, but right now it's opt-in and somewhat incomplete, which makes it hard for downstream projects to effectively make use of it. The goal for 3.0 is to clean this up and have it on by default (or always). Best, Andrew
Re: Looking to a Hadoop 3 release
Avoiding the use of JDK8 language features (and, presumably, APIs) means you've abandoned #1, i.e., you haven't (really) bumped the JDK source version to JDK8. Also, note that releasing from trunk is a way of achieving #3, not a way of abandoning it. On Mon, Mar 9, 2015 at 7:10 PM, Andrew Wang andrew.w...@cloudera.com wrote: Hi Raymie, Konst proposed just releasing off of trunk rather than cutting a branch-2, and there was general agreement there. So, consider #3 abandoned. #1 and #2 can be achieved at the same time; we just need to avoid using JDK8 language features in trunk so things can be backported. Best, Andrew On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote: In this (and the related threads), I see the following three requirements: 1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support). 2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x. 3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious. These three cannot all be achieved at the same time. Which do we abandon? On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote: On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. +1 on this. sanjay
[jira] [Resolved] (HADOOP-7648) Update date from 2009 to 2011
[ https://issues.apache.org/jira/browse/HADOOP-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-7648. -- Resolution: Incomplete closing as stale Update date from 2009 to 2011 - Key: HADOOP-7648 URL: https://issues.apache.org/jira/browse/HADOOP-7648 Project: Hadoop Common Issue Type: Bug Components: build, documentation Affects Versions: 0.22.0 Reporter: Joep Rottinghuis Build files contain a year parameter that shows up in the UI. Some of the documentation has 2009 copyrights on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
Between this and the other thread, I'm seeing:
* companies that were forced to make internal forks because their patches were ignored are now considered the deciders for whether we move forward
* 5 years since the last branch off of trunk is considered 'soon'
* More good reasons to kill hadoop 2.7 and release hadoop 3.0 as the JDK7 release
* We are now OPENLY hostile to operations teams
* No one seems to really care that we're about to create an absolute nightmare for anyone that uses maven repos, as they'll need to keep track of which jars have been compiled with which JVM with zero hints from our build artifacts

On Mar 9, 2015, at 4:18 PM, Steve Loughran ste...@hortonworks.com wrote:

> On 09/03/2015 15:56, Andrew Wang andrew.w...@cloudera.com wrote: I find this proposal very surprising. We've intentionally deferred incompatible changes to trunk, because they are incompatible and do not belong in a minor release. Now we are supposed to blur our eyes and release these changes anyway? I don't see this ending well.

I'm staring at CHANGES.TXT thinking 'how can we ship something off trunk that has as many of these as we can get out - especially those shell script bits - in a way that doesn't break everything?' Because there's a lot of improvements and bug fixes there which aren't going to be in anyone's hands for a long time otherwise, not just due to any proposed 3.x release schedule, but because of the Java 8 requirements as well as the classloader stuff.

> One higher-level goal we should be working towards is tightening our compatibility guarantees, not loosening them. This is why I've been highlighting classpath isolation as a 3.0 feature, since this is one of the biggest issues faced by our users and downstreams. I think a 3.0 with an improved compatibility story will make operators and downstreams much happier than releasing trunk as 2.8. Best, Andrew

I still want to see what's being proposed here. Having classpath isolation will make the JAR upgrade story in 3.x a lot cleaner, but we can't go to every app that imports hadoop-hdfs-client and say "your code just broke", not if they want their apps to continue to run on Hadoop 2 and/or Java 7. Which, given that Java 7 is still something cluster ops teams are coming to terms with, is going to be a while.
Re: Looking to a Hadoop 3 release
In this (and the related threads), I see the following three requirements: 1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support). 2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x. 3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious. These three cannot all be achieved at the same time. Which do we abandon? On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote: On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. +1 on this. sanjay
[jira] [Resolved] (HADOOP-7027) Temporary Fix to handle problem in org/apache/hadoop/security/UserGroupInformation.java when using Apache Harmony
[ https://issues.apache.org/jira/browse/HADOOP-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-7027. -- Resolution: Won't Fix Temporary Fix to handle problem in org/apache/hadoop/security/UserGroupInformation.java when using Apache Harmony - Key: HADOOP-7027 URL: https://issues.apache.org/jira/browse/HADOOP-7027 Project: Hadoop Common Issue Type: New Feature Affects Versions: 0.21.0 Environment: SLE v. 11, Apache Harmony 6 Reporter: Guillermo Cabrera Priority: Trivial Attachments: HADOOP-7027.patch Building and running Hadoop Common is not possible with the error outlined in HADOOP-6941. To address the problem for someone using Apache Harmony, we have created a temporary fix specific to Harmony (it will fail on other JREs) that overcomes this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Looking to a Hadoop 3 release
Hi Raymie, Konst proposed just releasing off of trunk rather than cutting a branch-2, and there was general agreement there. So, consider #3 abandoned. #1 and #2 can be achieved at the same time; we just need to avoid using JDK8 language features in trunk so things can be backported. Best, Andrew On Mon, Mar 9, 2015 at 7:01 PM, Raymie Stata rst...@altiscale.com wrote: In this (and the related threads), I see the following three requirements: 1. Bump the source JDK version to JDK8 (i.e., drop JDK7 support). 2. We'll still be releasing 2.x releases for a while, with similar feature sets as 3.x. 3. Avoid the risk of split-brain behavior by minimizing backporting headaches. Pulling trunk -> branch-2 -> branch-2.x is already tedious. Adding a branch-3, branch-3.x would be obnoxious. These three cannot all be achieved at the same time. Which do we abandon? On Mon, Mar 9, 2015 at 12:45 PM, sanjay Radia sanjayo...@gmail.com wrote: On Mar 5, 2015, at 3:21 PM, Siddharth Seth ss...@apache.org wrote: 2) Simplification of configs - potentially separating client side configs and those used by daemons. This is another source of perpetual confusion for users. +1 on this. sanjay
[jira] [Resolved] (HADOOP-7026) Adding new target to build.xml to run tests without compiling
[ https://issues.apache.org/jira/browse/HADOOP-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HADOOP-7026. -- Resolution: Fixed fixed in more recent trees Adding new target to build.xml to run tests without compiling - Key: HADOOP-7026 URL: https://issues.apache.org/jira/browse/HADOOP-7026 Project: Hadoop Common Issue Type: New Feature Components: build, test Affects Versions: 0.21.0 Environment: SLE v. 11, Apache Harmony 6 Reporter: Guillermo Cabrera Priority: Trivial Attachments: HADOOP-7026.patch While testing Apache Harmony Select (a lightweight version of Harmony) with Hadoop Common, we had to first build with Harmony and then run the tests under Harmony Select using the test-core target. This was done in an effort to investigate any issues with Harmony Select when running Common. However, the test-core target also compiles the classes, which we are unable to do with Harmony Select. A new target is proposed that only runs the tests, without compiling them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Hadoop - Major releases
Over the last few days, we have had lots of discussions that have intertwined several major themes:
# When/why do we make major Hadoop releases?
# When/how do we move to major JDK versions?
# To a lesser extent, we have debated another theme: what do we do about trunk?
For now, let's park JDK and trunk to treat them in separate thread(s). For a while now, I've had a couple of lampposts in my head which I used for guidance - apologies for not sharing this broadly prior to this discussion; maybe putting it out here will help - certainly hope so.

Major Releases

Hadoop continues to benefit tremendously from the investment in stability, validation etc. put in by its *anchor* users: Yahoo, Facebook, Twitter, eBay, LinkedIn etc.

A historical perspective... In its lifetime, Apache Hadoop went from monthly to quarterly releases because, as Hadoop became more and more of a production system (starting with hadoop-0.16 and more so with hadoop-0.18), users could not absorb the torrid pace of change. IMHO, we didn't go far enough in addressing the competing pressures of stability v/s rapid innovation. We paid for it by losing one of our anchor users - Facebook - around the time of hadoop-0.19 - they just forked. Around the same time, Yahoo hit the same problem (I know, I lived through it painfully) and got stuck with hadoop-0.20 for a *very* long time, and forked to add Security rather than deal with the next major release (hadoop-0.21). Later on, Facebook did the same, and, unfortunately for the community, is stuck - probably forever - on their fork of hadoop-0.20. Overall, these were dark days for the community: every anchor user was on their own fork, and it took a toll on the project. Recently, thankfully for Hadoop, we have had a period of relative stability with hadoop-1.x and hadoop-2.x. Even so, there were close shaves: Yahoo was on hadoop-0.23 for a *very* long time - in fact, they are only just now finishing their migration to hadoop-2.x.

I think the major lessons here are the obvious ones:
# Compatibility matters.
# Maintaining multiple major releases, in parallel, is a big problem - it leads to an unproductive, and risky, split in community investment along different lines.

Looking Ahead

Given the above, here are some thoughts for looking ahead:
# Be very conservative about major releases - a major benefit (features) is required for the cost. Let's not compel our anchor users like Yahoo, Twitter, eBay, and LinkedIn to invest in previous releases rather than the latest one. Let's hear more from them - and let's be very accommodating to them - for they play a key role in keeping Hadoop healthy and stable.
# Be conservative about dropping support for JDKs. In particular, let's hear from our anchor users on their plans for adopting jdk-1.8. LinkedIn has already moved to jdk-1.8, which is great for the validation, but let's wait for the rest of our anchor users to move before we drop jdk-1.7. We did the same thing with jdk-1.6 - waited for them to move before we dropped support for it.

Overall, I'd love to hear more from Twitter, Yahoo, eBay and other anchor users on their plans for jdk-1.8 specifically, and on their overall appetite for hadoop-3. Let's not finalize our plans for moving forward until this input has been considered. Thoughts? thanks, Arun

Unfortunately-necessary disclaimers:
# Before people point out vendor affiliations to lend unnecessary color to my opinions, let me state that hadoop-2 v/s hadoop-3 is a non-issue for us.
For major HDP versions, the key is, simply, compatibility... e.g. we ship major, but compatible, community releases such as hive-0.13/hive-0.14 in HDP-2.x/HDP-2.x+1 etc.
# Also, release management is a similar non-issue - we have already had several individuals step up in the hadoop-2.x line. Expect more of the same from folks like Andrew, Karthik, Vinod, Steve etc.
[jira] [Resolved] (HADOOP-11646) Erasure Coder API for encoding and decoding of block group
[ https://issues.apache.org/jira/browse/HADOOP-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B resolved HADOOP-11646. Resolution: Fixed Hadoop Flags: Reviewed Committed to HDFS-7285 branch. Erasure Coder API for encoding and decoding of block group -- Key: HADOOP-11646 URL: https://issues.apache.org/jira/browse/HADOOP-11646 Project: Hadoop Common Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Fix For: HDFS-7285 Attachments: HADOOP-11646-v4.patch, HADOOP-11646-v5.patch, HDFS-7662-v1.patch, HDFS-7662-v2.patch, HDFS-7662-v3.patch This is to define the ErasureCoder API for encoding and decoding of a BlockGroup. Given a BlockGroup, an ErasureCoder extracts data chunks from the blocks and leverages the RawErasureCoder defined in HADOOP-11514 to perform the concrete encoding or decoding. Note this mainly focuses on the fundamental aspects, covering encoding, data block recovery, etc. Parity block recovery involves multiple steps and will be handled by HADOOP-11550. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
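The committed interfaces live in the attached patches on the HDFS-7285 branch, not in this message, so the following is only a shape sketch of the two-layer design described above, with simplified, hypothetical names:
{code}
// Hypothetical sketch of the coder layering described above; simplified
// names -- see the HDFS-7285 branch for the real API.
interface RawErasureCoder {
  // HADOOP-11514 layer: pure coding math over equal-sized chunks.
  void encode(byte[][] dataChunks, byte[][] parityChunks);
  void decode(byte[][] inputChunks, int[] erasedIndexes, byte[][] outputChunks);
}

// Placeholder for a striped group of data and parity blocks.
class BlockGroup { /* data blocks, parity blocks, cell size ... */ }

interface ErasureCoder {
  // This issue's layer: split a BlockGroup's blocks into chunks and
  // drive a RawErasureCoder to produce parity.
  void encodeBlockGroup(BlockGroup group);

  // Reconstruct erased data blocks in the group; multi-step parity-block
  // recovery is out of scope here and handled by HADOOP-11550.
  void decodeBlockGroup(BlockGroup group);
}
{code}
The split keeps the raw coder reusable (Reed-Solomon, XOR, etc.) while the block-group coder owns the chunk extraction and bookkeeping.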