Re: is it possible to skip the HTTP map output fetch by feeding the map output file and index file directly to the reducer?
After searching the Hadoop mailing list again, I found this link, which tries to optimize Hadoop on Lustre by using hard links instead of HTTP ( http://search-hadoop.com/m/JkHSa17oHp12 ). Any other suggestions? Thanks.

yours,
Ling Kun

On Thu, Feb 28, 2013 at 4:57 PM, Ling Kun lkun.e...@gmail.com wrote:

Dear Arun C Murthy, Pavan Kulkarni, and all,

Hello! I am currently working on optimizing a Hadoop cluster based on the Lustre file system. According to the TeraSort benchmark, the remote map output copy takes a large part of the total runtime. After some searching, I found your discussion from half a year ago ( http://search-hadoop.com/m/jj3y46KUwC1 ). I am writing to ask whether we can make each reducer directly read its part of each map output file based on the index file, and merge those parts together, instead of making each map task generate a separate output file for each reduce task. That way, not too many inodes are needed.

@Pavan Kulkarni: no email was sent by you after Sep. 2012. Could you please kindly share some experience on how to optimize for this kind of file system, like Lustre? Does anyone have similar work experience? Any comments and replies are welcome and appreciated!

yours,
Ling Kun

--
http://www.lingcc.com
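The scheme Ling Kun describes (a reducer seeking directly into a map output file on a shared file system, guided by the spill index) can be sketched roughly as below. This is a minimal sketch, not the actual Hadoop code: it assumes an index layout of three big-endian 8-byte longs per partition (start offset, raw length, on-disk length), which mirrors Hadoop's SpillRecord format, but checksums and real file I/O are omitted and all names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch: locate one reducer's partition inside a map output file using
// the spill index, instead of fetching the partition over HTTP.
public class SpillIndexSketch {
    static final int ENTRY_BYTES = 3 * Long.BYTES; // 24 bytes per partition

    /** Returns {startOffset, rawLength, partLength} for the given reduce partition. */
    static long[] entryFor(byte[] indexBytes, int partition) {
        ByteBuffer buf = ByteBuffer.wrap(indexBytes); // big-endian by default
        buf.position(partition * ENTRY_BYTES);
        return new long[] { buf.getLong(), buf.getLong(), buf.getLong() };
    }

    /** Builds a fake index file image for demo purposes. */
    static byte[] buildIndex(long[][] entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long[] e : entries) {
            out.writeLong(e[0]); // start offset in the map output file
            out.writeLong(e[1]); // raw (uncompressed) length
            out.writeLong(e[2]); // length on disk
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Three partitions; reducer 1's data starts at offset 100, 80 bytes on disk.
        byte[] index = buildIndex(new long[][] {
            {0, 120, 100}, {100, 90, 80}, {180, 50, 40}});
        long[] e = entryFor(index, 1);
        System.out.println("partition 1: offset=" + e[0] + " len=" + e[2]);
        // A Lustre-aware reducer could now seek to e[0] and read e[2] bytes
        // directly from the shared file system.
    }
}
```

On a shared POSIX file system like Lustre this would replace the per-reducer HTTP fetch with a seek-and-read, which is exactly the inode-saving idea in the email.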
RE: [Vote] Merge branch-trunk-win to trunk
+1 (non-binding)

I want to share my vote of confidence in this community. If motivated to do so, this community can keep this project cross-platform and continue to rapidly innovate without breaking a sweat. The day we started working on this, I saw the foundations of greatness in the quality and volume of dev tests, the code itself, and the Apache values themselves.

1.) Hadoop's unit tests and their frameworks are very well thought out, and the consideration and energy that went into their design are worthy of praise. The MiniCluster abstractions use very few resources and put all the processes into one JVM for easy debugging. It is very easy to select specific tests from the full suite to reproduce an issue reported in another environment, like the Jenkins build server or another contributor's environment.

2.) This community has done an excellent job of incorporating well-placed log messages to make it easy to troubleshoot most failures post mortem. The logs are very useful, and it is extremely rare that troubleshooting a failure requires debugging a live repro.

3.) Hadoop is written primarily in Java, a cross-platform language that provides its own platform in the form of the JVM to insulate most of the code from the specifics of the OS layer.

4.) CoPDoC - the right priorities, and well stated.

Thank you,
John

-----Original Message-----
From: Ivan Mitic [mailto:iva...@microsoft.com]
Sent: Wednesday, February 27, 2013 6:32 PM
To: mapreduce-dev@hadoop.apache.org; common-...@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: RE: [Vote] Merge branch-trunk-win to trunk

+1 (non-binding)

I am really glad to see this happening! As people have already mentioned, this has been a great engineering effort involving many people! Folks raised some valid concerns below, and I thought it would be good to share my 2 cents. In my opinion, we don't have to solve all these problems right now. As we move forward with two platforms, we can start addressing one problem at a time and incrementally improve. In the first iteration, maintaining Hadoop on Windows could just be everyone trying to do their best effort (making sure the Jenkins build succeeds, at least). We already have people who are building/running trunk on Windows daily, so they would jump in and fix problems as needed (we've been doing this in branch-trunk-win for a while now). Although I see that problems could arise with platform-specific features/optimizations, I don't think these are frequent, so in most cases everything will just work. Merging the two branches sooner rather than later does seem like the right thing to do if the ultimate goal is to have Hadoop on both platforms. Now that the port is complete, we will have people at Microsoft (and elsewhere) wanting to contribute features/improvements to trunk. A separate branch would just make things more difficult and confusing for everyone :)

Hope this makes sense.

-----Original Message-----
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Wednesday, February 27, 2013 3:43 PM
To: common-...@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; mapreduce-dev@hadoop.apache.org
Subject: Re: [Vote] Merge branch-trunk-win to trunk

On Wed, Feb 27, 2013 at 2:54 PM, Suresh Srinivas sur...@hortonworks.com wrote:

> With that we need to decide how our precommit process looks. My inclination is to wait for +1 from precommit builds on both the platforms to ensure no issues are introduced. Thoughts?
> 2. Feature development impact - Some questions have been raised about whether new features would need to be supported on both the platforms. Yes. I do not see a reason why features cannot work on both the platforms, with the exception of platform-specific optimizations. This is what Java gives us.

I'm concerned about the above. Personally, I don't have access to any Windows boxes with development tools, and I know nothing about developing on Windows. The only Windows I run is an 8GB VM with 1GB RAM allocated, for PowerPoint :) If I submit a patch and it gets "-1, tests failed" on the Windows slave, how am I supposed to proceed?

I think a reasonable compromise would be that the tests should always *build* on Windows before commit, and contributors should do their best to look at the test logs for any Windows-specific failures. But, beyond looking at the logs, a "-1, tests failed on Windows" should not block a commit. Those contributors who are interested in Windows being a first-class platform should be responsible for watching the Windows builds and debugging/fixing any regressions that might be Windows-specific.

I also think the KDE model that Harsh pointed out is an interesting one, i.e. the idea that we would not merge Windows support to trunk, but rather treat it as a parallel code line which lives in the ASF and has its own builds and releases. The Windows team would periodically merge
Hadoop-Mapreduce-trunk - Build # 1358 - Failure
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1358/

###################################################################################
########################### LAST 60 LINES OF THE CONSOLE ###########################
[...truncated 27587 lines...]
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.801 sec
Running org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.916 sec
Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.122 sec

Results :

Failed tests:
  testMultipleCrashes(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Reduce Task state not correct expected:<RUNNING> but was:<SCHEDULED>
  testOutputRecovery(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testOutputRecoveryMapsOnly(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testRecoveryWithOldCommiter(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testSpeculative(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>

Tests run: 209, Failures: 5, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [1.596s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [22.735s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [23.273s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [1.673s]
[INFO] hadoop-mapreduce-client-app ....................... FAILURE [5:25.169s]
[INFO] hadoop-mapreduce-client-hs ........................ SKIPPED
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] hadoop-mapreduce-client-hs-plugins ................ SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 6:15.105s
[INFO] Finished at: Thu Feb 28 13:20:47 UTC 2013
[INFO] Final Memory: 20M/125M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test (default-test) on project hadoop-mapreduce-client-app: There are test failures.
[ERROR]
[ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-app
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Updating HADOOP-9342
Updating YARN-426
Updating HADOOP-9339
Updating MAPREDUCE-4892
Updating MAPREDUCE-4693
Email was triggered for: Failure
Sending email for trigger: Failure

###################################################################################
############################## FAILED TESTS (if any) ###############################
No tests ran.
[jira] [Created] (MAPREDUCE-5037) JobControl logging when a job completes
Jason Lowe created MAPREDUCE-5037:
-------------------------------------

             Summary: JobControl logging when a job completes
                 Key: MAPREDUCE-5037
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5037
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client
    Affects Versions: 0.23.7, 2.0.4-beta
            Reporter: Jason Lowe
            Priority: Minor

JobControl emits logs, via the logging in Job.submit(), whenever a job is launched. It would be nice if it also logged when active jobs it is tracking complete, along with their final status (i.e.: success, failed, killed, etc.).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
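The improvement the JIRA asks for amounts to noticing when a tracked job reaches a terminal state and logging its name and final status. A hypothetical sketch follows; the Job and State types here are simple stand-ins, not the real org.apache.hadoop.mapreduce.lib.jobcontrol API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the MAPREDUCE-5037 idea: produce a log line for
// every tracked job that has reached a terminal state.
public class JobCompletionLogger {
    enum State { RUNNING, SUCCESS, FAILED, KILLED }

    static class Job {
        final String name;
        State state;
        Job(String name, State state) { this.name = name; this.state = state; }
    }

    /** Returns log lines for jobs that have moved to a terminal state. */
    static List<String> logCompletions(List<Job> active) {
        List<String> lines = new ArrayList<>();
        for (Job j : active) {
            if (j.state != State.RUNNING) {
                lines.add(j.name + " completed with status " + j.state);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("sort", State.SUCCESS));
        jobs.add(new Job("grep", State.RUNNING));
        for (String line : logCompletions(jobs)) {
            System.out.println(line); // only the finished "sort" job is logged
        }
    }
}
```

In the real JobControl, this check would live in the monitoring thread that already polls job states, pairing naturally with the existing submit-time logging.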
Re: [Vote] Merge branch-trunk-win to trunk
I'd like to share a few anecdotes about developing cross-platform, hopefully to address some of the concerns about adding overhead to the development process. By reviewing past cases of cross-platform Linux vs. Windows bugs, we can get a sense of how the development process could look in the future.

HADOOP-9131: TestLocalFileSystem#testListStatusWithColons cannot run on Windows. As part of an earlier jira, HADOOP-8962, a new test was committed on trunk covering the case of a local file system interaction on a file containing a ':'. On Windows, ':' in a path has special meaning as part of the drive specifier (i.e. C:), so this test cannot pass when running on Windows. In this kind of case, the cross-platform bug is obvious, and the fix is obvious (assumeTrue(!Shell.WINDOWS)). Ideally, this would get fixed pre-commit after seeing a -1 from the Windows Jenkins slave.

HDFS-4274: BlockPoolSliceScanner does not close verification log during shutdown. This caused problems for MiniDFSCluster-based tests running on Windows. Failure to close the verification log meant that we didn't release file locks, so the tests couldn't delete/recreate working directories during teardown/setup. Arguably, this was always a bug, and running on Windows just exposed it because of Windows' stricter rules about file locking. This is a more complex fix, but it doesn't require platform-specific knowledge. If some future patch accidentally regresses this, then we'll likely see +1 from Linux Jenkins and -1 from Windows Jenkins. Ideally, it would get fixed pre-commit, because it doesn't require Windows-specific knowledge. There is also the matter of impact: re-breaking this would re-break many test suites on Windows.

HADOOP-9232: JniBasedUnixGroupsMappingWithFallback fails on Windows with UnsatisfiedLinkError. This was introduced by HADOOP-8712, which switched to JniBasedUnixGroupsMappingWithFallback as the default hadoop.security.group.mapping, but did not provide a Windows implementation of the JNI function. In this case, there was a strong desire to get HADOOP-8712 into a release, fixing it on Windows required native Windows API knowledge, and Windows users had a simple workaround available by changing their configs back to ShellBasedUnixGroupsMapping. I think this is the kind of situation where we could allow HADOOP-8712 to commit despite a -1 from Windows Jenkins, with fairly quick follow-up from an engineer with the Windows expertise to fix it.

To summarize, I don't think this needs to differ greatly from our current development process. We're all responsible for breadth of understanding and maintenance of the whole codebase, but we also rely on specific individuals with deep expertise in particular areas for certain issues. Sometimes we commit despite a -1 from Jenkins, based on the community's judgment.

Virtualization greatly simplifies cross-platform development. I use VirtualBox on a Mac host and run VMs for Windows and Ubuntu with a shared drive so that they can all see the same copy of the source code. There are plenty of variations on this depending on your preference, such as offloading the VMs to a separate server or cloud service to free up local RAM. I'm planning on submitting BUILDING.txt changes later today that fully describe how to build on Windows. After some initial setup, it's nearly identical to the mvn commands that you already use today.

Hope this helps,
--Chris

On Thu, Feb 28, 2013 at 3:25 AM, John Gordon john.gor...@microsoft.com wrote:

+1 (non-binding) I want to share my vote of confidence in this community.
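The assumeTrue(!Shell.WINDOWS) guard Chris mentions for HADOOP-9131 boils down to detecting Windows from the os.name system property and skipping tests that cannot pass there. A minimal sketch, assuming Hadoop's Shell.WINDOWS works this way (the PlatformGuard name is illustrative; the os.name parsing is factored out so it can be exercised on any OS):

```java
// Sketch of a platform guard in the style of Hadoop's Shell.WINDOWS:
// a constant derived from the os.name system property, used to skip
// tests that are impossible on Windows (e.g. paths containing ':').
public class PlatformGuard {
    /** True when the given os.name value denotes Windows. */
    static boolean isWindows(String osName) {
        return osName != null && osName.startsWith("Windows");
    }

    public static final boolean WINDOWS =
        isWindows(System.getProperty("os.name"));

    public static void main(String[] args) {
        // In a JUnit test this would be: Assume.assumeTrue(!PlatformGuard.WINDOWS);
        // which marks the test as skipped rather than failed on Windows.
        if (WINDOWS) {
            System.out.println("skipping colon-in-path test on Windows");
            return;
        }
        System.out.println("running colon-in-path test");
    }
}
```

The key property of JUnit's Assume is that a violated assumption reports the test as ignored, so the Windows Jenkins slave stays green instead of showing a spurious -1.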
Re: [Vote] Merge branch-trunk-win to trunk
My initial question was mostly intended to understand the desired new classification of Windows after the merge, and how we plan to maintain Windows support. I am happy to hear that hardware for Jenkins will be provided. I am also fine, at least initially, with us trying to treat Windows as a first-class supported platform. But I realize that there are a lot of people who do not have easy access to Windows for development/debugging, myself included. I also don't want to slow down the pace of development too much because of this. It will cause some organizations that do not use or support Windows to be more likely to run software that has diverged from an official release. It also has the potential to make the patch submission process even more difficult, which increases the likelihood of submitters abandoning patches. However, the great thing about being in a community is that we can change if we need to.

I am +0 for the merge. I am not a Windows expert, so I don't feel comfortable giving it a true +1.

--Bobby

On 2/28/13 10:45 AM, Chris Nauroth cnaur...@hortonworks.com wrote:

I'd like to share a few anecdotes about developing cross-platform, hopefully to address some of the concerns about adding overhead to the development process.
Re: [Vote] Merge branch-trunk-win to trunk
Is there a jira for resolving the outstanding TODOs in the code base (similar to HDFS-2148)? Looks like this merge doesn't introduce many, which is great (I just did a quick diff and grep).

I found 2 remaining TODOs introduced in the current merge patch. One is in ContainerLaunch.java. The container launch script was trying to set a CLASSPATH that exceeded the Windows maximum command line length. The fix was to wrap the long classpath into an intermediate jar containing only a manifest file with a Class-Path entry. (See YARN-316.) Just to be conservative, we wrapped this logic in an if (Shell.WINDOWS) guard and marked a TODO to remove it later and use that approach on all platforms after additional testing. I've tested this code path successfully on Mac too, but several people wanted additional testing and performance checks before removing the if (Shell.WINDOWS) guard. That work is tracked in an existing jira: YARN-358.

The other TODO is for winutils to print more usage information and examples. At this point, I think winutils is printing sufficient information, and we can just remove the TODO. I just submitted a new jira to start that conversation: HADOOP-9348.

Thank you,
--Chris

On Thu, Feb 28, 2013 at 11:29 AM, Robert Evans ev...@yahoo-inc.com wrote:

My initial question was mostly intended to understand the desired new classification of Windows after the merge, and how we plan to maintain Windows support.
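The YARN-316 workaround Chris describes relies on a standard jar feature: a manifest's Class-Path main attribute can list classpath entries, so the launch command only needs to name one small jar. A minimal sketch of the technique (the class and method names here are illustrative, not the actual YARN code):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch: wrap a long classpath into a manifest-only jar so the command
// line stays under the Windows length limit. Class-Path entries are
// space-separated relative URLs per the JAR File Specification.
public class ClasspathJarSketch {
    static File makeClasspathJar(String classPath, File dir) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set or the attributes are not written.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.CLASS_PATH, classPath);
        File jar = new File(dir, "classpath.jar");
        // The jar contains nothing but META-INF/MANIFEST.MF.
        new JarOutputStream(new FileOutputStream(jar), manifest).close();
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        File jar = makeClasspathJar("lib/a.jar lib/b.jar", dir);
        try (JarFile jf = new JarFile(jar)) {
            System.out.println("Class-Path: " + jf.getManifest()
                .getMainAttributes().getValue(Attributes.Name.CLASS_PATH));
        }
    }
}
```

The launcher can then invoke `java -cp classpath.jar Main` and the JVM expands the manifest's Class-Path itself, which is why the approach is plausibly portable to all platforms, as the TODO suggests.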
[jira] [Created] (MAPREDUCE-5038) mapred CombineFileInputFormat does not work on non-splittable files
Sandy Ryza created MAPREDUCE-5038:
-------------------------------------

             Summary: mapred CombineFileInputFormat does not work on non-splittable files
                 Key: MAPREDUCE-5038
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 1.1.1
            Reporter: Sandy Ryza
            Assignee: Sandy Ryza

MAPREDUCE-1597 enabled the CombineFileInputFormat in mapreduce to work on non-splittable files, but neglected to consider the one in mapred. In trunk this is not an issue, as the one in mapred extends the one in mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [Vote] Merge branch-trunk-win to trunk
+1 for the merge. As someone who has been testing the code for many months now, both on single-node and multi-node clusters, I am very confident about the stability and the quality of the code. I have run several regression tests to verify distributed cache, streaming, compression, capacity scheduler, job history, and many more features in HDFS and MR.

- Ramya

On Thu, Feb 28, 2013 at 3:08 PM, sanjay Radia san...@hortonworks.com wrote:

+1. Java has done the bulk of the work in making Hadoop multi-platform. Windows-specific code is a tiny percentage of the code. Jenkins support for Windows is going to help us keep the platform portable going forward. I expect that the vast majority of new commits will have no problems. I propose that we start by fixing problems that Jenkins raises, but not block new commits for too long if the author does not have a Windows box or if a volunteer does not step up.

sanjay
Re: [Vote] Merge branch-trunk-win to trunk
On Thu, Feb 28, 2013 at 03:08PM, sanjay Radia wrote:

> I propose that we start by fixing problems that Jenkins raises, but not block new commits for too long if the author does not have a Windows box or if a volunteer does not step up.

Considering the typical set of software most of the people here work with, it would be completely inappropriate to block commits for failing Windows-specific features. After all, Microsoft never bothered to check what features or compatibility matters they have broken in Java and elsewhere, so why should we? I believe these kinds of rules have to be set and discussed before the merge is done.

Cheers,
Cos
Fwd: [Vote] Merge branch-trunk-win to trunk
+1 (binding)

Apache is supposed to be about the community. We have here a community of developers who have actively and openly worked to add a major improvement to Hadoop: the ability to work cross-platform. Furthermore, the size of the substantive part of the needed patch is only about 1500 lines, much smaller than quite a few other additions to Hadoop over the last few months. We should welcome and support this change, and make sure that the code stays cross-platform going forward by extending our CI practices, especially pre-commit test-patch, to also include Windows.

As most of you know, my colleague Giri Kesavan (PMC member) helps maintain the Linux CI capability for Hadoop. I've talked with him, and he and I are committing to getting test-patch implemented for Windows, so that along with the current automated +1s required to commit, we can add two more, for the javac build on Windows and core unit tests on Windows. Members of the team implementing cross-platform compatibility, including Microsoft employees, have opened the discussion about providing hardware or VM resources to perform this additional CI testing. I will assist them in working with the Apache Infra team to figure out how to make it happen.

I understand there is some concern about the additional platform test. My going-in presumption, based on Java's intrinsic, pretty good cross-platform compatibility, is that patches to Hadoop will by default also have cross-platform compatibility, unless they are written in an explicitly platform-dependent way. I also believe that in the vast majority of cases the cross-platform compatibility of Java will carry through to Hadoop patches, without additional effort on the developer's part. Let's try it, and see what happens. If we actually find a frequent difficulty, we'll change to engineer around it. But I believe that, in the rare cases where a Windows-specific failure occurs, there will be a number of people (new, enthusiastic members of the community! :-) willing to help. If such help is not forthcoming, then we can discuss work-arounds, but like a previous poster, I am confident in the community.

Regards,
--Matt

On Thu, Feb 28, 2013 at 12:21 PM, Chuan Liu chuan...@microsoft.com wrote:

+1 (non-binding)

As someone who also contributed to porting Hadoop to Windows, I think Java has already provided a very good platform-independent foundation. For features that are not available in Java, we try to provide our own platform-independent APIs that abstract OS tasks away. Most features should have no difficulty running on Windows and Linux by using Java and those platform-independent APIs. For concerns raised about new features that may fail on Windows, I think we don't need to make passing on Windows a mandate at the moment. We can simply mark a feature unavailable on Windows and port it later if the feature is important.

-Chuan

-----Original Message-----
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: Thursday, February 28, 2013 11:51 AM
To: hdfs-...@hadoop.apache.org
Cc: mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org; common-...@hadoop.apache.org
Subject: Re: [Vote] Merge branch-trunk-win to trunk

Is there a jira for resolving the outstanding TODOs in the code base (similar to HDFS-2148)?