Re: is it possible to avoid the HTTP map output fetch by feeding the map output file and index file directly to the reducer?

2013-02-28 Thread Ling Kun
After searching the Hadoop mailing list again, I found this link, which
describes an attempt to optimize Hadoop on Lustre by using hard links instead
of HTTP ( http://search-hadoop.com/m/JkHSa17oHp12 ).


 Any other suggestions?

Thanks all

yours,
Ling Kun


On Thu, Feb 28, 2013 at 4:57 PM, Ling Kun lkun.e...@gmail.com wrote:

 Dear Arun C Murthy, Pavan Kulkarni and all.
  Hello!
  I am currently working on optimizing a Hadoop cluster based on the Lustre FS.
 According to the TeraSort benchmark, it seems the remote map output copy
 takes a large part of the total runtime.


After searching, I saw your discussion from half a year ago (
 http://search-hadoop.com/m/jj3y46KUwC1 ).

  I am writing to ask whether we can make each reducer directly read
 its part of each map output file based on the index file, and merge them
 together, instead of making each map task generate output for each reduce task.

 In this way, it seems that not too many inodes are needed.
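For context on the idea above: the map side already writes a single sorted output file plus a small index file with one (startOffset, rawLength, partLength) record per reduce partition, so a reducer could in principle seek to its own slice using that record instead of fetching it over HTTP. The sketch below is a self-contained illustration of such an index layout; the class and method names are hypothetical, not Hadoop's actual spill-index code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of a per-map index file: one fixed-size record per
// reduce partition, each record holding three big-endian longs
// (startOffset, rawLength, partLength).
public class SpillIndexSketch {
    static final int RECORD_SIZE = 8 * 3; // three longs per partition

    // Build index records for the given per-partition byte lengths.
    static byte[] writeIndex(long[] partitionLengths) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        long offset = 0;
        for (long len : partitionLengths) {
            out.writeLong(offset);  // startOffset into the map output file
            out.writeLong(len);     // uncompressed (raw) length
            out.writeLong(len);     // on-disk length (no compression here)
            offset += len;
        }
        out.close();
        return bytes.toByteArray();
    }

    // Return {startOffset, rawLength, partLength} for reduce partition r.
    static long[] readRecord(byte[] index, int r) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(index));
        in.skipBytes(r * RECORD_SIZE);
        return new long[] { in.readLong(), in.readLong(), in.readLong() };
    }

    public static void main(String[] args) throws IOException {
        byte[] index = writeIndex(new long[] { 100, 250, 75 });
        long[] rec = readRecord(index, 1);
        // partition 1 starts right after partition 0's 100 bytes
        System.out.println(rec[0] + " " + rec[1]); // prints: 100 250
    }
}
```

On a shared file system like Lustre, a reducer that can read the map's output file directly would only need this one seek per map, rather than one materialized file per (map, reduce) pair.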


 @Pavan Kulkarni: no email was sent by you after Sep. 2012. Could you please
 kindly share some experience on how to optimize for this kind of file system,
 like Lustre?

   Does anyone have similar work experience?


   Any comments and replies are welcome and appreciated!

 yours,
 Ling Kun.
 --
 http://www.lingcc.com




-- 
http://www.lingcc.com


RE: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread John Gordon
+1 (non-binding)

I want to share my vote of confidence in this community.  If motivated to do 
so, this community can keep this project cross-platform and continue to rapidly 
innovate without breaking a sweat.

The day we started working on this, I saw the foundations of greatness in the 
quality and volume of dev tests, the code itself, and the Apache values 
themselves.

1.) Hadoop's unit tests and their frameworks are very well thought out and the 
consideration and energy that went into their design is worthy of praise.  The 
MiniCluster abstractions utilize very few resources and put all the processes 
into one JVM for easy debugging.  It is very easy to select specific tests from 
the full suite to reproduce an issue reported in another environment - like the 
Jenkins build server or another contributor's environment.  
2.) This community has done an excellent job of incorporating well-placed log 
messages to make it easy to troubleshoot most failures post mortem.  The logs 
are very useful, and it is extremely rare that troubleshooting a failure 
requires debugging a live repro.
3.) Hadoop is written primarily in Java, a cross-platform language that 
provides its own platform in the form of the JVM to insulate most of the code 
from the specifics of the OS layer.
4.) CoPDoC - The right priorities, and well stated.


Thank you,

John

-Original Message-
From: Ivan Mitic [mailto:iva...@microsoft.com] 
Sent: Wednesday, February 27, 2013 6:32 PM
To: mapreduce-dev@hadoop.apache.org; common-...@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org
Subject: RE: [Vote] Merge branch-trunk-win to trunk

+1 (non-binding)

I am really glad to see this happening! As people already mentioned, this has 
been a great engineering effort involving many people!


Folks raised some valid concerns below and I thought it would be good to share 
my 2 cents. In my opinion, we don't have to solve all these problems right now. 
As we move forward with two platforms, we can start addressing one problem at a 
time and incrementally improve. In the first iteration, maintaining Hadoop on 
Windows could be just everyone trying to do their best effort (make sure 
Jenkins build succeeds at least). We already have people who are 
building/running trunk on Windows daily, so they would jump in and fix problems 
as needed (we've been doing this in branch-trunk-win for a while now). Although 
I see that the problems could arise with platform specific 
features/optimizations, I don't think these are frequent, so in most cases 
everything will just work. Merging the two branches sooner rather than later 
does seem like the right thing to do if the ultimate goal is to have Hadoop on 
both platforms. Now that the port has completed, we will have people in 
Microsoft (and elsewhere) wanting to contribute features/improvements to the 
trunk branch. A separate branch would just make things more difficult and 
confusing for everyone :) Hope this makes sense.

-Original Message-
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Wednesday, February 27, 2013 3:43 PM
To: common-...@hadoop.apache.org
Cc: yarn-...@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-dev@hadoop.apache.org
Subject: Re: [Vote] Merge branch-trunk-win to trunk

On Wed, Feb 27, 2013 at 2:54 PM, Suresh Srinivas sur...@hortonworks.com wrote:

 With that we need to decide how our precommit process looks.
 My inclination is to wait for +1 from precommit builds on both the 
 platforms to ensure no issues are introduced.
 Thoughts?

 2. Feature development impact
 Some questions have been raised about would new features need to be 
 supported on both the platforms. Yes. I do not see a reason why 
 features cannot work on both the platforms, with the exception of 
 platform specific optimizations. This is what Java gives us.


I'm concerned about the above. Personally, I don't have access to any Windows 
boxes with development tools, and I know nothing about developing on Windows. 
The only Windows I run is an 8GB VM with 1 GB RAM allocated, for powerpoint :)

If I submit a patch and it gets a -1 "tests failed" on the Windows slave, how am 
I supposed to proceed?

I think a reasonable compromise would be that the tests should always
*build* on Windows before commit, and contributors should do their best to look 
at the test logs for any Windows-specific failures. But, beyond looking at the 
logs, a -1 "tests failed" on Windows should not block a commit.

Those contributors who are interested in Windows being a first-class platform 
should be responsible for watching the Windows builds and debugging/fixing any 
regressions that might be Windows-specific.

I also think the KDE model that Harsh pointed out is an interesting one -- i.e. 
the idea that we would not merge Windows support to trunk, but rather treat it 
as a parallel code line which lives in the ASF and has its own builds and 
releases. The windows team would periodically merge 

Hadoop-Mapreduce-trunk - Build # 1358 - Failure

2013-02-28 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1358/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 27587 lines...]
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.801 sec
Running org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.916 sec
Running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.122 sec

Results :

Failed tests:   
testMultipleCrashes(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Reduce 
Task state not correct expected:<RUNNING> but was:<SCHEDULED>
  testOutputRecovery(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task 
state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testOutputRecoveryMapsOnly(org.apache.hadoop.mapreduce.v2.app.TestRecovery): 
Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testRecoveryWithOldCommiter(org.apache.hadoop.mapreduce.v2.app.TestRecovery): 
Task state is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>
  testSpeculative(org.apache.hadoop.mapreduce.v2.app.TestRecovery): Task state 
is not correct (timedout) expected:<SUCCEEDED> but was:<RUNNING>

Tests run: 209, Failures: 5, Errors: 0, Skipped: 0

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] hadoop-mapreduce-client ... SUCCESS [1.596s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [22.735s]
[INFO] hadoop-mapreduce-client-common .... SUCCESS [23.273s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [1.673s]
[INFO] hadoop-mapreduce-client-app ... FAILURE [5:25.169s]
[INFO] hadoop-mapreduce-client-hs  SKIPPED
[INFO] hadoop-mapreduce-client-jobclient . SKIPPED
[INFO] hadoop-mapreduce-client-hs-plugins  SKIPPED
[INFO] Apache Hadoop MapReduce Examples .. SKIPPED
[INFO] hadoop-mapreduce .. SKIPPED
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 6:15.105s
[INFO] Finished at: Thu Feb 28 13:20:47 UTC 2013
[INFO] Final Memory: 20M/125M
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.12.3:test (default-test) on 
project hadoop-mapreduce-client-app: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/home/jenkins/jenkins-slave/workspace/Hadoop-Mapreduce-trunk/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/surefire-reports
 for the individual test results.
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :hadoop-mapreduce-client-app
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Updating HADOOP-9342
Updating YARN-426
Updating HADOOP-9339
Updating MAPREDUCE-4892
Updating MAPREDUCE-4693
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Created] (MAPREDUCE-5037) JobControl logging when a job completes

2013-02-28 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5037:
-

 Summary: JobControl logging when a job completes
 Key: MAPREDUCE-5037
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5037
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 0.23.7, 2.0.4-beta
Reporter: Jason Lowe
Priority: Minor


JobControl emits logs, via the logging in Job.submit(), whenever a job is 
launched.  It would be nice if it also logged when active jobs it is tracking 
complete and their final status (i.e.: success, failed, killed, etc.).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Chris Nauroth
I'd like to share a few anecdotes about developing cross-platform,
hopefully to address some of the concerns about adding overhead to the
development process.  By reviewing past cases of cross-platform Linux vs.
Windows bugs, we can get a sense for how the development process could look
in the future.

HADOOP-9131: TestLocalFileSystem#testListStatusWithColons cannot run on
Windows.  As part of an earlier jira, HADOOP-8962, there was a new test
committed on trunk covering the case of a local file system interaction on
a file containing a ':'.  On Windows, ':' in a path has special meaning as
part of the drive specifier (i.e. C:), so this test cannot pass when
running on Windows.  In this kind of case, the cross-platform bug is
obvious, and the fix is obvious (assumeTrue(!Shell.WINDOWS)).  Ideally,
this would get fixed pre-commit after seeing a -1 from the Windows Jenkins
slave.
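The assumeTrue(!Shell.WINDOWS) fix described above boils down to an os.name check. The sketch below is dependency-free: the class and method names are hypothetical, and the JUnit Assume call is replaced by a plain boolean so it runs with only the JDK.

```java
// Minimal sketch of the platform-guard pattern behind
// assumeTrue(!Shell.WINDOWS). In Hadoop, Shell.WINDOWS is essentially
// an os.name check like the one below (this mirror is an assumption of
// the sketch, not the real class).
public class PlatformGuardSketch {
    // Mirrors the idea of org.apache.hadoop.util.Shell.WINDOWS.
    static final boolean WINDOWS =
        System.getProperty("os.name").startsWith("Windows");

    // A test exercising ':' in local paths only makes sense off Windows,
    // where ':' is reserved for the drive specifier (e.g. C:).
    static boolean shouldRunColonPathTest() {
        return !WINDOWS;
    }

    public static void main(String[] args) {
        if (shouldRunColonPathTest()) {
            String path = "/tmp/file:with:colons";
            System.out.println("running colon-path test on " + path);
        } else {
            System.out.println("skipped on Windows");
        }
    }
}
```

In a real JUnit test the guard would be a one-liner, `assumeTrue(!Shell.WINDOWS)`, which marks the test as skipped rather than failed on the excluded platform.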

HDFS-4274: BlockPoolSliceScanner does not close verification log during
shutdown.  This caused problems for MiniDFSCluster-based tests running on
Windows.  Failure to close the verification log meant that we didn't
release file locks, so the tests couldn't delete/recreate working
directories during teardown/setup.  Arguably, this was always a bug, and
running on Windows just exposed it because of its stricter rules about file
locking.  This is a more complex fix, but it doesn't require
platform-specific knowledge.  If some future patch accidentally regresses
this, then we'll likely see +1 from Linux Jenkins and -1 from Windows
Jenkins.  Ideally, it would get fixed pre-commit, because it doesn't
require Windows-specific knowledge.  There is also the matter of impact.
 Re-breaking this would re-break many test suites on Windows.

HADOOP-9232: JniBasedUnixGroupsMappingWithFallback fails on Windows with
UnsatisfiedLinkError.  This was introduced by HADOOP-8712, which switched
to JniBasedUnixGroupsMappingWithFallback as the default
hadoop.security.group.mapping, but did not provide a Windows implementation
of the JNI function.  In this case, there was a strong desire to get
HADOOP-8712 into a release, fixing it on Windows required native Windows
API knowledge, and Windows users had a simple workaround available by
changing their configs back to ShellBasedUnixGroupsMapping.  I think this
is the kind of situation where we could allow HADOOP-8712 to commit despite
-1 from Windows Jenkins, with fairly quick follow-up from an engineer with
the Windows expertise to fix it.

To summarize, I don't think it needs to differ greatly from our current
development process.  We're all responsible for breadth of understanding
and maintenance of the whole codebase, but we also rely on specific
individuals with deep expertise in particular areas for certain issues.
 Sometimes we commit despite a -1 from Jenkins, based on the community's
judgment.

Virtualization greatly simplifies cross-platform development.  I use
VirtualBox on a Mac host and run VMs for Windows and Ubuntu with a shared
drive so that they can all see the same copy of the source code.  There are
plenty of variations on this depending on your preference, such as
offloading the VMs to a separate server or cloud service to free up local
RAM.  I'm planning on submitting BUILDING.txt changes later today that
fully describe how to build on Windows.  After some initial setup, it's
nearly identical to the mvn commands that you already use today.

Hope this helps,
--Chris


On Thu, Feb 28, 2013 at 3:25 AM, John Gordon john.gor...@microsoft.com wrote:

 +1 (non-binding)

 I want to share my vote of confidence in this community.  If motivated to
 do so, this community can keep this project cross-platform and continue to
 rapidly innovate without breaking a sweat.

 The day we started working on this, I saw the foundations of greatness in
 the quality and volume of dev tests, the code itself, and the Apache values
 themselves.

 1.) Hadoop's unit tests and their frameworks are very well thought out and
 the consideration and energy that went into their design is worthy of
 praise.  The MiniCluster abstractions utilize very few resources and put
 all the processes into one JVM for easy debugging.  It is very easy to
 select specific tests from the full suite to reproduce an issue reported in
 another environment - like the Jenkins build server or another
 contributor's environment.
 2.) This community has done an excellent job of incorporating well-placed
 log messages to make it easy to post mortem troubleshoot most failures.
  The logs are very useful, and it is extremely rare that troubleshooting a
 failure requires debugging a live repro.
 3.) Hadoop is written primarily in Java, a cross-platform language that
 provides its own platform in the form of the JVM to insulate most of the
 code from the specifics of the OS layer.
 4.) CoPDoC - The right priorities, and well stated.


 Thank you,

 John

 -Original Message-
 From: Ivan Mitic [mailto:iva...@microsoft.com]
 Sent: Wednesday, 

Re: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Robert Evans
My initial question was mostly intended to understand the desired new
classification of Windows after the merge, and how we plan to maintain
Windows support.  I am happy to hear that hardware for Jenkins will be
provided.  I am also fine, at least initially, with us trying to treat
Windows as a first class supported platform.  But I realize that there are
a lot of people that do not have easy access to Windows for
development/debugging, myself included. I also don't want to slow down the
pace of development too much because of this.  It will cause some
organizations that do not use or support Windows to be more likely to run
software that has diverged from an official release.  It also has the
potential to make the patch submission process even more difficult, which
increases the likelihood of submitters abandoning patches.  However, the
great thing about being in a community is we can change if we need to.

I am +0 for the merge.  I am not a Windows expert so I don't feel
comfortable giving it a true +1.

--Bobby


On 2/28/13 10:45 AM, Chris Nauroth cnaur...@hortonworks.com wrote:

I'd like to share a few anecdotes about developing cross-platform,
hopefully to address some of the concerns about adding overhead to the
development process.  By reviewing past cases of cross-platform Linux vs.
Windows bugs, we can get a sense for how the development process could
look
in the future.

HADOOP-9131: TestLocalFileSystem#testListStatusWithColons cannot run on
Windows.  As part of an earlier jira, HADOOP-8962, there was a new test
committed on trunk covering the case of a local file system interaction on
a file containing a ':'.  On Windows, ':' in a path has special meaning as
part of the drive specifier (i.e. C:), so this test cannot pass when
running on Windows.  In this kind of case, the cross-platform bug is
obvious, and the fix is obvious (assumeTrue(!Shell.WINDOWS)).  Ideally,
this would get fixed pre-commit after seeing a -1 from the Windows Jenkins
slave.

HDFS-4274: BlockPoolSliceScanner does not close verification log during
shutdown.  This caused problems for MiniDFSCluster-based tests running on
Windows.  Failure to close the verification log meant that we didn't
release file locks, so the tests couldn't delete/recreate working
directories during teardown/setup.  Arguably, this was always a bug, and
running on Windows just exposed it because of its stricter rules about
file
locking.  This is a more complex fix, but it doesn't require
platform-specific knowledge.  If some future patch accidentally regresses
this, then we'll likely see +1 from Linux Jenkins and -1 from Windows
Jenkins.  Ideally, it would get fixed pre-commit, because it doesn't
require Windows-specific knowledge.  There is also the matter of impact.
 Re-breaking this would re-break many test suites on Windows.

HADOOP-9232: JniBasedUnixGroupsMappingWithFallback fails on Windows with
UnsatisfiedLinkError.  This was introduced by HADOOP-8712, which switched
to JniBasedUnixGroupsMappingWithFallback as the default
hadoop.security.group.mapping, but did not provide a Windows
implementation
of the JNI function.  In this case, there was a strong desire to get
HADOOP-8712 into a release, fixing it on Windows required native Windows
API knowledge, and Windows users had a simple workaround available by
changing their configs back to ShellBasedUnixGroupsMapping.  I think this
is the kind of situation where we could allow HADOOP-8712 to commit
despite
-1 from Windows Jenkins, with fairly quick follow-up from an engineer with
the Windows expertise to fix it.

To summarize, I don't think it needs to differ greatly from our current
development process.  We're all responsible for breadth of understanding
and maintenance of the whole codebase, but we also rely on specific
individuals with deep expertise in particular areas for certain issues.
 Sometimes we commit despite a -1 from Jenkins, based on the community's
judgment.

Virtualization greatly simplifies cross-platform development.  I use
VirtualBox on a Mac host and run VMs for Windows and Ubuntu with a shared
drive so that they can all see the same copy of the source code.  There
are
plenty of variations on this depending on your preference, such as
offloading the VMs to a separate server or cloud service to free up local
RAM.  I'm planning on submitting BUILDING.txt changes later today that
fully describe how to build on Windows.  After some initial setup, it's
nearly identical to the mvn commands that you already use today.

Hope this helps,
--Chris


On Thu, Feb 28, 2013 at 3:25 AM, John Gordon
john.gor...@microsoft.comwrote:

 +1 (non-binding)

 I want to share my vote of confidence in this community.  If motivated
to
 do so, this community can keep this project cross-platform and continue
to
 rapidly innovate without breaking a sweat.

 The day we started working on this, I saw the foundations of greatness
in
 the quality and volume of dev tests, the code itself, and the Apache
values
 

Re: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Chris Nauroth
 Is there a jira for resolving the outstanding TODOs in the code base
 (similar to HDFS-2148)?  Looks like this merge doesn't introduce many
 which is great (just did a quick diff and grep).

I found 2 remaining TODOs introduced in the current merge patch.  One is in
ContainerLaunch.java.  The container launch script was trying to set a
CLASSPATH that exceeded the Windows maximum command line length.  The fix
was to wrap the long classpath into an intermediate jar containing only a
manifest file with a Class-Path entry.  (See YARN-316.)  Just to be
conservative, we wrapped this logic in an if (Shell.WINDOWS) guard and
marked a TODO to remove it later and use that approach on all platforms
after additional testing.  I've tested this code path successfully on Mac
too, but several people wanted additional testing and performance checks
before removing the if (Shell.WINDOWS) guard.  That work is tracked in an
existing jira: YARN-358.
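The classpath-wrapping fix described above can be sketched with nothing but the JDK's jar APIs. The class and method names below are hypothetical, not the actual ContainerLaunch code.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch of the YARN-316 workaround: instead of passing a CLASSPATH longer
// than the Windows command-line limit, write a jar whose only content is a
// manifest carrying a Class-Path entry, then put that single jar on the
// classpath.
public class ClasspathJarSketch {

    // Wrap the given classpath elements into a manifest-only jar.
    static File makeClasspathJar(String[] entries) throws IOException {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class-Path is a space-separated list of relative URLs per the
        // JAR file specification.
        attrs.put(Attributes.Name.CLASS_PATH, String.join(" ", entries));

        File jar = File.createTempFile("classpath", ".jar");
        try (JarOutputStream out =
                 new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // intentionally empty: the manifest is the jar's only content
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        File jar = makeClasspathJar(new String[] { "lib/a.jar", "lib/b.jar" });
        try (JarFile jf = new JarFile(jar)) {
            System.out.println(jf.getManifest().getMainAttributes()
                .getValue(Attributes.Name.CLASS_PATH));
        }
    }
}
```

Because the JVM expands a jar's Class-Path manifest entry itself, the launch command only ever names one jar, sidestepping the command-line length limit entirely.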

The other TODO is for winutils to print more usage information and
examples.  At this point, I think winutils is printing sufficient
information, and we can just remove the TODO.  I just submitted a new jira
to start that conversation: HADOOP-9348.

Thank you,
--Chris


On Thu, Feb 28, 2013 at 11:29 AM, Robert Evans ev...@yahoo-inc.com wrote:

 My initial question was mostly intended to understand the desired new
 classification of Windows after the merge, and how we plan to maintain
 Windows support.  I am happy to hear that hardware for Jenkins will be
 provided.  I am also fine, at least initially, with us trying to treat
 Windows as a first class supported platform.  But I realize that there are
 a lot of people that do not have easy access to Windows for
 development/debugging, myself included. I also don't want to slow down the
 pace of development too much because of this.  It will cause some
 organizations that do not use or support Windows to be more likely to run
 software that has diverged from an official release.  It also has the
 potential to make the patch submission process even more difficult, which
 increases the likelihood of submitters abandoning patches.  However, the
 great thing about being in a community is we can change if we need to.

 I am +0 for the merge.  I am not a Windows expert so I don't feel
 comfortable giving it a true +1.

 --Bobby


 On 2/28/13 10:45 AM, Chris Nauroth cnaur...@hortonworks.com wrote:

 I'd like to share a few anecdotes about developing cross-platform,
 hopefully to address some of the concerns about adding overhead to the
 development process.  By reviewing past cases of cross-platform Linux vs.
 Windows bugs, we can get a sense for how the development process could
 look
 in the future.
 
 HADOOP-9131: TestLocalFileSystem#testListStatusWithColons cannot run on
 Windows.  As part of an earlier jira, HADOOP-8962, there was a new test
 committed on trunk covering the case of a local file system interaction on
 a file containing a ':'.  On Windows, ':' in a path has special meaning as
 part of the drive specifier (i.e. C:), so this test cannot pass when
 running on Windows.  In this kind of case, the cross-platform bug is
 obvious, and the fix is obvious (assumeTrue(!Shell.WINDOWS)).  Ideally,
 this would get fixed pre-commit after seeing a -1 from the Windows Jenkins
 slave.
 
 HDFS-4274: BlockPoolSliceScanner does not close verification log during
 shutdown.  This caused problems for MiniDFSCluster-based tests running on
 Windows.  Failure to close the verification log meant that we didn't
 release file locks, so the tests couldn't delete/recreate working
 directories during teardown/setup.  Arguably, this was always a bug, and
 running on Windows just exposed it because of its stricter rules about
 file
 locking.  This is a more complex fix, but it doesn't require
 platform-specific knowledge.  If some future patch accidentally regresses
 this, then we'll likely see +1 from Linux Jenkins and -1 from Windows
 Jenkins.  Ideally, it would get fixed pre-commit, because it doesn't
 require Windows-specific knowledge.  There is also the matter of impact.
  Re-breaking this would re-break many test suites on Windows.
 
 HADOOP-9232: JniBasedUnixGroupsMappingWithFallback fails on Windows with
 UnsatisfiedLinkError.  This was introduced by HADOOP-8712, which switched
 to JniBasedUnixGroupsMappingWithFallback as the default
 hadoop.security.group.mapping, but did not provide a Windows
 implementation
 of the JNI function.  In this case, there was a strong desire to get
 HADOOP-8712 into a release, fixing it on Windows required native Windows
 API knowledge, and Windows users had a simple workaround available by
 changing their configs back to ShellBasedUnixGroupsMapping.  I think this
 is the kind of situation where we could allow HADOOP-8712 to commit
 despite
 -1 from Windows Jenkins, with fairly quick follow-up from an engineer with
 the Windows expertise to fix it.
 
 To summarize, I don't think it needs to differ greatly from our 

[jira] [Created] (MAPREDUCE-5038) mapred CombineFileInputFormat does not work on non-splittable files

2013-02-28 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5038:
-

 Summary: mapred CombineFileInputFormat does not work on 
non-splittable files
 Key: MAPREDUCE-5038
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5038
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


MAPREDUCE-1597 enabled the CombineFileInputFormat in mapreduce to work on 
splittable files, but neglected to consider the one in mapred.

In trunk this is not an issue as the one in mapred extends the one in mapreduce.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Ramya Sunil
+1 for the merge.

As someone who has been testing the code for many months now, both on
singlenode and multinode clusters, I am very confident about the stability
and the quality of the code. I have run several regression tests to verify
distributed cache, streaming, compression, capacity scheduler, job history
and many more features in HDFS and MR.

- Ramya

On Thu, Feb 28, 2013 at 3:08 PM, sanjay Radia san...@hortonworks.com wrote:

 +1
 Java has done the bulk of the work in making Hadoop multi-platform.
 Windows specific code is a tiny percentage of the code.
 Jenkins support for Windows is going to help us keep the platform portable
 going forward.
 I expect that the vast majority of new commits will have no problems. I
 propose that we start by fixing problems that Jenkins raises but not block
 new commits for too long if the author does not have a windows box or if a
 volunteer does not step up.

 sanjay






Re: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Konstantin Boudnik
On Thu, Feb 28, 2013 at 03:08PM, sanjay Radia wrote:
 +1
 Java has done the bulk of the work in making Hadoop multi-platform.
 Windows specific code is a tiny percentage of the code.
 Jenkins support for Windows is going to help us keep the platform portable going 
 forward.
 I expect that the vast majority of new commits will have no problems. I propose
 that we start by fixing problems that Jenkins raises but not block new
 commits for too long if the author does not have a windows box or if a
 volunteer does not step up.

Considering the typical set of software most of the people here work with, it
would be completely inappropriate to block commits for failing Windows-specific
features. After all, Microsoft never bothered to check what features or
compatibility matters they have broken in Java and elsewhere, so why
should we?

I believe these kinds of rules have to be set and discussed before the merge is
done.

Cheers,
  Cos


signature.asc
Description: Digital signature


Fwd: [Vote] Merge branch-trunk-win to trunk

2013-02-28 Thread Matt Foley
+1 (binding)

Apache is supposed to be about the community.  We have here a community of
developers, who have actively and openly worked to add a major improvement
to Hadoop: the ability to work cross-platform.  Furthermore, the size of
the substantive part of the needed patch is only about 1500 lines, much
smaller than quite a few other additions to Hadoop over the last few
months.  We should welcome and support this change, and make sure that the
code stays cross-platform going forward by extending our CI practices,
especially pre-commit test-patch, to also include Windows.

As most of you know, my colleague Giri Kesavan (PMC member) helps maintain
the Linux CI capability for Hadoop.  I've talked with him, and he and I are
committing to getting test-patch implemented for Windows, so that along
with the current automated +1s required to commit, we can add two more,
for javac build in Windows and core unit tests in Windows.

Members of the team implementing cross-platform compatibility, including
Microsoft employees, have opened the discussion for providing hardware or
VM resources to perform this additional CI testing.  I will assist them to
work with the Apache Infra team and figure out how to make it happen.

I understand there is some concern about the additional platform test.
 My going-in
presumption, based on Java's intrinsic, pretty-good, cross-platform
compatibility, is that patches to Hadoop will by default also have
cross-platform compatibility, unless they are written in an explicitly
platform-dependent way.  I also believe that in the vast majority of cases
the cross-platform compatibility of Java will carry thru to Hadoop patches,
without additional effort on the developer's part.

Let's try it, and see what happens.  If we actually find a frequent
difficulty, we'll change to engineer around it.  But I believe that, in the
rare cases where a Windows-specific failure occurs, there will be a number
of people (new, enthusiastic members of the community! :-) willing to help.
 If such help is not forthcoming, then we can discuss work-arounds, but
like a previous poster, I am confident in the community.

Regards,
--Matt



On Thu, Feb 28, 2013 at 12:21 PM, Chuan Liu chuan...@microsoft.com wrote:

  +1 (non-binding)

 As someone who has also contributed to porting Hadoop to Windows, I think Java
 already provides a very good platform-independent foundation.
 For features that are not available in Java, we will try to provide our own
 platform-independent APIs that abstract OS tasks away.
 Most features should have no difficulty running on Windows and Linux by
 using Java and those platform-independent APIs.

 For concerns raised about new features that may fail on Windows, I think we
 don't need to make passing on Windows a mandate at the moment. We can
 simply mark the feature unavailable on Windows and port it later if it is
 important.

 -Chuan

 -Original Message-
 From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
 Sent: Thursday, February 28, 2013 11:51 AM
 To: hdfs-...@hadoop.apache.org
 Cc: mapreduce-dev@hadoop.apache.org; yarn-...@hadoop.apache.org;
 common-...@hadoop.apache.org
 Subject: Re: [Vote] Merge branch-trunk-win to trunk

  Is there a jira for resolving the outstanding TODOs in the code base
  (similar to HDFS-2148)?  Looks like this merge doesn't introduce many
  which is great (just did a quick diff and grep).

 I found 2 remaining TODOs introduced in the current merge patch.  One is
 in ContainerLaunch.java.  The container launch script was trying to set a
 CLASSPATH that exceeded the Windows maximum command line length.  The fix
 was to wrap the long classpath into an intermediate jar containing only a
 manifest file with a Class-Path entry.  (See YARN-316.)  Just to be
 conservative, we wrapped this logic in an if (Shell.WINDOWS) guard and
 marked a TODO to remove it later and use that approach on all platforms
 after additional testing.  I've tested this code path successfully on Mac
 too, but several people wanted additional testing and performance checks
 before removing the if (Shell.WINDOWS) guard.  That work is tracked in an
 existing jira: YARN-358.

 The other TODO is for winutils to print more usage information and
 examples.  At this point, I think winutils is printing sufficient
 information, and we can just remove the TODO.  I just submitted a new jira
 to start that conversation: HADOOP-9348.

 Thank you,
 --Chris


 On Thu, Feb 28, 2013 at 11:29 AM, Robert Evans ev...@yahoo-inc.com
 wrote:

  My initial question was mostly intended to understand the desired new
  classification of Windows after the merge, and how we plan to maintain
  Windows support.  I am happy to hear that hardware for Jenkins will be
  provided.  I am also fine, at least initially, with us trying to treat
  Windows as a first class supported platform.  But I realize that there
  are a lot of people that do not have easy access to Windows for
  development/debugging, myself included. I also