[jira] [Commented] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707800#comment-14707800
 ] 

zhihai xu commented on MAPREDUCE-6460:
--

The test fails because it does not wait for the app attempt to be unregistered 
from ApplicationMasterService (ApplicationMasterService#unregisterAttempt). The 
fix is to wait for the app to enter state {{RMAppState.KILLED}}, which ensures 
that {{appAttempt.masterService.unregisterAttempt(appAttemptId)}} has been called. 
I uploaded the patch MAPREDUCE-6460.000.patch for review.
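
For readers who don't have the patch at hand, the "wait for the state before asserting" idea can be sketched roughly as below. This is not the code in MAPREDUCE-6460.000.patch: {{AppState}}, {{WaitForStateSketch}} and {{waitForState}} are hypothetical stand-ins, and the real test would poll the RM's view of the application rather than an AtomicReference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for RMAppState; only the values used below are modeled.
enum AppState { RUNNING, KILLED }

final class WaitForStateSketch {
    /**
     * Polls currentState until it equals expected or timeoutMs has elapsed.
     * Returns true if the expected state was observed, false on timeout or interrupt.
     */
    static <T> boolean waitForState(Supplier<T> currentState, T expected,
                                    long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (expected.equals(currentState.get())) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicReference<AppState> state = new AtomicReference<>(AppState.RUNNING);
        // Simulates an asynchronous kill: the state changes some time after the request.
        Thread killer = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            state.set(AppState.KILLED);
        });
        killer.start();
        // Only after this returns true would the test go on to call the allocator.
        boolean reached = waitForState(state::get, AppState.KILLED, 2000, 10);
        System.out.println("reached KILLED: " + reached);
        killer.join();
    }
}
```

The point of the sketch is only the ordering: the assertion that expects the exception should run after the terminal state has been observed, not immediately after the kill request.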

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch


 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails with the following logs:
 ---
  T E S T S
 ---
 Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
 testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) Time elapsed: 2.606 sec FAILURE!
 java.lang.AssertionError: Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
   at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 Results :
 Failed tests: 
   TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
 Tests run: 24, Failures: 1, Errors: 0, Skipped: 0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files

2015-08-21 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706979#comment-14706979
 ] 

Robert Kanter commented on MAPREDUCE-6415:
--

Thanks for the review [~asuresh].  This is just the preliminary patch.  I still 
have to write unit tests, javadocs, and split out the yarn changes into a YARN 
JIRA.  But it sounds like you're good with the approach.

[~aw], any other comments?
How about you [~jlowe]?

 Create a tool to combine aggregated logs into HAR files
 ---

 Key: MAPREDUCE-6415
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: HAR-ableAggregatedLogs_v1.pdf, 
 MAPREDUCE-6415_branch-2_prelim_001.patch, 
 MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, 
 MAPREDUCE-6415_prelim_002.patch


 While we wait for YARN-2942 to become viable, it would still be great to 
 improve the aggregated logs problem.  We can write a tool that combines 
 aggregated log files into a single HAR file per application, which should 
 solve the too many files and too many blocks problems.  See the design 
 document for details.
 See YARN-2942 for more context.





[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707443#comment-14707443
 ] 

Hudson commented on MAPREDUCE-6357:
---

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #294 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/294/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch


 After spending the afternoon debugging a user job where reduce tasks were 
 failing on retry with the below exception, I think it would be worthwhile to 
 add a note in the MultipleOutputs.write() documentation, saying that absolute 
 paths may cause improper execution of tasks on retry or when MR speculative 
 execution is enabled. 
 {code}
 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
   at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
   at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
   at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
   at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
   at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
   at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
   at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
   at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
   at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 {code}
 As discussed in MAPREDUCE-3772, when the baseOutputPath passed to 
 MultipleOutputs.write() is an absolute path (or, more precisely, a path that 
 resolves outside of the job output-dir), output committing is not used. 
 In this case, the user read through the MultipleOutputs docs and assumed 
 that everything would work, since there are blog posts saying that 
 MultipleOutputs does handle output commit. 
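
To make "resolves outside of the job output-dir" concrete, here is a rough sketch of the kind of check a job author could run on a baseOutputPath before passing it to MultipleOutputs.write(). It is an illustration only: it uses java.nio.file paths with Unix-style separators as a stand-in for Hadoop paths, and {{OutputPathCheckSketch}} and {{resolvesInsideOutputDir}} are hypothetical names, not part of the MultipleOutputs API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputPathCheckSketch {
    /**
     * Returns true if baseOutputPath, resolved against outputDir, stays inside
     * outputDir. Absolute paths and paths that climb out with ".." return false.
     * Assumption: local, Unix-style paths stand in for Hadoop paths here.
     */
    static boolean resolvesInsideOutputDir(String outputDir, String baseOutputPath) {
        Path out = Paths.get(outputDir).normalize();
        // resolve() returns the argument unchanged when it is absolute.
        Path resolved = out.resolve(baseOutputPath).normalize();
        return resolved.startsWith(out);
    }

    public static void main(String[] args) {
        String outputDir = "/job/out";
        String[] candidates = {
            "level1/block",                  // relative, stays under the output dir
            "/user/hadoop/some/path/block",  // absolute, bypasses the output dir
            "../elsewhere/block"             // relative, but escapes the output dir
        };
        for (String candidate : candidates) {
            System.out.println(candidate + " -> inside output dir: "
                + resolvesInsideOutputDir(outputDir, candidate));
        }
    }
}
```

Paths for which such a check returns false are the ones where, per the description above, a retried or speculative task attempt can collide with a file left behind by an earlier attempt.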





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707390#comment-14707390
 ] 

Allen Wittenauer commented on MAPREDUCE-6454:
-

bq. This is because HADOOP_CLASSPATH is not part of the default white-listed 
environment that goes from YARN to the apps.

If I have HADOOP_CLASSPATH=foo in hadoop-env.sh, when I run a shell command 
(say hadoop version) as part of my app, that's going to overwrite whatever 
Hadoop tries to set it to.  The whitelist is completely irrelevant.

 MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
 

 Key: MAPREDUCE-6454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Fix For: 2.7.2, 2.6.2

 Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
 MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch


 We already add the lib jars in the distributed cache to CLASSPATH. However, in 
 some corner cases (such as MR local mode or Hive map-side local join), we also 
 need these jars on HADOOP_CLASSPATH so that the hadoop scripts can pick them up 
 when launching the runjar process.





[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6423:
-
Status: Open  (was: Patch Available)

 MapOutput Sampler
 -

 Key: MAPREDUCE-6423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Assignee: Ram Manohar Bheemana
Priority: Minor
 Attachments: MapOutputSampler.java


 We need a sampler based on the MapOutput keys. The current InputSampler 
 implementation has a major drawback: it assumes that the input and output of a 
 mapper are the same, which is generally not the case.
 Approach:
 1. Create a Sampler which samples the data based on the input.
 2. Run a small MapReduce job in uber task mode using the original job mapper 
 and an identity reducer to generate the required MapOutputSample keys.
 3. Optionally, allow specifying which input files to sample. For example, given 
 input files A and B, we should be able to specify that only file A is used for 
 sampling.
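
The core idea, sampling keys after the mapper has run instead of sampling raw input keys, can be sketched in plain Java as follows. This is not the attached MapOutputSampler.java and it does not launch a MapReduce job: {{MapOutputKeySamplerSketch}} and {{sampleMapOutputKeys}} are hypothetical names, the mapper is modeled as a simple function from input record to output key, and reservoir sampling is my own choice of sampling strategy for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

final class MapOutputKeySamplerSketch {
    /**
     * Applies mapper to every input record and keeps a uniform random sample of
     * at most sampleSize output keys (reservoir sampling, one pass over the input).
     */
    static <I, K> List<K> sampleMapOutputKeys(Iterable<I> input, Function<I, K> mapper,
                                              int sampleSize, Random rng) {
        List<K> reservoir = new ArrayList<>(sampleSize);
        long seen = 0;
        for (I record : input) {
            K key = mapper.apply(record); // sample what the mapper emits, not what it reads
            seen++;
            if (reservoir.size() < sampleSize) {
                reservoir.add(key);
            } else {
                long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                if (j < sampleSize) {
                    reservoir.set((int) j, key);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // Input records whose map output key (the fruit) differs from the input key (the id).
        List<String> fileA = Arrays.asList("1,pear", "2,apple", "3,kiwi", "4,fig", "5,plum");
        List<String> sample = sampleMapOutputKeys(
            fileA, line -> line.split(",")[1], 3, new Random());
        System.out.println("sampled map output keys: " + sample);
    }
}
```

Restricting sampling to one input file (step 3) corresponds here to simply passing only that file's records as the input.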





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707305#comment-14707305
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:


bq. Just so it's on record so when someone hits this problem: this is fragile 
and subject to breakage, regardless of the version of hadoop in play. It all 
depends upon how users have HADOOP_CLASSPATH configured in hadoop-env.sh and 
yarn-env.sh.
It is a bit fragile, for sure, but it doesn't by default depend on what is 
configured in *-env.sh like you said. This is because HADOOP_CLASSPATH is not 
part of the default white-listed environment that goes from YARN to the apps.

 MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
 

 Key: MAPREDUCE-6454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Fix For: 2.7.2, 2.6.2

 Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
 MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch







[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707319#comment-14707319
 ] 

Hudson commented on MAPREDUCE-6357:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #291 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/291/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch







[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-21 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707401#comment-14707401
 ] 

Allen Wittenauer commented on MAPREDUCE-6454:
-

Here's a test you can do to prove my point.

$ echo HADOOP_CLASSPATH=/tmp >> $HADOOP_CONF_DIR/hadoop-env.sh
$ hadoop classpath

You should see /tmp

$ HADOOP_CLASSPATH=/etc hadoop classpath

You'll still see /tmp.  You won't see /etc.  (Well, unless your classpath is 
really weird already.)


 MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
 

 Key: MAPREDUCE-6454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Fix For: 2.7.2, 2.6.2

 Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
 MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch







[jira] [Updated] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Augusto Souza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Augusto Souza updated MAPREDUCE-6434:
-
Attachment: MAPREDUCE-6434.006.patch

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch, 
 MAPREDUCE-6434.006.patch


 Finish up some renaming work related to the annotation @Preemptable (it 
 should be @Checkpointable now) and help split the patch in MAPREDUCE-5269, 
 which is too large to be reviewed or accepted by the Jenkins CI scripts.





[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707436#comment-14707436
 ] 

Chris Douglas commented on MAPREDUCE-6423:
--

Thanks for taking a look at this. That the sampler only works on input data was 
always a weakness for jobs requiring their output be totally ordered.

Could you generate a patch? The contribution wiki is 
[here|http://wiki.apache.org/hadoop/HowToContribute].

It might be easier for others to use if the Mapper was integrated with the 
InputSampler, but a separate tool is still an improvement.

 MapOutput Sampler
 -

 Key: MAPREDUCE-6423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Assignee: Ram Manohar Bheemana
Priority: Minor
 Attachments: MapOutputSampler.java







[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707349#comment-14707349
 ] 

Hudson commented on MAPREDUCE-6357:
---

FAILURE: Integrated in Hadoop-Yarn-trunk #1024 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/1024/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
* hadoop-mapreduce-project/CHANGES.txt


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch







[jira] [Updated] (MAPREDUCE-6456) Support configurable log aggregation policy

2015-08-21 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated MAPREDUCE-6456:
-
Assignee: Ming Ma

 Support configurable log aggregation policy
 ---

 Key: MAPREDUCE-6456
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6456
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma

 YARN-221 provides a way for a YARN application to specify log aggregation 
 policy via LogAggregationContext.
 This jira covers the necessary changes in MR to use that feature so that any 
 MR job can specify its log aggregation policy via job configuration. That 
 includes:
 * Have MR define its own configuration properties for these policies.
 * Change YarnRunner to retrieve these configurations and set the values via 
 LogAggregationContext.





[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707130#comment-14707130
 ] 

Chris Douglas commented on MAPREDUCE-6434:
--

Offhand, I'd guess adding 
{{TaskType.REDUCE.equals(context.getTaskAttemptID().getTaskType())}} to the 
expression would prevent it from affecting more than reducers, but I haven't 
looked into it. Could you test with a map-only job, where 
{{context.getReducerClass()}} is undefined or not on the classpath?
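
As a rough illustration of why putting the task-type test first helps, the sketch below models the guard with a lazily evaluated second condition. It is not the patch code: {{TaskType}} here is a two-value stand-in for the Hadoop enum, and {{CommitterGuardSketch}}, {{usePartialCommitter}} and the supplier argument are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical two-value stand-in for the Hadoop TaskType enum.
enum TaskType { MAP, REDUCE }

final class CommitterGuardSketch {
    /**
     * Returns true only for reduce tasks whose reducer is checkpointable.
     * Because && short-circuits, reducerIsCheckpointable is never evaluated for
     * map tasks, so a missing or unloadable reducer class cannot break them.
     */
    static boolean usePartialCommitter(TaskType taskType,
                                       BooleanSupplier reducerIsCheckpointable) {
        return TaskType.REDUCE.equals(taskType) && reducerIsCheckpointable.getAsBoolean();
    }

    public static void main(String[] args) {
        // Models a map-only job where looking up the reducer class would fail.
        BooleanSupplier failingLookup = () -> {
            throw new IllegalStateException("reducer class not on the classpath");
        };
        System.out.println("map task uses partial committer: "
            + usePartialCommitter(TaskType.MAP, failingLookup));
        System.out.println("reduce task uses partial committer: "
            + usePartialCommitter(TaskType.REDUCE, () -> true));
    }
}
```

With the operands in the opposite order, the failing lookup in the example would be evaluated for the map task as well.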

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch







[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Augusto Souza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707248#comment-14707248
 ] 

Augusto Souza commented on MAPREDUCE-6434:
--

Thank you very much [~chris.douglas]! 

I tested the version I submitted before with a map-only job. I think the 
IdentityReducer is used when the number of reduce tasks is set to zero and no 
reducer class is configured, so the previous patch doesn't crash. Am I right 
in this assumption? Is there another way of defining jobs that forces 
{{context.getReducerClass()}} to be undefined?

But I think your feedback is valid, so I am adding another condition to the 
expression to make sure the PartialFileOutputCommiter can only be 
instantiated for reduce tasks. If there is a way of making 
{{context.getReducerClass()}} undefined, I can try to write better tests for 
the patch too.


 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch







[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells

2015-08-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6458:

Attachment: MAPREDUCE-6458.00.patch

-00:
* MAPREDUCE-6454 but with HADOOP_CLASSPATH renamed to the not-already-used 
HADOOP_TASK_CLASSPATH
* added finalize code for HADOOP_TASK_CLASSPATH via a new 
hadoop_add_task_classpath function.
* hadoop_add_task_classpath safely verifies the path is valid, puts it in a 
decent place order-wise, etc, etc
* added shell unit tests for hadoop_add_task_classpath
* modified shell unit tests for hadoop_finalize_classpath
* added HADOOP_TASK_CLASSPATH to hadoop-config.cmd for Windows

 Figure out the way to pass build-in classpath (files in distributed cache, 
 etc.) from parent to spawned shells
 --

 Key: MAPREDUCE-6458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Allen Wittenauer
 Attachments: MAPREDUCE-6458.00.patch


 In MAPREDUCE-6454 (targeted at branch-2.x), we provide a constrained way to 
 pass the built-in classpath from the parent to the child shell via 
 HADOOP_CLASSPATH, so that jars in the distributed cache still work in child 
 tasks. In trunk, we may take a different approach, such as introducing an 
 additional env var to safely pass the built-in classpath.





[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells

2015-08-21 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6458:

Status: Patch Available  (was: Open)

 Figure out the way to pass build-in classpath (files in distributed cache, 
 etc.) from parent to spawned shells
 --

 Key: MAPREDUCE-6458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Allen Wittenauer
 Attachments: MAPREDUCE-6458.00.patch







[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707589#comment-14707589
 ] 

Hudson commented on MAPREDUCE-6357:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #283 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/283/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java
* hadoop-mapreduce-project/CHANGES.txt


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch


 After spending the afternoon debugging a user job where reduce tasks were 
 failing on retry with the below exception, I think it would be worthwhile to 
 add a note in the MultipleOutputs.write() documentation, saying that absolute 
 paths may cause improper execution of tasks on retry or when MR speculative 
 execution is enabled. 
 {code}
 2015-04-28 23:13:10,452 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: File already exists:wasb://full20150...@bgtstoragefull.blob.core.windows.net/user/hadoop/some/path/block-r-00299.bz2
     at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1354)
     at org.apache.hadoop.fs.azure.NativeAzureFileSystem.create(NativeAzureFileSystem.java:1195)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:889)
     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:786)
     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:135)
     at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.getRecordWriter(MultipleOutputs.java:475)
     at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.write(MultipleOutputs.java:433)
     at com.ancestry.bigtree.hadoop.LevelReducer.processValue(LevelReducer.java:91)
     at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:69)
     at com.ancestry.bigtree.hadoop.LevelReducer.reduce(LevelReducer.java:14)
     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 {code}
 As discussed in MAPREDUCE-3772, when the baseOutputPath passed to 
 MultipleOutputs.write() is an absolute path (or, more precisely, a path that 
 resolves outside of the job output-dir), output committing is not used. 
 In this case, the user read through the MultipleOutputs docs and assumed that 
 everything would work, since there are blog posts saying that 
 MultipleOutputs does handle output commit. 
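
 The condition "resolves outside of the job output-dir" can be illustrated 
 with a small Java check. This is a sketch under stated assumptions: the class 
 and method names are hypothetical, it uses java.nio.file paths with Unix-style 
 separators as a stand-in for Hadoop's own path handling, and it does not deal 
 with URI schemes such as wasb://.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputPathCheck {
    /**
     * Returns true when baseOutputPath, resolved against the job output
     * directory, ends up outside that directory.
     */
    static boolean resolvesOutsideOutputDir(String jobOutputDir, String baseOutputPath) {
        Path outputDir = Paths.get(jobOutputDir).normalize();
        // resolve() returns the argument itself when the argument is absolute.
        Path resolved = outputDir.resolve(baseOutputPath).normalize();
        return !resolved.startsWith(outputDir);
    }

    public static void main(String[] args) {
        String out = "/user/hadoop/out";
        System.out.println(resolvesOutsideOutputDir(out, "block/part"));
        System.out.println(resolvesOutsideOutputDir(out, "/user/hadoop/some/path/block"));
        System.out.println(resolvesOutsideOutputDir(out, "../other/block"));
    }
}
```

 A job could run such a check on each baseOutputPath before calling 
 MultipleOutputs.write() and warn when the path would bypass the committer.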





[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707605#comment-14707605
 ] 

Hudson commented on MAPREDUCE-6357:
---

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2221 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2221/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch




[jira] [Commented] (MAPREDUCE-6357) MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute

2015-08-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707542#comment-14707542
 ] 

Hudson commented on MAPREDUCE-6357:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2240 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2240/])
MAPREDUCE-6357. MultipleOutputs.write() API should document that output 
committing is not utilized when input path is absolute. Contributed by Dustin 
Cote. (aajisaka: rev 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
* hadoop-mapreduce-project/CHANGES.txt
* 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java


 MultipleOutputs.write() API should document that output committing is not 
 utilized when input path is absolute
 --

 Key: MAPREDUCE-6357
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6357
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Ivan Mitic
Assignee: Dustin Cote
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6357-1.patch




[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707498#comment-14707498
 ] 

Chris Douglas commented on MAPREDUCE-6434:
--

Agreed, the NPE is usually not a problem since the default should be defined in 
mapred-defaults, though {{JobContextImpl::getReducerClass}} can return null. At 
least two cases shouldn't cause a problem for map-only jobs:
# The base {{mapreduce.Reducer}} is {{\@Checkpointable}}, so it would 
instantiate a {{PartialFileOutputCommitter}}
# A {{Reducer}} in the config shouldn't cause a map-only job to fail if it's 
not on the classpath (this may not be true in the current code, but this 
shouldn't add another case)

We also don't want to do anything surprising for setup/cleanup tasks.
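
One way to keep both cases harmless is to look at the number of reduce tasks 
before touching the reducer class at all. The sketch below is not the patch: 
the annotation, the two reducer classes, and the {{choose}} method are 
hypothetical stand-ins defined only so the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

final class CommitterChoice {
    /** Stand-in for the real annotation; hypothetical, defined for this sketch only. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Checkpointable {}

    @Checkpointable
    static class BaseReducer {}   // plays the role of an annotated base reducer

    static class PlainReducer {}  // a reducer without the annotation

    enum Committer { FILE, PARTIAL_FILE }

    static Committer choose(Class<?> reducerClass, int numReduceTasks) {
        // Map-only job: the reducer setting is irrelevant, so never inspect it.
        if (numReduceTasks == 0) {
            return Committer.FILE;
        }
        // The reducer lookup may yield null; treat that as "not checkpointable".
        if (reducerClass == null) {
            return Committer.FILE;
        }
        return reducerClass.isAnnotationPresent(Checkpointable.class)
            ? Committer.PARTIAL_FILE
            : Committer.FILE;
    }

    public static void main(String[] args) {
        System.out.println(choose(BaseReducer.class, 0));
        System.out.println(choose(null, 3));
        System.out.println(choose(BaseReducer.class, 3));
    }
}
```

Checking the task count first means a reducer class that is missing from the 
classpath never has to be loaded for a map-only job in this sketch.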

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch, 
 MAPREDUCE-6434.006.patch


 Finish the renaming work for the @Preemptable annotation (it should now be 
 @Checkpointable) and help split the patch in MAPREDUCE-5269, which is too 
 large to be reviewed or accepted by the Jenkins CI scripts.





[jira] [Created] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)
zhihai xu created MAPREDUCE-6460:


 Summary: 
TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails
 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: zhihai xu
Assignee: zhihai xu


TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails 
with the following logs:
---
 T E S T S
---
Running org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
Tests run: 24, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 94.525 sec  FAILURE! - in org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator
testAttemptNotFoundCausesRMCommunicatorException(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator)  Time elapsed: 2.606 sec   FAILURE!
java.lang.AssertionError: Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException
    at org.junit.internal.runners.statements.ExpectException.evaluate(ExpectException.java:32)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

Results :

Failed tests: 
  TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException Expected exception: org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocationException

Tests run: 24, Failures: 1, Errors: 0, Skipped: 0






[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Component/s: test

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu



[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: MAPREDUCE-6460.000.patch

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch




[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: (was: MAPREDUCE-6460.000.patch)

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch




[jira] [Commented] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells

2015-08-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707669#comment-14707669
 ] 

Hadoop QA commented on MAPREDUCE-6458:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  20m 18s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac |   7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   3m  4s | There were no new checkstyle issues. |
| {color:green}+1{color} | shellcheck |   0m  6s | There were no new shellcheck (v0.3.3) issues. |
| {color:red}-1{color} | whitespace |   0m  1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   5m 40s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | common tests |  23m 17s | Tests passed in hadoop-common. |
| {color:red}-1{color} | mapreduce tests |   0m 18s | Tests failed in hadoop-mapreduce-client-app. |
| {color:red}-1{color} | mapreduce tests |   0m 17s | Tests failed in hadoop-mapreduce-client-common. |
| {color:green}+1{color} | yarn tests |   0m 23s | Tests passed in hadoop-yarn-api. |
| | |  73m 13s | |
\\
\\
|| Reason || Tests ||
| Failed build | hadoop-mapreduce-client-app |
|   | hadoop-mapreduce-client-common |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12751766/MAPREDUCE-6458.00.patch |
| Optional Tests | shellcheck javac unit javadoc findbugs checkstyle |
| git revision | trunk / 22de7c1 |
| whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/artifact/patchprocess/whitespace.txt |
| hadoop-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/artifact/patchprocess/testrun_hadoop-common.txt |
| hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt |
| hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt |
| hadoop-yarn-api test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/artifact/patchprocess/testrun_hadoop-yarn-api.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5949/console |


This message was automatically generated.
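
The whitespace vote in the report can be reproduced locally with a small 
check over the patch text. The sketch below is not the QA script: the class 
and method names are hypothetical, and looking only at added lines (those 
starting with a single "+") is an assumption about what such a check would 
consider.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PatchWhitespaceCheck {
    /** Returns 1-based positions in the patch of added lines that end in whitespace. */
    static List<Integer> addedLinesEndingInWhitespace(List<String> patchLines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < patchLines.size(); i++) {
            String line = patchLines.get(i);
            // "+++" marks a file header in unified diffs, not an added line.
            boolean added = line.startsWith("+") && !line.startsWith("+++");
            if (added && line.length() > 1
                    && Character.isWhitespace(line.charAt(line.length() - 1))) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> patch = Arrays.asList(
            "+++ b/example.sh",
            "+export FOO=bar ",
            "+export BAZ=qux");
        System.out.println(addedLinesEndingInWhitespace(patch));
    }
}
```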

 Figure out the way to pass build-in classpath (files in distributed cache, 
 etc.) from parent to spawned shells
 --

 Key: MAPREDUCE-6458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Allen Wittenauer
 Attachments: MAPREDUCE-6458.00.patch




[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Status: Patch Available  (was: Open)

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch




[jira] [Updated] (MAPREDUCE-6460) TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException fails

2015-08-21 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated MAPREDUCE-6460:
-
Attachment: MAPREDUCE-6460.000.patch

 TestRMContainerAllocator.testAttemptNotFoundCausesRMCommunicatorException 
 fails
 ---

 Key: MAPREDUCE-6460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: MAPREDUCE-6460.000.patch







[jira] [Commented] (MAPREDUCE-6415) Create a tool to combine aggregated logs into HAR files

2015-08-21 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706281#comment-14706281
 ] 

Arun Suresh commented on MAPREDUCE-6415:


[~rkanter], the patch looks good to me. You might want to clean up the TODOs 
and add some Javadoc, though.
+1 pending that.

 Create a tool to combine aggregated logs into HAR files
 ---

 Key: MAPREDUCE-6415
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6415
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: HAR-ableAggregatedLogs_v1.pdf, 
 MAPREDUCE-6415_branch-2_prelim_001.patch, 
 MAPREDUCE-6415_branch-2_prelim_002.patch, MAPREDUCE-6415_prelim_001.patch, 
 MAPREDUCE-6415_prelim_002.patch


 While we wait for YARN-2942 to become viable, it would still be great to 
 improve the aggregated logs problem.  We can write a tool that combines 
 aggregated log files into a single HAR file per application, which should 
 solve the too-many-files and too-many-blocks problems.  See the design 
 document for details.
 See YARN-2942 for more context.





[jira] [Updated] (MAPREDUCE-6459) native task crashes when merging spilled file on ppc64

2015-08-21 Thread Tao Jie (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie updated MAPREDUCE-6459:
---
Attachment: ppc64_error.txt

 native task crashes when merging spilled file on ppc64
 --

 Key: MAPREDUCE-6459
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6459
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
 Environment: Linux version 2.6.32-431.el6.ppc64
Reporter: Tao Jie
 Attachments: ppc64_error.txt


 When running a native task on ppc64, merging spilled files fails because the 
 local spill file is not deserialized correctly.
 In the readVLong function in WritableUtils.h and Buffers.h, we compare a char 
 with a number and convert a char to int64_t. This does not work correctly on 
 ppc64 because the signedness of plain char differs between ppc64 and x86: on 
 x86, char is signed, while on ppc64 it is unsigned. As a result, we write the 
 EOF marker [-1, -1] at the end of a spill partition, but deserialize those 
 chars as [255, 255].
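
To illustrate the signedness mismatch described above, here is a minimal C++ sketch. It is not the native task code: the helper names (widenAsSigned, widenAsUnsigned, looksLikeEofMarker, checkEofMarker) are hypothetical and only show how the same stored byte 0xFF widens to -1 or to 255 depending on whether it is treated as signed or unsigned.

```cpp
#include <cassert>
#include <cstdint>

// Widen one stored byte to int64_t as a platform with signed char would.
int64_t widenAsSigned(unsigned char raw) {
  return static_cast<int64_t>(static_cast<signed char>(raw));
}

// Widen the same byte as a platform with unsigned char would.
int64_t widenAsUnsigned(unsigned char raw) {
  return static_cast<int64_t>(raw);
}

// The reader recognizes the end-of-partition marker by comparing with -1.
bool looksLikeEofMarker(int64_t widened) {
  return widened == -1;
}

// Self-check: the marker survives only the signed interpretation.
void checkEofMarker() {
  const unsigned char raw = 0xFF;  // the byte written for -1
  assert(looksLikeEofMarker(widenAsSigned(raw)));
  assert(!looksLikeEofMarker(widenAsUnsigned(raw)));
}
```

Code that relies on plain char behaves like widenAsSigned on one platform and like widenAsUnsigned on the other, which matches the [-1, -1] versus [255, 255] symptom in the report.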





[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells

2015-08-21 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6458:
--
Description: In MAPREDUCE-6454 (targeted for branch-2.x), we provide a 
constrained way to pass the built-in classpath from the parent to the child 
shell, via HADOOP_CLASSPATH, so that jars in the distributed cache still work 
in child tasks. In trunk, we may want a different approach, such as 
introducing an additional environment variable to safely pass the built-in 
classpath.  (was: In MAPREDUCE-6454 (targeted for branch-2.x), we provide an 
extremely fragile way to pass the built-in classpath from the parent to the 
child shell, via HADOOP_CLASSPATH, so that jars in the distributed cache still 
work in child tasks. In trunk, we may want a different approach, such as 
introducing an additional environment variable to safely pass the built-in 
classpath.)

 Figure out the way to pass build-in classpath (files in distributed cache, 
 etc.) from parent to spawned shells
 --

 Key: MAPREDUCE-6458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Allen Wittenauer

 In MAPREDUCE-6454 (targeted for branch-2.x), we provide a constrained way to 
 pass the built-in classpath from the parent to the child shell, via 
 HADOOP_CLASSPATH, so that jars in the distributed cache still work in child 
 tasks. In trunk, we may want a different approach, such as introducing an 
 additional environment variable to safely pass the built-in classpath.





[jira] [Commented] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells

2015-08-21 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706488#comment-14706488
 ] 

Junping Du commented on MAPREDUCE-6458:
---

bq. Re-assigning this to me and updating the description to reflect reality, 
since I actually understand how bash works.
Please feel free to take it if you have bandwidth to work on it immediately.

 Figure out the way to pass build-in classpath (files in distributed cache, 
 etc.) from parent to spawned shells
 --

 Key: MAPREDUCE-6458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Allen Wittenauer

 In MAPREDUCE-6454 (targeted for branch-2.x), we provide a constrained way to 
 pass the built-in classpath from the parent to the child shell, via 
 HADOOP_CLASSPATH, so that jars in the distributed cache still work in child 
 tasks. In trunk, we may want a different approach, such as introducing an 
 additional environment variable to safely pass the built-in classpath.





[jira] [Commented] (MAPREDUCE-6363) [NNBench] Lease mismatch error when running with multiple mappers

2015-08-21 Thread Ajith S (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706491#comment-14706491
 ] 

Ajith S commented on MAPREDUCE-6363:


Hi [~ajisakaa],

I think what [~uladz] is pointing out is that when we run the CREATE test, we 
create files with unique names thanks to the taskId, so CREATE is fine. But 
when we run rename or delete, the taskId will be new, so the test will not 
actually rename or delete the files created by the CREATE benchmark: it looks 
up file names based on file_ + taskId and will not find them. Right?
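
The concern above can be sketched in a few lines of C++. This is only a model of the naming scheme discussed in the comment, not NNBench code: fileNameFor, runCreate, countTargetsFound, and the task id strings are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical naming rule: one file per task, named after the task id.
std::string fileNameFor(const std::string& taskId) {
  return "file_" + taskId;
}

// Simulates a CREATE run: returns the set of file names that now exist.
std::set<std::string> runCreate(const std::vector<std::string>& taskIds) {
  std::set<std::string> files;
  for (const std::string& id : taskIds) files.insert(fileNameFor(id));
  return files;
}

// Simulates a later rename/delete run: counts how many of the names it
// derives from its own task ids actually exist.
std::size_t countTargetsFound(const std::set<std::string>& files,
                              const std::vector<std::string>& taskIds) {
  std::size_t found = 0;
  for (const std::string& id : taskIds) {
    if (files.count(fileNameFor(id)) > 0) ++found;
  }
  return found;
}

// A run with fresh task ids finds none of the files created earlier.
void demoRenameAfterCreate() {
  const std::set<std::string> files = runCreate({"job1_m_0", "job1_m_1"});
  assert(countTargetsFound(files, {"job2_m_0", "job2_m_1"}) == 0);
  assert(countTargetsFound(files, {"job1_m_0", "job1_m_1"}) == 2);
}
```

Under this model, unique names keep CREATE tasks from colliding, but any later operation needs a name that stays stable across jobs in order to locate the same files.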

 [NNBench] Lease mismatch error when running with multiple mappers
 -

 Key: MAPREDUCE-6363
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6363
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: benchmarks
Reporter: Brahma Reddy Battula
Assignee: Vlad Sharanhovich
Priority: Critical
 Fix For: 2.8.0

 Attachments: HDFS4929.patch, MAPREDUCE-6363-001.patch, 
 MAPREDUCE-6363-002.patch, MAPREDUCE-6363-003.patch


 Command :
 ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.1-tests.jar nnbench -operation create_write -numberOfFiles 1000 -blockSize 268435456 -bytesToWrite 102400 -baseDir /benchmarks/NNBench`hostname -s` -replicationFactorPerFile 3 -maps 100 -reduces 10
 Trace :
 013-06-21 10:44:53,763 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9005, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.105.214:36320: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /benchmarks/NNBenchlinux-185/data/file_linux-214__0 owned by DFSClient_attempt_1371782327901_0001_m_48_0_1383437860_1 but is accessed by DFSClient_attempt_1371782327901_0001_m_84_0_1880545303_1
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2351)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2098)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2019)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:501)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:213)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:52012)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:435)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:925)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)





[jira] [Created] (MAPREDUCE-6459) native task crashes when merging spilled file on ppc64

2015-08-21 Thread Tao Jie (JIRA)
Tao Jie created MAPREDUCE-6459:
--

 Summary: native task crashes when merging spilled file on ppc64
 Key: MAPREDUCE-6459
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6459
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
 Environment: Linux version 2.6.32-431.el6.ppc64
Reporter: Tao Jie
 Attachments: ppc64_error.txt

When running a native task on ppc64, merging spilled files fails because the 
local spill file is not deserialized correctly.
In the readVLong function in WritableUtils.h and Buffers.h, we compare a char 
with a number and convert a char to int64_t. This does not work correctly on 
ppc64 because the signedness of plain char differs between ppc64 and x86: on 
x86, char is signed, while on ppc64 it is unsigned. As a result, we write the 
EOF marker [-1, -1] at the end of a spill partition, but deserialize those 
chars as [255, 255].
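
One way to make the decoding independent of the platform's char signedness is to read bytes through an explicitly signed type such as int8_t. The sketch below assumes the variable-length layout used by Hadoop's Java WritableUtils; the native readVLong may differ in detail. The function names and the buffer interface are hypothetical, and this is not the fix attached to the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Total encoded size in bytes, derived from the first (signed) byte.
int decodeVLongSize(int8_t first) {
  if (first >= -112) return 1;
  if (first < -120) return -119 - first;
  return -111 - first;
}

// Whether the encoded value is negative, derived from the first byte.
bool isNegativeVLong(int8_t first) {
  return first < -120 || (first >= -112 && first < 0);
}

// Reads one variable-length integer from buf starting at pos and advances pos.
// Bytes are stored as uint8_t and reinterpreted as int8_t explicitly, so the
// result does not depend on whether plain char is signed.
int64_t readVLongPortable(const std::vector<uint8_t>& buf, std::size_t& pos) {
  if (pos >= buf.size()) throw std::out_of_range("no bytes left");
  const int8_t first = static_cast<int8_t>(buf[pos++]);
  const int len = decodeVLongSize(first);
  if (len == 1) return first;  // small values are stored in the first byte
  assert(len >= 2 && len <= 9);
  if (pos + static_cast<std::size_t>(len - 1) > buf.size()) {
    throw std::out_of_range("truncated value");
  }
  uint64_t value = 0;
  for (int i = 0; i < len - 1; ++i) {
    value = (value << 8) | buf[pos++];  // big-endian payload bytes
  }
  const int64_t result = static_cast<int64_t>(value);
  return isNegativeVLong(first) ? ~result : result;
}
```

With this reader, the two 0xFF bytes of the EOF marker decode to -1 and -1 regardless of the platform, because the signed interpretation is spelled out in the code and not inherited from char.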


