[jira] [Created] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5604:


 Summary: TestMRAMWithNonNormalizedCapabilities fails on Windows 
due to exceeding max path length
 Key: MAPREDUCE-5604
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5604
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Affects Versions: 2.2.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


The test uses the full class name as a component of the 
{{yarn.nodemanager.local-dirs}} setting for a {{MiniMRYarnCluster}}.  This 
causes container launch to fail when trying to access files at a path longer 
than the maximum of 260 characters.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException

2013-10-31 Thread Shinichi Yamashita (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809963#comment-13809963
 ] 

Shinichi Yamashita commented on MAPREDUCE-5392:
---

Jenkins hit an OutOfMemoryError. I checked the Jenkins log, and the OOM occurred 
during the native-maven-plugin phase.
To make sure, I am attaching the same patch again to rerun the Jenkins test.

 mapred job -history all command throws IndexOutOfBoundsException
 --

 Key: MAPREDUCE-5392
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Priority: Minor
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
 MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
 MAPREDUCE-5392.patch, MAPREDUCE-5392.patch


 When I use the all option with the mapred job -history command, the following 
 exception is displayed and the command does not work.
 {code}
 Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
 at java.lang.String.substring(String.java:1875)
 at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
 at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
 at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
 at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
 at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
 at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
 at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
 {code}
 This happens because the node name recorded in the history file does not have 
 the tracker_ prefix. The patch therefore makes it possible to read the history 
 file even when a node name lacks the tracker_ prefix.
 In addition, it fixes the URL of the displayed task log.
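
 The failure mode can be illustrated with a minimal sketch. It assumes the 
 conversion strips a fixed-length tracker_ prefix with substring, as the stack 
 trace suggests. The class TrackerNameSketch and both method names are 
 hypothetical; this is not the HostUtil code or the attached patch.

```java
// Hypothetical sketch; not the actual HostUtil code or the attached patch.
class TrackerNameSketch {
    private static final String PREFIX = "tracker_";

    // Assumed shape of the failing logic: always strips the prefix length.
    static String strictConvert(String trackerName) {
        int colon = trackerName.indexOf(':');
        String host = (colon == -1) ? trackerName : trackerName.substring(0, colon);
        // Throws StringIndexOutOfBoundsException when host is shorter than the prefix.
        return host.substring(PREFIX.length());
    }

    // Tolerant variant: strips the prefix only when it is present.
    static String tolerantConvert(String trackerName) {
        int colon = trackerName.indexOf(':');
        String host = (colon == -1) ? trackerName : trackerName.substring(0, colon);
        return host.startsWith(PREFIX) ? host.substring(PREFIX.length()) : host;
    }
}
```

 With a five-character node name and no prefix, the strict variant asks for a 
 substring starting at index 8, which is consistent with the "-3" in the 
 exception message above.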





[jira] [Updated] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException

2013-10-31 Thread Shinichi Yamashita (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shinichi Yamashita updated MAPREDUCE-5392:
--

Attachment: MAPREDUCE-5392.patch



[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-5604:
-

Attachment: MAPREDUCE-5604.1.patch

I'm attaching a patch that applies the same fix we've used in similar cases: 
use the simple class name instead of the fully qualified class name so that 
the testing directory is shorter.
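
As a rough illustration of the difference, the sketch below builds a directory 
name from a class in both ways. The class TestDirNameSketch, its methods, and 
the base directory are hypothetical and are not taken from the attached patch.

```java
import java.io.File;

// Hypothetical sketch; the directory layout is illustrative only.
class TestDirNameSketch {
    // Builds a test directory whose last component is the fully qualified class name.
    static File longDir(File base, Class<?> testClass) {
        return new File(base, testClass.getName());
    }

    // Builds a shorter test directory from the simple class name.
    static File shortDir(File base, Class<?> testClass) {
        return new File(base, testClass.getSimpleName());
    }
}
```

The package prefix is dropped from every path created under the directory, 
which can matter when the total path length is close to the 260-character limit.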



[jira] [Updated] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-5604:
-

Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809977#comment-13809977
 ] 

Hadoop QA commented on MAPREDUCE-5604:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611259/MAPREDUCE-5604.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4160//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException

2013-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809983#comment-13809983
 ] 

Hadoop QA commented on MAPREDUCE-5392:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611258/MAPREDUCE-5392.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

  org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4161//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4161//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809987#comment-13809987
 ] 

Chuan Liu commented on MAPREDUCE-5604:
--

+1. Change looks good to me.



[jira] [Commented] (MAPREDUCE-5604) TestMRAMWithNonNormalizedCapabilities fails on Windows due to exceeding max path length

2013-10-31 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810414#comment-13810414
 ] 

Chris Nauroth commented on MAPREDUCE-5604:
--

bq. -1 javac. The patch appears to cause the build to fail.

This is unrelated to the patch.  It looks like another problem with the Jenkins 
server being overloaded:

{code}
Error occurred during initialization of VM
Cannot create VM thread. Out of system resources.
{code}




[jira] [Updated] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5186:
--

Target Version/s: 2.2.1
  Status: Patch Available  (was: Open)

 mapreduce.job.max.split.locations causes some splits created by 
 CombineFileInputFormat to fail
 --

 Key: MAPREDUCE-5186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5186
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 2.2.0, 2.0.4-alpha
Reporter: Sangjin Lee
Assignee: Robert Parker
Priority: Critical
 Attachments: MAPREDUCE-5186v1.patch, MAPREDUCE-5186v2.patch, 
 MAPREDUCE-5186v3.patch


 CombineFileInputFormat can easily create splits that can come from many 
 different locations (during the last pass of creating global splits). 
 However, we observe that this often runs afoul of the 
 mapreduce.job.max.split.locations check that's done by JobSplitWriter.
 The default value for mapreduce.job.max.split.locations is 10, and with any 
 decent size cluster, CombineFileInputFormat creates splits that are well 
 above this limit.





[jira] [Updated] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5186:
--

Attachment: MAPREDUCE-5186v3.patch

Updating Rob's patch with unit tests to verify truncation of locations is 
occurring when necessary.  Also removed the TestBlockLimits test since it was 
checking for an exception in this case and is no longer necessary.
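
For readers following along, truncation of split locations could look roughly 
like the sketch below. The class and method names are hypothetical, and the 
choice to keep the leading entries is an assumption; this is not the code from 
MAPREDUCE-5186v3.patch.

```java
import java.util.Arrays;

// Hypothetical sketch; not the code from MAPREDUCE-5186v3.patch.
class SplitLocationSketch {
    // Returns at most maxLocations entries, keeping the leading ones.
    static String[] limitLocations(String[] locations, int maxLocations) {
        if (locations.length <= maxLocations) {
            return locations;
        }
        return Arrays.copyOf(locations, maxLocations);
    }
}
```

A split that lists more locations than the limit is trimmed instead of causing 
the job submission to fail.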



[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810631#comment-13810631
 ] 

Sangjin Lee commented on MAPREDUCE-5186:


Looks good to me. Thanks!



[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810639#comment-13810639
 ] 

Hadoop QA commented on MAPREDUCE-5186:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611465/MAPREDUCE-5186v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4162//console

This message is automatically generated.



[jira] [Updated] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5186:
--

Attachment: MAPREDUCE-5186v3.patch

Apache build machine had a bunch of processes that had escaped and caused the 
Jenkins user to hit the process ulimit.  Those have been cleaned up, so 
resubmitting the same patch to kick Jenkins again.



[jira] [Updated] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-31 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5601:
--

Attachment: MAPREDUCE-5601.patch

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch, 
 MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reducer will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED, which means that the 
 next time the region is requested, it will definitely be read from disk, even 
 if it happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about this 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.
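
 The MR1 behavior described above, where the hint is issued only after the 
 full region has been sent, can be sketched as follows. All names here 
 (FadviseSketch, CacheAdvisor, onTransferFinished) are hypothetical stand-ins 
 and are not taken from ShuffleHandler or the attached patch.

```java
// Hypothetical sketch; names are illustrative and not taken from ShuffleHandler.
class FadviseSketch {
    // Stand-in for the real cache hint; an implementation would issue DONTNEED.
    interface CacheAdvisor {
        void dontNeed(long offset, long length);
    }

    // Issues the DONTNEED hint only when the whole region was sent.
    static boolean onTransferFinished(long offset, long length,
                                      long bytesTransferred, CacheAdvisor advisor) {
        if (bytesTransferred < length) {
            return false; // fetch was abandoned; leave the pages cached for the retry
        }
        advisor.dontNeed(offset, length);
        return true;
    }
}
```

 The check keeps an abandoned fetch from evicting pages that the retry will 
 need again.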





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810809#comment-13810809
 ] 

Hadoop QA commented on MAPREDUCE-5601:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12611494/MAPREDUCE-5601.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4164//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4164//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5186) mapreduce.job.max.split.locations causes some splits created by CombineFileInputFormat to fail

2013-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810852#comment-13810852
 ] 

Hadoop QA commented on MAPREDUCE-5186:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12611483/MAPREDUCE-5186v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4163//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4163//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-31 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810873#comment-13810873
 ] 

Todd Lipcon commented on MAPREDUCE-5601:


bq. Or you're saying we would pass the amount of unreserved memory remaining?

Yea... though could be problematic due to parallel fetches.

I think best would be to add a new field to TaskAttemptCompletionEventProto 
which contains the size of the completed map output. Then the reducer scheduler 
could be smarter and avoid wasting the round trip on things which won't fit 
anyway. Given it's a PB, it could be done compatibly (and fall back to the 
current optimistic behavior).
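
To illustrate the idea, here is a rough sketch of a reducer-side selection step 
that uses an advertised map output size when one is present. Everything in it 
is hypothetical (class name, method name, the use of a null size to mean "field 
absent"); it is not the existing scheduler and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical size-aware fetch selection; not the real shuffle scheduler. */
final class SizeAwareFetchSketch {
    /**
     * Returns the map outputs worth requesting given the free memory.
     * A null size models an event without the new field: such outputs are
     * still requested, which matches the current optimistic behavior.
     */
    static List<String> selectFetchable(Map<String, Long> advertisedSizes, long freeBytes) {
        List<String> selected = new ArrayList<>();
        long remaining = freeBytes;
        for (Map.Entry<String, Long> entry : advertisedSizes.entrySet()) {
            Long size = entry.getValue();
            if (size == null) {
                selected.add(entry.getKey());   // size unknown: try anyway
            } else if (size <= remaining) {
                selected.add(entry.getKey());
                remaining -= size;              // reserve space for this fetch
            }
            // Otherwise skip the output and avoid the wasted round trip.
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("m1", 400L);
        sizes.put("m2", 700L);
        sizes.put("m3", null);
        sizes.put("m4", 500L);
        System.out.println(selectFetchable(sizes, 1000L)); // prints [m1, m3, m4]
    }
}
```

Subtracting each selected size from the remaining budget is one simple way to 
account for fetches issued in parallel; a real scheduler would also need to 
release that reservation when a fetch completes or fails.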

 ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
 --

 Key: MAPREDUCE-5601
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch, 
 MAPREDUCE-5601.patch


 When a reducer initiates a fetch request, it does not know whether it will be 
 able to fit the fetched data in memory.  The first part of the response tells 
 how much data will be coming.  If space is not currently available, the 
 reducer will abandon its request and try again later.  When this occurs, the 
 ShuffleHandler still fadvises the file region as DONTNEED, which means that the 
 next time it's asked for, it will definitely be read from disk, even if it 
 happened to be in the page cache before the request.
 I noticed this when trying to figure out why my job was doing so much more 
 disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
 that disk reads went to nearly 0 on machines that had enough memory to fit 
 map outputs into the page cache.  I then straced the NodeManager and noticed 
 that there were over four times as many fadvise DONTNEED calls as map-reduce 
 pairs.  Further logging showed the same map outputs being fetched about that 
 many times.
 This is a regression from MR1, which only did the fadvise DONTNEED after all 
 the bytes were transferred.
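
 The MR1 behavior described above can be sketched as follows. This is an
 illustration only, not the attached patch: the CacheAdvisor interface and all
 names are hypothetical stand-ins for whatever issues the fadvise call.

```java
/** Hypothetical sketch: advise DONTNEED only after a complete transfer. */
final class FadviseAfterTransferSketch {
    /** Stand-in for the component that issues the fadvise call. */
    interface CacheAdvisor {
        void dontNeed(String file, long offset, long length);
    }

    /**
     * Called when a transfer ends. Drops the region from the page cache only
     * if every requested byte was sent; an abandoned fetch leaves the cache
     * untouched so the retry can be served from memory.
     */
    static boolean onTransferFinished(String file, long offset, long length,
                                      long bytesSent, CacheAdvisor advisor) {
        if (bytesSent == length) {
            advisor.dontNeed(file, offset, length);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CacheAdvisor logging = (file, offset, length) ->
            System.out.println("DONTNEED " + file + " [" + offset + ", +" + length + ")");
        // Reducer abandoned the fetch after reading only the header bytes.
        onTransferFinished("file.out", 0L, 4096L, 16L, logging);
        // Full transfer: the region is no longer needed in the cache.
        onTransferFinished("file.out", 0L, 4096L, 4096L, logging);
    }
}
```

 Only the second call in main reaches the advisor, so a retried fetch after the
 first call can still hit the page cache.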





[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-31 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810875#comment-13810875
 ] 

Todd Lipcon commented on MAPREDUCE-5601:


Also, +1 for the patch



[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

2013-10-31 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13810914#comment-13810914
 ] 

Sandy Ryza commented on MAPREDUCE-5601:
---

bq. Yea... though could be problematic due to parallel fetches.
Right.  It would help a little if the amount remaining was less than the total 
fetched in each request, but wouldn't solve the bigger problem.

bq. I think best would be to add a new field to TaskAttemptCompletionEventProto 
which contains the size of the completed map output.
That's been my thinking too.  Unfortunately the task umbilical protocol is 
still on Writables, so there could be compatibility issues.



[jira] [Updated] (MAPREDUCE-4362) If possible, we should get back the feature of propagating task logs back to JobClient

2013-10-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4362:
---

Target Version/s: 2.3.0

Sandy, can you rebase this patch? We need to remove the YARN changes now that 
YARN-649 is in. Maybe some tests too?

 If possible, we should get back the feature of propagating task logs back to 
 JobClient
 --

 Key: MAPREDUCE-4362
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4362
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 2.0.0-alpha
Reporter: Vinod Kumar Vavilapalli
Assignee: Sandy Ryza
 Attachments: MAPREDUCE-4362.patch, MAPREDUCE-4362.patch


 MAPREDUCE-3889 removed the code which was trying to pull from /tasklog. We 
 should see if it is possible to get back the feature.


