[jira] [Commented] (MAPREDUCE-5746) Job diagnostics can implicate wrong task for a failed job

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900223#comment-13900223
 ] 

Hudson commented on MAPREDUCE-5746:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #480 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/480/])
MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. 
(Jason Lowe via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java


 Job diagnostics can implicate wrong task for a failed job
 -

 Key: MAPREDUCE-5746
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5746
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.1.1-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 0.23.11, 2.4.0

 Attachments: MAPREDUCE-5746-v2.branch-0.23.patch, 
 MAPREDUCE-5746-v2.patch, MAPREDUCE-5746.patch


 We've seen a number of cases where the history server is showing the wrong 
 task as the reason a job failed.  For example, Task 
 task_1383802699973_515536_m_027135 failed 1 times when some other task had 
 failed 4 times and was the real reason the job failed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5746) Job diagnostics can implicate wrong task for a failed job

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900317#comment-13900317
 ] 

Hudson commented on MAPREDUCE-5746:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1672 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1672/])
MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. 
(Jason Lowe via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java


 Job diagnostics can implicate wrong task for a failed job
 -

 Key: MAPREDUCE-5746
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5746
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.1.1-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 0.23.11, 2.4.0

 Attachments: MAPREDUCE-5746-v2.branch-0.23.patch, 
 MAPREDUCE-5746-v2.patch, MAPREDUCE-5746.patch


 We've seen a number of cases where the history server is showing the wrong 
 task as the reason a job failed.  For example, Task 
 task_1383802699973_515536_m_027135 failed 1 times when some other task had 
 failed 4 times and was the real reason the job failed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5746) Job diagnostics can implicate wrong task for a failed job

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900388#comment-13900388
 ] 

Hudson commented on MAPREDUCE-5746:
---

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1697 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1697/])
MAPREDUCE-5746. Job diagnostics can implicate wrong task for a failed job. 
(Jason Lowe via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567666)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryParser.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryParsing.java


 Job diagnostics can implicate wrong task for a failed job
 -

 Key: MAPREDUCE-5746
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5746
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.1.1-beta
Reporter: Jason Lowe
Assignee: Jason Lowe
 Fix For: 0.23.11, 2.4.0

 Attachments: MAPREDUCE-5746-v2.branch-0.23.patch, 
 MAPREDUCE-5746-v2.patch, MAPREDUCE-5746.patch


 We've seen a number of cases where the history server is showing the wrong 
 task as the reason a job failed.  For example, Task 
 task_1383802699973_515536_m_027135 failed 1 times when some other task had 
 failed 4 times and was the real reason the job failed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900704#comment-13900704
 ] 

Jason Lowe commented on MAPREDUCE-5757:
---

Stacktrace:

{noformat}
Caused by: java.util.ConcurrentModificationException
  at 
java.util.LinkedList$ListItr.checkForComodification(LinkedList.java:953)
  at java.util.LinkedList$ListItr.next(LinkedList.java:886)
  at 
org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.toList(JobControl.java:82)
  at 
org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.getSuccessfulJobList(JobControl.java:123)
  at 
org.apache.hadoop.mapred.jobcontrol.JobControl.getSuccessfulJobs(JobControl.java:75)
  at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.calculateProgress(Launcher.java:252)
  at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:319)
  at org.apache.pig.PigServer.launchPlan(PigServer.java:1283)
  ... 26 more
{noformat}


 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe

 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5757:
-

 Summary: ConcurrentModificationException in JobControl.toList
 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe


Despite having the fix for MAPREDUCE-5513 we saw another 
ConcurrentModificationException in JobControl, so something there still isn't 
fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Assigned] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-5757:
-

Assignee: Jason Lowe

The locking in the fix for MAPREDUCE-5513 is mismatched.  The toList method is 
static and therefore locks the class, while the other methods lock the 
object.
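
For illustration, a minimal hypothetical sketch of that mismatch (not the actual JobControl code): a method synchronized on the class does not exclude methods synchronized on the instance, so the copy can race with concurrent writers and the iteration can throw ConcurrentModificationException.

{code}
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class LockMismatchSketch {
  private final List<String> jobs = new LinkedList<String>();

  // Locks LockMismatchSketch.class -- does NOT exclude addJob() below,
  // so iterating the list here can throw ConcurrentModificationException.
  static List<String> toList(List<String> list) {
    synchronized (LockMismatchSketch.class) {
      List<String> copy = new ArrayList<String>();
      for (String job : list) {
        copy.add(job);
      }
      return copy;
    }
  }

  // Locks the instance ("this"), not the class.
  public synchronized void addJob(String job) {
    jobs.add(job);
  }

  // Direction of the fix: have the copy hold the same lock as the writers.
  public synchronized List<String> toListFixed() {
    return new ArrayList<String>(jobs);
  }
}
{code}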

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe

 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5513) ConcurrentModificationException in JobControl

2014-02-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900721#comment-13900721
 ] 

Jason Lowe commented on MAPREDUCE-5513:
---

Note that this is still occurring, see MAPREDUCE-5757.

 ConcurrentModificationException in JobControl
 -

 Key: MAPREDUCE-5513
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5513
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.1.0-beta, 0.23.9
Reporter: Jason Lowe
Assignee: Robert Parker
 Fix For: 3.0.0, 0.23.10, 2.2.0

 Attachments: MAPREDUCE-5513-1.patch


 JobControl.toList is locking individual lists to iterate them, but those 
 lists can be modified elsewhere without holding the list lock.  The locking 
 approaches are mismatched, with toList holding the lock on the actual list 
 object while other methods hold the JobControl lock when modifying the lists.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5757:
--

Attachment: MAPREDUCE-5757.patch

Patch to always lock the object rather than the class.  Don't know of an easy 
way to unit test this.

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5757.patch


 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5757:
--

Target Version/s: 0.23.11, 2.4.0
  Status: Patch Available  (was: Open)

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5757.patch


 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5751) MR app master fails to start in some cases if mapreduce.job.classloader is true

2014-02-13 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900752#comment-13900752
 ] 

Gera Shegalov commented on MAPREDUCE-5751:
--

I think you can easily add a test case to TestMRAppMaster

 MR app master fails to start in some cases if mapreduce.job.classloader is 
 true
 ---

 Key: MAPREDUCE-5751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: mapreduce-5751.patch


 If mapreduce.job.classloader is set to true, and the MR client includes a 
 jetty jar in its libjars or job jar, the MR app master fails to start. A 
 typical stack trace we get is as follows:
 {noformat}
 java.lang.ClassCastException: org.mortbay.jetty.webapp.WebInfConfiguration 
 cannot be cast to org.mortbay.jetty.webapp.Configuration
   at 
 org.mortbay.jetty.webapp.WebAppContext.loadConfigurations(WebAppContext.java:890)
   at 
 org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:462)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at 
 org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at 
 org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at 
 org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at 
 org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:676)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:208)
   at 
 org.apache.hadoop.mapreduce.v2.app.client.MRClientService.start(MRClientService.java:151)
   at 
 org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1040)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1307)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1303)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1259)
 {noformat}
 This happens because, as part of MR app master startup, the jetty classes are 
 loaded normally through the app classloader, but WebAppContext tries to load 
 the specific Configuration class via the thread context classloader (which 
 had been set to the user job classloader).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5756) FileInputFormat.listStatus() including directories in its results

2014-02-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900772#comment-13900772
 ] 

Jason Dere commented on MAPREDUCE-5756:
---

Ok, looking a little more at this .. so FileInputFormat.listStatus() is 
returning the same results on hadoop-1 and hadoop-2, and it includes the 
directories, so I guess listStatus() is not the issue. It looks like what 
CombineFileInputFormat.getSplits() does with the file list after getting it is 
different between hadoop-1 and hadoop-2, where hadoop-2 includes those 
directories in the list of InputSplits:

(Hadoop 20S means hadoop 1.x)
{noformat}
2014-02-13 13:35:32,492 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(345)) - ** Hadoop version: 0.20S
2014-02-13 13:35:32,492 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(349)) - ** called super.getSplits(): 
[Paths:/00_0:0+50 Locations:127.0.0.1:; ]
{noformat}

(Hadoop 23 means hadoop 2.x)
{noformat}
2014-02-13 13:38:12,425 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(345)) - ** Hadoop version: 0.23
2014-02-13 13:38:12,425 ERROR shims.HadoopShimsSecure 
(HadoopShimsSecure.java:getSplits(349)) - ** called super.getSplits(): 
[Paths:/00_0:0+50 Locations:127.0.0.1:; , 
Paths:/Users:0+0,/build:0+0,/tmp:0+0,/user:0+0 Locations:; ]
{noformat}


 FileInputFormat.listStatus() including directories in its results
 -

 Key: MAPREDUCE-5756
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jason Dere

 Trying to track down HIVE-6401, where we see some "is not a file" errors 
 because getSplits() is giving us directories.  I believe the culprit is 
 FileInputFormat.listStatus():
 {code}
 if (recursive && stat.isDirectory()) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
 } else {
   result.add(stat);
 }
 {code}
 Which seems to be allowing directories to be added to the results if 
 recursive is false.  Is this meant to return directories? If not, I think it 
 should look like this:
 {code}
 if (stat.isDirectory()) {
  if (recursive) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
  }
 } else {
   result.add(stat);
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5756) FileInputFormat.listStatus() including directories in its results

2014-02-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900776#comment-13900776
 ] 

Jason Dere commented on MAPREDUCE-5756:
---

Looks like the changes in MAPREDUCE-4470 may be causing the difference in the 
1.x vs 2.x behavior. Should CombineFileInputFormat be filtering out any 
locations which turn out to be directories here?
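
If it should, a minimal hedged sketch of what such filtering could look like (the class and method names here are illustrative, not actual Hadoop APIs):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;

class DirectoryFilterSketch {
  // Keep only plain files before building combine splits; directory entries are dropped.
  static List<FileStatus> filesOnly(List<FileStatus> stats) {
    List<FileStatus> files = new ArrayList<FileStatus>(stats.size());
    for (FileStatus stat : stats) {
      if (!stat.isDirectory()) {
        files.add(stat);
      }
    }
    return files;
  }
}
{code}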

 FileInputFormat.listStatus() including directories in its results
 -

 Key: MAPREDUCE-5756
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jason Dere

 Trying to track down HIVE-6401, where we see some "is not a file" errors 
 because getSplits() is giving us directories.  I believe the culprit is 
 FileInputFormat.listStatus():
 {code}
 if (recursive && stat.isDirectory()) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
 } else {
   result.add(stat);
 }
 {code}
 Which seems to be allowing directories to be added to the results if 
 recursive is false.  Is this meant to return directories? If not, I think it 
 should look like this:
 {code}
 if (stat.isDirectory()) {
  if (recursive) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
  }
 } else {
   result.add(stat);
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5756) CombineFileInputFormat.getSplits() including directories in its results

2014-02-13 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated MAPREDUCE-5756:
--

Summary: CombineFileInputFormat.getSplits() including directories in its 
results  (was: FileInputFormat.listStatus() including directories in its 
results)

 CombineFileInputFormat.getSplits() including directories in its results
 ---

 Key: MAPREDUCE-5756
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jason Dere

 Trying to track down HIVE-6401, where we see some "is not a file" errors 
 because getSplits() is giving us directories.  I believe the culprit is 
 FileInputFormat.listStatus():
 {code}
 if (recursive && stat.isDirectory()) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
 } else {
   result.add(stat);
 }
 {code}
 Which seems to be allowing directories to be added to the results if 
 recursive is false.  Is this meant to return directories? If not, I think it 
 should look like this:
 {code}
 if (stat.isDirectory()) {
  if (recursive) {
   addInputPathRecursively(result, fs, stat.getPath(),
   inputFilter);
  }
 } else {
   result.add(stat);
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900788#comment-13900788
 ] 

Hadoop QA commented on MAPREDUCE-5757:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12628849/MAPREDUCE-5757.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4356//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4356//console

This message is automatically generated.

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5757.patch


 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MAPREDUCE-5758) Reducer local data is not deleted until job completes

2014-02-13 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-5758:
-

 Summary: Reducer local data is not deleted until job completes
 Key: MAPREDUCE-5758
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5758
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.2.0, 0.23.10
Reporter: Jason Lowe


Ran into an instance where a reducer shuffled a large amount of data and 
subsequently failed; however, the local data is not purged when the task fails, 
only after the entire job completes.  This wastes disk space unnecessarily 
since the data is no longer relevant after the task-attempt exits.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5758) Reducer local data is not deleted until job completes

2014-02-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900827#comment-13900827
 ] 

Jason Lowe commented on MAPREDUCE-5758:
---

When tasks run under YARN they are handed app-level local directories to write 
into and those are in turn passed down to the tasks to write local data.  Since 
the local output locations are not under the container directory, YARN does not 
clean them up when the container exits.  They are only reaped when the 
app-level directory is deleted which occurs after the application completes.

Tasks should use the container-specific local directory for temporary local 
outputs rather than the app-specific directory, so if they crash YARN can 
automatically clean them promptly.  Note that map outputs would have to be 
committed to the same app-level local location they are today in order to 
survive the container exiting and the ShuffleHandler to find them later.  
However they could be accumulated before commit in a container-specific 
directory so if the map attempt fails the data is reaped promptly rather than 
only when the job completes.  This would also help minimize chances of 
inter-task file collisions such as the one that occurred in MAPREDUCE-5211.
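
For context, an approximate sketch of the NodeManager local-directory layout being described (paths are illustrative and abbreviated; exact names depend on configuration):

{noformat}
<local-dir>/usercache/<user>/appcache/<app-id>/                  app-level: removed when the application completes
<local-dir>/usercache/<user>/appcache/<app-id>/<container-id>/   container-level: removed when the container exits
{noformat}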

 Reducer local data is not deleted until job completes
 -

 Key: MAPREDUCE-5758
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5758
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe

 Ran into an instance where a reducer shuffled a large amount of data and 
 subsequently failed; however, the local data is not purged when the task fails, 
 only after the entire job completes.  This wastes disk space unnecessarily 
 since the data is no longer relevant after the task-attempt exits.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList

2014-02-13 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900842#comment-13900842
 ] 

Kihwal Lee commented on MAPREDUCE-5757:
---

+1 lgtm

 ConcurrentModificationException in JobControl.toList
 

 Key: MAPREDUCE-5757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.10, 2.2.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5757.patch


 Despite having the fix for MAPREDUCE-5513 we saw another 
 ConcurrentModificationException in JobControl, so something there still 
 isn't fixed.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5670) CombineFileRecordReader should report progress when moving to the next file

2014-02-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900863#comment-13900863
 ] 

Jason Lowe commented on MAPREDUCE-5670:
---

+1 lgtm.  Committing this.

 CombineFileRecordReader should report progress when moving to the next file
 ---

 Key: MAPREDUCE-5670
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.9
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: MR-5670v3.patch


 If a combine split consists of many empty files (i.e.: no record found by 
 the underlying record reader) then theoretically a task can time out due to 
 lack of reported progress.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5670) CombineFileRecordReader should report progress when moving to the next file

2014-02-13 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5670:
--

   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Chen!  I committed this to trunk and branch-2.

 CombineFileRecordReader should report progress when moving to the next file
 ---

 Key: MAPREDUCE-5670
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.9
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Fix For: 2.4.0

 Attachments: MR-5670v3.patch


 If a combine split consists of many empty files (i.e.: no record found by 
 the underlying record reader) then theoretically a task can time out due to 
 lack of reported progress.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5670) CombineFileRecordReader should report progress when moving to the next file

2014-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900903#comment-13900903
 ] 

Hudson commented on MAPREDUCE-5670:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #5166 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5166/])
MAPREDUCE-5670. CombineFileRecordReader should report progress when moving to 
the next file. Contributed by Chen He (jlowe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1568118)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/lib/CombineFileRecordReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CombineFileRecordReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/lib/TestCombineFileRecordReader.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/input/TestCombineFileRecordReader.java


 CombineFileRecordReader should report progress when moving to the next file
 ---

 Key: MAPREDUCE-5670
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5670
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.9
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Fix For: 2.4.0

 Attachments: MR-5670v3.patch


 If a combine split consists of many empty files (i.e.: no record found by 
 the underlying record reader) then theoretically a task can time out due to 
 lack of reported progress.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5641) History for failed Application Masters should be made available to the Job History Server

2014-02-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5641:
-

Attachment: MAPREDUCE-5641.patch

I’ve attached a preliminary version of the patch.  Once we all agree on the 
specifics of the design, I can add unit tests.  
The patch follows the design I outlined before where the RM will write a file 
when it sees an AM die and the JHS sees that and copies the jhist and similar 
files to the done_intermediate dir.  I have tested this by running jobs and 
killing the AM.  This results in incomplete information, as expected; however, 
in some cases some of the information won’t make 100% sense or is missing (e.g. 
no Finish Time if the AM didn’t actually finish).  I’ve put in some code to 
take care of these situations.  I’ve also attached a preliminary YARN patch to 
YARN-1731.  

{quote}
How will the JHS copy the file to the intermediate directory? It likely won't 
have access to the staging directory containing the jhist file.
{quote}
I modified the permissions from 0700 to 0701.
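
For illustration only (a hypothetical snippet, not taken from the patch), the change amounts to adding the world-execute bit so another daemon can traverse into the staging directory without being able to list or read its contents:

{code}
import org.apache.hadoop.fs.permission.FsPermission;

class StagingDirPermissionSketch {
  static final FsPermission OLD_STAGING_PERM = new FsPermission((short) 0700); // owner-only access
  static final FsPermission NEW_STAGING_PERM = new FsPermission((short) 0701); // adds world execute (directory traversal)
}
{code}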

 History for failed Application Masters should be made available to the Job 
 History Server
 -

 Key: MAPREDUCE-5641
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5641
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster, jobhistoryserver
Affects Versions: 2.2.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: MAPREDUCE-5641.patch


 Currently, the JHS has no information about jobs whose AMs have failed.  This 
 is because the History is written by the AM to the intermediate folder just 
 before finishing, so when it fails for any reason, this information isn't 
 copied there.  However, it is not lost, as it's in the AM's staging directory.  
 To make the History available in the JHS, all we need to do is have another 
 mechanism to move the History from the staging directory to the intermediate 
 directory.  The AM also writes a Summary file before exiting normally, 
 which is also unavailable when the AM fails.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2014-02-13 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Attachment: MAPREDUCE-4490.patch

New patch based on the latest branch origin/branch-1.2

 JVM reuse is incompatible with LinuxTaskController (and therefore 
 incompatible with Security)
 -

 Key: MAPREDUCE-4490
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller, tasktracker
Affects Versions: 0.20.205.0, 1.0.3, 1.2.1
Reporter: George Datskos
Assignee: sam liu
Priority: Critical
  Labels: patch
 Fix For: 1.2.1

 Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch, 
 MAPREDUCE-4490.patch


 When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
 1) with more map tasks in a job than there are map slots in the cluster will 
 result in immediate task failures for the second task in each JVM (and then 
 the JVM exits). We have investigated this bug and the root cause is as 
 follows. When using LinuxTaskController, the userlog directory for a task 
 attempt (../userlogs/job/task-attempt) is created only on the first 
 invocation (when the JVM is launched) because userlogs directories are 
 created by the task-controller binary which only runs *once* per JVM. 
 Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
 leading to immediate task failure and child JVM exit.
 {quote}
 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
 logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
 as that of the first task 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 ENOENT: No such file or directory
 at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
 at 
 org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
 at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
 at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
 at org.apache.hadoop.mapred.Child.main(Child.java:229)
 {quote}
 The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
 smoothly. Then Task27 starts. The directory 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
  is never created so when mapred.Child tries to write the log.index file for 
 Task27, it fails with ENOENT because the 
 attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
 the second task in each JVM is guaranteed to fail (and then the JVM exits) 
 every time when using LinuxTaskController. Note that this problem does not 
 occur when using the DefaultTaskController because the userlogs directories 
 are created for each task (not just for each JVM as with LinuxTaskController).
 For each task, the TaskRunner calls the TaskController's createLogDir method 
 before attempting to write out an index file.
 * DefaultTaskController#createLogDir: creates log directory for each task
 * LinuxTaskController#createLogDir: does nothing
 ** task-controller binary creates log directory [create_attempt_directories] 
 (but only for the first task)
 Possible Solution: add a new command to task-controller *initialize task* to 
 create attempt directories.  Call that command, with ShellCommandExecutor, in 
 the LinuxTaskController#createLogDir method



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-4490) JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)

2014-02-13 Thread sam liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
---

Status: Open  (was: Patch Available)

Will upload a new patch for the latest code base of branch origin/branch-1.2

 JVM reuse is incompatible with LinuxTaskController (and therefore 
 incompatible with Security)
 -

 Key: MAPREDUCE-4490
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller, tasktracker
Affects Versions: 1.2.1, 1.0.3, 0.20.205.0
Reporter: George Datskos
Assignee: sam liu
Priority: Critical
  Labels: patch
 Fix For: 1.2.1

 Attachments: MAPREDUCE-4490.patch, MAPREDUCE-4490.patch, 
 MAPREDUCE-4490.patch


 When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
 1) with more map tasks in a job than there are map slots in the cluster will 
 result in immediate task failures for the second task in each JVM (and then 
 the JVM exits). We have investigated this bug and the root cause is as 
 follows. When using LinuxTaskController, the userlog directory for a task 
 attempt (../userlogs/job/task-attempt) is created only on the first 
 invocation (when the JVM is launched) because userlogs directories are 
 created by the task-controller binary which only runs *once* per JVM. 
 Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
 leading to immediate task failure and child JVM exit.
 {quote}
 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
 logging for a new task attempt_201207241401_0013_m_27_0 in the same JVM 
 as that of the first task 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_06_0
 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
 child
 ENOENT: No such file or directory
 at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
 at 
 org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
 at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
 at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
 at org.apache.hadoop.mapred.Child.main(Child.java:229)
 {quote}
 The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
 smoothly. Then Task27 starts. The directory 
 /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_027_0
  is never created so when mapred.Child tries to write the log.index file for 
 Task27, it fails with ENOENT because the 
 attempt_201207241401_0013_m_027_0 directory does not exist. Therefore, 
 the second task in each JVM is guaranteed to fail (and then the JVM exits) 
 every time when using LinuxTaskController. Note that this problem does not 
 occur when using the DefaultTaskController because the userlogs directories 
 are created for each task (not just for each JVM as with LinuxTaskController).
 For each task, the TaskRunner calls the TaskController's createLogDir method 
 before attempting to write out an index file.
 * DefaultTaskController#createLogDir: creates log directory for each task
 * LinuxTaskController#createLogDir: does nothing
 ** task-controller binary creates log directory [create_attempt_directories] 
 (but only for the first task)
 Possible Solution: add a new command to task-controller *initialize task* to 
 create attempt directories.  Call that command, with ShellCommandExecutor, in 
 the LinuxTaskController#createLogDir method



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)