[jira] [Created] (MAPREDUCE-4912) Investigate ways to clean up double job commit prevention

2013-01-04 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created MAPREDUCE-4912:
--

 Summary: Investigate ways to clean up double job commit prevention
 Key: MAPREDUCE-4912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Robert Joseph Evans


Once MAPREDUCE-4819 goes in, it fixes the issue where an OutputCommitter can 
double commit a job, so that the output will never be touched after the job 
has reported success or failure externally.

The code and design could potentially use some cleanup and refactoring.

Issues brought up that should be investigated include:

# Reporting KILL instead of error for killed jobs that crash after the kill 
happens.
# Using the job history log to record the commit status instead of separate 
external files in HDFS.
# Placing the recovery/retry logic in the commit handler instead of the 
MRAppMaster, and having the recovery service replay the logs as it normally 
does for recovery.

These are not things that must be done, but alternatives that might clean up 
the code.
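
To make the "separate external files" approach concrete, here is a minimal 
sketch of guarding a commit with marker files. It is not the MAPREDUCE-4819 
code: the class name CommitGuard, the marker names, and the use of the local 
filesystem in place of HDFS are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: guards a job commit with marker files. */
public class CommitGuard {
    public enum State { NOT_STARTED, IN_DOUBT, SUCCEEDED, FAILED }

    private final Path started;
    private final Path success;
    private final Path failure;

    public CommitGuard(Path markerDir) {
        // Marker names are assumptions chosen for this example.
        this.started = markerDir.resolve("COMMIT_STARTED");
        this.success = markerDir.resolve("COMMIT_SUCCESS");
        this.failure = markerDir.resolve("COMMIT_FAIL");
    }

    /** What a restarted attempt can conclude from the markers left behind. */
    public State recover() {
        if (Files.exists(success)) return State.SUCCEEDED;
        if (Files.exists(failure)) return State.FAILED;
        // A start marker with no outcome means a crash happened mid-commit.
        return Files.exists(started) ? State.IN_DOUBT : State.NOT_STARTED;
    }

    /** Runs the commit at most once; returns false if an earlier attempt began it. */
    public boolean commitOnce(Runnable commit) {
        try {
            Files.createFile(started); // atomic: fails if the marker already exists
        } catch (FileAlreadyExistsException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            commit.run();
            touch(success);
        } catch (RuntimeException e) {
            touch(failure);
            throw e;
        }
        return true;
    }

    private static void touch(Path p) {
        try {
            Files.createFile(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under the second alternative listed above, the same three states would be 
recorded as events in the job history log rather than as files, and recover() 
would read them back from the replayed log.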

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-4904) TestMultipleLevelCaching failed in branch-1

2013-01-04 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu resolved MAPREDUCE-4904.


Resolution: Fixed
  Assignee: Junping Du  (was: meng gong)

Committed to branch-1. Thanks Junping for the patch!

 TestMultipleLevelCaching failed in branch-1
 ---

 Key: MAPREDUCE-4904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4904
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 1.2.0
Reporter: meng gong
Assignee: Junping Du
 Fix For: 1.2.0

 Attachments: MAPREDUCE-4904.patch, MAPREDUCE-4904-v2.patch, 
 MAPREDUCE-4904-v2.patch


 TestMultipleLevelCaching fails:
 {noformat}
 Testcase: testMultiLevelCaching took 30.406 sec
 FAILED
 Number of local maps expected:0 but was:1
 junit.framework.AssertionFailedError: Number of local maps expected:0 but was:1
 at org.apache.hadoop.mapred.TestRackAwareTaskPlacement.launchJobAndTestCounters(TestRackAwareTaskPlacement.java:78)
 at org.apache.hadoop.mapred.TestMultipleLevelCaching.testCachingAtLevel(TestMultipleLevelCaching.java:113)
 at org.apache.hadoop.mapred.TestMultipleLevelCaching.testMultiLevelCaching(TestMultipleLevelCaching.java:69)
 {noformat}



[jira] [Resolved] (MAPREDUCE-4894) Renewal / cancellation of JobHistory tokens

2013-01-04 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved MAPREDUCE-4894.
--

   Resolution: Fixed
Fix Version/s: 0.23.6
   2.0.3-alpha
   3.0.0

 Renewal / cancellation of JobHistory tokens
 ---

 Key: MAPREDUCE-4894
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4894
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.4
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Blocker
 Fix For: 3.0.0, 2.0.3-alpha, 0.23.6

 Attachments: MAPREDUCE-4894_wip.txt, MR-4894_branch0.23.txt, 
 MR-4894_branch0.23.txt, MR-4894_branch0.23.txt, MR-4894_trunk.txt, 
 MR-4894_trunk.txt, MR-4894.txt


 Equivalent of YARN-50 for JobHistory tokens.



[jira] [Created] (MAPREDUCE-4913) TestMRAppMaster#testMRAppMasterMissingStaging occasionally exits

2013-01-04 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-4913:
-

 Summary: TestMRAppMaster#testMRAppMasterMissingStaging 
occasionally exits
 Key: MAPREDUCE-4913
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4913
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.0.3-alpha, 0.23.6
Reporter: Jason Lowe


testMRAppMasterMissingStaging will sometimes cause the JVM to exit due to this 
error from AsyncDispatcher:

{noformat}
2013-01-05 02:14:54,682 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(137)) - Error in dispatcher thread
java.lang.Exception: No handler for registered for class org.apache.hadoop.mapreduce.jobhistory.EventType, cannot deliver EventType: AM_STARTED
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:132)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
at java.lang.Thread.run(Thread.java:662)
2013-01-05 02:14:54,682 INFO  [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(140)) - Exiting, bbye..
{noformat}

This can cause a build to fail because the test process exits without 
unregistering from surefire, which treats the exit as a build error rather 
than a test failure.
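
The failure mode can be reproduced in miniature. The sketch below is a 
hypothetical stand-in, not Hadoop's AsyncDispatcher: MiniDispatcher and its 
two event types are invented for illustration, and it throws where the log 
above shows the real dispatcher exiting. It suggests one possible remedy, 
which is for the test to register a handler for every event type that can be 
delivered. That is an assumption, not necessarily the fix for this issue.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical miniature dispatcher; not Hadoop's AsyncDispatcher. */
public class MiniDispatcher {
    public enum EventType { AM_STARTED, JOB_FINISHED }

    private final Map<EventType, Consumer<EventType>> handlers =
            new EnumMap<>(EventType.class);

    /** Registers the handler that receives events of the given type. */
    public void register(EventType type, Consumer<EventType> handler) {
        handlers.put(type, handler);
    }

    /** Delivers the event, or throws if nothing is registered for its type. */
    public void dispatch(EventType event) {
        Consumer<EventType> handler = handlers.get(event);
        if (handler == null) {
            // The sketch throws here so a test can observe the failure
            // instead of losing the whole JVM.
            throw new IllegalStateException("No handler registered for " + event);
        }
        handler.accept(event);
    }
}
```

A test that only needs the event to be swallowed could call 
register(EventType.AM_STARTED, e -> {}) before starting the code under test.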



[jira] [Created] (MAPREDUCE-4916) TestTrackerDistributedCacheManager is flaky due to other badly written tests

2013-01-04 Thread Arun C Murthy (JIRA)
Arun C Murthy created MAPREDUCE-4916:


 Summary: TestTrackerDistributedCacheManager is flaky due to other 
badly written tests
 Key: MAPREDUCE-4916
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4916
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Xuan Gong


Credit to Xuan for figuring this out: TestTrackerDistributedCacheManager is 
flaky due to other badly written tests, since it checks upfront for the 
existence of a directory that might have bad permissions.
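
One defensive pattern for this kind of cross-test interference is for a test 
to rebuild its working directory rather than trust whatever an earlier test 
left behind. The helper below is a sketch of that idea only; TestDirs and 
resetDir are hypothetical names, and this is not the patch for this issue.

```java
import java.io.File;

/** Hypothetical test helper; not part of the Hadoop test suite. */
public class TestDirs {
    /** Returns a fresh, empty directory at the given location. */
    public static File resetDir(File dir) {
        if (dir.exists()) {
            deleteRecursively(dir);
        }
        if (!dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return dir;
    }

    private static void deleteRecursively(File f) {
        // Restore owner permissions first, so that a directory left
        // unreadable or unwritable can still be listed and emptied.
        f.setReadable(true);
        f.setWritable(true);
        f.setExecutable(true);
        File[] children = f.listFiles(); // null for plain files
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!f.delete()) {
            throw new IllegalStateException("Could not delete " + f);
        }
    }
}
```

This only helps when the leftover files belong to the user running the tests; 
files owned by another user would still need to be cleaned up by the test 
that created them.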



[jira] [Created] (MAPREDUCE-4917) multiple BlockFixer should be supported in order to improve scalability and reduce too much work on single BlockFixer

2013-01-04 Thread Jun Jin (JIRA)
Jun Jin created MAPREDUCE-4917:
--

 Summary: multiple BlockFixer should be supported in order to 
improve scalability and reduce too much work on single BlockFixer
 Key: MAPREDUCE-4917
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4917
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Affects Versions: 0.22.0
Reporter: Jun Jin
Assignee: Jun Jin
 Fix For: 0.22.0


The current implementation can run only a single BlockFixer, since the fsck 
(in RaidDFSUtil.getCorruptFiles) only checks the whole DFS file system. If 
multiple BlockFixers are launched, they all do the same thing and try to fix 
the same files.
The change will be mainly in BlockFixer.java and 
RaidDFSUtil.getCorruptFiles(), to enable fsck to check the different paths 
defined in a separate Raid.xml for each RaidNode/BlockFixer.
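
The intended effect, each BlockFixer seeing only the corrupt files under its 
own configured paths, can be illustrated with a small filter. Note that this 
sketch filters an already collected list, whereas the proposal above is to 
restrict the fsck itself. ScopedCorruptFiles and forFixer are hypothetical 
names, not existing Raid code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-BlockFixer path scoping. */
public class ScopedCorruptFiles {
    /** Keeps only the corrupt files under one of the prefixes this fixer owns. */
    public static List<String> forFixer(List<String> corruptFiles,
                                        List<String> ownedPrefixes) {
        List<String> mine = new ArrayList<>();
        for (String file : corruptFiles) {
            for (String prefix : ownedPrefixes) {
                if (isUnder(file, prefix)) {
                    mine.add(file);
                    break; // avoid adding a file twice when prefixes overlap
                }
            }
        }
        return mine;
    }

    /** True if file is the prefix itself or lies below it as a directory. */
    private static boolean isUnder(String file, String prefix) {
        String dir = prefix.endsWith("/") ? prefix : prefix + "/";
        return file.equals(prefix) || file.startsWith(dir);
    }
}
```

Matching on whole path components keeps a fixer that owns /raid/a from also 
claiming files under /raid/ab, so two fixers with disjoint prefixes never try 
to repair the same file.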
