[jira] [Created] (MAPREDUCE-4852) Reducer should not signal fetch failures for disk errors on the reducer's side
Jason Lowe created MAPREDUCE-4852:

Summary: Reducer should not signal fetch failures for disk errors on the reducer's side
Key: MAPREDUCE-4852
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4852
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Reporter: Jason Lowe

Ran across a case where a reducer ran on a node where the disks were full, leading to an exception like this during the shuffle fetch:

{noformat}
2012-12-05 09:07:28,749 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.MergeManager: attempt_1352354913026_138167_m_000654_0: Shuffling to disk since 235056188 is greater than maxSingleShuffleLimit (155104064)
2012-12-05 09:07:28,755 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#25 failed to read map headerattempt_1352354913026_138167_m_000654_0 decomp: 235056188, 101587629
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1352354913026_138167_r_000189_0/map_654.out
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
	at org.apache.hadoop.mapred.YarnOutputFiles.getInputFileForWrite(YarnOutputFiles.java:213)
	at org.apache.hadoop.mapreduce.task.reduce.MapOutput.init(MapOutput.java:81)
	at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:245)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:348)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:283)
	at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:155)
2012-12-05 09:07:28,755 WARN [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.Fetcher: copyMapOutput failed for tasks [attempt_1352354913026_138167_m_000654_0]
2012-12-05 09:07:28,756 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Reporting fetch failure for attempt_1352354913026_138167_m_000654_0 to jobtracker.
{noformat}

Even though the error was local to the reducer, the reducer reported it to the AM as a fetch failure rather than failing itself. It then ran into the same error for many other maps, and the reported fetch failures caused those maps to be relaunched. In this case it would have been better to fail the reducer and try another node rather than blame the mapper for what is an error on the reducer's side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
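The distinction requested in MAPREDUCE-4852 can be sketched roughly as follows. This is only an illustration of the idea, not the actual Fetcher or ShuffleScheduler code: the class ShuffleErrorSketch, the LocalDiskException type (a stand-in for org.apache.hadoop.util.DiskChecker.DiskErrorException), the Action enum, and the classify method are all hypothetical names introduced here.

```java
import java.io.IOException;

public class ShuffleErrorSketch {

    /** Hypothetical stand-in for Hadoop's DiskChecker.DiskErrorException. */
    static class LocalDiskException extends IOException {
        LocalDiskException(String message) {
            super(message);
        }
    }

    /** Hypothetical outcomes for an error seen while copying map output. */
    enum Action { FAIL_REDUCER, REPORT_FETCH_FAILURE }

    /**
     * Walks the cause chain. If any link is a local disk error, the problem
     * is on the reducer's side, so the reducer should fail; otherwise the
     * error is treated as a fetch failure against the map output.
     */
    static Action classify(IOException error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof LocalDiskException) {
                return Action.FAIL_REDUCER;
            }
        }
        return Action.REPORT_FETCH_FAILURE;
    }

    public static void main(String[] args) {
        IOException local =
            new LocalDiskException("Could not find any valid local directory");
        IOException remote = new IOException("connection reset by map host");
        System.out.println(classify(local));
        System.out.println(classify(remote));
    }
}
```

With a split like this, a full disk on the reducer's node would cost one reducer attempt instead of triggering relaunches of otherwise healthy map attempts.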
[jira] [Created] (MAPREDUCE-4854) TestRumenJobTraces is broken in branch-1
Arun C Murthy created MAPREDUCE-4854:

Summary: TestRumenJobTraces is broken in branch-1
Key: MAPREDUCE-4854
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4854
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 1.1.2

TestRumenJobTraces is broken in branch-1. The 'gold' events it checks against are broken and need to be fixed.
[jira] [Resolved] (MAPREDUCE-4854) TestRumenJobTraces is broken in branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-4854.

Resolution: Cannot Reproduce

Sorry, this looks like my error: I was looking at the wrong branch.

TestRumenJobTraces is broken in branch-1
Key: MAPREDUCE-4854
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4854
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy

TestRumenJobTraces is broken in branch-1. The 'gold' events it checks against are broken and need to be fixed.
[jira] [Created] (MAPREDUCE-4855) Modify Security Conditional that check for KERBEROS
Robert Parker created MAPREDUCE-4855:

Summary: Modify Security Conditional that check for KERBEROS
Key: MAPREDUCE-4855
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4855
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: security
Reporter: Robert Parker
Assignee: Robert Parker

To support PLAIN authentication, checks should disallow certain types (TOKEN for token delegation) instead of allowing only KERBEROS.
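The change of conditional described in MAPREDUCE-4855 amounts to moving from an allow-list to a deny-list. The sketch below only illustrates that shape; the AuthMethod enum and both method names are hypothetical and are not Hadoop's actual types or checks.

```java
import java.util.EnumSet;
import java.util.Set;

public class AuthCheckSketch {

    /** Hypothetical authentication methods; not Hadoop's actual enum. */
    enum AuthMethod { KERBEROS, TOKEN, PLAIN }

    /** Old style: only KERBEROS passes, so PLAIN is rejected as a side effect. */
    static boolean allowedByAllowList(AuthMethod method) {
        return method == AuthMethod.KERBEROS;
    }

    /** Methods that must not be used for the guarded operation. */
    private static final Set<AuthMethod> DISALLOWED = EnumSet.of(AuthMethod.TOKEN);

    /** New style: everything passes except the explicitly disallowed methods. */
    static boolean allowedByDenyList(AuthMethod method) {
        return !DISALLOWED.contains(method);
    }

    public static void main(String[] args) {
        for (AuthMethod m : AuthMethod.values()) {
            System.out.println(m + ": allow-list=" + allowedByAllowList(m)
                + ", deny-list=" + allowedByDenyList(m));
        }
    }
}
```

Under the deny-list form, adding a new authentication method such as PLAIN does not require touching every conditional, only the set of disallowed methods where that is appropriate.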
[jira] [Created] (MAPREDUCE-4857) Fix 126 error during map/reduce phase
Fengdong Yu created MAPREDUCE-4857:

Summary: Fix 126 error during map/reduce phase
Key: MAPREDUCE-4857
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4857
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.0.4
Reporter: Fengdong Yu
Fix For: 1.0.4

This error occurs rarely during the map or reduce phase, mostly in the map phase. The exception message is:

java.lang.Throwable: Child Error
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

The error logs are cleaned up, so it is very hard to debug.

I compared DefaultTaskController.java with the one in 0.22. 0.22 starts the job script with "bash command", but 1.0.4 uses "bash -c command". After I removed -c, everything worked and the 126 exit code never occurred again.

From my reading of the bash man page, when a new thread is forked while the command file is open for writing, another thread running bash -c also holds a writable fd, so bash could occasionally return status 126.
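The two launch forms compared in MAPREDUCE-4857 differ in how bash treats the argument. With "bash file", bash reads the file as a script; with "bash -c string", bash runs the string as a command, so the file has to be executed directly, and bash documents exit status 126 for a command that is found but cannot be executed. The sketch below is one way to observe that difference outside Hadoop. BashLaunchSketch and its methods are hypothetical names, it assumes bash is on the PATH and a temp directory without spaces in its path, and it uses a script with no execute bit as a stand-in for a script that cannot be executed at launch time.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

public class BashLaunchSketch {

    /** "bash script": bash reads the file and interprets it itself. */
    static List<String> scriptFileForm(String scriptPath) {
        return Arrays.asList("bash", scriptPath);
    }

    /** "bash -c script": bash runs the path as a command, so it must be executable. */
    static List<String> dashCForm(String scriptPath) {
        return Arrays.asList("bash", "-c", scriptPath);
    }

    /** Runs the command with inherited stdio and returns its exit status. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Temp file with no execute bit set, containing a trivial script.
        File script = File.createTempFile("task", ".sh");
        script.deleteOnExit();
        Files.write(script.toPath(), "exit 0\n".getBytes(StandardCharsets.UTF_8));

        String path = script.getAbsolutePath();
        System.out.println("bash <script>    -> " + run(scriptFileForm(path)));
        // Expected to differ: the -c form cannot execute a non-executable file.
        System.out.println("bash -c <script> -> " + run(dashCForm(path)));
    }
}
```

This does not reproduce the intermittent failure from the report; it only shows why the -c form depends on the script being executable at the moment it is launched, while the plain form does not.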
[jira] [Created] (MAPREDUCE-4858) TestWebUIAuthorization fails on branch-1
Arun C Murthy created MAPREDUCE-4858:

Summary: TestWebUIAuthorization fails on branch-1
Key: MAPREDUCE-4858
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4858
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy

TestWebUIAuthorization fails on branch-1
[jira] [Resolved] (MAPREDUCE-4858) TestWebUIAuthorization fails on branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-4858.

Resolution: Fixed
Fix Version/s: 1.1.2

Thanks Mahadev Vinod. I committed this for Matt to pick up for 1.1.2.

TestWebUIAuthorization fails on branch-1
Key: MAPREDUCE-4858
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4858
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 1.1.2
Attachments: MAPREDUCE-4858.patch, MAPREDUCE-4858.patch

TestWebUIAuthorization fails on branch-1