[jira] [Created] (MAPREDUCE-4852) Reducer should not signal fetch failures for disk errors on the reducer's side

2012-12-06 Thread Jason Lowe (JIRA)
Jason Lowe created MAPREDUCE-4852:

 Summary: Reducer should not signal fetch failures for disk errors on the reducer's side
 Key: MAPREDUCE-4852
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4852
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Jason Lowe


Ran across a case where a reducer ran on a node where the disks were full, 
leading to an exception like this during the shuffle fetch:

{noformat}
2012-12-05 09:07:28,749 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.MergeManager: attempt_1352354913026_138167_m_000654_0: Shuffling to disk since 235056188 is greater than maxSingleShuffleLimit (155104064)
2012-12-05 09:07:28,755 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#25 failed to read map headerattempt_1352354913026_138167_m_000654_0 decomp: 235056188, 101587629
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/attempt_1352354913026_138167_r_000189_0/map_654.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:398)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
    at org.apache.hadoop.mapred.YarnOutputFiles.getInputFileForWrite(YarnOutputFiles.java:213)
    at org.apache.hadoop.mapreduce.task.reduce.MapOutput.init(MapOutput.java:81)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:245)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:348)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:283)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:155)
2012-12-05 09:07:28,755 WARN [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.Fetcher: copyMapOutput failed for tasks [attempt_1352354913026_138167_m_000654_0]
2012-12-05 09:07:28,756 INFO [fetcher#25] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Reporting fetch failure for attempt_1352354913026_138167_m_000654_0 to jobtracker.
{noformat}

Even though the error was local to the reducer, the reducer reported it to the 
AM as a fetch failure rather than failing itself.  It then ran into the same 
error for many other maps, causing those maps to be relaunched because of the 
reported fetch failures.  In this case it would have been better to fail the 
reducer and try another node rather than blame the mappers for what is an 
error on the reducer's side.
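
The behaviour asked for above could be sketched roughly as follows. This is 
only an illustration and not the actual Fetcher or ShuffleScheduler code: 
LocalDiskException is a hypothetical stand-in for DiskErrorException, and the 
Action enum and classify method are names assumed for this example.

```java
import java.io.IOException;

/** Sketch: decide who is to blame for a failed shuffle copy. */
final class ShuffleErrorClassifier {

    /** Hypothetical stand-in for Hadoop's DiskChecker.DiskErrorException. */
    static final class LocalDiskException extends IOException {
        LocalDiskException(String message) {
            super(message);
        }
    }

    /** Hypothetical outcomes, assumed for illustration only. */
    enum Action { FAIL_REDUCER, REPORT_FETCH_FAILURE }

    /** Walks the cause chain looking for an error raised by the local disk layer. */
    static Action classify(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof LocalDiskException) {
                // The reducer's own disks are the problem: do not blame the map.
                return Action.FAIL_REDUCER;
            }
        }
        // Anything else is treated as a problem fetching from the map side.
        return Action.REPORT_FETCH_FAILURE;
    }

    public static void main(String[] args) {
        Throwable local = new LocalDiskException("Could not find any valid local directory");
        Throwable other = new IOException("hypothetical non-disk error");
        System.out.println(classify(local));
        System.out.println(classify(other));
    }
}
```

Checking the whole cause chain, rather than only the top-level exception, is a 
design choice made here so that a wrapped local disk error is still attributed 
to the reducer.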

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-4854) TestRumenJobTraces is broken in branch-1

2012-12-06 Thread Arun C Murthy (JIRA)
Arun C Murthy created MAPREDUCE-4854:


 Summary: TestRumenJobTraces is broken in branch-1
 Key: MAPREDUCE-4854
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4854
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 1.1.2


TestRumenJobTraces is broken in branch-1. The 'gold' events it checks against 
are broken and need to be fixed.



[jira] [Resolved] (MAPREDUCE-4854) TestRumenJobTraces is broken in branch-1

2012-12-06 Thread Arun C Murthy (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-4854.

Resolution: Cannot Reproduce

Sorry, this looks like my error; I was looking at the wrong branch.

 TestRumenJobTraces is broken in branch-1
 

 Key: MAPREDUCE-4854
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4854
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy

 TestRumenJobTraces is broken in branch-1. The 'gold' events it checks 
 against are broken and need to be fixed.



[jira] [Created] (MAPREDUCE-4855) Modify Security Conditional that check for KERBEROS

2012-12-06 Thread Robert Parker (JIRA)
Robert Parker created MAPREDUCE-4855:


 Summary: Modify Security Conditional that check for KERBEROS
 Key: MAPREDUCE-4855
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4855
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Reporter: Robert Parker
Assignee: Robert Parker


To support PLAIN authentication, the security checks should disallow specific 
authentication types (TOKEN, for token delegation) instead of allowing only 
KERBEROS.
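
The difference between the two styles of check can be sketched as below. This 
is a minimal illustration, not the actual Hadoop code: the AuthMethod enum is 
a hypothetical subset named after the methods mentioned in this issue, and 
both method names are assumptions.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch: allow-list versus deny-list style authentication checks. */
final class AuthCheckSketch {

    /** Hypothetical subset of authentication methods, named after those in the issue. */
    enum AuthMethod { KERBEROS, PLAIN, TOKEN }

    /** Allow-list style: only KERBEROS passes, so PLAIN is rejected too. */
    static boolean allowOnlyKerberos(AuthMethod method) {
        return method == AuthMethod.KERBEROS;
    }

    /** Methods that must not pass the check (assumed: just TOKEN). */
    private static final Set<AuthMethod> DISALLOWED = EnumSet.of(AuthMethod.TOKEN);

    /** Deny-list style: everything passes except the disallowed methods. */
    static boolean allowUnlessDisallowed(AuthMethod method) {
        return !DISALLOWED.contains(method);
    }

    public static void main(String[] args) {
        for (AuthMethod m : AuthMethod.values()) {
            System.out.println(m
                    + " allowOnlyKerberos=" + allowOnlyKerberos(m)
                    + " allowUnlessDisallowed=" + allowUnlessDisallowed(m));
        }
    }
}
```

With the deny-list form, a newly supported method such as PLAIN passes without 
further changes, while TOKEN is still rejected.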



[jira] [Created] (MAPREDUCE-4857) Fix 126 error during map/reduce phase

2012-12-06 Thread Fengdong Yu (JIRA)
Fengdong Yu created MAPREDUCE-4857:

 Summary: Fix 126 error during map/reduce phase
 Key: MAPREDUCE-4857
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4857
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.0.4
Reporter: Fengdong Yu
 Fix For: 1.0.4


On rare occasions a task fails during the map or reduce phase (mostly in the 
map phase) with the following exception:
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 126.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)

The error logs are cleaned up, so this is very hard to debug.

Comparing DefaultTaskController.java with 0.22, I found that 0.22 starts the 
job script with the arguments bash, command, whereas 1.0.4 uses bash, -c, 
command.

After I removed -c, everything worked and the 126 exit status never occurred 
again.

My reading of the bash man page is that when one thread is forked to write 
the command, another thread running bash -c also holds a writable fd, so 
bash can occasionally return status 126.
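
For context, bash documents exit status 126 for a command that is found but 
cannot be executed. The sketch below shows how the two invocation styles can 
differ: "bash file" only reads the file as a script, while "bash -c file" 
tries to execute it. It uses a script without execute permission as a 
deterministic stand-in; it does not reproduce the intermittent failure 
described above, whose cause is not established here. The class and method 
names are assumptions, and it needs bash on the PATH and Java 11 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Sketch: exit status of "bash script" versus "bash -c script". */
final class BashInvocationSketch {

    /** Runs a command, discards its output, and returns its exit status. */
    private static int exitStatus(List<String> command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)
                .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                .start();
        return process.waitFor();
    }

    /** Returns {status of "bash script", status of "bash -c script"}. */
    static int[] compareInvocations() {
        try {
            // A trivial script that is readable but deliberately not executable.
            Path script = Files.createTempFile("taskjvm", ".sh");
            try {
                Files.writeString(script, "exit 0\n");
                script.toFile().setExecutable(false, false);
                int direct = exitStatus(List.of("bash", script.toString()));
                int viaDashC = exitStatus(List.of("bash", "-c", script.toString()));
                return new int[] {direct, viaDashC};
            } finally {
                Files.deleteIfExists(script);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] statuses = compareInvocations();
        System.out.println("bash <script>:    " + statuses[0]);
        System.out.println("bash -c <script>: " + statuses[1]);
    }
}
```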




[jira] [Created] (MAPREDUCE-4858) TestWebUIAuthorization fails on branch-1

2012-12-06 Thread Arun C Murthy (JIRA)
Arun C Murthy created MAPREDUCE-4858:


 Summary: TestWebUIAuthorization fails on branch-1
 Key: MAPREDUCE-4858
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4858
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy
Assignee: Arun C Murthy


TestWebUIAuthorization fails on branch-1



[jira] [Resolved] (MAPREDUCE-4858) TestWebUIAuthorization fails on branch-1

2012-12-06 Thread Arun C Murthy (JIRA)

 [ https://issues.apache.org/jira/browse/MAPREDUCE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-4858.

   Resolution: Fixed
Fix Version/s: 1.1.2

Thanks Mahadev & Vinod. I committed this for Matt to pick up for 1.1.2.

 TestWebUIAuthorization fails on branch-1
 

 Key: MAPREDUCE-4858
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4858
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 1.1.2

 Attachments: MAPREDUCE-4858.patch, MAPREDUCE-4858.patch


 TestWebUIAuthorization fails on branch-1
