[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server

2015-08-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694624#comment-14694624
 ] 

Steve Loughran commented on MAPREDUCE-6443:
---

In HADOOP-12313 I've proposed making JvmPauseMonitor a subclass of 
AbstractService; this will make it trivial to add the monitor to any 
CompositeService, including the JHS. While I think the pause monitor is a great 
idea, it needs to become part of the YARN service model to avoid patching in 
something with a different lifecycle and different failure modes.
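
To illustrate the lifecycle argument, here is a minimal, self-contained sketch. It does not use Hadoop's real AbstractService or CompositeService classes; every type below (SimpleService, PauseMonitorSketch, CompositeSketch) is a hypothetical stand-in, and stopping children in reverse order is a choice made for this sketch. The point is only that a monitor which shares the parent's lifecycle interface is started and stopped together with everything else, with no separate wiring.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceLifecycleSketch {

    /** Hypothetical lifecycle contract; NOT Hadoop's Service interface. */
    interface SimpleService {
        void start();
        void stop();
        boolean isRunning();
    }

    /** Hypothetical pause monitor that only tracks whether it is running. */
    static class PauseMonitorSketch implements SimpleService {
        private boolean running;

        @Override public void start() { running = true; }
        @Override public void stop() { running = false; }
        @Override public boolean isRunning() { return running; }
    }

    /** Hypothetical composite: starts children in order, stops them in reverse. */
    static class CompositeSketch implements SimpleService {
        private final List<SimpleService> children = new ArrayList<>();
        private boolean running;

        void addService(SimpleService service) { children.add(service); }

        @Override public void start() {
            for (SimpleService child : children) {
                child.start();
            }
            running = true;
        }

        @Override public void stop() {
            for (int i = children.size() - 1; i >= 0; i--) {
                children.get(i).stop();
            }
            running = false;
        }

        @Override public boolean isRunning() { return running; }
    }

    public static void main(String[] args) {
        PauseMonitorSketch monitor = new PauseMonitorSketch();
        CompositeSketch historyServer = new CompositeSketch();
        historyServer.addService(monitor);

        historyServer.start();
        System.out.println("monitor running after start: " + monitor.isRunning());
        historyServer.stop();
        System.out.println("monitor running after stop: " + monitor.isRunning());
    }
}
```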

 Add JvmPauseMonitor to Job History Server
 -

 Key: MAPREDUCE-6443
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 2.8.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch


 We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History 
 Server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space

2015-08-12 Thread shuzhangyao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693125#comment-14693125
 ] 

shuzhangyao commented on MAPREDUCE-6447:


yes

 reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
 ---

 Key: MAPREDUCE-6447
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1
Reporter: shuzhangyao
Assignee: shuzhangyao
Priority: Minor

 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10
   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
   at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
   at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303)
   at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293)
   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511)
   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329)
   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)





[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space

2015-08-12 Thread shuzhangyao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692985#comment-14692985
 ] 

shuzhangyao commented on MAPREDUCE-6447:


The default mapreduce.reduce.shuffle.input.buffer.percent is 0.9, so memorylimit 
= Runtime.getRuntime().maxMemory() * 0.9.
0.9 * 0.25 * 5 = 1.125 > 1.
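
The arithmetic above can be checked with a small sketch. The comment does not name the second and third factors; this sketch assumes 0.25 is the per-fetch in-memory cap (mapreduce.reduce.shuffle.memory.limit.percent) and 5 is the number of parallel fetchers (mapreduce.reduce.shuffle.parallelcopies). The class and method names are hypothetical, and the product is only a rough upper bound on what concurrent fetchers could reserve, not a model of the actual reservation logic.

```java
public class ShuffleMemorySketch {

    /**
     * Rough worst-case fraction of the JVM heap reserved when every fetcher
     * holds a maximum-size in-memory map output at the same time.
     * The parameter meanings are assumptions (see the lead-in).
     */
    static double worstCaseHeapFraction(double inputBufferPercent,
                                        double singleShuffleLimitPercent,
                                        int parallelFetchers) {
        return inputBufferPercent * singleShuffleLimitPercent * parallelFetchers;
    }

    public static void main(String[] args) {
        double fraction = worstCaseHeapFraction(0.9, 0.25, 5);
        System.out.printf("worst-case fraction of heap: %.3f%n", fraction);
        System.out.println("exceeds the heap: " + (fraction > 1.0));
    }
}
```

Under these assumptions, lowering any one factor so that the product stays below 1 would keep the bound within the heap.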




[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space

2015-08-12 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693058#comment-14693058
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-6447:
---

Hi guys, thank you for reporting this issue. Do you mean that we should fix the 
default value in this JIRA to avoid the exception?



[jira] [Commented] (MAPREDUCE-6440) Duplicate Key in Json Output for Job details

2015-08-12 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693169#comment-14693169
 ] 

Bibin A Chundatt commented on MAPREDUCE-6440:
-

[~zxu] any thoughts on this issue?

 Duplicate Key in Json Output for Job details
 

 Key: MAPREDUCE-6440
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6440
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Anushri
Assignee: Bibin A Chundatt
Priority: Minor

 The JSON output for job details contains a duplicate key for the URL:
 http://jhs_ip:jhs_port/ws/v1/history/mapreduce/jobs/job_id/tasks/task_id/attempts
 If the task type is REDUCE, the JSON output for this URL contains a duplicate 
 "type" key.






[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed

2015-08-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693807#comment-14693807
 ] 

Chris Douglas commented on MAPREDUCE-5817:
--

bq. The current patch skips re-running mappers only if all reducers are 
complete. So I don't think reducers will fail beyond that point? Did I 
understand your question right?

I see; sorry, I hadn't read the rest of the JIRA carefully. That's a fairly 
narrow window, isn't it? We may not need an extra state, if we kill all running 
maps when the last reducer completes. The condition this adds prevents new maps 
from being scheduled while cleanup/commit code is running.

Minor: could {{allReducersComplete()}} call {{getCompletedReduces()}}?

+1 on the patch

 mappers get rescheduled on node transition even after all reducers are 
 completed
 

 Key: MAPREDUCE-5817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch


 We're seeing a behavior where a job runs long after all reducers were already 
 finished. We found that the job was rescheduling and running a number of 
 mappers beyond the point of reducer completion. In one situation, the job ran 
 for some 9 more hours after all reducers completed!
 This happens because whenever a node transition (to an unusable state) comes 
 into the app master, it just reschedules all mappers that already ran on the 
 node in all cases.
 Therefore, any node transition has the potential to extend the job period. 
 Once this window opens, another node transition can prolong it, and this can 
 happen indefinitely in theory.
 If there is some instability in the pool (unhealthy, etc.) for a duration, 
 then any big job is severely vulnerable to this problem.
 If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
 reschedule mapper tasks. If all reducers are completed, the mapper outputs 
 are no longer needed, and there is no need to reschedule mapper tasks as they 
 would not be consumed anyway.
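
 The proposed guard can be sketched as follows. This is not the code from the attached patch: the class RescheduleGuardSketch, its fields, and mapsToReschedule() are hypothetical, and only the method names allReducersComplete() and getCompletedReduces() come from the discussion on this issue. The sketch also follows the suggestion that allReducersComplete() be written in terms of getCompletedReduces().

```java
public class RescheduleGuardSketch {
    private final int totalReduces;
    private int completedReduces;

    RescheduleGuardSketch(int totalReduces) {
        this.totalReduces = totalReduces;
    }

    /** Record that one more reducer has finished. */
    void reduceCompleted() {
        completedReduces++;
    }

    int getCompletedReduces() {
        return completedReduces;
    }

    /** True once every reducer of the job has finished. */
    boolean allReducersComplete() {
        return getCompletedReduces() == totalReduces;
    }

    /**
     * Number of already-succeeded maps to rerun when their node becomes
     * unusable. Once all reducers are done, the map outputs have no
     * remaining consumer, so nothing is rescheduled.
     */
    int mapsToReschedule(int succeededMapsOnNode) {
        return allReducersComplete() ? 0 : succeededMapsOnNode;
    }

    public static void main(String[] args) {
        RescheduleGuardSketch job = new RescheduleGuardSketch(2);
        job.reduceCompleted();
        System.out.println("one reducer left: " + job.mapsToReschedule(3));
        job.reduceCompleted();
        System.out.println("all reducers done: " + job.mapsToReschedule(3));
    }
}
```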


