[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694624#comment-14694624 ] Steve Loughran commented on MAPREDUCE-6443: --- in HADOOP-12313 I've proposed making JvmPauseMonitor a subclass of AbstractService; this will make it trivial to add the monitor to any CompositeService, including JHS. While I think the pause monitor is a great idea, it needs to become part of the YARN service model to be avoid patching in something with a different lifecycle and failure modes. Add JvmPauseMonitor to Job History Server - Key: MAPREDUCE-6443 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 2.8.0 Reporter: Robert Kanter Assignee: Robert Kanter Fix For: 2.8.0 Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693125#comment-14693125 ] shuzhangyao commented on MAPREDUCE-6447: yes reduce shuffle throws java.lang.OutOfMemoryError: Java heap space --- Key: MAPREDUCE-6447 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1 Reporter: shuzhangyao Assignee: shuzhangyao Priority: Minor 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:56) at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:46) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.init(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692985#comment-14692985 ] shuzhangyao commented on MAPREDUCE-6447: the default mapreduce.reduce.shuffle.input.buffer.percent is 0.9,so memorylimit = Runtime.getRuntime().maxMemory() * 0.9 . 0.9*0.25*5=1.1251. reduce shuffle throws java.lang.OutOfMemoryError: Java heap space --- Key: MAPREDUCE-6447 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1 Reporter: shuzhangyao Assignee: shuzhangyao Priority: Minor 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:56) at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:46) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.init(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692984#comment-14692984 ] shuzhangyao commented on MAPREDUCE-6447: the default mapreduce.reduce.shuffle.input.buffer.percent is 0.9,so memorylimit = Runtime.getRuntime().maxMemory() * 0.9 . 0.9*0.25*5=1.1251. reduce shuffle throws java.lang.OutOfMemoryError: Java heap space --- Key: MAPREDUCE-6447 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1 Reporter: shuzhangyao Assignee: shuzhangyao Priority: Minor 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:56) at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:46) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.init(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693058#comment-14693058 ] Tsuyoshi Ozawa commented on MAPREDUCE-6447: --- Hi guys, thank you for reporting this issue. Do you mean that should we fix the default value to avoid the exception on this jira? reduce shuffle throws java.lang.OutOfMemoryError: Java heap space --- Key: MAPREDUCE-6447 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1 Reporter: shuzhangyao Assignee: shuzhangyao Priority: Minor 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:56) at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:46) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.init(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6440) Duplicate Key in Json Output for Job details
[ https://issues.apache.org/jira/browse/MAPREDUCE-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693169#comment-14693169 ] Bibin A Chundatt commented on MAPREDUCE-6440: - [~zxu] any thoughts on this issue ? Duplicate Key in Json Output for Job details Key: MAPREDUCE-6440 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6440 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Anushri Assignee: Bibin A Chundatt Priority: Minor Duplicate key in Json Output for Job details for the url : http://jhs_ip:jhs_port/ws/v1/history/mapreduce/jobs/job_id/tasks/task_id/attempts If the task type is REDUCE the json output for this url contains duplicate key for type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6447) reduce shuffle throws java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/MAPREDUCE-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692986#comment-14692986 ] shuzhangyao commented on MAPREDUCE-6447: the default mapreduce.reduce.shuffle.input.buffer.percent is 0.9,so memorylimit = Runtime.getRuntime().maxMemory() * 0.9 . 0.9*0.25*5=1.1251. reduce shuffle throws java.lang.OutOfMemoryError: Java heap space --- Key: MAPREDUCE-6447 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6447 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.7.1 Reporter: shuzhangyao Assignee: shuzhangyao Priority: Minor 2015-08-11 14:03:54,550 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#10 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:56) at org.apache.hadoop.io.BoundedByteArrayOutputStream.init(BoundedByteArrayOutputStream.java:46) at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.init(InMemoryMapOutput.java:63) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:303) at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:293) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:511) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:329) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693807#comment-14693807 ] Chris Douglas commented on MAPREDUCE-5817: -- bq. The current patch skips re-running mappers only if all reducers are complete. So I don't think reducers will fail beyond that point? Did I understand your question right? I see; sorry, I hadn't read the rest of the JIRA carefully. That's a fairly narrow window, isn't it? We may not need an extra state, if we kill all running maps when the last reducer completes. The condition this adds prevents new maps from being scheduled while cleanup/commit code is running. Minor: could {{allReducersComplete()}} call {{getCompletedReduces()}}? +1 on the patch mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it just reschedules all mappers that already ran on the node in all cases. Therefore, if any node transition has a potential to extend the job period. Once this window opens, another node transition can prolong it, and this can happen indefinitely in theory. If there is some instability in the pool (unhealthy, etc.) for a duration, then any big job is severely vulnerable to this problem. If all reducers have been completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks. If all reducers are completed, the mapper outputs are no longer needed, and there is no need to reschedule mapper tasks as they would not be consumed anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)