[jira] [Commented] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154675#comment-16154675 ] Junping Du commented on MAPREDUCE-6941: --- Sorry for missing the comments on this JIRA. I think Ray's comments make sense; I just missed the discussion on MAPREDUCE-6704. +1 on resolving this issue. > The default setting doesn't work for MapReduce job > -- > > Key: MAPREDUCE-6941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Junping Du >Priority: Blocker > > On the deployment of hadoop 3 cluster (based on current trunk branch) with > default settings, the MR job will get failed as following exceptions: > {noformat} > 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 > running in uber mode : false > 2017-08-16 13:00:03,847 INFO mapreduce.Job: map 0% reduce 0% > 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed > with state FAILED due to: Application application_1502913552390_0001 failed 2 > times due to AM Container for appattempt_1502913552390_0001_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from > container-launch. 
> Container id: container_1502913552390_0001_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:994) > at org.apache.hadoop.util.Shell.run(Shell.java:887) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is because mapreduce related jar are not added into yarn setup by > default. To make MR job run successful, we need to add following > configurations to yarn-site.xml now: > {noformat} > > yarn.application.classpath > > ... > /share/hadoop/mapreduce/*, > /share/hadoop/mapreduce/lib/* > ... > > {noformat} > But this config is not necessary for previous version of Hadoop. We should > fix this issue before beta release otherwise it will be a regression for > configuration changes. > This could be more like a YARN issue (if so, we should move), depends on how > we fix it finally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Resolved] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved MAPREDUCE-6941. Resolution: Not A Problem I'm going to close this based on Ray's analysis. Junping, if you disagree, please re-open the JIRA. > The default setting doesn't work for MapReduce job > -- > > Key: MAPREDUCE-6941 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-beta1 >Reporter: Junping Du >Priority: Blocker > > On the deployment of hadoop 3 cluster (based on current trunk branch) with > default settings, the MR job will get failed as following exceptions: > {noformat} > 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 > running in uber mode : false > 2017-08-16 13:00:03,847 INFO mapreduce.Job: map 0% reduce 0% > 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed > with state FAILED due to: Application application_1502913552390_0001 failed 2 > times due to AM Container for appattempt_1502913552390_0001_02 exited > with exitCode: 1 > Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from > container-launch. 
> Container id: container_1502913552390_0001_02_01 > Exit code: 1 > Stack trace: ExitCodeException exitCode=1: > at org.apache.hadoop.util.Shell.runCommand(Shell.java:994) > at org.apache.hadoop.util.Shell.run(Shell.java:887) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212) > at > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > This is because mapreduce related jar are not added into yarn setup by > default. To make MR job run successful, we need to add following > configurations to yarn-site.xml now: > {noformat} > > yarn.application.classpath > > ... > /share/hadoop/mapreduce/*, > /share/hadoop/mapreduce/lib/* > ... > > {noformat} > But this config is not necessary for previous version of Hadoop. We should > fix this issue before beta release otherwise it will be a regression for > configuration changes. > This could be more like a YARN issue (if so, we should move), depends on how > we fix it finally. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154189#comment-16154189 ] Jason Lowe commented on MAPREDUCE-5124: --- Ah, sorry, I thought we were still worrying about how to keep the AM from exploding. Sure, I could see a dynamic heartbeat still being useful once the flow control problem is addressed. Even with the current async processing without flow control, we could feed back to the task how long to wait until the next heartbeat (e.g., by leveraging the current AsyncDispatcher event queue size to scale the next task heartbeat interval accordingly), which could help avoid continued heartbeat pileups for large jobs. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks.
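The suggestion in the comment above, scaling the next heartbeat interval by the dispatcher backlog, could look roughly like the following. This is only a sketch under stated assumptions: the class `HeartbeatIntervalScaler` and its parameters `baseIntervalMs`, `maxIntervalMs`, and `queueSizePerStep` are hypothetical, the scaling is linear by choice, and the backlog is passed in as a plain integer rather than read from `AsyncDispatcher`.

```java
// Hypothetical helper, not part of Hadoop: maps a pending-event count to the
// heartbeat interval the AM would hand back to a task.
final class HeartbeatIntervalScaler {
    private final long baseIntervalMs;
    private final long maxIntervalMs;
    private final int queueSizePerStep;

    HeartbeatIntervalScaler(long baseIntervalMs, long maxIntervalMs, int queueSizePerStep) {
        if (baseIntervalMs <= 0 || maxIntervalMs < baseIntervalMs || queueSizePerStep <= 0) {
            throw new IllegalArgumentException("invalid scaler parameters");
        }
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.queueSizePerStep = queueSizePerStep;
    }

    // Every queueSizePerStep pending events adds one base interval,
    // and the result is capped at maxIntervalMs.
    long nextIntervalMs(int pendingEvents) {
        long steps = Math.max(0, pendingEvents) / queueSizePerStep;
        if (steps >= maxIntervalMs / baseIntervalMs) {
            return maxIntervalMs; // also avoids overflow in the multiplication below
        }
        return Math.min(maxIntervalMs, baseIntervalMs + steps * baseIntervalMs);
    }
}
```

With a 3-second base, a 30-second cap, and one step per 1000 queued events, an empty queue keeps the base interval while a backlog of 2500 events stretches the interval to 9 seconds.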
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154117#comment-16154117 ] Miklos Szegedi commented on MAPREDUCE-5124: --- [~jlowe], I absolutely agree that the heartbeat should be synchronous, with no new call until the previous is processed and I also agree that the async RPC support is needed to process other important messages. This solves the graceful degradation issue. What I am saying is that once 10 mappers send these heartbeats and wait for them, there will be a delay processing them due to the server bottleneck, so the metric would reach the client later, unless we minimize the delay with either a server to client approach or a dynamic heartbeat interval. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154108#comment-16154108 ] Jason Lowe commented on MAPREDUCE-5124: --- bq. I think either the server needs to control the heartbeat to minimize the delay (indeed a too big a change), or the task needs to tweak the heartbeat interval based on the previous response time as Peter Bacsko has suggested. The issue here isn't that tasks are seeing a long delay in heartbeat response time and failing to react to that. The problem is the AM is accepting and quickly responding to them at a rate far higher than it can actually process them in the background AsyncDispatcher thread. In other words, by the time a task notices a significant delay in heartbeat processing time the AM has probably already started going into GC hell and it's likely too late to course-correct at that point. The only way to get reliable feedback on how long the processing is really taking is to make the heartbeat processing synchronous, so the task doesn't get a response until the processing has actually completed. Without async RPC call support, that has the issue of tying up the server handler threads which prevents more important calls from being processed in a timely manner. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. 
MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks.
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154072#comment-16154072 ] Miklos Szegedi commented on MAPREDUCE-5124: --- Thank you, [~jlowe], for the previous reply. Let me address your concerns there. You are right, doing an asynchronous call leveraging HADOOP-11552 is probably the smallest change possible in this case. What I was trying to solve is the theoretical problem of sending heartbeats with metrics from a large number of tasks, with graceful degradation, at interval T and with minimal delay D. The delay for a metric is {{D+T/2}} when read from the AM: it waited D amount of time in the queue, and once available it will be sampled with a mean delay of {{T/2}}. If the server controls the heartbeat, both graceful degradation and minimal delay are met: there is no queueing delay (D=0) because the heartbeat is processed right away. If the task controls the heartbeat, the average wait time adds to the delay of the current metrics, so any consumer will get those later. Indeed, server control would also mean making the client socket connection act as an RPC server, which is quite a big change. I think either the server needs to control the heartbeat to minimize the delay (indeed too big a change), or the task needs to tweak the heartbeat interval based on the previous response time, as [~pbacsko] has suggested. The second option could be implemented on top of HADOOP-11552. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. 
If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks.
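The delay estimate in the comment above, D + T/2, can be worked through with numbers. The snippet below only evaluates that expression; the class name `MetricDelay` and any sample values used with it are illustrative assumptions, not measurements from a real job.

```java
// Hypothetical helper that evaluates the D + T/2 estimate from the comment.
final class MetricDelay {
    // queueDelayMs is D (time spent waiting in the AM queue);
    // heartbeatIntervalMs is T (a metric is sampled on average T/2 after it changes).
    static double meanDelayMs(double queueDelayMs, double heartbeatIntervalMs) {
        return queueDelayMs + heartbeatIntervalMs / 2.0;
    }
}
```

For example, with T = 3000 ms and D = 500 ms the estimate is 2000 ms; in the server-controlled case, where D = 0, it drops to 1500 ms.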
[jira] [Commented] (MAPREDUCE-6432) Fix typos in hadoop-mapreduce-project module
[ https://issues.apache.org/jira/browse/MAPREDUCE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154023#comment-16154023 ] Hadoop QA commented on MAPREDUCE-6432: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} MAPREDUCE-6432 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | MAPREDUCE-6432 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12744761/MAPREDUCE-6432.001.patch | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7122/console | | Powered by | Apache Yetus 0.5.0 http://yetus.apache.org | This message was automatically generated. > Fix typos in hadoop-mapreduce-project module > > > Key: MAPREDUCE-6432 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6432 > Project: Hadoop Map/Reduce > Issue Type: Task >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Neelesh Srinivas Salian >Priority: Minor > Labels: supportability > Attachments: MAPREDUCE-6432.001.patch > > > Fix a bunch of typos in comments, strings, variable names, and method names > in the hadoop-mapreduce-project module. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6432) Fix typos in hadoop-mapreduce-project module
[ https://issues.apache.org/jira/browse/MAPREDUCE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154005#comment-16154005 ] Hadoop QA commented on MAPREDUCE-6432: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s{color} | {color:red} MAPREDUCE-6432 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | MAPREDUCE-6432 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12744761/MAPREDUCE-6432.001.patch | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7121/console | | Powered by | Apache Yetus 0.5.0 http://yetus.apache.org | This message was automatically generated. > Fix typos in hadoop-mapreduce-project module > > > Key: MAPREDUCE-6432 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6432 > Project: Hadoop Map/Reduce > Issue Type: Task >Affects Versions: 2.7.1 >Reporter: Ray Chiang >Assignee: Neelesh Srinivas Salian >Priority: Minor > Labels: supportability > Attachments: MAPREDUCE-6432.001.patch > > > Fix a bunch of typos in comments, strings, variable names, and method names > in the hadoop-mapreduce-project module. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153860#comment-16153860 ] Haibo Chen commented on MAPREDUCE-6441: --- bq. but I haven't managed to get it to fail with the old code My understanding is that the new test is supposed to fail with the old code and the new change is supposed to fix the test failure. Otherwise, the new test is not testing any new behavior, right? > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 
13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. 
> AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
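The collision reported above can be reproduced in miniature. In the sketch below, two generators seeded with the same timestamp (standing in for two processes that start at the same moment) hand out identical directory names, while appending a random UUID makes a clash very unlikely. The class and method names are hypothetical, and the UUID suffix only mirrors the "jobid-plus-uuid" idea in one attachment's file name; it is not necessarily what the committed patch does.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration, not the LocalDistributedCacheManager code.
final class LocalDirNames {
    // Timestamp-seeded counter, as in the quoted snippet: unique only within one JVM.
    static String timestampName(AtomicLong generator) {
        return Long.toString(generator.incrementAndGet());
    }

    // Assumed alternative: add a random UUID so separate processes do not collide.
    static String uuidName(AtomicLong generator) {
        return generator.incrementAndGet() + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long seed = System.currentTimeMillis();
        AtomicLong processA = new AtomicLong(seed); // first process
        AtomicLong processB = new AtomicLong(seed); // second process, same start time
        System.out.println("timestamp names equal: "
            + timestampName(processA).equals(timestampName(processB)));
        System.out.println("uuid names equal: "
            + uuidName(processA).equals(uuidName(processB)));
    }
}
```

The counter protects threads inside one JVM because `incrementAndGet()` is atomic, but it does nothing for separate processes, which each build their own `AtomicLong` from the clock.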
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153610#comment-16153610 ] Jason Lowe commented on MAPREDUCE-5124: --- Turning on the RPC backoff feature alone will not be enough, as the call queues aren't backing up today. We'd have to change the processing of the heartbeat to be synchronously processed by the IPC server handler thread rather than thrown on the AsyncDispatcher event queue as it's done today. That means we'll quickly start tying up server handler threads for large jobs, and that will end up choking out more important method calls like task assignment, task completion, etc. It would probably work but be far from ideal when things start to become congested. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153432#comment-16153432 ] Peter Bacsko edited comment on MAPREDUCE-5124 at 9/5/17 10:37 AM: -- Just a question - we already have https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this feature inside the MRAppMaster when it creates the RPC server for TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase the heartbeat interval, let's say double it. If it succeeds after a couple of reports, we can try to decrease it again, back to the original value. This might not be the best flow control method, but we can think about this. was (Author: pbacsko): Just a question - we already have https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this feature inside the MRAppMaster when it creates the RCP server for TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase the heartbeat interval, let's say double it. If it succeeds after a couple of reports, we can try to decrease it again, back to the original value. This might not be the best flow control method, but we can think about this. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. 
MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks.
[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153432#comment-16153432 ] Peter Bacsko commented on MAPREDUCE-5124: - Just a question - we already have https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this feature inside the MRAppMaster when it creates the RPC server for TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase the heartbeat interval, let's say double it. If it succeeds after a couple of reports, we can try to decrease it again, back to the original value. This might not be the best flow control method, but we can think about this. > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Haibo Chen > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks.
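The back-off policy proposed in the comment above (double the interval when the server pushes back, then step down again after a couple of successful reports) might be sketched like this. Everything here is an assumption for illustration: the class `AdaptiveHeartbeat`, its parameters, and the choice to halve on recovery are hypothetical, and the code does not touch the real `TaskReporter` or catch `RetriableException` itself; the caller is expected to invoke `onBackoff()` where that exception would be handled.

```java
// Hypothetical policy object, not part of Hadoop.
final class AdaptiveHeartbeat {
    private final long baseIntervalMs;
    private final long maxIntervalMs;
    private final int successesBeforeDecrease;
    private long currentMs;
    private int consecutiveSuccesses;

    AdaptiveHeartbeat(long baseIntervalMs, long maxIntervalMs, int successesBeforeDecrease) {
        if (baseIntervalMs <= 0 || maxIntervalMs < baseIntervalMs || successesBeforeDecrease <= 0) {
            throw new IllegalArgumentException("invalid heartbeat parameters");
        }
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.successesBeforeDecrease = successesBeforeDecrease;
        this.currentMs = baseIntervalMs;
    }

    // Call when the server asked the task to retry later: double, up to the cap.
    void onBackoff() {
        consecutiveSuccesses = 0;
        currentMs = (currentMs > maxIntervalMs / 2) ? maxIntervalMs : currentMs * 2;
    }

    // Call after a heartbeat was accepted: after enough successes in a row,
    // halve the interval, never going below the original value.
    void onSuccess() {
        if (currentMs == baseIntervalMs) {
            consecutiveSuccesses = 0;
            return;
        }
        consecutiveSuccesses++;
        if (consecutiveSuccesses >= successesBeforeDecrease) {
            consecutiveSuccesses = 0;
            currentMs = Math.max(baseIntervalMs, currentMs / 2);
        }
    }

    long intervalMs() {
        return currentMs;
    }
}
```

Stepping down gradually rather than jumping straight back to the base interval is a design choice of this sketch; it avoids a burst of heartbeats from every task the moment congestion eases.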
[jira] [Updated] (MAPREDUCE-6760) LocatedFileStatusFetcher to use listFiles(recursive)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated MAPREDUCE-6760: -- Target Version/s: 3.1.0 > LocatedFileStatusFetcher to use listFiles(recursive) > > > Key: MAPREDUCE-6760 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6760 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 2.8.0 >Reporter: Steve Loughran > > {{LocatedFileStatusFetcher}} does parallelized path listing, but it does make > recursive calls on every subdir. > If we could switch it to use {{FileSystem.listFiles(recursive)}}, object > stores that have high-performance implementations of that operation would see > significant speedup. > HADOOP-13208 implements that for S3A; Azure, swift &c can do the same. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org