[jira] [Commented] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job

2017-09-05 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154675#comment-16154675
 ] 

Junping Du commented on MAPREDUCE-6941:
---

Sorry for missing the comments on this JIRA. I think Ray's comments make sense; I 
just missed the discussion on MAPREDUCE-6704. +1 on resolving this issue.

> The default setting doesn't work for MapReduce job
> --
>
> Key: MAPREDUCE-6941
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6941
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0-beta1
> Reporter: Junping Du
> Priority: Blocker
>
> When deploying a Hadoop 3 cluster (based on the current trunk branch) with 
> default settings, MR jobs fail with the following exception:
> {noformat}
> 2017-08-16 13:00:03,846 INFO mapreduce.Job: Job job_1502913552390_0001 
> running in uber mode : false
> 2017-08-16 13:00:03,847 INFO mapreduce.Job:  map 0% reduce 0%
> 2017-08-16 13:00:03,864 INFO mapreduce.Job: Job job_1502913552390_0001 failed 
> with state FAILED due to: Application application_1502913552390_0001 failed 2 
> times due to AM Container for appattempt_1502913552390_0001_02 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: [2017-08-16 13:00:02.963]Exception from 
> container-launch.
> Container id: container_1502913552390_0001_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:994)
>   at org.apache.hadoop.util.Shell.run(Shell.java:887)
>   at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
>   at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is because the MapReduce-related jars are not added to the YARN setup by 
> default. To make MR jobs run successfully, we now need to add the following 
> configuration to yarn-site.xml:
> {noformat}
> <property>
>   <name>yarn.application.classpath</name>
>   <value>
> ...
> /share/hadoop/mapreduce/*,
> /share/hadoop/mapreduce/lib/*
> ...
>   </value>
> </property>
> {noformat}
> But this config was not necessary for previous versions of Hadoop. We should 
> fix this issue before the beta release; otherwise it will be a regression in 
> configuration.
> This could be more of a YARN issue (if so, we should move it), depending on how 
> we finally fix it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6941) The default setting doesn't work for MapReduce job

2017-09-05 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved MAPREDUCE-6941.

Resolution: Not A Problem

I'm going to close this based on Ray's analysis. Junping, if you disagree, 
please re-open the JIRA.




[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154189#comment-16154189
 ] 

Jason Lowe commented on MAPREDUCE-5124:
---

Ah, sorry, I thought we were still worrying about how to keep the AM from 
exploding.  Sure, I could see a dynamic heartbeat still being useful once the 
flow control problem is addressed.  Even with the current async processing 
without flow control, we could feed back to the task how long to wait until 
the next heartbeat (e.g., leverage the current AsyncDispatcher event queue 
size to scale the next task heartbeat interval accordingly), which could 
help avoid continued heartbeat pileups for large jobs.
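
To make the queue-size idea above concrete, here is a minimal sketch of how a 
backlog could be mapped to the next heartbeat interval. Everything in it is an 
assumption for illustration: the class name HeartbeatIntervalScaler, the linear 
scaling rule, and the parameter values are hypothetical and are not taken from 
the AM code or from any attached patch. It also assumes the AM can read the 
current event queue size at the time it answers a heartbeat.

```java
/** Hypothetical helper: maps an event-queue backlog to the next heartbeat interval. */
final class HeartbeatIntervalScaler {
    private final long baseIntervalMs;   // interval used when the queue is (nearly) empty
    private final long maxIntervalMs;    // cap so that tasks never go silent for too long
    private final int queueSizePerStep;  // backlog that adds one more base interval

    HeartbeatIntervalScaler(long baseIntervalMs, long maxIntervalMs, int queueSizePerStep) {
        if (baseIntervalMs <= 0 || maxIntervalMs < baseIntervalMs || queueSizePerStep <= 0) {
            throw new IllegalArgumentException("invalid scaler parameters");
        }
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.queueSizePerStep = queueSizePerStep;
    }

    /** Linear scaling: base * (1 + queueSize / queueSizePerStep), capped at max. */
    long nextIntervalMs(int eventQueueSize) {
        long steps = Math.max(0, eventQueueSize) / queueSizePerStep;
        long maxSteps = maxIntervalMs / baseIntervalMs;
        if (steps >= maxSteps) {
            return maxIntervalMs; // also avoids overflow in the multiplication below
        }
        return baseIntervalMs * (1 + steps);
    }

    public static void main(String[] args) {
        HeartbeatIntervalScaler scaler = new HeartbeatIntervalScaler(3000L, 60000L, 1000);
        for (int backlog : new int[] {0, 1000, 5000, 1000000}) {
            System.out.println(backlog + " queued events -> " + scaler.nextIntervalMs(backlog) + " ms");
        }
    }
}
```

The AM would return the computed value in the heartbeat response and the task 
would sleep that long before the next call. A linear rule is only one possible 
choice; the cap matters because a task that waits too long could otherwise be 
considered timed out.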

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Haibo Chen
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154117#comment-16154117
 ] 

Miklos Szegedi commented on MAPREDUCE-5124:
---

[~jlowe], I absolutely agree that the heartbeat should be synchronous, with no 
new call until the previous is processed and I also agree that the async RPC 
support is needed to process other important messages. This solves the graceful 
degradation issue. What I am saying is that once 10 mappers send these 
heartbeats and wait for them, there will be a delay processing them due to the 
server bottleneck, so the metric would reach the client later, unless we 
minimize the delay with either a server to client approach or a dynamic 
heartbeat interval.




[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154108#comment-16154108
 ] 

Jason Lowe commented on MAPREDUCE-5124:
---

bq. I think either the server needs to control the heartbeat to minimize the 
delay (indeed too big a change), or the task needs to tweak the heartbeat 
interval based on the previous response time as Peter Bacsko has suggested.

The issue here isn't that tasks are seeing a long delay in heartbeat response 
time and failing to react to that.  The problem is the AM is accepting and 
quickly responding to them at a rate far higher than it can actually process 
them in the background AsyncDispatcher thread.  In other words, by the time a 
task notices a significant delay in heartbeat processing time the AM has 
probably already started going into GC hell and it's likely too late to 
course-correct at that point.  The only way to get reliable feedback on how 
long the processing is really taking is to make the heartbeat processing 
synchronous, so the task doesn't get a response until the processing has 
actually completed.  Without async RPC call support, that has the issue of 
tying up the server handler threads which prevents more important calls from 
being processed in a timely manner.




[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154072#comment-16154072
 ] 

Miklos Szegedi commented on MAPREDUCE-5124:
---

Thank you, [~jlowe], for the previous reply. Let me address your concerns. 
You are right, doing an asynchronous call leveraging HADOOP-11552 is probably 
the smallest change possible in this case.
What I was trying to solve is the theoretical problem of sending heartbeats 
with metrics from a large number of tasks at interval T, with graceful 
degradation and minimal delay D. The delay for a metric, when read from the 
AM, is {{D+T/2}}: it waits D in the queue and, once available, it is sampled 
with a mean delay of {{T/2}}. If the server controls the heartbeat, both 
graceful degradation and minimal delay are met, since the heartbeat is 
processed right away and D=0. If the task controls the heartbeat, the average 
wait time adds to the delay of the current metrics, so any consumer will get 
them later. Admittedly, server control would also mean making the client 
socket connection act as an RPC server, which is quite a big change.
I think either the server needs to control the heartbeat to minimize the delay 
(indeed too big a change), or the task needs to tweak the heartbeat interval 
based on the previous response time, as [~pbacsko] has suggested. The second 
option could be implemented on top of HADOOP-11552.




[jira] [Commented] (MAPREDUCE-6432) Fix typos in hadoop-mapreduce-project module

2017-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154023#comment-16154023
 ] 

Hadoop QA commented on MAPREDUCE-6432:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} MAPREDUCE-6432 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-6432 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12744761/MAPREDUCE-6432.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7122/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.



> Fix typos in hadoop-mapreduce-project module
> 
>
> Key: MAPREDUCE-6432
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6432
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Affects Versions: 2.7.1
> Reporter: Ray Chiang
> Assignee: Neelesh Srinivas Salian
> Priority: Minor
> Labels: supportability
> Attachments: MAPREDUCE-6432.001.patch
>
>
> Fix a bunch of typos in comments, strings, variable names, and method names 
> in the hadoop-mapreduce-project module.






[jira] [Commented] (MAPREDUCE-6432) Fix typos in hadoop-mapreduce-project module

2017-09-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154005#comment-16154005
 ] 

Hadoop QA commented on MAPREDUCE-6432:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} | {color:red} MAPREDUCE-6432 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-6432 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12744761/MAPREDUCE-6432.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7121/console |
| Powered by | Apache Yetus 0.5.0   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2017-09-05 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153860#comment-16153860
 ] 

Haibo Chen commented on MAPREDUCE-6441:
---

bq. but I haven't managed to get it to fail with the old code
My understanding is that the new test is supposed to fail with the old code and 
the new change is supposed to fix the test failure. Otherwise, the new test is 
not testing any new behavior, right?

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: William Watson
> Assignee: Ray Chiang
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}
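
The collision described above can be reproduced in isolation. The sketch below 
is hypothetical and is not the code from any of the attached patches: 
timestampName mimics the quoted lines (each process seeds its own counter from 
the clock), and jobIdPlusUuidName is an assumed alternative suggested only by 
the name of the "jobid-plus-uuid" attachment. The class and method names and 
the job id string are made up for illustration.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration of clock-seeded versus UUID-based directory names. */
final class LocalDirNames {
    /** Mimics the quoted code: a per-process counter seeded from the start time. */
    static String timestampName(long startMillis) {
        AtomicLong uniqueNumberGenerator = new AtomicLong(startMillis);
        return Long.toString(uniqueNumberGenerator.incrementAndGet());
    }

    /** Assumed alternative: combine the job id with a random UUID. */
    static String jobIdPlusUuidName(String jobId) {
        return jobId + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long sameInstant = 1406915233072L;
        // Two independent processes that start at the same instant get the same name.
        System.out.println(timestampName(sameInstant));
        System.out.println(timestampName(sameInstant));
        // Random UUIDs make a clash between processes extremely unlikely.
        System.out.println(jobIdPlusUuidName("job_local_0001"));
        System.out.println(jobIdPlusUuidName("job_local_0001"));
    }
}
```

The AtomicLong only guarantees uniqueness inside one JVM. Separate processes 
each create their own counter, so the same seed yields the same first 
directory name, which is consistent with the rename failure in the log.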






[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153610#comment-16153610
 ] 

Jason Lowe commented on MAPREDUCE-5124:
---

Turning on the RPC backoff feature alone will not be enough, as the call queues 
aren't backing up today.  We'd have to change the processing of the heartbeat 
to be synchronously processed by the IPC server handler thread rather than 
thrown on the AsyncDispatcher event queue as it's done today.  That means we'll 
quickly start tying up server handler threads for large jobs, and that will end 
up choking out more important method calls like task assignment, task 
completion, etc.  It would probably work but be far from ideal when things 
start to become congested.




[jira] [Comment Edited] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153432#comment-16153432
 ] 

Peter Bacsko edited comment on MAPREDUCE-5124 at 9/5/17 10:37 AM:
--

Just a question - we already have 
https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this 
feature inside the MRAppMaster when it creates the RPC server for 
TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers 
call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase 
the heartbeat interval, let's say double it. If it succeeds after a couple of 
reports, we can try to decrease it again, back to the original value. This 
might not be the best flow control method, but we can think about this.


was (Author: pbacsko):
Just a question - we already have 
https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this 
feature inside the MRAppMaster when it creates the RCP server for 
TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers 
call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase 
the heartbeat interval, let's say double it. If it succeeds after a couple of 
reports, we can try to decrease it again, back to the original value. This 
might not be the best flow control method, but we can think about this.
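
The doubling scheme proposed in this comment could look roughly like the 
sketch below. It is only an illustration under stated assumptions: 
HeartbeatBackoff is a hypothetical class, not part of TaskReporter, and the 
cap and the number of successes required before resetting are made-up 
parameters. The caller is assumed to invoke onRejected() when the heartbeat 
call fails with a RetriableException and onSuccess() otherwise.

```java
/** Hypothetical client-side backoff state for the task heartbeat interval. */
final class HeartbeatBackoff {
    private final long baseIntervalMs;       // the original heartbeat interval
    private final long maxIntervalMs;        // assumed upper bound for the backoff
    private final int successesBeforeReset;  // "a couple of reports" from the comment
    private long currentIntervalMs;
    private int consecutiveSuccesses;

    HeartbeatBackoff(long baseIntervalMs, long maxIntervalMs, int successesBeforeReset) {
        if (baseIntervalMs <= 0 || maxIntervalMs < baseIntervalMs || successesBeforeReset <= 0) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        this.baseIntervalMs = baseIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.successesBeforeReset = successesBeforeReset;
        this.currentIntervalMs = baseIntervalMs;
    }

    /** The server pushed back: double the interval, up to the cap. */
    long onRejected() {
        consecutiveSuccesses = 0;
        currentIntervalMs = currentIntervalMs > maxIntervalMs / 2
                ? maxIntervalMs
                : currentIntervalMs * 2;
        return currentIntervalMs;
    }

    /** The heartbeat went through: after enough successes, return to the base interval. */
    long onSuccess() {
        if (currentIntervalMs > baseIntervalMs && ++consecutiveSuccesses >= successesBeforeReset) {
            currentIntervalMs = baseIntervalMs;
            consecutiveSuccesses = 0;
        }
        return currentIntervalMs;
    }

    public static void main(String[] args) {
        HeartbeatBackoff backoff = new HeartbeatBackoff(3000L, 20000L, 2);
        System.out.println("after rejection: " + backoff.onRejected() + " ms");
        System.out.println("after rejection: " + backoff.onRejected() + " ms");
        System.out.println("after success:   " + backoff.onSuccess() + " ms");
        System.out.println("after success:   " + backoff.onSuccess() + " ms");
    }
}
```

Resetting straight to the base interval follows the wording of the comment. A 
gentler variant would halve the interval step by step, which may avoid all 
tasks resuming the original rate at the same moment.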




[jira] [Commented] (MAPREDUCE-5124) AM lacks flow control for task events

2017-09-05 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153432#comment-16153432
 ] 

Peter Bacsko commented on MAPREDUCE-5124:
-

Just a question - we already have 
https://issues.apache.org/jira/browse/HADOOP-10597. Can't we just enable this 
feature inside the MRAppMaster when it creates the RPC server for 
TaskUmbilicalProtocol? (I guess that's the message which mappers/reducers 
call). Then in {{TaskReporter}} we handle {{RetriableException}} and increase 
the heartbeat interval, let's say double it. If it succeeds after a couple of 
reports, we can try to decrease it again, back to the original value. This 
might not be the best flow control method, but we can think about this.




[jira] [Updated] (MAPREDUCE-6760) LocatedFileStatusFetcher to use listFiles(recursive)

2017-09-05 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated MAPREDUCE-6760:
--
Target Version/s: 3.1.0

> LocatedFileStatusFetcher to use listFiles(recursive)
> 
>
> Key: MAPREDUCE-6760
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6760
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
>
> {{LocatedFileStatusFetcher}} does parallelized path listing, but it still makes 
> recursive calls on every subdir.
> If we could switch it to use {{FileSystem.listFiles(recursive)}}, object 
> stores that have high-performance implementations of that operation would see 
> a significant speedup.
> HADOOP-13208 implements that for S3A; Azure, Swift, etc. can do the same.


