[jira] [Commented] (MAPREDUCE-7274) Enable to limit running map and reduce tasks when job is running

2020-04-22 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090193#comment-17090193
 ] 

Hadoop QA commented on MAPREDUCE-7274:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} 
| {color:red} MAPREDUCE-7274 does not apply to branch-3.2.0. Rebase required? 
Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. 
{color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-7274 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13000898/MAPREDUCE-7274-branch-3.2.0.patch
 |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7770/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |


This message was automatically generated.



> Enable to limit running map and reduce tasks when job is running
> 
>
> Key: MAPREDUCE-7274
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7274
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Affects Versions: 3.2.0
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: MAPREDUCE-7274-branch-3.2.0.patch
>
>
> MRv2 lets users control the number of map or reduce tasks running 
> simultaneously through the configuration properties *_mapreduce.job.running.map.limit_* and 
> _*mapreduce.job.running.reduce.limit*_. However, users can only set these limits 
> before submitting the job to the RM. It would therefore be useful to let users change 
> the running map or reduce task limit while the job is running, which would help 
> them restrict a job's resource usage and free up resources for higher-priority 
> jobs.
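The following is a minimal sketch of a runtime-adjustable limit inside the AM. It assumes the limit is held in a field that a client request can overwrite. The class `RunningTaskLimit` and its methods are hypothetical and are not part of Hadoop or the attached patch. The only convention taken from Hadoop is that a value of 0 or less means no limit, which is how mapred-default.xml documents mapreduce.job.running.map.limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical holder for a per-task-type running limit that can be changed
 * while the job runs. Not Hadoop's implementation.
 */
public class RunningTaskLimit {
    // Follows the mapred-default.xml convention: 0 or negative means unlimited.
    private final AtomicInteger limit;

    public RunningTaskLimit(int initialLimit) {
        this.limit = new AtomicInteger(initialLimit);
    }

    /** Applies a new limit, e.g. one received from a client request. */
    public void update(int newLimit) {
        limit.set(newLimit);
    }

    /** Returns true if another task of this type may be launched now. */
    public boolean canLaunch(int currentlyRunning) {
        int l = limit.get();
        return l <= 0 || currentlyRunning < l;
    }

    /** Number of additional tasks that may start now, or Integer.MAX_VALUE if unlimited. */
    public int headroom(int currentlyRunning) {
        int l = limit.get();
        return l <= 0 ? Integer.MAX_VALUE : Math.max(0, l - currentlyRunning);
    }

    public static void main(String[] args) {
        RunningTaskLimit mapLimit = new RunningTaskLimit(100);
        System.out.println(mapLimit.canLaunch(99));   // true
        System.out.println(mapLimit.canLaunch(100));  // false
        mapLimit.update(50);                          // lowered while the job runs
        System.out.println(mapLimit.headroom(40));    // 10
        mapLimit.update(0);                           // 0 disables the limit
        System.out.println(mapLimit.canLaunch(1000)); // true
    }
}
```

In this sketch, lowering the limit does not preempt tasks that are already running. It only holds back new launches until the running count drops below the new limit.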



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7274) Enable to limit running map and reduce tasks when job is running

2020-04-22 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated MAPREDUCE-7274:
-
   Attachment: MAPREDUCE-7274-branch-3.2.0.patch
 Target Version/s: 3.2.0
Affects Version/s: 3.2.0
   Status: Patch Available  (was: Open)




[jira] [Updated] (MAPREDUCE-7274) Enable to limit running map and reduce tasks when job is running

2020-04-22 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated MAPREDUCE-7274:
-
Attachment: (was: MAPREDUCE-7274-branch-3.2.0.patch)

> Enable to limit running map and reduce tasks when job is running
> 
>
> Key: MAPREDUCE-7274
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7274
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Reporter: Chengwei Wang
>Priority: Major
>
> MRv2 lets users control the number of map or reduce tasks running 
> simultaneously through the configuration properties *_mapreduce.job.running.map.limit_* and 
> _*mapreduce.job.running.reduce.limit*_. However, users can only set these limits 
> before submitting the job to the RM. It would therefore be useful to let users change 
> the running map or reduce task limit while the job is running, which would help 
> them restrict a job's resource usage and free up resources for higher-priority 
> jobs.






[jira] [Comment Edited] (MAPREDUCE-7274) Enable to limit running map and reduce tasks when job is running

2020-04-22 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084662#comment-17084662
 ] 

Chengwei Wang edited comment on MAPREDUCE-7274 at 4/23/20, 2:55 AM:


Uploaded a patch for Hadoop 3.2.0.

It lets us set the limit on running tasks while a job is running, using the 
mapred client command:
{code:bash}
mapred job -set-running-task-limit JOB_ID TASK_TYPE LIMIT

e.g.
   mapred job -set-running-task-limit job_1583809537551_21297 MAP 100
   mapred job -set-running-task-limit job_1583809537551_21297 REDUCE 100
{code}
The command sends an RPC request to the AM, and the AM updates the maximum 
number of running tasks of the specified type to the given limit.
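The command shape above is taken from the patch. As a hedged illustration of the client side, the hypothetical class below validates the three operands before anything would be sent to the AM. It assumes that TASK_TYPE is MAP or REDUCE (accepted case-insensitively here) and that LIMIT is an integer. The RPC itself is not shown. The class, the enum, and the job ID pattern are assumptions and are not the patch's code.

```java
import java.util.Locale;

/** Hypothetical parsed form of "-set-running-task-limit JOB_ID TASK_TYPE LIMIT". */
public final class SetRunningTaskLimitArgs {
    public enum TaskType { MAP, REDUCE }

    public final String jobId;
    public final TaskType taskType;
    public final int limit;

    private SetRunningTaskLimitArgs(String jobId, TaskType taskType, int limit) {
        this.jobId = jobId;
        this.taskType = taskType;
        this.limit = limit;
    }

    /** Parses the three operands that follow the option name. */
    public static SetRunningTaskLimitArgs parse(String[] operands) {
        if (operands.length != 3) {
            throw new IllegalArgumentException(
                "Usage: -set-running-task-limit JOB_ID TASK_TYPE LIMIT");
        }
        String jobId = operands[0];
        // Loose sanity check on the job ID shape, e.g. job_1583809537551_21297.
        if (!jobId.matches("job_\\d+_\\d+")) {
            throw new IllegalArgumentException("Invalid job ID: " + jobId);
        }
        TaskType type;
        try {
            type = TaskType.valueOf(operands[1].toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                "TASK_TYPE must be MAP or REDUCE, got: " + operands[1], e);
        }
        int limit;
        try {
            limit = Integer.parseInt(operands[2]);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("LIMIT must be an integer: " + operands[2], e);
        }
        return new SetRunningTaskLimitArgs(jobId, type, limit);
    }

    public static void main(String[] args) {
        SetRunningTaskLimitArgs a = parse(
            new String[] {"job_1583809537551_21297", "MAP", "100"});
        // prints "job_1583809537551_21297 MAP 100"
        System.out.println(a.jobId + " " + a.taskType + " " + a.limit);
    }
}
```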

 


was (Author: smarthan):
Uploaded a patch for Hadoop 3.2.0.

It lets us set the running task limit while a job is running, using the mapred 
client command:
{code:bash}
mapred job -set-running-task-limit JOB_ID TASK_TYPE LIMIT

e.g.
   mapred job -set-running-task-limit job_1583809537551_21297 MAP 100
   mapred job -set-running-task-limit job_1583809537551_21297 REDUCE 100
{code}
The command sends an RPC request to the AM, and the AM updates the maximum 
number of running tasks of the specified type to the given limit.

 

> Enable to limit running map and reduce tasks when job is running
> 
>
> Key: MAPREDUCE-7274
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7274
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: MAPREDUCE-7274-branch-3.2.0.patch
>
>
> MRv2 lets users control the number of map or reduce tasks running 
> simultaneously through the configuration properties *_mapreduce.job.running.map.limit_* and 
> _*mapreduce.job.running.reduce.limit*_. However, users can only set these limits 
> before submitting the job to the RM. It would therefore be useful to let users change 
> the running map or reduce task limit while the job is running, which would help 
> them restrict a job's resource usage and free up resources for higher-priority 
> jobs.






[jira] [Updated] (MAPREDUCE-7274) Enable to limit running map and reduce tasks when job is running

2020-04-22 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated MAPREDUCE-7274:
-
Summary: Enable to limit running map and reduce tasks when job is running  
(was: Enable to set running task limit when job is running)




[jira] [Updated] (MAPREDUCE-7274) Enable to set running task limit when job is running

2020-04-22 Thread Chengwei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengwei Wang updated MAPREDUCE-7274:
-
Summary: Enable to set running task limit when job is running  (was: Enable 
to set running task limit when mapreduce job is running)

> Enable to set running task limit when job is running
> 
>
> Key: MAPREDUCE-7274
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7274
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: MAPREDUCE-7274-branch-3.2.0.patch
>
>
> MRv2 lets users control the number of map or reduce tasks running 
> simultaneously through the configuration properties *_mapreduce.job.running.map.limit_* and 
> _*mapreduce.job.running.reduce.limit*_. However, users can only set these limits 
> before submitting the job to the RM. It would therefore be useful to let users change 
> the running map or reduce task limit while the job is running, which would help 
> them restrict a job's resource usage and free up resources for higher-priority 
> jobs.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-22 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090063#comment-17090063
 ] 

Eric Payne commented on MAPREDUCE-7277:
---

Thanks a lot [~jeagles] for raising this issue and providing a patch. I am 
still going through the test code, but here are my thoughts so far:

- IndexCache#readIndexFileToCache
-- Why is {{checkTotalMemoryUsed()}} called in the finally block? The return 
value is not checked and, AFAICT, it doesn't have any side effects.
- IndexCache#removeMap
-- In the following code, if {{mapId}} isn't in {{queue}}, does it 
necessarily follow that it is not in {{cache}}? I think the answer is yes, 
right? It only gets put in the {{queue}} once it's in the {{cache}}.
{code:java}
  public void removeMap(String mapId) throws IOException {
    if (!queue.remove(mapId)) {
      LOG.debug("Map ID {} not found in queue", mapId);
      return;
    }
    ...
  }
{code}

- IndexCache#freeIndexInformation:
-- In the following code, if {{mapId}} is ever in {{queue}} but not in 
{{cache}}, I think {{totalMemoryUsed}} could still be out of sync, because by 
the time freeIndexInformation is called, {{mapId}}'s IndexInformation size 
should already have been added to {{totalMemoryUsed}}. But that should never 
happen, right? {{cache}} gets updated first and then {{queue}}, so if {{mapId}} 
is in {{queue}}, it should also be in {{cache}}.
{code:java}
  private synchronized void freeIndexInformation() throws IOException {
    while (totalMemoryUsed.get() > totalMemoryAllowed) {
      String mapId = queue.remove();
      IndexInformation info = cache.remove(mapId);
      if (info == null) {
        LOG.warn("Map ID " + mapId + " not found in cache");
        continue;
      }
      ...
    }
  }
{code}
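The following is a heavily simplified, hypothetical model of the bookkeeping that makes the invariant discussed above concrete. It is not the actual IndexCache code. An entry goes into `cache` before `queue`, and `totalMemoryUsed` is charged and credited alongside them. A `checkTotalMemoryUsed()`-style helper recomputes usage from the cache contents so that the counter can be checked against it. Plain integers stand in for IndexInformation sizes, and the names `IndexCacheModel`, `add`, and `used` are illustrative.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified model of IndexCache's size bookkeeping. */
public class IndexCacheModel {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger totalMemoryUsed = new AtomicInteger();
    private final int totalMemoryAllowed;

    public IndexCacheModel(int totalMemoryAllowed) {
        this.totalMemoryAllowed = totalMemoryAllowed;
    }

    /** Inserts into cache first, then queue, then charges the size. */
    public synchronized void add(String mapId, int size) {
        if (cache.putIfAbsent(mapId, size) != null) {
            return; // already cached, nothing to charge
        }
        queue.add(mapId);
        totalMemoryUsed.addAndGet(size);
        freeIndexInformation();
    }

    /** If mapId is not queued, it was never fully added, so nothing is credited. */
    public synchronized void removeMap(String mapId) {
        if (!queue.remove(mapId)) {
            return;
        }
        Integer size = cache.remove(mapId);
        if (size != null) {
            totalMemoryUsed.addAndGet(-size);
        }
    }

    /** Evicts the oldest entries until usage fits the budget (caller holds the lock). */
    private void freeIndexInformation() {
        while (totalMemoryUsed.get() > totalMemoryAllowed) {
            String mapId = queue.poll();
            if (mapId == null) {
                break; // nothing left to evict
            }
            Integer size = cache.remove(mapId);
            if (size != null) {
                totalMemoryUsed.addAndGet(-size);
            }
        }
    }

    /** Recomputes usage from the cache contents and compares it with the counter. */
    public synchronized boolean checkTotalMemoryUsed() {
        int actual = 0;
        for (int size : cache.values()) {
            actual += size;
        }
        return actual == totalMemoryUsed.get();
    }

    public int used() {
        return totalMemoryUsed.get();
    }

    public static void main(String[] args) {
        IndexCacheModel c = new IndexCacheModel(10);
        c.add("m1", 4);
        c.add("m2", 4);
        c.add("m3", 4); // 12 > 10, so m1 is evicted
        c.removeMap("m2");
        System.out.println(c.used() + " " + c.checkTotalMemoryUsed()); // 4 true
    }
}
```

Every mutator in this model is synchronized, so the window between the `cache` and `queue` updates does not arise here, even though the real, more concurrent code has to reason about it. The sketch only shows the accounting rule: each size charged on insert must be credited exactly once on removal or eviction.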

> IndexCache totalMemoryUsed differs from cache contents.
> ---
>
> Key: MAPREDUCE-7277
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7277
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
> Attachments: IndexCacheActualSize.png, MAPREDUCE-7277.001.patch
>
>
> In a recent NodeManager OOM, the memory was observed to be filled with 
> SpillRecords, yet the IndexCache was only 15% full (1.5 MB used of a 10 MB 
> configured cache size). In particular, the bookkeeping variable 
> totalMemoryUsed was out of sync with the contents of the cache, showing the 
> cache as 96% full and thereby drastically reducing its effectiveness.






[jira] [Commented] (MAPREDUCE-7277) IndexCache totalMemoryUsed differs from cache contents.

2020-04-22 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089839#comment-17089839
 ] 

Jim Brennan commented on MAPREDUCE-7277:


Thanks for the patch [~jeagles]!

A question about readIndexFileToCache() in the case where we throw an IOException 
because we fail to construct the SpillRecord. In the old code, we just executed the 
finally clause and then threw the IOException without doing the queue.add() and 
total memory update. In the new code, we still do the queue.add() in 
this case. Was this intended?

Also, I don't understand why you call checkTotalMemoryUsed() here. Is this 
leftover debug code?
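The ordering that this question concerns can be shown with a small hypothetical sketch. The names and the placeholder scheme are illustrative and are not the patch's code. The ID is queued and its size charged only after the index has been read successfully. If reading throws an IOException, the in-progress cache entry is removed, so nothing is left behind in `queue` or `totalMemoryUsed`.

```java
import java.io.IOException;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the load path's failure handling; not Hadoop code. */
public class IndexLoadSketch {
    /** Stands in for constructing a SpillRecord from an index file. */
    public interface IndexReader {
        int readSize(String mapId) throws IOException;
    }

    final Map<String, Integer> cache = new ConcurrentHashMap<>();
    final Queue<String> queue = new ConcurrentLinkedQueue<>();
    final AtomicInteger totalMemoryUsed = new AtomicInteger();

    private static final int PLACEHOLDER = -1; // marks a load in progress (hypothetical)

    public synchronized int readIndexFileToCache(String mapId, IndexReader reader)
            throws IOException {
        cache.put(mapId, PLACEHOLDER);
        int size;
        try {
            size = reader.readSize(mapId);
        } catch (IOException e) {
            // Failed load: undo the placeholder and skip the queue/size bookkeeping.
            cache.remove(mapId);
            throw e;
        }
        // Success: only now store the real entry, queue it, and charge its size.
        cache.put(mapId, size);
        queue.add(mapId);
        totalMemoryUsed.addAndGet(size);
        return size;
    }

    public static void main(String[] args) {
        IndexLoadSketch s = new IndexLoadSketch();
        try {
            s.readIndexFileToCache("m1", id -> { throw new IOException("bad index"); });
        } catch (IOException expected) {
            System.out.println("load failed: " + expected.getMessage());
        }
        System.out.println(s.queue.isEmpty() + " " + s.totalMemoryUsed.get()); // true 0
    }
}
```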

