[jira] [Updated] (YARN-5703) ReservationAgents are not correctly configured

2017-02-05 Thread Manikandan R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-5703:
---
Attachment: YARN-5703.003.patch

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch, 
> YARN-5703.003.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Commented] (YARN-5703) ReservationAgents are not correctly configured

2017-02-05 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853216#comment-15853216
 ] 

Manikandan R commented on YARN-5703:


Thanks [~Naganarasimha] for the review. I've attached a newer patch based on 
the comments and have explained each point below; a small wiring sketch 
follows the list.

* OK, I used {code}agentClass = Configuration.getClass(String, Class, Class){code} to create the objects.
* I've defined a few constants in {code}ReservationSchedulerConfiguration{code}. 
To access those values through their corresponding getters inside 
{code}init(ReservationSchedulerConfiguration conf){code} of the 
AlignedPlannerWithGreedy & GreedyReservationAgent classes, I've declared 
{code}void init(ReservationSchedulerConfiguration conf);{code} in 
ReservationAgent.java. If I don't specify the right subclass, those constants 
and their getters won't be available, right? Do you see any issues here?
* Yes, Configurable doesn't suit here; 
{code}init(ReservationSchedulerConfiguration conf){code} is already 
implemented.
* Yes, {code}yarnConf{code} doesn't add value here. The reason for including 
{code}init(ReservationSchedulerConfiguration conf){code} in 
IterativePlanner.java is that it extends {code}PlanningAlgorithm{code}, which 
in turn implements {code}ReservationAgent{code}. Since the configuration is 
available in the AlignedPlannerWithGreedy & GreedyReservationAgent classes, I 
thought of passing it on to IterativePlanner.java & 
TryManyReservationAgents.java as well, since both implement the 
{code}ReservationAgent{code} interface, but it is not used there currently.
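For clarity, here is a minimal, self-contained sketch of the wiring described 
above. All "Sketch" class names, the config keys, and the smoothness getter 
are stand-ins for illustration only (this is not the actual YARN-5703 patch 
code): the agent class is resolved with 
{code}Configuration.getClass(String, Class, Class){code} and is then handed 
the scheduler configuration through an explicit {code}init(...){code} call.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Stand-in for ReservationSchedulerConfiguration: exposes typed getters
// backed by constants defined on the configuration class.
class SchedulerConfSketch extends Configuration {
  static final String SMOOTHNESS_KEY = "reservation.greedy.smoothness"; // hypothetical key
  static final float DEFAULT_SMOOTHNESS = 2.0f;                         // hypothetical default

  float getSmoothnessFactor() {
    return getFloat(SMOOTHNESS_KEY, DEFAULT_SMOOTHNESS);
  }
}

// Stand-in for the ReservationAgent interface with the added init(...) method.
interface AgentSketch {
  void init(SchedulerConfSketch conf);
}

// Stand-in agent: reads its settings in init(...) rather than in the constructor.
class GreedyAgentSketch implements AgentSketch {
  private float smoothness;

  @Override
  public void init(SchedulerConfSketch conf) {
    smoothness = conf.getSmoothnessFactor();
  }
}

// Stand-in for the getAgent() wiring in AbstractReservationSystem.
class AgentFactorySketch {
  static AgentSketch createAgent(Configuration conf, SchedulerConfSketch schedConf) {
    // Resolve the configured agent class, defaulting to the greedy agent.
    Class<? extends AgentSketch> agentClass = conf.getClass(
        "reservation.agent.class",                  // hypothetical key
        GreedyAgentSketch.class, AgentSketch.class);
    AgentSketch agent = ReflectionUtils.newInstance(agentClass, conf);
    agent.init(schedConf);  // explicit init; no Configurable needed
    return agent;
  }
}
{code}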

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch, 
> YARN-5703.003.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Commented] (YARN-3571) AM does not re-blacklist NMs after ignoring-blacklist event happens?

2017-02-02 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850161#comment-15850161
 ] 

Manikandan R commented on YARN-3571:


I am interested in working on this. Can I work on it?

> AM does not re-blacklist NMs after ignoring-blacklist event happens?
> 
>
> Key: YARN-3571
> URL: https://issues.apache.org/jira/browse/YARN-3571
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager, resourcemanager
>Affects Versions: 2.5.1
>Reporter: Hao Zhu
>
> A detailed analysis is in item "3 Will AM re-blacklist NMs after 
> ignoring-blacklist event happens?" of the link below:
> http://www.openkb.info/2015/05/when-will-application-master-blacklist.html
> The current behavior is: if a Node Manager has ever been blacklisted before, 
> it will not be blacklisted again after ignore-blacklist happens; otherwise, 
> it will be blacklisted.
> However, I think the right behavior should be: the AM can re-blacklist NMs 
> even after ignoring-blacklist has happened once.
> The code logic is in containerFailedOnHost(String hostName) of 
> RMContainerRequestor.java:
> {code}
>   protected void containerFailedOnHost(String hostName) {
> if (!nodeBlacklistingEnabled) {
>   return;
> }
> if (blacklistedNodes.contains(hostName)) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Host " + hostName + " is already blacklisted.");
>   }
>   return; //already blacklisted
> {code}
> The reason for the above behavior is in item 2 above: when ignoring-blacklist 
> happens, it only asks the RM to clear "blacklistAdditions"; it does not 
> clear the "blacklistedNodes" variable.
> This behavior may cause the whole job/application to fail if a previously 
> blacklisted NM is released after the ignoring-blacklist event happens.
> Imagine a serial murderer released from prison just because the prison is 
> 33% full who, horribly, can never be put in prison again; only new murderers 
> will be imprisoned.
> An example to demonstrate:
> Test 1:
> One node (h4) has an issue; the other 3 nodes are healthy.
> The job failed with the AM logs below:
> {code}
> [root@h1 container_1430425729977_0006_01_01]# egrep -i 'failures on 
> node|blacklist|FATAL' syslog
> 2015-05-02 18:38:41,246 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
> nodeBlacklistingEnabled:true
> 2015-05-02 18:38:41,246 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 
> blacklistDisablePercent is 1
> 2015-05-02 18:39:07,249 FATAL [IPC Server handler 3 on 41696] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1430425729977_0006_m_02_0 - exited : java.io.IOException: Spill 
> failed
> 2015-05-02 18:39:07,297 INFO [Thread-49] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on 
> node h4.poc.com
> 2015-05-02 18:39:07,950 FATAL [IPC Server handler 16 on 41696] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1430425729977_0006_m_08_0 - exited : java.io.IOException: Spill 
> failed
> 2015-05-02 18:39:07,954 INFO [Thread-49] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 2 failures on 
> node h4.poc.com
> 2015-05-02 18:39:08,148 FATAL [IPC Server handler 17 on 41696] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1430425729977_0006_m_07_0 - exited : java.io.IOException: Spill 
> failed
> 2015-05-02 18:39:08,152 INFO [Thread-49] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 3 failures on 
> node h4.poc.com
> 2015-05-02 18:39:08,152 INFO [Thread-49] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Blacklisted host 
> h4.poc.com
> 2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the 
> blacklist for application_1430425729977_0006: blacklistAdditions=1 
> blacklistRemovals=0
> 2015-05-02 18:39:08,561 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Ignore 
> blacklisting set to true. Known: 4, Blacklisted: 1, 25%
> 2015-05-02 18:39:09,563 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: Update the 
> blacklist for application_1430425729977_0006: blacklistAdditions=0 
> blacklistRemovals=1
> 2015-05-02 18:39:32,912 FATAL [IPC Server handler 19 on 41696] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1430425729977_0006_m_02_1 - exited : java.io.IOException: Spill 
> failed
> 2015-05-02 18:39:35,076 FATAL [IPC Server handler 1 on 41696] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: 
> attempt_1430425729977_0006_m_09_0 - exited : java.io.IOException: Spill 
> failed
> 2015-05-02 18:39:35,133 FATAL [IPC Server 

[jira] [Commented] (YARN-5179) Issue of CPU usage of containers

2017-02-02 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850154#comment-15850154
 ] 

Manikandan R commented on YARN-5179:


I am interested in working on this. Can I take it forward?

> Issue of CPU usage of containers
> 
>
> Key: YARN-5179
> URL: https://issues.apache.org/jira/browse/YARN-5179
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.0
> Environment: Both on Windows and Linux
>Reporter: Zhongkai Mi
>
> {code}
> // Multiply by 1000 to avoid losing data when converting to int
> int milliVcoresUsed = (int) (cpuUsageTotalCoresPercentage * 1000
>     * maxVCoresAllottedForContainers / nodeCpuPercentageForYARN);
> {code}
> This formula will not give the right vcore-based CPU usage if vcores != 
> physical cores.






[jira] [Commented] (YARN-5703) ReservationAgents are not correctly configured

2017-02-01 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15848674#comment-15848674
 ] 

Manikandan R commented on YARN-5703:


[~Naganarasimha], [~seanpo03]
Fixed the Greedy RA JUnit test case and attached a patch for it. Please review.


> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Updated] (YARN-5703) ReservationAgents are not correctly configured

2017-02-01 Thread Manikandan R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikandan R updated YARN-5703:
---
Attachment: YARN-5703.002.patch

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
> Attachments: YARN-5703.001.patch, YARN-5703.002.patch
>
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Commented] (YARN-5703) ReservationAgents are not correctly configured

2017-01-28 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844017#comment-15844017
 ] 

Manikandan R commented on YARN-5703:


Thanks [~seanpo03], [~Naganarasimha].

I've explained my approach below based on my analysis. Please provide your 
feedback.

By implementing the Configurable interface in the RAs (the 
AlignedPlannerWithGreedy & GreedyReservationAgent classes), the conf object 
would be passed in from the getAgent() method of 
AbstractReservationSystem.java while creating the objects for those RAs. But 
to make use of conf to fetch properties such as the smoothness factor, we 
would require a new method (for example, initialize()?) to set the RAs' fields 
(for example, planner), because the conf object cannot be used inside the 
constructor: it is only set via setConf() after object creation. Since 
initialization would happen in a new method, we would also need to make the 
"planner" variable non-final. A small sketch of this ordering constraint 
follows.
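
A minimal illustration of that ordering constraint (the class name, config key, 
and placeholder planner below are stand-ins, not the actual RA classes): with 
Configurable, setConf() runs only after the constructor returns, so any field 
derived from the configuration has to be assigned in a separate initialize() 
step and can no longer be final.

{code}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

// Stand-in agent showing why the Configurable route forces two-phase setup.
class ConfigurableAgentSketch implements Configurable {
  private Configuration conf;
  private Object planner;  // cannot be final: it is built only after setConf()

  ConfigurableAgentSketch() {
    // conf is still null here; setConf() has not been called yet, so nothing
    // configuration-dependent can be constructed in the constructor.
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public Configuration getConf() {
    return conf;
  }

  // Extra step, called after setConf(), to build configuration-dependent state.
  void initialize() {
    float smoothness = conf.getFloat("greedy.smoothness-factor", 2.0f); // hypothetical key
    planner = "planner tuned with smoothness " + smoothness;            // placeholder for a real planner
  }
}
{code}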

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Manikandan R
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Commented] (YARN-5703) ReservationAgents are not correctly configured

2017-01-26 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840169#comment-15840169
 ] 

Manikandan R commented on YARN-5703:


I am interested in working on this. Shall I take this forward?

Thanks,
Mani

> ReservationAgents are not correctly configured
> --
>
> Key: YARN-5703
> URL: https://issues.apache.org/jira/browse/YARN-5703
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Po
>Assignee: Sean Po
>
> In AbstractReservationSystem, the method that instantiates a ReservationAgent 
> does not properly initialize it with the appropriate configuration because it 
> expects the ReservationAgent to implement Configurable.






[jira] [Created] (YARN-5900) Configuring minimum-allocation-mb at queue level

2016-11-17 Thread Manikandan R (JIRA)
Manikandan R created YARN-5900:
--

 Summary: Configuring minimum-allocation-mb at queue level
 Key: YARN-5900
 URL: https://issues.apache.org/jira/browse/YARN-5900
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Manikandan R


The motivation for proposing minimum-allocation-mb at the queue level, in the 
form of yarn.scheduler.capacity.<queue-path>.minimum-allocation-mb, is for 
cases where the queue structure has been designed around resource usage (only 
memory for now). For example, there could be three segments such as small, 
medium & large jobs, and a queue can be created for each segment accordingly. 
With this, it would be good to configure the minimum container size of each 
queue separately: for example, 1 GB for small, 3 GB for medium, and 6 GB for 
large.

Would this simplify the container release process and its overall management, 
and eventually reduce the number of containers running at any moment?
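
A hypothetical capacity-scheduler.xml sketch of what the proposal could look 
like (the per-queue property does not exist yet; the queue names and values 
are made up, and the cluster-wide yarn.scheduler.minimum-allocation-mb in 
yarn-site.xml would remain the default floor):

{code}
<configuration>
  <!-- Proposed per-queue minimum container sizes (hypothetical property). -->
  <property>
    <name>yarn.scheduler.capacity.root.small.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.medium.minimum-allocation-mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.large.minimum-allocation-mb</name>
    <value>6144</value>
  </property>
</configuration>
{code}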






[jira] [Commented] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

2016-07-13 Thread Manikandan R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15375339#comment-15375339
 ] 

Manikandan R commented on YARN-5370:


To solve this issue, we first tried setting 
yarn.nodemanager.delete.debug-delay-sec to a very low value (zero seconds), 
assuming it might clear the existing scheduled deletion tasks. It didn't help; 
the new value is not applied to tasks that have already been scheduled. We 
then found that the canRecover() method is called at service start: it pulls 
the info from the NM recovery directory (on the local filesystem) and rebuilds 
all of it in memory, which in turn causes the problems starting the services 
and consumes a large amount of memory. We then tried moving the contents of 
the NM recovery directory somewhere else, and from that point onwards the NM 
started smoothly and worked as expected. I think logging a warning about such 
a high value (for example, 100+ days), indicating that it can cause a 
potential crash, could save a significant amount of time when troubleshooting 
this issue. A rough sketch of such a check follows.
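
A rough, hypothetical sketch of that warning (the class name, method name, 
threshold, and placement are made up; only the property name comes from this 
issue):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sanity check: warn at service start when the debug delay is
// large enough that deletion tasks will pile up in the NM state store.
class DeletionDelaySanityCheck {
  private static final Log LOG = LogFactory.getLog(DeletionDelaySanityCheck.class);
  private static final long WARN_THRESHOLD_SEC = 7L * 24 * 60 * 60; // e.g. one week

  static void warnIfDelayTooHigh(Configuration conf) {
    long delaySec = conf.getLong("yarn.nodemanager.delete.debug-delay-sec", 0);
    if (delaySec > WARN_THRESHOLD_SEC) {
      LOG.warn("yarn.nodemanager.delete.debug-delay-sec is set to " + delaySec
          + " seconds; scheduled deletion tasks will accumulate and may exhaust"
          + " the NodeManager heap, especially on recovery.");
    }
  }
}
{code}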

> Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM 
> because of OOM
> 
>
> Key: YARN-5370
> URL: https://issues.apache.org/jira/browse/YARN-5370
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Manikandan R
>
> I set yarn.nodemanager.delete.debug-delay-sec to 100+ days in my dev cluster 
> for certain reasons, about 3-4 weeks ago. After setting this up, the NM 
> crashes at times because of OOM, so as a temporary fix I kept increasing its 
> heap from 512 MB to 6 GB gradually over the past few weeks, whenever the 
> crash occurred. Sometimes it won't start smoothly, and only after multiple 
> tries does it start functioning. While analyzing the heap dump of the 
> corresponding JVM, I found that DeletionService.java occupies almost 99% of 
> the total allocated memory (-Xmx), something like this:
> org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
>  @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%
> Basically, there is a huge number of the above-mentioned tasks scheduled for 
> deletion. Usually I see NM memory requirements of 2-4 GB for large clusters; 
> in my case the cluster is very small and OOM still occurs.
> Is this expected behaviour, or is there any limit we can expose on 
> yarn.nodemanager.delete.debug-delay-sec to avoid these kinds of issues?






[jira] [Created] (YARN-5370) Setting yarn.nodemanager.delete.debug-delay-sec to high number crashes NM because of OOM

2016-07-13 Thread Manikandan R (JIRA)
Manikandan R created YARN-5370:
--

 Summary: Setting yarn.nodemanager.delete.debug-delay-sec to high 
number crashes NM because of OOM
 Key: YARN-5370
 URL: https://issues.apache.org/jira/browse/YARN-5370
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Manikandan R


I set yarn.nodemanager.delete.debug-delay-sec to 100+ days in my dev cluster 
for certain reasons, about 3-4 weeks ago. After setting this up, the NM 
crashes at times because of OOM, so as a temporary fix I kept increasing its 
heap from 512 MB to 6 GB gradually over the past few weeks, whenever the crash 
occurred. Sometimes it won't start smoothly, and only after multiple tries 
does it start functioning. While analyzing the heap dump of the corresponding 
JVM, I found that DeletionService.java occupies almost 99% of the total 
allocated memory (-Xmx), something like this:

org.apache.hadoop.yarn.server.nodemanager.DeletionService$DelServiceSchedThreadPoolExecutor
 @ 0x6c1d09068| 80 | 3,544,094,696 | 99.13%

Basically, there is a huge number of the above-mentioned tasks scheduled for 
deletion. Usually I see NM memory requirements of 2-4 GB for large clusters; 
in my case the cluster is very small and OOM still occurs.

Is this expected behaviour, or is there any limit we can expose on 
yarn.nodemanager.delete.debug-delay-sec to avoid these kinds of issues?





