[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2016-02-11 Thread Sudip Hazra Choudhury (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142399#comment-15142399
 ] 

Sudip Hazra Choudhury commented on YARN-2266:
-

Yes, we are still interested in this feature. It would be very helpful.

The timeout could default to 0 (infinite, i.e. no timeout), and users should be
able to set a non-zero value depending on their requirements.
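
A minimal sketch of the "0 means infinite" default suggested above, assuming the timeout is read from a simple key/value configuration. The class name and the property key `example.app.allocation-timeout-secs` are hypothetical, chosen only for illustration; they are not actual YARN configuration keys.

```java
import java.util.Map;

/** Hypothetical helper: reads an allocation timeout where 0 means "no timeout". */
final class AppTimeoutConfig {
    // Assumed property name, for illustration only (not a real YARN key).
    static final String TIMEOUT_KEY = "example.app.allocation-timeout-secs";

    // Default proposed in the comment: 0 = infinite (feature disabled).
    static final long NO_TIMEOUT = 0L;

    /** Returns the configured timeout in seconds, or 0 when the key is unset. */
    static long timeoutSecs(Map<String, String> conf) {
        String raw = conf.get(TIMEOUT_KEY);
        if (raw == null) {
            return NO_TIMEOUT;
        }
        long value = Long.parseLong(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(
                TIMEOUT_KEY + " must be >= 0, got " + value);
        }
        return value;
    }

    /** The timeout only applies when a non-zero value has been set. */
    static boolean isEnabled(long timeoutSecs) {
        return timeoutSecs > 0;
    }
}
```

With this shape, existing deployments that never set the key would see no behavior change, and only users who opt in with a non-zero value would have pending applications timed out.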

> Add an application timeout service in RM to kill applications which are not 
> getting resources
> -
>
> Key: YARN-2266
> URL: https://issues.apache.org/jira/browse/YARN-2266
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: resourcemanager
> Reporter: Ashutosh Jindal
>
> Currently, if an application is submitted to the RM, the app keeps waiting
> until resources are allocated for the AM. Such an application may be stuck
> until a resource is allocated for the AM, which may be due to over-utilization
> of queue or user limits, etc. In a production cluster, some periodically
> running applications may have a smaller cluster share. So after waiting for
> some time, if resources are not available, such applications can be marked as
> failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2016-02-11 Thread Rohith Sharma K S (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143985#comment-15143985
 ] 

Rohith Sharma K S commented on YARN-2266:
-

Apologies for not noticing this JIRA before creating YARN-3813. Both JIRAs
address the same use case. There is some progress in YARN-3813, along with a
POC patch, so we can continue the discussion in YARN-3813.



[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2016-02-11 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143169#comment-15143169
 ] 

Devaraj K commented on YARN-2266:
-

Duplicate of YARN-3813



[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2015-05-01 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524153#comment-14524153
 ] 

Zhijie Shen commented on YARN-2266:
---

Are we still interested in this enhancement? Otherwise, we can close this jira 
as won't fix.



[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2014-07-09 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056459#comment-14056459
 ] 

Vinod Kumar Vavilapalli commented on YARN-2266:
---

bq. So after waiting for some time, if resources are not available, such 
applications can be made as failed.
What happens next? The apps are going to be resubmitted, and they will still
wait in the queue. I am trying to understand the overall picture.

It seems like you want to reserve some capacity for a queue of periodically
running applications, to prevent that from happening in the first place.



[jira] [Commented] (YARN-2266) Add an application timeout service in RM to kill applications which are not getting resources

2014-07-09 Thread Ashutosh Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057127#comment-14057127
 ] 

Ashutosh Jindal commented on YARN-2266:
---

bq. What happens next? The apps are going to be resubmitted and they will still
wait in the queue.
No, the same application will not be submitted again. Consider a case where an
application runs periodically every hour and its average completion time is
30 minutes. In such a case, if the application has not received resources
within 30 minutes, or would only get them after 30 minutes, it is better to
kill the application and let the next scheduled run serve the purpose.
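
The rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the RM implementation: the class and method names are hypothetical, and it covers only the "still waiting for AM resources" case, treating a zero timeout as "never kill".

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical check for an application that is still waiting for AM resources. */
final class PendingAppTimeoutCheck {

    /**
     * Returns true when the application should be killed: it has no AM
     * container yet and has waited at least the configured timeout.
     * A zero (or negative) timeout is treated as "no timeout".
     */
    static boolean shouldKill(Instant submitTime, Instant now,
                              Duration timeout, boolean amAllocated) {
        if (amAllocated) {
            return false; // already running; this check no longer applies
        }
        if (timeout.isZero() || timeout.isNegative()) {
            return false; // timeout disabled
        }
        Duration waited = Duration.between(submitTime, now);
        return waited.compareTo(timeout) >= 0;
    }
}
```

For the hourly job in the example, a caller would pass a 30-minute timeout; once the application has been pending that long, it is killed and the next hourly run takes over.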
