[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-10 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.008.patch

Thanks for pointing it out, [~wangda]. I have corrected the test case and fixed 
the applicable checkstyle issues.

> Allow fetching exact reason as to why a submitted app is in ACCEPTED state in 
> CS
> 
>
> Key: YARN-3946
> URL: https://issues.apache.org/jira/browse/YARN-3946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacity scheduler, resourcemanager
>Affects Versions: 2.6.0
>Reporter: Sumit Nigam
>Assignee: Naganarasimha G R
> Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, 
> YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, 
> YARN-3946.v1.004.patch, YARN-3946.v1.005.patch, YARN-3946.v1.006.patch, 
> YARN-3946.v1.007.patch, YARN-3946.v1.008.patch
>
>
> Currently there is no direct way to get the exact reason as to why a 
> submitted app is still in ACCEPTED state. It should be possible to know 
> through RM REST API as to what aspect is not being met - say, queue limits 
> being reached, or core/ memory requirement not being met, or AM limit being 
> reached, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-04 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.006.patch

Hi [~wangda],
Attaching a patch that corrects the test case and removes a duplicate method 
in SchedulerApplicationAttempt.



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-12-01 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.005.patch

Thanks for the comments, [~wangda].
bq. When app goes to final state (FINISHED/KILLED, etc.), should we simply set 
AMLaunchDiagnostics to null?
IIUC, you are referring to RMAppAttemptImpl, right? If so, that was a mistake: 
while addressing your previous comment I missed reverting this part. In any 
case, per your 4th comment, I have now set it to null here for unmanaged AMs.

bq. Why need two separate methods: 
updateDiagnosticsIfNotRunning/updateDiagnostics?
Maybe the names need improving, but both methods are required. The status 
should be updated only while the AM is not running. For example, the check is 
called in FiCaSchedulerApp.allocate, which runs whenever a container is 
assigned to an app, but we want to update the diagnostics only while the AM is 
not yet launched; LeafQueue.assignContainers uses it similarly. In some cases, 
though, we are sure that the AM is not yet launched, so updateDiagnostics 
avoids the unnecessary check of whether the AM is running. Maybe I can rename 
them to {{checkAndUpdateAMContainerDiagnostics}} and 
{{updateAMContainerDiagnostics}}?

bq. Do you think is it better to rename AMState.PENDING to inactivated?
Yes, PENDING is not understandable to everyone, which is why the diagnostic 
message for {{PENDING}} is already set to *"Application is added to the 
scheduler and is not yet activated."* Maybe I can change it to {{Application is 
added to the scheduler but is not yet scheduled.}} Thoughts?

bq. Instead of setting AMLaunchDiagnostics to null when RMAppAttempt enters 
Scheduled state, do you think is it better to do that in RUNNING and 
FINAL_SAVING state? Unmanaged AM could skip the SCHEDULED state.
IMO I would prefer to set it to null only for unmanaged AMs, in the 
*FINAL_SAVING state*, since we already show the *YarnApplicationState* as 
RUNNING along with a description of it. If the diagnostics also said that the 
AM is launched and running, the UI would become repetitive for normal (managed 
AM) apps.

bq. It will be also very useful if you can update AM launch diagnostics when 
RMAppAttempt go to LAUNCHED state, 
Actually, I wrongly used AMContainerAllocatedTransition to reset the diagnostic 
message; my intention was to reset it only after the AM is launched and 
registered, which is very useful for analyzing the state of the AM. I have 
introduced {{LAUNCHED}} and set it after AMLauncher sends the LAUNCHED event to 
RMAppAttempt.
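A rough Java sketch of the attempt-state handling described in the last two answers, assuming a simplified state enum. Only the initial PENDING message is taken from this thread; AttemptDiagnosticsSketch, its State values, and the LAUNCHED message text are hypothetical illustrations, not RMAppAttemptImpl's actual transitions.

```java
// Hypothetical sketch: only the initial (PENDING) text below comes from the
// thread; the state names and the LAUNCHED text are illustrative assumptions.
class AttemptDiagnosticsSketch {
    enum State { SCHEDULED, LAUNCHED, RUNNING, FINAL_SAVING }

    private final boolean unmanagedAM;
    private String amLaunchDiagnostics =
        "Application is added to the scheduler and is not yet activated.";

    AttemptDiagnosticsSketch(boolean unmanagedAM) {
        this.unmanagedAM = unmanagedAM;
    }

    void onTransition(State target) {
        switch (target) {
            case LAUNCHED:
                // Set once the launcher reports the AM container as launched,
                // rather than when the container is merely allocated.
                amLaunchDiagnostics =
                    "AM container is launched, waiting for it to register with RM.";
                break;
            case FINAL_SAVING:
                // Unmanaged AMs may never pass through SCHEDULED, so clear any
                // stale AM launch text here; managed apps keep theirs.
                if (unmanagedAM) {
                    amLaunchDiagnostics = null;
                }
                break;
            default:
                // Other transitions leave the message unchanged in this sketch.
                break;
        }
    }

    String getAmLaunchDiagnostics() {
        return amLaunchDiagnostics;
    }
}
```

Clearing only for unmanaged AMs in FINAL_SAVING mirrors the preference stated above, avoiding a diagnostics message that merely repeats the RUNNING application state for managed apps.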

[~wangda] & [~jianhe], please review the latest patch.



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-24 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.004.patch

Hi [~wangda],
The *TestAMAuthorization and TestClientRMTokens* failures are not related to 
this issue, and there are already JIRAs addressing them. 
{{TestApplicationLimitsByPartition}} is related to the patch; I have corrected 
it and also covered the case where the application is not assigned to a node, 
in which the diagnostics show the node information and the reason.



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-23 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.003.patch
YARN-3946.v1.003.Images.zip



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-09 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: (was: YARN3946_attemptDiagnistic message.png)



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-08 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: 3946WebImages.zip
YARN-3946.v1.002.patch

Thanks for the quick feedback, [~wangda].
bq. AM launch diagnostics should have an initial value after added to scheduler: 
...
I initially thought of adding this message, but the problem is that C.S. calls 
{{LeafQueue.activateApplications}} immediately in {{addApplicationAttempt}}, so 
the message would be replaced almost at once and an initial message would not 
be helpful. I have, however, ensured that the related details are captured. 
Thoughts?

bq. Not caused by your patch, isWaitingForAMContainer checks if master container 
created, you may also need to check if application is in recover state or not. 
Because AM could contact to RM before AM container recovered by RM.
I am not sure I understood this correctly:
# ??AM could contact to RM before AM container recovered by RM??: I failed to 
understand the impact of this. All the required information is restored from 
the RM state store ({{RMAppAttemptImpl.recover(RMState)}} sets the master 
container from the store), so after the services are started the AM heartbeat 
may arrive earlier than the NM heartbeat, but what impact could that have? 
Correct me if my understanding is wrong!
# ??check if application is in recover state or not??: I am not sure how to do 
this, if it is required. I went through RMAppAttemptImpl and RMAppImpl and 
found no method or internal state that exposes this. Maybe I am missing 
something here.

bq. Suggest to add to REST API / web UI together with this patch if changes are 
not complex.
The earlier implementation also captured it as part of attempt.getDiagnostics, 
so it will be available in all the interfaces.

The other comments have been handled. I have attached the web images.

[~steve_l],
bq. I'd like to see this in application reports, so that client-side 
applications can display the details
This is taken care of in this patch.




[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state.

2015-11-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Attachment: YARN-3946.v1.001.patch
YARN3946_attemptDiagnistic message.png

Hi [~wangda], [~rohithsharma], [~sunilg], [~sumit.nigam] & [~nijel],

As mentioned by Wangda in his 
[comment|https://issues.apache.org/jira/browse/YARN-4091?focusedCommentId=14735266&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14735266]
 in YARN-4091, it is very difficult to capture the status when the *app's leaf 
queue or parent queue is beyond its limit*: it would not be good to loop 
through all the apps in the hierarchy and update the status on every node 
update, and each update would also lose important information from the 
previous ones.

So I think the valid cases in which we can update AMLaunchDiagnostics in 
SchedulerApplicationAttempt (for CS) are:

* App is in the pending state because of the queue's AM limit or user limit
* App is waiting for partition resources to launch the AM (once moved out of 
the pending state)
* App is waiting for partition resources to launch the AM and some nodes are 
blacklisted (if it fails to launch because of blacklisted nodes)
* The queue's AM limit does not allow the AM to be launched
* The queue's user limit does not allow the AM to be launched
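The five cases above can be sketched as a small Java enum whose values carry a diagnostic message, assuming the reason is stored as plain text on the attempt together with the partition being considered. AmPendingReasonSketch, its constant names, the message wording, and the "<DEFAULT_PARTITION>" label are all hypothetical and are not taken from the patch.

```java
// Hypothetical enumeration of the CS cases listed above; the constant
// names and message wording are illustrative, not the patch's text.
enum AmPendingReasonSketch {
    PENDING_AM_OR_USER_LIMIT(
        "Application is pending: queue AM limit or user limit reached."),
    WAITING_FOR_PARTITION_RESOURCES(
        "Waiting for partition resources to launch the AM."),
    WAITING_WITH_BLACKLISTED_NODES(
        "Waiting for partition resources to launch the AM; some nodes are blacklisted."),
    QUEUE_AM_LIMIT(
        "The queue's AM limit does not allow the AM to be launched."),
    QUEUE_USER_LIMIT(
        "The queue's user limit does not allow the AM to be launched.");

    private final String message;

    AmPendingReasonSketch(String message) {
        this.message = message;
    }

    // Builds the text that would be stored as the attempt's AM launch
    // diagnostics, tagged with the partition being considered. An empty
    // partition name is shown with a placeholder label in this sketch.
    String toDiagnostics(String partition) {
        String label = partition.isEmpty() ? "<DEFAULT_PARTITION>" : partition;
        return message + " [partition: " + label + "]";
    }
}
```

Keeping the reasons in one enum makes it easy to render the same text in the REST API, web UI, and application reports.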

Please check whether the approach is proper. If it is useful and required, a 
similar thing can be done for the FairScheduler as well. cc/ [~ka...@cloudera.com]

I have also taken the liberty of fixing some small issues in 
{{SchedulerApplicationAttempt.isWaitingForAMContainer}} in the same patch; if 
required, I can raise another JIRA and move these small changes there.




[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Component/s: capacity scheduler



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS

2015-11-03 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-3946:

Summary: Allow fetching exact reason as to why a submitted app is in 
ACCEPTED state in CS  (was: Allow fetching exact reason as to why a submitted 
app is in ACCEPTED state.)



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state.

2015-09-01 Thread Rohith Sharma K S (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-3946:

Issue Type: Sub-task  (was: Improvement)
Parent: YARN-4091



[jira] [Updated] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state.

2015-07-21 Thread Sumit Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumit Nigam updated YARN-3946:
--
Description: Currently there is no direct way to get the exact reason as to 
why a submitted app is still in ACCEPTED state. It should be possible to know 
through RM REST API as to what aspect is not being met - say, queue limits 
being reached, or core/ memory requirement not being met, or AM limit being 
reached, etc.  (was: Currently there is no direct way to get the exact reason 
as to why a submitted app is still in ACCEPTED state. It should be able to know 
through RM REST API as to what aspect is not being met - say, queue limits 
being reached, or core/ memory requirement not being met, or AM limit being 
reached, etc.)

 Allow fetching exact reason as to why a submitted app is in ACCEPTED state.
 ---

 Key: YARN-3946
 URL: https://issues.apache.org/jira/browse/YARN-3946
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Sumit Nigam

 Currently there is no direct way to get the exact reason as to why a 
 submitted app is still in ACCEPTED state. It should be possible to know 
 through RM REST API as to what aspect is not being met - say, queue limits 
 being reached, or core/ memory requirement not being met, or AM limit being 
 reached, etc.


