[ 
https://issues.apache.org/jira/browse/YARN-8659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604755#comment-16604755
 ] 

Szilard Nemeth edited comment on YARN-8659 at 9/5/18 6:23 PM:
--------------------------------------------------------------

Hi [~Prabhu Joseph]!
I found the root cause of this bug and was able to reproduce it with 
2 test cases.
When {{ClientRMService#getApplications}} is invoked, it first checks whether 
the user filters by queue. If so, it iterates over the specified queues and 
retrieves the apps bound to each queue from the scheduler. As a last step, it 
sets up a tricky iterator that walks the collected application attempt IDs. 
Since there can be multiple queues and each queue can have many apps associated 
with it, the collection is a list of lists.
See the iterator here: 
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L834-L859
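
To illustrate what "iterating over a list of lists" means here, below is a 
minimal sketch of such an iterator. It is not the code in {{ClientRMService}}: 
the class and method names are made up for illustration, and plain strings 
stand in for the real application attempt ID type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of Hadoop.
final class ListOfListsIteratorSketch {

  /** Walks every element of every inner list, in order, without copying. */
  static <T> Iterator<T> flatten(final List<List<T>> lists) {
    return new Iterator<T>() {
      private final Iterator<List<T>> outer = lists.iterator();
      private Iterator<T> inner = null;

      // Move to the next inner list that still has elements, if any.
      private void advance() {
        while ((inner == null || !inner.hasNext()) && outer.hasNext()) {
          inner = outer.next().iterator();
        }
      }

      @Override
      public boolean hasNext() {
        advance();
        return inner != null && inner.hasNext();
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return inner.next();
      }
    };
  }

  /** Collects everything the iterator yields into a list. */
  static <T> List<T> drain(Iterator<T> it) {
    List<T> out = new ArrayList<>();
    while (it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }

  public static void main(String[] args) {
    // One inner list per queue; the IDs are placeholders.
    List<List<String>> perQueue = Arrays.asList(
        Arrays.asList("attempt_1", "attempt_2"),
        Arrays.asList("attempt_3"));
    System.out.println(drain(flatten(perQueue)));
  }
}
```

The iterator itself is fine; the problem is what gets put into the lists it 
walks.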

What is actually broken is the code that gets the application attempt IDs 
from the scheduler: 
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L829
The scheduler only returns the scheduled applications, not the finished 
ones. This means that whatever is specified for the application state 
parameter, the code only returns applications that are currently 
executing.

Let's go back to what you described above: 
1. Just the RUNNING apps are returned if any queue is specified, because the 
call to {{scheduler.getAppsInQueue(queue)}} only returns apps that are 
executing.
2. No applications are returned if the queue parameter is specified and the 
state parameter is set to FINISHED. 
As described above, the result is wrong even if you don't specify a state 
parameter at all, as the call to {{scheduler.getAppsInQueue(queue)}} only 
returns apps that are executing, and none of the others.
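
The two symptoms can be reproduced with a small model of this code path. This 
is only a sketch under stated assumptions: {{App}}, {{State}}, 
{{schedulerAppsInQueue}} and {{queryViaScheduler}} are hypothetical names, not 
YARN classes or methods, and "known to the scheduler" is simplified to "is in 
the RUNNING state".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the queue-filtered lookup; not part of Hadoop.
final class SchedulerLookupSketch {
  enum State { RUNNING, FINISHED }

  static final class App {
    final String id;
    final String queue;
    final State state;

    App(String id, String queue, State state) {
      this.id = id;
      this.queue = queue;
      this.state = state;
    }
  }

  // Stand-in for the scheduler: by assumption it only knows apps that are
  // still executing, so finished apps never show up here.
  static List<App> schedulerAppsInQueue(List<App> allApps, String queue) {
    List<App> result = new ArrayList<>();
    for (App app : allApps) {
      if (app.state == State.RUNNING && app.queue.equals(queue)) {
        result.add(app);
      }
    }
    return result;
  }

  // Candidates come from the scheduler first, the state filter is applied
  // afterwards. An empty state set means "no state filter".
  static List<String> queryViaScheduler(
      List<App> allApps, String queue, Set<State> states) {
    List<String> ids = new ArrayList<>();
    for (App app : schedulerAppsInQueue(allApps, queue)) {
      if (states.isEmpty() || states.contains(app.state)) {
        ids.add(app.id);
      }
    }
    return ids;
  }

  static List<App> sampleApps() {
    return Arrays.asList(
        new App("app_1", "default", State.RUNNING),
        new App("app_2", "default", State.FINISHED),
        new App("app_3", "other", State.RUNNING));
  }

  public static void main(String[] args) {
    List<App> apps = sampleApps();
    // queue=default: only the running app comes back
    System.out.println(
        queryViaScheduler(apps, "default", EnumSet.noneOf(State.class)));
    // queue=default, states=FINISHED: nothing comes back
    System.out.println(
        queryViaScheduler(apps, "default", EnumSet.of(State.FINISHED)));
  }
}
```

In this model the finished app is dropped before the state filter ever sees 
it, which matches both observations.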

So basically, the solution is to remove the tricky iterator and simply 
iterate over the apps retrieved from RMContext.
This should work, as the current code also gets the applications 
from that collection: 
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L850

We should keep in mind API compatibility, though. 
With the current implementation, apps are only returned for a queue if they are 
executing.
With the code changes in my patch, if the user specifies the queue filter, the 
endpoint returns apps regardless of their states.

If we think about the apps endpoint as a set of filter parameters applied to 
applications, it seems more logical to return all apps bound to a queue, 
regardless of their state, if the only filter is the queue filter.
If the user wants the apps that are executing and bound to a queue, they 
should specify both the queue and the state parameters.
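
The filter semantics described above can be sketched as follows. This is an 
illustration and not the attached patch: {{App}}, {{State}} and {{filter}} are 
hypothetical names, and a plain list stands in for the collection of apps kept 
in RMContext. Every filter is applied to the full set of apps, and an empty 
filter set means "do not filter on this parameter".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed behaviour; not part of Hadoop.
final class QueueFilterSketch {
  enum State { RUNNING, FINISHED }

  static final class App {
    final String id;
    final String queue;
    final State state;

    App(String id, String queue, State state) {
      this.id = id;
      this.queue = queue;
      this.state = state;
    }
  }

  // Iterates over all known apps and applies the queue and state filters
  // independently of each other.
  static List<String> filter(
      List<App> allApps, Set<String> queues, Set<State> states) {
    List<String> ids = new ArrayList<>();
    for (App app : allApps) {
      if (!queues.isEmpty() && !queues.contains(app.queue)) {
        continue;
      }
      if (!states.isEmpty() && !states.contains(app.state)) {
        continue;
      }
      ids.add(app.id);
    }
    return ids;
  }

  static List<App> sampleApps() {
    return Arrays.asList(
        new App("app_1", "default", State.RUNNING),
        new App("app_2", "default", State.FINISHED),
        new App("app_3", "other", State.RUNNING));
  }

  public static void main(String[] args) {
    List<App> apps = sampleApps();
    Set<String> queue = Collections.singleton("default");
    // queue only: apps in any state
    System.out.println(filter(apps, queue, EnumSet.noneOf(State.class)));
    // queue + RUNNING: what the current implementation returns for queue only
    System.out.println(filter(apps, queue, EnumSet.of(State.RUNNING)));
    // queue + FINISHED: currently empty
    System.out.println(filter(apps, queue, EnumSet.of(State.FINISHED)));
  }
}
```

In this sketch, a caller who relies on the current behaviour would get the 
same result by adding the RUNNING state to the query.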

[~templedf], [~leftnoteasy], [~haibochen]: Could you please share your opinions 
on which is more important: keeping API compatibility or fixing this bug?

Thanks!






> RMWebServices returns only RUNNING apps when filtered with queue
> ----------------------------------------------------------------
>
>                 Key: YARN-8659
>                 URL: https://issues.apache.org/jira/browse/YARN-8659
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Assignee: Szilard Nemeth
>            Priority: Major
>         Attachments: Screen Shot 2018-08-13 at 8.01.29 PM.png, Screen Shot 
> 2018-08-13 at 8.01.52 PM.png, YARN-8659.001.patch
>
>
> RMWebServices returns only RUNNING apps when filtered with queue and returns 
> empty apps when filtered with both FINISHED states and queue.
> http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?queue=default
> http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?states=FINISHED&queue=default


