[
https://issues.apache.org/jira/browse/YARN-8659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604755#comment-16604755
]
Szilard Nemeth edited comment on YARN-8659 at 9/5/18 6:23 PM:
--------------------------------------------------------------
Hi [~Prabhu Joseph]!
I found the root cause of this bug and was able to reproduce it with two test
cases.
When {{ClientRMService#getApplications}} is invoked, it first checks whether
the user filters by queue. If so, it iterates over the specified queues and
retrieves the apps bound to each queue from the scheduler. As a last step, it
sets up a tricky iterator over the collected application attempt IDs. This is
a list of lists, since there can be multiple queues and each queue can have
many apps associated with it.
See the iterator here:
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L834-L859
What is actually broken is the code that gets the application attempt IDs
from the scheduler:
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L829
The scheduler only returns the scheduled applications, not the finished ones.
This means that whatever is specified for the application state parameter, the
code only returns applications that are currently executing.
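To make the failure mode concrete, here is a minimal, self-contained sketch of the flow described above. It does not use the real YARN classes: {{App}}, {{State}}, {{schedulerAppsInQueue}} and {{getApplicationsFaulty}} are hypothetical names made up for illustration, and the "scheduler" is assumed to know only about RUNNING apps.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class QueueFilterBugSketch {
    enum State { RUNNING, FINISHED }

    // Hypothetical stand-in for an application record; not a YARN class.
    static final class App {
        final String id;
        final String queue;
        final State state;

        App(String id, String queue, State state) {
            this.id = id;
            this.queue = queue;
            this.state = state;
        }
    }

    // Models the scheduler's view (assumption): it only knows apps that are still executing.
    static List<String> schedulerAppsInQueue(Map<String, App> allApps, String queue) {
        List<String> ids = new ArrayList<>();
        for (App app : allApps.values()) {
            if (app.queue.equals(queue) && app.state == State.RUNNING) {
                ids.add(app.id);
            }
        }
        return ids;
    }

    // Models the faulty flow: candidates come from the scheduler first,
    // and the state filter is applied only to those candidates.
    static List<App> getApplicationsFaulty(Map<String, App> allApps, String queue, Set<State> states) {
        List<App> result = new ArrayList<>();
        for (String id : schedulerAppsInQueue(allApps, queue)) {
            App app = allApps.get(id);
            if (states.isEmpty() || states.contains(app.state)) {
                result.add(app);
            }
        }
        return result;
    }

    static Map<String, App> sampleApps() {
        Map<String, App> apps = new LinkedHashMap<>();
        apps.put("app_1", new App("app_1", "default", State.RUNNING));
        apps.put("app_2", new App("app_2", "default", State.FINISHED));
        apps.put("app_3", new App("app_3", "other", State.RUNNING));
        return apps;
    }

    public static void main(String[] args) {
        Map<String, App> apps = sampleApps();
        // Queue filter only: the finished app in "default" is missing.
        System.out.println(getApplicationsFaulty(apps, "default", EnumSet.noneOf(State.class)).size()); // prints 1
        // Queue + FINISHED: nothing is returned.
        System.out.println(getApplicationsFaulty(apps, "default", EnumSet.of(State.FINISHED)).size()); // prints 0
    }
}
```

In this toy model the state filter itself is fine; the result is wrong because the candidate set was already narrowed to executing apps before the filter ran.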
Let's go back to what you described above:
1. Only the RUNNING apps are returned if any queue is specified, because the
call to {{scheduler.getAppsInQueue(queue);}} only returns apps that are
executing.
2. No applications are returned if the queue parameter is specified and the
state parameter is set to FINISHED.
As described above, this is faulty even if you don't specify a state parameter
at all, since {{scheduler.getAppsInQueue(queue);}} only returns apps that are
executing, not the others.
So the solution is to remove the tricky iterator and simply iterate over the
apps retrieved from RMContext.
This should work, as the current code also gets the applications from that
collection:
https://github.com/apache/hadoop/blob/9af96d4ed4b6f80d3ca53a2b003d2ef768650dd4/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java#L850
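As a rough sketch of the proposed direction (not the actual patch), the idea is to walk the full application collection and apply the queue and state filters as independent predicates. The names below ({{App}}, {{State}}, {{getApplications}}) are hypothetical stand-ins, and the plain {{Collection}} is only assumed to play the role of the apps held by RMContext.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class QueueFilterFixSketch {
    enum State { RUNNING, FINISHED }

    // Hypothetical stand-in for an application record; not a YARN class.
    static final class App {
        final String id;
        final String queue;
        final State state;

        App(String id, String queue, State state) {
            this.id = id;
            this.queue = queue;
            this.state = state;
        }
    }

    // Iterates over all known apps; an empty filter set means "no restriction".
    static List<App> getApplications(Collection<App> allApps, Set<String> queues, Set<State> states) {
        List<App> result = new ArrayList<>();
        for (App app : allApps) {
            if (!queues.isEmpty() && !queues.contains(app.queue)) {
                continue; // not in any requested queue
            }
            if (!states.isEmpty() && !states.contains(app.state)) {
                continue; // not in any requested state
            }
            result.add(app);
        }
        return result;
    }

    static List<App> sampleApps() {
        return Arrays.asList(
            new App("app_1", "default", State.RUNNING),
            new App("app_2", "default", State.FINISHED),
            new App("app_3", "other", State.RUNNING));
    }

    public static void main(String[] args) {
        Set<String> defaultQueue = Collections.singleton("default");
        // Queue filter only: both apps in "default" are returned, whatever their state.
        System.out.println(getApplications(sampleApps(), defaultQueue, EnumSet.noneOf(State.class)).size()); // prints 2
        // Queue + FINISHED: the finished app is found.
        System.out.println(getApplications(sampleApps(), defaultQueue, EnumSet.of(State.FINISHED)).size()); // prints 1
    }
}
```

Because every app is a candidate, the filters compose: each one narrows the result on its own, and neither depends on what the scheduler currently tracks.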
We should keep in mind API compatibility, though.
With the current implementation, apps are only returned for a queue if they are
executing.
With the code changes in my patch, if the user specifies the queue filter, the
endpoint returns apps regardless of their states.
If we think about the apps endpoint as a set of filter parameters applied on
applications, it seems to be more logical to return apps bound to a queue,
regardless of what states they have, if the only filter is the queue filter.
If users want only the apps that are executing and bound to a queue, they
should specify both the queue and the state parameters.
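For illustration, a request combining both filters could be built as below. The {{queue}} and {{states}} parameter names and the {{/ws/v1/cluster/apps}} path are taken from the URLs in the issue description; the helper {{appsUrl}} and the host {{rm-host:8088}} are made up for this sketch, and parameter values are assumed to be URL-safe (no encoding is done).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class AppsQuerySketch {
    // Hypothetical helper: joins filter parameters into an apps endpoint URL.
    // Values are assumed to be URL-safe; no percent-encoding is applied here.
    static String appsUrl(String rmAddress, Map<String, String> filters) {
        String base = "http://" + rmAddress + "/ws/v1/cluster/apps";
        if (filters.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            query.add(filter.getKey() + "=" + filter.getValue());
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the query string is deterministic.
        Map<String, String> filters = new LinkedHashMap<>();
        filters.put("queue", "default");
        filters.put("states", "RUNNING");
        System.out.println(appsUrl("rm-host:8088", filters));
        // prints http://rm-host:8088/ws/v1/cluster/apps?queue=default&states=RUNNING
    }
}
```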
[~templedf], [~leftnoteasy], [~haibochen]: Could you please share your opinions
on which is more important: keeping API compatibility or fixing this bug?
Thanks!
> RMWebServices returns only RUNNING apps when filtered with queue
> ----------------------------------------------------------------
>
> Key: YARN-8659
> URL: https://issues.apache.org/jira/browse/YARN-8659
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.7.3
> Reporter: Prabhu Joseph
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: Screen Shot 2018-08-13 at 8.01.29 PM.png, Screen Shot
> 2018-08-13 at 8.01.52 PM.png, YARN-8659.001.patch
>
>
> RMWebServices returns only RUNNING apps when filtered with queue and returns
> empty apps when filtered with both FINISHED states and queue.
> http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?queue=default
> http://pjoseph-script-llap3.openstacklocal:8088/ws/v1/cluster/apps?states=FINISHED&queue=default