[
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471471#comment-16471471
]
Wilfred Spiegelenburg commented on YARN-8191:
---------------------------------------------
We're almost there.
First the simple one:
{quote}testRemovalOfDynamicParentQueue needs to cover a dynamic parent queue
without a leaf is removal.
* How can I create a dynamic parent queue without a leaf? I thought the only
way to have a parent queue without a leaf is to add it to the allocation config
with parent="true", but in this case it'd be a static queue.{quote}
It is almost covered in the new test {{testRemovalOfChildlessParentQueue}} but
I want to mimic what happens in the {{onReload}}.
BTW: I think we need to swap the two calls around in onReload. First mark
removed queues as dynamic then update so that the queue manager removes the
queues immediately if possible.
{quote}We already check all queues defined in the configuration on each reload
for existence in the updateAllocationConfiguration via the call to
removeEmptyIncompatibleQueues. If the queue of the correct type exists we
currently just return. The only thing we should do now before we return is
unset the isDynamic flag there and there is no need for a separate loop.
removeEmptyIncompatibleQueues is called for each configured queue with each
reload.
* Sorry, I do not understand what you're suggesting here. Could you please
elaborate a bit more?{quote}
Sure I can:
When we walk over the list of queues that are loaded from the config in
{{updateAllocationConfiguration}} we call for every queue that is configured
the method {{removeEmptyIncompatibleQueues}}. That method will check if the
queue with that name already exists or not. If the queue exists it also checks
if the queue is of the correct type (parent or leaf) if the queue is of the
correct type we currently just return:
{code:java}
FSQueue queue = queues.get(queueToCreate);
// Queue exists already.
if (queue != null) {
if (queue instanceof FSLeafQueue) {
if (queueType == FSQueueType.LEAF) {
// if queue is already a leaf then return true
return true;
}
// remove incompatibility since queue is a leaf currently
// needs to change to a parent.
return removeQueueIfEmpty(queue);
} else {
if (queueType == FSQueueType.PARENT) {
return true;
}
// If it's an existing parent queue and needs to change to leaf,
// remove it if it's empty.
return removeQueueIfEmpty(queue);
}
}
{code}
What I am proposing is to add one line of code per queue type. If the queue
exists and it is the correct type than we should make sure the dynamic flag is
set to false.That will have no effect if the queue was already defined in the
config. However if the queue was created as a dynamic queue it will then turn
that queue into a queue defined in the configuration. if the queue was already
In all other cases we remove the queue and create a new one later on which will
have the correct dynamic flag set.
So in the above code we would get two lines extra, this is the one for the
LEAF queues:
{code:java}
if (queueType == FSQueueType.LEAF) {
queue.setDynamic(false);
// if queue is already a leaf then return true
return true;
}
{code}
With that change we do not have to do anything in the
\{{updateAllocationConfiguration}} for static/dynamic change.
Does that make sense?
I just noticed two bugs in that code, neither are newly introduced:
# if I try to create a LEAF from the config and it is already a LEAF queue we
should not return true but false. Same for the PARENT check. The return code
triggers the queue creation which is not needed because the queue already
exists even with the right type.
# if the queue exists with the wrong type we try to remove it via
\{{removeQueueIfEmpty}} The result is passed back without checks and don't
follow up. If the remove failed I have a queue with the wrong type in the
system. That leaves the system in an inconsistent state: whatever I have in the
queue manager is now not what is in the configuration. This should be at least
logged and probably even throw.
1 we can fix now: just change the return value, 2 probably needs a follow up
Jira as it is more complex.
> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.0.1
> Reporter: Gergo Repas
> Assignee: Gergo Repas
> Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf,
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch,
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch,
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch,
> YARN-8191.009.patch, YARN-8191.010.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the
> allocation file, or were dynamically created and are never going to be used
> again. Queues always remain in memory which leads to two following issues.
> # Steady fairshares aren’t calculated correctly due to remaining queues
> # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource
> Manager:
> # Static queues without any entries that are removed from fair-scheduler.xml
> should be deleted from memory.
> # Dynamic queues without any entries should be deleted.
> # RM Web UI should only show the queues defined in the scheduler at that
> point in time.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]