[
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16468939#comment-16468939
]
Wilfred Spiegelenburg commented on YARN-8191:
---------------------------------------------
# Can we add a test for this case please? I do not see this specific case in
the tests covered, testNonEmptyStaticQueueBecomingDynamicQueue does not cover
this use case
# OK
# We already check all queues defined in the configuration on each reload for
existence in the {{updateAllocationConfiguration}} via the call to
{{removeEmptyIncompatibleQueues}}. If the queue of the correct type exists we
currently just return. The only thing we should do now before we return is
unset the isDynamic flag there and there is no need for a separate loop.
{{removeEmptyIncompatibleQueues}} is called for each configured queue with each
reload.
# Keeping the same list in two places does not make sense. You now need to keep
them in sync and it complicates the code in {{updateAllocationConfiguration}}
which is not needed. We have also seen clusters with 100's of dynamic queues
when user based queues are used so the loops can become long. Please move the
logic for generating the diff to the {{onReload}} method where we have the 2
copies as described above.
# OK
# OK, I'll file a follow up jira for that one.
My first review of the tests:
* In assertNotNull should use the q1 object that is returned, multiple tests
* testRemovalOfDynamicLeafQueue needs to cover a dynamic leaf queue removal
which has a static parent, and make sure the parent is still there. It should
also check if a non empty queue is not removed.
* testRemovalOfDynamicParentQueue needs to cover a dynamic parent queue without
a leaf is removal.
* testNonEmptyDynamicQueueBecomingStaticQueue is missing a check after the
configuration update for isDynamic or not. The test name does not really cover
the test you do unless you do that. The queue being empty or not does not
really matter.
* testNonEmptyStaticQueueBecomingDynamicQueue: "root.test.childA" is shown as
"root.queue1 is not a static queue" in the assert message. Check the order and
comments in the method, they seem a bit out of order.
* We need to cover a parent queue in the tests: dynamic parent + dynamic leaf
(no apps) changing to static. Leaf and parent must be static after the change
and the other way around (testQueueTypeChange ?)
* We need a test for the updating of the assignedApp in the FSLeafQueue and
make sure isEmpty is working OK.
> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>
> Key: YARN-8191
> URL: https://issues.apache.org/jira/browse/YARN-8191
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: fairscheduler
> Affects Versions: 3.0.1
> Reporter: Gergo Repas
> Assignee: Gergo Repas
> Priority: Major
> Attachments: Queue Deletion in Fair Scheduler.pdf,
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch,
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch,
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the
> allocation file, or were dynamically created and are never going to be used
> again. Queues always remain in memory which leads to two following issues.
> # Steady fairshares aren’t calculated correctly due to remaining queues
> # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource
> Manager:
> # Static queues without any entries that are removed from fair-scheduler.xml
> should be deleted from memory.
> # Dynamic queues without any entries should be deleted.
> # RM Web UI should only show the queues defined in the scheduler at that
> point in time.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]