[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471471#comment-16471471
 ] 

Wilfred Spiegelenburg commented on YARN-8191:
---------------------------------------------

We're almost there.

First the simple one:
{quote}testRemovalOfDynamicParentQueue needs to cover a dynamic parent queue 
without a leaf is removal.
 * How can I create a dynamic parent queue without a leaf? I thought the only 
way to have a parent queue without a leaf is to add it to the allocation config 
with parent="true", but in this case it'd be a static queue.{quote}
It is almost covered in the new test {{testRemovalOfChildlessParentQueue}} but 
I want to mimic what happens in the {{onReload}}.
 BTW: I think we need to swap the two calls around in onReload. First mark 
removed queues as dynamic then update so that the queue manager removes the 
queues immediately if possible.
{quote}We already check all queues defined in the configuration on each reload 
for existence in the updateAllocationConfiguration via the call to 
removeEmptyIncompatibleQueues. If the queue of the correct type exists we 
currently just return. The only thing we should do now before we return is 
unset the isDynamic flag there and there is no need for a separate loop. 
removeEmptyIncompatibleQueues is called for each configured queue with each 
reload.
 * Sorry, I do not understand what you're suggesting here. Could you please 
elaborate a bit more?{quote}
Sure I can:
 When we walk over the list of queues that are loaded from the config in 
{{updateAllocationConfiguration}} we call for every queue that is configured 
the method {{removeEmptyIncompatibleQueues}}. That method will check if the 
queue with that name already exists or not. If the queue exists it also checks 
if the queue is of the correct type (parent or leaf) if the queue is of the 
correct type we currently just return:
{code:java}
    FSQueue queue = queues.get(queueToCreate);
    // Queue exists already.
    if (queue != null) {
      if (queue instanceof FSLeafQueue) {
        if (queueType == FSQueueType.LEAF) {
          // if queue is already a leaf then return true
          return true;
        }
        // remove incompatibility since queue is a leaf currently
        // needs to change to a parent.
        return removeQueueIfEmpty(queue);
      } else {
        if (queueType == FSQueueType.PARENT) {
          return true;
        }
        // If it's an existing parent queue and needs to change to leaf, 
        // remove it if it's empty.
        return removeQueueIfEmpty(queue);
      }
    }
{code}
What I am proposing is to add one line of code per queue type. If the queue 
exists and it is the correct type than we should make sure the dynamic flag is 
set to false.That will have no effect if the queue was already defined in the 
config. However if the queue was created as a dynamic queue it will then turn 
that queue into a queue defined in the configuration. if the queue was already 
In all other cases we remove the queue and create a new one later on which will 
have the correct dynamic flag set.
 So in the above code we would get two lines extra, this is the one for the 
LEAF queues:
{code:java}
        if (queueType == FSQueueType.LEAF) {
          queue.setDynamic(false);
          // if queue is already a leaf then return true
          return true;
        }
{code}
With that change we do not have to do anything in the 
\{{updateAllocationConfiguration}} for static/dynamic change.

Does that make sense?

I just noticed two bugs in that code, neither are newly introduced:
 # if I try to create a LEAF from the config and it is already a LEAF queue we 
should not return true but false. Same for the PARENT check. The return code 
triggers the queue creation which is not needed because the queue already 
exists even with the right type.
 # if the queue exists with the wrong type we try to remove it via 
\{{removeQueueIfEmpty}} The result is passed back without checks and don't 
follow up. If the remove failed I have a queue with the wrong type in the 
system. That leaves the system in an inconsistent state: whatever I have in the 
queue manager is now not what is in the configuration. This should be at least 
logged and probably even throw.

1 we can fix now: just change the return value, 2 probably needs a follow up 
Jira as it is more complex.

> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>
>                 Key: YARN-8191
>                 URL: https://issues.apache.org/jira/browse/YARN-8191
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory which leads to two following issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to