[ 
https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16481341#comment-16481341
 ] 

Haibo Chen commented on YARN-8191:
----------------------------------

Thanks [~grepas] for the patch! I have some comments.

1) In AllocationFileLoaderService, instead of adding a queueManager and 
allowing it to be set, we can add another method, onCheck() to 
AllocationFileLoaderService.Listener.  Then listener.onCheck() can be called in 
each iteration of the while loop in AllocationFileLoaderService. This approach 
is already taken by onReload() method, so we can simply follow it.

2) Can we move `Set<String> removedStaticQueues = 
getRemovedStaticQueues(queueInfo);   
queueMgr.setQueuesToDynamic(removedStaticQueues);` logic into 
QueueManager.updateAllocationConfiguration(), where we handle the queue 
updates?  setQueuesToDynamic() is technically not thread-safe.

3) The patch currently handles the case where some static queues are removed 
from the new allocation file. What about the other case where some dynamic 
queues are not added as static in the new allocation file? Though I think this 
is less likely to happen.

4) I suppose  getIncompatibleQueueName() finds that existing queue that is 
incompatible with the queueToCreate. If that's the case, we need to return 
queue.getName() when the type before and after is different. That is, the code 
should probably
{code:java}
    if (queue != null) {
      if (queue instanceof FSLeafQueue) {
        if (queueType == FSQueueType.LEAF) {
          return null;
        } else {
          return queue.getName();
        }
      } else {
        if (queueType == FSQueueType.PARENT) {
          return null;
        } else {
          return queue.getName();
        }
      }
    }
{code}
5) The removeEmptyIncompatibleQueues(String queueToCreate, FSQueueType 
queueType, boolean defaultReturnValue) seems complicated to me with the newly 
added defaultReturnValue. If you look at removeLeafQueue(), previously it would 
return true if there is no incompatible queue found. But it now returns false.  
I think it illustrates well that the defaultReturnValue adds too much mental 
overhead, but no obvious benefits, IMO.

6) Tracking IncompatibleQueueRemovalTask is probably unnecessary. If an 
incompatible queue cannot be removed because there are some running apps at the 
moment, it will stay in queues so that next time when the  
updateAllocationConfiguration() is called, it will still be identified as an 
incompatible queue and be removed.

 

> Fair scheduler: queue deletion without RM restart
> -------------------------------------------------
>
>                 Key: YARN-8191
>                 URL: https://issues.apache.org/jira/browse/YARN-8191
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: fairscheduler
>    Affects Versions: 3.0.1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: Queue Deletion in Fair Scheduler.pdf, 
> YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, 
> YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, 
> YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, 
> YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, 
> YARN-8191.012.patch, YARN-8191.013.patch
>
>
> The Fair Scheduler never cleans up queues even if they are deleted in the 
> allocation file, or were dynamically created and are never going to be used 
> again. Queues always remain in memory which leads to two following issues.
>  # Steady fairshares aren’t calculated correctly due to remaining queues
>  # WebUI shows deleted queues, which is confusing for users (YARN-4022).
> We want to support proper queue deletion without restarting the Resource 
> Manager:
>  # Static queues without any entries that are removed from fair-scheduler.xml 
> should be deleted from memory.
>  # Dynamic queues without any entries should be deleted.
>  # RM Web UI should only show the queues defined in the scheduler at that 
> point in time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to