[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267111#comment-17267111 ]
zhuqi edited comment on YARN-10532 at 1/18/21, 1:27 PM:
--------------------------------------------------------

For the latest patch, I double-checked the following requirement:

"An additional requirement we should keep in mind:

Scenario A:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- Before the signal arrives to scheduler, an app submitted to scheduler (T1). T1 > T0
- When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid removing the queue A because now it is used.{code}
Scenario B:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- At T1 (T1 > T0), scheduler got the signal and deleted the queue.
- At T2 (T2 > T1), an app submitted to scheduler.

Scheduler should immediately recreate the queue, in another word, deleting an dynamic queue should NEVER fail a submitted application.{code}
"

Neither problem will happen.

Scenario A is handled by a double check before deletion: the caller passes in the last submitted time it observed, and the remove logic reads the timestamp again and compares the two. All of this happens under the queue write lock.
{code:java}
// Double check that the lastSubmitTime has really expired,
// in case a new app has been submitted in the meantime.
if (queue instanceof LeafQueue && ((LeafQueue) queue).isDynamicQueue()) {
  LeafQueue underDeleted = (LeafQueue)queue;
  if (underDeleted.getLastSubmittedTimestamp() != lastSubmittedTime) {
    throw new SchedulerDynamicEditException("This should not happen, "
        + "trying to remove queue= " + childQueuePath
        + ", however the queue has new submitted apps.");
  }
} else {
  throw new SchedulerDynamicEditException(
      "This should not happen, can't remove queue= " + childQueuePath
      + " is not a leafQueue or not a dynamic queue.");
}

// Now we can do remove and update
this.childQueues.remove(queue);
this.scheduler.getCapacitySchedulerQueueManager()
    .removeQueue(queue.getQueuePath());
{code}
Application submission also updates this timestamp under the write lock:
{code:java}
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue) throws AccessControlException {
  // Careful! Locking order is important!
  validateSubmitApplication(applicationId, userName, queue);

  // Signal to queue submit time in dynamic queue
  if (this.isDynamicQueue()) {
    signalToSubmitToQueue();
  }

  // Inform the parent queue
  try {
    getParent().submitApplication(applicationId, userName, queue);
  } catch (AccessControlException ace) {
    LOG.info("Failed to submit application to parent-queue: " +
        getParent().getQueuePath(), ace);
    throw ace;
  }
}

// "Tab" the queue, so this queue won't be removed because of idle timeout.
public void signalToSubmitToQueue() {
  writeLock.lock();
  try {
    this.lastSubmittedTimestamp = System.currentTimeMillis();
  } finally {
    writeLock.unlock();
  }
}
{code}
Scenario B is handled in addApplication and addApplicationOnRecovery:
{code:java}
//- At time T0, policy signals scheduler to delete queue A (an auto created queue).
//- At T1 (T1 > T0), scheduler got the signal and deleted the queue.
//- At T2 (T2 > T1), an app submitted to scheduler.
//
//Scheduler should immediately recreate the queue, in another word,
// deleting an dynamic queue should NEVER fail a submitted application.
// This will not happen, because the write lock in addApplication
// and in addApplicationOnRecovery makes queue creation and submission atomic.
// The capacity scheduler write lock is also held in the remove logic.
private void addApplication(ApplicationId applicationId,
    String queueName, String user, Priority priority,
    ApplicationPlacementContext placementContext) {
  writeLock.lock();
  ...
}

// The remove will hold writelock
private CSQueue removeDynamicChildQueue(String childQueuePath,
    boolean isLeaf, long lastSubmittedTime)
    throws SchedulerDynamicEditException {
  writeLock.lock();
  ...
}{code}
The above covers policy-driven auto deletion. reinitializeQueues already runs under the capacity scheduler write lock, so deletion on that path is safe as well.

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-10532
>                 URL: https://issues.apache.org/jira/browse/YARN-10532
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-10532.001.patch, YARN-10532.002.patch,
> YARN-10532.003.patch, YARN-10532.004.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for
> a period of time (like 5 mins). It will be helpful when we have a large
> number of auto-created queues (e.g. from 500 users), but only a small subset
> of queues are actively used.
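
As a standalone illustration of the Scenario A double check described in the comment above, here is a minimal sketch. It is not the patch code: the class DynamicQueueSketch, the method removeIfUnchanged, and the injected clock are hypothetical names introduced only for this example, and the sketch returns false where the patch throws SchedulerDynamicEditException. The clock is injected so the example does not depend on System.currentTimeMillis() resolution.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a dynamic leaf queue; not the YARN LeafQueue. */
class DynamicQueueSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  // Assumption: an injected clock, used here only to keep the example deterministic.
  private final LongSupplier clock;
  private long lastSubmittedTimestamp;
  private boolean removed;

  DynamicQueueSketch(LongSupplier clock) {
    this.clock = clock;
    this.lastSubmittedTimestamp = clock.getAsLong();
  }

  /** Called on every submission: refreshes the timestamp under the write lock. */
  void signalSubmit() {
    writeLock.lock();
    try {
      lastSubmittedTimestamp = clock.getAsLong();
    } finally {
      writeLock.unlock();
    }
  }

  long getLastSubmittedTimestamp() {
    writeLock.lock();
    try {
      return lastSubmittedTimestamp;
    } finally {
      writeLock.unlock();
    }
  }

  /**
   * Removes the queue only if no submission happened since the deletion
   * policy observed expectedTimestamp. Returns true if the queue was removed.
   */
  boolean removeIfUnchanged(long expectedTimestamp) {
    writeLock.lock();
    try {
      if (lastSubmittedTimestamp != expectedTimestamp) {
        return false; // an app arrived after the policy decided (Scenario A)
      }
      removed = true;
      return true;
    } finally {
      writeLock.unlock();
    }
  }

  boolean isRemoved() {
    writeLock.lock();
    try {
      return removed;
    } finally {
      writeLock.unlock();
    }
  }

  public static void main(String[] args) {
    AtomicLong ticks = new AtomicLong();
    DynamicQueueSketch queue = new DynamicQueueSketch(ticks::incrementAndGet);
    long observed = queue.getLastSubmittedTimestamp(); // policy decides at T0
    queue.signalSubmit();                              // app submitted at T1
    System.out.println("stale removal allowed: "
        + queue.removeIfUnchanged(observed));          // signal arrives at T2
    System.out.println("fresh removal allowed: "
        + queue.removeIfUnchanged(queue.getLastSubmittedTimestamp()));
  }
}
```

Because the timestamp update and the compare-then-remove step take the same lock, a submission either lands before the comparison (and blocks the removal) or after the removal has completed, which is the case Scenario B covers by recreating the queue.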