[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294951#comment-17294951 ]

Qi Zhu edited comment on YARN-10532 at 3/4/21, 8:38 AM:

Thanks [~gandras] for the final confirmation. [~pbacsko] [~snemeth] I have rebased the patch for merge. Please commit it when you are free. :D Thanks.

was (Author: zhuqi):
Thanks [~gandras] for last confirm. [~pbacsko] [~snemeth] I have rebased for merge, If you can commit, when you are free. :D Thanks.

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Qi Zhu
> Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch, YARN-10532.009.patch, YARN-10532.010.patch, YARN-10532.011.patch, YARN-10532.012.patch, YARN-10532.013.patch, YARN-10532.014.patch, YARN-10532.015.patch, YARN-10532.016.patch, YARN-10532.017.patch, YARN-10532.018.patch, YARN-10532.019.patch, YARN-10532.020.patch, YARN-10532.021.patch, YARN-10532.022.patch, YARN-10532.023.patch, YARN-10532.024.patch, YARN-10532.025.patch, YARN-10532.026.patch, YARN-10532.027.patch, image-2021-02-12-21-32-02-267.png
>
> It's better if we can delete auto-created queues when they are not in use for a period of time (like 5 mins). It will be helpful when we have a large number of auto-created queues (e.g. from 500 users), but only a small subset of queues are actively used.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17294951#comment-17294951 ] Qi Zhu edited comment on YARN-10532 at 3/4/21, 5:40 AM: Thanks [~gandras] for last confirm. [~pbacsko] [~snemeth] I have rebased for merge, If you can commit, when you are free. :D Thanks. was (Author: zhuqi): Thanks [~gandras] for last confirm. [~pbacsko] [~snemeth] If you can commit, when you are free.:D Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293113#comment-17293113 ] Peter Bacsko edited comment on YARN-10532 at 3/1/21, 7:16 PM: -- [~zhuqi] please fix the things that [~bteke] mentioned and I think I'll commit this patch to trunk. Also pay attention to the remaining checkstyle issues. was (Author: pbacsko): [~zhuqi] please fix the things that [~bteke] mentioned and I think I'll commit this patch to trunk.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292953#comment-17292953 ]

Qi Zhu edited comment on YARN-10532 at 3/1/21, 3:29 PM:

Thanks a lot [~gandras] for the final check.

The yarn.scheduler.capacity prefix is actually part of the property key. This is a queue-level setting for disabling auto deletion, and the default of true is reasonable:
{code:java}
@Private
public boolean isAutoExpiredDeletionEnabled(String queuePath) {
  boolean isAutoExpiredDeletionEnabled = getBoolean(
      getQueuePrefix(queuePath) + AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE,
      DEFAULT_AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE);
  return isAutoExpiredDeletionEnabled;
}

public static String getQueuePrefix(String queue) {
  String queueName = PREFIX + queue + DOT;
  return queueName;
}

@Private
public static final String PREFIX = "yarn.scheduler.capacity.";
{code}
The other things are fixed in the latest patch.

[~pbacsko] Do you have any other advice?

Thanks.

was (Author: zhuqi):
Thanks a lot [~gandras] for final check. The yarn.scheduler.capacity is actually in the property, this is queue level setting for disabled auto deletion:
{code:java}
@Private
public boolean isAutoExpiredDeletionEnabled(String queuePath) {
  boolean isAutoExpiredDeletionEnabled = getBoolean(
      getQueuePrefix(queuePath) + AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE,
      DEFAULT_AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE);
  return isAutoExpiredDeletionEnabled;
}

public static String getQueuePrefix(String queue) {
  String queueName = PREFIX + queue + DOT;
  return queueName;
}

@Private
public static final String PREFIX = "yarn.scheduler.capacity.";
{code}
Other things are fixed in latest patch. [~pbacsko] If you any other advice? Thanks.
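For readers following the thread, the queue-level lookup described in the comment above can be sketched in isolation. This is only a stand-alone illustration under stated assumptions, not the CapacitySchedulerConfiguration code from the patch: the class name `QueueAutoRemovalConfig`, the map-backed property store, and the suffix value `auto-removal.enable` are placeholders invented for the example. Only the `yarn.scheduler.capacity.` prefix, the prefix-plus-queue-path-plus-dot composition, and the default of true come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch of a queue-level boolean property lookup. */
class QueueAutoRemovalConfig {
  static final String PREFIX = "yarn.scheduler.capacity.";
  static final String DOT = ".";
  // Placeholder suffix for illustration only; the real constant is defined in the patch.
  static final String AUTO_REMOVAL_ENABLE_SUFFIX = "auto-removal.enable";
  static final boolean DEFAULT_AUTO_REMOVAL_ENABLE = true;

  // Hypothetical in-memory store standing in for the Hadoop Configuration object.
  private final Map<String, String> props = new HashMap<>();

  void set(String key, String value) {
    props.put(key, value);
  }

  /** Builds "yarn.scheduler.capacity.<queuePath>." as in the quoted getQueuePrefix. */
  static String getQueuePrefix(String queuePath) {
    return PREFIX + queuePath + DOT;
  }

  /** Returns the per-queue flag, falling back to the default when the key is unset. */
  boolean isAutoExpiredDeletionEnabled(String queuePath) {
    String value = props.get(getQueuePrefix(queuePath) + AUTO_REMOVAL_ENABLE_SUFFIX);
    return value == null ? DEFAULT_AUTO_REMOVAL_ENABLE : Boolean.parseBoolean(value);
  }
}
```

Because the key embeds the queue path, disabling deletion for `root.a` in this sketch leaves `root.b` on the default.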
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292953#comment-17292953 ]

Qi Zhu edited comment on YARN-10532 at 3/1/21, 3:28 PM:

Thanks a lot [~gandras] for final check. The yarn.scheduler.capacity is actually in the property, this is queue level setting for disabled auto deletion:
{code:java}
@Private
public boolean isAutoExpiredDeletionEnabled(String queuePath) {
  boolean isAutoExpiredDeletionEnabled = getBoolean(
      getQueuePrefix(queuePath) + AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE,
      DEFAULT_AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE);
  return isAutoExpiredDeletionEnabled;
}

public static String getQueuePrefix(String queue) {
  String queueName = PREFIX + queue + DOT;
  return queueName;
}

@Private
public static final String PREFIX = "yarn.scheduler.capacity.";
{code}
Other things are fixed in latest patch. [~pbacsko] If you any other advice? Thanks.

was (Author: zhuqi):
Thanks a lot [~gandras] for final check. The yarn.scheduler.capacity in the property is actually in the property, this is queue level setting for disabled auto deletion:
{code:java}
@Private
public boolean isAutoExpiredDeletionEnabled(String queuePath) {
  boolean isAutoExpiredDeletionEnabled = getBoolean(
      getQueuePrefix(queuePath) + AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE,
      DEFAULT_AUTO_CREATE_CHILD_QUEUE_AUTO_REMOVAL_ENABLE);
  return isAutoExpiredDeletionEnabled;
}

public static String getQueuePrefix(String queue) {
  String queueName = PREFIX + queue + DOT;
  return queueName;
}

@Private
public static final String PREFIX = "yarn.scheduler.capacity.";
{code}
Other things are fixed in latest patch. [~pbacsko] If you any other advice? Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289968#comment-17289968 ]

Peter Bacsko edited comment on YARN-10532 at 2/24/21, 8:12 PM:

First-round review. I might post more, but these are the things that stand out to me right now.

1. AbstractYarnScheduler:
{noformat}
public void removeQueue(CSQueue queueName) throws YarnException {
  throw new YarnException(getClass().getSimpleName()
      + " does not support removing queues");
}
{noformat}
If this is an abstract class, just make this method abstract without implementation: {{public abstract void removeQueue(CSQueue queueName) throws YarnException;}}

2.
{noformat}
// When this queue has application submit to?
// This property only applies to dynamic queue,
// and will be used to check when the queue need to be removed.
{noformat}
Rephrase this comment a little bit:
{noformat}
// The timestamp of the last submitted application to this queue.
// Only applies to dynamic queues.
{noformat}

3.
{noformat}
// "Tab" the queue, so this queue won't be removed because of idle timeout.
public void signalToSubmitToQueue() {
{noformat}
I'd change the comment to "Update the timestamp of the last submitted application". Also, the method name sounds weird to me. What it does is really simple. Call it {{updateLastSubmittedTimeStamp()}}. If you use the right naming, then the comment is probably unnecessary. We don't need comments if the method is simple and its purpose is easy to understand.

4. Instead of this:
{noformat}
// just for test
public void setLastSubmittedTimestamp(long lastSubmittedTimestamp) {
{noformat}
use this:
{noformat}
@VisibleForTesting
public void setLastSubmittedTimestamp(long lastSubmittedTimestamp) {
{noformat}

5. This comment is completely unnecessary, I think:
{noformat}
// Expired queue, when there are no applications in queue,
// and the last submit time has been expired.
// Delete queue when expired deletion enabled.
{noformat}
It's obvious what the method is doing. Or if you insist on having a comment there, just add "Timeout expired, delete the dynamic queue".

6. I suggest a better exception message:
{noformat}
throw new SchedulerDynamicEditException(
    "The queue " + queue.getQueuePath() + " can't removed normally.");
{noformat}
It should say "The queue ABC cannot be removed because its parent is null".

7. {{LOG.info("Removed queue: " + queue.getQueuePath());}}: it is not necessary to log a successful removal. If there is no message, it means that the removal was successful.

8. Typo in comment: {{// 300s for expired defualt}} --> "default"

9. These methods are used by the code itself, not just by tests:
{noformat}
@VisibleForTesting
public void prepareForAutoDeletion() {
...
@VisibleForTesting
public void triggerAutoDeletionForExpiredQueues() {
{noformat}
So "VisibleForTesting" should be removed.

10.
{noformat}
private void queueAutoDeletion(CSQueue checkQueue) {
  //Scheduler update is asynchronous
  if (checkQueue != null) {
{noformat}
Three things:
* {{queueAutoDeletion()}} - this method name is a noun. Ideally, method names begin with a verb. For example "deleteDynamicQueue()" or "deleteAutoCreatedQueue()".
* Also, why is the parameter called "checkQueue"? Just call it "queue".
* The comment is confusing: "Scheduler update is asynchronous". Why is it there? This statement does not tell me anything in this context. Does it refer to the null-check?

11.
{noformat}
@Before
public void setUp() throws Exception {
  // The expired time for deletion will be 1s
  super.setUp();
}
{noformat}
This method is unnecessary; the setUp() method in the super class will be called anyway.

12. Test methods: {{testEditSchedule}}, {{testCapacitySchedulerAutoQueueDeletion}}, {{testCapacitySchedulerAutoQueueDeletionDisabled}}
These test methods are long, but that's not my main problem. There are {{Thread.sleep()}} calls inside. This is really problematic, especially short sleeps like {{Thread.sleep(100)}}. I have fixed many flaky tests where the test code was full of {{Thread.sleep()}}. This must be avoided wherever possible. We should come up with a better solution, e.g. polling a certain state regularly, for example:
{noformat}
GenericTestUtils.waitFor(() -> someObject.isConditionTrue(), 500, 10_000);
{noformat}
This method calls {{someObject.isConditionTrue()}} every 500 ms and times out after 10 seconds. In case of a timeout, a {{TimeoutException}} will be thrown.

was (Author: pbacsko):
FIRST round review. I might post more but these are that stand out to me
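The polling idea in point 12 can be sketched without any Hadoop dependency. The helper below is a minimal stand-alone illustration, assuming a hypothetical class name `PollingWait`; it mimics the check-interval-timeout shape of the call quoted in the review but is not the Hadoop `GenericTestUtils` implementation.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/** Stand-alone polling helper in the spirit of the waitFor call quoted above. */
class PollingWait {
  /**
   * Re-evaluates the check every intervalMs milliseconds until it returns true.
   * Throws TimeoutException once timeoutMs milliseconds have elapsed without success.
   */
  static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() >= deadline) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

Compared with a fixed `Thread.sleep(100)`, a test written this way returns as soon as the condition holds and fails with a clear exception when it never does, which is the flakiness argument made in the review.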
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283691#comment-17283691 ] Qi Zhu edited comment on YARN-10532 at 2/13/21, 2:11 AM: - !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~bteke] [~shuzirra] [~pbacsko] [~epayne] I have also confirmed it in my test cluster. I think this is very important for auto-created queues. Could you help review the latest patch? Thanks. was (Author: zhuqi): !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster, i think this is very import for Auto created queue, if you could help review the latest patch? Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283691#comment-17283691 ] Qi Zhu edited comment on YARN-10532 at 2/12/21, 2:15 PM: - !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster. I think this is very important for auto-created queues. Could you help review the latest patch? Thanks. was (Author: zhuqi): !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster, i think this is very import for Auto created leaf queue, if you could help review the latest patch? Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283691#comment-17283691 ] Qi Zhu edited comment on YARN-10532 at 2/12/21, 2:15 PM: - !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster. I think this is very important for auto-created leaf queues. Could you help review the latest patch? Thanks. was (Author: zhuqi): !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster, if you could help review the latest patch? Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17283691#comment-17283691 ] Qi Zhu edited comment on YARN-10532 at 2/12/21, 1:34 PM: - !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster, if you could help review the latest patch? Thanks. was (Author: zhuqi): !image-2021-02-12-21-32-02-267.png|width=1085,height=764! cc [~gandras] [~snemeth] [~ztang] [~epayne] I have also confirmed it in my test cluster, if you could review it ? Thanks.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280899#comment-17280899 ] Qi Zhu edited comment on YARN-10532 at 2/11/21, 3:13 PM: - [~gandras] I have updated it in testAutoCreateQueueAfterRemoval in the latest patch. Thanks a lot for your patient review. cc [~wangda] [~ztang] [~epayne] [~snemeth] [~bteke] [~shuzirra] [~ebadger] Could you help review the latest patch? was (Author: zhuqi): [~gandras] I have updated it in testAutoCreateQueueAfterRemoval in latest patch, Thanks a lot for your patient review. cc [~wangda] [~epayne] [~snemeth] [~bteke] [~shuzirra] [~ebadger] Could you help review latest patch?
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280899#comment-17280899 ] Qi Zhu edited comment on YARN-10532 at 2/8/21, 1:11 PM: [~gandras] I have updated it in testAutoCreateQueueAfterRemoval in latest patch, Thanks a lot for your patient review. cc [~wangda] [~epayne] [~snemeth] [~bteke] [~shuzirra] [~ebadger] Could you help review latest patch? was (Author: zhuqi): [~gandras] I have updated it in testAutoCreateQueueAfterRemoval in latest patch, Thanks a lot for your patient review. cc [~wangda] [~epayne] [~snemeth] [~bteke] [~shuzirra] Could you help review latest patch?
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276811#comment-17276811 ]

zhuqi edited comment on YARN-10532 at 2/2/21, 5:40 AM:

Thanks a lot, [~gandras], for your patient review; it makes sense to me.
# I used to think the parent should also check the last submit time passed up by the leaf queue, but it seems this is not needed.
# But I think we should check all applications in the leaf queue, not only "getNumActiveApplications() == 0".
# The garbage collector in AutoDeletionForExpiredQueuePolicy will make this clearer, I agree. But I think the logic should not be in the for loop:
{code:java}
Set<String> newMarks = new HashSet<>();
for (Map.Entry<String, CSQueue> queueEntry :
    scheduler.getCapacitySchedulerQueueManager().getQueues().entrySet()) {
  String queuePath = queueEntry.getKey();
  CSQueue queue = queueEntry.getValue();
  if (queue instanceof AbstractCSQueue
      && ((AbstractCSQueue) queue).isEligibleForAutoDeletion()) {
    if (markedForDeletion.contains(queuePath)) {
      sentForDeletion.add(queuePath);
      markedForDeletion.remove(queuePath);
    } else {
      newMarks.add(queuePath);
    }
  }
}
markedForDeletion.clear();
markedForDeletion.addAll(newMarks);
{code}
I will upload a new patch later today. Your suggestions are very valid, and I am glad to work with you. :)

was (Author: zhuqi):
Thanks a lot for [~gandras] patient review, it make sense to me.
# I used to think, parent should also check last submit time passed by leaf queue, it seems this is not needed.
# But i think we should check all applications in leaf queue, not only "getNumActiveApplications() == 0".
# The garbage collector in AutoDeletionForExpiredQueuePolicy will make this more clear, i agree.

I will update a new patch later today, your suggestions are very valid, i am glad to work with you.:)
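The marking logic quoted in the comment above amounts to a two-pass scheme: a queue is sent for deletion only if it is found eligible in two consecutive policy runs. The sketch below isolates that behaviour under stated assumptions; the class `TwoPassDeletionMarker` and the map of eligibility flags are hypothetical stand-ins for the scheduler's queue manager and `isEligibleForAutoDeletion()`, not code from the patch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone model of the mark-then-delete passes. */
class TwoPassDeletionMarker {
  private final Set<String> markedForDeletion = new HashSet<>();
  private final Set<String> sentForDeletion = new HashSet<>();

  /**
   * Runs one policy pass. The map goes from queue path to whether the queue
   * is currently eligible for deletion (a stand-in for the real eligibility check).
   */
  void runPass(Map<String, Boolean> eligibility) {
    Set<String> newMarks = new HashSet<>();
    for (Map.Entry<String, Boolean> entry : eligibility.entrySet()) {
      if (!entry.getValue()) {
        continue; // a queue that became busy again silently loses its mark
      }
      String queuePath = entry.getKey();
      if (markedForDeletion.contains(queuePath)) {
        // Eligible in the previous pass and still eligible now.
        sentForDeletion.add(queuePath);
      } else {
        newMarks.add(queuePath);
      }
    }
    // Only marks created in this pass survive into the next one.
    markedForDeletion.clear();
    markedForDeletion.addAll(newMarks);
  }

  Set<String> getMarkedForDeletion() {
    return markedForDeletion;
  }

  Set<String> getSentForDeletion() {
    return sentForDeletion;
  }
}
```

In this model a queue that is eligible once and then receives work is never deleted, because its mark is dropped when the set is rebuilt at the end of the pass.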
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272968#comment-17272968 ]

Andras Gyori edited comment on YARN-10532 at 1/27/21, 4:38 PM:
---

Thank you [~zhuqi] for the patch. I have come up with several points regarding this approach:
* In my opinion, implementing auto queue deletion for the legacy auto queue logic is not justified. Old CS users have their own way of keeping their queue hierarchy clean, so this feature would be of little use to them. New CS users are encouraged to use the new auto queue creation. We should encourage the userbase to move away from ManagedParents, as that code is hard to maintain and very hard to reason about. All in all, I would refrain from adding new features to the old auto queue creation.
* I think the approach chosen for this patch is hard to maintain because:
** It does not have a central point where the dynamic queue deletion happens (this was a major pain point of weight calculation as well; we should not repeat this mistake). QueueManagementChanges and updateQueues both have twisted logic, which reduces readability.
** It does not cover all cases. If I understand correctly, auto deletion is only triggered if CS is reinitialised or a queue management change occurs. In my opinion, we should not rely on user events, which may or may not happen.
** It does not handle deletion of ParentQueues. I think childless ParentQueues should be removed as well.

My idea is to implement automatic queue deletion somewhat like a garbage collector:
# Run a background thread that periodically checks the whole queue hierarchy (maybe we could store references to all the dynamic queues, to eliminate the cost of traversing the hierarchy).
# Store the timestamp of the last submitted application immediately after receiving the queue in CS.
# Store the timestamp when a dynamic queue reaches 0 applications (either in the queue itself or in an external map).
# If a queue still has 0 active applications after the configured expiration time:
## Check whether the last submitted application is also older than this threshold.
## If yes, delete the queue.
## If no, an application has been submitted but is not active yet, so we need to reset the expiration timer for this queue.
# Remove dynamic ParentQueues the same way, but check the number of children instead of the number of active applications.

To avoid any race condition in the scheduler, we should use the ReadWriteLock of CS during queue deletion.

was (Author: gandras):
Thank you [~zhuqi] for the patch. I have come up with several points regarding this approach
* In my opinion, implementing auto queue deletion for the legacy auto queue logic is not justified. Old CS users have their own way of keeping their queue hierarchy clean, thus providing this feature would be of little use for them. As for new CS users, they are encouraged to use the new auto queue creation. We should encourage the userbase to move away from ManagedParents, as the code is hard to maintain and very hard to reason about. All in all, I would refrain from adding new features to the old auto queue creation.
* I think the approach chosen for this patch is hard to maintain because:
** Does not have a central point where the dynamic queue deletion happens (this was a major pain point of weight calculation as well, we should not repeat this mistake again). QueueManagementChanges and updateQueues both have twisted logic, which reduces readability.
** It does not cover all cases. If I understand correctly, the auto deletion only triggered if CS is reinitialised or a queue management change occurs. In my opinion, we should not rely on events of the users, which may, or may not happen.
** It does not handle deletion of ParentQueues. I think childless ParentQueues should get removed as well.

My idea of implementing automatic queue deletion somewhat similar to a garbage collector:
# Run a background thread, that periodically checks the whole queue hierarchy (maybe we could store the references of all the dynamic queues, in order to eliminate the cost of traversing the hierarchy)
# Store the timestamp when a dynamic reaches 0 application (either in the queue itself or in an external map)
# Mark the queues for deletion, that has been without application for a configured time
## Marking introduces a grace period, to avoid race conditions (namely, delete a queue in the same as an application has been submitted
## Application submission to marked queues should be rejected or make the mapping rules step to the next rule
# After the grace period, check that the marked queues does not have any application running, and:
## Delete, if active application number is still == 0
## Remove mark and timestamp if active
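The garbage-collector idea in the 4:38 PM revision of this comment (two timestamps per dynamic queue, an expiration time, and deletion under a write lock) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the design that was eventually committed: `ExpirySweepSketch`, `DynamicQueue`, and every method name are hypothetical, time is passed in explicitly instead of being read from the clock, and a plain `ReentrantReadWriteLock` stands in for the CS lock.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of dynamic leaf queues; not CapacityScheduler code.
final class ExpirySweepSketch {
  private static final class DynamicQueue {
    int activeApps;        // currently active applications
    long lastSubmittedMs;  // time of the last submission
    long emptiedAtMs;      // time the queue last had 0 active applications
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, DynamicQueue> dynamicQueues = new HashMap<>();
  private final long expiryMs;

  ExpirySweepSketch(long expiryMs) {
    this.expiryMs = expiryMs;
  }

  // Records a submission; creates the queue if it does not exist (yet or any more).
  void submit(String path, long nowMs) {
    lock.writeLock().lock();
    try {
      DynamicQueue q = dynamicQueues.get(path);
      if (q == null) {
        q = new DynamicQueue();
        q.emptiedAtMs = nowMs;
        dynamicQueues.put(path, q);
      }
      q.lastSubmittedMs = nowMs;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Adjusts the active application count of an existing queue by +1 or -1.
  void changeActiveApps(String path, int delta, long nowMs) {
    lock.writeLock().lock();
    try {
      DynamicQueue q = dynamicQueues.get(path);
      q.activeApps += delta;
      if (q.activeApps == 0) {
        q.emptiedAtMs = nowMs;
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  // One pass of the background thread; returns the paths it deleted.
  List<String> sweep(long nowMs) {
    List<String> deleted = new ArrayList<>();
    lock.writeLock().lock();
    try {
      Iterator<Map.Entry<String, DynamicQueue>> it =
          dynamicQueues.entrySet().iterator();
      while (it.hasNext()) {
        Map.Entry<String, DynamicQueue> entry = it.next();
        DynamicQueue q = entry.getValue();
        if (q.activeApps > 0 || nowMs - q.emptiedAtMs <= expiryMs) {
          continue; // busy, or not idle for long enough
        }
        if (nowMs - q.lastSubmittedMs > expiryMs) {
          deleted.add(entry.getKey());
          it.remove();
        } else {
          q.emptiedAtMs = nowMs; // submitted but not active yet: reset the timer
        }
      }
    } finally {
      lock.writeLock().unlock();
    }
    return deleted;
  }

  boolean contains(String path) {
    lock.readLock().lock();
    try {
      return dynamicQueues.containsKey(path);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Parent queues are left out of the sketch; following the last step of the proposal, they would be swept the same way with the child count taking the place of `activeApps`.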
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17272968#comment-17272968 ] Andras Gyori edited comment on YARN-10532 at 1/27/21, 4:28 PM: --- Thank you [~zhuqi] for the patch. I have come up with several points regarding this approach * In my opinion, implementing auto queue deletion for the legacy auto queue logic is not justified. Old CS users have their own way of keeping their queue hierarchy clean, thus providing this feature would be of little use for them. As for new CS users, they are encouraged to use the new auto queue creation. We should encourage the userbase to move away from ManagedParents, as the code is hard to maintain and very hard to reason about. All in all, I would refrain from adding new features to the old auto queue creation. * I think the approach chosen for this patch is hard to maintain because: ** Does not have a central point where the dynamic queue deletion happens (this was a major pain point of weight calculation as well, we should not repeat this mistake again). QueueManagementChanges and updateQueues both have twisted logic, which reduces readability. ** It does not cover all cases. If I understand correctly, the auto deletion only triggered if CS is reinitialised or a queue management change occurs. In my opinion, we should not rely on events of the users, which may, or may not happen. ** It does not handle deletion of ParentQueues. I think childless ParentQueues should get removed as well. 
My idea of implementing automatic queue deletion somewhat similar to a garbage collector: # Run a background thread, that periodically checks the whole queue hierarchy (maybe we could store the references of all the dynamic queues, in order to eliminate the cost of traversing the hierarchy) # Store the timestamp when a dynamic reaches 0 application (either in the queue itself or in an external map) # Mark the queues for deletion, that has been without application for a configured time ## Marking introduces a grace period, to avoid race conditions (namely, delete a queue in the same as an application has been submitted ## Application submission to marked queues should be rejected or make the mapping rules step to the next rule # After the grace period, check that the marked queues does not have any application running, and: # ## Delete, if active application number is still == 0 ## Remove mark and timestamp if active application number > 0 # Remove dynamic ParentQueues the same way, but instead of checking active applications, check the number of children Now, I see that marking would introduce a surprising behaviour, but I can not come up with a way that is less disruptive and solves the race condition at the same time. was (Author: gandras): Thank you [~zhuqi] for the patch. I have come up with several points regarding this approach * In my opinion, implementing auto queue deletion for the legacy auto queue logic is not justified. Old CS users have their own way of keeping their queue hierarchy clean, thus providing this feature would be of little use for them. As for new CS users, they are encouraged to use the new auto queue creation. We should encourage the userbase to move away from ManagedParents, as the code is hard to maintain and very hard to reason about. 
* I think the approach chosen for this patch is hard to maintain because: ** Does not have a central point where the dynamic queue deletion happens (this was a major pain point of weight calculation as well, we should not repeat this mistake again). QueueManagementChanges and updateQueues both have twisted logic, which reduces readability. ** It does not cover all cases. If I understand correctly, the auto deletion only triggered if CS is reinitialised or a queue management change occurs. In my opinion, we should not rely on events of the users, which may, or may not happen. ** It does not handle deletion of ParentQueues. I think childless ParentQueues should get removed as well. My idea of implementing automatic queue deletion somewhat similar to a garbage collector: # Run a background thread, that periodically checks the whole queue hierarchy (maybe we could store the references of all the dynamic queues, in order to eliminate the cost of traversing the hierarchy) # Store the timestamp when a dynamic reaches 0 application (either in the queue itself or in an external map) # Mark the queues for deletion, that has been without application for a configured time ## Marking introduces a grace period, to avoid race conditions (namely, delete a queue in the same as an application has been submitted ## Application submission to marked queues should be rejected or make the mapping rules step to the next rule # After the grace period, check that the marked queues does not have any application running, and:
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267736#comment-17267736 ]

zhuqi edited comment on YARN-10532 at 1/19/21, 8:19 AM:

[~wangda] [~gandras] The test failure is not related, so the patch can be reviewed now. Another question: the patch also implements auto deletion for old auto-created queues; should we only support new auto-created queues? Thanks.

was (Author: zhuqi):
[~wangda] [~gandras] Now it can be reviewed. Another question is : It realized auto deletion for old auto created also, if we just want to support new auto created queue. Thanks.

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, YARN-10532.006.patch, YARN-10532.007.patch
>
> It's better if we can delete auto-created queues when they are not in use for a period of time (like 5 mins). It will be helpful when we have a large number of auto-created queues (e.g. from 500 users), but only a small subset of queues are actively used.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267111#comment-17267111 ]

zhuqi edited comment on YARN-10532 at 1/18/21, 1:27 PM:

In the latest patch, I double-checked the following: "An additional requirement we should keep in mind:

Scenario A:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- Before the signal arrives to scheduler, an app submitted to scheduler (T1). T1 > T0
- When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid removing the queue A because now it is used.
{code}
Scenario B:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- At T1 (T1 > T0), scheduler got the signal and deleted the queue.
- At T2 (T2 > T1), an app submitted to scheduler.

Scheduler should immediately recreate the queue, in another word, deleting an dynamic queue should NEVER fail a submitted application.
{code}
"

Neither failure will happen.

Scenario A is covered by a double check before deletion: the caller passes the last submitted time it observed, and the removal reads the current value again and compares the two. All of this happens under the queue write lock.
{code:java}
// Double check that lastSubmittedTime has not changed,
// in case a new app has been submitted in the meantime.
if (queue instanceof LeafQueue &&
    ((LeafQueue) queue).isDynamicQueue()) {
  LeafQueue underDeleted = (LeafQueue) queue;
  if (underDeleted.getLastSubmittedTimestamp() != lastSubmittedTime) {
    throw new SchedulerDynamicEditException("This should not happen, " +
        "trying to remove queue= " + childQueuePath +
        ", however the queue has new submitted apps.");
  }
} else {
  throw new SchedulerDynamicEditException(
      "This should not happen, can't remove queue= " + childQueuePath +
      " is not a leafQueue or not a dynamic queue.");
}

// Now we can do remove and update
this.childQueues.remove(queue);
this.scheduler.getCapacitySchedulerQueueManager()
    .removeQueue(queue.getQueuePath());
{code}
Submission also updates this timestamp under the write lock:
{code:java}
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue) throws AccessControlException {
  // Careful! Locking order is important!
  validateSubmitApplication(applicationId, userName, queue);

  // Signal to queue submit time in dynamic queue
  if (this.isDynamicQueue()) {
    signalToSubmitToQueue();
  }

  // Inform the parent queue
  try {
    getParent().submitApplication(applicationId, userName, queue);
  } catch (AccessControlException ace) {
    LOG.info("Failed to submit application to parent-queue: " +
        getParent().getQueuePath(), ace);
    throw ace;
  }
}

// "Tab" the queue, so this queue won't be removed because of idle timeout.
public void signalToSubmitToQueue() {
  writeLock.lock();
  try {
    this.lastSubmittedTimestamp = System.currentTimeMillis();
  } finally {
    writeLock.unlock();
  }
}
{code}
Scenario B is covered by addApplication and addApplicationOnRecovery:
{code:java}
// - At time T0, policy signals scheduler to delete queue A (an auto created queue).
// - At T1 (T1 > T0), scheduler got the signal and deleted the queue.
// - At T2 (T2 > T1), an app submitted to scheduler.
//
// Scheduler should immediately recreate the queue, in another word,
// deleting an dynamic queue should NEVER fail a submitted application.

// This will not happen, because:
// the write lock in addApplication and in addApplicationOnRecovery
// makes queue creation and submission atomic.
// The capacity scheduler write lock is also held in the remove logic.
private void addApplication(ApplicationId applicationId,
    String queueName, String user, Priority priority,
    ApplicationPlacementContext placementContext) {
  writeLock.lock();
  ...
}

// The remove will hold writelock
private CSQueue removeDynamicChildQueue(String childQueuePath,
    boolean isLeaf, long lastSubmittedTime)
    throws SchedulerDynamicEditException {
  writeLock.lock();
  ...
}
{code}
The above covers policy-driven auto deletion. reinitializeQueues already runs under the capacity scheduler write lock, so it is safe as well.

was (Author: zhuqi):
The latest patch, double check the "An additional requirement we should keep in mind:

Scenario A:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- Before the signal arrives to scheduler, an app submitted to scheduler (T1). T1 > T0
- When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid removing the queue A because now it is used.
{code}
Scenario B:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created queue).
- At T1 (T1 > T0), scheduler got the signal and deleted the queue.
- At T2 (T2 > T1), an app submitted to scheduler. Scheduler should immediately recreate the queue, in another word, deleting an
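The two scenarios discussed in this comment come down to a compare-before-remove on the last submitted timestamp, with submission and removal serialized by one write lock. The sketch below is a simplified model under assumptions, not the patch itself: `CheckedRemovalSketch` and its methods are hypothetical, a map of timestamps stands in for the queue hierarchy, and a stale timestamp makes the removal return `false` where the quoted code throws `SchedulerDynamicEditException`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model: queue path -> last submitted timestamp.
final class CheckedRemovalSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Long> lastSubmitted = new HashMap<>();

  // Submission (re)creates the queue and stamps it, all under the write lock.
  void submit(String path, long nowMs) {
    lock.writeLock().lock();
    try {
      lastSubmitted.put(path, nowMs);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // What the policy reads at T0 before it signals a deletion; null if absent.
  Long snapshot(String path) {
    lock.readLock().lock();
    try {
      return lastSubmitted.get(path);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Removes the queue only if nothing was submitted since the snapshot.
  boolean removeIfUnchanged(String path, long observedMs) {
    lock.writeLock().lock();
    try {
      Long current = lastSubmitted.get(path);
      if (current == null || current.longValue() != observedMs) {
        return false; // Scenario A: a newer submission wins
      }
      lastSubmitted.remove(path);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean exists(String path) {
    return snapshot(path) != null;
  }
}
```

Because `submit` recreates a missing entry, a submission that arrives after a successful removal (Scenario B) simply brings the queue back instead of failing.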
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267111#comment-17267111 ] zhuqi edited comment on YARN-10532 at 1/18/21, 1:10 PM: The latest patch, double check the "An additional requirement we should keep in mind: Scenario A: {code:java} - At time T0, policy signals scheduler to delete queue A (an auto created queue). - Before the signal arrives to scheduler, an app submitted to scheduler (T1). T1 > T0 - When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid removing the queue A because now it is used.{code} Scenario B: {code:java} - At time T0, policy signals scheduler to delete queue A (an auto created queue). - At T1 (T1 > T0), scheduler got the signal and deleted the queue. - At T2 (T2 > T1), an app submitted to scheduler. Scheduler should immediately recreate the queue, in another word, deleting an dynamic queue should NEVER fail a submitted application.{code} " This will not happen: Scenario A confirmed by : Double check before deletion, pass the latest last submitted time, and get before remove again and compare them. All will in the queue write lock. {code:java} // Double check for the lastSubmitTime has been expired. // In case if now, there is a new submitted app. 
if (queue instanceof LeafQueue && ((LeafQueue) queue).isDynamicQueue()) { LeafQueue underDeleted = (LeafQueue)queue; if (underDeleted.getLastSubmittedTimestamp() != lastSubmittedTime) { throw new SchedulerDynamicEditException("This should not happen, " + "trying to remove queue= " + childQueuePath + ", however the queue has new submitted apps."); } } else { throw new SchedulerDynamicEditException( "This should not happen, can't remove queue= " + childQueuePath + " is not a leafQueue or not a dynamic queue."); } // Now we can do remove and update this.childQueues.remove(queue); this.scheduler.getCapacitySchedulerQueueManager() .removeQueue(queue.getQueuePath()); {code} Signal will also update this in the write lock: {code:java} @Override public void submitApplication(ApplicationId applicationId, String userName, String queue) throws AccessControlException { // Careful! Locking order is important! validateSubmitApplication(applicationId, userName, queue); // Signal to queue submit time in dynamic queue if (this.isDynamicQueue()) { signalToSubmitToQueue(); } // Inform the parent queue try { getParent().submitApplication(applicationId, userName, queue); } catch (AccessControlException ace) { LOG.info("Failed to submit application to parent-queue: " + getParent().getQueuePath(), ace); throw ace; } } // "Tab" the queue, so this queue won't be removed because of idle timeout. public void signalToSubmitToQueue() { writeLock.lock(); try { this.lastSubmittedTimestamp = System.currentTimeMillis(); } finally { writeLock.unlock(); } } {code} Scenario B confirmed by : in addApplication and addApplicationOnRecovery. {code:java} //- At time T0, policy signals scheduler to delete queue A (an auto created queue). //- At T1 (T1 > T0), scheduler got the signal and deleted the queue. //- At T2 (T2 > T1), an app submitted to scheduler. // //Scheduler should immediately recreate the queue, in another word, // deleting an dynamic queue should NEVER fail a submitted application. 
// This will not happen, because : // The writelock in addApplication // and in addApplicationOnRecovery. // Will make sure the create and submit atomic. // Also the capacity scheduler writelock will be held in remove logic. private void addApplication(ApplicationId applicationId, String queueName, String user, Priority priority, ApplicationPlacementContext placementContext) { writeLock.lock(); ... } // The remove will hold writelock private CSQueue removeDynamicChildQueue(String childQueuePath, boolean isLeaf, long lastSubmittedTime) throws SchedulerDynamicEditException { writeLock.lock(); ... }{code} was (Author: zhuqi): The latest patch, double check the "An additional requirement we should keep in mind: Scenario A: {code:java} - At time T0, policy signals scheduler to delete queue A (an auto created queue). - Before the signal arrives to scheduler, an app submitted to scheduler (T1). T1 > T0 - When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid removing the queue A because now it is used.{code} Scenario B: {code:java} - At time T0, policy signals scheduler to delete queue A (an auto created queue). - At T1 (T1 > T0), scheduler got the signal and deleted the queue. - At T2 (T2 > T1), an app submitted to scheduler. Scheduler should immediately recreate the queue, in another word, deleting an dynamic queue should NEVER fail a submitted application.{code} " This will not happen: Scenario A confirmed by :
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17266529#comment-17266529 ]

zhuqi edited comment on YARN-10532 at 1/18/21, 5:47 AM:

[~wangda] [~gandras] I added a draft PoC patch for 1, 2, 3 above.

The expired time is used only for old auto-created leaf queues: I reuse the existing expiry logic to trigger deletion of expired old auto-created queues. For new auto-created leaf queues, I use the submit time.

I will fix the patch and dig into some other cases in more detail.

Thanks.

was (Author: zhuqi):
[~wangda] [~gandras] I add a draft poc patch for 1, 2, 3 above. I will fix and deep into some other cases for more details. Thanks.

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, YARN-10532.003.patch
>
> It's better if we can delete auto-created queues when they are not in use for a period of time (like 5 mins). It will be helpful when we have a large number of auto-created queues (e.g. from 500 users), but only a small subset of queues are actively used.
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267022#comment-17267022 ]

zhuqi edited comment on YARN-10532 at 1/18/21, 5:42 AM:

The latest patch supports the following.

1. Policy-based auto deletion of expired queues.

1.1 For old auto-created leaf queues, deletion based on GuaranteedOrZeroCapacityOverTimePolicy:
{code:java}
long lastActive = getLeafQueueState(leafQueue, nodeLabel)
    .getMostRecentActivationTime();
long lastDeactive = getLeafQueueState(leafQueue, nodeLabel)
    .getMostRecentDeactivationTime();

// Check if need delete when expired.
if (lastActive >= lastDeactive ||
    (lastDeactive - lastActive) / 1000 <= scheduler.getConfiguration()
        .getAutoExpiredDeletionTime(managedParentQueue.getQueuePath()) ||
    leafQueue.getAllApplications().size() > 0) {
  isExpired = false;
}
{code}
For new auto-created leaf queues:
{code:java}
private synchronized void computeDynamicLeafQueueChanges(LeafQueue leafQueue)
    throws SchedulerDynamicEditException {
  // A queue is expired when it has no applications
  // and its last submit time is older than the expiry time.
  // Delete the queue when expired deletion is enabled.
  ParentQueue parentQueue = (ParentQueue) leafQueue.getParent();
  if (parentQueue == null) {
    throw new SchedulerDynamicEditException("Parent " +
        "queue should not be null for auto deletion!");
  }
  long idleDuration = (System.currentTimeMillis() -
      leafQueue.getLastSubmittedTimestamp()) / 1000;
  if (leafQueue.getAllApplications().size() == 0 &&
      idleDuration > this.getConfiguration()
          .getAutoExpiredDeletionTime(leafQueue.getParent().getQueuePath()) &&
      this.getConfiguration()
          .isAutoExpiredDeletionEnabled(leafQueue.getParent().getQueuePath())) {
    LeafQueue removed = parentQueue
        .removeDynamicLeafQueue(leafQueue.getQueuePath());
    if (removed != null) {
      this.getCapacitySchedulerQueueManager()
          .removeQueue(leafQueue.getQueuePath());
    }
  }
}
{code}
2. Deletion on reinitialization when the policy is not enabled:
{code:java}
private void updateQueues(CSQueueStore existingQueues,
    CSQueueStore newQueues) {
  CapacitySchedulerConfiguration conf = csContext.getConfiguration();
  for (CSQueue queue : newQueues.getQueues()) {
    if (existingQueues.get(queue.getQueuePath()) == null) {
      existingQueues.add(queue);
    }
  }

  for (CSQueue queue : existingQueues.getQueues()) {
    // should also support for auto created for expired deletion
    // 1. handle old auto created deletion for reinitializeQueues
    // 2. handle new auto created deletion for reinitializeQueues
    // Note: unlike the policy code above, the idle time below is compared
    // to getAutoExpiredDeletionTime() without dividing by 1000.
    if ((queue.getParent() != null &&
        queue instanceof AutoCreatedLeafQueue &&
        conf.isAutoExpiredDeletionEnabled(queue.getParent().getQueuePath()) &&
        (newQueues.get(queue.getQueuePath())) == null &&
        ((AutoCreatedLeafQueue) queue).isExpiredQueue()) ||
        (queue.getParent() != null &&
        queue instanceof LeafQueue &&
        ((LeafQueue) queue).isDynamicQueue() &&
        conf.isAutoExpiredDeletionEnabled(queue.getParent().getQueuePath()) &&
        (newQueues.get(queue.getQueuePath())) == null &&
        ((System.currentTimeMillis() -
            ((LeafQueue) queue).getLastSubmittedTimestamp()) >
            conf.getAutoExpiredDeletionTime(queue.getParent().getQueuePath())) &&
        ((LeafQueue) queue).getAllApplications().size() == 0) ||
        !((AbstractCSQueue) queue).isDynamicQueue() && newQueues.get(
            queue.getQueuePath()) == null && !(
            queue instanceof AutoCreatedLeafQueue && conf
                .isAutoCreateChildQueueEnabled(
                    queue.getParent().getQueuePath()))) {
      existingQueues.remove(queue);
    }
  }
}
{code}
Remaining to do:
# Decide whether we need to support auto deletion for parent queues as well.
# I will dig into all the corner cases in more detail.
# The queue name / queue path distinction is confusing in deletion and some related cases.

was (Author: zhuqi):
The latest patch support:

For old auto created leaf queue deletion:

1. Support policy based auto deletion for expired queue:

For old auto created leaf queue:

1.1 Support GuaranteedOrZeroCapacityOverTimePolicy based deletion :
{code:java}
long lastActive = getLeafQueueState(leafQueue, nodeLabel)
    .getMostRecentActivationTime();
long lastDeactive = getLeafQueueState(leafQueue, nodeLabel)
    .getMostRecentDeactivationTime();

// Check if need delete when expired.
if (lastActive >= lastDeactive ||
    (lastDeactive - lastActive) / 1000 <= scheduler.getConfiguration()
        .getAutoExpiredDeletionTime(managedParentQueue.getQueuePath()) ||
    leafQueue.getAllApplications().size() > 0) {
  isExpired = false;
}
{code}
For new auto created leaf queue:
{code:java}
private
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17267022#comment-17267022 ]

zhuqi edited comment on YARN-10532 at 1/18/21, 5:42 AM:

[~wangda] [~gandras]

The latest patch supports:

For old auto created leaf queue deletion:

1. Support policy based auto deletion for expired queues:

For old auto created leaf queues:

1.1 Support GuaranteedOrZeroCapacityOverTimePolicy based deletion:
{code:java}
long lastActive =
    getLeafQueueState(leafQueue, nodeLabel).getMostRecentActivationTime();
long lastDeactive =
    getLeafQueueState(leafQueue, nodeLabel).getMostRecentDeactivationTime();
// Check if need delete when expired.
if (lastActive >= lastDeactive
    || (lastDeactive - lastActive) / 1000 <= scheduler.getConfiguration()
        .getAutoExpiredDeletionTime(managedParentQueue.getQueuePath())
    || leafQueue.getAllApplications().size() > 0) {
  isExpired = false;
}
{code}

For new auto created leaf queues:
{code:java}
private synchronized void computeDynamicLeafQueueChanges(LeafQueue leafQueue)
    throws SchedulerDynamicEditException {
  // Expired queue, when there are no running in leafQueue
  // and the last submit time has been expired
  // Delete queue when expired deletion enabled.
  ParentQueue parentQueue = (ParentQueue) leafQueue.getParent();
  if (parentQueue == null) {
    throw new SchedulerDynamicEditException("Parent "
        + "queue should not be null for auto deletion!");
  }
  long idleDuration = (System.currentTimeMillis()
      - leafQueue.getLastSubmittedTimestamp()) / 1000;
  if (leafQueue.getAllApplications().size() == 0
      && idleDuration > this.getConfiguration()
          .getAutoExpiredDeletionTime(leafQueue.getParent().getQueuePath())
      && this.getConfiguration()
          .isAutoExpiredDeletionEnabled(leafQueue.getParent().getQueuePath())) {
    LeafQueue removed =
        parentQueue.removeDynamicLeafQueue(leafQueue.getQueuePath());
    if (removed != null) {
      this.getCapacitySchedulerQueueManager()
          .removeQueue(leafQueue.getQueuePath());
    }
  }
}
{code}

2. Support deletion during reinitialize (updateQueues) when the policy is not enabled:
{code:java}
private void updateQueues(CSQueueStore existingQueues,
    CSQueueStore newQueues) {
  CapacitySchedulerConfiguration conf = csContext.getConfiguration();
  for (CSQueue queue : newQueues.getQueues()) {
    if (existingQueues.get(queue.getQueuePath()) == null) {
      existingQueues.add(queue);
    }
  }

  for (CSQueue queue : existingQueues.getQueues()) {
    // should also support for auto created for expired deletion
    // 1. handle old auto created deletion for reinitializeQueues
    // 2. handle new auto created deletion for reinitializeQueues
    // NOTE: the idle time below is compared in milliseconds, whereas the
    // other expiry checks divide by 1000 before comparing.
    if ((queue.getParent() != null
        && queue instanceof AutoCreatedLeafQueue
        && conf.isAutoExpiredDeletionEnabled(queue.getParent().getQueuePath())
        && (newQueues.get(queue.getQueuePath())) == null
        && ((AutoCreatedLeafQueue) queue).isExpiredQueue())
        || (queue.getParent() != null
        && queue instanceof LeafQueue
        && ((LeafQueue) queue).isDynamicQueue()
        && conf.isAutoExpiredDeletionEnabled(queue.getParent().getQueuePath())
        && (newQueues.get(queue.getQueuePath())) == null
        && ((System.currentTimeMillis()
            - ((LeafQueue) queue).getLastSubmittedTimestamp())
            > conf.getAutoExpiredDeletionTime(queue.getParent().getQueuePath()))
        && ((LeafQueue) queue).getAllApplications().size() == 0)
        || !((AbstractCSQueue) queue).isDynamicQueue()
        && newQueues.get(queue.getQueuePath()) == null
        && !(queue instanceof AutoCreatedLeafQueue
            && conf.isAutoCreateChildQueueEnabled(
                queue.getParent().getQueuePath()))) {
      existingQueues.remove(queue);
    }
  }
}
{code}

Remaining to do:
# Decide whether auto deletion should also be supported for parent queues.
# I will dig deeper into all the corner cases.
# The queue name / queue path distinction is confusing during deletion and in some related cases.

was (Author: zhuqi):
The latest patch supports:

For old auto created leaf queue deletion:

1. Support policy based auto deletion for expired queues:

For old auto created leaf queues:

1.1 Support GuaranteedOrZeroCapacityOverTimePolicy based deletion:
{code:java}
long lastActive =
    getLeafQueueState(leafQueue, nodeLabel).getMostRecentActivationTime();
long lastDeactive =
    getLeafQueueState(leafQueue, nodeLabel).getMostRecentDeactivationTime();
// Check if need delete when expired.
if (lastActive >= lastDeactive
    || (lastDeactive - lastActive) / 1000 <= scheduler.getConfiguration()
        .getAutoExpiredDeletionTime(managedParentQueue.getQueuePath())
    || leafQueue.getAllApplications().size() > 0) {
  isExpired = false;
}
{code}

For new auto created leaf queues:
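
The single removal condition in updateQueues quoted above is hard to read as one expression. As a reading aid only, here is a minimal sketch that splits it into three named cases. The `Queue` and `Parent` classes are hypothetical stand-ins invented for this sketch, not YARN classes, and the expiry setting is assumed to be in seconds, as the `/1000` in the other snippets suggests.

```java
/** Sketch only: simplified stand-ins for illustration, not YARN classes. */
class QueueRemovalSketch {

  /** Hypothetical snapshot of one existing queue. */
  static final class Queue {
    final boolean legacyAutoCreated; // stands in for "instanceof AutoCreatedLeafQueue"
    final boolean dynamic;           // stands in for isDynamicQueue()
    final boolean legacyExpired;     // stands in for isExpiredQueue()
    final long lastSubmittedMillis;  // stands in for getLastSubmittedTimestamp()
    final int appCount;              // stands in for getAllApplications().size()

    Queue(boolean legacyAutoCreated, boolean dynamic, boolean legacyExpired,
        long lastSubmittedMillis, int appCount) {
      this.legacyAutoCreated = legacyAutoCreated;
      this.dynamic = dynamic;
      this.legacyExpired = legacyExpired;
      this.lastSubmittedMillis = lastSubmittedMillis;
      this.appCount = appCount;
    }
  }

  /** Hypothetical settings of the parent queue; pass null for "no parent". */
  static final class Parent {
    final boolean expiredDeletionEnabled;
    final boolean autoCreateEnabled;
    final long expirySeconds; // assumption: configured in seconds

    Parent(boolean expiredDeletionEnabled, boolean autoCreateEnabled,
        long expirySeconds) {
      this.expiredDeletionEnabled = expiredDeletionEnabled;
      this.autoCreateEnabled = autoCreateEnabled;
      this.expirySeconds = expirySeconds;
    }
  }

  static boolean shouldRemove(Queue q, Parent parent, boolean inNewConfig,
      long nowMillis) {
    if (inNewConfig) {
      return false; // every case requires the queue to be absent from the new config
    }
    boolean deletionEnabled = parent != null && parent.expiredDeletionEnabled;

    // Case 1: legacy auto-created leaf queue that its policy reports as expired.
    boolean legacyExpired =
        deletionEnabled && q.legacyAutoCreated && q.legacyExpired;

    // Case 2: new dynamic leaf queue with no applications, idle past the expiry.
    long idleSeconds = (nowMillis - q.lastSubmittedMillis) / 1000;
    boolean dynamicExpired = deletionEnabled && q.dynamic && q.appCount == 0
        && idleSeconds > parent.expirySeconds;

    // Case 3: static queue dropped from the config and not managed by auto creation.
    boolean managedByAutoCreation =
        q.legacyAutoCreated && parent != null && parent.autoCreateEnabled;
    boolean droppedFromConfig = !q.dynamic && !managedByAutoCreation;

    return legacyExpired || dynamicExpired || droppedFromConfig;
  }
}
```

Unlike the quoted code, the sketch converts the idle time to seconds before comparing and guards against a missing parent in the third case. Both are choices made for the sketch, not statements about what the patch does.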
[jira] [Comment Edited] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
[ https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249512#comment-17249512 ]

zhuqi edited comment on YARN-10532 at 12/15/20, 9:35 AM:

[~wangda] I'd like to take this if no one else is working on it. I have submitted a patch for review. Some of the logic will change once weight mode is done. Thanks. :)

was (Author: zhuqi):
[~wangda] I'd like to take this if no one else is working on it. Thanks. :)

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10532.001.patch
>
> It's better if we can delete auto-created queues when they are not in use for a period of time (like 5 mins). It will be helpful when we have a large number of auto-created queues (e.g. from 500 users), but only a small subset of queues are actively used.
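
To make the request in the description concrete, here is a minimal sketch of a periodic sweep that drops auto-created queues which have no running applications and have been idle longer than a timeout (for example 5 minutes). `IdleQueueSweeper`, `QueueUsage`, and `sweep` are hypothetical names chosen for this sketch; nothing here is taken from the YARN code base or from the attached patch.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Sketch only: a stand-alone model of idle-queue cleanup, not YARN code. */
class IdleQueueSweeper {

  /** Hypothetical usage record kept per auto-created queue. */
  static final class QueueUsage {
    final long lastSubmittedMillis; // time of the most recent application submission
    final int runningApps;          // applications currently in the queue

    QueueUsage(long lastSubmittedMillis, int runningApps) {
      this.lastSubmittedMillis = lastSubmittedMillis;
      this.runningApps = runningApps;
    }
  }

  /**
   * Removes every queue that has no applications and has been idle for
   * longer than idleTimeout, and returns the paths that were removed.
   */
  static List<String> sweep(Map<String, QueueUsage> autoCreatedQueues,
      Duration idleTimeout, long nowMillis) {
    List<String> removed = new ArrayList<>();
    Iterator<Map.Entry<String, QueueUsage>> it =
        autoCreatedQueues.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, QueueUsage> entry = it.next();
      String path = entry.getKey();
      QueueUsage usage = entry.getValue();
      boolean idleTooLong =
          nowMillis - usage.lastSubmittedMillis > idleTimeout.toMillis();
      if (usage.runningApps == 0 && idleTooLong) {
        it.remove(); // safe removal while iterating
        removed.add(path);
      }
    }
    return removed;
  }
}
```

With 500 per-user queues of which only a few are active, calling `sweep(queues, Duration.ofMinutes(5), System.currentTimeMillis())` on a timer would keep the map limited to queues that are in use or were used recently.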