[
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267111#comment-17267111
]
zhuqi edited comment on YARN-10532 at 1/18/21, 1:27 PM:
--------------------------------------------------------
In the latest patch, I double-checked the following requirement:
"An additional requirement we should keep in mind:
Scenario A:
{code:java}
- At time T0, the policy signals the scheduler to delete queue A (an auto-created
queue).
- Before the signal arrives at the scheduler, an app is submitted to the scheduler
at T1 (T1 > T0).
- At T2 (T2 > T1), the signal arrives at the scheduler. The scheduler should avoid
removing queue A because it is now in use.{code}
Scenario B:
{code:java}
- At time T0, the policy signals the scheduler to delete queue A (an auto-created
queue).
- At T1 (T1 > T0), the scheduler gets the signal and deletes the queue.
- At T2 (T2 > T1), an app is submitted to the scheduler.
The scheduler should immediately recreate the queue; in other words, deleting a
dynamic queue should NEVER fail a submitted application.{code}
"
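Neither failure can happen with the latest patch.
Scenario A is handled by a double check before deletion: the caller passes in
the last submitted time it observed, and the remove logic reads the queue's last
submitted time again and compares the two. All of this happens under the queue
write lock.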
{code:java}
// Double check that the last submit time has really expired,
// in case a new app has been submitted in the meantime.
if (queue instanceof LeafQueue &&
    ((LeafQueue) queue).isDynamicQueue()) {
  LeafQueue underDeleted = (LeafQueue) queue;
  if (underDeleted.getLastSubmittedTimestamp() != lastSubmittedTime) {
    throw new SchedulerDynamicEditException("This should not happen, " +
        "trying to remove queue= " + childQueuePath
        + ", however the queue has new submitted apps.");
  }
} else {
  throw new SchedulerDynamicEditException(
      "This should not happen, can't remove queue= " + childQueuePath
      + " is not a leafQueue or not a dynamic queue.");
}
// Now we can do remove and update
this.childQueues.remove(queue);
this.scheduler.getCapacitySchedulerQueueManager()
    .removeQueue(queue.getQueuePath());
{code}
Application submission also updates this timestamp under the write lock:
{code:java}
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue) throws AccessControlException {
  // Careful! Locking order is important!
  validateSubmitApplication(applicationId, userName, queue);
  // Signal to queue submit time in dynamic queue
  if (this.isDynamicQueue()) {
    signalToSubmitToQueue();
  }
  // Inform the parent queue
  try {
    getParent().submitApplication(applicationId, userName, queue);
  } catch (AccessControlException ace) {
    LOG.info("Failed to submit application to parent-queue: " +
        getParent().getQueuePath(), ace);
    throw ace;
  }
}

// "Tab" the queue, so this queue won't be removed because of idle timeout.
public void signalToSubmitToQueue() {
  writeLock.lock();
  try {
    this.lastSubmittedTimestamp = System.currentTimeMillis();
  } finally {
    writeLock.unlock();
  }
}
{code}
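To make the Scenario A argument concrete, here is a minimal, self-contained sketch of the same idea: the submit path stamps the queue under a write lock, and the remove path re-reads the stamp under that lock and refuses to remove if it changed. The class `DynamicQueueSketch` and its methods are hypothetical stand-ins, not code from the patch. It also simplifies by using one lock for both paths, by returning false instead of throwing, and by taking the timestamp as a parameter so the example is deterministic.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a dynamic leaf queue; not the real LeafQueue. */
final class DynamicQueueSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private long lastSubmittedTimestamp;
  private boolean removed;

  DynamicQueueSketch(long initialTimestamp) {
    this.lastSubmittedTimestamp = initialTimestamp;
  }

  /** What the deletion policy would snapshot at T0. */
  long getLastSubmittedTimestamp() {
    lock.readLock().lock();
    try {
      return lastSubmittedTimestamp;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Submit path (T1): record the submit time under the write lock. */
  void signalSubmit(long now) {
    lock.writeLock().lock();
    try {
      lastSubmittedTimestamp = now;
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Remove path (T2): remove only if nothing was submitted since the snapshot. */
  boolean removeIfUnchanged(long snapshot) {
    lock.writeLock().lock();
    try {
      if (lastSubmittedTimestamp != snapshot) {
        return false; // a newer submission exists, so keep the queue
      }
      removed = true;
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isRemoved() {
    lock.readLock().lock();
    try {
      return removed;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With this sketch, a policy that snapshots the timestamp at T0 and calls `removeIfUnchanged` at T2 gets `false` if `signalSubmit` ran in between, which is the behavior Scenario A asks for.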
Scenario B is handled by the write lock held in addApplication
and addApplicationOnRecovery.
{code:java}
// - At time T0, the policy signals the scheduler to delete queue A (an
//   auto-created queue).
// - At T1 (T1 > T0), the scheduler gets the signal and deletes the queue.
// - At T2 (T2 > T1), an app is submitted to the scheduler.
//
// The scheduler should immediately recreate the queue; in other words,
// deleting a dynamic queue should NEVER fail a submitted application.
// A failure cannot happen because the write lock held in addApplication
// and in addApplicationOnRecovery makes queue creation and app submission
// atomic. The capacity scheduler write lock is also held in the remove logic.
private void addApplication(ApplicationId applicationId, String queueName,
    String user, Priority priority,
    ApplicationPlacementContext placementContext) {
  writeLock.lock();
  ...
}

// The remove also holds the write lock
private CSQueue removeDynamicChildQueue(String childQueuePath, boolean isLeaf,
    long lastSubmittedTime)
    throws SchedulerDynamicEditException {
  writeLock.lock();
  ...
}{code}
The above covers policy-driven auto deletion.
reinitializeQueues already runs under the capacity scheduler write lock, so it
is safe as well.
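The Scenario B argument can be illustrated the same way. In the sketch below, a single write lock guards both the "create the queue if missing, then add the app" step and the removal step, so a removal can never land between creation and submission. `SchedulerSketch` and all of its methods are hypothetical names for illustration only; the real scheduler's queue creation is more involved than a map insert.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the scheduler; not the real CapacityScheduler. */
final class SchedulerSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, List<String>> appsByQueue = new HashMap<>();

  /** Auto-create an empty dynamic queue if it does not exist yet. */
  void createQueue(String queueName) {
    lock.writeLock().lock();
    try {
      appsByQueue.computeIfAbsent(queueName, k -> new ArrayList<>());
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Create-if-missing and submit happen atomically under the write lock. */
  void addApplication(String appId, String queueName) {
    lock.writeLock().lock();
    try {
      appsByQueue.computeIfAbsent(queueName, k -> new ArrayList<>()).add(appId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Removal takes the same lock, so it cannot interleave with addApplication. */
  boolean removeQueueIfIdle(String queueName) {
    lock.writeLock().lock();
    try {
      List<String> apps = appsByQueue.get(queueName);
      if (apps == null || !apps.isEmpty()) {
        return false; // unknown queue, or the queue is in use
      }
      appsByQueue.remove(queueName);
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean hasQueue(String queueName) {
    lock.readLock().lock();
    try {
      return appsByQueue.containsKey(queueName);
    } finally {
      lock.readLock().unlock();
    }
  }

  int appCount(String queueName) {
    lock.readLock().lock();
    try {
      List<String> apps = appsByQueue.get(queueName);
      return apps == null ? 0 : apps.size();
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

In this sketch, submitting to a queue that was just removed simply recreates it, so the submission never fails because of the deletion.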
> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is
> not being used
> --------------------------------------------------------------------------------------------
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: zhuqi
> Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch,
> YARN-10532.003.patch, YARN-10532.004.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for
> a period of time (like 5 mins). It will be helpful when we have a large
> number of auto-created queues (e.g. from 500 users), but only a small subset
> of queues are actively used.