[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267111#comment-17267111
 ] 

zhuqi edited comment on YARN-10532 at 1/18/21, 1:27 PM:
--------------------------------------------------------

For the latest patch, I double-checked the following requirement:

"An additional requirement we should keep in mind: 

Scenario A:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created 
queue). 
- Before the signal arrives to scheduler, an app submitted to scheduler (T1). 
T1 > T0
- When at T2 (T2 > T1), the signal arrived at scheduler, scheduler should avoid 
removing the queue A because now it is used.{code}
Scenario B:
{code:java}
- At time T0, policy signals scheduler to delete queue A (an auto created 
queue).
- At T1 (T1 > T0), scheduler got the signal and deleted the queue.
- At T2 (T2 > T1), an app submitted to scheduler.

Scheduler should immediately recreate the queue, in another word, deleting an 
dynamic queue should NEVER fail a submitted application.{code}
"

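Neither problem will happen with the latest patch:

Scenario A is handled by:

A double check before deletion: the caller passes in the last submitted time
it observed, and right before the removal we read the queue's last submitted
time again and compare the two. All of this happens under the queue write lock.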
{code:java}
// Double check that lastSubmittedTime has not changed,
// in case a new app has been submitted in the meantime.
if (queue instanceof LeafQueue &&
    ((LeafQueue) queue).isDynamicQueue()) {
  LeafQueue underDeleted = (LeafQueue)queue;
  if (underDeleted.getLastSubmittedTimestamp() != lastSubmittedTime) {
    throw new SchedulerDynamicEditException("This should not happen, " +
        "trying to remove queue= " + childQueuePath
        + ", however the queue has new submitted apps.");
  }
} else {
  throw new SchedulerDynamicEditException(
      "This should not happen, can't remove queue= " + childQueuePath
          + " is not a leafQueue or not a dynamic queue.");
}

// Now we can do remove and update
this.childQueues.remove(queue);
this.scheduler.getCapacitySchedulerQueueManager()
    .removeQueue(queue.getQueuePath());

{code}
The submit signal also updates this timestamp under the write lock:
{code:java}
@Override
public void submitApplication(ApplicationId applicationId, String userName,
    String queue)  throws AccessControlException {
  // Careful! Locking order is important!
  validateSubmitApplication(applicationId, userName, queue);

  // Signal to queue submit time in dynamic queue
  if (this.isDynamicQueue()) {
    signalToSubmitToQueue();
  }

  // Inform the parent queue
  try {
    getParent().submitApplication(applicationId, userName, queue);
  } catch (AccessControlException ace) {
    LOG.info("Failed to submit application to parent-queue: " +
        getParent().getQueuePath(), ace);
    throw ace;
  }

}

// "Tab" the queue, so this queue won't be removed because of idle timeout.
public void signalToSubmitToQueue() {
  writeLock.lock();
  try {
    this.lastSubmittedTimestamp = System.currentTimeMillis();
  } finally {
    writeLock.unlock();
  }
}
{code}
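Taken together, the two snippets above amount to a compare-before-delete check. A minimal, self-contained sketch of that idea follows. The class IdleQueueSketch and its methods are hypothetical stand-ins and not part of the patch, and the sketch assumes a single lock guards both the timestamp update and the removal.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a dynamic leaf queue; not the real LeafQueue.
class IdleQueueSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  private long lastSubmittedTimestamp;
  private boolean removed;

  IdleQueueSketch(long initialTimestamp) {
    this.lastSubmittedTimestamp = initialTimestamp;
  }

  // Called on app submission: records the submit time under the write lock.
  void signalSubmit(long now) {
    writeLock.lock();
    try {
      lastSubmittedTimestamp = now;
    } finally {
      writeLock.unlock();
    }
  }

  // Value the deletion policy observes before it asks for removal.
  long getLastSubmittedTimestamp() {
    writeLock.lock();
    try {
      return lastSubmittedTimestamp;
    } finally {
      writeLock.unlock();
    }
  }

  // Removes the queue only if nothing was submitted since the policy
  // observed observedTimestamp. Returns true if the queue was removed.
  boolean removeIfUnchanged(long observedTimestamp) {
    writeLock.lock();
    try {
      if (removed || lastSubmittedTimestamp != observedTimestamp) {
        return false; // a newer submission exists, keep the queue
      }
      removed = true;
      return true;
    } finally {
      writeLock.unlock();
    }
  }
}
```

In this sketch a stale removal request is rejected by returning false, whereas the patch throws SchedulerDynamicEditException. The ordering argument is the same: a submission that lands between the policy's observation and the removal changes the timestamp, so the comparison fails.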
Scenario B is handled by the write lock held in addApplication
and addApplicationOnRecovery.
{code:java}
// - At time T0, policy signals scheduler to delete queue A (an auto created
//   queue).
// - At T1 (T1 > T0), scheduler got the signal and deleted the queue.
// - At T2 (T2 > T1), an app submitted to scheduler.
//
// Scheduler should immediately recreate the queue, in other words,
// deleting a dynamic queue should NEVER fail a submitted application.

// A failed submission will not happen, because the write lock taken in
// addApplication and in addApplicationOnRecovery makes queue creation
// and app submission atomic.
// The capacity scheduler write lock is also held in the remove logic.

private void addApplication(ApplicationId applicationId, String queueName,
    String user, Priority priority,
    ApplicationPlacementContext placementContext) {
  writeLock.lock(); 
  ...
} 


// The remove also holds the write lock
private CSQueue removeDynamicChildQueue(String childQueuePath, boolean isLeaf,
    long lastSubmittedTime)
    throws SchedulerDynamicEditException {
  writeLock.lock();
  ...
}{code}
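To make the Scenario B argument concrete, here is a minimal sketch under stated assumptions. SchedulerSketch and its methods are hypothetical names used only for illustration. Queue state is reduced to a per-queue submission count, and removal ignores whether apps are still running. The sketch only shows that when create-if-missing and submit share one lock acquisition with removal, a submission arriving after a deletion recreates the queue instead of failing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the scheduler; names are illustrative only.
class SchedulerSketch {
  private final ReentrantReadWriteLock.WriteLock writeLock =
      new ReentrantReadWriteLock().writeLock();
  // Queue path -> number of submissions since the queue was (re)created.
  private final Map<String, Integer> queues = new HashMap<>();

  // Create-if-missing and submit happen under one lock acquisition,
  // so a concurrent removal cannot slip in between the two steps.
  int addApplication(String queuePath) {
    writeLock.lock();
    try {
      return queues.merge(queuePath, 1, Integer::sum);
    } finally {
      writeLock.unlock();
    }
  }

  // Removal takes the same lock. Returns true if the queue existed.
  boolean removeQueue(String queuePath) {
    writeLock.lock();
    try {
      return queues.remove(queuePath) != null;
    } finally {
      writeLock.unlock();
    }
  }

  boolean hasQueue(String queuePath) {
    writeLock.lock();
    try {
      return queues.containsKey(queuePath);
    } finally {
      writeLock.unlock();
    }
  }
}
```

With this layout, calling addApplication("root.a") after removeQueue("root.a") simply recreates the entry with a fresh count of 1.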
 

The above covers policy-driven auto deletion.

reinitializeQueues already runs under the capacity scheduler write lock, so it
is safe as well.

 



> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-10532
>                 URL: https://issues.apache.org/jira/browse/YARN-10532
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.


