[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.

Jian He (JIRA) Wed, 19 Oct 2016 16:44:42 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590262#comment-15590262
 ]


Jian He commented on YARN-5716:
-------------------------------

- remove this?
{code}
        /*
        FIXME
        application
            .getCSLeafQueue()
            .getOrderingPolicy()
            .containerAllocated(application,
                application.getRMContainer(updatedContainer.getId()));
        */
{code}
- When stop, if the thread is interrupted, the else condition will make the 
thread continue to run. the join in stop will always timeout.
{code}
          if (!runSchedules.get() && !Thread.currentThread().isInterrupted()) {
            Thread.sleep(100);
          } else{
            // Don't run schedule if we have some pending backlogs already
            if (cs.getAsyncSchedulingPendingBacklogs() > 100) {
              Thread.sleep(1);
            } else{
              schedule(cs);
            }
          }
        } catch (InterruptedException ie) {
          // Do nothing
        }
{code}
- the withNodeHeartbeat flag is not needed, because it's already checked in the 
caller
{code}
  private boolean canAllocateMore(CSAssignment assignment,
      boolean withNodeHeartbeat) {
{code}

- how about check whether "getNode(node.getNodeID())" equals to null, I feel 
that's easier to reason for a removed node
{code}
    // Backward compatible way to make sure previous behavior which allocation
    // driven by node heartbeat works.
    if (getNode(node.getNodeID()) != node)
{code}
- This if condition can be merged into previous "if (reservedContainer != null) 
{" condition, as they are the same.
 {code}
    // Do not schedule if there are any reservations to fulfill on the node
    if (node.getReservedContainer() != null) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Skipping scheduling since node " + node.getNodeID()
            + " is reserved by application " + node.getReservedContainer()
            .getContainerId().getApplicationAttemptId());
      }
      return null;
    }
 {code}
 - canSchedule flag is not needed, the previuous if condition can just return 
direclty 
 {code}
     if (!canSchedule) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("This node or this node partition doesn't have available or"
            + "killable resource");
      }
      return null;
    }
{code}
- the canSchedule flag here is also not needed
similarly,
{code}
    boolean canSchedule = false;

    // When this time look at multiple nodes, try schedule if the
    // partition has any available resource
    if (root.getQueueCapacities().getUsedCapacity(ps.getPartition()) < 1.0f
        || preemptionManager.getKillableResource(
        CapacitySchedulerConfiguration.ROOT, ps.getPartition()) != Resources
        .none()) {
      canSchedule = true;
    }

    if (!canSchedule) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("This node or this node partition doesn't have available or"
            + "killable resource");
      }
      return null;
    }
{code}
- Looks like one behavior change is that previously on node heartbeat, we 
always satisfy reservedContainer first, now in async scheduling, it's not the 
case any more ?
- this if check is not needed, beacuse it's already checked in the caller 
{code}
  private CSAssignment allocateContainersOnMultiNodes(
      PlacementSet<FiCaSchedulerNode> ps) {
    if (rmContext.isWorkPreservingRecoveryEnabled() && !rmContext
        .isSchedulerReadyForAllocatingContainers()) {
      return null;
    }
{code}
- PlacementSet, placement is an abstract name, how about NodeSet to be more 
concret? 
-PlacementSetUtils.getSingleNode -> hasSingleNode
- ContainerAllocationContext -> ContainerAllocationProposal

- nodePartition parameter is not needed, it can be inferred from 'node' 
parameter 
{code}
  public SchedulerContainer(A app, N node, RMContainer rmContainer,
      String nodePartition, boolean allocated) {
{code}
- remove this method, a new method with the same name just makes it more 
confusing. And the other getSchedulerContainer method with false condition is 
already being used in some places.
{code}
  private SchedulerContainer<FiCaSchedulerApp, FiCaSchedulerNode> 
getSchedulerContainer(
      RMContainer rmContainer) {
    return getSchedulerContainer(rmContainer, false);
  }
{code}

- These two if conditidions code are duplicated, can be moved out side
{code}
          if (updateUnconfirmedAllocatedResource) {
            app.decUnconfirmedRes(request.getTotalAllocatedResource());
          }
          LOG.info("Allocation proposal accepted");
        } else{
          if (updateUnconfirmedAllocatedResource) {
            app.decUnconfirmedRes(request.getTotalAllocatedResource());
          }
{code}
- remove this
{{// If we do assign, remove the queue and re-insert in-order to re-sort}}\
- In LeafQueue#updateCurrentResourceLimits, multiple threads will update 
cachedResourceLimitsForHeadroom without synchronization 
- Fix indetation 
{code}
      if (null != application) {
      ActivitiesLogger.APP.startAppAllocationRecording(activitiesManager,
          node.getNodeID(), SystemClock.getInstance().getTime(), application);
      CSAssignment assignment = application.assignContainers(clusterResource,
          ps, currentResourceLimits, schedulingMode, reservedContainer);
      return assignment;
      }
{code}
- LeafQueeu#handleReservedContainer -> allocateReservedContainer
- SchedulerApplicationAttempt#incNumAllocatedContainers,  all the locality 
statistics functionality  are removed ?


> Add global scheduler interface definition and update CapacityScheduler to use 
> it.
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-5716
>                 URL: https://issues.apache.org/jira/browse/YARN-5716
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5716.001.patch, YARN-5716.002.patch, 
> YARN-5716.003.patch, YARN-5716.004.patch
>
>
> Target of this JIRA:
> - Definition of interfaces / objects which will be used by global scheduling, 
> this will be shared by different schedulers.
> - Modify CapacityScheduler to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (YARN-5716) Add global scheduler interface definition and update CapacityScheduler to use it.

Reply via email to