[
https://issues.apache.org/jira/browse/YARN-10546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254453#comment-17254453
]
zhuqi commented on YARN-10546:
------------------------------
[~leftnoteasy] [~tangzhankun] [~Jim_Brennan] [~Tao Yang]
Our cluster currently runs the FairScheduler, and this sometimes happened before YARN-4270.
When I wrote a test, I confirmed that CS should also handle it. I will fix it
later.
{code:java}
@Test
@SuppressWarnings("unchecked")
public void testReservationLimitedForSingleApp() throws Exception {
// Show that a single app can end up holding a reservation on every node
// for a non-node/rack specific request
CapacitySchedulerConfiguration csConf = new CapacitySchedulerConfiguration();
setup(csConf);
// Manipulate queue 'a'
LeafQueue a = stubLeafQueue((LeafQueue) queues.get(A));
// Users
final String user_0 = "user_0";
// Submit applications
final ApplicationAttemptId appAttemptId_0 = TestUtils
.getMockApplicationAttemptId(0, 0);
FiCaSchedulerApp app_0 = new FiCaSchedulerApp(appAttemptId_0, user_0, a,
mock(ActiveUsersManager.class), spyRMContext);
app_0 = spy(app_0);
Mockito.doNothing().when(app_0).updateAMContainerDiagnostics(any(AMState.class),
any(String.class));
rmContext.getRMApps().put(app_0.getApplicationId(), mock(RMApp.class));
a.submitApplicationAttempt(app_0, user_0);
// Setup 5 nodes
String host_0 = "host_0";
FiCaSchedulerNode node_0 = TestUtils.getMockNode(host_0, DEFAULT_RACK, 0,
8 * GB);
String host_1 = "host_1";
FiCaSchedulerNode node_1 = TestUtils.getMockNode(host_1, DEFAULT_RACK1, 0,
8 * GB);
String host_2 = "host_2";
FiCaSchedulerNode node_2 = TestUtils.getMockNode(host_2, DEFAULT_RACK2, 0,
8 * GB);
String host_3 = "host_3";
FiCaSchedulerNode node_3 = TestUtils.getMockNode(host_3, DEFAULT_RACK3, 0,
8 * GB);
String host_4 = "host_4";
FiCaSchedulerNode node_4 = TestUtils.getMockNode(host_4, DEFAULT_RACK4, 0,
8 * GB);
Map<ApplicationAttemptId, FiCaSchedulerApp> apps = ImmutableMap.of(
app_0.getApplicationAttemptId(), app_0);
Map<NodeId, FiCaSchedulerNode> nodes = ImmutableMap.of(node_0.getNodeID(),
node_0, node_1.getNodeID(), node_1, node_2.getNodeID(), node_2,
node_3.getNodeID(), node_3, node_4.getNodeID(), node_4
);
when(csContext.getNode(node_0.getNodeID())).thenReturn(node_0);
when(csContext.getNode(node_1.getNodeID())).thenReturn(node_1);
when(csContext.getNode(node_2.getNodeID())).thenReturn(node_2);
when(csContext.getNode(node_3.getNodeID())).thenReturn(node_3);
when(csContext.getNode(node_4.getNodeID())).thenReturn(node_4);
cs.getNodeTracker().addNode(node_0);
cs.getNodeTracker().addNode(node_1);
cs.getNodeTracker().addNode(node_2);
cs.getNodeTracker().addNode(node_3);
cs.getNodeTracker().addNode(node_4);
final int numNodes = 5;
Resource clusterResource = Resources.createResource(numNodes * (8 * GB));
when(csContext.getNumClusterNodes()).thenReturn(numNodes);
root.updateClusterResource(clusterResource,
new ResourceLimits(clusterResource));
// Setup resource-requests: 5 x 5 GB containers, one per 8 GB node,
// leaving 3 GB free on each node
Priority priorityAM = TestUtils.createMockPriority(1);
app_0.updateResourceRequests(Collections.singletonList(TestUtils
.createResourceRequest(ResourceRequest.ANY, 5 * GB, 5, true,
priorityAM, recordFactory)));
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_0,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_1,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_2,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_3,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_4,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
// Now request 8 x 4 GB containers; no node has 4 GB free, so each
// scheduling attempt leaves a reservation behind
app_0.updateResourceRequests(Collections.singletonList(TestUtils
    .createResourceRequest(ResourceRequest.ANY, 4 * GB, 8, true,
        priorityAM, recordFactory)));
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_0,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_1,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_2,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_3,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
TestUtils.applyResourceCommitRequest(clusterResource,
a.assignContainers(clusterResource, node_4,
new ResourceLimits(clusterResource),
SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY), nodes, apps);
// Inspect the result: app_0 holds a reservation on every node
System.out.println(a.getUsedResources());
System.out.println(a.getMetrics().getReservedMB());
System.out.println(app_0.getReservedContainers());
System.out.println(node_0.getReservedContainer());
System.out.println(node_1.getReservedContainer());
System.out.println(node_2.getReservedContainer());
System.out.println(node_3.getReservedContainer());
System.out.println(node_4.getReservedContainer());
}
{code}
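To illustrate why the unbounded behaviour above is harmful, here is a minimal, self-contained sketch (plain Java, not YARN code; the 8 GB / 5 GB / 4 GB sizes mirror the test above, and the {{reservableNodesRatio}} cap is modeled on the ratio-style limit YARN-4270 added to the FairScheduler — the exact CS mechanism is still to be decided, so treat the names as hypothetical):
{code:java}
// Toy model only: each of the nodes has 8 GB, the app already runs a
// 5 GB container per node (3 GB free), then requests 4 GB containers
// that fit nowhere, so the scheduler reserves node after node.
public class ReservationLimitSketch {

  static int reservedNodes(int numNodes, double reservableNodesRatio) {
    // Hypothetical cap modeled on YARN-4270: at most
    // ceil(ratio * numNodes) nodes may carry a reservation for a
    // non-node/rack specific request of one application.
    int limit = (int) Math.ceil(reservableNodesRatio * numNodes);
    int reserved = 0;
    for (int n = 0; n < numNodes; n++) {
      int freeGb = 8 - 5;                  // 3 GB free, request needs 4 GB
      if (freeGb < 4 && reserved < limit) {
        reserved++;                        // reserve this node for the app
      }
    }
    return reserved;
  }

  public static void main(String[] args) {
    // Without a cap the single app reserves every node, starving others.
    System.out.println(reservedNodes(5, 1.0)); // 5
    // With a 0.2 cap only one node may be reserved for this request.
    System.out.println(reservedNodes(5, 0.2)); // 1
  }
}
{code}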
> Limit application resource reservation on nodes for non-node/rack specific
> requests should be supported in CS.
> -------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10546
> URL: https://issues.apache.org/jira/browse/YARN-10546
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.3.0
> Reporter: zhuqi
> Assignee: zhuqi
> Priority: Major
>
> Just as fixed in YARN-4270 for the FairScheduler,
> the CapacityScheduler should also fix this.
> It is a big problem in production clusters when it happens.
> The FS-to-CS converter should also support it.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]