YunFan Zhou commented on YARN-6361:

[~yufeigu]  Thank you very much, Yufei. 
I have to say this JIRA's description captures a very good idea, and I have 
implemented the corresponding code.
I first spend *O(N)* time before sorting to precompute the sort attributes of 
each application, and then sort all applications by those precomputed 
attributes in *O(N*log(N))* time.
It does look a lot faster. I will run a benchmark tomorrow to measure how much 
this optimization improves performance compared to the unoptimized version.
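Before the real benchmark, here is a rough, self-contained micro-benchmark sketch (plain `System.nanoTime`, not JMH) of the effect being measured: a comparator that recomputes an expensive key on every comparison versus sorting by keys computed once up front. `expensiveKey` is a hypothetical stand-in for the per-app resource math; the numbers it prints are illustrative only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class SortBenchSketch {
  // Stand-in for an expensive per-app computation (e.g. resource math).
  // Monotone in the seed, so both sort strategies yield the same order.
  static double expensiveKey(long seed) {
    double v = seed;
    for (int i = 0; i < 50; i++) {
      v = Math.sqrt(v + i) + 1.0;
    }
    return v;
  }

  public static void main(String[] args) {
    int n = 10_000; // matches the 10k-apps-per-queue case in the JIRA
    Long[] apps = new Long[n];
    Random rnd = new Random(42);
    for (int i = 0; i < n; i++) {
      apps[i] = (long) rnd.nextInt(1_000_000);
    }

    // Naive: the key is recomputed O(n log n) times inside the comparator.
    Long[] naive = apps.clone();
    long t0 = System.nanoTime();
    Arrays.sort(naive, Comparator.comparingDouble(SortBenchSketch::expensiveKey));
    long naiveNs = System.nanoTime() - t0;

    // Precomputed: keys computed once in O(n), then sorted by cheap lookups.
    long t1 = System.nanoTime();
    Map<Long, Double> keys = new HashMap<>();
    for (Long a : apps) {
      keys.put(a, expensiveKey(a));
    }
    Long[] pre = apps.clone();
    Arrays.sort(pre, Comparator.comparingDouble(keys::get));
    long preNs = System.nanoTime() - t1;

    System.out.printf("naive: %d ms, precomputed: %d ms%n",
        naiveNs / 1_000_000, preNs / 1_000_000);
  }
}
```

The gap grows with the cost of the key computation and with N, since the naive version pays that cost roughly N*log(N) times instead of N times.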

Here are some snippets of the code I've written; many of the details still 
have to change. But I think our general idea is similar: use O(N) time to 
compute each Schedulable's attribute values (the *SchedulableInnerAttr* 
below), then sort the N SchedulableInnerAttr instances in O(N*log(N)) time. 
Because every member variable of *SchedulableInnerAttr* is a primitive (or a 
String), each comparison is very cheap, so the sort is much faster than the 
traditional one. 
I will upload the detailed code and the benchmark results for the test. 
Do you agree with this approach?

  private Schedulable sched;
  private boolean isDemandNotNone;
  private boolean isNeedy;
  private double minShareRatio;
  private boolean isWeightExceedZero;
  private double memorySizePerWeight;
  private long startTime;
  private String name;

  public SchedulableInnerAttr(Schedulable sched) {
    this.sched = sched;
    this.isDemandNotNone = !sched.getDemand().equals(Resources.none());
    Resource resourceUsage = sched.getResourceUsage();
    Resource minShare = Resources.min(RESOURCE_CALCULATOR, null,
        sched.getMinShare(), sched.getDemand());
    this.isNeedy = Resources.lessThan(RESOURCE_CALCULATOR, null,
        resourceUsage, minShare);
    this.minShareRatio = (double) resourceUsage.getMemorySize() /
        Resources.max(RESOURCE_CALCULATOR, null, minShare, ONE)
            .getMemorySize();
    double weight = sched.getWeights().getWeight(ResourceType.MEMORY);
    this.isWeightExceedZero = weight > 0.0;
    this.memorySizePerWeight = resourceUsage.getMemorySize();
    if (this.isWeightExceedZero) {
      this.memorySizePerWeight /= weight;
    }
    this.startTime = sched.getStartTime();
    this.name = sched.getName();
  }

  public int compare(SchedulableInnerAttr s1, SchedulableInnerAttr s2) {
    int res = 0;
    if (s1.isDemandNotNone() && !s2.isDemandNotNone()) {
      res = -1;
    } else if (!s1.isDemandNotNone() && s2.isDemandNotNone()) {
      res = 1;
    }
    if (res == 0) {
      if (s1.isNeedy() && !s2.isNeedy()) {
        res = -1;
      } else if (!s1.isNeedy() && s2.isNeedy()) {
        res = 1;
      }
    }
    if (res == 0) {
      res = (int) Math.signum(s1.getMinShareRatio() - s2.getMinShareRatio());
    }
    if (res == 0) {
      if (s1.isWeightExceedZero() && !s2.isWeightExceedZero()) {
        res = 1;
      } else if (!s1.isWeightExceedZero() && s2.isWeightExceedZero()) {
        res = -1;
      }
    }
    if (res == 0) {
      res = (int) Math.signum(
          s1.getMemorySizePerWeight() - s2.getMemorySizePerWeight());
    }
    if (res == 0) {
      res = (int) Math.signum(s1.getStartTime() - s2.getStartTime());
    }
    if (res == 0) {
      res = s1.getName().compareTo(s2.getName());
    }
    return res;
  }
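For completeness, here is a minimal, self-contained illustration of the precompute-then-sort pattern described above, independent of the scheduler types. The names (`AppAttr`, `hasDemand`, `startTime`) are hypothetical stand-ins, not the actual YARN classes; the point is only that all expensive work happens once per element, and the comparator touches only cheap precomputed fields.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class PrecomputeSortDemo {
  // Lightweight record of precomputed sort keys, one per application.
  static final class AppAttr {
    final String name;
    final boolean hasDemand;  // precomputed: demand != none
    final long startTime;     // precomputed once, not per comparison

    AppAttr(String name, boolean hasDemand, long startTime) {
      this.name = name;
      this.hasDemand = hasDemand;
      this.startTime = startTime;
    }
  }

  static List<String> sortApps(List<AppAttr> apps) {
    // Apps with demand first, then earlier start time, then name as a
    // tie-breaker; every comparison reads only primitive/String fields.
    Comparator<AppAttr> cmp = Comparator
        .comparing((AppAttr a) -> !a.hasDemand)  // false (has demand) first
        .thenComparingLong(a -> a.startTime)
        .thenComparing(a -> a.name);
    return apps.stream().sorted(cmp).map(a -> a.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<AppAttr> apps = Arrays.asList(
        new AppAttr("app3", false, 100L),
        new AppAttr("app1", true, 200L),
        new AppAttr("app2", true, 150L));
    System.out.println(sortApps(apps)); // [app2, app1, app3]
  }
}
```

The same shape applies to the real comparator: build one SchedulableInnerAttr per Schedulable in O(N), then sort the attribute records in O(N*log(N)).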

> FairScheduler: FSLeafQueue.fetchAppsWithDemand CPU usage is high with big 
> queues
> --------------------------------------------------------------------------------
>                 Key: YARN-6361
>                 URL: https://issues.apache.org/jira/browse/YARN-6361
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Miklos Szegedi
>            Assignee: YunFan Zhou
>         Attachments: dispatcherthread.png, threads.png
> FSLeafQueue.fetchAppsWithDemand sorts the applications by the current policy. 
> Most of the time is spent in FairShareComparator.compare. We could improve 
> this by doing the calculations outside the sort loop {{(O\(n\))}} and 
> sorting by the precomputed values inside {{O(n*log\(n\))}}. This can be a 
> performance issue when there is a huge number of applications in a single 
> queue. The attachments show the performance impact when there are 10k 
> applications in one queue.

This message was sent by Atlassian JIRA
