[ 
https://issues.apache.org/jira/browse/HDDS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDDS-3466:
--------------------------------
    Description: 
Per [~sodonnell]'s investigation, pipeline creation may have performance issue 
once the load-sorting algorithm in HDDS-3139. 

 

This task is to track potential performance bottleneck caused by this sorting 
operation for pipeline creation in large scale cluster.

 

I am a little concerned about the expense of forming the list of healthy nodes 
on large clusters. We have to do quite a lot of work to form a list and then 
only use 3 nodes from the list. Even the method {{currentPipelineCount()}} 
needs to do a few map lookups per node to get the current pipeline count. This 
is the case even before this change. Creating a pipeline on a large cluster 
would be expensive already, but this change probably makes it worse, due to the 
sort needed. I know it was me who suggested the sort.

I think the code as it is will work OK upto about 1000 nodes, and then the 
performance will drop off as the number of nodes goes toward 10k.

Eg here are some benchmarks I created using this test code, which is similar to 
what we are doing in {{filterViableNodes()}}:

 

{{  public List<Object> sortingWithMap(BenchmarkState state) \{
    return state.otherList.stream()
        .map(o -> new Mock(o, state.rand.nextInt(20)))
        .filter(o -> o.getSize() <= 20)
        .sorted(Comparator.comparingInt(Mock::getSize))
        .map(o -> o.getObject())
        .collect(Collectors.toList());
  }}}

The OPs per second for various list sizes are:

 

{{Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s}}

For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
about 1 second to form all the piplines.

For a 5k node cluster, it would be about 25 seconds

For a 10k node cluster it would be 103 seconds, but even here, that would be at 
close to 1000 pipelines per second.

  was:
Per [~sodonnell]'s investigation, pipeline creation may have performance issue 
once the load-sorting algorithm in 
https://issues.apache.org/jira/browse/HDDS-3139. 

 

This task is to track potential performance bottleneck caused by this sorting 
operation for pipeline creation in large scale cluster.

 

I am a little concerned about the expense of forming the list of healthy nodes 
on large clusters. We have to do quite a lot of work to form a list and then 
only use 3 nodes from the list. Even the method {{currentPipelineCount()}} 
needs to do a few map lookups per node to get the current pipeline count. This 
is the case even before this change. Creating a pipeline on a large cluster 
would be expensive already, but this change probably makes it worse, due to the 
sort needed. I know it was me who suggested the sort.

I think the code as it is will work OK upto about 1000 nodes, and then the 
performance will drop off as the number of nodes goes toward 10k.

Eg here are some benchmarks I created using this test code, which is similar to 
what we are doing in {{filterViableNodes()}}:

 

{{  public List<Object> sortingWithMap(BenchmarkState state) \{
    return state.otherList.stream()
        .map(o -> new Mock(o, state.rand.nextInt(20)))
        .filter(o -> o.getSize() <= 20)
        .sorted(Comparator.comparingInt(Mock::getSize))
        .map(o -> o.getObject())
        .collect(Collectors.toList());
  }}}

The OPs per second for various list sizes are:

 

{{Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s}}

For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
about 1 second to form all the piplines.

For a 5k node cluster, it would be about 25 seconds

For a 10k node cluster it would be 103 seconds, but even here, that would be at 
close to 1000 pipelines per second.


> Improve filterViableNodes performance in pipeline creation
> ----------------------------------------------------------
>
>                 Key: HDDS-3466
>                 URL: https://issues.apache.org/jira/browse/HDDS-3466
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Li Cheng
>            Priority: Major
>
> Per [~sodonnell]'s investigation, pipeline creation may have performance 
> issue once the load-sorting algorithm in HDDS-3139. 
>  
> This task is to track potential performance bottleneck caused by this sorting 
> operation for pipeline creation in large scale cluster.
>  
> I am a little concerned about the expense of forming the list of healthy 
> nodes on large clusters. We have to do quite a lot of work to form a list and 
> then only use 3 nodes from the list. Even the method 
> {{currentPipelineCount()}} needs to do a few map lookups per node to get the 
> current pipeline count. This is the case even before this change. Creating a 
> pipeline on a large cluster would be expensive already, but this change 
> probably makes it worse, due to the sort needed. I know it was me who 
> suggested the sort.
> I think the code as it is will work OK upto about 1000 nodes, and then the 
> performance will drop off as the number of nodes goes toward 10k.
> Eg here are some benchmarks I created using this test code, which is similar 
> to what we are doing in {{filterViableNodes()}}:
>  
> {{  public List<Object> sortingWithMap(BenchmarkState state) \{
>     return state.otherList.stream()
>         .map(o -> new Mock(o, state.rand.nextInt(20)))
>         .filter(o -> o.getSize() <= 20)
>         .sorted(Comparator.comparingInt(Mock::getSize))
>         .map(o -> o.getObject())
>         .collect(Collectors.toList());
>   }}}
> The OPs per second for various list sizes are:
>  
> {{Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
> Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
> Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
> Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
> Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
> Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s}}
> For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
> about 1 second to form all the piplines.
> For a 5k node cluster, it would be about 25 seconds
> For a 10k node cluster it would be 103 seconds, but even here, that would be 
> at close to 1000 pipelines per second.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to