[ 
https://issues.apache.org/jira/browse/HDDS-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-3466:
-----------------------------
    Target Version/s: 0.7.0  (was: 0.6.0)

> Improve filterViableNodes performance in pipeline creation
> ----------------------------------------------------------
>
>                 Key: HDDS-3466
>                 URL: https://issues.apache.org/jira/browse/HDDS-3466
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: SCM
>    Affects Versions: 0.5.0
>            Reporter: Li Cheng
>            Priority: Major
>              Labels: TriagePending
>
> Per [~sodonnell]'s investigation, pipeline creation may have performance 
> issue once the load-sorting algorithm in HDDS-3139. 
>  
> This task is to track potential performance bottleneck caused by this sorting 
> operation for pipeline creation in large scale cluster.
>  
> I am a little concerned about the expense of forming the list of healthy 
> nodes on large clusters. We have to do quite a lot of work to form a list and 
> then only use 3 nodes from the list. Even the method 
> {{currentPipelineCount()}} needs to do a few map lookups per node to get the 
> current pipeline count. This is the case even before this change. Creating a 
> pipeline on a large cluster would be expensive already, but this change 
> probably makes it worse, due to the sort needed. I know it was me who 
> suggested the sort.
> I think the code as it is will work OK upto about 1000 nodes, and then the 
> performance will drop off as the number of nodes goes toward 10k.
> Eg here are some benchmarks I created using this test code, which is similar 
> to what we are doing in {{filterViableNodes()}}:
>  
> {{  public List<Object> sortingWithMap(BenchmarkState state) \{
>     return state.otherList.stream()
>         .map(o -> new Mock(o, state.rand.nextInt(20)))
>         .filter(o -> o.getSize() <= 20)
>         .sorted(Comparator.comparingInt(Mock::getSize))
>         .map(o -> o.getObject())
>         .collect(Collectors.toList());
>   }}}
> The OPs per second for various list sizes are:
>  
> {{Benchmark               (listSize)   Mode  Cnt       Score     Error  Units
> Sorting.sortingWithMap         100  thrpt    3  113948.345 ± 446.426  ops/s
> Sorting.sortingWithMap        1000  thrpt    3    9468.507 ± 894.138  ops/s
> Sorting.sortingWithMap        5000  thrpt    3    1931.612 ± 263.919  ops/s
> Sorting.sortingWithMap       10000  thrpt    3     970.745 ±  25.823  ops/s
> Sorting.sortingWithMap      100000  thrpt    3      87.684 ±  35.438  ops/s}}
> For a 1000 node cluster, with 10 pipelines per node, we would be looking at 
> about 1 second to form all the piplines.
> For a 5k node cluster, it would be about 25 seconds
> For a 10k node cluster it would be 103 seconds, but even here, that would be 
> at close to 1000 pipelines per second.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to