-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------

(Updated Jan. 31, 2018, 10:12 a.m.)


Review request for Aurora and Jordan Ly.


Changes
-------

Applied Stephan's suggestion, added a benchmark to validate.


Repository: aurora


Description
-------

Use `ArrayDeque` rather than `HashSet` for `fetchTasks`, and use an imperative 
style rather than a functional one.  I arrived at this result after running 
benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
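
A rough sketch of the shape of this change, not the actual `MemTaskStore` code: the class and method names below are hypothetical, and tasks are reduced to plain integers for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a task fetch path; names are illustrative only.
final class FetchSketch {
  // "Before" shape: functional pipeline collecting into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream().filter(filter).collect(Collectors.toCollection(HashSet::new));
  }

  // "After" shape: plain loop appending matches to an ArrayDeque.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> tasks = Arrays.asList(1, 2, 3, 4, 5, 6);
    Predicate<Integer> even = n -> n % 2 == 0;
    System.out.println(fetchFunctional(tasks, even));
    System.out.println(fetchImperative(tasks, even));
  }
}
```

One caveat with this substitution in general: unlike `HashSet`, `ArrayDeque` does not deduplicate and rejects `null` elements, so the sketch assumes the matched tasks are already unique and non-null.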

This patch also enables stack and heap profilers in jmh (more details 
[here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)), 
providing insight into the heap impact of changes.  I started this change with 
a heap profiler as the primary motivation, and ended up using it to guide this 
improvement.


Diffs (updated)
-----

  build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
  src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc 


Diff: https://reviews.apache.org/r/65303/diff/2/

Changes: https://reviews.apache.org/r/65303/diff/1-2/


Testing (updated)
-------

Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is 
an abridged version.  It shows that task fetch throughput improves in every 
case, by roughly 1.7x to 3.4x (mod error margins, which are wide), and that 
normalized heap allocation drops by roughly 1.8x to 2.7x.  GC time, which 
appears only in the full output, rises in some cases and falls in others, and 
the stddev was anecdotally high across runs.  I chose to present this output 
as a caveat and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular 
allocation data.  Please note that the `norm` stats are normalized for the 
number of operations, which I find to be the most useful measure for validating 
a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is 
allocation/GC-bound (figure the allocation pressure "ceiling" for your 
configuration!), and normalized counters to see the more precise benchmark 
behavior.
```

Prior to this patch:
```console
Benchmark                                    (numTasks)         Score         Error   Units
FetchAll.run                                      10000       481.529 ±     184.751   ops/s
FetchAll.run:·gc.alloc.rate.norm                  10000    334970.771 ±   33544.960    B/op

FetchAll.run                                      50000        78.652 ±      20.869   ops/s
FetchAll.run:·gc.alloc.rate.norm                  50000   3991107.524 ±  701585.657    B/op

FetchAll.run                                     100000        38.371 ±      11.710   ops/s
FetchAll.run:·gc.alloc.rate.norm                 100000  13487028.139 ± 3369614.510    B/op

IndexedFetchAndFilter.run                         10000       296.557 ±     198.389   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    655319.005 ±   98138.360    B/op

IndexedFetchAndFilter.run                         50000        50.300 ±       5.818   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   6671548.381 ±  452020.849    B/op

IndexedFetchAndFilter.run                        100000        17.637 ±       3.739   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  28100173.458 ± 4486308.188    B/op
```

With this patch:
```console
Benchmark                                    (numTasks)         Score         Error   Units
FetchAll.run                                      10000      1653.572 ±     799.123   ops/s
FetchAll.run:·gc.alloc.rate.norm                  10000    155426.052 ±   10345.657    B/op

FetchAll.run                                      50000       210.454 ±      54.340   ops/s
FetchAll.run:·gc.alloc.rate.norm                  50000   1457560.505 ±  228631.547    B/op

FetchAll.run                                     100000        97.783 ±      42.130   ops/s
FetchAll.run:·gc.alloc.rate.norm                 100000   5096464.582 ± 1792136.191    B/op

IndexedFetchAndFilter.run                         10000       500.740 ±     210.675   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     10000    370760.068 ±   36813.071    B/op

IndexedFetchAndFilter.run                         50000        95.316 ±      23.084   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm     50000   3389472.432 ±  550602.162    B/op

IndexedFetchAndFilter.run                        100000        41.572 ±      26.747   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm    100000  12324183.188 ± 7537788.165    B/op
```
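
For reference, the per-case ratios behind the summary can be recomputed from the abridged figures. This is only a sketch: the class and member names are hypothetical, the numbers are copied from the two tables above, and the error margins are ignored.

```java
import java.util.Locale;

// Hypothetical helper; all figures are copied from the abridged tables.
final class SpeedupSketch {
  static final String[] CASES = {
      "FetchAll 10000", "FetchAll 50000", "FetchAll 100000",
      "IndexedFetchAndFilter 10000", "IndexedFetchAndFilter 50000",
      "IndexedFetchAndFilter 100000"};

  // Throughput in ops/s, before and after the patch.
  static final double[] OPS_BEFORE = {481.529, 78.652, 38.371, 296.557, 50.300, 17.637};
  static final double[] OPS_AFTER = {1653.572, 210.454, 97.783, 500.740, 95.316, 41.572};

  // gc.alloc.rate.norm in B/op, before and after the patch.
  static final double[] BYTES_BEFORE = {
      334970.771, 3991107.524, 13487028.139, 655319.005, 6671548.381, 28100173.458};
  static final double[] BYTES_AFTER = {
      155426.052, 1457560.505, 5096464.582, 370760.068, 3389472.432, 12324183.188};

  // Factor by which throughput grew for case i (higher is better).
  static double throughputGain(int i) {
    return OPS_AFTER[i] / OPS_BEFORE[i];
  }

  // Factor by which per-operation allocation shrank for case i (higher is better).
  static double allocationReduction(int i) {
    return BYTES_BEFORE[i] / BYTES_AFTER[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < CASES.length; i++) {
      System.out.printf(Locale.ROOT, "%-30s throughput x%.2f, allocation /%.2f%n",
          CASES[i], throughputGain(i), allocationReduction(i));
    }
  }
}
```

Because these are ratios of mean scores, they say nothing about whether the differences are significant; the `Error` column still needs to be read alongside them.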


**Full benchmark output**

Prior to this patch:
```console
Benchmark                                                   (numTasks)         Score         Error   Units
FetchAll.run                                                     10000       481.529 ±     184.751   ops/s
FetchAll.run:·gc.alloc.rate                                      10000       148.678 ±      42.890  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 10000    334970.771 ±   33544.960    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             10000       146.991 ±     135.486  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    332983.005 ±  347401.950    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         0.804 ±       1.823  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000      1784.147 ±    3904.546    B/op
FetchAll.run:·gc.count                                           10000         9.000                counts
FetchAll.run:·gc.time                                            10000       143.000                    ms

FetchAll.run                                                     50000        78.652 ±      20.869   ops/s
FetchAll.run:·gc.alloc.rate                                      50000       250.771 ±      34.190  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 50000   3991107.524 ±  701585.657    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             50000       250.131 ±     144.214  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   3999003.844 ± 2907196.744    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.937 ±      20.180  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000    111462.141 ±  322286.235    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         6.056 ±       4.371  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     96534.909 ±   73072.098    B/op
FetchAll.run:·gc.count                                           50000        22.000                counts
FetchAll.run:·gc.time                                            50000      3222.000                    ms

FetchAll.run                                                    100000        38.371 ±      11.710   ops/s
FetchAll.run:·gc.alloc.rate                                     100000       343.280 ±      63.923  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                100000  13487028.139 ± 3369614.510    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                            100000       343.804 ±     147.542  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000  13524848.537 ± 7132093.384    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                               100000         7.251 ±      26.847  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    286256.200 ± 1043939.286    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        11.448 ±      16.645  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    440924.671 ±  539369.420    B/op
FetchAll.run:·gc.count                                          100000        53.000                counts
FetchAll.run:·gc.time                                           100000      8664.000                    ms

IndexedFetchAndFilter.run                                        10000       296.557 ±     198.389   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       178.657 ±      96.891  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    655319.005 ±   98138.360    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       181.829 ±     115.598  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    669894.533 ±  362265.228    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.017 ±       2.764  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      3509.419 ±    8933.232    B/op
IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
IndexedFetchAndFilter.run:·gc.time                               10000       174.000                    ms

IndexedFetchAndFilter.run                                        50000        50.300 ±       5.818   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       271.042 ±      35.522  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   6671548.381 ±  452020.849    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       278.006 ±     188.990  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   6835542.988 ± 4208216.383    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         7.836 ±      22.513  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000    194944.435 ±  557587.333    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         6.063 ±       2.432  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000    148960.731 ±   42282.391    B/op
IndexedFetchAndFilter.run:·gc.count                              50000        24.000                counts
IndexedFetchAndFilter.run:·gc.time                               50000      3059.000                    ms

IndexedFetchAndFilter.run                                       100000        17.637 ±       3.739   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       336.740 ±      69.527  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  28100173.458 ± 4486308.188    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       336.494 ±      88.830  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  28063164.240 ± 4888826.638    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000         8.028 ±      37.263  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    672808.968 ± 2924497.150    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.351 ±      17.881  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    930977.737 ± 1252367.282    B/op
IndexedFetchAndFilter.run:·gc.count                             100000        47.000                counts
IndexedFetchAndFilter.run:·gc.time                              100000      7245.000                    ms
```

With this patch:
```console
Benchmark                                                   (numTasks)         Score         Error   Units
FetchAll.run                                                     10000      1653.572 ±     799.123   ops/s
FetchAll.run:·gc.alloc.rate                                      10000       236.532 ±      98.709  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 10000    155426.052 ±   10345.657    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             10000       247.755 ±      55.490  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        10000    163873.606 ±   59092.580    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         10000         1.328 ±       1.540  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    10000       883.684 ±    1120.393    B/op
FetchAll.run:·gc.count                                           10000        18.000                counts
FetchAll.run:·gc.time                                            10000       191.000                    ms

FetchAll.run                                                     50000       210.454 ±      54.340   ops/s
FetchAll.run:·gc.alloc.rate                                      50000       248.216 ±      15.196  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                 50000   1457560.505 ±  228631.547    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                             50000       239.336 ±     174.541  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                        50000   1409078.860 ± 1141224.117    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                                50000         6.504 ±      17.220  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                           50000     38644.950 ±  105262.889    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                         50000         5.994 ±       4.160  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                    50000     35246.411 ±   25958.915    B/op
FetchAll.run:·gc.count                                           50000        21.000                counts
FetchAll.run:·gc.time                                            50000      2875.000                    ms

FetchAll.run                                                    100000        97.783 ±      42.130   ops/s
FetchAll.run:·gc.alloc.rate                                     100000       336.209 ±      80.094  MB/sec
FetchAll.run:·gc.alloc.rate.norm                                100000   5096464.582 ± 1792136.191    B/op
FetchAll.run:·gc.churn.PS_Eden_Space                            100000       342.190 ±     144.180  MB/sec
FetchAll.run:·gc.churn.PS_Eden_Space.norm                       100000   5167420.986 ± 1634774.992    B/op
FetchAll.run:·gc.churn.PS_Old_Gen                               100000        11.783 ±      36.073  MB/sec
FetchAll.run:·gc.churn.PS_Old_Gen.norm                          100000    182947.872 ±  525172.467    B/op
FetchAll.run:·gc.churn.PS_Survivor_Space                        100000        12.299 ±      13.795  MB/sec
FetchAll.run:·gc.churn.PS_Survivor_Space.norm                   100000    184635.309 ±  199254.266    B/op
FetchAll.run:·gc.count                                          100000        46.000                counts
FetchAll.run:·gc.time                                           100000      7778.000                    ms

IndexedFetchAndFilter.run                                        10000       500.740 ±     210.675   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         10000       171.305 ±      57.968  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    10000    370760.068 ±   36813.071    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                10000       176.084 ±     103.579  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           10000    387100.753 ±  376481.454    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            10000         1.305 ±       1.866  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       10000      2812.059 ±    3518.689    B/op
IndexedFetchAndFilter.run:·gc.count                              10000        11.000                counts
IndexedFetchAndFilter.run:·gc.time                               10000       170.000                    ms

IndexedFetchAndFilter.run                                        50000        95.316 ±      23.084   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                         50000       258.291 ±      30.111  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                    50000   3389472.432 ±  550602.162    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space                50000       250.887 ±     148.296  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm           50000   3308741.831 ± 2461004.974    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                   50000         5.218 ±      21.710  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm              50000     69254.269 ±  282577.478    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space            50000         5.803 ±       2.885  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm       50000     76523.177 ±   51120.227    B/op
IndexedFetchAndFilter.run:·gc.count                              50000        21.000                counts
IndexedFetchAndFilter.run:·gc.time                               50000      2775.000                    ms

IndexedFetchAndFilter.run                                       100000        41.572 ±      26.747   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate                        100000       331.638 ±      50.813  MB/sec
IndexedFetchAndFilter.run:·gc.alloc.rate.norm                   100000  12324183.188 ± 7537788.165    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space               100000       333.474 ±     116.673  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm          100000  12357891.009 ± 7285356.875    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen                  100000        10.296 ±      27.573  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm             100000    371782.085 ±  910072.098    B/op
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space           100000        11.815 ±      10.161  MB/sec
IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm      100000    428555.780 ±  184610.507    B/op
IndexedFetchAndFilter.run:·gc.count                             100000        49.000                counts
IndexedFetchAndFilter.run:·gc.time                              100000      8602.000                    ms
```


Thanks,

Bill Farner
