-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
-----------------------------------------------------------

Review request for Aurora and Jordan Ly.


Repository: aurora


Description
-------

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style 
rather than functional.  I arrived at this result after running benchmarks with 
some of the other usual suspects (`ArrayList`, `LinkedList`).
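
For readers without the diff open, the shape of the change can be sketched 
roughly as below.  This is a minimal illustration only, not the actual 
`MemTaskStore` code: the class `FetchTasksSketch`, both method names, and the 
use of plain strings as tasks are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for the store; not Aurora code.
public class FetchTasksSketch {

  // Functional style: stream the tasks and collect the matches into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop appending matches to an ArrayDeque.
  // No hashing and no per-element node objects; ArrayDeque rejects nulls.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("job-a/0", "job-a/1", "job-b/0");
    Predicate<String> isJobA = t -> t.startsWith("job-a/");
    System.out.println(fetchFunctional(tasks, isJobA).size());
    System.out.println(fetchImperative(tasks, isJobA).size());
  }
}
```

Note that an `ArrayDeque` does not de-duplicate the way a `HashSet` does, so 
a swap like this is only equivalent when the tasks being scanned are already 
unique, which the sketch assumes.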

This patch also enables the stack and heap profilers in jmh (more details 
[here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
providing insight into the heap impact of changes.  Enabling the heap profiler 
was the primary motivation when I started this change, and I ended up using it 
to guide the `fetchTasks` improvement.


Diffs
-----

  build.gradle 64af7ae 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c 


Diff: https://reviews.apache.org/r/65303/diff/1/


Testing
-------

Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
the bottom, but here is an abridged version.  It shows that task fetch 
throughput improves by at least 2x at every size tested (roughly 2.7x to 3.5x), 
and normalized heap allocation drops by roughly 2x to 3.2x.  Total GC time 
increases slightly in the runs captured here, but the stddev was anecdotally 
high across runs.  I chose to present this output as a caveat and a discussion 
point.
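
As a quick cross-check of those factors, the ratios can be recomputed from the 
`ops/s` and `gc.alloc.rate.norm` scores in the abridged tables below.  The 
class and method names here are made up for the example; only the numbers come 
from the benchmark output, and the error bars are ignored.

```java
// Hypothetical helper; the scores are copied from the abridged tables.
public class SpeedupSketch {
  static final int[] NUM_TASKS = {10000, 50000, 100000};

  // Throughput in ops/s, before and after the patch.
  static final double[] OPS_BEFORE = {1066.632, 84.444, 38.645};
  static final double[] OPS_AFTER = {2851.288, 297.380, 122.211};

  // gc.alloc.rate.norm in B/op, before and after the patch.
  static final double[] ALLOC_BEFORE = {289227.205, 3831210.967, 13555430.931};
  static final double[] ALLOC_AFTER = {145281.908, 1183791.866, 4364450.973};

  // How many times faster the patched code is for row i.
  static double throughputGain(int i) {
    return OPS_AFTER[i] / OPS_BEFORE[i];
  }

  // How many times fewer bytes per operation the patched code allocates.
  static double allocReduction(int i) {
    return ALLOC_BEFORE[i] / ALLOC_AFTER[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < NUM_TASKS.length; i++) {
      System.out.printf("%d tasks: throughput x%.2f, allocation /%.2f%n",
          NUM_TASKS[i], throughputGain(i), allocReduction(i));
    }
  }
}
```

The smallest allocation ratio (10000 tasks) lands just under 2, so "2x" there 
is best read as approximate.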

If you scroll to the full output at the bottom, you will see some more granular 
allocation data.  Please note that the `norm` stats are normalized for the 
number of operations, which I find to be the most useful measure for validating 
a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is 
allocation/GC-bound (figure the allocation pressure "ceiling" for your 
configuration!), and normalized counters to see the more precise benchmark 
behavior.
```

Prior to this patch:
```console
Benchmark                 (numTasks)    Score         Error   Units

                          10000      1066.632 ±     266.924   ops/s
·gc.alloc.rate.norm       10000    289227.205 ±    8888.051    B/op
·gc.count                 10000        24.000                counts
·gc.time                  10000       103.000                    ms

                          50000        84.444 ±      32.620   ops/s
·gc.alloc.rate.norm       50000   3831210.967 ±  840844.713    B/op
·gc.count                 50000        21.000                counts
·gc.time                  50000      1407.000                    ms

                         100000        38.645 ±      20.557   ops/s
·gc.alloc.rate.norm      100000  13555430.931 ± 6787344.701    B/op
·gc.count                100000        52.000                counts
·gc.time                 100000      3304.000                    ms
```

With this patch:
```console
Benchmark               (numTasks)   Score         Error   Units

                         10000    2851.288 ±     481.472   ops/s
·gc.alloc.rate.norm      10000  145281.908 ±    2223.621    B/op
·gc.count                10000      39.000                counts
·gc.time                 10000     130.000                    ms

                         50000     297.380 ±      35.681   ops/s
·gc.alloc.rate.norm      50000 1183791.866 ±   77487.278    B/op
·gc.count                50000      25.000                counts
·gc.time                 50000    1821.000                    ms

                        100000     122.211 ±      81.618   ops/s
·gc.alloc.rate.norm     100000 4364450.973 ± 2856586.882    B/op
·gc.count               100000      52.000                counts
·gc.time                100000    3698.000                    ms
```


**Full benchmark output**

Prior to this patch:
```console
Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      1066.632 ±     266.924   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       286.647 ±      62.371  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    289227.205 ±    8888.051    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       291.263 ±     159.266  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    294277.617 ±  166069.041    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.218 ±       1.029  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      1220.540 ±     708.455    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        24.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       103.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5        84.444 ±      32.620   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       267.018 ±      27.389  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   3831210.967 ±  840844.713    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       258.565 ±     149.845  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   3707563.530 ± 2262218.319    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         4.487 ±      18.053  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     63848.757 ±  264487.651    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.034 ±       3.651  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     87385.381 ±   75159.508    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        21.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1407.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5        38.645 ±      20.557   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       381.453 ±      63.491  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  13555430.931 ± 6787344.701    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       389.816 ±     123.320  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  13823571.735 ± 6642604.600    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.947 ±      16.766  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     92330.241 ±  794991.221    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        11.934 ±      18.565  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    414896.926 ±  551658.959    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3304.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                   ---
```

With this patch:
```console
Benchmark                                                                        (numTasks)   Mode  Cnt        Score         Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5     2851.288 ±     481.472   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5      384.383 ±      58.697  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5   145281.908 ±    2223.621    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5      388.851 ±     114.120  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5   147171.915 ±   50430.527    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5        1.264 ±       0.980  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      479.848 ±     420.881    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5       39.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5      130.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt               NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5      297.380 ±      35.681   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5      288.839 ±      19.035  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5  1183791.866 ±   77487.278    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5      296.587 ±     125.148  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5  1214497.578 ±  457975.153    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5        6.942 ±      23.492  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5    28880.733 ±   99593.659    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5        6.440 ±       3.887  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5    26354.762 ±   14876.857    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5       25.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5     1821.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt               NaN                   ---
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5      122.211 ±      81.618   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5      377.099 ±      77.146  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  4364450.973 ± 2856586.882    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5      381.570 ±     119.260  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  4415115.428 ± 3000198.792    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5        1.914 ±      16.479  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5    31833.830 ±  274098.881    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5       12.117 ±      20.931  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5   136001.918 ±  196459.666    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5       52.000                counts
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5     3698.000                    ms
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt               NaN                   ---
```


Thanks,

Bill Farner
