Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-23 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196085
---



Master (dbe7137) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Jan. 24, 2018, 12:32 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 24, 2018, 12:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
> the bottom, but here is an abridged version.  It shows that task fetch 
> throughput universally improves by at least 2x, and heap allocation reduces 
> by at least the same factor.  Overall GC time increases slightly as captured 
> here, but the stddev was anecdotally high across runs.  I chose to present 
> this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which i find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark (numTasks)Score Error   Units
> 
>   1  1066.632 ± 266.924   ops/s
> ·gc.alloc.rate.norm   1289227.205 ±.051B/op
> ·gc.count 124.000counts
> ·gc.time  1   103.000ms
> 
>   584.444 ±  32.620   ops/s
> ·gc.alloc.rate.norm   5   3831210.967 ±  840844.713B/op
> ·gc.count 521.000counts
> ·gc.time  5  1407.000ms
> 
>  1038.645 ±  20.557   ops/s
> ·gc.alloc.rate.norm  10  13555430.931 ± 6787344.701B/op
> ·gc.count1052.000counts
> ·gc.time 10  3304.000ms
> ```
> 
> With this patch:
> ```console
> Benchmark   (numTasks)   Score Error   Units
> 
>  12851.288 ± 481.472   ops/s
> ·gc.alloc.rate.norm  1  145281.908 ±2223.621B/op
> ·gc.count1  39.000counts
> ·gc.time 1 130.000ms
> 
>  5 297.380 ±  35.681   ops/s
> ·gc.alloc.rate.norm  5 1183791.866 ±   77487.278B/op
> ·gc.count5  25.000counts
> ·gc.time 51821.000ms
> 
> 10 122.211 ±  81.618   ops/s  
>   
> ·gc.alloc.rate.norm 10 4364450.973 ± 2856586.882B/op
> ·gc.count   10  52.000counts
> ·gc.time103698.000ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark 
>(numTasks)   Mode  Cnt Score Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run
> 1  thrpt5  

Review Request 65303: Improve performance of MemTaskStore queries

2018-01-23 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
---

Review request for Aurora and Jordan Ly.


Repository: aurora


Description
---

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style 
rather than functional.  I arrived at this result after running benchmarks with 
some of the other usual suspects (`ArrayList`, `LinkedList`).

This patch also enables stack and heap profilers in jmh (more details 
[here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
 providing insight into the heap impact of changes.  I started this change with 
a heap profiler as the primary motivation, and ended up using it to guide this 
improvement.


Diffs
-

  build.gradle 64af7ae 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
b5c 


Diff: https://reviews.apache.org/r/65303/diff/1/


Testing
---

Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
the bottom, but here is an abridged version.  It shows that task fetch 
throughput universally improves by at least 2x, and heap allocation reduces by 
at least the same factor.  Overall GC time increases slightly as captured here, 
but the stddev was anecdotally high across runs.  I chose to present this 
output as a caveat and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular 
allocation data.  Please note that the `norm` stats are normalized for the 
number of operations, which i find to be the most useful measure for validating 
a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is 
allocation/GC-bound (figure the allocation pressure "ceiling" for your 
configuration!), and normalized counters to see the more precise benchmark 
behavior.
```

Prior to this patch:
```console
Benchmark (numTasks)Score Error   Units

  1  1066.632 ± 266.924   ops/s
·gc.alloc.rate.norm   1289227.205 ±.051B/op
·gc.count 124.000counts
·gc.time  1   103.000ms

  584.444 ±  32.620   ops/s
·gc.alloc.rate.norm   5   3831210.967 ±  840844.713B/op
·gc.count 521.000counts
·gc.time  5  1407.000ms

 1038.645 ±  20.557   ops/s
·gc.alloc.rate.norm  10  13555430.931 ± 6787344.701B/op
·gc.count1052.000counts
·gc.time 10  3304.000ms
```

With this patch:
```console
Benchmark   (numTasks)   Score Error   Units

 12851.288 ± 481.472   ops/s
·gc.alloc.rate.norm  1  145281.908 ±2223.621B/op
·gc.count1  39.000counts
·gc.time 1 130.000ms

 5 297.380 ±  35.681   ops/s
·gc.alloc.rate.norm  5 1183791.866 ±   77487.278B/op
·gc.count5  25.000counts
·gc.time 51821.000ms

10 122.211 ±  81.618   ops/s

·gc.alloc.rate.norm 10 4364450.973 ± 2856586.882B/op
·gc.count   10  52.000counts
·gc.time103698.000ms
```


**Full benchmark output**

Prior to this patch:
```console
Benchmark   
 (numTasks)   Mode  Cnt Score Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run  
  1  thrpt5  1066.632 ± 266.924   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate   
  1  thrpt5   286.647 ±  62.371  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm  
  1  thrpt5289227.205 ±.051B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space  
  1  thrpt5   291.263 ± 159.266  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm 
  1  thrpt5294277.617 ±  166069.041B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space  
  1  thrpt5 1.218 ±   1.029  MB/sec

Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread David McLaughlin


> On Jan. 23, 2018, 6:11 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
> > Lines 319 (patched)
> > 
> >
> > Shouldn't `RUNNING` -> `PARTITIONED` be `DOWN`?
> 
> Bill Farner wrote:
> +1
> 
> Stephan Erb wrote:
> I considered this as at first as well. However, it is the users choice to 
> be in `PARTITIONED` state, so this should not be counted as "cluster-induced 
> downtime". In addition, `PARTITIONED` does not indicate that the task is 
> down. It merely means that we don't know.
> 
> David McLaughlin wrote:
> That was my thought too. They are optimistically still running.
> 
> Santhosh Kumar Shanmugham wrote:
> That makes sense. 
> 
> Shouldn't we then include the time spent in `PARTITIONED` as `UP` in that 
> case? If I am reading it correctly, we will exclude time spent in 
> `PARTITIONED`, even if the task came an `UP` state.

You're right, this should carry the same logic as RUNNING.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196047
---


On Jan. 23, 2018, 6:02 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 23, 2018, 6:02 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread Santhosh Kumar Shanmugham


> On Jan. 23, 2018, 10:11 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
> > Lines 319 (patched)
> > 
> >
> > Shouldn't `RUNNING` -> `PARTITIONED` be `DOWN`?
> 
> Bill Farner wrote:
> +1
> 
> Stephan Erb wrote:
> I considered this as at first as well. However, it is the users choice to 
> be in `PARTITIONED` state, so this should not be counted as "cluster-induced 
> downtime". In addition, `PARTITIONED` does not indicate that the task is 
> down. It merely means that we don't know.
> 
> David McLaughlin wrote:
> That was my thought too. They are optimistically still running.

That makes sense. 

Shouldn't we then include the time spent in `PARTITIONED` as `UP` in that case? 
If I am reading it correctly, we will exclude time spent in `PARTITIONED`, even 
if the task came an `UP` state.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196047
---


On Jan. 22, 2018, 10:02 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 22, 2018, 10:02 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread David McLaughlin


> On Jan. 23, 2018, 6:11 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
> > Lines 319 (patched)
> > 
> >
> > Shouldn't `RUNNING` -> `PARTITIONED` be `DOWN`?
> 
> Bill Farner wrote:
> +1
> 
> Stephan Erb wrote:
> I considered this as at first as well. However, it is the users choice to 
> be in `PARTITIONED` state, so this should not be counted as "cluster-induced 
> downtime". In addition, `PARTITIONED` does not indicate that the task is 
> down. It merely means that we don't know.

That was my thought too. They are optimistically still running.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196047
---


On Jan. 23, 2018, 6:02 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 23, 2018, 6:02 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread Bill Farner


> On Jan. 23, 2018, 10:11 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
> > Lines 319 (patched)
> > 
> >
> > Shouldn't `RUNNING` -> `PARTITIONED` be `DOWN`?

+1


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196047
---


On Jan. 22, 2018, 10:02 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 22, 2018, 10:02 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196047
---




src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java
Lines 319 (patched)


Shouldn't `RUNNING` -> `PARTITIONED` be `DOWN`?


- Santhosh Kumar Shanmugham


On Jan. 22, 2018, 10:02 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 22, 2018, 10:02 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 65281: Support PARTITIONED state in SLA calculations

2018-01-23 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65281/#review196046
---


Ship it!




Nice test for future cases.

- Jordan Ly


On Jan. 23, 2018, 6:02 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65281/
> ---
> 
> (Updated Jan. 23, 2018, 6:02 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Support PARTITIONED state in SLA calculations. Also added a test to protect 
> against this test failing in the future.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java 
> 5d8d5bd8f705770979f284d26d2e932aabe707e5 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java 
> 2e719ac6b7aea86faa22deff2cc6b5f73135761c 
> 
> 
> Diff: https://reviews.apache.org/r/65281/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>