Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21733
@tdas
I found some spare time to run performance tests, though I've run only one
app so far; I couldn't run the tests concurrently. Please let me know if you
are not confident in the results from one app, and I'll find more time to run
all the test cases. I hope these numbers give enough confidence to accept the
patch.
> Machine info.
MBP 15-inch Mid 2015
* i7 2.5 GHz (4 cores)
* 16 GB 1600 MHz DDR3
* 512 GB SSD
> Test information
* base commit: c9914cf (latest master branch)
* patch rebased onto the base commit before testing
* spark-submit options: --master local[3] --driver-memory 6g
* I didn't run the perf test with all cores and memory: I left some spare
resources for the OS and background apps.
> Performance test code
https://github.com/HeartSaVioR/iot-trucking-app-spark-structured-streaming/blob/master/src/main/scala/com/hortonworks/spark/benchmark/BenchmarkMovingAggregationsListener.scala
Please note that there are 4 more apps (big key size, big value size, many
key columns, many value columns) in the same repository.
> Test result
Neither version kept up with the target rate of 200000 rows per second, but
since processed rows per second were around 188000, I felt I didn't need to
tune the rate more finely (e.g. 185000, 190000).
The numbers for input rows per second and processed rows per second are
averages over 3 batches (38, 39, and 40). The state numbers were taken when
total state rows reached 60000.
| version | input rows per second | processed rows per second | total state rows | used bytes of current state version |
| ---- | ---- | ---- | ---- | ---- |
| latest master (c9914cf) | 200492.065 | 188880.316 | 60000 | 17,755,895 |
| patch (on top of c9914cf) | 199242.598 | 188160.833 | 60000 | 14,687,543 |
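As a rough illustration of how per-batch rates can be averaged over batches
38, 39, and 40, here is a minimal Python sketch. It assumes the progress events
have been collected as dictionaries carrying the `batchId`,
`inputRowsPerSecond`, and `processedRowsPerSecond` fields that Structured
Streaming reports in its query progress. The helper name `average_rates` and
the sample values are hypothetical placeholders, not the measured data.

```python
from statistics import mean


def average_rates(progress_events, batch_ids):
    """Average input/processed rows per second over the selected batches."""
    selected = [p for p in progress_events if p["batchId"] in batch_ids]
    if len(selected) != len(batch_ids):
        raise ValueError("some requested batches are missing")
    return (
        mean(p["inputRowsPerSecond"] for p in selected),
        mean(p["processedRowsPerSecond"] for p in selected),
    )


# Placeholder progress events (NOT the values measured in this test run).
sample_events = [
    {"batchId": 37, "inputRowsPerSecond": 150000.0, "processedRowsPerSecond": 140000.0},
    {"batchId": 38, "inputRowsPerSecond": 200000.0, "processedRowsPerSecond": 188000.0},
    {"batchId": 39, "inputRowsPerSecond": 201000.0, "processedRowsPerSecond": 189000.0},
    {"batchId": 40, "inputRowsPerSecond": 199000.0, "processedRowsPerSecond": 187000.0},
]

# Batch 37 is ignored; only batches 38-40 contribute to the averages.
avg_input, avg_processed = average_rates(sample_events, {38, 39, 40})
print(avg_input, avg_processed)
```

Averaging a few consecutive late batches is one way to avoid warm-up noise
from the first batches of the run.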
So while processed rows per second didn't differ noticeably between the two
(under 1%), the patch reduced memory usage of the state (for the latest
version) by about 17.3%. One thing to note: in this performance test, state is
saved to the local SSD. The patch may show a (small? trivial?) performance
benefit when the checkpoint directory is remote.
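For anyone who wants to re-check the percentages, the comparison can be
reproduced from the table with a few lines of Python. This is only a sketch:
the helper name `relative_change_pct` is made up for illustration, and the
inputs are the values copied from the result table.

```python
def relative_change_pct(baseline, patched):
    """Percentage change of `patched` relative to `baseline` (negative = lower)."""
    return (patched - baseline) / baseline * 100.0


# Values copied from the result table.
master = {"processed_rps": 188880.316, "state_bytes": 17_755_895}
patch = {"processed_rps": 188160.833, "state_bytes": 14_687_543}

throughput_change = relative_change_pct(master["processed_rps"], patch["processed_rps"])
memory_change = relative_change_pct(master["state_bytes"], patch["state_bytes"])

print(f"processed rows/s change: {throughput_change:.2f}%")
print(f"state memory change:     {memory_change:.2f}%")
```

Both results come out negative: throughput drops by well under 1%, while the
state memory drops by roughly 17%.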