itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-517556064

> I agree keeping state in memory is not scalable, and the result looks promising. It might be better to have another kind of benchmark here, like stress test, to see the performance on stateful operations and let end users guide whether they're mostly encouraged to use this implementation, or use this selectively.
>
> What I did for my patch was following:
> https://issues.apache.org/jira/browse/SPARK-21271
> [#21733 (comment)](https://github.com/apache/spark/pull/21733#issuecomment-411207042)

I have created this [repo](https://github.com/itsvikramagr/spark-benchmark), along the same lines as what @HeartSaVioR did for that patch.

**Setup**

- Used Qubole's distribution of Apache Spark 2.4.0 for my tests.
- Master Instance Type = i3.xlarge
- Driver Memory = 2g
- num-executors = 1
- max-executors = 1
- spark.sql.shuffle.partitions = 8
- Run time = 30 mins
- Source = Rate Source
- Executor Memory = 7g
- spark.executor.memoryOverhead = 3g
- Processing Time = 30 sec

Executor Instance type = i3.xlarge
cores per executor = 4
ratePerSec = 20k

| State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
| --- | --- | --- | --- | --- | --- |
| HDFS | Append | ~7 mins | 8.6 million | 2 million | Application failed before 30 mins |
| RocksDB | Append | ~30 mins | 34.6 million | 7 million | |

Executor Instance type = C5d.2xlarge
cores per executor = 8
ratePerSec = 30k

| State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
| --- | --- | --- | --- | --- | --- |
| HDFS | Append | 8 mins | 12.6 million | 3.1 million | Application was stuck because of GC |
| RocksDB | Complete | ~30 mins | 47.34 million | 12.5 million | |

Executor info when HDFS state storage is used:

<img width="1244" alt="Screenshot 2019-08-02 at 10 58 21 AM" src="https://user-images.githubusercontent.com/5220941/62346639-79443f80-b514-11e9-82ff-c41bdd2d5a91.png">
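
One rough way to read the two result tables is to compare the records each run processed against what the Rate Source would nominally have produced over the full 30-minute run. The Python sketch below does that arithmetic. It assumes the source emits exactly `ratePerSec` rows every second for the whole run (no ramp-up, no backpressure), which the comment does not state. The helper names `nominal_records` and `fraction_processed` are illustrative and not part of the benchmark repo.

```python
# Figures copied from the two result tables in the comment.
# Assumption: the Rate Source emits exactly ratePerSec rows per second
# for the whole run. The helper names are hypothetical.

RUN_MINUTES = 30

RUNS = [
    # (executor instance, state storage, ratePerSec, records processed)
    ("i3.xlarge", "HDFS", 20_000, 8_600_000),
    ("i3.xlarge", "RocksDB", 20_000, 34_600_000),
    ("C5d.2xlarge", "HDFS", 30_000, 12_600_000),
    ("C5d.2xlarge", "RocksDB", 30_000, 47_340_000),
]


def nominal_records(rate_per_sec, minutes=RUN_MINUTES):
    """Rows the source would emit over the run at a constant rate."""
    return rate_per_sec * minutes * 60


def fraction_processed(records, rate_per_sec, minutes=RUN_MINUTES):
    """Share of the nominal input that the run actually processed."""
    return records / nominal_records(rate_per_sec, minutes)


def summarize(runs=RUNS):
    """Return (instance, storage, nominal rows, fraction processed) per run."""
    return [
        (instance, storage, nominal_records(rate),
         round(fraction_processed(records, rate), 3))
        for instance, storage, rate, records in runs
    ]


if __name__ == "__main__":
    for instance, storage, nominal, fraction in summarize():
        print(f"{instance:12} {storage:8} nominal={nominal:>11,} "
              f"processed={fraction:.1%}")
```

Under that assumption, the HDFS runs stopped after processing roughly a quarter of the nominal input, while the RocksDB runs got through most of it. The table figures are rounded, so treat the fractions as approximate.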
