itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-517556064
 
 
   > I agree keeping state in memory is not scalable, and the result looks promising. It might be better to have another kind of benchmark here, like stress test, to see the performance on stateful operations and let end users guide whether they're mostly encouraged to use this implementation, or use this selectively.
   > 
   > What I did for my patch was following:
   > https://issues.apache.org/jira/browse/SPARK-21271
   > [#21733 (comment)](https://github.com/apache/spark/pull/21733#issuecomment-411207042)
   > 
   
   For this patch, I have created the following [repo](https://github.com/itsvikramagr/spark-benchmark), along the same lines as what @HeartSaVioR did for his.
   
   **Setup**
   - Used Qubole's distribution of Apache Spark 2.4.0 for all tests
   - Master instance type = i3.xlarge
   - Driver memory = 2g
   - num-executors = 1
   - max-executors = 1
   - spark.sql.shuffle.partitions = 8
   - Run time = 30 mins
   - Source = rate source
   - Executor memory = 7g
   - spark.executor.memoryOverhead = 3g
   - Trigger interval (processing time) = 30 sec
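   For anyone wanting to reproduce the runs, the setup above roughly corresponds to a `spark-submit` invocation like the one below. This is a sketch under assumptions: the application jar name is hypothetical, "max-executors" is assumed to map to `spark.dynamicAllocation.maxExecutors` (on Qubole it may be a cluster-level setting instead), and the RocksDB provider class name from this PR is not given here, so only the standard `spark.sql.streaming.stateStore.providerClass` key and Spark's default HDFS-backed provider are shown. The rate (`ratePerSec`) and the 30 sec trigger are query-level settings and are not part of the command.

```python
import shlex
from typing import List, Optional

# Spark's default (in-memory, HDFS-checkpointed) state store provider.
HDFS_PROVIDER = "org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider"


def build_spark_submit(app_jar: str, executor_cores: int,
                       state_store_provider: Optional[str] = None) -> List[str]:
    """Return a spark-submit argument list mirroring the benchmark setup (a sketch)."""
    args = [
        "spark-submit",
        "--driver-memory", "2g",
        "--num-executors", "1",
        "--executor-memory", "7g",
        "--executor-cores", str(executor_cores),  # 4 on i3.xlarge, 8 on c5d.2xlarge
        # Assumption: "max-executors = 1" maps to the dynamic-allocation cap.
        "--conf", "spark.dynamicAllocation.maxExecutors=1",
        "--conf", "spark.executor.memoryOverhead=3g",
        "--conf", "spark.sql.shuffle.partitions=8",
    ]
    if state_store_provider is not None:
        # Standard Spark key for choosing the state store implementation.
        args += ["--conf",
                 f"spark.sql.streaming.stateStore.providerClass={state_store_provider}"]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # "benchmark.jar" is a hypothetical application jar name.
    cmd = build_spark_submit("benchmark.jar", executor_cores=4,
                             state_store_provider=HDFS_PROVIDER)
    print(shlex.join(cmd))
```

   Building the command as a list keeps each flag and value separate, so it can be passed straight to `subprocess.run` without shell quoting issues.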
   
   - Executor instance type = i3.xlarge
   - Cores per executor = 4
   - ratePerSec = 20k
   
   | State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
   | --- | --- | --- | --- | --- | --- |
   | HDFS | Append | ~7 mins | 8.6 million | 2 million | Application failed before 30 mins |
   | RocksDB | Append | ~30 mins | 34.6 million | 7 million |  |
   
   
   - Executor instance type = c5d.2xlarge
   - Cores per executor = 8
   - ratePerSec = 30k
   
   | State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
   | --- | --- | --- | --- | --- | --- |
   | HDFS | Append | 8 mins | 12.6 million | 3.1 million | Application was stuck because of GC |
   | RocksDB | Complete | ~30 mins | 47.34 million | 12.5 million |  |
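   A quick back-of-the-envelope check on the two tables: assuming the "Total Trigger Execution Time" column approximates how long each application actually ran, and taking the "~" values at face value, the average throughput and the share of the planned 30-minute input (ratePerSec × 1800 s) can be computed as below. This is a sketch for reading the numbers, not part of the benchmark repo.

```python
# Rows copied from the two tables: (run, ratePerSec, minutes run, records processed).
RUNS = [
    ("i3.xlarge / HDFS", 20_000, 7, 8_600_000),
    ("i3.xlarge / RocksDB", 20_000, 30, 34_600_000),
    ("c5d.2xlarge / HDFS", 30_000, 8, 12_600_000),
    ("c5d.2xlarge / RocksDB", 30_000, 30, 47_340_000),
]

PLANNED_MINUTES = 30  # the intended run time from the Setup list


def throughput(records: int, minutes: float) -> float:
    """Average records processed per second while the application was running."""
    return records / (minutes * 60)


def target_fraction(records: int, rate_per_sec: int,
                    planned_minutes: float = PLANNED_MINUTES) -> float:
    """Share of the records the rate source would emit over the planned run."""
    return records / (rate_per_sec * planned_minutes * 60)


if __name__ == "__main__":
    for name, rate, minutes, records in RUNS:
        print(f"{name:24s} {throughput(records, minutes):8.0f} rec/s "
              f"{target_fraction(records, rate):6.1%} of planned input")
```

   Under these assumptions, both state stores sustained a similar average rate while running (about 19k to 20k records/sec in the first setup, about 26k in the second); the gap in records processed comes mainly from the HDFS runs stopping after 7 to 8 minutes.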
   
   Executor info when HDFS state storage is used:
   <img width="1244" alt="Screenshot 2019-08-02 at 10 58 21 AM" src="https://user-images.githubusercontent.com/5220941/62346639-79443f80-b514-11e9-82ff-c41bdd2d5a91.png">
   
   
   
