[GitHub] [spark] viirya commented on a change in pull request #34502: [SPARK-37224][SS] Optimize write path on RocksDB state store provider

GitBox Thu, 18 Nov 2021 01:02:14 -0800


viirya commented on a change in pull request #34502:
URL: https://github.com/apache/spark/pull/34502#discussion_r752031160




##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -1956,8 +1956,21 @@ Here are the configs regarding to RocksDB instance of the state store provider:
     <td>Whether we resets all ticker and histogram stats for RocksDB on load.</td>
     <td>True</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows</td>
+    <td>Whether we track the total number of rows in state store. Please refer the details in <a href="#performance-aspect-considerations">Performance-aspect considerations</a>.</td>
+    <td>True</td>
+  </tr>
 </table>
 
+##### Performance-aspect considerations
+
+1. For write-heavy workloads, you may want to disable the track of total number of rows.

Review comment:
       What does "write-heavy workloads" mean in this context? Should we use terms that are easier to understand in a streaming context, e.g., throughput or rows per second?
   
   Because this seems to refer to the state store, I'm not sure how users would measure whether the state store is write-heavy.
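
   For reference, a minimal sketch of how the option under discussion might be turned off. The two config keys are the ones Spark uses for the state store provider and for the option added in this diff; the helper names `rocksdb_state_store_confs` and `to_spark_submit_args` are hypothetical and exist only for illustration. The sketch only builds the settings and does not start a Spark session.

```python
# Hypothetical helpers for illustration; only the config keys and the
# provider class name below are actual Spark settings.
ROCKSDB_PROVIDER = (
    "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
)


def rocksdb_state_store_confs(track_total_rows: bool = True) -> dict:
    """Build the settings that select the RocksDB provider and toggle row tracking."""
    return {
        "spark.sql.streaming.stateStore.providerClass": ROCKSDB_PROVIDER,
        # The diff documents the default as True; pass False to disable tracking.
        "spark.sql.streaming.stateStore.rocksdb.trackTotalNumberOfRows": str(
            track_total_rows
        ).lower(),
    }


def to_spark_submit_args(confs: dict) -> list:
    """Render the settings as repeated `--conf key=value` arguments."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    confs = rocksdb_state_store_confs(track_total_rows=False)
    print(" ".join(to_spark_submit_args(confs)))
```

   The same key/value pairs could instead be passed one by one to `SparkSession.builder.config(key, value)` when the session is created.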




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


