[GitHub] [spark] HeartSaVioR commented on a diff in pull request #37196: [SPARK-39781][SS] Add support for providing max_open_files to rocksdb state store provider

GitBox Thu, 14 Jul 2022 20:33:48 -0700


HeartSaVioR commented on code in PR #37196:
URL: https://github.com/apache/spark/pull/37196#discussion_r921777850



##########
docs/structured-streaming-programming-guide.md:
##########
@@ -1958,6 +1958,11 @@ Here are the configs regarding to RocksDB instance of the state store provider:
     <td>The waiting time in millisecond for acquiring lock in the load operation for RocksDB instance.</td>
     <td>60000</td>
   </tr>
+  <tr>
+    <td>spark.sql.streaming.stateStore.rocksdb.maxOpenFiles</td>

Review Comment:
   I think eventually we want a JVM-wide config, since the actual problem comes not only from a single RocksDB instance in the executor but also from multiple RocksDB instances in the same executor.
   (That would be much more complicated, since the coordinator would have to take it into account when scheduling stateful tasks.)
   
   This config still requires end users to predict how many stateful tasks will be assigned to and run in a single executor. That is exactly what we called out as one of the issues with the HDFS-backed state store (due to memory), and what we claimed to have resolved with the RocksDB state store. Still, it is a great start and a lot better than nothing.
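A minimal sizing sketch of the point above. It assumes each concurrently running stateful task holds its own RocksDB instance in the executor, and that each instance applies the per-instance `maxOpenFiles` cap independently. The helper names, the fd limit, and the reserved-fd count are all hypothetical illustration values, not Spark APIs or recommendations. Because the cap is per instance, the executor-wide total scales with a task count the user has to guess up front.

```python
# Hypothetical sizing helpers: names and numbers are illustrative assumptions,
# not part of Spark or RocksDB.

def worst_case_open_files(max_open_files: int, instances_per_executor: int) -> int:
    """Upper bound on files held open by all RocksDB instances in one executor,
    assuming each instance honors its own max_open_files cap independently."""
    if max_open_files < 0:
        # In RocksDB, max_open_files = -1 means files are always kept open (no cap).
        raise ValueError("max_open_files = -1 is unbounded; no finite upper bound")
    return max_open_files * instances_per_executor


def per_instance_budget(process_fd_limit: int, reserved_fds: int,
                        instances_per_executor: int) -> int:
    """Split the fd headroom left after `reserved_fds` evenly across the
    expected number of RocksDB instances (the number users must predict)."""
    if instances_per_executor <= 0:
        raise ValueError("instances_per_executor must be positive")
    headroom = process_fd_limit - reserved_fds
    if headroom <= 0:
        raise ValueError("no file descriptors left for RocksDB")
    return headroom // instances_per_executor


if __name__ == "__main__":
    # Assumed: 65536 fds per executor process, 4096 kept for sockets, shuffle,
    # jars, etc., and 8 concurrent stateful tasks expected per executor.
    budget = per_instance_budget(65536, 4096, 8)
    print(budget)                            # 7680
    print(worst_case_open_files(budget, 8))  # 61440, fits the headroom
    # If 16 stateful tasks land on the executor instead, the same cap overshoots.
    print(worst_case_open_files(budget, 16))  # 122880
```

The last line shows why a per-instance setting only holds if the task-count guess holds. A JVM-wide limit would remove that guess, at the cost of the scheduling complexity mentioned above.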


