skonto commented on a change in pull request #24922: [SPARK-28120][SS] RocksDB 
state storage implementation
URL: https://github.com/apache/spark/pull/24922#discussion_r295748844
 
 

 ##########
 File path: sql/core/pom.xml
 ##########
 @@ -147,6 +147,12 @@
       <artifactId>mockito-core</artifactId>
       <scope>test</scope>
     </dependency>
+    <!-- RocksDB dependency for Structured Streaming State Store -->
+    <dependency>
+      <groupId>org.rocksdb</groupId>
+      <artifactId>rocksdbjni</artifactId>
 
 Review comment:
   This dependency bundles the native libraries for all major OSs. Flink uses a 
[custom build](https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/pom.xml#L60).
 Digging into this a bit more, I see some additional 
[modifications](https://github.com/dataArtisans/frocksdb/pull/3), as described 
[here](https://github.com/azagrebin/frocksdb/commits/release). I understand 
these are Flink-specific, but how about the TTL feature mentioned there? 
https://issues.apache.org/jira/browse/FLINK-10471 looks interesting. Structured 
Streaming fetches all state 
[here](https://github.com/apache/spark/blob/5f3658a8d8bc6b11b20eb0d3935662e08d319460/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L188)
 (in memory) and filters out the timed-out entries. Does RocksDB perform well there? 
Shouldn't we have the same mechanism, or a similar one, so that we don't fetch 
everything and instead delegate this to the state backend (which could run it in 
the background, by the way)?
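
To make the concern concrete, here is a rough sketch in plain Java. It uses neither Spark nor RocksDB APIs, and every name in it (`TimeoutIndexSketch`, `timedOutByFullScan`, `timedOutByIndex`) is hypothetical. It only contrasts the two access patterns: scanning every entry and filtering by timeout, versus keeping a secondary index ordered by timeout timestamp so that only expired entries are touched. The strict `<` comparison is an assumption for illustration, not a claim about the exact predicate Spark uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not Spark or RocksDB code.
class TimeoutIndexSketch {
    // key -> timeout timestamp (ms); stands in for the per-key state.
    private final Map<String, Long> timeoutByKey = new HashMap<>();
    // timeout timestamp -> keys expiring then; the assumed secondary index.
    private final TreeMap<Long, TreeSet<String>> keysByTimeout = new TreeMap<>();

    // Insert or update a key, keeping the index in sync with the main map.
    void put(String key, long timeoutMs) {
        Long old = timeoutByKey.put(key, timeoutMs);
        if (old != null) {
            TreeSet<String> keys = keysByTimeout.get(old);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    keysByTimeout.remove(old);
                }
            }
        }
        keysByTimeout.computeIfAbsent(timeoutMs, t -> new TreeSet<>()).add(key);
    }

    // Full scan: visits every entry, then filters (cost grows with total state).
    List<String> timedOutByFullScan(long nowMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : timeoutByKey.entrySet()) {
            if (e.getValue() < nowMs) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    // Indexed lookup: visits only entries whose timeout is strictly before nowMs.
    List<String> timedOutByIndex(long nowMs) {
        List<String> result = new ArrayList<>();
        for (TreeSet<String> keys : keysByTimeout.headMap(nowMs, false).values()) {
            result.addAll(keys);
        }
        Collections.sort(result);
        return result;
    }
}
```

Both methods return the same keys; the difference is how much state each one has to read. With a sorted on-disk store, the indexed variant would correspond to a range scan over a timestamp-prefixed key space, or to letting the backend expire entries itself, as the Flink TTL work does.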

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
