skonto commented on a change in pull request #24922: [Spark 28120][SS] Rocksdb
state storage implementation
URL: https://github.com/apache/spark/pull/24922#discussion_r295748844
##########
File path: sql/core/pom.xml
##########
@@ -147,6 +147,12 @@
<artifactId>mockito-core</artifactId>
<scope>test</scope>
</dependency>
+ <!-- RocksDB dependency for Structured Streaming State Store -->
+ <dependency>
+ <groupId>org.rocksdb</groupId>
+ <artifactId>rocksdbjni</artifactId>
Review comment:
This dependency bundles the files for all major OSes. Flink uses a
[custom build](https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/pom.xml#L60).
Digging into this a bit more, I see some additional
[modifications](https://github.com/dataArtisans/frocksdb/pull/3), as described
[here](https://github.com/azagrebin/frocksdb/commits/release). I understand
this is Flink-specific, but what about the TTL feature mentioned there?
https://issues.apache.org/jira/browse/FLINK-10471 looks interesting. Structured
Streaming fetches all state
[here](https://github.com/apache/spark/blob/5f3658a8d8bc6b11b20eb0d3935662e08d319460/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L188)
(into memory) and filters out the timed-out entries. Does the RocksDB
implementation do the same?
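
To make the question concrete, here is a rough sketch of the scan-and-filter pattern I mean. It is plain Python rather than the actual Scala code, and every name in it (`StateEntry`, `NO_TIMEOUT`, `timed_out_keys`) is hypothetical; it only illustrates that finding timed-out state this way touches every entry in the store, which is the cost I am asking about.

```python
from typing import Dict, List, NamedTuple

# Hypothetical sentinel meaning "this entry has no timeout set".
NO_TIMEOUT = -1


class StateEntry(NamedTuple):
    """Illustrative stand-in for one keyed state row."""
    value: str
    timeout_ms: int  # absolute timeout timestamp, or NO_TIMEOUT


def timed_out_keys(store: Dict[str, StateEntry], now_ms: int) -> List[str]:
    """Return keys whose timeout has passed, by scanning the whole store.

    The cost is proportional to the total number of entries,
    not to the number of entries that actually timed out.
    """
    return sorted(
        key
        for key, entry in store.items()
        if entry.timeout_ms != NO_TIMEOUT and entry.timeout_ms < now_ms
    )


if __name__ == "__main__":
    store = {
        "a": StateEntry("x", 100),
        "b": StateEntry("y", NO_TIMEOUT),
        "c": StateEntry("z", 500),
    }
    print(timed_out_keys(store, now_ms=200))
```

If the RocksDB-backed store has to do the same full iteration on every batch, a TTL-style cleanup along the lines of the Flink ticket might be worth considering.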