[jira] [Comment Edited] (SPARK-34198) Add RocksDB StateStore as external module

2021-02-14 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284491#comment-17284491
 ] 

Jungtaek Lim edited comment on SPARK-34198 at 2/14/21, 8:02 PM:


Thanks for considering it. I think it would be the best option for Apache Spark
among these, if it makes sense to Databricks as well, simply because it has been
in service for years with enterprise-level support. We can't expect the same
stability from the other options and may struggle with them for some time - it'd
be best if we can avoid that.
(Worth noting that the second one may also come with enterprise-level support,
but for less than a year, and I left 50+ review comments on the proposed PR and
personally didn't feel the PR was super solid at that time. I mean, for me, the
PR was not proposed with production-level quality at first.)


was (Author: kabhwan):
Thanks for considering it. I think it would be the best option for Apache Spark
among these, if it makes sense to Databricks as well, simply because it has been
in service for years with enterprise-level support. We can't expect the same
stability from the other options and may struggle with them for some time - it'd
be best if we can avoid that.
(Worth noting that the second one may also come with enterprise-level support,
but for less than a year, and I left 50+ review comments on the proposed PR and
personally didn't feel the PR was super solid at that time. I mean, for me, the
PR was not proposed with production quality at first.)

> Add RocksDB StateStore as external module
> -----------------------------------------
>
>                 Key: SPARK-34198
>                 URL: https://issues.apache.org/jira/browse/SPARK-34198
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>
> Currently Spark SS has only one built-in StateStore implementation,
> HDFSBackedStateStore, which actually uses an in-memory map to store state
> rows. As more and more streaming applications appear, some of them require
> large state in stateful operations such as streaming aggregation and join.
> Several other major streaming frameworks already use RocksDB for state
> management, so it is proven to be a good choice for large state usage. But
> Spark SS still lacks a built-in state store for this requirement.
> We would like to explore the possibility of adding a RocksDB-based StateStore
> to Spark SS. To address the concern about adding RocksDB as a direct
> dependency, our plan is to add this StateStore as an external module first.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-34198) Add RocksDB StateStore as external module

2021-02-14 Thread Reynold Xin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284479#comment-17284479
 ] 

Reynold Xin edited comment on SPARK-34198 at 2/14/21, 6:59 PM:
---

I don't know the intricate details of it, but I suspect it's a different one
with many more features, because it existed long before those two.


was (Author: rxin):
I don't know the intricate details of it, but I suspect it's a different one,
because it existed long before those two.



