marmbrus edited a comment on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-541199399

First of all, I think this is great. Thanks for working on it!

I tend to agree with @gatorsmile that we should consider making this feature an external package. The argument here is not that RocksDB is a bad choice. For many workloads it is likely a great option. However, for some workloads people might want to use Cassandra, LevelDB, or something else. Should we also allow new features to add dependencies on these?

As the Spark project continues to grow, I think it is important that we guard against the core becoming a Swiss Army knife, with too many different configurations for the community to maintain in the long run. In this case we are not only adding a new dependency, but we are also committing Spark to supporting the specifics of how you are packaging and uploading the RocksDB files *forever*.

The whole reason we added an API for state stores to plug in was to enable this kind of innovation outside of the core of Spark. If this package becomes super popular, I would reconsider this position, similar to how Avro and CSV were eventually inlined into core Spark.

> With integration in spark codebase, we can probably change the code in any way later, but if we take the separate jar route, the kind of extensions you can make are limited by the current contract. For example @skonto mentioned one of way where we can abstract state storage implementation to get the best out of rocksdb. How can we support such improvement of we take spark package route?

If the abstraction boundaries are wrong here, we should improve the APIs, not punch through them. I don't think this is a good argument for putting this into Spark.
