marmbrus edited a comment on issue #24922: [SPARK-28120][SS]  Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-541199399
 
 
   First of all, I think this is great.  Thanks for working on it!
   
   I tend to agree with @gatorsmile that we should consider making this feature an external package.
   
   The argument here is not that RocksDB is a bad choice. For many workloads it is likely a great option. However, for some workloads people might want to use Cassandra, LevelDB, or something else. Should we also allow new features to add dependencies on those?
   
   As the Spark project continues to grow, I think it is important that we guard against the core becoming a Swiss Army knife, with too many different configurations for the community to maintain in the long run. In this case we are not only adding a new dependency, but we are also committing Spark to supporting the specifics of how you are packaging and uploading the RocksDB files *forever*.
   
   The whole reason we added an API for state stores to plug in was to enable this kind of innovation outside the core of Spark.
   
   If this package becomes very popular, I would reconsider this position, similar to how Avro and CSV support were eventually inlined into core Spark.
   
   > With integration in the Spark codebase, we can change the code in any way later, but if we take the separate-jar route, the kinds of extensions we can make are limited by the current contract. For example, @skonto mentioned one way we could abstract the state storage implementation to get the most out of RocksDB. How can we support such improvements if we take the Spark package route?
   
   If the abstraction boundaries are wrong here, we should improve the APIs, not punch through them. I don't think this is a good argument for putting this into Spark.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.