maropu commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-657188271
Hi, @revans2, thanks for the work. I have two questions:

> The use case we have right now is in connection with https://github.com/NVIDIA/spark-rapids where we would like to provide compression that is better suited for both the CPU and the GPU.

1. Could you not use the Data Source V2 interface for your purpose, or extend that interface to cover it? I ask because the Data Source V2 interface is intended for interconnection between Spark and other systems (or files), whereas the current cache structure is tightly coupled to Spark's internal one, so I'm not sure we can directly expose it to third-party developers.
2. If `SPARK_CACHE_SERIALIZER` is specified, does the current approach replace all the existing caching operations with custom ones? Are users unable to select which cache structure (default or custom) is used at runtime?

Btw, could you move your design comment above into the PR description? Since it will appear in the commit log, it is important to write the details of a PR proposal there for better traceability.
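For context, the mechanism under discussion would let a deployment swap in a custom cache serializer through a single configuration entry. A minimal sketch of what that could look like at submit time (the config key `spark.sql.cache.serializer` and the serializer class name below are illustrative assumptions based on the spark-rapids use case, not something confirmed in this thread):

```shell
# Illustrative sketch only: the config key and the class name are assumptions.
# The idea raised in question 2 is that one conf entry would select the
# serializer used for all cached (in-memory relation) data cluster-wide,
# replacing the built-in columnar cache serializer globally rather than
# per-query.
spark-submit \
  --conf spark.sql.cache.serializer=com.example.MyCachedBatchSerializer \
  my-app.jar
```

This framing is what motivates the reviewer's question: because the setting is a startup-time configuration, users would not be able to choose between the default and the custom cache structure at runtime.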
