maropu commented on pull request #29067: URL: https://github.com/apache/spark/pull/29067#issuecomment-657188271
Hi, @revans2, thanks for the work. I have two questions:

> The use case we have right now is in connection with https://github.com/NVIDIA/spark-rapids where we would like to provide compression that is better suited for both the CPU and the GPU.

1. Could you not use the Data Source V2 interface for your purpose, or extend that interface to cover it? I ask because the Data Source V2 interface is intended for interconnection between Spark and other systems (or files), whereas the current cache structure is tightly coupled to Spark's internal one, so I'm not sure we can directly expose it to third-party developers.
2. If `SPARK_CACHE_SERIALIZER` is specified, does the current approach replace all the existing caching operations with custom ones? Are users unable to select which cache structure (default or custom) is used at runtime?

Btw, could you move your design comment above into the PR description? Since it will appear in the commit log, it is important to write the details of a PR proposal there for better traceability.
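For context, the mechanism under discussion would let a deployment swap in a custom cache serializer through a single configuration entry. A minimal sketch of what that could look like at submit time (the config key `spark.sql.cache.serializer` and the serializer class name below are illustrative assumptions based on the spark-rapids use case, not something confirmed in this thread):

```shell
# Illustrative sketch only: the config key and the class name are assumptions.
# The idea raised in question 2 is that one conf entry would select the
# serializer used for all cached (in-memory relation) data cluster-wide,
# replacing the built-in columnar cache serializer globally rather than
# per-query.
spark-submit \
  --conf spark.sql.cache.serializer=com.example.MyCachedBatchSerializer \
  my-app.jar
```

This framing is what motivates the reviewer's question: because the setting is a startup-time configuration, users would not be able to choose between the default and the custom cache structure at runtime.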
