rangadi opened a new pull request, #41580: URL: https://github.com/apache/spark/pull/41580
[This is a continuation of #41146, to change the author of the PR. Retains the description.]

### What changes were proposed in this pull request?

This change adds a new Spark Connect relation type, `CachedRemoteRelation`, which represents a DataFrame that has been cached on the server side.

On the server side, each session has a map that caches DataFrames. A DataFrame is removed from the cache when the corresponding session expires. (The caller can also evict the DataFrame from the cache earlier, depending on the logic.)

On the client side, a new relation type and a new function are added. The new function creates a DataFrame reference given a key. The key is the id of a cached DataFrame, which is usually passed from the server to the client. When transforming the DataFrame reference, the server looks up the actual DataFrame in the cache and substitutes it for the reference.

One use case of this function will be streaming foreachBatch(). The server needs to call the user function for every batch, and that function takes a DataFrame as an argument. With the new function, we can cache the DataFrame on the server and pass the id back to the client, which can then create the DataFrame reference.

### Why are the changes needed?

This change is needed to support streaming foreachBatch() in Spark Connect.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Scala unit test. Manual test. (More end-to-end tests will be added when foreachBatch() is supported. Currently there is no way to add a DataFrame to the server cache using Python.)
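
For readers who want a concrete picture of the caching flow described above, here is a minimal, library-free sketch in Python. It is not the code in this PR: `SessionHolder`, `cache_dataframe`, `evict`, `resolve`, and `expire` are hypothetical names chosen for illustration, the UUID key format is an assumption, and a plain list stands in for a real DataFrame. Only the name `CachedRemoteRelation` comes from the description.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CachedRemoteRelation:
    """Client-side reference: carries only the id of a server-cached DataFrame."""
    relation_id: str


@dataclass
class SessionHolder:
    """Hypothetical per-session server state holding the DataFrame cache."""
    dataframe_cache: Dict[str, Any] = field(default_factory=dict)

    def cache_dataframe(self, df: Any) -> str:
        # Assumption: the key is a random UUID string; the real key format may differ.
        key = str(uuid.uuid4())
        self.dataframe_cache[key] = df
        return key

    def evict(self, key: str) -> None:
        # Early eviction by the caller; a missing key is ignored.
        self.dataframe_cache.pop(key, None)

    def resolve(self, relation: CachedRemoteRelation) -> Any:
        # Replace the reference with the actual cached DataFrame.
        if relation.relation_id not in self.dataframe_cache:
            raise KeyError(f"No DataFrame cached under id {relation.relation_id}")
        return self.dataframe_cache[relation.relation_id]

    def expire(self) -> None:
        # Session expiry drops every cached DataFrame.
        self.dataframe_cache.clear()


# Server caches a batch and sends only the id to the client.
session = SessionHolder()
batch_df = ["row1", "row2"]  # stand-in for a real DataFrame
df_id = session.cache_dataframe(batch_df)

# Client builds a reference from the id; server resolves it back.
ref = CachedRemoteRelation(df_id)
resolved = session.resolve(ref)
```

The point of the design is that only the id travels between server and client; the DataFrame itself never leaves the server, and the reference stops resolving once the entry is evicted or the session expires.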
