hvanhovell commented on PR #40147: URL: https://github.com/apache/spark/pull/40147#issuecomment-1442630247
@xkrogen yes, Spark Connect decouples client and server. Synchronizing jars and REPL-generated classes does not mean we are re-introducing coupling between client and server. All it means is that we will use the server (driver) to orchestrate the execution of user-defined code. This code could be executed within the same engine, but we can also use separate processes or VMs to execute it.

I do feel that adding this kind of functionality to Connect keeps things simple from the client's POV; it does not make a lot of sense to me to do the synchronisation through a different service, because we need to go through the driver anyway.

I do want to call out that this mechanism is not here just for JARs and REPL-generated classes; there are quite a few other use cases that we need it for, e.g.: reading client-local files, synchronizing other kinds of dependencies (for example Python wheels), uploading models, ...

Finally, it is early days for Connect. We currently want to make it easy for folks to try Connect and to move to it. For that we need a REPL experience that is similar to the current one. We want to do the simple thing first, and that is to reuse the current UDF execution code (with some classpath isolation). I am happy to discuss the steps after that; if you are up for that, please reach out to me ([email protected]).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
