hvanhovell commented on PR #40147:
URL: https://github.com/apache/spark/pull/40147#issuecomment-1442630247

   @xkrogen yes, Spark Connect decouples client and server. The fact that we
synchronize jars and REPL-generated classes does not mean we are
re-introducing coupling between client and server. All it means is that we
will be using the server (driver) to orchestrate the execution of
user-defined code. This code could be executed within the same engine, but we
can also use separate processes or VMs to execute it. I do feel that adding
this kind of functionality to Connect keeps things simple from the client's
POV. It does not make a lot of sense to me to do the synchronisation through a
different service, because we need to go through the driver anyway.
   
   I do want to call out that this mechanism is not just here for JARs and
REPL-generated classes. There are also quite a few other use cases that we
need it for, e.g. reading client-local files, synchronizing other kinds of
dependencies (for example Python wheels), uploading models, ...
   
   Finally, it is early days for Connect. We currently want to make it easy for
folks to try Connect and to move to it, and for that we need a REPL experience
that is similar to the current one. We want to do the simple thing first,
which is to reuse the current UDF execution code (with some classpath
isolation). I am happy to discuss the steps after that. If you are up for
that, please reach out to me ([email protected]).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.