gaogaotiantian commented on PR #53783:
URL: https://github.com/apache/spark/pull/53783#issuecomment-3757247829
Okay, I think we still need to make some design decisions on this matter,
because there's no obvious winner.
The reality is that we can't have both of these at the same time:
1. A gRPC request that can trigger at an arbitrary point
2. A synchronous gRPC call
We need to give up at least one.
If we give up 1., we basically disable GC where necessary, making sure it
can't trigger while a gRPC request is being built.
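A minimal sketch of that approach, assuming the client can wrap its request-building section in a helper like the one below (`gc_suppressed` is a hypothetical name, not an existing Spark API). One caveat: `gc.disable()` only suppresses the cyclic collector, so this only helps when the object owning the cache id is kept alive by a reference cycle; a plain refcount drop would still run `__del__` immediately.

```python
import gc
from contextlib import contextmanager

@contextmanager
def gc_suppressed():
    # Hypothetical helper: keep the cyclic GC from running while a gRPC
    # request is being built, so a __del__-driven cache cleanup can't fire
    # in the middle of it.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Only re-enable if GC was on to begin with (supports nesting).
        if was_enabled:
            gc.enable()

# Usage inside the client, around the critical section:
# with gc_suppressed():
#     ... build and send the gRPC request ...
```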
If we give up 2., we have a few options:
1. Check whether we are in a critical function and only clear when we are
not. If we can't clear right away, queue the cache ids and clear them:
   a. on the next call to `_delete_ml_cache`
   b. on the next call to `execute_command` (or `_execute_and_fetch_as_iterator`)
   c. after the current `execute_command` finishes
2. Always queue the request and run it when `execute_command` finishes (see
the sketch after this list).
3. Have a dedicated cleanup thread and run all cache-cleanup work on it
(sketched at the end of this comment).
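A rough sketch of the queueing variants (1.b/1.c/2.). `PendingCacheCleanup`, `schedule_delete`, and `drain` are hypothetical names for illustration; only `_delete_ml_cache` and `execute_command` are real client methods:

```python
import threading
from typing import List

class PendingCacheCleanup:
    # Sketch: __del__ only enqueues the cache id; the actual delete RPC
    # happens later, on the ordinary synchronous path.
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: List[str] = []

    def schedule_delete(self, cache_id: str) -> None:
        # Safe to call from __del__ at an arbitrary point: no gRPC, no I/O.
        with self._lock:
            self._pending.append(cache_id)

    def drain(self) -> List[str]:
        # Called at the start of the next execute_command (option 1.b) or
        # right after the current one finishes (options 1.c/2.).
        with self._lock:
            pending, self._pending = self._pending, []
        return pending
```

The draining call would need a re-entrancy guard, since the cleanup RPC itself goes through `execute_command` and must not try to drain again.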
We don't have much async code in SparkConnectClient, and I believe most of
the code expects `execute_command` to return synchronously (it comes back with
some data). We can't change the framework enough to make it async-friendly.
To be honest, this was a bad design from the beginning: sending out a gRPC
request during `__del__` is a bad idea. We can delay the cleanup to a later
`execute_command`, but that also has side effects - the user might see weird
progress output, or the command might take unexpectedly long.
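For completeness, option 3. avoids those side effects by moving every cleanup RPC onto its own thread. A minimal sketch, assuming the underlying channel can be shared across threads (gRPC Python channels are documented as thread-safe); `CleanupWorker` and `delete_fn` are hypothetical:

```python
import queue
import threading

class CleanupWorker:
    # Sketch of option 3: a daemon thread owns all cache-cleanup RPCs, so
    # they never interleave with the user's synchronous execute_command
    # calls and __del__ never blocks on the network.
    _SENTINEL = object()

    def __init__(self, delete_fn) -> None:
        # delete_fn stands in for the real RPC, e.g. _delete_ml_cache.
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._delete_fn = delete_fn
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def schedule_delete(self, cache_id: str) -> None:
        # Cheap and safe to call from __del__.
        self._queue.put(cache_id)

    def stop(self) -> None:
        self._queue.put(self._SENTINEL)
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            try:
                self._delete_fn(item)
            except Exception:
                # Cleanup is best-effort; never let a failure kill the worker.
                pass
```

The trade-off is introducing a second thread into a client that is otherwise fully synchronous, which is exactly the kind of framework change the paragraph above is wary of.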