[GitHub] [spark] rangadi opened a new pull request, #40937: [SPARK-42940] Improve session management for streaming queries

via GitHub Mon, 24 Apr 2023 23:12:14 -0700


rangadi opened a new pull request, #40937:
URL: https://github.com/apache/spark/pull/40937


   ### What changes were proposed in this pull request?
   This fixes a couple of important issues related to session management for 
streaming queries.
   
   1. The session mapping should be maintained at the connect server as long as the 
streaming query is active, even if there are no accesses from the client side. 
Currently the session mapping is dropped after 1 hour of inactivity. 
   2. When a streaming query is stopped, the Spark session drops its reference to 
the streaming query object. That implies it cannot be accessed by a remote 
spark-connect client. It is a common usage pattern for users to access a 
streaming query after it is stopped (e.g. to check its metrics, any 
exception if it failed, etc). 
      - This is not a problem in legacy mode since the user code in the REPL 
keeps the reference. This is no longer the case in Spark-Connect. 
   
   *Solution*: This PR adds `SparkConnectStreamingQueryCache` that does the 
following:
     * Each new streaming query is registered with this cache.
     * It runs a periodic task that checks the status of these queries and 
polls the session mapping in connect-server so that the session stays alive.
     * When a query is stopped, it is cached for 1 more hour so that it can be 
accessed from the remote client. 
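
The three behaviours listed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in this PR: the class name `StreamingQueryCacheSketch`, its methods, the `keep_session_alive` callback, the injected `clock`, and the `is_active` attribute on the query object are all hypothetical names chosen for illustration. Thread safety and the scheduling of the periodic task are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

# "Cached for 1 more hour" after the query stops, per the description above.
STOPPED_QUERY_TTL_SEC = 3600.0


@dataclass
class _CachedQuery:
    session_id: str
    query: Any  # assumed to expose a boolean `is_active` attribute
    expires_at: Optional[float] = None  # set when the query is first seen stopped


class StreamingQueryCacheSketch:
    """Hypothetical stand-in for the cache described above; not the PR's code."""

    def __init__(self, keep_session_alive: Callable[[str], None],
                 clock: Callable[[], float]) -> None:
        self._keep_session_alive = keep_session_alive  # touches the session mapping
        self._clock = clock                            # returns "now" in seconds
        self._entries: Dict[Tuple[str, str], _CachedQuery] = {}

    def register(self, session_id: str, query_id: str, query: Any) -> None:
        # Each new streaming query is registered with the cache.
        self._entries[(session_id, query_id)] = _CachedQuery(session_id, query)

    def get(self, session_id: str, query_id: str) -> Optional[Any]:
        # Lookup still succeeds for a stopped query until its entry expires.
        entry = self._entries.get((session_id, query_id))
        return entry.query if entry is not None else None

    def run_maintenance(self) -> None:
        # One pass of the periodic task.
        now = self._clock()
        for key, entry in list(self._entries.items()):
            if entry.query.is_active:
                # Active query: keep its session from timing out.
                self._keep_session_alive(entry.session_id)
            elif entry.expires_at is None:
                # First time the query is seen stopped: start the 1 hour window.
                entry.expires_at = now + STOPPED_QUERY_TTL_SEC
            elif now >= entry.expires_at:
                # The window has passed: drop the reference.
                del self._entries[key]
```

In this sketch the clock and the keep-alive callback are injected so that the expiry behaviour can be exercised in a unit test without waiting an hour or running a real connect server.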
   
   ### Why are the changes needed?
     - Explained in the above description.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   - Unit tests
   - Manual testing
   


