jonmio commented on code in PR #53820:
URL: https://github.com/apache/spark/pull/53820#discussion_r2725883000


##########
python/pyspark/sql/tests/test_session.py:
##########
@@ -461,6 +462,114 @@ def test_invalid_create(self):
             messageParameters={},
         )
 
+class SparkSessionBuilderCreateTests(unittest.TestCase, PySparkErrorTestUtils):

Review Comment:
   Thanks for the detailed review! I've added some tests to improve coverage, but I also wanted to address a few of your high-level concerns around Connect mode compatibility, error handling/failure modes, and behavior with multiple sessions. Importantly, this change introduces parity with Connect: `create` already exists there, and we are leaning on the existing Connect test coverage to ensure we have not introduced new regressions. Support for multiple sessions attached to the same context already exists via the `newSession` API, and we are reusing the existing primitives used by `getOrCreate` and `newSession` (e.g. session construction helpers and locking). Here are my thoughts on some of the more specific comments:

   > Connect mode fallback & invalid remote URLs

   This PR doesn't touch Connect mode at all; we rely on the existing Connect test coverage there.

   > Are there race conditions between create() and getOrCreate()?

   Both methods use the same `self._lock` (line 606) and rely on `SparkContext.getOrCreate()`, which has its own locking via `SparkContext._lock`. No new race conditions are introduced here; we are just reusing the existing thread-safe mechanisms.
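
   To make that concrete, here is a rough sketch of a test method that could go in `SparkSessionBuilderCreateTests` (the test name is hypothetical, and it assumes the new `create()` attaches to an existing shared context the way `newSession()` does):

   ```python
   import threading

   from pyspark.sql import SparkSession

   def test_create_and_get_or_create_race(self):
       # Half the callers use the new create(), half use getOrCreate().
       # Both paths take the same builder lock and funnel through
       # SparkContext.getOrCreate(), so every session should land on a
       # single shared SparkContext.
       sessions = []

       def build(use_create):
           builder = SparkSession.builder.master("local[1]")
           sessions.append(builder.create() if use_create else builder.getOrCreate())

       threads = [threading.Thread(target=build, args=(i % 2 == 0,)) for i in range(4)]
       for t in threads:
           t.start()
       for t in threads:
           t.join()

       # Every caller ends up on the same underlying context.
       self.assertEqual(len({id(s.sparkContext) for s in sessions}), 1)
       sessions[0].stop()
   ```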
                                                                                
  
   > Duplicate config keys

   Standard SparkConf behavior: last write wins. This is the same as getOrCreate(), nothing specific to create().
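
   As a minimal sketch of that semantics (the config key here is just a placeholder):

   ```python
   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder
       .master("local[1]")
       .config("spark.myapp.flag", "a")
       .config("spark.myapp.flag", "b")  # duplicate key: last write wins
       .getOrCreate()
   )
   assert spark.conf.get("spark.myapp.flag") == "b"
   spark.stop()
   ```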
    
                                                                                
  
   > Session stop/cleanup:

   Stopping a session stops the parent SparkContext, which stops all sessions attached to that SparkContext, but this is not new behavior in Classic.
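
   A sketch of that existing Classic behavior:

   ```python
   from pyspark.sql import SparkSession

   s1 = SparkSession.builder.master("local[1]").getOrCreate()
   s2 = s1.newSession()  # second session on the same context
   assert s1.sparkContext is s2.sparkContext

   s1.stop()  # stops the shared SparkContext...
   # ...so s2 can no longer run jobs either; any action on s2 now fails
   # until a fresh context is created.
   ```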
                                                                                
  
   > Conflicting SparkContext configs

   SparkContext.getOrCreate() handles this at line 615. When a SparkContext already exists, new configs apply to the session only, not to the shared context. This is the same behavior as getOrCreate(); there's even a comment about it at line 558: "Do not update SparkConf for existing SparkContext, as it's shared by all sessions."
                                                                                
  
   Let me know if you would like specific tests for any of these behaviors, though; I'm happy to add them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

