jonmio commented on code in PR #53820:
URL: https://github.com/apache/spark/pull/53820#discussion_r2725883000
##########
python/pyspark/sql/tests/test_session.py:
##########
@@ -461,6 +462,114 @@ def test_invalid_create(self):
messageParameters={},
)
+class SparkSessionBuilderCreateTests(unittest.TestCase, PySparkErrorTestUtils):
Review Comment:
Thanks for the detailed review! I've added some tests to improve coverage,
but also wanted to address a few of your high-level concerns around Connect
mode compatibility, error handling and failure modes, and behavior with
multiple sessions. Importantly, this change introduces parity with Connect
(create already exists there, and we are leaning on existing Connect test
coverage to ensure we have not introduced new regressions), multiple sessions
attached to the same context are already possible via the newSession API, and
we are reusing the existing primitives used by getOrCreate and newSession
(e.g. session construction helpers and locking). Here are my thoughts on some
of the more specific comments:
> Connect mode fallback & invalid remote URLs
This PR doesn't touch Connect mode at all; we rely on existing Connect
test coverage.
> Are there race conditions between create() and getOrCreate()?
Both methods use the same self._lock (line 606) and rely on
SparkContext.getOrCreate() which has its own locking via SparkContext._lock. No
new race conditions introduced here - just reusing the existing thread-safe
mechanisms.
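For reference, the pattern both methods share is a lock-guarded check-then-create. Here is a minimal pure-Python sketch of that pattern — these are toy names, not PySpark's actual `self._lock` / `SparkContext._lock` code:

```python
import threading

class TinySessionRegistry:
    """Toy model of the lock-guarded session creation that both
    getOrCreate() and create() rely on. Illustrative only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = None
        self.creations = 0

    def get_or_create(self):
        # Both code paths take the same lock, so concurrent callers
        # serialize on session construction and see one shared session.
        with self._lock:
            if self._active is None:
                self._active = object()
                self.creations += 1
            return self._active

registry = TinySessionRegistry()
sessions = []
threads = [
    threading.Thread(target=lambda: sessions.append(registry.get_or_create()))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All eight callers observe the same object; only one construction ran.
```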
> Duplicate config keys
Standard SparkConf behavior - last write wins. Same as getOrCreate(),
nothing specific to create().
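To make "last write wins" concrete, here is a sketch of the semantics (a toy stand-in, not SparkConf itself):

```python
# Toy stand-in for SparkConf's set() semantics: setting the same
# key twice keeps only the most recent value.
class TinyConf:
    def __init__(self):
        self._opts = {}

    def set(self, key, value):
        self._opts[key] = value  # last write wins
        return self

    def get(self, key):
        return self._opts[key]

conf = (TinyConf()
        .set("spark.executor.memory", "2g")
        .set("spark.executor.memory", "4g"))
# The later value silently replaces the earlier one.
```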
> Session stop/cleanup:
Stopping a session stops the parent SparkContext. This stops all sessions
attached to the SparkContext, but this is not new behavior in Classic.
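The stop propagation described above can be sketched like this (hypothetical toy classes, not PySpark's API):

```python
class TinyContext:
    """Toy shared context; all sessions attach to one instance."""
    def __init__(self):
        self.stopped = False

class TinySession:
    def __init__(self, ctx):
        self.ctx = ctx

    def stop(self):
        # Stopping any one session stops the shared context,
        # which effectively stops every attached session.
        self.ctx.stopped = True

    @property
    def active(self):
        return not self.ctx.stopped

ctx = TinyContext()
s1, s2 = TinySession(ctx), TinySession(ctx)
s1.stop()
# Both sessions are now inactive because they share one context.
```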
> Conflicting SparkContext configs
SparkContext.getOrCreate() handles this at line 615. When a SparkContext
already exists, new configs apply to the session only, not the shared context.
Same behavior as getOrCreate() - there's even a comment about this at line
558: "Do not update SparkConf for existing SparkContext, as it's shared by all
sessions."
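The "session-only config, shared context untouched" behavior can be modeled as an overlay — a sketch using ChainMap, not PySpark's actual mechanism:

```python
from collections import ChainMap

# Toy model: when a SparkContext already exists, per-session configs
# overlay the shared context conf without mutating it.
shared_context_conf = {"spark.master": "local[2]", "spark.app.name": "shared"}

def make_session_conf(session_overrides):
    # Session reads consult overrides first, then the shared conf;
    # the shared dict itself is never written.
    return ChainMap(session_overrides, shared_context_conf)

sess_conf = make_session_conf({
    "spark.sql.shuffle.partitions": "8",
    "spark.app.name": "mine",
})
```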
Let me know if you would like specific tests for any of these behaviors,
though; happy to add them.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]