Github user vijoshi commented on a diff in the pull request:
https://github.com/apache/spark/pull/16119#discussion_r93594802
--- Diff: python/pyspark/sql/context.py ---
@@ -72,8 +72,13 @@ def __init__(self, sparkContext, sparkSession=None, jsqlContext=None):
         self._sc = sparkContext
         self._jsc = self._sc._jsc
         self._jvm = self._sc._jvm
+
         if sparkSession is None:
-            sparkSession = SparkSession(sparkContext)
+            if sparkContext is SparkContext._active_spark_context:
+                sparkSession = SparkSession.builder.getOrCreate()
--- End diff --
1. The `SQLContext.__init__` signature suggested that any sparkSession object could be injected, and the `SparkSession.__init__` signature suggested that any sparkContext could be injected. If that were not the intent, I would have expected them all to invoke `getOrCreate()` to obtain instances of each other.
2. I found a few hits around the `spark.driver.allowMultipleContexts` property. Although it is not a Python-side property, it suggested there may be cases (testing, for example) where someone wants the ability to bypass the one-SparkContext limitation.

These two combined, I was not 100% sure someone couldn't pass in different sparkContext objects, or sparkSession objects tied to different sparkContexts, when constructing a SQLContext.

I will admit I am not familiar with the Python side of things, so I hoped this PR review would be the best way to get this clarified :).

If everyone agrees that we needn't worry about which sparkSession is passed in when constructing SQLContext, then I can get rid of the `if sparkContext is SparkContext._active_spark_context:` check.
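To make the intent of the check concrete, here is a minimal, self-contained sketch using stub classes (not pyspark itself; all class bodies here are simplified stand-ins for illustration only): reuse the shared session via `getOrCreate()` only when the caller's sparkContext is the process-wide active one, otherwise build a dedicated session around the foreign context.

```python
# Illustrative stubs mimicking the singleton behavior under discussion.
# These are NOT the real pyspark classes.

class SparkContext:
    _active_spark_context = None  # process-wide singleton slot

    def __init__(self):
        SparkContext._active_spark_context = self


class SparkSession:
    _shared = None  # session reused by getOrCreate()

    def __init__(self, sparkContext):
        self.sparkContext = sparkContext

    class _Builder:
        def getOrCreate(self):
            # Reuse (or lazily create) the session tied to the
            # active context.
            if SparkSession._shared is None:
                SparkSession._shared = SparkSession(
                    SparkContext._active_spark_context)
            return SparkSession._shared

    builder = _Builder()


class SQLContext:
    def __init__(self, sparkContext, sparkSession=None):
        self._sc = sparkContext
        if sparkSession is None:
            if sparkContext is SparkContext._active_spark_context:
                # Safe to share the singleton session.
                sparkSession = SparkSession.builder.getOrCreate()
            else:
                # A "foreign" context (e.g. tests using
                # spark.driver.allowMultipleContexts): wrap it in
                # its own session rather than sharing.
                sparkSession = SparkSession(sparkContext)
        self.sparkSession = sparkSession
```

With this guard, two `SQLContext` objects built on the active context share one session, while a `SQLContext` built on a different context gets a session tied to that context.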