yaooqinn edited a comment on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-739534847
> I have a question about the semantics. We currently reset to the initial
> state of the first session. You can have many different sessions (e.g. in
> thrift server) with different initial settings, IMO it would be more sane to
> reset to the initial state of the current session. WDYT?
I did some research:
Without calling `clearDefaultSession` and `clearActiveSession` (there was a PR
to make these two APIs internal, but it was reverted for some reason), the
existing APIs for creating a new `SparkSession` cannot lead users into such a
situation.
Let's assume the initial configs we are talking about here are those passed in
when a `SparkSession` instance is instantiated, not those that merely happen to
be set for the first time.
1. With the `SparkSession.newSession()` API, there are **no** parameters for
setting initial configs.
2. With `SparkSession.Builder.getOrCreate`, we only get a reference to the
original session, **without creating a new `SparkSession` instance**.
So do we have a way to actually create a new session with its own initial
configs while there is an existing active one? The answer is NO.
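To make the two points above concrete, here is a minimal toy model (plain Scala, not Spark itself; `ToySession` and `ToyBuilder` are hypothetical names) sketching why neither API can yield a new session with different initial configs while an active one exists:

```scala
// Toy model of the two session-creation paths discussed above.
// This is an illustrative sketch, NOT Spark's actual implementation.
object SessionModel {
  final class ToySession(val initialConfigs: Map[String, String]) {
    // Mirrors SparkSession.newSession(): it takes no config parameters,
    // so a child session cannot be given different initial configs.
    def newSession(): ToySession = new ToySession(initialConfigs)
  }

  final class ToyBuilder {
    private var options = Map.empty[String, String]

    def config(k: String, v: String): ToyBuilder = { options += (k -> v); this }

    // Mirrors Builder.getOrCreate(): if an active session exists, it is
    // returned as-is; the builder options never become the *initial*
    // configs of a fresh instance.
    def getOrCreate(active: Option[ToySession]): ToySession =
      active.getOrElse(new ToySession(options))
  }
}
```

Under this model, calling `getOrCreate` while a session is active hands back the very same instance, which matches point 2.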
This situation only arises when these two APIs are called, and the current
approach actually meets our goal here: we do keep the session configs per
session in such a use case.
The actual problem, revealed in the following case, is that the GLOBAL
`SharedState` is no longer shared after those clear-like APIs are called:
```shell
bin/spark-shell \
--conf spark.sql.warehouse.dir=./warehouse \
--conf spark.sql.globalTempDatabase=mytemp \
--conf spark.sql.custom=abc
```
```scala
scala> org.apache.spark.sql.SparkSession.clearDefaultSession()
scala> org.apache.spark.sql.SparkSession.clearActiveSession()
scala> val nes =
org.apache.spark.sql.SparkSession.builder.config("spark.sql.warehouse.dir",
"w2").config("spark.sql.globalTempDatabase", "m2").config("spark.sql.custom",
"xyz").getOrCreate
20/12/07 00:59:35 WARN SparkContext: Using an existing SparkContext; some
configuration may not take effect.
nes: org.apache.spark.sql.SparkSession =
org.apache.spark.sql.SparkSession@175f1ff5
scala> nes.conf.get("spark.sql.warehouse.dir")
20/12/07 01:00:06 WARN SharedState: Not allowing to set
spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's
options, it should be set statically for cross-session usages
res2: String = w2
scala> nes.conf.get("spark.sql.globalTempDatabase")
res3: String = m2
scala> nes.conf.get("spark.sql.custom")
res4: String = xyz
scala> nes.sql("reset")
res5: org.apache.spark.sql.DataFrame = []
scala> nes.conf.get("spark.sql.globalTempDatabase")
res6: String = m2
scala> nes.conf.get("spark.sql.custom")
res7: String = xyz
```
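The semantics argued for above can be sketched with another small toy model (plain Scala; `SharedState` and `Session` here are illustrative stand-ins, not Spark's classes): `SharedState`-backed settings such as the warehouse dir and global temp database stay global across sessions, while `RESET` restores each session's own initial overrides, not the first session's:

```scala
// Illustrative sketch of the intended semantics, assuming a split between
// a global SharedState and per-session initial configs. Not Spark code.
object ResetModel {
  // Cross-session state: set once, shared by every session.
  final class SharedState(val warehouseDir: String, val globalTempDb: String)

  final class Session(val shared: SharedState,
                      initial: Map[String, String]) {
    private var current = initial

    def set(k: String, v: String): Unit = current += (k -> v)
    def get(k: String): Option[String] = current.get(k)

    // RESET returns to *this* session's initial configs,
    // not to those of the first-created session.
    def reset(): Unit = current = initial
  }
}
```

In the transcript above, `nes.sql("reset")` leaving `spark.sql.custom` at `xyz` matches this model: the reset target is the session's own initial value, while the warehouse dir and global temp database remain a shared, cross-session concern.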
cc @gatorsmile @cloud-fan