yaooqinn edited a comment on pull request #30045:
URL: https://github.com/apache/spark/pull/30045#issuecomment-739534847


   > I have a question about the semantics. We currently reset to the initial 
state of the first session. You can have many different sessions (e.g. in 
thrift server) with different initial settings, IMO it would be more sane to 
reset to the initial state of the current session. WDYT?
   
   I did some research: 
   
   Without calling `clearDefaultSession` and `clearActiveSession` (there was a PR to make these two APIs internal, but it was reverted for some reason), the existing APIs for creating a new `SparkSession` cannot lead users into such a situation.
   
   Let's assume the initial configs we are talking about here are those passed when a `SparkSession` instance is instantiated, not those that are merely set for the first time afterwards:
   1. With the `SparkSession.newSession()` API, there are **no** parameters provided to set initial configs.
   2. With `SparkSession.Builder.getOrCreate`, we just reference the original session **without creating a new SparkSession instance**.
   
   So do we have a way to actually create a new session with initial configs 
when there is an existing active one? The answer is NO.
   
   This situation only happens when those two clear APIs are called, and the current approach actually meets our goal here: we do keep the session configs per session in such a use case.
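   To make the argument above concrete, here is a toy sketch in plain Scala (not Spark's actual classes; `SharedState`, `Session`, and their members are hypothetical names for illustration) of sessions that keep their own initial configs on top of a shared global state, with `reset` restoring the current session's initial configs:

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-in for the cross-session shared global state.
   class SharedState(val warehouseDir: String)

   // A session remembers the initial configs it was created with.
   class Session(val shared: SharedState, initial: Map[String, String]) {
     private val conf = mutable.Map[String, String]() ++= initial

     def set(key: String, value: String): Unit = conf(key) = value
     def get(key: String): Option[String] = conf.get(key)

     // RESET drops runtime changes but keeps this session's initial configs.
     def reset(): Unit = { conf.clear(); conf ++= initial }

     // Like SparkSession.newSession(): no parameter to set initial configs;
     // the new session reuses the same SharedState.
     def newSession(): Session = new Session(shared, conf.toMap)
   }
   ```

   Under this model, `reset` on any session restores that session's own initial configs rather than the first session's, which is the per-session semantics discussed above.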
   
   The actual problem, revealed in the following case, is that the GLOBAL `SharedState` is no longer shared after those clear-like APIs are called:
   
   ```scala
   scala> org.apache.spark.sql.SparkSession.clearDefaultSession()
   
   scala> org.apache.spark.sql.SparkSession.clearActiveSession()
   
   scala> val nes = 
org.apache.spark.sql.SparkSession.builder.config("spark.sql.warehouse.dir", 
"w2").config("spark.sql.globalTempDatabase", "m2").config("spark.sql.custom", 
"xyz").getOrCreate
   20/12/07 00:59:35 WARN SparkContext: Using an existing SparkContext; some 
configuration may not take effect.
   nes: org.apache.spark.sql.SparkSession = 
org.apache.spark.sql.SparkSession@175f1ff5
   
   scala> nes.conf.get("spark.sql.warehouse.dir")
   20/12/07 01:00:06 WARN SharedState: Not allowing to set 
spark.sql.warehouse.dir or hive.metastore.warehouse.dir in SparkSession's 
options, it should be set statically for cross-session usages
   res2: String = w2
   
   scala> nes.conf.get("spark.sql.globalTempDatabase")
   res3: String = m2
   
   scala> nes.conf.get("spark.sql.custom")
   res4: String = xyz
   
   scala> nes.sql("reset")
   res5: org.apache.spark.sql.DataFrame = []
   
   scala> nes.conf.get("spark.sql.globalTempDatabase")
   res6: String = m2
   
   scala> nes.conf.get("spark.sql.custom")
   res7: String = xyz
   ```
   
   cc @gatorsmile @cloud-fan 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


