Re: [PR] [SPARK-56505][SQL][TESTS] Add SparkSessionBinder to replace SharedSparkSession [spark]

via GitHub Thu, 28 May 2026 21:05:07 -0700


cloud-fan commented on code in PR #56190:
URL: https://github.com/apache/spark/pull/56190#discussion_r3322066761



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -1211,7 +1211,7 @@ object QueryTest extends Assertions {
 
 }
 
-class QueryTestSuite extends test.SharedSparkSession {
+class QueryTestSuite extends QueryTest with SparkSessionBinder {

Review Comment:
   This migration — mixing the bare `sql.SparkSessionBinder` into a concrete 
suite — is the shape I'd push back on. `sql.SparkSessionBinder` binds a classic 
session but exposes `spark` only as the abstract `sql.SparkSession`, so it's 
really internal plumbing, not what a test author should reach for.
   
   The end-state I'd recommend documenting and demonstrating is a binder-free 
base + per-env concrete suites:
   
   ```scala
   abstract class FooSuiteBase extends QueryTest {          // no binder; spark 
abstract
     test("shared") { checkAnswer(sql("SELECT 1"), Row(1)) }
   }
   class FooSuite extends FooSuiteBase with classic.SparkSessionBinder {
     test("classic only") { ... }
   }
   class FooConnectSuite extends FooSuiteBase
     with connect.SparkSessionBinder with connect.QueryTest {
     test("connect only") { ... }
   }
   ```
   
   `QueryTest` already mixes in `SparkSessionProvider` (via `SQLTestData`) and 
leaves `spark` abstract, so it works as the env-agnostic base directly. 
Concretely: (1) steer the migration and the `@deprecated` message at 
`classic.SparkSessionBinder` / `connect.SparkSessionBinder` + this base 
pattern, not the bare binder; (2) `QueryTestWithConnectSuite` currently 
demonstrates the retrofit path (extending an already-classic-bound 
`QueryTestSuite` and overriding the binding) — a binder-free base would 
demonstrate the cleaner pattern and double as the template authors copy.



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/QueryTest.scala:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import org.apache.spark.{sql => sqlApi}
+
+/**
+ * Extends [[sqlApi.QueryTest]] for use with Connect sessions.
+ *
+ * Overrides [[checkAnswer]] to avoid classic-only code paths (e.g. 
`queryExecution`,
+ * `logicalPlan`, `materializedRdd`) that are not available on Connect 
DataFrames.
+ */
+trait QueryTest extends sqlApi.QueryTest with SparkSessionProvider {
+
+  override protected def checkAnswer(

Review Comment:
   This overrides only the `checkAnswer(df, Seq[Row])` variant, which is enough 
for `QueryTestSuite`. But the stated goal is re-running arbitrary `sql/core` 
suites over Connect, and the other `QueryTest` helpers (other `checkAnswer` 
overloads, `checkDataset`, ...) still reach classic-only paths like 
`queryExecution`/`logicalPlan`. Worth a line in the trait doc noting that 
broader reuse will need more overrides.



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkSessionBinder.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import java.util.UUID
+
+import scala.concurrent.duration._
+
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.DebugFilesystem
+import org.apache.spark.sql
+import org.apache.spark.sql.classic
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.connect.config.Connect
+import org.apache.spark.sql.connect.service.SparkConnectService
+
+/**
+ * A test trait that provides a Connect [[SparkSession]] backed by an 
in-process gRPC server.
+ * Extends [[sql.SparkSessionBinder sql.SparkSessionBinder]] (which creates a
+ * [[classic.SparkSession classic.SparkSession]] and SparkContext), then 
layers a Connect client
+ * session on top by starting the gRPC service in-process.
+ *
+ * Mix in this trait to exercise existing sql/core test suites through the 
Connect path:
+ * {{{
+ * class FooWithConnectSuite
+ *   extends FooSuite
+ *   with connect.SparkSessionBinder
+ *   with connect.QueryTest
+ * }}}
+ */
+trait SparkSessionBinder extends sql.SparkSessionBinder {
+
+  private val serverPort: Int =
+    ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+  @volatile private var _connectSpark: SparkSession = _
+
+  protected override def spark: SparkSession = _connectSpark
+
+  /** The underlying classic session used by the in-process server. */
+  private def classicSpark: classic.SparkSession = 
super.spark.asInstanceOf[classic.SparkSession]
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    withSparkEnvConfs((Connect.CONNECT_GRPC_BINDING_PORT.key, 
serverPort.toString)) {
+      SparkConnectService.start(classicSpark.sparkContext)
+    }
+    val client = SparkConnectClient
+      .builder()
+      .port(serverPort)
+      .sessionId(UUID.randomUUID().toString)
+      .userId("test")
+      .build()
+    _connectSpark = SparkSession
+      .builder()
+      .client(client)
+      .create()
+  }
+
+  override def afterAll(): Unit = {
+    try {
+      if (_connectSpark != null) {
+        _connectSpark.close()
+        _connectSpark = null
+      }
+      SparkConnectService.stop()
+    } finally {
+      super.afterAll()
+    }
+  }
+
+  // The base SharedSparkSessionBase.afterEach calls spark.sharedState which 
is not supported

Review Comment:
   This comment is inaccurate after the refactor: `connect.SparkSessionBinder` 
extends `sql.SparkSessionBinder` directly (not `SharedSparkSessionBase`), and 
that parent's `afterEach` clears the cache via the private `_spark` — the 
classic session, which is exactly what's used on Connect (`createSparkSession` 
isn't overridden). So the parent's `afterEach` already works here and this 
override is redundant. If you do keep it, note that skipping 
`super.afterEach()` drops the `BeforeAndAfterEach` chain. Simplest fix is to 
remove the override entirely.



##########
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala:
##########
@@ -19,41 +19,12 @@ package org.apache.spark.sql.test
 
 import scala.concurrent.duration._
 
-import org.scalatest.{BeforeAndAfterEach, Suite}
-import org.scalatest.concurrent.Eventually
-
-import org.apache.spark.{DebugFilesystem, SparkConf}
-import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
-import org.apache.spark.sql.{classic, QueryTest, QueryTestBase, SparkSession, 
SparkSessionProvider, SQLContext}
-import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
-import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
-import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql
+import org.apache.spark.sql.{classic, QueryTest, QueryTestBase}
 
+@deprecated("Use SparkSessionBinder and QueryTest instead")

Review Comment:
   `@deprecated` takes a `since` version as its second argument; adding it 
documents when the deprecation started and matches the convention elsewhere in 
the codebase. Same on line 59.



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/QueryTestWithConnectSuite.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import org.apache.spark.sql.QueryTestSuite
+
+/**
+ * Runs [[QueryTestSuite]] tests through a Connect session.
+ *
+ * This validates the `FooSuite with connect.SharedSparkSession` pattern: the 
existing

Review Comment:
   There's no `connect.SharedSparkSession` trait; the pattern this suite 
actually uses (and that the sibling `connect/SparkSessionBinder.scala` doc 
shows) is `connect.SparkSessionBinder with connect.QueryTest`.
   ```suggestion
    * This validates the `FooSuite with connect.SparkSessionBinder with 
connect.QueryTest` pattern: the existing
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBinder.scala:
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.concurrent.duration._
+
+import org.scalatest.{BeforeAndAfterEach, Suite}
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.{DebugFilesystem, SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.TestSparkSession
+
+trait SparkSessionBinder
+  extends SparkFunSuite
+  with SparkSessionProvider
+  with BeforeAndAfterEach
+  with Eventually {
+
+  protected def sparkConf = {
+    val conf = new SparkConf()
+      .set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
+      .set(UNSAFE_EXCEPTION_ON_MEMORY_LEAK, true)
+      .set(SQLConf.CODEGEN_FALLBACK.key, "false")
+      .set(SQLConf.CODEGEN_FACTORY_MODE.key, 
CodegenObjectFactoryMode.CODEGEN_ONLY.toString)
+      // Disable ConvertToLocalRelation for better test coverage. Test cases 
built on
+      // LocalRelation will exercise the optimization rules better by 
disabling it as
+      // this rule may potentially block testing of other optimization rules 
such as
+      // ConstantPropagation etc.
+      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, 
ConvertToLocalRelation.ruleName)
+    conf.set(
+      StaticSQLConf.WAREHOUSE_PATH,
+      conf.get(StaticSQLConf.WAREHOUSE_PATH) + "/" + getClass.getCanonicalName)
+    conf.set(StaticSQLConf.LOAD_SESSION_EXTENSIONS_FROM_CLASSPATH, false)
+    conf.set(StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD,
+      sys.env.getOrElse("SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD",
+        
StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+    conf.set(StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD,
+      
sys.env.getOrElse("SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD",
+        
StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+  }
+
+  /**
+   * The [[TestSparkSession]] to use for all tests in this suite.
+   *
+   * By default, the underlying [[org.apache.spark.SparkContext]] will be run 
in local
+   * mode with the default test configurations.
+   */
+  private var _spark: classic.SparkSession = null
+
+  protected override def spark: SparkSession = _spark
+
+  /**
+   * The [[TestSQLContext]] to use for all tests in this suite.
+   */
+  protected implicit def sqlContext: SQLContext = _spark.sqlContext
+
+  protected def createSparkSession: classic.SparkSession = {
+    classic.SparkSession.cleanupAnyExistingSession()
+    new TestSparkSession(sparkConf)
+  }
+
+  protected def sqlConf: SQLConf = _spark.sessionState.conf
+
+  /**
+   * Initialize the [[TestSparkSession]].  Generally, this is just called from
+   * beforeAll; however, in test using styles other than FunSuite, there is
+   * often code that relies on the session between test group constructs and
+   * the actual tests, which may need this session.  It is purely a semantic
+   * difference, but semantically, it makes more sense to call
+   * 'initializeSession' between a 'describe' and an 'it' call than it does to
+   * call 'beforeAll'.
+   */
+  protected def initializeSession(): Unit = {
+    if (_spark == null) {
+      _spark = createSparkSession
+    }
+  }
+
+  /**
+   * Suites extending [[SharedSparkSession]] are sharing resources (e.g. 
SparkSession) in their

Review Comment:
   This doc moved out of `SharedSparkSession`; the snapshot-before-init logic 
now lives in this trait, so referring to `SharedSparkSession` is stale.
   ```suggestion
      * Suites extending this trait are sharing resources (e.g. SparkSession) 
in their
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56505][SQL][TESTS] Add SparkSessionBinder to replace SharedSparkSession [spark]

Reply via email to