Re: [PR] [SPARK-56505][SQL][TESTS] Add SparkSessionBinder to replace SharedSparkSession [spark]

via GitHub Wed, 10 Jun 2026 15:45:10 -0700


cloud-fan commented on code in PR #56190:
URL: https://github.com/apache/spark/pull/56190#discussion_r3392033486



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -156,7 +157,7 @@ trait QueryTestBase
    * @param df the [[DataFrame]] to be executed
    * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
    */
-  protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit 
= {
+  override protected def checkAnswer(df: => DataFrame, expectedAnswer: 
Seq[Row]): Unit = {

Review Comment:
   This override is why the whole connect DSv2 suite fails with 
`UNSUPPORTED_CONNECT_FEATURE.DATASET_QUERY_EXECUTION`. It wins the 
linearization over the connect-compatible `CheckAnswerHelper.checkAnswer`, and 
it routes every call through `assertEmptyMissingInput` (`df.queryExecution`) 
and `QueryTest.checkAnswer` (`df.materializedRdd`) — both classic-only. Any 
suite with `QueryTestBase` in its linearization gets this path, including the 
DSv2 traits (`QueryTest with SessionQueryTestBase`), so mixing in 
`connect.SparkSessionBinder` can't make an existing suite Connect-runnable. 
This confirms my earlier comment on the since-deleted 
`connect/QueryTest.scala`: the helpers reaching classic-only APIs are the real 
blocker.
   
   I'd fix it here rather than in each suite: delegate to `super.checkAnswer` 
(the helper) and apply the classic-only extras — `assertEmptyMissingInput` and 
the `checkToRDD` pass — only when `df` is a `classic.DataFrame`.



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/DataSourceV2DataFrameConnectSuite.scala:
##########
@@ -34,7 +34,7 @@ import 
org.apache.spark.sql.connector.catalog.{CachingInMemoryTableCatalog, InMe
  * this class only provides the Connect-specific session, catalog access, and 
result comparison.
  */
 class DataSourceV2DataFrameConnectSuite
-    extends SparkConnectServerTest
+    extends SessionQueryTest

Review Comment:
   Switching from `SparkConnectServerTest` to a suite-wide session drops the 
per-test isolation the old base provided (`afterEach` -> 
`invalidateAllSessions()`), and that's what the `TABLE_OR_VIEW_ALREADY_EXISTS` 
CI failures are: server-side catalog state now leaks across tests. The classic 
peer compensates with `after { cachingcat.clearCache(); catalogManager.reset() 
}` (`DataSourceV2DataFrameSuite.scala:86-89`); this suite has no counterpart. 
I'd recreate the client session per test and invalidate server sessions in 
`afterEach` — that restores the isolation the caching-catalog tests are written 
against; failing that, add the same per-test catalog cleanup the classic suite 
uses.



##########
sql/core/src/test/scala/org/apache/spark/sql/ExampleSuite.scala:
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+class ExampleSuite extends SessionQueryTest {
+
+  test("replaceWhere with partitioned table preserves all partitions") {
+    withTable("foo") {
+      val data = Seq(
+        (1, "Alice", 29),
+        (2, "Bob", 35),
+        (3, "Charlie", 23),
+      )
+
+      val df = spark.createDataFrame(data).toDF("id", "name", "age")
+
+      df.write.partitionBy("age").format("delta").saveAsTable("foo")

Review Comment:
   `delta` isn't a data source in Apache Spark — this suite fails every CI run 
with `DATA_SOURCE_NOT_FOUND` (and `replaceWhere` is Delta-specific too). This 
looks like a local experiment that got committed. If you want a template suite 
demonstrating the pattern, rewrite it against a built-in source or the 
in-memory v2 catalog with assertions that hold here; otherwise drop both 
`ExampleSuite` and `ExampleConnectSuite`.



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -823,21 +789,39 @@ trait QueryTest extends SparkFunSuite with QueryTestBase {
   }
 }
 
-object QueryTest extends Assertions {
+@deprecated("superseded by CheckAnswerHelper", since = "4.2")
+object QueryTest extends CheckAnswerHelper {

Review Comment:
   Now that the object extends `CheckAnswerHelper`, it carries two parallel 
copies of the comparison logic — the helper's private ones and the public 
`getErrorMessageInCheckAnswer`/`prepareAnswer`/`sameRows` kept below. The 
public surface is still called (e.g. `AggregationQuerySuite.scala:1089`), so it 
can't be deleted, but the kept methods could delegate to the helper's internals 
so the logic exists once. Worth doing while the file is being restructured — 
drift between the two copies will be invisible.



##########
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBinder.scala:
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.concurrent.duration._
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Suite}
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.{DebugFilesystem, SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.TestSparkSession
+
+/**
+ * Provides a [[spark]] implementation by creating a [[classic.SparkSession]].
+ *
+ * counterpart to [[SparkSessionProvider]], used in 
[[org.apache.spark.sql.test.SharedSparkSession]]
+ */
+trait SparkSessionBinder extends SparkSessionBinderBase { self: SparkFunSuite 
=>
+
+  /**
+   * Suites extending this trait are sharing resources (e.g. SparkSession) in 
their
+   * tests. This trait initializes the spark session in its [[beforeAll()]] 
implementation before
+   * the automatic thread snapshot is performed, so the audit code could fail 
to report threads
+   * leaked by that shared session.
+   *
+   * The behavior is overridden here to take the snapshot before the spark 
session is initialized.
+   */
+  override protected val enableAutoThreadAudit = false
+
+  protected override def beforeAll(): Unit = {
+    doThreadPreAudit()
+    super.beforeAll()
+  }
+
+  protected override def afterAll(): Unit = {
+    try {
+      super.afterAll()
+    } finally {
+      doThreadPostAudit()
+    }
+  }
+}
+
+trait SparkSessionBinderBase

Review Comment:
   The Base/Binder split is doing real work, but nothing documents it, so it 
reads as permanent vocabulary when it's transitional. As far as I can trace, 
the split exists because (a) the thread audit needs `self: SparkFunSuite` while 
the FunSpec/WordSpec/FlatSpec suites reachable via `SharedSparkSessionBase` 
need `self: Suite`, and (b) the deprecated `SharedSparkSession` aliases 
recompose the binder with the old `QueryTest` API. Once those migrate, the 
binder content can merge into `SessionQueryTest`, with the per-env variants 
overriding `spark`/`createSparkSession` directly. Please add a scaladoc 
paragraph stating each layer's role, and a TODO recording that cleanup plan — 
this PR is establishing the test-infra vocabulary people will copy, and the 
layering rationale shouldn't have to be reverse-engineered from 
`GenericFunSpecSuite`.



##########
sql/core/src/test/scala/org/apache/spark/sql/CheckAnswerHelper.scala:
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.TimeZone
+
+import org.scalatest.Assertions
+
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.util.{SparkErrorUtils, SparkStringUtils}
+
+trait CheckAnswerHelper extends Assertions {
+
+  /**
+   * Runs the plan and makes sure the answer matches the expected result.
+   *
+   * @param df the DataFrame to be executed
+   * @param expectedAnswer the expected result in a Seq of Rows.
+   */
+  protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit 
= {
+    getErrorMessageInCheckAnswer(df, expectedAnswer) match {
+      case Some(errorMessage) => fail(errorMessage)
+      case None =>
+    }
+  }
+
+  protected def isDfSorted(df: DataFrame): Boolean = {
+    df match {
+      case df: classic.DataFrame =>
+        df.logicalPlan.collectFirst { case s: logical.Sort => s }.nonEmpty
+      case _ => throw new RuntimeException(s"Cannot determine whether df is 
sorted: $df")
+    }
+  }
+
+  /**
+   * Runs the plan and makes sure the answer matches the expected result.
+   * If there was exception during the execution or the contents of the 
DataFrame does not
+   * match the expected result, an error message will be returned. Otherwise, 
a None will
+   * be returned.
+   *
+   * @param df the DataFrame to be executed
+   * @param expectedAnswer the expected result in a Seq of Rows.
+   */
+  private def getErrorMessageInCheckAnswer(
+      df: DataFrame,
+      expectedAnswer: Seq[Row]): Option[String] = {
+    val sparkAnswer = try df.collect().toSeq catch {
+      case e: Exception =>
+        val errorMessage =
+          s"""
+             |Exception thrown while executing query:
+             |${df.queryExecution}

Review Comment:
   Both failure messages here interpolate `${df.queryExecution}`, which throws 
`UNSUPPORTED_CONNECT_FEATURE` on Connect client DataFrames — so on Connect, a 
*failing* `checkAnswer` reports the unsupported-API error instead of the row 
mismatch. Since this helper exists to be connect-compatible, guard the 
interpolation: include the plan only for `classic.DataFrame` and fall back to 
`df.toString` otherwise. (Same on line 81.)



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/SessionQueryTest.scala:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.connect
+
+import org.apache.spark.sql
+
+/**
+ * Overrides test utils to implement 'connect variants' of suites declared in 
sql/core:
+ * {{{
+ *   // in sql/core
+ *   FooSuite extends SessionQueryTest { test("") { ... } }
+ *
+ *   // in sql/connect
+ *   FooConnectSuite extends connect.SessionQueryTest
+ * }}}
+ *
+ * This trait overrides [[spark]] to use a [[SparkSession 
connect.SparkSession]], which executes
+ * via the gRPC API using an in-process connect server.
+ */
+trait SessionQueryTest extends sql.SessionQueryTest with SparkSessionBinder {
+  final override def isDfSorted(df: sql.DataFrame): Boolean = false // TODO

Review Comment:
   Returning `false` makes every comparison order-insensitive on Connect — an 
`ORDER BY` test that passes on classic can no longer fail on wrong ordering in 
its Connect variant. Worth a scaladoc note on the trait spelling that out, and 
a JIRA reference on the TODO, so suite authors know ordering assertions are 
silently weakened.



##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/DataSourceV2DataFrameConnectSuite.scala:
##########
@@ -53,45 +53,15 @@ class DataSourceV2DataFrameConnectSuite
     .set("spark.sql.catalog.nullbothidscat.copyOnLoad", "true")
 
   override protected def testPrefix: String = "[connect] "
-  override protected def isConnect: Boolean = true
-
-  override protected def withTestSession(fn: SparkSession => Unit): Unit =
-    withSession(fn)
-
-  // Cannot use QueryTest.checkAnswer directly because it accesses 
df.logicalPlan,
-  // df.queryExecution, and df.materializedRdd, which are not available on 
Connect *client*
-  // DataFrames (they throw ConnectClientUnsupportedErrors). Note: checkAnswer 
IS usable from
-  // Connect server tests that operate on classic server-side DataFrames, but 
in this suite
-  // `df` is a Connect client DataFrame returned by session.table() / 
session.sql().
-  // Instead, collect the rows and delegate to QueryTest.sameRows, which is 
the same
-  // value-based, order-agnostic comparison that checkAnswer uses internally.
-  override protected def checkRows(df: => DataFrame, expected: Seq[Row]): Unit 
=
-    QueryTest.sameRows(expected, df.collect().toSeq).foreach(msg => fail(msg))
 
   override protected def getTableCatalog[C <: TableCatalog: ClassTag](
       session: SparkSession,
       catalogName: String): C = {
-    val serverSession = getServerSession(session)
-    val catalog = 
serverSession.sessionState.catalogManager.catalog(catalogName)
+    val catalog = classicSpark.sessionState.catalogManager.catalog(catalogName)

Review Comment:
   `classicSpark` is not the session these queries run in. Connect server 
sessions are created via `newSession()` 
(`SparkConnectSessionManager.newIsolatedSession`), so they have their own 
`CatalogManager` and instantiate their own catalog plugin objects — 
`externalAppend` against this instance mutates state the Connect queries never 
read. The old implementation used `getServerSession(session)` for exactly this 
reason. I'd port that accessor onto `connect.SparkSessionBinder` and use it 
here. The `classicSpark` scaladoc ("The underlying classic session used by the 
in-process server") needs the same fix — the server never uses that session 
directly. I should have caught the doc claim last round.



##########
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBinder.scala:
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.concurrent.duration._
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Suite}
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.{DebugFilesystem, SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.TestSparkSession
+
+/**
+ * Provides a [[spark]] implementation by creating a [[classic.SparkSession]].
+ *
+ * counterpart to [[SparkSessionProvider]], used in 
[[org.apache.spark.sql.test.SharedSparkSession]]
+ */
+trait SparkSessionBinder extends SparkSessionBinderBase { self: SparkFunSuite 
=>
+
+  /**
+   * Suites extending this trait are sharing resources (e.g. SparkSession) in 
their
+   * tests. This trait initializes the spark session in its [[beforeAll()]] 
implementation before
+   * the automatic thread snapshot is performed, so the audit code could fail 
to report threads
+   * leaked by that shared session.
+   *
+   * The behavior is overridden here to take the snapshot before the spark 
session is initialized.
+   */
+  override protected val enableAutoThreadAudit = false
+
+  protected override def beforeAll(): Unit = {
+    doThreadPreAudit()
+    super.beforeAll()
+  }
+
+  protected override def afterAll(): Unit = {
+    try {
+      super.afterAll()
+    } finally {
+      doThreadPostAudit()
+    }
+  }
+}
+
+trait SparkSessionBinderBase
+  extends SparkSessionProvider
+  with BeforeAndAfterEach
+  with BeforeAndAfterAll
+  with Eventually { self: Suite =>
+
+  protected def sparkConf: SparkConf = {
+    val conf = new SparkConf()
+      .set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
+      .set(UNSAFE_EXCEPTION_ON_MEMORY_LEAK, true)
+      .set(SQLConf.CODEGEN_FALLBACK.key, "false")
+      .set(SQLConf.CODEGEN_FACTORY_MODE.key, 
CodegenObjectFactoryMode.CODEGEN_ONLY.toString)
+      // Disable ConvertToLocalRelation for better test coverage. Test cases 
built on
+      // LocalRelation will exercise the optimization rules better by 
disabling it as
+      // this rule may potentially block testing of other optimization rules 
such as
+      // ConstantPropagation etc.
+      .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key, 
ConvertToLocalRelation.ruleName)
+    conf.set(
+      StaticSQLConf.WAREHOUSE_PATH,
+      conf.get(StaticSQLConf.WAREHOUSE_PATH) + "/" + getClass.getCanonicalName)
+    conf.set(StaticSQLConf.LOAD_SESSION_EXTENSIONS_FROM_CLASSPATH, false)
+    conf.set(StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD,
+      sys.env.getOrElse("SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD",
+        
StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+    conf.set(StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD,
+      
sys.env.getOrElse("SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD",
+        
StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+  }
+
+  /**
+   * The [[TestSparkSession]] to use for all tests in this suite.
+   *
+   * By default, the underlying [[org.apache.spark.SparkContext]] will be run 
in local
+   * mode with the default test configurations.
+   */
+  private var _spark: classic.SparkSession = null
+
+  protected override def spark: SparkSession = _spark
+
+  /**
+   * The [[test.TestSQLContext]] to use for all tests in this suite.

Review Comment:
   `TestSQLContext` is a `private[sql]` object holding override confs, not this 
member's type — `sqlContext` returns a plain `SQLContext`. Stale text from the 
old `SharedSparkSessionBase`.
   ```suggestion
      * The [[SQLContext]] to use for all tests in this suite.
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -202,6 +203,7 @@ trait QueryTestBase
    * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s.
    * @param absTol the absolute tolerance between actual and expected answers.
    */
+  @deprecated("rarely used")

Review Comment:
   Worth a consistent `since = "4.2.0"` and a message naming the replacement 
(like the `SharedSparkSession` one) — here and on the other two `"rarely used"` 
annotations, plus the object's `since = "4.2"`.



##########
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBinder.scala:
##########
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.concurrent.duration._
+
+import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach, Suite}
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.{DebugFilesystem, SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.TestSparkSession
+
+/**
+ * Provides a [[spark]] implementation by creating a [[classic.SparkSession]].
+ *
+ * counterpart to [[SparkSessionProvider]], used in 
[[org.apache.spark.sql.test.SharedSparkSession]]

Review Comment:
   ```suggestion
    * Counterpart to [[SparkSessionProvider]]; used in 
[[org.apache.spark.sql.test.SharedSparkSession]].
   ```



##########
sql/hive/src/test/scala/org/apache/spark/sql/hive/SessionQueryTest.scala:
##########
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.classic
+
+trait SessionQueryTest extends classic.SessionQueryTest with 
test.TestHiveSingleton {

Review Comment:
   This stacks two competing session lifecycles. `TestHiveSingleton.spark` is 
an eager `val` (`TestHive.sparkSession`, built at suite construction), but the 
inherited binder `beforeAll` still runs `initializeSession()` -> 
`cleanupAnyExistingSession()` + `new TestSparkSession(sparkConf)` — tearing 
down the active session after TestHive's exists and creating a second, unused 
one. Nothing in the PR extends this trait, so CI never exercises it. Either add 
a smoke-test suite or drop the hive variant from this PR; if it stays, the hive 
binding should override the session-source seam (`createSparkSession`) instead 
of stacking `TestHiveSingleton` on a live binder lifecycle.



##########
sql/core/src/test/scala/org/apache/spark/sql/SessionQueryTestBase.scala:
##########
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+// scalastyle:off funsuite
+import org.scalatest.funsuite.AnyFunSuite
+// scalastyle:on
+
+/**
+ * TODO should be moved to sql/api
+ *
+ * base for fully sql/core independent tests, i.e. this trait could be moved 
to sql/api and then
+ * used in sql/connect/client.
+ */
+trait SessionQueryTestBase
+  extends AnyFunSuite
+    with SparkSessionProvider
+    with CheckAnswerHelper
+    with QueryCleanupHelper {
+
+  /**
+   * Documents used session so that tests can handle and document 
session-specific behaviour
+   *
+   * {{{
+   *   test(...) {
+   *     val df = // query with connect-specific behaviour
+   *     if (sessionType = 'connect') {

Review Comment:
   The example isn't valid Scala (assignment, and single quotes on a `String`):
   ```suggestion
      *     if (sessionType == "connect") {
   ```
   While here: the trait doc on line 27 ("base for fully sql/core independent 
tests") and the summary on line 37 ("Documents used session so that...") could 
use a grammar pass too.



##########
sql/core/src/test/scala/org/apache/spark/sql/SessionQueryTest.scala:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkFunSuite
+
+/**
+ * Provides connect-compatible test utils to write suites that have 'connect 
variants':
+ * {{{
+ *   // in sql/core
+ *   FooSuite extends SessionQueryTest { test("") { ... } }
+ *
+ *   // in sql/connect
+ *   FooConnectSuite extends connect.SessionQueryTest
+ * }}}
+ *
+ * While this trait internally uses a [[classic.SparkSession]] when executing 
tests,
+ * it exposed as a [[SparkSession sql.SparkSession]] to allow for overriding 
on the connect side.

Review Comment:
   Grammar:
   ```suggestion
    * it is exposed as a [[SparkSession sql.SparkSession]] to allow for 
overriding on the connect side.
   ```



##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -823,21 +789,39 @@ trait QueryTest extends SparkFunSuite with QueryTestBase {
   }
 }
 
-object QueryTest extends Assertions {
+@deprecated("superseded by CheckAnswerHelper", since = "4.2")
+object QueryTest extends CheckAnswerHelper {
   /**
    * Runs the plan and makes sure the answer matches the expected result.
    *
    * @param df the DataFrame to be executed
    * @param expectedAnswer the expected result in a Seq of Rows.
    * @param checkToRDD whether to verify deserialization to an RDD. This runs 
the query twice.

Review Comment:
   This `@param checkToRDD` now documents the 2-arg overload, which doesn't 
have the parameter — move it to the 3-arg overload below, or drop it.



##########
sql/core/src/test/scala/org/apache/spark/sql/CheckAnswerHelper.scala:
##########
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.util.TimeZone
+
+import org.scalatest.Assertions
+
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.util.{SparkErrorUtils, SparkStringUtils}
+
+trait CheckAnswerHelper extends Assertions {
+
+  /**
+   * Runs the plan and makes sure the answer matches the expected result.
+   *
+   * @param df the DataFrame to be executed
+   * @param expectedAnswer the expected result in a Seq of Rows.
+   */
+  protected def checkAnswer(df: => DataFrame, expectedAnswer: Seq[Row]): Unit 
= {

Review Comment:
   `df` is by-name but is forced before any try (at the 
`getErrorMessageInCheckAnswer` call), so an analysis-time failure escapes the 
formatted error path. Either wrap the evaluation like 
`QueryTestBase.checkAnswer` does with `ExtendedAnalysisException`, or drop the 
by-name — as written it buys nothing.



##########
sql/core/src/test/scala/org/apache/spark/sql/SessionQueryTest.scala:
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkFunSuite
+
+/**
+ * Provides connect-compatible test utils to write suites that have 'connect 
variants':
+ * {{{
+ *   // in sql/core
+ *   FooSuite extends SessionQueryTest { test("") { ... } }
+ *
+ *   // in sql/connect
+ *   FooConnectSuite extends connect.SessionQueryTest
+ * }}}
+ *
+ * While this trait internally uses a [[classic.SparkSession]] when executing 
tests,
+ * it exposed as a [[SparkSession sql.SparkSession]] to allow for overriding 
on the connect side.
+ *
+ * For classic-specific tests, use [[classic.SessionQueryTest]].
+ */
+trait SessionQueryTest
+  extends SparkFunSuite
+  with SessionQueryTestBase
+  with SparkSessionBinder {
+
+  override def sessionType: String = "classic"

Review Comment:
   Missing newline at EOF — scalastyle's `NewLineAtEofChecker` is error-level, 
and this is failing the linter job.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56505][SQL][TESTS] Add SparkSessionBinder to replace SharedSparkSession [spark]

Reply via email to