cloud-fan commented on code in PR #56190:
URL: https://github.com/apache/spark/pull/56190#discussion_r3322066761
##########
sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala:
##########
@@ -1211,7 +1211,7 @@ object QueryTest extends Assertions {
}
-class QueryTestSuite extends test.SharedSparkSession {
+class QueryTestSuite extends QueryTest with SparkSessionBinder {
Review Comment:
This migration — mixing the bare `sql.SparkSessionBinder` into a concrete
suite — is the shape I'd push back on. `sql.SparkSessionBinder` binds a classic
session but exposes `spark` only as the abstract `sql.SparkSession`, so it's
really internal plumbing, not what a test author should reach for.
The end-state I'd recommend documenting and demonstrating is a binder-free
base + per-env concrete suites:
```scala
abstract class FooSuiteBase extends QueryTest { // no binder; spark
abstract
test("shared") { checkAnswer(sql("SELECT 1"), Row(1)) }
}
class FooSuite extends FooSuiteBase with classic.SparkSessionBinder {
test("classic only") { ... }
}
class FooConnectSuite extends FooSuiteBase
with connect.SparkSessionBinder with connect.QueryTest {
test("connect only") { ... }
}
```
`QueryTest` already mixes in `SparkSessionProvider` (via `SQLTestData`) and
leaves `spark` abstract, so it works as the env-agnostic base directly.
Concretely: (1) steer the migration and the `@deprecated` message at
`classic.SparkSessionBinder` / `connect.SparkSessionBinder` + this base
pattern, not the bare binder; (2) `QueryTestWithConnectSuite` currently
demonstrates the retrofit path (extending an already-classic-bound
`QueryTestSuite` and overriding the binding) — a binder-free base would
demonstrate the cleaner pattern and double as the template authors copy.
##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/QueryTest.scala:
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import org.apache.spark.{sql => sqlApi}
+
+/**
+ * Extends [[sqlApi.QueryTest]] for use with Connect sessions.
+ *
+ * Overrides [[checkAnswer]] to avoid classic-only code paths (e.g.
`queryExecution`,
+ * `logicalPlan`, `materializedRdd`) that are not available on Connect
DataFrames.
+ */
+trait QueryTest extends sqlApi.QueryTest with SparkSessionProvider {
+
+ override protected def checkAnswer(
Review Comment:
This overrides only the `checkAnswer(df, Seq[Row])` variant, which is enough
for `QueryTestSuite`. But the stated goal is re-running arbitrary `sql/core`
suites over Connect, and the other `QueryTest` helpers (other `checkAnswer`
overloads, `checkDataset`, ...) still reach classic-only paths like
`queryExecution`/`logicalPlan`. Worth a line in the trait doc noting that
broader reuse will need more overrides.
##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/SparkSessionBinder.scala:
##########
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import java.util.UUID
+
+import scala.concurrent.duration._
+
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.DebugFilesystem
+import org.apache.spark.sql
+import org.apache.spark.sql.classic
+import org.apache.spark.sql.connect.client.SparkConnectClient
+import org.apache.spark.sql.connect.common.config.ConnectCommon
+import org.apache.spark.sql.connect.config.Connect
+import org.apache.spark.sql.connect.service.SparkConnectService
+
+/**
+ * A test trait that provides a Connect [[SparkSession]] backed by an
in-process gRPC server.
+ * Extends [[sql.SparkSessionBinder sql.SparkSessionBinder]] (which creates a
+ * [[classic.SparkSession classic.SparkSession]] and SparkContext), then
layers a Connect client
+ * session on top by starting the gRPC service in-process.
+ *
+ * Mix in this trait to exercise existing sql/core test suites through the
Connect path:
+ * {{{
+ * class FooWithConnectSuite
+ * extends FooSuite
+ * with connect.SparkSessionBinder
+ * with connect.QueryTest
+ * }}}
+ */
+trait SparkSessionBinder extends sql.SparkSessionBinder {
+
+ private val serverPort: Int =
+ ConnectCommon.CONNECT_GRPC_BINDING_PORT + util.Random.nextInt(1000)
+
+ @volatile private var _connectSpark: SparkSession = _
+
+ protected override def spark: SparkSession = _connectSpark
+
+ /** The underlying classic session used by the in-process server. */
+ private def classicSpark: classic.SparkSession =
super.spark.asInstanceOf[classic.SparkSession]
+
+ override def beforeAll(): Unit = {
+ super.beforeAll()
+ withSparkEnvConfs((Connect.CONNECT_GRPC_BINDING_PORT.key,
serverPort.toString)) {
+ SparkConnectService.start(classicSpark.sparkContext)
+ }
+ val client = SparkConnectClient
+ .builder()
+ .port(serverPort)
+ .sessionId(UUID.randomUUID().toString)
+ .userId("test")
+ .build()
+ _connectSpark = SparkSession
+ .builder()
+ .client(client)
+ .create()
+ }
+
+ override def afterAll(): Unit = {
+ try {
+ if (_connectSpark != null) {
+ _connectSpark.close()
+ _connectSpark = null
+ }
+ SparkConnectService.stop()
+ } finally {
+ super.afterAll()
+ }
+ }
+
+ // The base SharedSparkSessionBase.afterEach calls spark.sharedState which
is not supported
Review Comment:
This comment is inaccurate after the refactor: `connect.SparkSessionBinder`
extends `sql.SparkSessionBinder` directly (not `SharedSparkSessionBase`), and
that parent's `afterEach` clears the cache via the private `_spark` — the
classic session, which is exactly what's used on Connect (`createSparkSession`
isn't overridden). So the parent's `afterEach` already works here and this
override is redundant. If you do keep it, note that skipping
`super.afterEach()` drops the `BeforeAndAfterEach` chain. Simplest fix is to
remove the override entirely.
##########
sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala:
##########
@@ -19,41 +19,12 @@ package org.apache.spark.sql.test
import scala.concurrent.duration._
-import org.scalatest.{BeforeAndAfterEach, Suite}
-import org.scalatest.concurrent.Eventually
-
-import org.apache.spark.{DebugFilesystem, SparkConf}
-import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
-import org.apache.spark.sql.{classic, QueryTest, QueryTestBase, SparkSession,
SparkSessionProvider, SQLContext}
-import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
-import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
-import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql
+import org.apache.spark.sql.{classic, QueryTest, QueryTestBase}
+@deprecated("Use SparkSessionBinder and QueryTest instead")
Review Comment:
`@deprecated` takes a `since` version as its second argument; adding it
documents when the deprecation started and matches the convention elsewhere in
the codebase. Same on line 59.
##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/QueryTestWithConnectSuite.scala:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect
+
+import org.apache.spark.sql.QueryTestSuite
+
+/**
+ * Runs [[QueryTestSuite]] tests through a Connect session.
+ *
+ * This validates the `FooSuite with connect.SharedSparkSession` pattern: the
existing
Review Comment:
There's no `connect.SharedSparkSession` trait; the pattern this suite
actually uses (and that the sibling `connect/SparkSessionBinder.scala` doc
shows) is `connect.SparkSessionBinder with connect.QueryTest`.
```suggestion
* This validates the `FooSuite with connect.SparkSessionBinder with
connect.QueryTest` pattern: the existing
```
##########
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBinder.scala:
##########
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.concurrent.duration._
+
+import org.scalatest.{BeforeAndAfterEach, Suite}
+import org.scalatest.concurrent.Eventually
+
+import org.apache.spark.{DebugFilesystem, SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config.UNSAFE_EXCEPTION_ON_MEMORY_LEAK
+import org.apache.spark.sql.catalyst.expressions.CodegenObjectFactoryMode
+import org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.sql.test.TestSparkSession
+
+trait SparkSessionBinder
+ extends SparkFunSuite
+ with SparkSessionProvider
+ with BeforeAndAfterEach
+ with Eventually {
+
+ protected def sparkConf = {
+ val conf = new SparkConf()
+ .set("spark.hadoop.fs.file.impl", classOf[DebugFilesystem].getName)
+ .set(UNSAFE_EXCEPTION_ON_MEMORY_LEAK, true)
+ .set(SQLConf.CODEGEN_FALLBACK.key, "false")
+ .set(SQLConf.CODEGEN_FACTORY_MODE.key,
CodegenObjectFactoryMode.CODEGEN_ONLY.toString)
+ // Disable ConvertToLocalRelation for better test coverage. Test cases
built on
+ // LocalRelation will exercise the optimization rules better by
disabling it as
+ // this rule may potentially block testing of other optimization rules
such as
+ // ConstantPropagation etc.
+ .set(SQLConf.OPTIMIZER_EXCLUDED_RULES.key,
ConvertToLocalRelation.ruleName)
+ conf.set(
+ StaticSQLConf.WAREHOUSE_PATH,
+ conf.get(StaticSQLConf.WAREHOUSE_PATH) + "/" + getClass.getCanonicalName)
+ conf.set(StaticSQLConf.LOAD_SESSION_EXTENSIONS_FROM_CLASSPATH, false)
+ conf.set(StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD,
+ sys.env.getOrElse("SPARK_TEST_SQL_SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD",
+
StaticSQLConf.SHUFFLE_EXCHANGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+ conf.set(StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD,
+
sys.env.getOrElse("SPARK_TEST_SQL_RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD",
+
StaticSQLConf.RESULT_QUERY_STAGE_MAX_THREAD_THRESHOLD.defaultValueString).toInt)
+ }
+
+ /**
+ * The [[TestSparkSession]] to use for all tests in this suite.
+ *
+ * By default, the underlying [[org.apache.spark.SparkContext]] will be run
in local
+ * mode with the default test configurations.
+ */
+ private var _spark: classic.SparkSession = null
+
+ protected override def spark: SparkSession = _spark
+
+ /**
+ * The [[TestSQLContext]] to use for all tests in this suite.
+ */
+ protected implicit def sqlContext: SQLContext = _spark.sqlContext
+
+ protected def createSparkSession: classic.SparkSession = {
+ classic.SparkSession.cleanupAnyExistingSession()
+ new TestSparkSession(sparkConf)
+ }
+
+ protected def sqlConf: SQLConf = _spark.sessionState.conf
+
+ /**
+ * Initialize the [[TestSparkSession]]. Generally, this is just called from
+ * beforeAll; however, in test using styles other than FunSuite, there is
+ * often code that relies on the session between test group constructs and
+ * the actual tests, which may need this session. It is purely a semantic
+ * difference, but semantically, it makes more sense to call
+ * 'initializeSession' between a 'describe' and an 'it' call than it does to
+ * call 'beforeAll'.
+ */
+ protected def initializeSession(): Unit = {
+ if (_spark == null) {
+ _spark = createSparkSession
+ }
+ }
+
+ /**
+ * Suites extending [[SharedSparkSession]] are sharing resources (e.g.
SparkSession) in their
Review Comment:
This doc moved out of `SharedSparkSession`; the snapshot-before-init logic
now lives in this trait, so referring to `SharedSparkSession` is stale.
```suggestion
* Suites extending this trait are sharing resources (e.g. SparkSession)
in their
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]