cloud-fan commented on code in PR #55707: URL: https://github.com/apache/spark/pull/55707#discussion_r3232660321
########## AGENTS.md: ########## @@ -20,6 +20,33 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" <files>`. +## Scala Test Base Classes + +When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style throughout, so the bases below are the chain to choose from. Each adds capability on top of the previous: + + SparkFunSuite (core) + <- PlanTest (sql/catalyst) + <- QueryTest (sql/core) + +| Test scope | Base | Notes | +|------------|------|-------| +| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. | +| Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. | +| SQL/DataFrame tests — needs a `SparkSession` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. `spark: SparkSession` is abstract and must be supplied by a session-providing trait (see below). | + +### Providing a `SparkSession` for `QueryTest` + +`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`, so it cannot be instantiated on its own. A concrete suite mixes in one of the session-providing traits below: + + QueryTest (abstract `spark`) + + SharedSparkSession (sql/core) -> classic in-process `TestSparkSession` + + TestHiveSingleton (sql/hive) -> Hive-backed `TestHive` session + +| Session provider | Module / location | Typical usage | +|---|---|---| +| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete suites just `extends SharedSparkSession`. Default for tests under `sql/core`. | Review Comment: shall we recommend people to do `extends QueryTest with SharedSparkSession`? Then it's more consistent with hive -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
