cloud-fan commented on code in PR #55707:
URL: https://github.com/apache/spark/pull/55707#discussion_r3218720523


##########
AGENTS.md:
##########
@@ -20,6 +20,35 @@ Spark Connect protocol is defined in proto files under 
`sql/connect/common/src/m
 
 Avoid introducing non-ASCII characters in code or comments. String literals 
may contain non-ASCII when the content requires it (error messages, test data, 
etc.). Identifiers are ASCII by convention. The common failure mode is 
typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) 
sneaking into comments; scalastyle flags some of these. Spot-check before 
committing: `grep -rn -P "[^\x00-\x7F]" <files>`.
 
+## Scala Test Base Classes
+
+When writing a new Scala test suite, pick the lowest base class that provides 
what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style 
throughout, so the bases below are the chain to choose from. Each adds 
capability on top of the previous:
+
+    SparkFunSuite                                                           
(core)
+      <- PlanTest                                                           
(sql/catalyst)
+        <- QueryTest                                                        
(sql/core)
+
+| Test scope | Base | Notes |
+|------------|------|-------|
+| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, 
network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, 
thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, 
`checkError`. |
+| Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, 
`normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule 
tests. |
+| SQL/DataFrame tests — needs a `SparkSession` | `QueryTest` | Adds 
`checkAnswer`, codegen-on/off helpers. `spark: SparkSession` is abstract and 
must be supplied by a session-providing trait (see below). |
+
+### Providing a `SparkSession` for `QueryTest`
+
+`QueryTest` declares `spark: SparkSession` abstractly via 
`SparkSessionProvider`, so it cannot be instantiated on its own. A concrete 
suite mixes in one of the session-providing traits below:
+
+    QueryTest                                                               
(abstract `spark`)
+      + SharedSparkSession (sql/core)        -> classic in-process 
`TestSparkSession`
+      + TestHiveSingleton  (sql/hive)        -> Hive-backed `TestHive` session
+
+| Session provider | Module / location | Typical usage |
+|---|---|---|
+| `SharedSparkSession` | `sql/core` | Itself extends `QueryTest`, so concrete 
suites just `extends SharedSparkSession`. Default for tests under `sql/core`. |
+| `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. 
`class X extends QueryTest with TestHiveSingleton`. Used by tests under 
`sql/hive`. |
+
+Linearization gotcha: the first item in the `extends` clause must transitively 
extend a class (i.e. carry a non-`Object` superclass). All three bases above 
plus `SharedSparkSession` carry the `SparkFunSuite` chain, so any of them can 
appear first. A "pure helper" trait (e.g. `*ErrorsBase`, `*Helper`) does not — 
if you put one first, mix in a class-bearing trait immediately after, or 
compilation fails with `superclass Object is not a subclass of the superclass 
SparkFunSuite of the mixin trait ...`. Quick check: `grep "^trait <Name>"` — if 
it ends in `extends DataTypeErrorsBase` or another pure trait, it does not 
carry the class chain.

Review Comment:
   do we need this paragraph at all? No `*ErrorsBase` `*Helper` is mentioned in 
this section



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to