Connect mode [spark]

via GitHub Tue, 26 May 2026 03:39:09 -0700


cloud-fan commented on code in PR #55462:
URL: https://github.com/apache/spark/pull/55462#discussion_r3303173883



##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DSv2RepeatedTableAccessTests.scala:
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.catalog.{CachingInMemoryTableCatalog, 
Column, Identifier, InMemoryTableCatalog, TableChange, TableInfo}
+import org.apache.spark.sql.types.IntegerType
+
+/**
+ * Shared repeated table access tests with external changes for DSv2 tables. 
These tests verify
+ * that repeated `sql()` calls correctly reflect external mutations made via 
the catalog API:
+ *
+ *  - Scenario 1 (external writes): external data appended via the catalog API 
is visible.
+ *  - Scenario 2 (external schema changes): external ADD COLUMN via the 
catalog API is visible.
+ *  - Scenario 3 (external drop/recreate): external drop and recreate via the 
catalog API
+ *    resolves to the new empty table.
+ *
+ * Each scenario includes a session mutation baseline, an external mutation 
test, and a
+ * caching-connector variant showing stale results until `REFRESH TABLE`.
+ *
+ * NOTE: All `session.sql(...)` calls append `.collect()` because Connect 
client DataFrames
+ * are lazy and require an action to trigger execution. In classic mode 
`.collect()` on DDL
+ * is a no-op (DDL executes eagerly), so this is harmless.
+ */
+trait DSv2RepeatedTableAccessTests extends DSv2ExternalMutationTestBase {
+
+  private val T = "testcat.ns1.ns2.tbl"
+  private val CT = "cachingcat.ns1.ns2.tbl"
+  private val testIdent = Identifier.of(Array("ns1", "ns2"), "tbl")

Review Comment:
   Nit: `T`, `CT`, and `testIdent` here are byte-for-byte identical to the ones 
in `DSv2TempViewWithStoredPlanTests` (lines 36-38). Consider hoisting them to 
`DSv2ExternalMutationTestBase` as `protected val`s so both consumer traits 
share the same namespace fixture in one place.



##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DSv2RepeatedTableAccessTests.scala:
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.catalog.{CachingInMemoryTableCatalog, 
Column, Identifier, InMemoryTableCatalog, TableChange, TableInfo}
+import org.apache.spark.sql.types.IntegerType
+
+/**
+ * Shared repeated table access tests with external changes for DSv2 tables. 
These tests verify
+ * that repeated `sql()` calls correctly reflect external mutations made via 
the catalog API:
+ *
+ *  - Scenario 1 (external writes): external data appended via the catalog API 
is visible.
+ *  - Scenario 2 (external schema changes): external ADD COLUMN via the 
catalog API is visible.
+ *  - Scenario 3 (external drop/recreate): external drop and recreate via the 
catalog API
+ *    resolves to the new empty table.
+ *
+ * Each scenario includes a session mutation baseline, an external mutation 
test, and a
+ * caching-connector variant showing stale results until `REFRESH TABLE`.
+ *
+ * NOTE: All `session.sql(...)` calls append `.collect()` because Connect 
client DataFrames
+ * are lazy and require an action to trigger execution. In classic mode 
`.collect()` on DDL

Review Comment:
   Nit: the NOTE says "`.collect()` on DDL is a no-op (DDL executes eagerly)", 
but the test bodies also append `.collect()` to `INSERT INTO ...` and `REFRESH 
TABLE ...`, which are DML / maintenance commands. Substantively correct (eager 
in classic, lazy on Connect — same behavior either way), but the framing is 
narrower than the actual usage. Could read as "eager statements" or "DDL / DML".



##########
sql/core/src/test/scala/org/apache/spark/sql/connector/DSv2RepeatedTableAccessTests.scala:
##########
@@ -0,0 +1,224 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.connector.catalog.{CachingInMemoryTableCatalog, 
Column, Identifier, InMemoryTableCatalog, TableChange, TableInfo}
+import org.apache.spark.sql.types.IntegerType
+
+/**
+ * Shared repeated table access tests with external changes for DSv2 tables. 
These tests verify
+ * that repeated `sql()` calls correctly reflect external mutations made via 
the catalog API:

Review Comment:
   Nit: the lead sentence frames this as "external mutations made via the 
catalog API", but the `*.1` tests in each scenario are session-mutation 
baselines. The next paragraph clarifies ("Each scenario includes a session 
mutation baseline"), but a small tweak — e.g. "...verify that repeated `sql()` 
calls correctly reflect both session and external mutations" — would match the 
actual scope.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56619][SQL][TESTS] Add DSv2 repeated table access tests with internal/external changes in Classic/Connect mode [spark]

Reply via email to