cloud-fan commented on code in PR #56111:
URL: https://github.com/apache/spark/pull/56111#discussion_r3346354523


##########
sql/core/src/main/scala/org/apache/spark/sql/classic/DataFrameWriter.scala:
##########
@@ -176,7 +176,21 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) extends sql.DataFram
               val catalog = CatalogV2Util.getTableProviderCatalog(
                 supportsExtract, catalogManager, dsOptions)
 
-              (catalog.loadTable(ident), Some(catalog), Some(ident))
+              try {
+                (catalog.loadTable(ident), Some(catalog), Some(ident))
+              } catch {
+                // SPARK-57068: align Overwrite-on-missing with V1 source behavior

Review Comment:
   I'd reconsider aligning with V1 source behavior here. A 
`SupportsCatalogOptions` source resolves a real catalog table, so `save()` is 
DML on that table, not path-based data writing: `mode("overwrite")` on an 
existing table is `OverwriteByExpression(Literal(true))` just below (= SQL 
`INSERT OVERWRITE`, truncate + write), and `mode("append")` is `AppendData` (= 
`INSERT INTO`).
   
   SQL `INSERT OVERWRITE` and `INSERT INTO` both require an existing table and 
throw `TABLE_OR_VIEW_NOT_FOUND` on a missing one; there is no auto-create on 
insert. So under SQL table semantics both Overwrite-missing and Append-missing 
should throw, which is the current (and consistent) behavior. Creating the 
table only on Overwrite-missing breaks that symmetry and mixes in the V1 
pathless-source model.
   
   The create-or-replace verb already exists via `saveAsTable` 
(`ReplaceTableAsSelect(orCreate = true)` = `CREATE OR REPLACE TABLE AS 
SELECT`). My suggestion is to leave `save().mode("overwrite")` throwing on a 
missing catalog table and point users at `saveAsTable` for create-or-replace.
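   
   To make the symmetry argument concrete, here is a minimal Spark-free sketch 
of the mapping described above. Every name in it (`Mode`, `Outcome`, 
`SaveSemantics`, and the case objects) is hypothetical and chosen only for 
illustration; it is not Spark's code and does not call any Spark API. It models 
only the two modes discussed here, `append` and `overwrite`.

```scala
// Hypothetical model of the semantics argued for in this comment.
// None of these types exist in Spark; they only mirror the plan names above.
sealed trait Mode
case object Append extends Mode
case object Overwrite extends Mode

sealed trait Outcome
case object AppendDataPlan extends Outcome       // stands in for AppendData (INSERT INTO)
case object OverwriteAllPlan extends Outcome     // stands in for OverwriteByExpression(Literal(true))
case object ReplaceOrCreatePlan extends Outcome  // stands in for ReplaceTableAsSelect(orCreate = true)
case object TableNotFound extends Outcome        // stands in for TABLE_OR_VIEW_NOT_FOUND

object SaveSemantics {
  // save() on a catalog-backed source: DML on an existing table.
  // A missing table fails for BOTH modes, which keeps them symmetric.
  def save(mode: Mode, tableExists: Boolean): Outcome =
    (mode, tableExists) match {
      case (_, false)        => TableNotFound
      case (Append, true)    => AppendDataPlan
      case (Overwrite, true) => OverwriteAllPlan
    }

  // saveAsTable with overwrite: create-or-replace, whether or not the table exists.
  def saveAsTableOverwrite(tableExists: Boolean): Outcome =
    ReplaceOrCreatePlan
}
```

   In this model `SaveSemantics.save(Overwrite, tableExists = false)` and 
`SaveSemantics.save(Append, tableExists = false)` give the same result, while 
create-or-replace is reachable only through `saveAsTableOverwrite`.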



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

