juliuszsompolski commented on code in PR #53173:
URL: https://github.com/apache/spark/pull/53173#discussion_r2555560154
##########
sql/core/src/main/scala/org/apache/spark/sql/classic/DataFrameWriter.scala:
##########
@@ -484,12 +484,50 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) extends sql.DataFram
serde = None,
external = false,
constraints = Seq.empty)
+ val writeOptions = if (source == "delta") {
Review Comment:
@dongjoon-hyun I am currently researching cleaner solutions. I raised this PR
as a straw man, given the Spark 4.1 timeline, to propose the most narrowly
scoped change possible that would prevent a behaviour change leading to table
corruption in Delta (unintentionally overwriting a table's metadata in
operations that previously required an explicit `overwriteSchema` option to do
so). In any case, if this is allowed in as a stopgap fix, I would like to
replace it with a proper solution.
It could also be done by always adding this option and not mentioning Delta
here, as @HyukjinKwon suggested. It would still be a "strange" piece of code,
attaching this kind of option just for this particular case, and somewhat
"leaky".
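A minimal, self-contained sketch of the two variants (plain Scala, not Spark
code). The option key `hypotheticalSaveAsTableMarker` is a made-up placeholder;
the actual key added by this PR is not visible in the hunk above.

```scala
// HYPOTHETICAL option key: the real key added by this PR is not shown in the hunk.
val SaveAsTableMarkerKey = "hypotheticalSaveAsTableMarker"

// Variant A: attach the marker only for Delta, in the spirit of the
// `if (source == "delta")` check in the diff.
def deltaOnlyOptions(source: String, extraOptions: Map[String, String]): Map[String, String] =
  if (source == "delta") extraOptions + (SaveAsTableMarkerKey -> "true")
  else extraOptions

// Variant B (the suggestion above): always attach the marker and let each
// connector decide whether to act on it; no source name is hard-coded.
def unconditionalOptions(extraOptions: Map[String, String]): Map[String, String] =
  extraOptions + (SaveAsTableMarkerKey -> "true")
```

Variant B removes the source-name check, but every connector still receives an
option that exists only for this one code path, which is the "leaky" part.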
Some people have suggested to me that having Spark attach options that always
identify the API a command originated from could be useful in other cases as
well, but that is a much bigger change to design and implement.
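A rough sketch of what a general "originating API" option might look like.
`WriteOrigin`, `hypothetical.writeOrigin` and `needsExplicitOverwriteSchema`
are all hypothetical names for illustration; nothing like this exists in Spark
today.

```scala
// HYPOTHETICAL: the user-facing API a write command originated from.
sealed trait WriteOrigin { def name: String }
case object V1SaveAsTable extends WriteOrigin { val name = "DataFrameWriter.saveAsTable" }
case object V2CreateOrReplace extends WriteOrigin { val name = "DataFrameWriterV2.createOrReplace" }
case object V2Replace extends WriteOrigin { val name = "DataFrameWriterV2.replace" }

// HYPOTHETICAL option key that Spark would attach to every write.
val WriteOriginKey = "hypothetical.writeOrigin"

def withOrigin(options: Map[String, String], origin: WriteOrigin): Map[String, String] =
  options + (WriteOriginKey -> origin.name)

// A connector could branch on the origin; here, only the DFWV1 path keeps
// requiring an explicit `overwriteSchema` to change table metadata.
def needsExplicitOverwriteSchema(options: Map[String, String]): Boolean =
  options.get(WriteOriginKey).contains(V1SaveAsTable.name)
```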
Another option: since the semantics of `saveAsTable` in DFWV1 can be
interpreted differently than those of `createOrReplace` / `replace` in DFWV2,
maybe it could have a new plan node, `SaveAsV2TableCommand`, just as it has
its own node, `SaveAsV1TableCommand`? But again, this is a lot of changes.
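A toy model of the separate-node idea. The plan classes here are simplified
stand-ins, not Spark's actual logical plan classes or signatures, and
`SaveAsV2TableCommand` is only the node proposed above.

```scala
// Toy stand-ins for logical plan nodes; NOT Spark's actual classes or signatures.
sealed trait WritePlan { def table: String }
// Roughly corresponds to ReplaceTableAsSelect, used by DFWV2 replace / createOrReplace.
final case class ToyReplaceTableAsSelect(table: String, orCreate: Boolean) extends WritePlan
// HYPOTHETICAL node proposed above for DFWV1 saveAsTable on V2 tables.
final case class SaveAsV2TableCommand(table: String, overwrite: Boolean) extends WritePlan

// Because the node type itself records the originating API, a planner rule or
// connector can pick the semantics without inspecting any options.
def replacesMetadataImplicitly(plan: WritePlan): Boolean = plan match {
  case _: ToyReplaceTableAsSelect => true  // DFWV2: the table definition is replaced
  case _: SaveAsV2TableCommand    => false // DFWV1: keep saveAsTable semantics
}
```

The cost is that every rule currently handling the existing nodes would need a
counterpart for the new one.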
Or should the existing `CreateTableAsSelect` / `ReplaceTableAsSelect` have a
flag parameter indicating that it is actually a `SaveAsTable` command? But
again, this is not really clean...
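A sketch of the flag variant, again on a toy stand-in rather than Spark's real
`ReplaceTableAsSelect`; the `isSaveAsTable` parameter is hypothetical.

```scala
// Toy stand-in for ReplaceTableAsSelect carrying the HYPOTHETICAL flag;
// NOT Spark's actual signature. The default keeps existing call sites unchanged.
final case class ToyReplaceTableAsSelect(
    table: String,
    orCreate: Boolean,
    isSaveAsTable: Boolean = false)

// Every rule or connector that handles the node must remember to check the
// flag, which is part of why this approach does not feel clean.
def replacesMetadataImplicitly(plan: ToyReplaceTableAsSelect): Boolean =
  !plan.isSaveAsTable
```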
I am researching options.
--
This is an automated message from the Apache Git Service.