cloud-fan commented on a change in pull request #30273:
URL: https://github.com/apache/spark/pull/30273#discussion_r518533903



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -325,11 +325,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       val dsOptions = new CaseInsensitiveStringMap(finalOptions.asJava)
 
       def getTable: Table = {
-        // For file source, it's expensive to infer schema/partition at each write. Here we pass
-        // the schema of input query and the user-specified partitioning to `getTable`. If the
+        // If the source accepts external table metadata, here we pass the schema of input query
+        // and the user-specified partitioning to `getTable`. This is for avoiding
+        // schema/partitioning inference, which can be very expensive. If the
         // query schema is not compatible with the existing data, the write can still success but
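
For context, here is a minimal sketch of the provider contract this new comment refers to, assuming the Spark 3.x DataSource V2 `TableProvider` API; `DemoProvider` and `DemoTable` are hypothetical names, not part of this PR:

```scala
import java.util

import org.apache.spark.sql.connector.catalog.{Table, TableCapability, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// A source that accepts external table metadata. Because
// supportsExternalMetadata() returns true, the write path can hand the
// input query's schema and the user-specified partitioning directly to
// getTable, skipping the potentially expensive inference calls.
class DemoProvider extends TableProvider {

  override def supportsExternalMetadata(): Boolean = true

  // Only consulted when no external metadata is available (e.g. on read).
  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    throw new UnsupportedOperationException("schema must be supplied externally")

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table =
    new DemoTable(schema)
}

// Minimal Table stub; partitioning() and properties() keep their defaults.
class DemoTable(tableSchema: StructType) extends Table {
  override def name(): String = "demo_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    util.Collections.emptySet[TableCapability]()
}
```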

Review comment:
      This is only true for the file source. How about:
   ```
   If the query schema is not compatible with the existing data, the behavior is undefined.
   For example, a file source write will succeed, but subsequent reads will fail.
   ```
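
   To make the suggested wording concrete, a hypothetical `spark-shell` session (the path `/tmp/demo` and column name `v` are made up for the example) showing the file source behavior:

   ```scala
   // Write an initial dataset with a LONG column `v`.
   spark.range(5).selectExpr("id AS v")
     .write.parquet("/tmp/demo")

   // Appending an incompatible type for `v` still succeeds at write time,
   // because the file source does not validate against existing data here.
   spark.range(5).selectExpr("CAST(id AS string) AS v")
     .write.mode("append").parquet("/tmp/demo")

   // A subsequent read may fail when the reader hits files whose physical
   // type conflicts with the resolved table schema.
   spark.read.parquet("/tmp/demo").collect()
   ```

   Without upfront validation, the second write lays down files whose physical type conflicts with the existing ones, so the failure only surfaces at read time.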



