gengliangwang commented on a change in pull request #30273:
URL: https://github.com/apache/spark/pull/30273#discussion_r519891333
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
##########
@@ -325,11 +325,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
val dsOptions = new CaseInsensitiveStringMap(finalOptions.asJava)
def getTable: Table = {
-      // For file source, it's expensive to infer schema/partition at each write. Here we pass
-      // the schema of input query and the user-specified partitioning to `getTable`. If the
+      // If the source accepts external table metadata, here we pass the schema of input query
+      // and the user-specified partitioning to `getTable`. This is for avoiding
+      // schema/partitioning inference, which can be very expensive. If the
       // query schema is not compatible with the existing data, the write can still success but
Review comment:
+1, thanks
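The logic described in the new comment can be sketched as follows. This is a minimal, self-contained model, not Spark's actual classes: the type names (`Table`, `FileSource`), the `supportsExternalMetadata`/`getTable` signatures, and the plain `List<String>` stand-ins for `StructType` and `Transform[]` are all illustrative assumptions. The point it demonstrates is the one the comment makes: when the source accepts external table metadata, the writer hands it the query schema and user partitioning directly, so the expensive inference path never runs.

```java
import java.util.List;
import java.util.Optional;

// Stand-in for a resolved table; schema/partitioning are simplified to strings.
class Table {
    final List<String> schema;
    final List<String> partitioning;
    Table(List<String> schema, List<String> partitioning) {
        this.schema = schema;
        this.partitioning = partitioning;
    }
}

// Hypothetical file source; NOT Spark's real TableProvider API.
class FileSource {
    static int inferenceRuns = 0; // counts how often the expensive path runs

    boolean supportsExternalMetadata() { return true; }

    // Expensive fallback: a real file source would scan files here.
    private List<String> inferSchema() {
        inferenceRuns++;
        return List.of("a", "b");
    }

    // If external metadata is supplied, use it as-is; otherwise infer.
    Table getTable(Optional<List<String>> schema, List<String> partitioning) {
        return new Table(schema.orElseGet(this::inferSchema), partitioning);
    }
}

public class WriterSketch {
    // Writer-side logic mirroring the commented code path: pass the input
    // query's schema and the user-specified partitioning when supported.
    static Table writeTable(FileSource source,
                            List<String> querySchema,
                            List<String> partitioning) {
        if (source.supportsExternalMetadata()) {
            return source.getTable(Optional.of(querySchema), partitioning);
        }
        return source.getTable(Optional.empty(), partitioning);
    }

    public static void main(String[] args) {
        Table t = writeTable(new FileSource(), List.of("x"), List.of("p"));
        System.out.println(t.schema + " inferenceRuns=" + FileSource.inferenceRuns);
    }
}
```

Note the trade-off the comment also flags: because the supplied schema is trusted without inference, an incompatible query schema is not caught at this point.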
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]