cloud-fan commented on issue #23829: [SPARK-26915][SQL]File source should write without schema validation in DataFrameWriter.save()
URL: https://github.com/apache/spark/pull/23829#issuecomment-464954983

@rdblue there are 2 problems here:
1. The file source should not perform schema validation during write.
2. The file source can't report a schema during write if the output path doesn't exist.

For 1, I think we can introduce a new trait (or capability API) to indicate that a data source doesn't need schema validation during write.

For 2, I think we need the CTAS (and RTAS) operator. One thing to note is that the `DataFrameWriter` API can mix data and metadata operations, e.g. `df.write.mode("append")` can append data to a non-existing table, with CTAS semantics. How would the ongoing catalog API proposal solve this issue?
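
To make the two points concrete, here is a minimal pure-Python sketch of the behavior being discussed. It is not Spark code and not the API proposed in the PR: `Catalog`, `Table`, `SchemaMismatch`, and the `accepts_any_schema` flag are hypothetical names invented for illustration. The flag stands in for the "capability" idea in point 1, and the create-on-missing branch stands in for the CTAS semantics of an append in point 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A schema is modeled as an ordered tuple of (column name, type name) pairs.
Schema = Tuple[Tuple[str, str], ...]
Row = tuple


class SchemaMismatch(Exception):
    """Raised when written data does not match the table schema."""


@dataclass
class Table:
    schema: Schema
    rows: List[Row] = field(default_factory=list)
    # Hypothetical capability flag: when True, writes skip schema validation,
    # as a file source would under point 1.
    accepts_any_schema: bool = False


@dataclass
class Catalog:
    tables: Dict[str, Table] = field(default_factory=dict)

    def append(self, name: str, schema: Schema, rows: List[Row],
               accepts_any_schema: bool = False) -> str:
        """Append rows to a table, creating the table if it does not exist."""
        table = self.tables.get(name)
        if table is None:
            # Point 2: the target does not exist, so there is no schema to
            # report. Fall back to CTAS semantics: create the table from the
            # incoming schema, then write the data.
            self.tables[name] = Table(schema, list(rows), accepts_any_schema)
            return "created"
        # Point 1: validate only when the source has not opted out.
        if not table.accepts_any_schema and table.schema != schema:
            raise SchemaMismatch(f"{name}: expected {table.schema}, got {schema}")
        table.rows.extend(rows)
        return "appended"
```

In this sketch the same `append` call performs a metadata operation (creating the table) or a pure data operation depending on catalog state, which is the mixing of concerns the comment points out.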
