rdblue commented on issue #23829: [SPARK-26915][SQL]File source should write without schema validation in DataFrameWriter.save()
URL: https://github.com/apache/spark/pull/23829#issuecomment-465256668

@cloud-fan, here are my answers:

> 1. file source should not have schema validation during write

Validation should be configured by the source, just like we talked about for sources that can accept data with missing columns. I think the larger issue is finding out what the correct behavior is. Which tables should opt out of validation? Which tables should just use different rules, like allowing new columns?

> 2. file source can't report schema during write, if the output path doesn't exist

In this case, the table catalog that supports path-based tables will check existence. If the path doesn't exist, then the table doesn't exist, and the writer should use a `CreateTableAsSelect` plan instead of an overwrite plan. CTAS doesn't validate against an existing schema; it creates the table using the given schema.
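
To make the second answer concrete, here is a rough sketch of the plan choice in plain Python. It is not Spark code: `PathCatalog`, `plan_write`, and the `validate` flag are hypothetical names invented for illustration, and the schema check is a simple equality test standing in for whatever rules a source would actually configure.

```python
from typing import Dict, Optional

# Assumption: a schema is modeled as a mapping of column name -> type name.
Schema = Dict[str, str]


class PathCatalog:
    """Hypothetical path-based catalog: a table exists only if its path is known."""

    def __init__(self) -> None:
        self._tables: Dict[str, Schema] = {}

    def load_schema(self, path: str) -> Optional[Schema]:
        # None means the path (and therefore the table) does not exist.
        return self._tables.get(path)

    def create(self, path: str, schema: Schema) -> None:
        self._tables[path] = dict(schema)


def plan_write(catalog: PathCatalog, path: str, query_schema: Schema,
               validate: bool = True) -> str:
    """Return the name of the plan a writer would pick for this path."""
    existing = catalog.load_schema(path)
    if existing is None:
        # No table yet: create it from the query's schema. There is no
        # existing schema to validate against.
        catalog.create(path, query_schema)
        return "CreateTableAsSelect"
    if validate and existing != query_schema:
        # Assumption: strict equality stands in for source-configured rules.
        raise ValueError(f"schema mismatch for {path}: {existing} vs {query_schema}")
    return "Overwrite"
```

With this model, the first write to a new path yields `CreateTableAsSelect` and later writes yield `Overwrite`. A source that opts out of validation would correspond to passing `validate=False`.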
