[GitHub] rdblue commented on issue #23829: [SPARK-26915][SQL]File source should write without schema validation in DataFrameWriter.save()

GitBox Tue, 19 Feb 2019 10:39:21 -0800

rdblue commented on issue #23829: [SPARK-26915][SQL]File source should write 
without schema validation in DataFrameWriter.save()
URL: https://github.com/apache/spark/pull/23829#issuecomment-465256668
 
 
   @cloud-fan, here are my answers:
   
   > 1. file source should not have schema validation during write
   
   Validation should be configured by the source, just like we talked about for 
sources that can accept data with missing columns.
   
   I think the larger issue is finding out what the correct behavior is. What 
tables should opt out of validation? What tables should just use different 
rules, like allowing new columns?
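   To make source-configured validation concrete, here is a minimal Python sketch. It assumes a table can declare one of a few validation policies: strict, allow missing columns, allow new columns, or opt out entirely. The `WriteValidation` enum and `validate_write` function are hypothetical names for illustration, not Spark APIs, and schemas are modeled as plain name-to-type dicts.

```python
from enum import Enum
from typing import Dict, List

# Hypothetical per-table policies; these names are illustrative, not Spark APIs.
class WriteValidation(Enum):
    STRICT = "strict"                # query columns must match the table exactly
    ALLOW_MISSING = "allow_missing"  # query may omit some table columns
    ALLOW_NEW = "allow_new"          # query may add columns the table lacks
    NONE = "none"                    # source opts out of validation entirely

Schema = Dict[str, str]  # column name -> type name

def validate_write(table: Schema, query: Schema, policy: WriteValidation) -> List[str]:
    """Return a list of problems; an empty list means the write is accepted."""
    if policy is WriteValidation.NONE:
        return []
    errors: List[str] = []
    # Columns produced by the query: reject unknown ones unless new columns are allowed.
    for name, dtype in query.items():
        if name not in table:
            if policy is not WriteValidation.ALLOW_NEW:
                errors.append(f"unexpected column: {name}")
        elif table[name] != dtype:
            errors.append(f"type mismatch for {name}: {dtype} vs {table[name]}")
    # Columns the table expects: reject omissions unless missing columns are allowed.
    for name in table:
        if name not in query and policy is not WriteValidation.ALLOW_MISSING:
            errors.append(f"missing column: {name}")
    return errors
```

   For example, with `table = {"id": "bigint", "data": "string"}`, writing `{"id": "bigint"}` fails under `STRICT` with `missing column: data` but is accepted under `ALLOW_MISSING`. A file source could pick `NONE` while a catalog table keeps `STRICT`; deciding which tables should use which policy is the open question above.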
   
   > 2. file source can't report schema during write, if the output path 
doesn't exist
   
   In this case, the table catalog that supports path-based tables will check 
existence. If the path doesn't exist, then the table doesn't exist. Then the 
writer should use a `CreateTableAsSelect` plan instead of an overwrite plan. 
CTAS doesn't validate against an existing schema; it creates the table using 
the given schema.
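   A rough sketch of that decision, again in Python and under stated assumptions: `PathCatalog` and `plan_save` are hypothetical, a path-based table is modeled as a path mapped to a schema, and the returned strings only label which plan would be chosen. Validation of an existing table is reduced to an exact schema match here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Schema = Dict[str, str]  # column name -> type name

@dataclass
class PathCatalog:
    """Hypothetical catalog of path-based tables: path -> schema."""
    tables: Dict[str, Schema]

    def load_table(self, path: str) -> Optional[Schema]:
        # If the path doesn't exist, the table doesn't exist.
        return self.tables.get(path)

def plan_save(catalog: PathCatalog, path: str, query_schema: Schema) -> str:
    existing = catalog.load_table(path)
    if existing is None:
        # No table yet: create it from the query's schema; nothing to validate against.
        return "CreateTableAsSelect"
    # Table exists: overwrite it, validating the query against the existing schema.
    if existing != query_schema:
        raise ValueError(f"schema mismatch for {path}")
    return "Overwrite"
```

   With `PathCatalog({"/data/t": {"id": "bigint"}})`, saving to `/data/new` chooses CTAS regardless of the query schema, while saving to `/data/t` goes through the overwrite path and its validation.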

