cloud-fan commented on issue #23836: [SPARK-26915][SQL] DataFrameWriter.save() should write without schema validation
URL: https://github.com/apache/spark/pull/23836#issuecomment-465408204
 
 
   I definitely agree with the direction: translate `SaveMode` to operators 
with clear semantics, and remove `SaveMode` from ds v2 while keeping it in the 
public API for a while.
   
   However, I think the current translation is not precise: append mode doesn't 
mean a plain append; it actually means "create the table if it does not exist, 
or append to the existing table". At least this is the case for the file source 
and the JDBC source.
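To make that concrete, here is a toy in-memory model of what append mode actually does for such sources (all names below are made up for illustration; this is not Spark's API):

```java
import java.util.*;

// Toy model of the behavior described above: for file/JDBC sources,
// SaveMode.Append is really "create the table if it does not exist,
// then append" rather than a pure append operator.
class ToyAppendSource {
    final Map<String, List<String>> tables = new HashMap<>();

    void saveAppend(String table, List<String> rows) {
        // Implicit create-if-not-exists happens first, then the actual append.
        tables.computeIfAbsent(table, k -> new ArrayList<>()).addAll(rows);
    }
}
```

The first call creates the table as a side effect; only subsequent calls behave like a true append.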
   
   The next problem is how to implement "create table if not exists, or append 
to the existing table" with the ds v2 APIs. I have 2 proposals:
   1. keep the "catalog -> table -> write builder -> write" flow, and implement 
it in 2 steps: a) create the table if it does not exist; b) do a normal append.
   2. slightly change the abstraction to "catalog -> staged table -> write 
builder -> write", so that we can write data to a non-existing table and make 
the entire process atomic.
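Proposal 1 could be sketched roughly like this (every interface and method name below is hypothetical, chosen only to mirror the "catalog -> table -> write builder -> write" chain; none of it is the real ds v2 API):

```java
import java.util.*;

// Illustrative-only interfaces for the chain in proposal 1.
interface Write        { void commit(List<String> rows); }
interface WriteBuilder { Write buildForAppend(); }
interface Table        { WriteBuilder newWriteBuilder(); }
interface Catalog {
    boolean tableExists(String name);
    Table createTable(String name);
    Table loadTable(String name);
}

class Proposal1 {
    // "create table if not exists, or append", done as two separate steps.
    static void createOrAppend(Catalog catalog, String name, List<String> rows) {
        // step a) create the (empty) table if it does not exist
        Table t = catalog.tableExists(name)
                ? catalog.loadTable(name)
                : catalog.createTable(name);
        // step b) do a normal append through the write path
        t.newWriteBuilder().buildForAppend().commit(rows);
    }
}

// Minimal in-memory catalog so the sketch runs end to end.
class InMemoryCatalog implements Catalog {
    final Map<String, List<String>> data = new HashMap<>();

    public boolean tableExists(String name) { return data.containsKey(name); }

    public Table createTable(String name) {
        data.put(name, new ArrayList<>());
        return loadTable(name);
    }

    public Table loadTable(String name) {
        List<String> store = data.get(name);
        // Table -> WriteBuilder -> Write, each as a one-method lambda.
        return () -> () -> rows -> store.addAll(rows);
    }
}
```

Note that the two steps are independent operations, which is exactly why this version cannot be atomic.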
   
   For proposal 1, the file source doesn't work because it can't create an 
empty table (it doesn't have a metastore). I guess other data sources will face 
the same issue. It also requires the catalog API, which is not done yet.
   
   I think proposal 2 is better. It's useful even after we have the catalog 
API, e.g. to implement atomic CTAS.
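A similarly hypothetical sketch of proposal 2 (again, all names are invented for illustration): the staged table buffers both the table creation and the written data, and nothing becomes visible in the catalog until commit, which is what makes the whole operation (and CTAS) atomic.

```java
import java.util.*;

// A staged table buffers the creation and the written rows; the catalog is
// only updated on commit, and left untouched on abort.
class StagedTable {
    final String name;
    final List<String> buffered = new ArrayList<>();
    private final Map<String, List<String>> published;

    StagedTable(String name, Map<String, List<String>> published) {
        this.name = name;
        this.published = published;
    }

    void write(List<String> rows) { buffered.addAll(rows); }

    // Atomically make the table (and everything written to it) visible.
    void commitStagedChanges() {
        published.computeIfAbsent(name, k -> new ArrayList<>()).addAll(buffered);
    }

    // On failure, drop the staged state; the catalog never saw the table.
    void abortStagedChanges() { buffered.clear(); }
}

class StagingCatalog {
    final Map<String, List<String>> tables = new HashMap<>();

    // "catalog -> staged table -> write": works even for non-existing tables.
    StagedTable stageCreateOrAppend(String name) {
        return new StagedTable(name, tables);
    }
}
```

Until `commitStagedChanges()` runs, readers of the catalog see no trace of the new table, so a failed write leaves no half-created state behind.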
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
