HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-738336115


   > Per #30521 (comment), for the partition column. Now the param only takes 
effects in the v1 sink. How do we address the conflict between user input and 
the partitioning for the existing table? If we want to let the data source 
decide, the V2 plan didn't carry the partition info for now.
   
   I don't know which is the best way to handle this. Neither option seems good.
   
   The partition column configuration was added without table support. 
Without table support, we don't know whether the data source has such partition 
information or not, so we're forced to "always" provide the information, even 
when it's unnecessary.
   (I don't want this to happen again with tables - that's one of the 
reasons I don't like the proposal to create the table by default. But I'm fine if 
there's a way to avoid it, as I said in "lowest bar".)
   
   With an existing table, the table should already have partition information, 
hence the configuration is useless unless we mean to create the table. In 
DataFrameWriterV2, once you provide the partition information or a table 
property, you are no longer able to append. You are forced to create or 
replace, which should always respect the input or simply fail. There's no 
confusion on that part.
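   
   To illustrate that split, here is a minimal, self-contained sketch. It is a simplified model and not Spark's source: `TableSpec`, `CreateOnlyWriter`, and `TableWriter` are hypothetical names, and each operation returns a string so it runs without Spark. The idea it models is that supplying partitioning or a table property moves you to a writer type that has create/replace operations but no `append()`.

```scala
// Simplified model of the type split described above (hypothetical names,
// not Spark's actual classes). Operations return strings so no Spark is needed.
final case class TableSpec(
    name: String,
    partitioning: Seq[String] = Seq.empty,
    properties: Map[String, String] = Map.empty)

// Reached as soon as partitioning or a table property is supplied.
// It offers create/replace style operations only; there is no append().
class CreateOnlyWriter(val spec: TableSpec) {
  def partitionedBy(cols: String*): CreateOnlyWriter =
    new CreateOnlyWriter(spec.copy(partitioning = cols.toSeq))

  def tableProperty(key: String, value: String): CreateOnlyWriter =
    new CreateOnlyWriter(spec.copy(properties = spec.properties + (key -> value)))

  def create(): String = describe("CREATE")

  def createOrReplace(): String = describe("CREATE OR REPLACE")

  private def describe(op: String): String = {
    val parts =
      if (spec.partitioning.isEmpty) ""
      else spec.partitioning.mkString(" PARTITIONED BY (", ", ", ")")
    s"$op ${spec.name}$parts"
  }
}

// Entry point: append() exists only here, before any table definition is given.
final class TableWriter(name: String) extends CreateOnlyWriter(TableSpec(name)) {
  def append(): String = s"APPEND ${spec.name}"
}

object WriterSketch {
  def main(args: Array[String]): Unit = {
    println(new TableWriter("events").append())
    println(new TableWriter("events").partitionedBy("date").create())
    // The next line would not compile, because partitionedBy returns CreateOnlyWriter:
    // new TableWriter("events").partitionedBy("date").append()
  }
}
```

   Because the restriction lives in the return type, the conflict between user-supplied partitioning and an existing table's partitioning cannot be expressed on the append path at all. DataStreamWriter has no comparable split, which is where the ambiguity discussed here comes from.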
   
   The more I revisit DataFrameWriterV2, the more I realize how much 
DataStreamWriter is lacking in table support. That was OK (no one should be 
blamed), because there was no support for table writes, but that's no longer an 
excuse once we are adding it. A simple table write was also OK as we should just 
follow the table information, but we're now creating tables as well.
   
   Anyway, it would be safer to follow what we do with SaveMode.Append in 
DataFrameWriter.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


