HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-736957997


   > You can perform complete mode writes, which overwrites the entire data 
every time.
   
   Sorry, I probably wasn't clear. This isn't true for the DSv1 Sink interface unless the data source hacks around it by requiring the output mode to be passed directly as a data source option. A DSv1 sink has no idea what the output mode is, and that's what I've been concerned about: output mode is effectively a no-op as far as DSv1 sink behavior goes. For backward compatibility we allow update/complete to run as append, but that is only to avoid breaking old data sources, and we shouldn't keep doing it.
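   To make the point concrete, the DSv1 sink contract boils down to a single `addBatch` callback whose signature carries no output mode at all. Here is a minimal self-contained stand-in (not the real Spark classes; the real trait takes a `DataFrame`, and the toy sink here is invented for illustration):

   ```scala
   import scala.collection.mutable.ArrayBuffer

   // Stand-in for the DSv1 Sink contract. Note the signature: the sink only
   // receives the batch id and the batch's data. Nothing tells it whether the
   // query is running in append, update, or complete mode.
   trait Sink {
     def addBatch(batchId: Long, data: Seq[String]): Unit
   }

   // A toy sink: it can only ever append each batch, whatever the output
   // mode was supposed to be -- which is why output mode is effectively a
   // no-op for DSv1 sinks.
   class CollectingSink extends Sink {
     val batches = ArrayBuffer.empty[Seq[String]]
     def addBatch(batchId: Long, data: Seq[String]): Unit = batches += data
   }
   ```

   A sink that wants complete/update semantics has no signal to act on; the only workaround is the option-passing hack mentioned above.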
   
   I already raised a related discussion on the dev mailing list months ago, but got no response. I wish we wouldn't ignore discussion threads on the dev mailing list.
   
http://apache-spark-developers-list.1001551.n3.nabble.com/Output-mode-in-Structured-Streaming-and-DSv1-sink-DSv2-table-tt30216.html#a30239
   
   > Users are LAAAAZZY. As a developer, I would also prefer that people 
explicitly create their tables first, but plenty of users complain about that 
workflow.
   
   I agree with this, but users don't always want a table to be created when it doesn't exist. That's why there's `append` among the save modes, and we have no such thing in the new approach. Yes, users are lazy, and that also means they don't always want to assume a new table might be created and provide all the information needed for table creation. If the table already exists, those provided options are meaningless and just a burden (and quite confusing if the existing table has different options).
   
   > Can't we parse the string partitions as expressions?
   
   ~The DSv1 interface doesn't allow providing expressions for partitions. Please refer to the definition of DataSource; it is entirely the data source's role to parse and interpret the string partition columns. This is quite different from what we do for DSv2. That said, we can't fully leverage the create-table functionality of DSv2 from interfaces based on DSv1, like DataStreamWriter.~
   
   My bad, you're probably talking about DSv2. Even in DataFrameWriter we don't do that (please correct me if I'm mistaken); please refer to `DataFrameWriter.partitioningAsV2`. The difference between DataFrameWriter and DataFrameWriterV2 is not just the removal of save mode: DataFrameWriter doesn't fully support DSv2 table creation, which is exactly the problem I pointed out. In a batch query you can prevent unintentionally creating a DSv2 table with immature table properties by using the "append" save mode, or use DataFrameWriterV2 to create the DSv2 table with full support. There's no such option in the streaming path.
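   To sketch the asymmetry I mean, here is a toy model (not Spark code; every name below is invented for illustration): a batch-style writer where table creation is a deliberate, fully-specified call and append refuses to create anything, versus a streaming-style writer that can only create the missing table implicitly with whatever options happen to be on hand.

   ```scala
   import scala.collection.mutable

   // Toy catalog shared by both writers.
   object Catalog {
     val tables = mutable.Map.empty[String, Map[String, String]]
   }

   // Batch-style path (analogous in spirit to DataFrameWriterV2):
   // creation is an explicit call carrying full table properties,
   // and append never creates a table behind your back.
   class ToyBatchWriterV2(table: String) {
     private var props = Map.empty[String, String]
     def tableProperty(k: String, v: String): ToyBatchWriterV2 = { props += (k -> v); this }
     def create(): Unit = Catalog.tables(table) = props
     def append(): Unit =
       require(Catalog.tables.contains(table), s"table $table does not exist")
   }

   // Streaming-style path: if the table is missing it gets created
   // implicitly, possibly with immature properties -- the problem
   // described above, with no "append-only, never create" alternative.
   class ToyStreamWriter(table: String) {
     def start(options: Map[String, String]): Unit =
       if (!Catalog.tables.contains(table)) Catalog.tables(table) = options
   }
   ```

   With the batch writer the user chooses between "create with full spec" and "append or fail"; the streaming writer offers no way to opt out of implicit creation.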


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
