HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-736957997
> You can perform complete mode writes, which overwrites the entire data every time.

Sorry, I probably wasn't clear. This isn't true for the DSv1 Sink interface unless the data source resorts to a hack that requires the output mode to be passed directly as a data source option. A DSv1 sink has no knowledge of the output mode, and that's what I have been concerned about. Output mode is effectively a no-op, at least as far as the behavior of a DSv1 sink is concerned. For backward compatibility we allow update/complete to be handled as append, but that's only to avoid breaking old data sources, and we shouldn't continue doing this.

I raised a related discussion on the dev mailing list months ago, but got no response. I wish we wouldn't ignore discussion threads on the dev mailing list.

http://apache-spark-developers-list.1001551.n3.nabble.com/Output-mode-in-Structured-Streaming-and-DSv1-sink-DSv2-table-tt30216.html#a30239

> Users are LAAAAZZY. As a developer, I would also prefer that people explicitly create their tables first, but plenty of users complain about that workflow.

I agree with this, but users don't always want a table to be created if it doesn't exist. That's the reason there's `append` in save mode, and we have nothing like it in the new approach. Yes, users are lazy, which also means they don't always want to assume a new table could be created and to provide all the information needed for table creation. If the table exists, the provided options are meaningless and just a burden (and also quite confusing if the existing table has different options).

> Can't we parse the string partitions as expressions?

~DSv1 interface doesn't allow providing an expression for partitioning. Please refer to the definition of DataSource. It would be entirely the data source's role to parse and interpret the string partition column. This is quite different from what we do for DSv2.
That said, we can't fully leverage the functionality of create table against DSv2 in interfaces based on DSv1, like DataStreamWriter.~

My bad, you're probably talking about DSv2. Even in DataFrameWriter we don't do that (please correct me if I'm mistaken); please refer to `DataFrameWriter.partitioningAsV2`.

The difference between DataFrameWriter and DataFrameWriterV2 is not only the removal of save mode. DataFrameWriter doesn't fully support DSv2 table creation, which is exactly the problem I pointed out. In a batch query, you can avoid unintentionally creating a DSv2 table with incomplete table properties by using "append" as the save mode, or you can use DataFrameWriterV2 to create a DSv2 table with full support. There's no such thing in the streaming path.
