HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-736957997


   > You can perform complete mode writes, which overwrites the entire data 
every time.
   
   Sorry, I probably wasn't clear. This isn't true for the DSv1 Sink interface
unless the data source resorts to the hack of requiring the output mode as a
data source option. The sink has no idea of the output mode in DSv1, and that's
what I have been concerned about: output mode is effectively a no-op for DSv1
sinks. We allow update/complete queries to be handled as append, but that's
only to avoid breaking backward compatibility with old data sources, and we
shouldn't keep doing this.
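   To make this concrete, here is a simplified sketch (not the real Spark
classes - the actual DSv1 trait is
`org.apache.spark.sql.execution.streaming.Sink`, whose `addBatch` takes a
`DataFrame`). The sink only ever sees a batch id and the batch's rows, never
the output mode:

```scala
// Simplified model of the DSv1 sink contract: addBatch receives only the
// batch id and the batch's rows. The query's output mode is never passed
// in, so an update/complete query is indistinguishable from append here.
trait Sink {
  def addBatch(batchId: Long, rows: Seq[String]): Unit
}

// A sink written against this contract can only ever append; it gets no
// signal telling it to truncate existing data for a complete-mode query.
class AppendOnlySink extends Sink {
  val stored = scala.collection.mutable.ArrayBuffer.empty[String]
  override def addBatch(batchId: Long, rows: Seq[String]): Unit =
    stored ++= rows
}
```

   In DSv2 this information is conveyed structurally instead - for example, a
writer that can handle complete mode implements `SupportsTruncate`.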
   
   I already raised a related discussion on the dev mailing list months ago,
but got no response. I wish we wouldn't ignore discussion threads on the dev
mailing list.
   
http://apache-spark-developers-list.1001551.n3.nabble.com/Output-mode-in-Structured-Streaming-and-DSv1-sink-DSv2-table-tt30216.html#a30239
   
   > Users are LAAAAZZY. As a developer, I would also prefer that people 
explicitly create their tables first, but plenty of users complain about that 
workflow.
   
   I agree about this, but users don't always want a table to be created when
it doesn't exist. That's why save mode has `append`, and we have no equivalent
in the new approach. Yes, users are lazy, and that also means they don't always
want to assume a new table could be created and provide all the information
needed for table creation. If the table already exists, those provided options
are meaningless and just a burden (and quite confusing if the existing table
has different options).
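   A toy model of the asymmetry (hypothetical names, just to make the point
concrete): with an explicit append mode no creation options are ever needed,
while with create-if-not-exists semantics the caller must always supply them
"just in case" - and they are silently ignored when the table already exists:

```scala
import scala.collection.mutable

class ToyCatalog {
  private val tables = mutable.Map.empty[String, Map[String, String]]

  // Append-style write: never creates a table, so the caller needs no
  // creation options; a missing table is an explicit error.
  def append(name: String): Either[String, Unit] =
    if (tables.contains(name)) Right(()) else Left(s"table $name does not exist")

  // Create-if-not-exists write: creation options must always be supplied,
  // and are silently dropped when the table already exists. Returns the
  // options the table actually has.
  def createIfNotExists(name: String, options: Map[String, String]): Map[String, String] =
    tables.getOrElseUpdate(name, options)
}
```

   The second caller in such a scheme has no way to tell whether its options
were applied or discarded, which is exactly the confusion described above.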
   
   > Can't we parse the string partitions as expressions?
   
   ~The DSv1 interface doesn't allow providing expressions for partitioning;
please refer to the definition of DataSource. Parsing and interpreting the
string partition columns would be entirely the data source's role. This is
quite different from what we do for DSv2. That said, we can't fully leverage
the create-table functionality of DSv2 in interfaces based on DSv1, like
DataStreamWriter.~
   
   My bad - you're probably talking about DSv2. Even in DataFrameWriter we
don't do that (please correct me if I'm mistaken) - see
`DataFrameWriter.partitioningAsV2`. The difference between DataFrameWriter and
DataFrameWriterV2 is not only the removal of save mode: DataFrameWriter doesn't
fully support DSv2 table creation either, which is the same problem I pointed
out. In a batch query you can prevent creating a wrong DSv2 table by using the
"append" save mode, or use DataFrameWriterV2 for DSv2 tables. There's no such
option in the streaming path.
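   The gap can be sketched like this (a simplified model, not Spark's actual
classes - in Spark, `DataFrameWriter.partitionBy` takes column names while
`DataFrameWriterV2.partitionedBy` takes transform expressions):

```scala
// Simplified model of partitioning transforms as DSv2 sees them.
sealed trait Transform
case class IdentityTransform(col: String) extends Transform
case class DaysTransform(col: String) extends Transform
case class BucketTransform(numBuckets: Int, col: String) extends Transform

object PartitionSketch {
  // v1-style partitioning: plain column names can only ever be mapped to
  // identity transforms (roughly what partitioningAsV2 does).
  def partitioningAsV2(colNames: Seq[String]): Seq[Transform] =
    colNames.map(IdentityTransform.apply)

  // v2-style partitioning: the caller hands over transforms directly, so
  // expressions like days(ts) or bucket(16, id) are representable.
  def partitionedBy(transforms: Transform*): Seq[Transform] = transforms
}
```

   Anything beyond identity partitioning is simply not expressible through the
string-based v1-style API, so it can never round-trip into a full DSv2 table
definition.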


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


