[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

GitBox Wed, 02 Dec 2020 12:14:03 -0800


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-737466463



   >> You have no idea of output mode in DSv1, and that's what I have been 
concerned about.
   
   > We pass the output mode to the sink here:
   
   My bad, please disregard the sentence `You have no idea of output mode in
DSv1`. I still think just passing the output mode to the sink is not good
enough compared to what we do in DSv2. In DSv2 there's a check for
availability, which works as enforcement. In both cases update mode is never
supported properly, but at least for complete mode DSv2 enforces truncate.
   
   > Why not make the behavior of creating table consistent with creating an 
output path?
   
   I don't think the two need to be consistent; otherwise we should just remove
append mode from saveAsTable in batch queries. If we care about consistency with
the batch path, it should be possible to not create the table even if the table
doesn't exist.
   
   I have commented multiple times to explain why we should not create the
table by default and how DataStreamWriter cannot cover all cases, so I'll just
quote my comments instead.
   
   >> Users are LAAAAZZY. As a developer, I would also prefer that people 
explicitly create their tables first, but plenty of users complain about that 
workflow.
   
   > I agree with this, but users do not always want to create a table if it
doesn't exist. That's the reason there's append in save mode, and we don't have
anything like it in the new approach. Yes, users are lazy, and that means they
don't always want to assume a new table could be created and provide all the
information needed for table creation. If the table exists, the provided options
are meaningless and just a burden (and also quite confusing if the existing
table has different options).
   
   >> Can't we parse the string partitions as expressions?
   
   > My bad, you're probably talking about DSv2. Even in DataFrameWriter we
don't do that (please correct me if I'm mistaken) - please refer to
DataFrameWriter.partitioningAsV2. The difference between DataFrameWriter and
DataFrameWriterV2 is not only the removal of save mode. DataFrameWriter doesn't
fully support DSv2 table creation - exactly the same problem I pointed out. In
a batch query, you can avoid unintentionally creating a DSv2 table with
incomplete table properties by using the save mode "append", or use
DataFrameWriterV2 to create a DSv2 table with full support. There's no such
thing in the streaming path.
   
   > I think leveraging the old (probably DSv1) options is not sufficient -
this doesn't fully cover DSv2 tables - no Transform on partitioning, no
properties, no options.
   > Using the source (via format(...)) as USING <provider> is also not
intuitive - it is only effective when table creation takes place, and that
happens implicitly.
   
   Overall, I don't see any deep consideration of v2 tables here, whereas my
initial rationale for adding the API was to enable support for v2 tables. Can
we please stop thinking only about v1 tables and ensure we also cover v2 tables?
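
   To illustrate the batch-side difference described in the quoted comments,
here is a minimal sketch. The catalog name `testcat`, the table names, the
`region` and `ts` columns, and the `owner` property are all hypothetical, and
the v2 calls assume a v2 catalog is configured under that name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, days}

object BatchTableWrites {
  // DataFrameWriter: partitioning is given as plain column names, and the
  // save mode is the only way to express intent toward the target table.
  def appendWithDataFrameWriter(df: DataFrame): Unit =
    df.write
      .mode("append")
      .partitionBy("region")          // column names only, no transforms
      .saveAsTable("db.events")       // hypothetical table name

  // DataFrameWriterV2: table creation is an explicit call and can carry a
  // provider, partition transforms, and table properties.
  def createWithDataFrameWriterV2(df: DataFrame): Unit =
    df.writeTo("testcat.db.events")   // hypothetical v2 catalog and table
      .using("parquet")
      .partitionedBy(days(col("ts"))) // a transform, not just a column name
      .tableProperty("owner", "etl")  // hypothetical property
      .create()

  // DataFrameWriterV2: appending is a separate explicit call that does not
  // create the table; it fails if the table does not exist.
  def appendWithDataFrameWriterV2(df: DataFrame): Unit =
    df.writeTo("testcat.db.events").append()
}
```

   The point of the sketch is only that the batch path lets the caller choose
between "create with full table metadata" and "append to what exists"; it is
not a proposal for the streaming API's shape.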


