[GitHub] [spark] HeartSaVioR edited a comment on pull request #30521: [SPARK-33577][SS] Add support for V1Table in stream writer table API

GitBox Thu, 03 Dec 2020 15:55:22 -0800


HeartSaVioR edited a comment on pull request #30521:
URL: https://github.com/apache/spark/pull/30521#issuecomment-738316097



   Apologies, I misunderstood you. That was not an assumption, got it. `in most 
existing of cases, table can be viewed as an alias of a path` - this could be 
true, but we should account for the fact that the revised DSv2 API was only 
released in 3.0.0, this past summer. It's unfair to compare the two as if it 
were apples to apples.
   
   My request has been clear: respect v2 tables. v2 tables are being 
disregarded in this discussion, even though I have made clear multiple times 
that this API was intended to support writing to a v2 catalog table with a 
streaming query, which "wasn't possible" before.
   
   From the viewpoint of v2 tables, creating the table by default is certainly 
undesirable. Even if the table has partitioning information, end users have to 
account for "what if the table doesn't exist?" and either create the table 
explicitly every time or add the partitioning information to the write path 
every time. The latter is not even possible if they use a non-identity 
transform. The same goes for table properties, except that for table 
properties we don't even give them a chance to provide them. So their only 
workaround is to ensure the table is created beforehand every time, which is 
not something I can agree with.
   
   A similar problem happens on the batch side when you use DataFrameWriter 
without SaveMode.Append, but I don't claim it's wrong because 1) there's still 
SaveMode.Append, which doesn't enforce creating the table, and 2) there's a 
dedicated API that provides full support for v2 tables. Eventually the 
streaming path must provide the same, but 2) can be deferred via SPARK-33638. 
1) should be done now.
   
   That defines the lowest bar:
   
   1. One way or another, end users should be able to choose between "create + 
append" and "append". I can step back on the default value if we really insist 
on having only a single method (so the default value of the create-table flag 
parameter can be true), as end users can still avoid the problem by turning 
the flag off.
   
   2. The limitations of v2 table support in DataStreamWriter must be 
explained thoroughly in the javadoc of `toTable`. End users deserve to know 
about them and decide how to deal with them.
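   
   To make the choice in item 1 concrete, here is a minimal toy sketch in 
plain Python. It is not Spark code: `to_table`, `create_if_missing`, `Table`, 
`TableNotFoundError`, and the `days(ts)` partitioning string are all 
hypothetical names used only to model the "create + append" versus "append" 
behavior discussed above.

```python
from dataclasses import dataclass, field


class TableNotFoundError(Exception):
    """Raised when appending to a table that does not exist."""


@dataclass
class Table:
    # Metadata a v2 table may carry; an auto-creating writer cannot supply it.
    partitioning: tuple = ()
    properties: dict = field(default_factory=dict)
    rows: list = field(default_factory=list)


def to_table(catalog, name, rows, create_if_missing=True):
    """Hypothetical writer: append rows, optionally creating the table first."""
    table = catalog.get(name)
    if table is None:
        if not create_if_missing:
            # "append" only: fail loudly instead of creating a bare table.
            raise TableNotFoundError(name)
        # "create + append": the new table has no partitioning or properties.
        table = catalog[name] = Table()
    table.rows.extend(rows)
    return table


catalog = {"events": Table(partitioning=("days(ts)",), properties={"owner": "etl"})}

# Existing table: its metadata is untouched in either mode.
to_table(catalog, "events", [{"id": 1}])

# Misspelled name with the default flag: a bare table is silently created.
bare = to_table(catalog, "evnts", [{"id": 2}])
print(bare.partitioning, bare.properties)

# Same kind of mistake with the flag off: the problem surfaces immediately.
try:
    to_table(catalog, "evnts2", [{"id": 3}], create_if_missing=False)
except TableNotFoundError as exc:
    print("missing table:", exc)
```

   In this model, the flag is what lets a user who manages table metadata 
separately opt out of implicit creation, even if the default stays true.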
   
   Additionally, filing SPARK-33638 doesn't mean I am OK with leaving the lack 
of v2 table support unaddressed. It is deferred only because time is running 
out for 3.1.0. For symmetric support across batch and streaming, SPARK-33638 
should probably be a blocker for Spark 3.2.0.




