HeartSaVioR edited a comment on pull request #30521: URL: https://github.com/apache/spark/pull/30521#issuecomment-738316097
I apologize for misunderstanding you. That was not an assumption, got it.

`in most existing of cases, table can be viewed as an alias of a path` - this could be true, but we should account for the fact that the revised DSv2 API was only released in 3.0.0, this past summer. It's unfair to compare the two as if it were apples to apples.

My request has been clear: respect v2 tables. v2 tables have been disregarded in this discussion, even though I made clear multiple times that this API was intended to support writing to a v2 catalog table with a streaming query, which "wasn't possible" before.

From the point of view of a v2 table, creating the table by default is certainly a bad fit. Even when the table already has partitioning information, end users have to ask "what if the table doesn't exist?" and either always create the table up front or always add the partitioning information to the write path. The latter is not even possible if they use a non-identity transform. The same goes for table properties, except that for table properties we don't even give users a way to provide them. So their only workaround is to always create the table themselves, which is not something I can agree with.

A similar problem happens on the batch side when you use DataFrameWriter without SaveMode.Append, but I don't claim that is wrong, because 1) there is still SaveMode.Append, which doesn't force table creation, and 2) there is a dedicated API that provides full support for v2 tables. Eventually the streaming path must provide the same, but 2) can be deferred via SPARK-33638. 1) should be done now.

That defines the lowest bar:

1. One way or another, end users should be able to choose between "create + append" and "append". I can step back on the default value if we really insist on having only a single method (so the default value of the create-table flag parameter can be true), since end users can still avoid the problem by turning the flag off.
2. The limitations of v2 table support in DataStreamWriter must be explained clearly in the javadoc of `toTable`.
End users deserve to know about it and decide how to deal with it.

Additionally, filing SPARK-33638 doesn't mean I am OK with leaving the lack of v2 table support unaddressed. It is only deferred because time is running out for 3.1.0. For symmetric support across batch and streaming, SPARK-33638 should probably be a blocker for Spark 3.2.0.
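
To illustrate the choice between "create + append" and "append" from the lowest bar, here is a toy in-memory sketch in Python. It is not Spark code and does not reflect how `toTable` is implemented: `ToyCatalog`, `Table`, `to_table`, `TableNotFoundError`, and the `create_if_missing` flag are all hypothetical names made up for this sketch. It only models the concern that a writer which creates a missing table implicitly has no partitioning or table properties to give it.

```python
from dataclasses import dataclass, field


class TableNotFoundError(Exception):
    """Hypothetical error: append-only write to a table that does not exist."""


@dataclass
class Table:
    partitioning: tuple = ()
    properties: dict = field(default_factory=dict)
    rows: list = field(default_factory=list)


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a v2 catalog; just a dict of tables."""
    tables: dict = field(default_factory=dict)

    def create(self, name, partitioning=(), properties=None):
        # Explicit creation is the only place partitioning/properties can be set.
        self.tables[name] = Table(tuple(partitioning), dict(properties or {}))
        return self.tables[name]


def to_table(catalog, name, rows, create_if_missing=True):
    """Toy write path (assumption: a single method with a create flag)."""
    table = catalog.tables.get(name)
    if table is None:
        if not create_if_missing:
            # "append" mode: never create, surface the missing table instead.
            raise TableNotFoundError(name)
        # "create + append" mode: the writer carries no partitioning or
        # properties, so the implicitly created table has neither.
        table = catalog.create(name)
    table.rows.extend(rows)
    return table
```

With the flag turned off, a write to a table created up front with, say, `partitioning=("bucket(16, id)",)` keeps that layout, and a write to a missing table fails instead of silently creating an unpartitioned one. With the flag left on, a missing table is created with empty partitioning and properties, which is the situation described above.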
