Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi all,

I have started a vote for this FLIP [1]. Please vote there [2], or ask any additional questions here [3].

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement+in+batch+mode
[2] https://lists.apache.org/thread/fosvz0zcyfn6bp6vz2oxl45vq9qhkn2v
[3] https://lists.apache.org/thread/m4r3wrd7p96wdst3nz3ncqzog6kf51cf

Best regards,
Yuxia

----- Original Message -----
From: "Jark Wu"
To: "dev"
Sent: Friday, April 14, 2023, 11:04:58 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi Yuxia,

Thank you for the update. That sounds good to me.

Best,
Jark
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi Yuxia,

Thank you for the update. That sounds good to me.

Best,
Jark

> On April 14, 2023, at 19:00, yuxia wrote:
>
> Hi, Jark.
> I was expecting that if "executeTruncation" returns false, Flink would throw a
> generic exception like "Fail to execute truncate table statement."
> The connector implementation could also throw a more specific exception, like
> "Fail to execute truncate table statement because the table is being written by
> other jobs".
>
> But after thinking it over, I'm afraid that connector implementations would
> always return false and leave Flink itself to construct the exception, which may
> not be very useful because it carries much less information than a more
> specific exception thrown by the connector.
> So I decided to change it to `void executeTruncation()` and to add a reminder in
> the Javadoc of the method to throw an exception if the truncate operation
> has not been executed successfully.
> I have updated the FLIP.
>
> Best regards,
> Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, Jark.

I was expecting that if "executeTruncation" returns false, Flink would throw a generic exception like "Fail to execute truncate table statement." The connector implementation could also throw a more specific exception, like "Fail to execute truncate table statement because the table is being written by other jobs".

But after thinking it over, I'm afraid that connector implementations would always return false and leave Flink itself to construct the exception, which may not be very useful because it carries much less information than a more specific exception thrown by the connector.
So I decided to change it to `void executeTruncation()` and to add a reminder in the Javadoc of the method to throw an exception if the truncate operation has not been executed successfully.
I have updated the FLIP.

Best regards,
Yuxia

----- Original Message -----
From: "Jark Wu"
To: "dev"
Sent: Friday, April 14, 2023, 5:10:48 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

The FLIP looks good to me. +1 to start a vote.

I just have one question: what will happen if "executeTruncation" returns false without throwing any exception?

Best,
Jark
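The switch from a boolean result to `void executeTruncation()` discussed above can be illustrated with a minimal, self-contained sketch. It is not the FLIP's actual code: the interface `SupportsTruncateSketch`, the class `InMemoryTableSink`, the lock flag, and the exception message are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the proposed ability interface; not Flink's actual API. */
interface SupportsTruncateSketch {
    /**
     * Deletes all rows of the table.
     *
     * @throws IllegalStateException if the truncation could not be executed
     */
    void executeTruncation();
}

/** Hypothetical connector sink that keeps its rows in memory. */
class InMemoryTableSink implements SupportsTruncateSketch {
    private final List<String> rows = new ArrayList<>();
    private boolean lockedByOtherJob;

    void insert(String row) {
        rows.add(row);
    }

    int rowCount() {
        return rows.size();
    }

    void setLockedByOtherJob(boolean locked) {
        this.lockedByOtherJob = locked;
    }

    @Override
    public void executeTruncation() {
        if (lockedByOtherJob) {
            // A specific message explains why the statement failed,
            // which a bare `false` return value could not convey.
            throw new IllegalStateException(
                    "Failed to execute TRUNCATE TABLE: the table is being written by other jobs.");
        }
        rows.clear();
    }
}

class TruncateContractDemo {
    public static void main(String[] args) {
        InMemoryTableSink sink = new InMemoryTableSink();
        sink.insert("row-1");
        sink.executeTruncation();
        System.out.println("rows after truncate: " + sink.rowCount());

        sink.insert("row-2");
        sink.setLockedByOtherJob(true);
        try {
            sink.executeTruncation();
        } catch (IllegalStateException e) {
            System.out.println("truncate failed: " + e.getMessage());
        }
    }
}
```

With a `void` method, the only way for a connector to signal failure is to throw, so the reason for the failure travels with the exception instead of being replaced by a generic message built by the framework.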
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
The FLIP looks good to me. +1 to start a vote.

I just have one question: what will happen if "executeTruncation" returns false without throwing any exception?

Best,
Jark

On Thu, 13 Apr 2023 at 19:59, Jing Ge wrote:

> Thanks Yuxia for the clarification and FLIP update. The FLIP looks good!
>
> Best regards,
> Jing
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Thanks Yuxia for the clarification and FLIP update. The FLIP looks good!

Best regards,
Jing

On Mon, Apr 10, 2023 at 3:51 AM yuxia wrote:

> 1:
> Actually, considering Flink's implementation, Flink just provides the
> TRUNCATE TABLE syntax to help users simplify data management, as said in this
> FLIP, and pushes the implementation of TRUNCATE TABLE to the external connector.
> Normally, the effect of TRUNCATE TABLE is the same as DROP TABLE + CREATE
> TABLE. But the real difference/benefit depends on the implementation of the
> external connector.
> For example, for a DROP TABLE statement, some external connectors may also
> drop related views or other things.
> But for TRUNCATE TABLE, the connectors may just delete all data without
> any other operations.
>
> 2:
> At the very beginning, I was thinking about in which cases a user may want to
> truncate a temporary table.
> I thought users could always create the table in a catalog (if the table doesn't
> exist in a catalog) and truncate it. So I tended not to expose it to
> users.
> But after thinking it over again, I think it may be reasonable to support
> truncating a temporary table for the case where a user just wants to delete all
> data from a table in external storage without storing the metadata of
> the table in a catalog, so that other users/sessions can't see the metadata.
> I think we can relax the constraint to support truncating temporary
> tables. I have now updated the FLIP accordingly.
>
> 3:
> Thanks for your input. I agree that we can discuss it in a different FLIP.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jing Ge"
> To: "dev"
> Sent: Saturday, April 8, 2023, 3:05:11 AM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi yuxia,
>
> Thanks for raising this topic. It is indeed a useful feature. +1 for
> having it in Flink. I have some small questions, and it would be great if
> the related information could be described in the FLIP.
>
> 1. Speaking of data warehouse use cases, what is the benefit of using
> TRUNCATE TABLE over DROP TABLE + CREATE TABLE IF NOT EXISTS, considering
> the concrete Flink implementation? When would you suggest that users
> use TRUNCATE instead of DROP + CREATE, and vice versa?
>
> 2. Since some engines support it, could you describe your
> thoughts on why TRUNCATE TABLE does not support temporary tables?
>
> 3. Partition support is an important feature, AFAIC. It might
> deserve a different FLIP that considers e.g. TRUNCATE TABLE
> tt_dw_usr_exp_xxx PARTITION(dt='20230303') and ALTER TABLE
> tt_dw_usr_exp_xxx DROP IF EXISTS PARTITION(dt='20230303').
>
> Looking forward to your thoughts. Thanks!
>
> Best regards,
>
> Jing
>
> On 4/7/23 05:04, Jingsong Li wrote:
> > +1 for voting.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Apr 6, 2023 at 4:52 PM yuxia wrote:
> >> Hi everyone.
> >>
> >> If there are no other questions or concerns about the FLIP [1], I'd like
> >> to start the vote next Monday (4.10).
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> >>
> >> Best regards,
> >> Yuxia
> >>
> >> ----- Original Message -----
> >> From: "yuxia"
> >> To: "dev"
> >> Sent: Friday, March 24, 2023, 11:27:42 AM
> >> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >>
> >> Thanks all for your feedback.
> >>
> >> @Shammon FY
> >> My gut feeling is that the end user shouldn't care whether the TRUNCATE TABLE
> >> statement deletes the directory or moves it to the Trash directory.
> >> They only need to know that it will delete all rows from a table.
> >> To me, deleting the directory or moving it to trash is more likely a
> >> behavior at the external storage level than at the SQL statement level. In Hive,
> >> if the user configures Trash, files are moved to trash for the DROP statement.
> >> Also, I have hardly seen such usage with the TRUNCATE TABLE statement in
> >> other engines. What's more, to support it, we would have to extend the TRUNCATE
> >> TABLE syntax, which would then not be compliant with the SQL standard. I really don't
> >> want to do that, and I believe it would confuse users if we did.
> >>
> >> @Hang
> >> `TRUNCATE TABLE` is meant to delete all rows of a base table. So it
> >> makes no sense for a table source to implement it.
> >> If a user uses the TRUNCATE TABLE statement to truncate a table, the planner
> >> will only try to
> >> find the DynamicTableSink for the corresponding table.
> >>
> >> @Ran Tao
> >> 1: Than
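The planner behaviour described in the quoted reply to Hang (only the table's sink is consulted, and it must offer the truncate ability) might look roughly like the sketch below. Everything here is an assumption made for illustration: `SinkSketch`, `TruncateAbilitySketch`, `TruncatePlannerSketch`, and the error message are hypothetical and do not reproduce Flink's planner or its `DynamicTableSink` API.

```java
/** Hypothetical marker for a table sink; stands in for a real sink abstraction. */
interface SinkSketch {}

/** Hypothetical truncate ability; not Flink's actual interface. */
interface TruncateAbilitySketch {
    void executeTruncation();
}

/** A sink that supports truncation and records that it was invoked. */
class TruncatableSinkSketch implements SinkSketch, TruncateAbilitySketch {
    boolean truncated;

    @Override
    public void executeTruncation() {
        truncated = true;
    }
}

/** A sink without the truncate ability. */
class AppendOnlySinkSketch implements SinkSketch {}

class TruncatePlannerSketch {
    /** Runs TRUNCATE TABLE against the sink resolved for the table, or fails loudly. */
    static void truncate(String tableName, SinkSketch sink) {
        if (!(sink instanceof TruncateAbilitySketch)) {
            throw new UnsupportedOperationException(
                    "TRUNCATE TABLE is not supported for table " + tableName);
        }
        // No job is submitted in this sketch; the call happens directly in the client.
        ((TruncateAbilitySketch) sink).executeTruncation();
    }

    public static void main(String[] args) {
        TruncatableSinkSketch sink = new TruncatableSinkSketch();
        truncate("orders", sink);
        System.out.println("orders truncated: " + sink.truncated);

        try {
            truncate("logs", new AppendOnlySinkSketch());
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Table sources never appear in this lookup, which matches the argument that truncation deletes rows of a base table and therefore belongs on the write side.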
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi,

Thanks all for the valuable input.

If there are no other questions or concerns about the FLIP [1], I'd like to start the vote tomorrow (4.14).

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement

Best regards,
Yuxia

----- Original Message -----
From: "Aitozi"
To: "dev"
Sent: Thursday, April 13, 2023, 4:49:11 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Well, thanks xia for your clarification. I agree with your point and have no other concerns.

Best,
Aitozi.
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Well, thanks xia for your clarification. I agree with your point and have no other concerns.

Best,
Aitozi.

On Thu, Apr 13, 2023 at 16:17, yuxia wrote:
>
> Hi, Aitozi.
> Thanks for your input. I understand your concern. Although the external
> connector can update the metadata in the `executeTruncation` method,
> the Flink catalog can't be aware of the update in some cases. If the Hive
> catalog only stores Hive tables, everything will be fine.
> But if the Hive catalog also stores non-Hive tables, and a non-Hive table
> can't update the underlying Hive metastore, the Hive catalog will
> still return the old metadata.
>
> This problem is generic: it is not limited to the TRUNCATE TABLE
> statement, but also applies to other statements, like INSERT, UPDATE/DELETE, or other
> statements on the way.
> I think it deserves another dedicated discussion about what the Flink
> catalog is for, or whether we need to introduce some new mechanism for it.
>
> Best regards,
> Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, Aitozi.

Thanks for your input. I understand your concern. Although the external connector can update the metadata in the method `executeTruncation`, the Flink catalog may not be aware of the update in some cases. If the Hive catalog only stores Hive tables, everything will be fine. But if the Hive catalog also stores non-Hive tables, and the connector for a non-Hive table can't update the underlying Hive metastore, the Hive catalog will still return stale metadata.

This problem is generic: it is not limited to the TRUNCATE TABLE statement, but also applies to other statements such as INSERT, UPDATE/DELETE, and others still on the way. I think it deserves a separate, dedicated discussion about what the Flink catalog is for and whether we need to introduce some new mechanism for it.

Best regards,
Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, xia

> which I think if Flink supports table cache in framework-level, we can
> also recache in framework-level for truncate table statement.

I think the Flink catalog already stores some stats for the table today. For example, after `ANALYZE TABLE`, the table's statistics are stored in the catalog, but TRUNCATE TABLE will not correct them.

I know it's hard for Flink to perform unified follow-up actions after truncating a table. But I think we need to keep a clearly defined role for the Flink catalog in mind. IMO, as a compute engine, Flink can hardly maintain the catalog for tables in different storage systems by itself. So as more and more `Executable` commands are introduced, the data in the catalog will drift apart from the actual tables. In this case, after a truncate, the following parts of the catalog may be affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared

Best,
Aitozi.
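The stale-statistics concern raised above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `ToyStorage`, `ToyCatalog`, and `analyze` are toy stand-ins invented for illustration and are not Flink or Hive APIs. The sketch only shows that a truncate performed on the storage side leaves catalog-side statistics untouched until something refreshes them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleStatsDemo {
    /** Mimics the effect of ANALYZE TABLE: copies the real row count into the catalog. */
    static void analyze(ToyCatalog catalog, ToyStorage storage, String table) {
        catalog.putRowCount(table, storage.count(table));
    }

    public static void main(String[] args) {
        ToyStorage storage = new ToyStorage();
        ToyCatalog catalog = new ToyCatalog();
        storage.insert("orders", "row-1");
        storage.insert("orders", "row-2");
        analyze(catalog, storage, "orders");

        // Truncation only touches the storage; the catalog is not notified.
        storage.truncate("orders");
        System.out.println("rows in storage:  " + storage.count("orders"));
        System.out.println("rows per catalog: " + catalog.getRowCount("orders"));

        // The statistics become correct again only after an explicit refresh.
        analyze(catalog, storage, "orders");
        System.out.println("after re-analyze: " + catalog.getRowCount("orders"));
    }
}

/** Hypothetical stand-in for external storage holding the actual rows. */
class ToyStorage {
    private final Map<String, List<String>> rows = new HashMap<>();

    void insert(String table, String row) {
        rows.computeIfAbsent(table, k -> new ArrayList<>()).add(row);
    }

    /** Deletes all rows of the table, as a connector-side truncate would. */
    void truncate(String table) {
        List<String> tableRows = rows.get(table);
        if (tableRows != null) {
            tableRows.clear();
        }
    }

    long count(String table) {
        List<String> tableRows = rows.get(table);
        return tableRows == null ? 0 : tableRows.size();
    }
}

/** Hypothetical stand-in for a catalog that stores per-table row-count statistics. */
class ToyCatalog {
    private final Map<String, Long> rowCounts = new HashMap<>();

    void putRowCount(String table, long rowCount) {
        rowCounts.put(table, rowCount);
    }

    /** Returns the stored row count, or -1 if the table was never analyzed. */
    long getRowCount(String table) {
        return rowCounts.getOrDefault(table, -1L);
    }
}
```

Whether such a refresh should live in the connector, the catalog, or the framework is exactly the open question in this part of the thread; the sketch takes no position on it.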
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, xia

Thanks for your explanation. For the first question, given the current status, I think we can provide the generic interface in the future if we need it. For the second question, it makes sense to me if we can support the table cache at the framework level.

Best,
Ron
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, ron.

1: Considering that, to delete rows, Flink also writes delete records through the sink, it may not be so strange for connector developers to make a DynamicTableSink implement SupportsTruncate in order to truncate the table. Based on the assumption that DynamicTableSink is used for inserting/updating/deleting, I think it's reasonable for DynamicTableSink to implement SupportsTruncate. It also sounds reasonable to add a generic interface like DynamicTable, to differentiate from DynamicTableSource & DynamicTableSink. But that would definitely require much more design and discussion, which deserves a dedicated FLIP. I prefer not to do that in this FLIP to avoid overdesign, and I don't think it's a must for this FLIP. Maybe we can discuss it some day if we do need the new generic table interface.

2: Considering the variety of catalogs and tables, it's hard for Flink to perform unified follow-up actions after truncating a table. But the external connector can still perform such follow-up actions in the method `executeTruncation`.
Btw, in Spark, for the new truncate table interface[1], Spark only recaches the table after truncating it[2]. I think that if Flink supports table caching at the framework level, we can also recache at the framework level for the TRUNCATE TABLE statement.

[1] https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
[2] https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala

I think the external catalog can implement such logic in the method `executeTruncation`.

Best regards,
Yuxia
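A rough sketch of what a sink implementing the truncate ability could look like follows. It is an illustration under stated assumptions, not Flink's actual code: the `SupportsTruncate` interface is re-declared locally so the example compiles on its own, `InMemorySink` and its `lockedByOtherJob` flag are hypothetical, and the method is written as `void executeTruncation()` that throws on failure, which is the shape this thread eventually settles on.

```java
import java.util.ArrayList;
import java.util.List;

class TruncateSinkDemo {
    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink(false);
        sink.write("row-1");
        sink.write("row-2");
        sink.executeTruncation();
        System.out.println("rows after truncate: " + sink.rowCount());

        InMemorySink busySink = new InMemorySink(true);
        try {
            busySink.executeTruncation();
        } catch (IllegalStateException e) {
            System.out.println("truncate failed: " + e.getMessage());
        }
    }
}

/** Local stand-in for the ability interface discussed in this thread (assumption). */
interface SupportsTruncate {
    /** Deletes all rows of the table; implementations throw if truncation fails. */
    void executeTruncation();
}

/** Hypothetical sink backed by an in-memory list instead of real external storage. */
class InMemorySink implements SupportsTruncate {
    private final List<String> rows = new ArrayList<>();
    private final boolean lockedByOtherJob;

    InMemorySink(boolean lockedByOtherJob) {
        this.lockedByOtherJob = lockedByOtherJob;
    }

    void write(String row) {
        rows.add(row);
    }

    int rowCount() {
        return rows.size();
    }

    @Override
    public void executeTruncation() {
        if (lockedByOtherJob) {
            // A specific message is more useful than a generic failure from the framework.
            throw new IllegalStateException(
                    "Failed to truncate table: it is being written by another job.");
        }
        rows.clear();
        // Connector-specific follow-up actions (e.g. refreshing metadata) could go here.
    }
}
```

Keeping the follow-up actions inside `executeTruncation` mirrors point 2 above: the framework stays generic and each connector decides what else must happen after the data is gone.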
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, xia

It's a nice improvement to support the TRUNCATE TABLE statement, making Flink more feature-rich.

I think the truncate syntax is a command that will be executed in the client's process, rather than by launching a Flink job to execute on the cluster. So on the user-facing interface, I think we should not let users implement the SupportsTruncate interface on the DynamicTableSink interface. This seems a bit strange and also confuses users: as Hang said, why does a source table not support truncate? It would be nice if we could come up with a generic interface that supports truncate instead of binding it to the DynamicTableSink interface, and maybe in the future we will support more commands like the truncate command.

In addition, after truncating data, we may also need to update the metadata of the table. For a Hive table, for example, we need to update the statistics as well as clear the cache in the metastore. I think we should also consider these capabilities. Spark has considered these; refer to
https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573

Best,
Ron
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, Jim.

Thanks for your explanation. Now I get your point. I think you raise a good question. As Flink doesn't manage the underlying storage, it's hard for Flink itself to do the real coordination. To me, it looks like either Flink needs to introduce some common coordination mechanism, which may be discussed in another dedicated FLIP, or the external connector/storage should handle such coordination.

Also, the question made me think over the semantics of the TRUNCATE TABLE statement in the streaming scenario, which I had missed. Considering that the use cases of TRUNCATE TABLE are mainly in batch scenarios and that the semantics in streaming scenarios should be discussed separately, I'd like to limit the scope of the FLIP to batch only. I have now updated the title and content of the FLIP to avoid misunderstanding.

Best regards,
Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi Yuxia,

On Mon, Apr 10, 2023 at 10:35 AM yuxia wrote:

> Hi, Jim.
>
> 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> support all at one shot. For the DynamicTableSinks that haven't implemented
> SupportsTruncate interface, we'll throw exception
> like 'The truncate statement for the table is not supported as it hasn't
> implemented the interface SupportsTruncate'. Also, for some sinks that
> doesn't support deleting data, it can also implements it but throw more
> concrete exception like "xxx doesn't support to truncate a table as delete
> is impossible for xxx". It depends on the external connector's
> implementation.
> Thanks for your advice, I updated it to the FLIP.

Makes sense.

> 2: What do you mean by saying "truncate an input to a streaming query"?
> This FLIP is aimed to support TRUNCATE TABLE statement which is for
> truncating a table. In which case it will interoperate with streaming queries?

Let's take a source like Kafka as an example. Suppose I have an input topic Foo, and a query which uses it as an input.

When Foo is truncated, if the truncation works as a delete and create, then the connector may need to be made aware (otherwise it may try to use offsets from the previous topic). On the other hand, one may have to ask Kafka to delete records up to a certain point.

Also, savepoints for the query may contain information from the truncated table. Should this FLIP involve invalidating that information in some manner? Or does truncating a source table for a query cause undefined behavior on that query?

Basically, I'm trying to think through the implementations of a truncate operation to streaming sources and queries.

Cheers,

Jim
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, Jim.

1: I'm expecting all DynamicTableSinks to support it, but it's hard to support all of them in one shot. For DynamicTableSinks that haven't implemented the SupportsTruncate interface, we'll throw an exception like 'The truncate statement for the table is not supported as it hasn't implemented the interface SupportsTruncate'. Also, a sink that doesn't support deleting data can still implement the interface but throw a more concrete exception like "xxx doesn't support truncating a table as delete is impossible for xxx". It depends on the external connector's implementation.
Thanks for your advice, I have updated the FLIP accordingly.

2: What do you mean by "truncate an input to a streaming query"? This FLIP aims to support the TRUNCATE TABLE statement, which is for truncating a table. In which case will it interoperate with streaming queries?

Best regards,
Yuxia
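The behavior described in point 1 can be sketched as a small dispatch step. This is only an illustration under assumptions: `DynamicTableSink` and `SupportsTruncate` are re-declared here as empty local stand-ins rather than imported from Flink, and `TruncateDispatchDemo`, `PlainSink`, `TruncatableSink`, and `AppendOnlySink` are hypothetical names. The first exception message follows the wording quoted in the message above.

```java
class TruncateDispatchDemo {
    /** Framework-side step: run the truncation if the sink supports it, else fail clearly. */
    static void truncate(String tableName, DynamicTableSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            throw new UnsupportedOperationException(
                    "The truncate statement for the table '" + tableName
                            + "' is not supported as it hasn't implemented"
                            + " the interface SupportsTruncate.");
        }
        ((SupportsTruncate) sink).executeTruncation();
    }

    public static void main(String[] args) {
        DynamicTableSink[] sinks = {new TruncatableSink(), new PlainSink(), new AppendOnlySink()};
        for (DynamicTableSink sink : sinks) {
            String name = sink.getClass().getSimpleName();
            try {
                truncate(name, sink);
                System.out.println(name + ": truncated");
            } catch (UnsupportedOperationException e) {
                System.out.println(name + ": " + e.getMessage());
            }
        }
    }
}

/** Local stand-in for Flink's sink abstraction (assumption, intentionally empty). */
interface DynamicTableSink {}

/** Local stand-in for the ability interface discussed in this thread (assumption). */
interface SupportsTruncate {
    void executeTruncation();
}

/** Hypothetical sink that has not implemented the ability at all. */
class PlainSink implements DynamicTableSink {}

/** Hypothetical sink that supports truncation. */
class TruncatableSink implements DynamicTableSink, SupportsTruncate {
    boolean truncated = false;

    @Override
    public void executeTruncation() {
        truncated = true;
    }
}

/** Hypothetical sink that implements the ability only to report a more concrete reason. */
class AppendOnlySink implements DynamicTableSink, SupportsTruncate {
    @Override
    public void executeTruncation() {
        throw new UnsupportedOperationException(
                "AppendOnlySink doesn't support truncating a table"
                        + " as delete is impossible for append-only storage.");
    }
}
```

The third sink shows the option mentioned above: implementing the interface purely to replace the generic framework error with a connector-specific explanation.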
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi Yuxia,

Two questions:

1. Are you expecting all DynamicTableSinks to support Truncate? The FLIP could use some explanation for what supporting and not supporting the operation means.

2. How will truncate interoperate with streaming queries? That is, if I truncate an input to a streaming query, is there any defined behavior?

Cheers,

Jim

On Wed, Mar 22, 2023 at 9:13 AM yuxia wrote:

> Hi, devs.
>
> I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> statement [1].
>
> The TRUNCATE TABLE statement is a SQL command that allows users to quickly
> and efficiently delete all rows from a table without dropping the table
> itself. This statement is commonly used in data warehouses, where large data
> sets are frequently loaded and unloaded from tables.
> So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly,
> this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with
> which the corresponding connectors can implement their own logic for
> truncating a table.
>
> Looking forward to your feedback.
>
> [1]:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
>
> Best regards,
> Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
1: Actually, considering Flink's implementation, Flink just provides the TRUNCATE TABLE syntax to help users simplify data management, as said in the FLIP, and pushes the implementation of TRUNCATE TABLE down to the external connector. Normally, the effect of TRUNCATE TABLE is the same as DROP TABLE + CREATE TABLE, but the real difference/benefit depends on the implementation of the external connector. For example, for a DROP TABLE statement, some external connectors may also drop related views or other things, whereas for TRUNCATE TABLE the connectors may just delete all data without other operations.

2: At the very beginning, I was thinking about in which case a user may want to truncate a temporary table. I thought users could always create the table in a catalog (if it doesn't exist in one) and then truncate it, so I tended not to expose it to users. But after thinking it over again, I think it may be reasonable to support truncating a temporary table for the case where a user just wants to delete all data from a table in external storage without storing the table's metadata in a catalog, so that other users/sessions can't see the metadata. I think we can relax the constraint to support truncating temporary tables. I have now updated the FLIP.

3: Thanks for your input. I agree that we can discuss it in a different FLIP.

Best regards,
Yuxia

----- Original Message -----
From: "Jing Ge"
To: "dev"
Sent: Saturday, April 8, 2023, 3:05:11 AM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi yuxia,

Thanks for raising this topic. It is indeed a useful feature. +1 for having it in Flink. I have some small questions and it would be great if related information could be described in the FLIP.

1. Speaking of data warehouse use cases, what is the benefit of using TRUNCATE TABLE over DROP TABLE + CREATE TABLE IF NOT EXISTS, considering concrete Flink implementations? What would be the suggestion for users on when to use TRUNCATE instead of DROP + CREATE, and vice versa?

2. Since some engines support it, would you like to describe your thoughts about why TRUNCATE TABLE does not support temporary tables?

3. The partition support is an important feature, afaic. It might deserve a different FLIP and consider e.g.:
TRUNCATE TABLE tt_dw_usr_exp_xxx PARTITION(dt='20230303')
and
ALTER TABLE tt_dw_usr_exp_xxx DROP IF EXISTS PARTITION(dt='20230303').

Looking forward to your thoughts. Thanks!

Best regards,
Jing
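To make the distinction in point 1 of Yuxia's reply concrete (TRUNCATE TABLE versus DROP TABLE + CREATE TABLE), here is a toy Java model. It is a sketch under loud assumptions: `ToyStore` is a hypothetical in-memory stand-in for an external system, and the choice that dropping a table also drops views defined on it is just one possible connector behavior, picked to mirror the example in the reply. It does not describe any real connector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory "external system" with tables and dependent views. */
final class ToyStore {
    // table name -> rows
    final Map<String, List<String>> tables = new HashMap<>();
    // view name -> base table it is defined on
    final Map<String, String> views = new HashMap<>();

    void createTable(String table) {
        tables.putIfAbsent(table, new ArrayList<>());
    }

    void insert(String table, String row) {
        tables.get(table).add(row);
    }

    void createView(String view, String baseTable) {
        views.put(view, baseTable);
    }

    /** TRUNCATE in this toy model: delete the rows, touch nothing else. */
    void truncate(String table) {
        tables.get(table).clear();
    }

    /** DROP in this toy model: remove the table AND any view defined on it (assumed behavior). */
    void drop(String table) {
        tables.remove(table);
        views.values().removeIf(base -> base.equals(table));
    }
}
```

In this model, `truncate("orders")` leaves a view on `orders` in place, while `drop("orders")` followed by `createTable("orders")` yields an empty table whose dependent view is gone. Whether a real system behaves this way depends entirely on the connector, which is the point made in the reply.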
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi yuxia,

Thanks for raising this topic. It is indeed a useful feature. +1 for having it in Flink. I have some small questions and it would be great if related information could be described in the FLIP.

1. Speaking of data warehouse use cases, what is the benefit of using TRUNCATE TABLE over DROP TABLE + CREATE TABLE IF NOT EXISTS, considering concrete Flink implementations? What would be the suggestion for users on when to use TRUNCATE instead of DROP + CREATE, and vice versa?

2. Since some engines support it, would you like to describe your thoughts about why TRUNCATE TABLE does not support temporary tables?

3. The partition support is an important feature, afaic. It might deserve a different FLIP and consider e.g.:
TRUNCATE TABLE tt_dw_usr_exp_xxx PARTITION(dt='20230303')
and
ALTER TABLE tt_dw_usr_exp_xxx DROP IF EXISTS PARTITION(dt='20230303').

Looking forward to your thoughts. Thanks!

Best regards,
Jing

On 4/7/23 05:04, Jingsong Li wrote:
> +1 for voting.
>
> Best,
> Jingsong
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
+1 for voting.

Best,
Jingsong

On Thu, Apr 6, 2023 at 4:52 PM yuxia wrote:
> Hi everyone.
>
> If there are no other questions or concerns for the FLIP[1], I'd like to start the vote next Monday (4.10).
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
>
> Best regards,
> Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi everyone.

If there are no other questions or concerns for the FLIP[1], I'd like to start the vote next Monday (4.10).

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement

Best regards,
Yuxia

----- Original Message -----
From: "yuxia"
To: "dev"
Sent: Friday, March 24, 2023, 11:27:42 AM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Thanks all for your feedback.

@Shammon FY
My gut feeling is that the end user shouldn't care whether the TRUNCATE TABLE statement deletes the directory or moves it to the Trash directory. They only need to know that it deletes all rows from a table. To me, deleting the directory versus moving it to trash is a behavior at the external storage level rather than the SQL statement level. In Hive, if the user configures Trash, files are moved to trash for a DROP statement. Also, I have hardly seen such usage with the TRUNCATE TABLE statement in other engines. What's more, to support it we would have to extend the TRUNCATE TABLE syntax, which would then no longer comply with the SQL standard. I really don't want to do that, and I believe it would confuse users.

@Hang
`TRUNCATE TABLE` is meant to delete all rows of a base table, so it makes no sense for a table source to implement it. If a user truncates a table with the TRUNCATE TABLE statement, the planner will only try to find the DynamicTableSink for the corresponding table.

@Ran Tao
1: Thanks for your reminder. I said in the FLIP that views won't be supported, but forgot to say that temporary tables are not supported either. I have now added this to the FLIP.

2: Yes, I considered including it in this FLIP. But as far as I can see, truncating a table by partition isn't used much; it's not as useful as truncating the whole table. So I tend to keep this FLIP simple, without supporting truncate table with partition.
Also, it seems that for `truncate table with partition`, different engines may have different syntax.
Hive[1]/Spark[2] use the following syntax:
TRUNCATE TABLE table_name [PARTITION partition_spec]

SqlServer[3] uses the following syntax:
TRUNCATE TABLE { database_name.schema_name.table_name | schema_name.table_name | table_name } [ WITH ( PARTITIONS ( { | }
So, I tend to be cautious about it.

But I'm open to this. If there's any feedback or a strong requirement, I don't mind adding it to this FLIP. If we do need it some day, I can propose it in a new FLIP; it won't break the current design.

As for the concrete syntax in the FLIP, I think the current one is the concrete syntax; we don't allow the TABLE keyword to be optional.

3: Thanks for your reminder; I have updated the FLIP for this.

[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-TruncateTable
[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-truncate-table.html
[3] https://learn.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-ver16

Best regards,
Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Thanks all for your feedback.

@Shammon FY
My gut feeling is that the end user shouldn't care whether the TRUNCATE TABLE statement deletes the directory or moves it to the Trash directory. They only need to know that it deletes all rows from a table. To me, deleting the directory versus moving it to trash is a behavior at the external storage level rather than the SQL statement level. In Hive, if the user configures Trash, files are moved to trash for a DROP statement. Also, I have hardly seen such usage with the TRUNCATE TABLE statement in other engines. What's more, to support it we would have to extend the TRUNCATE TABLE syntax, which would then no longer comply with the SQL standard. I really don't want to do that, and I believe it would confuse users.

@Hang
`TRUNCATE TABLE` is meant to delete all rows of a base table, so it makes no sense for a table source to implement it. If a user truncates a table with the TRUNCATE TABLE statement, the planner will only try to find the DynamicTableSink for the corresponding table.

@Ran Tao
1: Thanks for your reminder. I said in the FLIP that views won't be supported, but forgot to say that temporary tables are not supported either. I have now added this to the FLIP.

2: Yes, I considered including it in this FLIP. But as far as I can see, truncating a table by partition isn't used much; it's not as useful as truncating the whole table. So I tend to keep this FLIP simple, without supporting truncate table with partition.
Also, it seems that for `truncate table with partition`, different engines may have different syntax.
Hive[1]/Spark[2] use the following syntax:
TRUNCATE TABLE table_name [PARTITION partition_spec]

SqlServer[3] uses the following syntax:
TRUNCATE TABLE { database_name.schema_name.table_name | schema_name.table_name | table_name } [ WITH ( PARTITIONS ( { | }
So, I tend to be cautious about it.

But I'm open to this. If there's any feedback or a strong requirement, I don't mind adding it to this FLIP. If we do need it some day, I can propose it in a new FLIP; it won't break the current design.

As for the concrete syntax in the FLIP, I think the current one is the concrete syntax; we don't allow the TABLE keyword to be optional.

3: Thanks for your reminder; I have updated the FLIP for this.

[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-TruncateTable
[2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-truncate-table.html
[3] https://learn.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-ver16

Best regards,
Yuxia

----- Original Message -----
From: "Ran Tao"
To: "dev"
Sent: Thursday, March 23, 2023, 6:28:17 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi, yuxia.

Thanks for starting the discussion. I think it's a nice improvement to support the TRUNCATE TABLE statement because many other mature engines support it.

I have some questions.
1. Because tables have different types, will we support views or temporary tables?

2. Some other engines such as Spark and Hive support TRUNCATE TABLE with partition. Will we support that?
By the way, I think you need to give the concrete TRUNCATE TABLE syntax in the FLIP because engines differ in syntax. For example, Hive allows TRUNCATE [TABLE], which means the TABLE keyword can be optional.

3. The Proposed Changes use SqlToOperationConverter and run in TableEnvironmentImpl#executeInternal. I think that's out of date: the community is refactoring the conversion logic from SqlNode to operation[1] and the executions in TableEnvironmentImpl[2]. I suggest you use the new way to support it.

[1] https://issues.apache.org/jira/browse/FLINK-31464
[2] https://issues.apache.org/jira/browse/FLINK-31368

Best Regards,
Ran Tao
https://github.com/chucheng92
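The syntax position taken in Yuxia's reply to Ran Tao (the TABLE keyword is mandatory, and no PARTITION clause is accepted in this FLIP) can be illustrated with a small Java check. This is only a sketch: it uses a regular expression rather than a SQL parser, `TruncateSyntaxSketch` and `parseTableName` are hypothetical names, and the identifier grammar (unquoted identifiers, up to three dot-separated parts) is a simplifying assumption, not the FLIP's grammar.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TruncateSyntaxSketch {
    // Simplified identifier: letters, digits, underscores; no backtick quoting (assumption).
    private static final String IDENT = "[A-Za-z_][A-Za-z0-9_]*";

    // TRUNCATE TABLE [catalog.][db.]table, optional trailing semicolon, nothing else.
    private static final Pattern TRUNCATE = Pattern.compile(
        "\\s*TRUNCATE\\s+TABLE\\s+(" + IDENT + "(?:\\." + IDENT + "){0,2})\\s*;?\\s*",
        Pattern.CASE_INSENSITIVE);

    /** Returns the table path if the statement matches the simplified syntax, else empty. */
    static Optional<String> parseTableName(String sql) {
        Matcher m = TRUNCATE.matcher(sql);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

With this check, `TRUNCATE TABLE my_db.orders` is accepted, while the Hive-style `TRUNCATE orders` and a statement with a trailing `PARTITION (...)` clause are both rejected.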
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, yuxia.

Thanks for starting the discussion. I think it's a nice improvement to support the TRUNCATE TABLE statement because many other mature engines support it.

I have some questions.
1. Because tables have different types, will we support views or temporary tables?

2. Some other engines such as Spark and Hive support TRUNCATE TABLE with partition. Will we support that?
By the way, I think you need to give the concrete TRUNCATE TABLE syntax in the FLIP because engines differ in syntax. For example, Hive allows TRUNCATE [TABLE], which means the TABLE keyword can be optional.

3. The Proposed Changes use SqlToOperationConverter and run in TableEnvironmentImpl#executeInternal. I think that's out of date: the community is refactoring the conversion logic from SqlNode to operation[1] and the executions in TableEnvironmentImpl[2]. I suggest you use the new way to support it.

[1] https://issues.apache.org/jira/browse/FLINK-31464
[2] https://issues.apache.org/jira/browse/FLINK-31368

Best Regards,
Ran Tao
https://github.com/chucheng92

On Wed, Mar 22, 2023 at 21:13, yuxia wrote:
> Hi, devs.
>
> I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE statement [1].
>
> The TRUNCATE TABLE statement is a SQL command that allows users to quickly and efficiently delete all rows from a table without dropping the table itself. This statement is commonly used in data warehouses, where large data sets are frequently loaded into and unloaded from tables.
> So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with which the corresponding connectors can implement their own logic for truncating a table.
>
> Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
>
> Best regards,
> Yuxia
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, yuxia,

Thanks for starting the discussion.

I wonder what the behavior is when we truncate a table that is used as a source. The source table and the sink table may have different table options. IMO, the truncate statement should be supported no matter which kind of table it is.

Best,
Hang

On Thu, Mar 23, 2023 at 08:55, Shammon FY wrote:
> Hi yuxia
>
> Thanks for initiating this discussion.
>
> There are usually two types of data deletion in a production environment: one deletes the data directly, and the other moves the data to a trash directory that the underlying system empties periodically.
>
> Can we distinguish between these two operations in the truncate syntax? Or support adding options in `with`?
>
> Best,
> Shammon FY
Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi yuxia

Thanks for initiating this discussion.

There are usually two types of data deletion in a production environment: one deletes the data directly, and the other moves the data to a trash directory that the underlying system empties periodically.

Can we distinguish between these two operations in the truncate syntax? Or support adding options in `with`?

Best,
Shammon FY

On Wed, Mar 22, 2023 at 9:13 PM yuxia wrote:
> Hi, devs.
>
> I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE statement [1].
>
> The TRUNCATE TABLE statement is a SQL command that allows users to quickly and efficiently delete all rows from a table without dropping the table itself. This statement is commonly used in data warehouses, where large data sets are frequently loaded into and unloaded from tables.
> So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with which the corresponding connectors can implement their own logic for truncating a table.
>
> Looking forward to your feedback.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
>
> Best regards,
> Yuxia
[DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
Hi, devs.

I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE statement [1].

The TRUNCATE TABLE statement is a SQL command that allows users to quickly and efficiently delete all rows from a table without dropping the table itself. This statement is commonly used in data warehouses, where large data sets are frequently loaded into and unloaded from tables.
So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with which the corresponding connectors can implement their own logic for truncating a table.

Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement

Best regards,
Yuxia