Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-17 Thread yuxia
Hi, all.
I have started a vote for this FLIP [1]. Please vote there [2], or ask
additional questions here [3].

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement+in+batch+mode
[2] https://lists.apache.org/thread/fosvz0zcyfn6bp6vz2oxl45vq9qhkn2v
[3] https://lists.apache.org/thread/m4r3wrd7p96wdst3nz3ncqzog6kf51cf

Best regards,
Yuxia


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-14 Thread Jark Wu
Hi Yuxia,

Thank you for the update. That sounds good to me.

Best,
Jark


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-14 Thread yuxia
Hi, Jark.
My original expectation was that if "executeTruncation" returns false, Flink
throws a generic exception such as "Fail to execute truncate table statement."
The connector implementation could also throw a more specific exception, such
as "Fail to execute truncate table statement because the table is being
written by other jobs".

After thinking it over, though, I'm afraid connector implementations would
simply always return false and leave Flink to construct the exception. That
exception would not be very useful, because it carries much less information
than a specific exception thrown by the connector.
So I have decided to change the method to `void executeTruncation()` and to
note in its Javadoc that implementations should throw an exception if the
truncate operation was not executed successfully.
I have updated the FLIP.
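
For illustration, the `void` contract described above might look roughly like
the following sketch. Only the method name `executeTruncation` comes from this
thread; the nested `SupportsTruncate` stand-in, the `InMemorySink` class, and
the choice of `IllegalStateException` are assumptions made for the example and
are not taken from Flink's source.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every nested type here is a hypothetical stand-in.
final class TruncateContractSketch {

    /** Stand-in for the ability interface; only the method name comes from the thread. */
    interface SupportsTruncate {
        /** Deletes all rows. Implementations should throw if truncation did not succeed. */
        void executeTruncation();
    }

    /** Illustrative in-memory sink; a real connector would call its external storage. */
    static final class InMemorySink implements SupportsTruncate {
        private final List<String> rows = new ArrayList<>();
        private boolean beingWritten;

        void insert(String row) {
            rows.add(row);
        }

        void setBeingWritten(boolean beingWritten) {
            this.beingWritten = beingWritten;
        }

        int rowCount() {
            return rows.size();
        }

        @Override
        public void executeTruncation() {
            if (beingWritten) {
                // A specific exception tells the user why the statement failed,
                // which a bare `false` return value could not.
                throw new IllegalStateException(
                        "Fail to execute truncate table statement: "
                                + "the table is being written by other jobs");
            }
            rows.clear();
        }
    }

    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        sink.insert("a");
        sink.insert("b");
        sink.executeTruncation();
        System.out.println("rows after truncate: " + sink.rowCount());

        sink.insert("c");
        sink.setBeingWritten(true);
        try {
            sink.executeTruncation();
        } catch (IllegalStateException e) {
            System.out.println("truncate failed: " + e.getMessage());
        }
    }
}
```

With a `void` method the connector has no way to fail silently: it either
returns normally or throws an exception that says why.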


Best regards,
Yuxia


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-14 Thread Jark Wu
The FLIP looks good to me. +1 to start a vote.

I just have a question: what will happen if the "executeTruncation" returns
false without any exceptions?

Best,
Jark


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-13 Thread Jing Ge
Thanks Yuxia for the clarification and FLIP update. The FLIP looks good!

Best regards,
Jing

On Mon, Apr 10, 2023 at 3:51 AM yuxia  wrote:

> 1:
> Actually, considering Flink's implementation, Flink just provides the
> TRUNCATE TABLE syntax to help users simplify data management, as said in
> this FLIP, and pushes the implementation of TRUNCATE TABLE down to the
> external connector.
> Normally, the effect of TRUNCATE TABLE is the same as DROP TABLE + CREATE
> TABLE. But the real difference/benefit depends on the implementation of the
> external connector.
> For example, for a DROP TABLE statement, some external connectors may also
> drop the related views or other things.
> But for TRUNCATE TABLE, the connectors may just delete all data without
> any other operations.
>
>
> 2:
> At the very beginning, I was thinking about in which cases a user might
> want to truncate a temporary table.
> I thought users could always create the table in a catalog (if the table
> doesn't exist in one) and truncate it. So I tended not to expose it to
> users.
> But after thinking it over again, I think it is reasonable to support
> truncating a temporary table for the case where a user just wants to delete
> all data from a table in external storage without storing the table's
> metadata in a catalog, so that other users/sessions can't see the metadata.
> I think we can relax the constraint to support truncating temporary
> tables. I have now updated the FLIP accordingly.
>
>
> 3:
> Thanks for your input. I agree that we can discuss it in a different FLIP.
>
>
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jing Ge"
> To: "dev"
> Sent: Saturday, April 8, 2023, 3:05:11 AM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi yuxia,
>
> Thanks for raising this topic. It is indeed a useful feature. +1 for
> having it in Flink. I have some small questions and it would be great if
> related information could be described in the FLIP.
>
> 1. Speaking of data warehouse use cases, what is the benefit of using
> TRUNCATE table over DROP table + CREATE table IF NOT EXISTS with the
> consideration of concrete Flink implementations? What would be the
> suggestion for users to use TRUNCATE instead of DROP + CREATE... and
> vice versa?
>
> 2. Since some engines support it, would you like to describe your
> thought about why TRUNCATE table does not support temporary table?
>
> 3. The partition support is an important feature, afaic. It might
> deserve a different FLIP and consider e.g.: TRUNCATE TABLE
> tt_dw_usr_exp_xxx PARTITION(dt='20230303') and ALTER TABLE
> tt_dw_usr_exp_xxx DROP IF EXISTS PARTITION(dt='20230303').
>
> Looking forward to your thoughts. Thanks!
>
> Best regards,
>
> Jing
>
> On 4/7/23 05:04, Jingsong Li wrote:
> > +1 for voting.
> >
> > Best,
> > Jingsong
> >
> > On Thu, Apr 6, 2023 at 4:52 PM yuxia wrote:
> >> Hi everyone.
> >>
> >> If there are no other questions or concerns about the FLIP [1], I'd like
> >> to start the vote next Monday (4.10).
> >>
> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> >>
> >> Best regards,
> >> Yuxia
> >>
> >> ----- Original Message -----
> >> From: "yuxia"
> >> To: "dev"
> >> Sent: Friday, March 24, 2023, 11:27:42 AM
> >> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >>
> >> Thanks all for your feedback.
> >>
> >> @Shammon FY
> >> My gut feeling is that the end user shouldn't care whether the TRUNCATE
> >> TABLE statement deletes the directory or moves it to the Trash directory.
> >> They only need to know that it deletes all rows from a table.
> >> To me, deleting the directory or moving it to trash is more likely a
> >> behavior at the external storage level than at the SQL statement level.
> >> In Hive, if the user configures Trash, files are moved to trash for the
> >> DROP statement.
> >> Also, I have hardly seen such usage with the TRUNCATE TABLE statement in
> >> other engines. What's more, to support it we would have to extend the
> >> TRUNCATE TABLE syntax, which would then no longer be compliant with the
> >> SQL standard. I really don't want to do that, and I believe it would
> >> confuse users if we did.
> >>
> >> @Hang
> >> `TRUNCATE TABLE` is meant to delete all rows of a base table, so it makes
> >> no sense for a table source to implement it.
> >> If a user uses the TRUNCATE TABLE statement to truncate a table, the
> >> planner will only try to find the DynamicTableSink for the corresponding
> >> table.
> >>
> >> @Ran Tao
> >> 1: Than
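
The reply to @Hang quoted above says that the planner only looks at the
table's DynamicTableSink when handling TRUNCATE TABLE. A minimal sketch of
such a capability check is shown below. The names `DynamicTableSink` and
`SupportsTruncate` come from this thread, but the nested types are empty
stand-ins, and the `truncate` helper, the two sink classes, and the error
message are hypothetical; this is not the planner's actual code.

```java
// Sketch only: stand-in types that mirror the names used in the thread.
final class PlannerLookupSketch {

    /** Stand-in for the sink abstraction. */
    interface DynamicTableSink {}

    /** Stand-in for the truncate ability a sink may opt into. */
    interface SupportsTruncate {
        void executeTruncation();
    }

    /** A sink that supports truncation. */
    static final class TruncatableSink implements DynamicTableSink, SupportsTruncate {
        boolean truncated;

        @Override
        public void executeTruncation() {
            truncated = true;
        }
    }

    /** A sink that does not support truncation. */
    static final class AppendOnlySink implements DynamicTableSink {}

    /** Hypothetical helper: run TRUNCATE only if the sink has the ability. */
    static void truncate(String tableName, DynamicTableSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            throw new UnsupportedOperationException(
                    "TRUNCATE TABLE is not supported for table '" + tableName + "'");
        }
        ((SupportsTruncate) sink).executeTruncation();
    }

    public static void main(String[] args) {
        TruncatableSink ok = new TruncatableSink();
        truncate("orders", ok);
        System.out.println("orders truncated: " + ok.truncated);

        try {
            truncate("logs", new AppendOnlySink());
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Table sources never enter the check, which matches the point that a source
has no reason to implement truncation.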

Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-13 Thread yuxia
Hi.
Thanks all for the valuable input. If there are no other questions or concerns
about the FLIP [1], I'd like to start the vote tomorrow (4.14).

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement

Best regards,
Yuxia


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-13 Thread Aitozi
Well, thanks Xia for your clarification. I agree with your point and have
no other concerns.

Best,
Aitozi.

> > yuxia wrote on Tue, Apr 11, 2023 at 16:12:
> >
> > > Hi, ron.
> > >
> > > 1: Considering that, for deleting rows, Flink also writes delete records
> > > to achieve the purpose of deleting data, it may not be so strange for
> > > connector devs to make DynamicTableSink implement SupportsTruncate to
> > > support truncating the table. Based on the assumption that
> > > DynamicTableSink is used for inserting/updating/deleting, I think it's
> > > reasonable for DynamicTableSink to implement SupportsTruncate. But I
> > > think it also sounds reasonable to add a generic interface like
> > > DynamicTable to differentiate DynamicTableSource & DynamicTableSink. But
> > > it would definitely require much design and discussion, which deserves a
> > > dedicated FLIP. I prefer not to do that in this FLIP to avoid overdesign,
> > > and I think it's not a must for this FLIP. Maybe we can discuss it some
> > > day if we do need the new generic table interface.
> > >
> > > 2: Considering the various catalogs and tables, it's hard for Flink to do
> > > unified follow-up actions after truncating a table. But the external
> > > connector can still do such follow-up actions in the `executeTruncation`
> > > method.
> > > Btw, in Spark, for the new truncate table interface [1], Spark only
> > > recaches the table after truncating it [2]. I think that if Flink
> > > supports a table cache at the framework level, we can also recache at the
> > > framework level for the TRUNCATE TABLE statement.
> > >
> > > [1]
> > > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > > [2]
> > > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> > >
> > >
> > > I think the external catalog can implement such logic in the
> > > `executeTruncation` method.
> > >
> > > Best regards,
>

Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-13 Thread yuxia
Hi, Aitozi.
Thanks for your input. I understand your concern. Although the external
connector can update the metadata in the `executeTruncation` method, the
Flink catalog can't be aware of the update in some cases. If the Hive
catalog only stores Hive tables, everything will be fine.
But if the Hive catalog also stores non-Hive tables, and a non-Hive table
can't update the underlying Hive metastore, the Hive catalog will still
return stale metadata.

This problem is generic: it is not limited to the TRUNCATE TABLE statement,
but also applies to other statements such as INSERT, UPDATE/DELETE, and other
statements on the way.
I think it deserves a separate, dedicated discussion about what the Flink
catalog is for and whether we need to introduce some new mechanism for it.
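
To illustrate the staleness concern, here is a small sketch in which the
storage is truncated while a separately kept statistic is left untouched.
`StatsCatalog` and `ExternalTable` are hypothetical classes invented for this
example; they do not model the Hive catalog or any real Flink catalog API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a toy catalog and a toy external table, both hypothetical.
final class CatalogStalenessSketch {

    /** Toy catalog that remembers the row count recorded by an earlier analysis step. */
    static final class StatsCatalog {
        private final Map<String, Long> rowCounts = new HashMap<>();

        void recordRowCount(String table, long count) {
            rowCounts.put(table, count);
        }

        /** Returns the recorded row count, or -1 if nothing was recorded. */
        long rowCount(String table) {
            return rowCounts.getOrDefault(table, -1L);
        }
    }

    /** Toy external table whose connector knows nothing about the catalog. */
    static final class ExternalTable {
        private long rows;

        ExternalTable(long rows) {
            this.rows = rows;
        }

        long rows() {
            return rows;
        }

        void executeTruncation() {
            // Only the storage is changed; the catalog is not notified.
            rows = 0;
        }
    }

    public static void main(String[] args) {
        StatsCatalog catalog = new StatsCatalog();
        ExternalTable table = new ExternalTable(3);
        catalog.recordRowCount("t", table.rows());

        table.executeTruncation();

        System.out.println("rows in storage: " + table.rows());
        System.out.println("row count in catalog: " + catalog.rowCount("t"));
    }
}
```

After the truncation the two numbers disagree, and nothing in the truncate
path can repair that unless the connector is able to reach the catalog.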


Best regards,
Yuxia

----- Original Message -----
From: "Aitozi"
To: "dev"
Sent: Thursday, April 13, 2023, 2:37:48 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi, xia
> which I think if Flink supports table cache in framework-level,
> we can also recache in framework-level for truncate table statement.

I think the Flink catalog currently already stores some stats for the table,
e.g. after `ANALYZE TABLE`, the table's statistics are stored in the
catalog, but truncating the table will not correct the statistics.

I know it's hard for Flink to do unified follow-up actions after
truncating a table. But I think we need to define a clear positioning for
the Flink catalog.
IMO, with Flink being a compute engine, it's hard for it to maintain the
catalog for tables in different storage systems by itself. So as more and
more `Executable` commands are introduced, the data in the catalog will
become inconsistent.
In this case, after a truncate, the following parts of the catalog may be
affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared


Best,
Aitozi.



Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-13 Thread Aitozi
Hi, xia
   > which I think if Flink supports table cache in framework-level,
   > we can also recache in framework-level for truncate table statement.

I think the Flink catalog already stores some stats for the table,
e.g. after `ANALYZE TABLE`, the table's statistics are stored in the
catalog, but truncating the table will not correct those statistics.

I know it's hard for Flink to do unified follow-up actions after
truncating a table. But I think we need to define a clear position for the
Flink catalog.
IMO, since Flink is a compute engine, it's hard for it to maintain the
catalog for tables in different storage systems itself. So as more and more
`Executable` commands are introduced, the data in the catalog will drift
out of sync.
In this case, after a truncate, the following parts of the catalog may be affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared


Best,
Aitozi.


On Thu, Apr 13, 2023 at 11:28, liu ron wrote:

>
> Hi, xia
>
> Thanks for your explanation, for the first question, given the current
> status, I think we can provide the generic interface in the future if we
> need it. For the second question,  it makes sense to me if we can
> support the table cache at the framework level.
>
> Best,
> Ron
>
> On Tue, Apr 11, 2023 at 16:12, yuxia wrote:
>
> > Hi, ron.
> >
> > 1: Considering for deleting rows, Flink will also write delete record to
> > achive purpose of deleting data, it may not as so strange for connector
> > devs to make DynamicTableSink implement SupportsTruncate to support
> > truncate the table. Based on the assume that DynamicTableSink is used for
> > inserting/updating/deleting, I think it's reasonable for DynamicTableSink
> > to implement SupportsTruncate. But I think it sounds reasonable to add a
> > generic interface like DynamicTable to differentiate DynamicTableSource &
> > DynamicTableSink. But it will definitely requires much design and
> > discussion which deserves a dedicated FLIP. I perfer not to do that in this
> > FLIP to avoid overdesign and I think it's not a must for this FLIP. Maybe
> > we can discuss it if some day if we do need the new generic table interface.
> >
> > 2: Considering various catalogs and tables, it's hard for Flink to do the
> > unified follow-up actions after truncating table. But still the external
> > connector can do such follow-up actions in method `executeTruncation`.
> > Btw, in Spark, for the newly truncate table interface[1], Spark only
> > recaches the table after truncating table[2] which I think if Flink
> > supports table cache in framework-level,
> > we can also recache in framework-level for truncate table statement.
> >
> > [1]
> > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > [2]
> > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> >
> >
> > I think the external catalog can implemnet such logic in method
> > `executeTruncation`.
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "liu ron" 
> > To: "dev" 
> > Sent: Tuesday, April 11, 2023, 10:51:36 AM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi, xia
> > It's a nice improvement to support TRUNCATE TABLE statement, making Flink
> > more feature-rich.
> > I think the truncate syntax is a command that will be executed in the
> > client's process, rather than pulling up a Flink job to execute on the
> > cluster. So on the user-facing exposed interface, I think we should not let
> > users implement the SupportsTruncate interface on the DynamicTableSink
> > interface. This seems a bit strange and also confuses users, as hang said,
> > why Source table does not support truncate. It would be nice if we could
> > come up with a generic interface that supports truncate instead of binding
> > it to the DynamicTableSink interface, and maybe in the future we will
> > support more commands like truncate command.
> >
> > In addition, after truncating data, we may also need to update the metadata
> > of the table, such as Hive table, we need to update the statistics, as well
> > as clear the cache in the metastore, I think we should also consider these
> > capabilities, Sparky has considered these, refer to
> >
> > https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> > .
> >
> >

Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-12 Thread liu ron
Hi, xia

Thanks for your explanation. For the first question, given the current
status, I think we can provide the generic interface in the future if we
need it. For the second question, it makes sense to me if we can
support the table cache at the framework level.

Best,
Ron

On Tue, Apr 11, 2023 at 16:12, yuxia wrote:

> Hi, ron.
>
> 1: Considering for deleting rows, Flink will also write delete record to
> achive purpose of deleting data, it may not as so strange for connector
> devs to make DynamicTableSink implement SupportsTruncate to support
> truncate the table. Based on the assume that DynamicTableSink is used for
> inserting/updating/deleting, I think it's reasonable for DynamicTableSink
> to implement SupportsTruncate. But I think it sounds reasonable to add a
> generic interface like DynamicTable to differentiate DynamicTableSource &
> DynamicTableSink. But it will definitely requires much design and
> discussion which deserves a dedicated FLIP. I perfer not to do that in this
> FLIP to avoid overdesign and I think it's not a must for this FLIP. Maybe
> we can discuss it if some day if we do need the new generic table interface.
>
> 2: Considering various catalogs and tables, it's hard for Flink to do the
> unified follow-up actions after truncating table. But still the external
> connector can do such follow-up actions in method `executeTruncation`.
> Btw, in Spark, for the newly truncate table interface[1], Spark only
> recaches the table after truncating table[2] which I think if Flink
> supports table cache in framework-level,
> we can also recache in framework-level for truncate table statement.
>
> [1]
> https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> [2]
> https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
>
>
> I think the external catalog can implemnet such logic in method
> `executeTruncation`.
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "liu ron" 
> To: "dev" 
> Sent: Tuesday, April 11, 2023, 10:51:36 AM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi, xia
> It's a nice improvement to support TRUNCATE TABLE statement, making Flink
> more feature-rich.
> I think the truncate syntax is a command that will be executed in the
> client's process, rather than pulling up a Flink job to execute on the
> cluster. So on the user-facing exposed interface, I think we should not let
> users implement the SupportsTruncate interface on the DynamicTableSink
> interface. This seems a bit strange and also confuses users, as hang said,
> why Source table does not support truncate. It would be nice if we could
> come up with a generic interface that supports truncate instead of binding
> it to the DynamicTableSink interface, and maybe in the future we will
> support more commands like truncate command.
>
> In addition, after truncating data, we may also need to update the metadata
> of the table, such as Hive table, we need to update the statistics, as well
> as clear the cache in the metastore, I think we should also consider these
> capabilities, Sparky has considered these, refer to
>
> https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> .
>
> Best,
>
> Ron
>

Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-11 Thread yuxia
Hi, ron.

1: Considering that, for deleting rows, Flink also writes delete records to 
achieve the purpose of deleting data, it may not be so strange for connector 
devs to make DynamicTableSink implement SupportsTruncate to support truncating 
the table. Based on the assumption that DynamicTableSink is used for 
inserting/updating/deleting, I think it's reasonable for DynamicTableSink to 
implement SupportsTruncate. I also think it sounds reasonable to add a generic 
interface like DynamicTable to differentiate DynamicTableSource & 
DynamicTableSink. But it will definitely require much design and discussion, 
which deserves a dedicated FLIP. I prefer not to do that in this FLIP to avoid 
overdesign, and I think it's not a must for this FLIP. Maybe we can discuss it 
some day if we do need the new generic table interface.

2: Considering the various catalogs and tables, it's hard for Flink to do 
unified follow-up actions after truncating a table. But the external 
connector can still do such follow-up actions in the method `executeTruncation`. 
Btw, in Spark, for the new truncate table interface[1], Spark only recaches 
the table after truncating it[2]. I think if Flink supports table cache 
at the framework level,
we can also recache at the framework level for the truncate table statement.

[1] 
https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
[2] 
https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala


I think the external catalog can implement such logic in the method 
`executeTruncation`.
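
A minimal sketch of what such a connector-side implementation could look like. The two interfaces below are local stand-ins named after the ones discussed in this thread, not the real Flink classes, and `InMemoryTableSink` with its `cachedRowCount` field is purely hypothetical. It assumes `executeTruncation` reports failure by throwing rather than through a return value.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the interfaces discussed in this thread (not the real Flink classes).
interface DynamicTableSink {}

interface SupportsTruncate {
    // Assumption: failures are reported by throwing an exception.
    void executeTruncation();
}

// Hypothetical sink that keeps its rows and a simple statistic in memory.
class InMemoryTableSink implements DynamicTableSink, SupportsTruncate {
    final List<String> rows = new ArrayList<>();
    long cachedRowCount = 0;

    void insert(String row) {
        rows.add(row);
        cachedRowCount = rows.size();
    }

    @Override
    public void executeTruncation() {
        rows.clear();        // delete all rows but keep the table itself
        cachedRowCount = 0;  // follow-up action: reset connector-side statistics
    }
}
```

The point of the sketch is only that the follow-up action lives next to the truncation itself, so the framework does not need to know what metadata a given connector maintains.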

Best regards,
Yuxia

- Original Message -
From: "liu ron" 
To: "dev" 
Sent: Tuesday, April 11, 2023, 10:51:36 AM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi, xia
It's a nice improvement to support TRUNCATE TABLE statement, making Flink
more feature-rich.
I think the truncate syntax is a command that will be executed in the
client's process, rather than pulling up a Flink job to execute on the
cluster. So on the user-facing exposed interface, I think we should not let
users implement the SupportsTruncate interface on the DynamicTableSink
interface. This seems a bit strange and also confuses users, as hang said,
why Source table does not support truncate. It would be nice if we could
come up with a generic interface that supports truncate instead of binding
it to the DynamicTableSink interface, and maybe in the future we will
support more commands like truncate command.

In addition, after truncating data, we may also need to update the metadata
of the table, such as Hive table, we need to update the statistics, as well
as clear the cache in the metastore, I think we should also consider these
capabilities, Sparky has considered these, refer to
https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
.

Best,

Ron

On Tue, Apr 11, 2023 at 02:15, Jim Hughes wrote:

> Hi Yuxia,
>
> On Mon, Apr 10, 2023 at 10:35 AM yuxia 
> wrote:
>
> > Hi, Jim.
> >
> > 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> > support all at one shot. For the DynamicTableSinks that haven't
> implemented
> > SupportsTruncate interface, we'll throw exception
> > like 'The truncate statement for the table is not supported as it hasn't
> > implemented the interface SupportsTruncate'. Also, for some sinks that
> > doesn't support deleting data, it can also implements it but throw more
> > concrete exception like "xxx donesn't support to truncate a table as
> delete
> > is impossible for xxx". It depends on the external connector's
> > implementation.
> > Thanks for your advice, I updated it to the FLIP.
> >
>
> Makes sense.
>
>
> > 2: What do you mean by saying "truncate an input to a streaming query"?
> > This FLIP is aimed to support TRUNCATE TABLE statement which is for
> > truncating a table. In which case it will inoperates with streaming
> queries?
> >
>
> Let's take a source like Kafka as an example.  Suppose I have an input
> topic Foo, and query which uses it as an input.
>
> When Foo is truncated, if the truncation works as a delete and create, then
> the connector may need to be made aware (otherwise it may try to use
> offsets from the previous topic).  On the other hand, one may have to ask
> Kafka to delete records up to a certain point.
>
> Also, savepoints for the query may contain information from the truncated
> table.  Should this FLIP involve invalidating that information in some
> manner?  Or does truncating a source table for a query cause undefined
> behavior on that query?
>
>

Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread liu ron
Hi, xia
It's a nice improvement to support the TRUNCATE TABLE statement, making Flink
more feature-rich.
I think the truncate syntax is a command that will be executed in the
client's process, rather than pulling up a Flink job to execute on the
cluster. So on the user-facing exposed interface, I think we should not let
users implement the SupportsTruncate interface on the DynamicTableSink
interface. This seems a bit strange and also confuses users: as hang said,
why does a source table not support truncate? It would be nice if we could
come up with a generic interface that supports truncate instead of binding
it to the DynamicTableSink interface, and maybe in the future we will
support more commands like the truncate command.

In addition, after truncating data, we may also need to update the metadata
of the table. For a Hive table, for example, we need to update the statistics
as well as clear the cache in the metastore. I think we should also consider
these capabilities. Spark has considered these, refer to
https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
.

Best,

Ron

On Tue, Apr 11, 2023 at 02:15, Jim Hughes wrote:

> Hi Yuxia,
>
> On Mon, Apr 10, 2023 at 10:35 AM yuxia 
> wrote:
>
> > Hi, Jim.
> >
> > 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> > support all at one shot. For the DynamicTableSinks that haven't
> implemented
> > SupportsTruncate interface, we'll throw exception
> > like 'The truncate statement for the table is not supported as it hasn't
> > implemented the interface SupportsTruncate'. Also, for some sinks that
> > doesn't support deleting data, it can also implements it but throw more
> > concrete exception like "xxx donesn't support to truncate a table as
> delete
> > is impossible for xxx". It depends on the external connector's
> > implementation.
> > Thanks for your advice, I updated it to the FLIP.
> >
>
> Makes sense.
>
>
> > 2: What do you mean by saying "truncate an input to a streaming query"?
> > This FLIP is aimed to support TRUNCATE TABLE statement which is for
> > truncating a table. In which case it will inoperates with streaming
> queries?
> >
>
> Let's take a source like Kafka as an example.  Suppose I have an input
> topic Foo, and query which uses it as an input.
>
> When Foo is truncated, if the truncation works as a delete and create, then
> the connector may need to be made aware (otherwise it may try to use
> offsets from the previous topic).  On the other hand, one may have to ask
> Kafka to delete records up to a certain point.
>
> Also, savepoints for the query may contain information from the truncated
> table.  Should this FLIP involve invalidating that information in some
> manner?  Or does truncating a source table for a query cause undefined
> behavior on that query?
>
> Basically, I'm trying to think through the implementations of a truncate
> operation to streaming sources and queries.
>
> Cheers,
>
> Jim
>
>
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Jim Hughes" 
> > To: "dev" 
> > Sent: Monday, April 10, 2023, 9:32:28 PM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi Yuxia,
> >
> > Two questions:
> >
> > 1.  Are you expecting all DynamicTableSinks to support Truncate?  The
> FLIP
> > could use some explanation for what supporting and not supporting the
> > operation means.
> >
> > 2.  How will truncate inoperate with streaming queries?  That is, if I
> > truncate an input to a streaming query, is there any defined behavior?
> >
> > Cheers,
> >
> > Jim
> >
> > On Wed, Mar 22, 2023 at 9:13 AM yuxia 
> wrote:
> >
> > > Hi, devs.
> > >
> > > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > > statement [1].
> > >
> > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > quickly
> > > and efficiently delete all rows from a table without dropping the table
> > > itself. This statement is commonly used in data warehouse, where large
> > data
> > > sets are frequently loaded and unloaded from tables.
> > > So, this FLIP is meant to support TRUNCATE TABLE statement. M ore
> > exactly,
> > > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface
> > with
> > > which the coresponding connectors can implement their own logic for
> > > truncating table.
> > >
> > > Looking forwards to your feedback.
> > >
> > > [1]: [
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > |
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > ]
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> >
>


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread yuxia
Hi, Jim.
Thanks for your explanation. Now I get your point. 
I think you raise a good question. 
As Flink doesn't manage the underlying storage, it's hard for Flink itself to 
do the real coordination.
To me, it looks like either Flink needs to introduce some common coordination, 
which may be discussed in another dedicated FLIP, 
or the external connector/storage should handle such coordination. 
Also, the question makes me think over the semantics of the truncate table 
statement in the streaming scenario, which I missed. Considering that the use 
cases of truncate table are mainly for the batch scenario and the semantics in 
the streaming scenario should be discussed separately, I'd like to limit the 
scope of the FLIP to batch only. I have now updated the title & content of the 
FLIP to avoid misunderstanding.
 

Best regards,
Yuxia

- Original Message -
From: "Jim Hughes" 
To: "dev" 
Sent: Tuesday, April 11, 2023, 2:15:10 AM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi Yuxia,

On Mon, Apr 10, 2023 at 10:35 AM yuxia  wrote:

> Hi, Jim.
>
> 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> support all at one shot. For the DynamicTableSinks that haven't implemented
> SupportsTruncate interface, we'll throw exception
> like 'The truncate statement for the table is not supported as it hasn't
> implemented the interface SupportsTruncate'. Also, for some sinks that
> doesn't support deleting data, it can also implements it but throw more
> concrete exception like "xxx donesn't support to truncate a table as delete
> is impossible for xxx". It depends on the external connector's
> implementation.
> Thanks for your advice, I updated it to the FLIP.
>

Makes sense.


> 2: What do you mean by saying "truncate an input to a streaming query"?
> This FLIP is aimed to support TRUNCATE TABLE statement which is for
> truncating a table. In which case it will inoperates with streaming queries?
>

Let's take a source like Kafka as an example.  Suppose I have an input
topic Foo, and query which uses it as an input.

When Foo is truncated, if the truncation works as a delete and create, then
the connector may need to be made aware (otherwise it may try to use
offsets from the previous topic).  On the other hand, one may have to ask
Kafka to delete records up to a certain point.

Also, savepoints for the query may contain information from the truncated
table.  Should this FLIP involve invalidating that information in some
manner?  Or does truncating a source table for a query cause undefined
behavior on that query?

Basically, I'm trying to think through the implementations of a truncate
operation to streaming sources and queries.

Cheers,

Jim


> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Jim Hughes" 
> To: "dev" 
> Sent: Monday, April 10, 2023, 9:32:28 PM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi Yuxia,
>
> Two questions:
>
> 1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
> could use some explanation for what supporting and not supporting the
> operation means.
>
> 2.  How will truncate inoperate with streaming queries?  That is, if I
> truncate an input to a streaming query, is there any defined behavior?
>
> Cheers,
>
> Jim
>
> On Wed, Mar 22, 2023 at 9:13 AM yuxia  wrote:
>
> > Hi, devs.
> >
> > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > statement [1].
> >
> > The TRUNCATE TABLE statement is a SQL command that allows users to
> quickly
> > and efficiently delete all rows from a table without dropping the table
> > itself. This statement is commonly used in data warehouse, where large
> data
> > sets are frequently loaded and unloaded from tables.
> > So, this FLIP is meant to support TRUNCATE TABLE statement. M ore
> exactly,
> > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface
> with
> > which the coresponding connectors can implement their own logic for
> > truncating table.
> >
> > Looking forwards to your feedback.
> >
> > [1]: [
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > |
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > ]
> >
> >
> > Best regards,
> > Yuxia
> >
>


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread Jim Hughes
Hi Yuxia,

On Mon, Apr 10, 2023 at 10:35 AM yuxia  wrote:

> Hi, Jim.
>
> 1: I'm expecting all DynamicTableSinks to support. But it's hard to
> support all at one shot. For the DynamicTableSinks that haven't implemented
> SupportsTruncate interface, we'll throw exception
> like 'The truncate statement for the table is not supported as it hasn't
> implemented the interface SupportsTruncate'. Also, for some sinks that
> doesn't support deleting data, it can also implements it but throw more
> concrete exception like "xxx donesn't support to truncate a table as delete
> is impossible for xxx". It depends on the external connector's
> implementation.
> Thanks for your advice, I updated it to the FLIP.
>

Makes sense.


> 2: What do you mean by saying "truncate an input to a streaming query"?
> This FLIP is aimed to support TRUNCATE TABLE statement which is for
> truncating a table. In which case it will inoperates with streaming queries?
>

Let's take a source like Kafka as an example.  Suppose I have an input
topic Foo, and a query which uses it as an input.

When Foo is truncated, if the truncation works as a delete and create, then
the connector may need to be made aware (otherwise it may try to use
offsets from the previous topic).  On the other hand, one may have to ask
Kafka to delete records up to a certain point.

Also, savepoints for the query may contain information from the truncated
table.  Should this FLIP involve invalidating that information in some
manner?  Or does truncating a source table for a query cause undefined
behavior on that query?

Basically, I'm trying to think through the implications of a truncate
operation for streaming sources and queries.
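
To make the stale-offset concern concrete, here is a toy sketch. `ToyTopic` and `ToyReader` are hypothetical stand-ins invented for illustration and are not Kafka or Flink APIs. It assumes truncation is implemented as delete-and-recreate, so the recreated topic starts again at offset 0 while the reader still holds an offset from the old topic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy log standing in for a topic; not a Kafka API.
class ToyTopic {
    private final List<String> records = new ArrayList<>();

    void append(String record) {
        records.add(record);
    }

    // Offset one past the last record currently in the log.
    long endOffset() {
        return records.size();
    }
}

// Hypothetical reader that remembers how far it has read.
class ToyReader {
    long committedOffset = 0;

    // A committed offset beyond the end of the log no longer points at real data.
    boolean offsetIsValid(ToyTopic topic) {
        return committedOffset <= topic.endOffset();
    }
}
```

With three records appended and a committed offset of 3, `offsetIsValid` holds for the original topic but not for a freshly recreated one, which is the situation a savepoint taken before the truncation would be in.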

Cheers,

Jim


> Best regards,
> Yuxia
>
> - Original Message -
> From: "Jim Hughes" 
> To: "dev" 
> Sent: Monday, April 10, 2023, 9:32:28 PM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi Yuxia,
>
> Two questions:
>
> 1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
> could use some explanation for what supporting and not supporting the
> operation means.
>
> 2.  How will truncate inoperate with streaming queries?  That is, if I
> truncate an input to a streaming query, is there any defined behavior?
>
> Cheers,
>
> Jim
>
> On Wed, Mar 22, 2023 at 9:13 AM yuxia  wrote:
>
> > Hi, devs.
> >
> > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > statement [1].
> >
> > The TRUNCATE TABLE statement is a SQL command that allows users to
> quickly
> > and efficiently delete all rows from a table without dropping the table
> > itself. This statement is commonly used in data warehouse, where large
> data
> > sets are frequently loaded and unloaded from tables.
> > So, this FLIP is meant to support TRUNCATE TABLE statement. M ore
> exactly,
> > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface
> with
> > which the coresponding connectors can implement their own logic for
> > truncating table.
> >
> > Looking forwards to your feedback.
> >
> > [1]: [
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > |
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > ]
> >
> >
> > Best regards,
> > Yuxia
> >
>


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread yuxia
Hi, Jim.

1: I'm expecting all DynamicTableSinks to support it. But it's hard to support 
them all in one shot. For the DynamicTableSinks that haven't implemented the 
SupportsTruncate interface, we'll throw an exception
like 'The truncate statement for the table is not supported as it hasn't 
implemented the interface SupportsTruncate'. Also, a sink that doesn't 
support deleting data can still implement it but throw a more concrete 
exception like "xxx doesn't support truncating a table as delete is 
impossible for xxx". It depends on the external connector's implementation.
Thanks for your advice, I have added it to the FLIP.
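
A rough sketch of the two failure paths described above. The interfaces are local stand-ins named after the ones in this thread, not the real Flink classes; `TruncateExecutor`, `AppendOnlySink`, and `NoDeleteSink` are hypothetical names, and the choice of `UnsupportedOperationException` is an assumption, since the thread does not say which exception type is thrown.

```java
// Local stand-ins for the interfaces discussed in this thread (not the real Flink classes).
interface DynamicTableSink {}

interface SupportsTruncate {
    void executeTruncation();
}

// Hypothetical sink that has not implemented SupportsTruncate at all.
class AppendOnlySink implements DynamicTableSink {}

// Hypothetical sink that implements the interface only to give a more concrete error.
class NoDeleteSink implements DynamicTableSink, SupportsTruncate {
    @Override
    public void executeTruncation() {
        throw new UnsupportedOperationException(
                "NoDeleteSink doesn't support truncating a table as delete is impossible for NoDeleteSink");
    }
}

// Hypothetical framework-side entry point for the TRUNCATE TABLE statement.
final class TruncateExecutor {
    static void truncate(String tableName, DynamicTableSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            // Generic message produced by the framework.
            throw new UnsupportedOperationException(
                    "The truncate statement for the table " + tableName
                            + " is not supported as it hasn't implemented the interface SupportsTruncate");
        }
        // The connector either truncates or throws its own, more specific exception.
        ((SupportsTruncate) sink).executeTruncation();
    }
}
```

Calling `TruncateExecutor.truncate("t1", new AppendOnlySink())` takes the generic path, while `new NoDeleteSink()` surfaces the connector's own message.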


2: What do you mean by saying "truncate an input to a streaming query"? This 
FLIP aims to support the TRUNCATE TABLE statement, which is for truncating a 
table. In which case will it interoperate with streaming queries?

Best regards,
Yuxia

- Original Message -
From: "Jim Hughes" 
To: "dev" 
Sent: Monday, April 10, 2023, 9:32:28 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi Yuxia,

Two questions:

1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
could use some explanation for what supporting and not supporting the
operation means.

2.  How will truncate inoperate with streaming queries?  That is, if I
truncate an input to a streaming query, is there any defined behavior?

Cheers,

Jim

On Wed, Mar 22, 2023 at 9:13 AM yuxia  wrote:

> Hi, devs.
>
> I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> statement [1].
>
> The TRUNCATE TABLE statement is a SQL command that allows users to quickly
> and efficiently delete all rows from a table without dropping the table
> itself. This statement is commonly used in data warehouse, where large data
> sets are frequently loaded and unloaded from tables.
> So, this FLIP is meant to support TRUNCATE TABLE statement. M ore exactly,
> this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with
> which the coresponding connectors can implement their own logic for
> truncating table.
>
> Looking forwards to your feedback.
>
> [1]: [
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> |
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> ]
>
>
> Best regards,
> Yuxia
>


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread Jim Hughes
Hi Yuxia,

Two questions:

1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
could use some explanation for what supporting and not supporting the
operation means.

2.  How will truncate interoperate with streaming queries?  That is, if I
truncate an input to a streaming query, is there any defined behavior?

Cheers,

Jim



Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-09 Thread yuxia
1: 
Actually, considering Flink's implementation, Flink just provides the TRUNCATE 
TABLE syntax to help users simplify data management, as said in this FLIP, and 
pushes the implementation of TRUNCATE TABLE down to the external connector. 
Normally, the effect of TRUNCATE TABLE is the same as DROP TABLE + CREATE TABLE. 
But the real difference/benefit depends on the implementation of the external 
connector. 
For example, for a DROP TABLE statement, some external connectors may also drop 
the related views or other things.
But for TRUNCATE TABLE, the connectors may just delete all data without any 
other operations. 
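
A small sketch of that difference, under stated assumptions: `ExternalStore` is a hypothetical in-memory stand-in for an external system whose DROP also removes dependent views. It is not a Flink or connector API, and real connectors may behave differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an external system; not a Flink API.
final class ExternalStore {
    // table name -> rows
    final Map<String, List<String>> tables = new HashMap<>();
    // view name -> name of the table the view depends on
    final Map<String, String> views = new HashMap<>();

    void createTable(String name) {
        tables.putIfAbsent(name, new ArrayList<>());
    }

    // DROP in this imaginary system also removes the dependent views.
    void dropTable(String name) {
        tables.remove(name);
        views.values().removeIf(name::equals);
    }

    // TRUNCATE only removes the rows; the table and its views stay.
    void truncateTable(String name) {
        List<String> rows = tables.get(name);
        if (rows == null) {
            throw new IllegalArgumentException("Table not found: " + name);
        }
        rows.clear();
    }

    public static void main(String[] args) {
        ExternalStore store = new ExternalStore();
        store.createTable("orders");
        store.tables.get("orders").add("row-1");
        store.views.put("orders_v", "orders");

        store.truncateTable("orders");
        System.out.println("after TRUNCATE, views: " + store.views.keySet());

        store.dropTable("orders");
        store.createTable("orders");
        System.out.println("after DROP + CREATE, views: " + store.views.keySet());
    }
}
```

In both cases the table ends up empty; only the side effects on related objects differ, which is why the benefit depends on the connector.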


2:
At the very beginning, I was thinking about in which cases a user may want to 
truncate a temporary table.
I thought users could always create a table in a catalog (if the table doesn't 
exist in a catalog) and truncate that table. So I tended not to expose it to users.
But after thinking it over again, I think it may be reasonable to support 
truncating a temporary table for the case where a user just wants to delete all 
data from a table in external storage without storing the metadata of the table 
in a catalog, so that other users/sessions can't see the metadata.
I think we can relax the constraint to support truncating temporary tables. 
I have now updated the FLIP accordingly.


3:
Thanks for your input. I agree that we can discuss it in a different FLIP.



Best regards,
Yuxia


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-07 Thread Jing Ge

Hi yuxia,

Thanks for raising this topic. It is indeed a useful feature. +1 for 
having it in Flink. I have some small questions and it would be great if 
related information could be described in the FLIP.


1. Speaking of data warehouse use cases, what is the benefit of using 
TRUNCATE table over DROP table + CREATE table IF NOT EXISTS with the 
consideration of concrete Flink implementations? What would be the 
suggestion for users to use TRUNCATE instead of DROP + CREATE... and 
vice versa?


2. Since some engines support it, would you like to describe your 
thought about why TRUNCATE table does not support temporary table?


3. The partition support is an important feature, afaic. It might 
deserve a different FLIP and consider e.g.: TRUNCATE TABLE 
tt_dw_usr_exp_xxx PARTITION(dt='20230303') and ALTER TABLE 
tt_dw_usr_exp_xxx DROP IF EXISTS PARTITION(dt='20230303').


Looking forward to your thoughts. Thanks!

Best regards,

Jing


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-06 Thread Jingsong Li
+1 for voting.

Best,
Jingsong


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-06 Thread yuxia
Hi everyone.

If there are no other questions or concerns for the FLIP[1], I'd like to start 
the vote next Monday (4.10).

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement

Best regards,
Yuxia


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-03-23 Thread yuxia
Thanks all for your feedback.

@Shammon FY
My gut feeling is that the end user shouldn't care whether the TRUNCATE TABLE 
statement deletes the directory or moves it to the Trash directory. They 
only need to know that it will delete all rows from a table.
To me, deleting the directory or moving it to trash is more likely to be a 
behavior at the external storage level than at the SQL statement level. In 
Hive, if the user configures Trash, files are moved to trash for the DROP statement.
Also, I have hardly seen such usage with the TRUNCATE TABLE statement in other 
engines. What's more, to support it, we would have to extend the TRUNCATE TABLE 
syntax, which would then no longer be compliant with the SQL standard. I really 
don't want to do that, and I believe it would confuse users if we did.

@Hang
`TRUNCATE TABLE` is meant to delete all rows of a base table. So it makes no 
sense for a table source to implement it.
If a user uses the TRUNCATE TABLE statement to truncate a table, the planner 
will only try to find the DynamicTableSink for the corresponding table. 

@Ran Tao
1: Thanks for your reminder. I said in the FLIP that it won't support views, but 
forgot to say that temporary tables are also not supported. I have now added 
this part to the FLIP.

2: Yes, I also considered including it in this FLIP before. But as far as I 
can see, there isn't much usage of truncate table with partition. It's not as 
useful as truncate table. So I tend to keep this FLIP simple, without 
supporting truncate table with partition.
Also, it seems that for `truncate table with partition`, different engines may 
have different syntax.
Hive[1]/Spark[2] use the following syntax:
TRUNCATE TABLE table_name [PARTITION partition_spec]

SqlServer[3] uses the following syntax:
TRUNCATE TABLE { database_name.schema_name.table_name | schema_name.table_name 
| table_name } [ WITH ( PARTITIONS ( { <partition_number_expression> | <range> } [ , ...n ] ) ) ]
So, I tend to be cautious about it.

But I'm open to this. If there's any feedback or strong requirement, I don't 
mind adding it to this FLIP.
If we do need it some day, I can propose it in a new FLIP. It won't break 
the current design.

As for the concrete syntax in the FLIP, I think the current one is the concrete 
syntax; we don't allow the TABLE keyword to be optional.
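
As a rough illustration of that rule (not Flink's actual parser, which is grammar-based), the sketch below accepts only the `TRUNCATE TABLE <name>` form. The class `TruncateSyntaxSketch`, the method `parseTableName`, and the simplified identifier pattern are assumptions for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a simplified check, not Flink's SQL parser.
final class TruncateSyntaxSketch {
    // TABLE is mandatory, no PARTITION clause, optional trailing semicolon.
    // Identifiers are simplified to letters, digits, '_' and '.' separators.
    private static final Pattern TRUNCATE =
            Pattern.compile(
                    "^\\s*TRUNCATE\\s+TABLE\\s+([A-Za-z_][\\w.]*)\\s*;?\\s*$",
                    Pattern.CASE_INSENSITIVE);

    // Returns the table name if the statement matches the accepted form.
    static Optional<String> parseTableName(String sql) {
        Matcher m = TRUNCATE.matcher(sql);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] statements = {
            "TRUNCATE TABLE t1",
            "truncate table db.t1;",
            "TRUNCATE t1",
            "TRUNCATE TABLE t1 PARTITION (dt='20230303')"
        };
        for (String sql : statements) {
            System.out.println(sql + " -> " + parseTableName(sql).orElse("<rejected>"));
        }
    }
}
```

The Hive-style `TRUNCATE t1` and the partitioned form are both rejected here, matching the scope described above.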

3: Thanks for your reminder, I have updated the FLIP for this.


[1]https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-TruncateTable
[2]https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-truncate-table.html
[3]https://learn.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql?view=sql-server-ver16



Best regards,
Yuxia



Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-03-23 Thread Ran Tao
Hi, yuxia.

Thanks for starting the discussion.
I think it's a nice improvement to support the TRUNCATE TABLE statement because
many other mature engines support it.

I have some questions.
1. Because tables have different types, will we support views or
temporary tables?

2. Some other engines such as Spark and Hive support TRUNCATE TABLE with
partition. Will we support that?
By the way, I think you need to give the concrete TRUNCATE TABLE syntax in the
FLIP because some engines have different syntaxes.
For example, Hive allows TRUNCATE TABLE to be written as TRUNCATE [TABLE], which
means the TABLE keyword can be optional.

3. The Proposed Changes try to use SqlToOperationConverter and run in
TableEnvironmentImpl#executeInternal.
I think that's out of date; the community is refactoring the conversion logic
from SqlNode to operation[1] and the executions in TableEnvironmentImpl[2].
I suggest you use the new way to support it.

[1] https://issues.apache.org/jira/browse/FLINK-31464
[2] https://issues.apache.org/jira/browse/FLINK-31368

Best Regards,
Ran Tao
https://github.com/chucheng92




Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-03-23 Thread Hang Ruan
Hi, yuxia,

Thanks for starting the discussion.

I wonder what the behavior is when we truncate a table which is used as a
source. Source table and sink table may have different table options.
IMO, the TRUNCATE statement should be supported no matter what kind of table it is.

Best,
Hang



Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-03-22 Thread Shammon FY
Hi yuxia

Thanks for initiating this discussion.

There are usually two types of data deletion in a production environment:
one is deleting data directly and the other is moving the data to the trash
directory which will be deleted periodically by the underlying system.

Can we distinguish between these two operations in the truncate syntax? Or
support adding options in `with`?

Best,
Shammon FY




[DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-03-22 Thread yuxia
Hi, devs. 

I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE statement 
[1]. 

The TRUNCATE TABLE statement is a SQL command that allows users to quickly and 
efficiently delete all rows from a table without dropping the table itself. 
This statement is commonly used in data warehouses, where large data sets are 
frequently loaded into and unloaded from tables. 
So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly, 
this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with which 
the corresponding connectors can implement their own logic for truncating a table. 

Looking forward to your feedback. 

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement


Best regards, 
Yuxia