Re: Re: [VOTE] FLIP-214: Support Advanced Function DDL
Hi Godfrey,

The ADD/DELETE JAR syntax parsing is currently supported on the table environment side, but the execution is implemented on the SqlClient side. After this FLIP, we will move the execution into the table environment, so there is no public API change. I have also updated the description in the Core Code Design section.

> -----Original Message-----
> From: "godfrey he"
> Sent: 2022-04-22 12:26:44 (Friday)
> To: dev
> Cc:
> Subject: Re: [VOTE] FLIP-214: Support Advanced Function DDL
>
> hi Ron,
>
> I don't see any section mentioned `delete jar`, could you update it?
>
> Best,
> Godfrey
>
> Jing Zhang wrote on Thu, 21 Apr 2022 at 17:57:
> >
> > Ron,
> > +1 (binding)
> >
> > Thanks for driving this FLIP.
> >
> > Best,
> > Jing Zhang
> >
> > Jark Wu wrote on Thu, 21 Apr 2022 at 11:31:
> >
> > > Thanks for driving this work @Ron,
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 21 Apr 2022 at 10:42, Mang Zhang wrote:
> > >
> > > > +1
> > > >
> > > > --
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > > At 2022-04-20 18:28:28, "刘大龙" wrote:
> > > > > Hi, everyone
> > > > >
> > > > > I'd like to start a vote on FLIP-214: Support Advanced Function DDL [1] which has been discussed in [2].
> > > > >
> > > > > The vote will be open for at least 72 hours unless there is an objection or not enough votes.
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > > [2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8
> > > > >
> > > > > Best,
> > > > > Ron

--
Best,
Ron
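For readers following the thread, a minimal sketch of the JAR statements being discussed (the path is illustrative); the change described above only moves where these statements are executed — from the SQL Client into the table environment — not their syntax:

```sql
-- Register a JAR containing UDF classes with the current session
ADD JAR '/tmp/my-udfs.jar';

-- List the JARs that have been added
SHOW JARS;

-- Remove a previously added JAR
REMOVE JAR '/tmp/my-udfs.jar';
```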
[VOTE] FLIP-214: Support Advanced Function DDL
Hi, everyone

I'd like to start a vote on FLIP-214: Support Advanced Function DDL [1], which has been discussed in [2].

The vote will be open for at least 72 hours unless there is an objection or not enough votes.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
[2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8

Best,
Ron
Re: Re: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
Thanks again to everyone for the discussion on this FLIP. I will open a vote tomorrow.

Best,
Ron

> -----Original Message-----
> From: "Jark Wu"
> Sent: 2022-04-19 16:03:22 (Tuesday)
> To: dev
> Cc:
> Subject: Re: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
>
> Thank Ron for updating the FLIP.
>
> I think the updated FLIP has addressed Martijn's concern.
> I don't have other feedback. So +1 for a vote.
>
> Best,
> Jark
>
> On Fri, 15 Apr 2022 at 16:36, 刘大龙 wrote:
>
> > Hi, Jingsong
> >
> > Thanks for your feedback. We will use Flink's FileSystem abstraction, so HDFS, S3, and OSS will be supported.
> >
> > Best,
> > Ron
> >
> > > -----Original Message-----
> > > From: "Jingsong Li"
> > > Sent: 2022-04-14 17:55:03 (Thursday)
> > > To: dev
> > > Cc:
> > > Subject: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > I agree with Martijn.
> > >
> > > At least, HDFS S3 OSS should be supported.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser wrote:
> > > >
> > > > Hi Ron,
> > > >
> > > > The FLIP mentions that the priority will be set to support HDFS as a resource provider. I'm concerned that we end up with a partially implemented FLIP which only supports local and HDFS and then we move on to other features, as we see happen with others. I would argue that we should not focus on one resource provider, but that at least S3 support is included in the same Flink release as HDFS support is.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > > On Thu, 14 Apr 2022 at 08:50, 刘大龙 wrote:
> > > >
> > > > > Hi, everyone
> > > > >
> > > > > First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is also very welcome.
> > > > >
> > > > > Best,
> > > > > Ron
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "刘大龙"
> > > > > > Sent: 2022-04-08 10:09:46 (Friday)
> > > > > > To: dev@flink.apache.org
> > > > > > Cc:
> > > > > > Subject: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi, Martijn
> > > > > >
> > > > > > Do you have any questions about this FLIP? Looking forward to more feedback from you.
> > > > > >
> > > > > > Best,
> > > > > > Ron
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: "刘大龙"
> > > > > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > > > > To: dev@flink.apache.org
> > > > > > > Cc:
> > > > > > > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: "Martijn Visser"
> > > > > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > > > > To: dev
> > > > > > > > Cc:
> > > > > > > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > > > >
> > > > > > > > Hi Ron,
> > > > > > > >
> > > > > > > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to r
Re: Re: Re: Discussion about enhancing the partitioned table syntax
Hi, Martijn

Thanks to Jingsong for the reminder. This issue is part of FLIP [1], which proposed the related partition syntax. In this issue, no new DDLs/DMLs for partitioned tables are introduced, so the syntax has already been discussed and voted on in the community. I think we don't need a new FLIP.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support

Best,
Ron

> -----Original Message-----
> From: "刘大龙"
> Sent: 2022-04-15 17:15:36 (Friday)
> To: dev@flink.apache.org
> Cc:
> Subject: Re: Re: Discussion about enhancing the partitioned table syntax
>
> Hi, Jingsong
>
> Thanks for your reply; this FLIP [1] looks good to me, I had not found it before. The enhanced partition syntax in FLINK-27237 is part of FLIP [1], and there are no newly added DDLs/DMLs, so the syntax has already been discussed and voted on in the community. This issue continues the work on the syntax part of FLIP [1].
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
>
> Best,
> Ron
>
> > -----Original Message-----
> > From: "Jingsong Li"
> > Sent: 2022-04-15 16:02:37 (Friday)
> > To: dev
> > Cc:
> > Subject: Re: Discussion about enhancing the partitioned table syntax
> >
> > Thanks for taking this.
> >
> > There is a FLIP [1] to define some partition DMLs.
> >
> > If you have added DMLs, we need to vote again. (Create a new FLIP)
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> >
> > Best,
> > Jingsong
> >
> > On Fri, Apr 15, 2022 at 3:20 PM Martijn Visser wrote:
> > >
> > > Hi Ron,
> > >
> > > Thanks for reaching out to the mailing list. As I mentioned in the Jira ticket, a SQL syntax change requires a FLIP [1]. A FLIP requires a discussion and a vote, which is explained in more detail in the Flink Bylaws [2].
> > >
> > > Please create a FLIP and follow the process for this.
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > >
> > > On Fri, 15 Apr 2022 at 09:08, 刘大龙 wrote:
> > >
> > > > Hi, everyone
> > > >
> > > > For partitioned tables, there are still some related syntaxes that are not supported, which would be very useful. In batch analysis scenarios, such as with Hive tables, partitioned tables are a very common case. Since Flink has mainly focused on stream processing so far, not much work has been done on partitioned tables. In order to improve the experience of using Flink in batch jobs, as well as to improve the DDL syntax, I created the issue FLINK-27237 to discuss enhancing the partitioned table syntax. You are welcome to discuss on the issue and give comments so that this proposal can be improved.
> > > >
> > > > Best,
> > > > Ron
Re: Re: Discussion about enhancing the partitioned table syntax
Hi, Jingsong

Thanks for your reply; this FLIP [1] looks good to me, I had not found it before. The enhanced partition syntax in FLINK-27237 is part of FLIP [1], and there are no newly added DDLs/DMLs, so the syntax has already been discussed and voted on in the community. This issue continues the work on the syntax part of FLIP [1].

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support

Best,
Ron

> -----Original Message-----
> From: "Jingsong Li"
> Sent: 2022-04-15 16:02:37 (Friday)
> To: dev
> Cc:
> Subject: Re: Discussion about enhancing the partitioned table syntax
>
> Thanks for taking this.
>
> There is a FLIP [1] to define some partition DMLs.
>
> If you have added DMLs, we need to vote again. (Create a new FLIP)
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
>
> Best,
> Jingsong
>
> On Fri, Apr 15, 2022 at 3:20 PM Martijn Visser wrote:
> >
> > Hi Ron,
> >
> > Thanks for reaching out to the mailing list. As I mentioned in the Jira ticket, a SQL syntax change requires a FLIP [1]. A FLIP requires a discussion and a vote, which is explained in more detail in the Flink Bylaws [2].
> >
> > Please create a FLIP and follow the process for this.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> >
> > On Fri, 15 Apr 2022 at 09:08, 刘大龙 wrote:
> >
> > > Hi, everyone
> > >
> > > For partitioned tables, there are still some related syntaxes that are not supported, which would be very useful. In batch analysis scenarios, such as with Hive tables, partitioned tables are a very common case. Since Flink has mainly focused on stream processing so far, not much work has been done on partitioned tables. In order to improve the experience of using Flink in batch jobs, as well as to improve the DDL syntax, I created the issue FLINK-27237 to discuss enhancing the partitioned table syntax. You are welcome to discuss on the issue and give comments so that this proposal can be improved.
> > >
> > > Best,
> > > Ron
Re: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
Hi, Jingsong

Thanks for your feedback. We will use Flink's FileSystem abstraction, so HDFS, S3, and OSS will be supported.

Best,
Ron

> -----Original Message-----
> From: "Jingsong Li"
> Sent: 2022-04-14 17:55:03 (Thursday)
> To: dev
> Cc:
> Subject: [SPAM] Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
>
> I agree with Martijn.
>
> At least, HDFS S3 OSS should be supported.
>
> Best,
> Jingsong
>
> On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser wrote:
> >
> > Hi Ron,
> >
> > The FLIP mentions that the priority will be set to support HDFS as a resource provider. I'm concerned that we end up with a partially implemented FLIP which only supports local and HDFS and then we move on to other features, as we see happen with others. I would argue that we should not focus on one resource provider, but that at least S3 support is included in the same Flink release as HDFS support is.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > On Thu, 14 Apr 2022 at 08:50, 刘大龙 wrote:
> >
> > > Hi, everyone
> > >
> > > First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is also very welcome.
> > >
> > > Best,
> > > Ron
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙"
> > > > Sent: 2022-04-08 10:09:46 (Friday)
> > > > To: dev@flink.apache.org
> > > > Cc:
> > > > Subject: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi, Martijn
> > > >
> > > > Do you have any questions about this FLIP? Looking forward to more feedback from you.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > > -----Original Message-----
> > > > > From: "刘大龙"
> > > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > > To: dev@flink.apache.org
> > > > > Cc:
> > > > > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "Martijn Visser"
> > > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > > To: dev
> > > > > > Cc:
> > > > > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this work with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially if at some point we could also externalize file system implementations (like we're doing for connectors already). I think the FLIP would be better by not only referring to "Hadoop" as a remote resource provider, but a more generic term since there are more options than Hadoop.
> > > > > >
> > > > > > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that either influences other running jobs, leaks data or credentials or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink:
Re: Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
Hi, Martijn

My description in the FLIP was not very clear: we will use Flink's FileSystem abstraction to download resources, so HDFS, S3, OSS, etc. will be supported in the first version.

Best,
Ron

> -----Original Message-----
> From: "Martijn Visser"
> Sent: 2022-04-14 16:46:24 (Thursday)
> To: dev
> Cc:
> Subject: Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
>
> Hi Ron,
>
> The FLIP mentions that the priority will be set to support HDFS as a resource provider. I'm concerned that we end up with a partially implemented FLIP which only supports local and HDFS and then we move on to other features, as we see happen with others. I would argue that we should not focus on one resource provider, but that at least S3 support is included in the same Flink release as HDFS support is.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> On Thu, 14 Apr 2022 at 08:50, 刘大龙 wrote:
>
> > Hi, everyone
> >
> > First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is also very welcome.
> >
> > Best,
> > Ron
> >
> > > -----Original Message-----
> > > From: "刘大龙"
> > > Sent: 2022-04-08 10:09:46 (Friday)
> > > To: dev@flink.apache.org
> > > Cc:
> > > Subject: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > Hi, Martijn
> > >
> > > Do you have any questions about this FLIP? Looking forward to more feedback from you.
> > >
> > > Best,
> > > Ron
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙"
> > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > To: dev@flink.apache.org
> > > > Cc:
> > > > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > > -----Original Message-----
> > > > > From: "Martijn Visser"
> > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > To: dev
> > > > > Cc:
> > > > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this work with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially if at some point we could also externalize file system implementations (like we're doing for connectors already). I think the FLIP would be better by not only referring to "Hadoop" as a remote resource provider, but a more generic term since there are more options than Hadoop.
> > > > >
> > > > > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that either influences other running jobs, leaks data or credentials or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards/have a more closed Flink ecosystem). They usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
Discussion about enhancing the partitioned table syntax
Hi, everyone

For partitioned tables, there are still some related syntaxes that are not supported, which would be very useful. In batch analysis scenarios, such as with Hive tables, partitioned tables are a very common case. Since Flink has mainly focused on stream processing so far, not much work has been done on partitioned tables. In order to improve the experience of using Flink in batch jobs, as well as to improve the DDL syntax, I created the issue FLINK-27237 to discuss enhancing the partitioned table syntax. You are welcome to discuss on the issue and give comments so that this proposal can be improved.

Best,
Ron
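As a sketch of the kind of partitioned-table syntax being discussed (the table name and partition spec below are illustrative, and the statements follow the direction of FLIP-63 rather than a finalized design):

```sql
-- List the partitions of a partitioned table
SHOW PARTITIONS sales;

-- Add and drop a partition explicitly
ALTER TABLE sales ADD PARTITION (dt = '2022-04-15');
ALTER TABLE sales DROP PARTITION (dt = '2022-04-15');
```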
Re: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
Hi, everyone

First of all, thanks for the valuable suggestions received about this FLIP. After some discussion, it looks like all concerns have been addressed for now, so I will start a vote on this FLIP in two or three days. Further feedback is also very welcome.

Best,
Ron

> -----Original Message-----
> From: "刘大龙"
> Sent: 2022-04-08 10:09:46 (Friday)
> To: dev@flink.apache.org
> Cc:
> Subject: Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
>
> Hi, Martijn
>
> Do you have any questions about this FLIP? Looking forward to more feedback from you.
>
> Best,
> Ron
>
> > -----Original Message-----
> > From: "刘大龙"
> > Sent: 2022-03-29 19:33:58 (Tuesday)
> > To: dev@flink.apache.org
> > Cc:
> > Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> > > -----Original Message-----
> > > From: "Martijn Visser"
> > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > To: dev
> > > Cc:
> > > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > Hi Ron,
> > >
> > > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this work with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially if at some point we could also externalize file system implementations (like we're doing for connectors already). I think the FLIP would be better by not only referring to "Hadoop" as a remote resource provider, but a more generic term since there are more options than Hadoop.
> > >
> > > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that either influences other running jobs, leaks data or credentials or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards/have a more closed Flink ecosystem). They usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > >
> > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: "Peter Huang"
> > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > To: dev
> > > > > Cc:
> > > > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for reviving the discussion of the work. The design looks good. A small typo in the FLIP is that currently it is marked as released in 1.16.
> > > > >
> > > > > Best Regards
> > > > > Peter Huang
> > > > >
> > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang wrote:
> > > > >
> > > > > > hi Yuxia,
> > > > > >
> > > > > > Thanks for your reply. Your reminder is very important!
> > > > > >
> > > > > > Since we download the file locally, remember to clean it up when the Flink client exits.
Re: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
Hi, Martijn

Do you have any questions about this FLIP? Looking forward to more feedback from you.

Best,
Ron

> -----Original Message-----
> From: "刘大龙"
> Sent: 2022-03-29 19:33:58 (Tuesday)
> To: dev@flink.apache.org
> Cc:
> Subject: Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
>
> > -----Original Message-----
> > From: "Martijn Visser"
> > Sent: 2022-03-24 16:18:14 (Thursday)
> > To: dev
> > Cc:
> > Subject: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> >
> > Hi Ron,
> >
> > Thanks for creating the FLIP. You're talking about both local and remote resources. With regards to remote resources, how do you see this work with Flink's filesystem abstraction? I did read in the FLIP that Hadoop dependencies are not packaged, but I would hope that we do that for all filesystem implementations. I don't think it's a good idea to have any tight coupling to file system implementations, especially if at some point we could also externalize file system implementations (like we're doing for connectors already). I think the FLIP would be better by not only referring to "Hadoop" as a remote resource provider, but a more generic term since there are more options than Hadoop.
> >
> > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that either influences other running jobs, leaks data or credentials or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards/have a more closed Flink ecosystem). They usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> >
> > On Wed, 23 Mar 2022 at 16:47, 刘大龙 wrote:
> >
> > > > -----Original Message-----
> > > > From: "Peter Huang"
> > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > To: dev
> > > > Cc:
> > > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi Ron,
> > > >
> > > > Thanks for reviving the discussion of the work. The design looks good. A small typo in the FLIP is that currently it is marked as released in 1.16.
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang wrote:
> > > >
> > > > > hi Yuxia,
> > > > >
> > > > > Thanks for your reply. Your reminder is very important!
> > > > >
> > > > > Since we download the file locally, remember to clean it up when the Flink client exits.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Mang Zhang
> > > > >
> > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" wrote:
> > > > > > Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > 1. For the syntax explanation, I see it's "Create function as identifier"; I think the word "identifier" may not be self-descriptive, for it's actually not a random name but the name of the class that provides the implementation for the function to be created. Maybe it'll be clearer to use "class_name"
Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > I'm also thinking about security/operations implications: would it be possible for bad actor X to create a JAR that either influences other running jobs, leaks data or credentials or anything else? If so, I think it would also be good to have an option to disable this feature completely. I think there are roughly two types of companies who run Flink: those who open it up for everyone to use (here the feature would be welcomed) and those who need to follow certain minimum standards/have a more closed Flink ecosystem). They usually want to validate a JAR upfront before making it available, even at the expense of speed, because it gives them more control over what will be running in their environment.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> >
> > On Wed, 23 Mar 2022 at 16:47, 刘大龙 wrote:
> >
> > > > -----Original Message-----
> > > > From: "Peter Huang"
> > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > To: dev
> > > > Cc:
> > > > Subject: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi Ron,
> > > >
> > > > Thanks for reviving the discussion of the work. The design looks good. A small typo in the FLIP is that currently it is marked as released in 1.16.
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang wrote:
> > > >
> > > > > hi Yuxia,
> > > > >
> > > > > Thanks for your reply. Your reminder is very important!
> > > > >
> > > > > Since we download the file locally, remember to clean it up when the Flink client exits.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Mang Zhang
> > > > >
> > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" wrote:
> > > > > > Hi Ron, thanks for starting this discussion; some Spark/Hive users will benefit from it. The FLIP looks good to me. I just have two minor questions:
> > > > > > 1. For the syntax explanation, I see it's "Create function as identifier"; I think the word "identifier" may not be self-descriptive, for it's actually not a random name but the name of the class that provides the implementation for the function to be created. Maybe it'll be clearer to use "class_name" in place of "identifier", just like Hive[1]/Spark[2] do.
> > > > > > 2. >> If the resource used is a remote resource, it will first download the resource to a local temporary directory, which will be generated using UUID, and then register the local path to the user class loader.
> > > > > > For the above explanation in this FLIP, it seems for such statement sets,
> > > > > > ""
> > > > > > Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > > > Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > > > ""
> > > > > > it'll download the resource 'hdfs://myudfs.jar' twice. So is it possible to provide some cache mechanism so that we won't need to download/store it twice?
> > > > > >
> > > > > > Best regards,
> > > > > > Yuxia
> > > > > > [1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > > > > > [2] https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html
> > > > > > ------------------------------------------------------------------
> > > > > > From: Mang Zhang
> > > > > > Date: 2022-03-22 11:35:24
> > > > > > To:
Re: Re: Re: Re: Re: [DISCUSS] FLIP-214 Support Advanced Function DDL
> -原始邮件- > 发件人: "Martijn Visser" > 发送时间: 2022-03-24 16:18:14 (星期四) > 收件人: dev > 抄送: > 主题: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > Hi Ron, > > Thanks for creating the FLIP. You're talking about both local and remote > resources. With regards to remote resources, how do you see this work with > Flink's filesystem abstraction? I did read in the FLIP that Hadoop > dependencies are not packaged, but I would hope that we do that for all > filesystem implementation. I don't think it's a good idea to have any tight > coupling to file system implementations, especially if at some point we > could also externalize file system implementations (like we're doing for > connectors already). I think the FLIP would be better by not only > referring to "Hadoop" as a remote resource provider, but a more generic > term since there are more options than Hadoop. > > I'm also thinking about security/operations implications: would it be > possible for bad actor X to create a JAR that either influences other > running jobs, leaks data or credentials or anything else? If so, I think it > would also be good to have an option to disable this feature completely. I > think there are roughly two types of companies who run Flink: those who > open it up for everyone to use (here the feature would be welcomed) and > those who need to follow certain minimum standards/have a more closed Flink > ecosystem). They usually want to validate a JAR upfront before making it > available, even at the expense of speed, because it gives them more control > over what will be running in their environment. > > Best regards, > > Martijn Visser > https://twitter.com/MartijnVisser82 > > > On Wed, 23 Mar 2022 at 16:47, 刘大龙 wrote: > > > > > > > > > > -原始邮件- > > > 发件人: "Peter Huang" > > > 发送时间: 2022-03-23 11:13:32 (星期三) > > > 收件人: dev > > > 抄送: > > > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > > > > > Hi Ron, > > > > > > Thanks for reviving the discussion of the work. 
The design looks good. A > > > small typo in the FLIP is that currently it is marked as released in > > 1.16. > > > > > > > > > Best Regards > > > Peter Huang > > > > > > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang wrote: > > > > > > > hi Yuxia, > > > > > > > > > > > > Thanks for your reply. Your reminder is very important ! > > > > > > > > > > > > Since we download the file to the local, remember to clean it up when > > the > > > > flink client exits > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Best regards, > > > > Mang Zhang > > > > > > > > > > > > > > > > > > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" > > > > wrote: > > > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will > > > > benefit from it. The flip looks good to me. I just have two minor > > questions: > > > > >1. For synax explanation, I see it's "Create function as > > > > identifier", I think the word "identifier" may not be > > > > self-dedescriptive for actually it's not a random name but the name of > > the > > > > class that provides the implementation for function to be create. > > > > >May be it'll be more clear to use "class_name" replace "identifier" > > just > > > > like what Hive[1]/Spark[2] do. > > > > > > > > > >2. >> If the resource used is a remote resource, it will first > > download > > > > the resource to a local temporary directory, which will be generated > > using > > > > UUID, and then register the local path to the user class loader. > > > > >For the above explanation in this FLIP, It seems for such statement > > sets, > > > > >"" > > > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar'; > > > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar'; > > > > >"" > > > > > it'll download the resource 'hdfs://myudfs.jar' for twice. So is it > > > > possible to provide some cache mechanism that we won't need to download / > > > > store for twice?
Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> -原始邮件- > 发件人: "Peter Huang" > 发送时间: 2022-03-23 11:13:32 (星期三) > 收件人: dev > 抄送: > 主题: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > Hi Ron, > > Thanks for reviving the discussion of the work. The design looks good. A > small typo in the FLIP is that currently it is marked as released in 1.16. > > > Best Regards > Peter Huang > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang wrote: > > > hi Yuxia, > > > > > > Thanks for your reply. Your reminder is very important ! > > > > > > Since we download the file to the local, remember to clean it up when the > > flink client exits > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > Mang Zhang > > > > > > > > > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)" > > wrote: > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will > > benefit from it. The flip looks good to me. I just have two minor questions: > > >1. For synax explanation, I see it's "Create function as > > identifier", I think the word "identifier" may not be > > self-dedescriptive for actually it's not a random name but the name of the > > class that provides the implementation for function to be create. > > >May be it'll be more clear to use "class_name" replace "identifier" just > > like what Hive[1]/Spark[2] do. > > > > > >2. >> If the resource used is a remote resource, it will first download > > the resource to a local temporary directory, which will be generated using > > UUID, and then register the local path to the user class loader. > > >For the above explanation in this FLIP, It seems for such statement sets, > > >"" > > >Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar'; > > >Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar'; > > >"" > > > it'll download the resource 'hdfs://myudfs.jar' for twice. So is it > > possible to provide some cache mechanism that we won't need to download / > > store for twice? 
> > > > > > > > >Best regards, > > >Yuxia > > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl > > >[2] > > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html-- > > >发件人:Mang Zhang > > >日 期:2022年03月22日 11:35:24 > > >收件人: > > >主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > > > > >Hi Ron, Thank you so much for this suggestion, this is so good. > > >In our company, when users use custom UDF, it is very inconvenient, and > > the code needs to be packaged into the job jar, > > >and cannot refer to the existing udf jar through the existing udf jar. > > >Or pass in the jar reference in the startup command. > > >If we implement this feature, users can focus on their own business > > development. > > >I can also contribute if needed. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >-- > > > > > >Best regards, > > >Mang Zhang > > > > > > > > > > > > > > > > > >At 2022-03-21 14:57:32, "刘大龙" wrote: > > >>Hi, everyone > > >> > > >> > > >> > > >> > > >>I would like to open a discussion for support advanced Function DDL, > > this proposal is a continuation of FLIP-79 in which Flink Function DDL is > > defined. Until now it is partially released as the Flink function DDL with > > user defined resources is not clearly discussed and implemented. It is an > > important feature for support to register UDF with custom jar resource, > > users can use UDF more more easily without having to put jars under the > > classpath in advance. > > >> > > >>Looking forward to your feedback. > > >> > > >> > > >> > > >> > > >>[1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL > > >> > > >> > > >> > > >> > > >>Best, > > >> > > >>Ron > > >> > > >> > > > > > Hi, Peter, Thanks for your feedback. This work also has your effort, thank you very much.
Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> -原始邮件- > 发件人: "罗宇侠(莫辞)" > 发送时间: 2022-03-23 10:02:26 (星期三) > 收件人: "Mang Zhang" , "Flink Dev" > 抄送: > 主题: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will benefit > from it. The flip looks good to me. I just have two minor questions: > 1. For synax explanation, I see it's "Create function as > identifier", I think the word "identifier" may not be self-dedescriptive > for actually it's not a random name but the name of the class that provides > the implementation for function to be create. > May be it'll be more clear to use "class_name" replace "identifier" just like > what Hive[1]/Spark[2] do. > > 2. >> If the resource used is a remote resource, it will first download the > resource to a local temporary directory, which will be generated using UUID, > and then register the local path to the user class loader. > For the above explanation in this FLIP, It seems for such statement sets, > "" > Create function as org.apache.udf1 using jar 'hdfs://myudfs.jar'; > Create function as org.apache.udf2 using jar 'hdfs://myudfs.jar'; > "" > it'll download the resource 'hdfs://myudfs.jar' for twice. So is it possible > to provide some cache mechanism that we won't need to download / store for > twice? > > > Best regards, > Yuxia > [1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl > [2] > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html-- > 发件人:Mang Zhang > 日 期:2022年03月22日 11:35:24 > 收件人: > 主 题:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > Hi Ron, Thank you so much for this suggestion, this is so good. > In our company, when users use custom UDF, it is very inconvenient, and the > code needs to be packaged into the job jar, > and cannot refer to the existing udf jar through the existing udf jar. > Or pass in the jar reference in the startup command. > If we implement this feature, users can focus on their own business > development. 
> I can also contribute if needed. > > > > > > > > > > > > > > > > -- > > Best regards, > Mang Zhang > > > > > > At 2022-03-21 14:57:32, "刘大龙" wrote: > >Hi, everyone > > > > > > > > > >I would like to open a discussion for support advanced Function DDL, this > >proposal is a continuation of FLIP-79 in which Flink Function DDL is > >defined. Until now it is partially released as the Flink function DDL with > >user defined resources is not clearly discussed and implemented. It is an > >important feature for support to register UDF with custom jar resource, > >users can use UDF more more easily without having to put jars under the > >classpath in advance. > > > >Looking forward to your feedback. > > > > > > > > > >[1] > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL > > > > > > > > > >Best, > > > >Ron > > > > > Hi, Yuxia, thanks for your feedback; your advice is very helpful. 1. I think you are right: "identifier" must be the name of the class that provides the function implementation, so we should replace "identifier" with "class_name". 2. Yes, we should cache a resource locally when the URLs are the same; this will be considered in the code implementation.
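Yuxia's "class_name" suggestion, applied to the examples quoted in this thread, would make the DDL read roughly like the following. This is a sketch only; the final grammar lives in FLIP-214, and the TEMPORARY and LANGUAGE clauses here simply follow Flink's existing CREATE FUNCTION syntax.

```sql
-- Sketch: "class_name" in place of "identifier", based on the thread's examples.
CREATE TEMPORARY FUNCTION my_udf
  AS 'org.apache.udf1'            -- class_name: the class implementing the function
  LANGUAGE JAVA
  USING JAR 'hdfs://myudfs.jar';  -- remote resource, downloaded to a local temp dir
```

With caching of the downloaded jar, a second CREATE FUNCTION pointing at the same 'hdfs://myudfs.jar' would reuse the local copy instead of downloading it again.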
Re: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> -原始邮件- > 发件人: "Mang Zhang" > 发送时间: 2022-03-22 11:35:24 (星期二) > 收件人: dev@flink.apache.org > 抄送: > 主题: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL > > Hi Ron, Thank you so much for this suggestion, this is so good. > In our company, when users use custom UDF, it is very inconvenient, and the > code needs to be packaged into the job jar, > and cannot refer to the existing udf jar through the existing udf jar. > Or pass in the jar reference in the startup command. > If we implement this feature, users can focus on their own business > development. > I can also contribute if needed. > > > > > > > > > > > > > > > > -- > > Best regards, > Mang Zhang > > > > > > At 2022-03-21 14:57:32, "刘大龙" wrote: > >Hi, everyone > > > > > > > > > >I would like to open a discussion for support advanced Function DDL, this > >proposal is a continuation of FLIP-79 in which Flink Function DDL is > >defined. Until now it is partially released as the Flink function DDL with > >user defined resources is not clearly discussed and implemented. It is an > >important feature for support to register UDF with custom jar resource, > >users can use UDF more more easily without having to put jars under the > >classpath in advance. > > > >Looking forward to your feedback. > > > > > > > > > >[1] > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL > > > > > > > > > >Best, > > > >Ron > > > > Hi, Mang Glad to receive your feedback, the advice you gave from the perspective of actual production use case within your company made this proposal even more meaningful to me. Thank you very much. Very welcome to contribute together. Best, Ron
[DISCUSS] FLIP-214 Support Advanced Function DDL
Hi, everyone I would like to open a discussion on supporting advanced function DDL. This proposal is a continuation of FLIP-79, in which the Flink function DDL is defined. Until now it is only partially released, as function DDL with user-defined resources was never clearly discussed or implemented. It is an important feature that allows registering a UDF with a custom jar resource, so users can use UDFs more easily without having to put jars under the classpath in advance. Looking forward to your feedback. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL Best, Ron
Re: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li
Congratulations Rui! Best > -原始邮件- > 发件人: "Benchao Li" > 发送时间: 2021-04-22 14:43:33 (星期四) > 收件人: dev > 抄送: > 主题: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li > > Congratulations Rui! > > Jingsong Li 于2021年4月22日周四 下午2:33写道: > > > Congratulations Rui! > > > > Best, > > Jingsong > > > > On Thu, Apr 22, 2021 at 11:52 AM Yun Gao > > wrote: > > > > > Congratulations Rui! > > > > > > Best, > > > Yun > > > > > > > > > -- > > > Sender:Nicholas Jiang > > > Date:2021/04/22 11:26:05 > > > Recipient: > > > Theme:Re: [ANNOUNCE] New Apache Flink Committer - Rui Li > > > > > > Congrats, Rui! > > > > > > Best, > > > Nicholas Jiang > > > > > > > > > > > > -- > > > Sent from: > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ > > > > > > > > > -- > > Best, Jingsong Lee > > > > > -- > > Best, > Benchao Li
Re: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function (2nd)
+1 > -原始邮件- > 发件人: "Timo Walther" > 发送时间: 2020-11-11 18:55:06 (星期三) > 收件人: dev@flink.apache.org > 抄送: > 主题: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function (2nd) > > +1 (binding) > > Thanks, > Timo > > On 11.11.20 07:14, Pengcheng Liu wrote: > > +1 (binding) > > > > Jark Wu 于2020年11月11日周三 上午10:13写道: > > > >> +1 (binding) > >> > >> On Tue, 10 Nov 2020 at 14:59, Jark Wu wrote: > >> > >>> Hi all, > >>> > >>> There is new feedback on the FLIP-145. So I would like to start a new > >> vote > >>> for FLIP-145 [1], > >>> which has been discussed and reached consensus in the discussion thread > >>> [2]. > >>> > >>> The vote will be open until 15:00 (UTC+8) 13th Nov. (72h), unless there > >> is > >>> an objection or not enough votes. > >>> > >>> Best, > >>> Jark > >>> > >>> [1]: > >>> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function > >>> [2]: > >>> > >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html > >>> > >> > >
Re: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function
+1 > -原始邮件- > 发件人: "Jark Wu" > 发送时间: 2020-10-10 18:50:20 (星期六) > 收件人: dev > 抄送: > 主题: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function > > +1 > > On Sat, 10 Oct 2020 at 18:41, Benchao Li wrote: > > > +1 > > > > Jark Wu 于2020年10月10日周六 下午6:06写道: > > > > > Hi all, > > > > > > I would like to start the vote for FLIP-145 [1], which is discussed and > > > reached consensus in the discussion thread [2]. > > > > > > The vote will be open until 13th Oct. (72h), unless there is an objection > > > or not enough votes. > > > > > > Best, > > > Jark > > > > > > [1]: > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function > > > [2]: > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html > > > > > > > > > -- > > > > Best, > > Benchao Li > > -- Best
Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql
+1 > -原始邮件- > 发件人: "Benchao Li" > 发送时间: 2020-09-20 16:28:20 (星期日) > 收件人: dev > 抄送: > 主题: Re: [DISCUSS] Support source/sink parallelism config in Flink sql > > Hi admin, > > Thanks for bringing up this discussion. > IMHO, it's a valuable feature. We also added this feature for our internal > SQL engine. > And our way is very similar to your proposal. > > Regarding the implementation, there is one shorthand that we should modify > each connector > to support this property. > We can wait for others' opinion whether this is a valid proposal. If yes, > then we can discuss > the implementation detailedly. > > admin <17626017...@163.com> 于2020年9月10日周四 上午1:19写道: > > > Hi devs: > > Currently,Flink sql does not support source/sink parallelism config.So,it > > will result in wasting or lacking resources in some cases. > > I think it is necessary to introduce configuration of source/sink > > parallelism in sql. > > From my side,i have the solution for this feature.Add parallelism config > > in ‘with’ properties of DDL. > > > > Before 1.11,we can get parallelism and then set it to > > StreamTableSink#consumeDataStream or StreamTableSource#getDataStream > > After 1.11,we can get parallelism from catalogTable and then set it to > > transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink. > > > > What do you think? > > > > > > > > > > > > -- > > Best, > Benchao Li
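The proposal above, putting the parallelism into the DDL's 'with' properties, could look something like the following. This is illustrative only: the property key 'sink.parallelism' is a placeholder for whatever name the community settles on, and each connector would have to recognize it.

```sql
-- Illustrative sketch of a per-table parallelism hint in the WITH clause,
-- as proposed in this thread; the property key is a placeholder.
CREATE TABLE kafka_sink (
  user_id BIGINT,
  item_id BIGINT
) WITH (
  'connector' = 'kafka',
  'topic' = 'results',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'sink.parallelism' = '4'   -- desired parallelism of the sink operator
);
```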
Re: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He
Congratulations! > -原始邮件- > 发件人: "Benchao Li" > 发送时间: 2020-09-16 14:22:25 (星期三) > 收件人: dev > 抄送: "贺小令" > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He > > Congratulations! > > Zhu Zhu 于2020年9月16日周三 下午1:36写道: > > > Congratulations! > > > > Thanks, > > Zhu > > > > Leonard Xu 于2020年9月16日周三 下午1:32写道: > > > > > Congratulations! Godfrey > > > > > > Best, > > > Leonard > > > > > > > 在 2020年9月16日,13:12,Yangze Guo 写道: > > > > > > > > Congratulations! Xiaoling. > > > > > > > > Best, > > > > Yangze Guo > > > > > > > > On Wed, Sep 16, 2020 at 12:45 PM Dian Fu > > wrote: > > > >> > > > >> Congratulations, well deserved! > > > >> > > > >> Regards, > > > >> Dian > > > >> > > > >>> 在 2020年9月16日,下午12:36,Guowei Ma 写道: > > > >>> > > > >>> Congratulations :) > > > >>> > > > >>> Best, > > > >>> Guowei > > > >>> > > > >>> > > > >>> On Wed, Sep 16, 2020 at 12:19 PM Jark Wu wrote: > > > >>> > > > Hi everyone, > > > > > > It's great seeing many new Flink committers recently, and on behalf > > > of the > > > PMC, > > > I'd like to announce one more new committer: Godfrey He. > > > > > > Godfrey is a very long time contributor in the Flink community since > > > the > > > end of 2016. > > > He has been a very active contributor in the Flink SQL component > > with > > > 153 > > > PRs and more than 571,414 lines which is quite outstanding. > > > Godfrey has paid essential effort with SQL optimization and helped a > > > lot > > > during the blink merging. > > > Besides that, he is also quite active with community work especially > > > in > > > Chinese mailing list. > > > > > > Please join me in congratulating Godfrey for becoming a Flink > > > committer! > > > > > > Cheers, > > > Jark Wu > > > > > > >> > > > > > > > > > > > -- > > Best, > Benchao Li
Re: Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang
Congratulations! > -原始邮件- > 发件人: "Dawid Wysakowicz" > 发送时间: 2020-09-15 20:34:34 (星期二) > 收件人: dev@flink.apache.org, tang...@apache.org, "Yun Tang" > 抄送: > 主题: Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang > > Congratulations! > > On 15/09/2020 12:19, Yu Li wrote: > > Hi all, > > > > It's great seeing many new Flink committers recently, and on behalf of the > > PMC, I'd like to announce one more new committer: Yun Tang! > > > > Yun has been an active contributor for more than two years, with 132 > > contributions including 72 commits and many PR reviews. > > > > Yun mainly works on state backend and checkpoint modules, and is one of the > > main maintainers of RocksDB state backend, involved in critical features > > like RocksDB memory management, etc. > > > > Besides that, Yun is very actively involved in QA and discussions in the > > user and dev mailing lists (more than 300 replies since Jul. 2018). > > > > Please join me in congratulating Yun for becoming a Flink committer! > > > > Cheers, > > Yu > > > -- 刘大龙 浙江大学 控制系 智能系统与控制研究所 工控新楼217 地址:浙江省杭州市浙大路38号浙江大学玉泉校区 Tel:18867547281
Fw: Re: Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation
-原始邮件- 发件人:"刘大龙" 发送时间:2020-05-06 17:55:25 (星期三) 收件人: "Jark Wu" 抄送: 主题: Re: Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation Thanks for your tuning ideas, I will test it later. Just to emphasize, I use non-mini batch deduplication for tests. -原始邮件- 发件人:"Jark Wu" 发送时间:2020-05-05 10:48:27 (星期二) 收件人: dev 抄送: "刘大龙" , "Yu Li" , "Yun Tang" 主题: Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation Hi Andrey, Thanks for the tuning ideas. I will explain the design of deduplication. The mini-batch implementation of deduplication buffers a bundle of input data in heap (Java Map), when the bundle size hit the trigger size or trigger time, the buffered data will be processed together. So we only need to access the state once per key. This is designed for rocksdb statebackend to reduce the frequently accessing, (de)serialization. And yes, this may slow down the checkpoint, but the suggested mini-batch timeout is <= 10s. From our production experience, it doesn't have much impact on checkpoint. Best, Jark On Tue, 5 May 2020 at 06:48, Andrey Zagrebin wrote: Hi lsyldliu, You can try to tune the StateTtlConfig. As the documentation suggests [1] the TTL incremental cleanup can decrease the per record performance. This is the price of the automatic cleanup. If the only thing, which happens mostly in your operator, is working with state then even checking one additional record to cleanup is two times more actions to do. Timer approach was discussed in TTL feature design. It needs an additional implementation and keeps more state but performs only one cleanup action exactly when needed so it is a performance/storage trade-off. Anyways, 20x degradation looks indeed a lot. As a first step, I would suggest to configure the incremental cleanup explicitly in `StateTtlConfigUtil#createTtlConfig` with a less entries to check, e.g. 
1 because processFirstRow/processLastRow already access the state twice and do cleanup: .cleanupIncrementally(1, false) Also not sure but depending on the input data, finishBundle can happen mostly during the snapshotting which slows down taking the checkpoint. Could this fail the checkpoint accumulating the backpressure and slowing down the pipeline? Not sure why to keep the deduplication data in a Java map and in Flink state at the same time, why not to keep it only in Flink state and deduplicate on each incoming record? Best, Andrey [1] note 2 in https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#incremental-cleanup On Wed, Apr 29, 2020 at 11:53 AM 刘大龙 wrote: > > > > > -原始邮件- > > 发件人: "Jark Wu" > > 发送时间: 2020-04-29 14:09:44 (星期三) > > 收件人: dev , "Yu Li" , > myas...@live.com > > 抄送: azagre...@apache.org > > 主题: Re: The use of state ttl incremental cleanup strategy in sql > deduplication resulting in significant performance degradation > > > > Hi lsyldliu, > > > > Thanks for investigating this. > > > > First of all, if you are using mini-batch deduplication, it doesn't > support > > state ttl in 1.9. That's why the tps looks the same with 1.11 disable > state > > ttl. > > We just introduce state ttl for mini-batch deduplication recently. > > > > Regarding to the performance regression, it looks very surprise to me. > The > > performance is reduced by 19x when StateTtlConfig is enabled in 1.11. > > I don't have much experience of the underlying of StateTtlConfig. So I > loop > > in @Yu Li @YunTang in CC who may have more insights > on > > this. 
> > > > For more information, we use the following StateTtlConfig [1] in blink > > planner: > > > > StateTtlConfig > > .newBuilder(Time.milliseconds(retentionTime)) > > .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) > > .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired) > > .build(); > > > > > > Best, > > Jark > > > > > > [1]: > > > https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/util/StateTtlConfigUtil.java#L27 > > > > > > > > > > > > On Wed, 29 Apr 2020 at 11:53, 刘大龙 wrote: > > > > > Hi, all! > > > > > > At flink master branch, we have supported state ttl for sql mini-batch > > > deduplication using incremental cleanup strategy on heap backend, > refer to > > > FLINK-16581. Because I want to test the performance of this feature, > so I > >
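Andrey's tuning suggestion above, applied to the StateTtlConfig that Jark quotes from `StateTtlConfigUtil#createTtlConfig`, is a one-line change. This is a configuration sketch, not a tested patch; `retentionTime` is the variable from the quoted code.

```java
// Same builder as quoted in the thread, plus incremental cleanup configured
// explicitly: check only 1 entry per state access, and do not additionally
// run cleanup for every processed record (second argument false).
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.milliseconds(retentionTime))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .cleanupIncrementally(1, false)
    .build();
```

Since processFirstRow/processLastRow already access the state twice per record, checking a single extra entry per access bounds the cleanup overhead, which is the trade-off Andrey describes.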
Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation
> -原始邮件- > 发件人: "Jark Wu" > 发送时间: 2020-04-29 14:09:44 (星期三) > 收件人: dev , "Yu Li" , myas...@live.com > 抄送: azagre...@apache.org > 主题: Re: The use of state ttl incremental cleanup strategy in sql > deduplication resulting in significant performance degradation > > Hi lsyldliu, > > Thanks for investigating this. > > First of all, if you are using mini-batch deduplication, it doesn't support > state ttl in 1.9. That's why the tps looks the same with 1.11 disable state > ttl. > We just introduce state ttl for mini-batch deduplication recently. > > Regarding to the performance regression, it looks very surprise to me. The > performance is reduced by 19x when StateTtlConfig is enabled in 1.11. > I don't have much experience of the underlying of StateTtlConfig. So I loop > in @Yu Li @YunTang in CC who may have more insights on > this. > > For more information, we use the following StateTtlConfig [1] in blink > planner: > > StateTtlConfig > .newBuilder(Time.milliseconds(retentionTime)) > .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) > .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired) > .build(); > > > Best, > Jark > > > [1]: > https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/util/StateTtlConfigUtil.java#L27 > > > > > > On Wed, 29 Apr 2020 at 11:53, 刘大龙 wrote: > > > Hi, all! > > > > At flink master branch, we have supported state ttl for sql mini-batch > > deduplication using incremental cleanup strategy on heap backend, refer to > > FLINK-16581. 
Because I want to test the performance of this feature, so I > > compile master branch code and deploy the jar to production > > environment,then run three types of tests, respectively: > > > > > > > > > > flink 1.9.0 release version enable state ttl > > flink 1.11-snapshot version disable state ttl > > flink 1.11-snapshot version enable state ttl > > > > > > > > > > The test query sql as follows: > > > > select order_date, > > sum(price * amount - goods_all_fav_amt - virtual_money_amt + > > goods_carriage_amt) as saleP, > > sum(amount) as saleN, > > count(distinct parent_sn) as orderN, > > count(distinct user_id) as cusN > >from( > > select order_date, user_id, > > order_type, order_status, terminal, last_update_time, > > goods_all_fav_amt, > > goods_carriage_amt, virtual_money_amt, price, amount, > > order_quality, quality_goods_cnt, acture_goods_amt > > from (select *, row_number() over(partition by order_id, > > order_goods_id order by proctime desc) as rownum from dm_trd_order_goods) > > where rownum=1 > > and (order_type in (1,2,3,4,5) or order_status = 70) > > and terminal = 'shop' and price > 0) > > group by order_date > > > > > > At runtime, this query will generate two operators which include > > Deduplication and GroupAgg. 
In the test, the configuration is same, > > parallelism is 20, set kafka consumer from the earliest, and disable > > mini-batch function, The test results as follows: > > > > flink 1.9.0 enable state ttl:this test lasted 44m, flink receive 1374w > > records, average tps at 5200/s, Flink UI picture link back pressure, > > checkpoint > > flink 1.11-snapshot version disable state ttl:this test lasted 28m, flink > > receive 883w records, average tps at 5200/s, Flink UI picture link back > > pressure, checkpoint > > flink 1.11-snapshot version enable state ttl:this test lasted 1h 43m, > > flink only receive 168w records because of deduplication operator serious > > back pressure, average tps at 270/s, moreover, checkpoint always fail > > because of deduplication operator serious back pressure, Flink UI picture > > link back pressure, checkpoint > > > > Deduplication state clean up implement in flink 1.9.0 use timer, but > > 1.11-snapshot version use StateTtlConfig, this is the main difference. > > Comparing the three tests comprehensively, we can see that if disable state > > ttl in 1.11-snapshot the performance is the same with 1.9.0 enable state > > ttl. However, if enable state ttl in 1.11-snapshot, performance down is > > nearly 20 times, so I think incremental cleanup strategy cause this > > problem, what do you think about it? @azagrebin, @jark. > > > > Thanks. > > > > lsyldliu > > > > Zhejiang University, College of Control Science and engineer, CSC -- Liu Dalong (刘大龙) Zhejiang University, Department of Control Science, Institute of Intelligent Systems and Control, Room 217, New Industrial Control Building Address: Yuquan Campus, Zhejiang University, 38 Zheda Road, Hangzhou, Zhejiang Tel: 18867547281 Hi Jark, I use non-mini-batch deduplication and group agg for the tests. The non-mini-batch deduplication state TTL implementation has been refactored to use StateTtlConfig instead of a timer in the current 1.11 master branch; that PR is my work, and I am also surprised by the 19x performance drop.
The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation
Hi, all!

On the Flink master branch, we have supported state ttl for sql mini-batch deduplication using the incremental cleanup strategy on the heap backend, refer to FLINK-16581. Because I wanted to test the performance of this feature, I compiled the master branch code, deployed the jar to a production environment, and then ran three types of tests, respectively:

flink 1.9.0 release version, state ttl enabled
flink 1.11-snapshot version, state ttl disabled
flink 1.11-snapshot version, state ttl enabled

The test query sql is as follows:

select order_date,
       sum(price * amount - goods_all_fav_amt - virtual_money_amt + goods_carriage_amt) as saleP,
       sum(amount) as saleN,
       count(distinct parent_sn) as orderN,
       count(distinct user_id) as cusN
from (
    select order_date, user_id,
           order_type, order_status, terminal, last_update_time, goods_all_fav_amt,
           goods_carriage_amt, virtual_money_amt, price, amount,
           order_quality, quality_goods_cnt, acture_goods_amt
    from (select *, row_number() over(partition by order_id, order_goods_id
          order by proctime desc) as rownum from dm_trd_order_goods)
    where rownum = 1
      and (order_type in (1,2,3,4,5) or order_status = 70)
      and terminal = 'shop' and price > 0)
group by order_date

At runtime, this query generates two operators: Deduplication and GroupAgg.
In the tests the configuration is the same: parallelism is 20, the kafka consumer is set to read from the earliest offset, and the mini-batch function is disabled. The test results are as follows:

flink 1.9.0, state ttl enabled: this test lasted 44m, flink received 13.74 million records, average tps about 5200/s. Flink UI picture links: back pressure, checkpoint
flink 1.11-snapshot, state ttl disabled: this test lasted 28m, flink received 8.83 million records, average tps about 5200/s. Flink UI picture links: back pressure, checkpoint
flink 1.11-snapshot, state ttl enabled: this test lasted 1h 43m, flink received only 1.68 million records because of serious back pressure on the deduplication operator, average tps about 270/s; moreover, checkpoints always fail because of that back pressure. Flink UI picture links: back pressure, checkpoint

Deduplication state cleanup is implemented with a timer in flink 1.9.0, but the 1.11-snapshot version uses StateTtlConfig; this is the main difference. Comparing the three tests comprehensively, we can see that with state ttl disabled in 1.11-snapshot, the performance is the same as 1.9.0 with state ttl enabled. However, with state ttl enabled in 1.11-snapshot, performance drops nearly 20x, so I think the incremental cleanup strategy causes this problem. What do you think about it? @azagrebin, @jark.

Thanks.

lsyldliu

Zhejiang University, College of Control Science and Engineering, CSC