Re: Re: [VOTE] FLIP-214: Support Advanced Function DDL

2022-04-21 Thread

Hi, Godfrey

The ADD/DELETE JAR syntax is currently parsed on the table environment side, 
but the execution is implemented on the SqlClient side. After this FLIP, 
we will move the execution into the table environment, so there is no public API 
change. Moreover, I have updated the description in the Core Code Design section.
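
For readers of the archive, a minimal sketch of the JAR statements this refers to (the path is an example only; the mail's "delete jar" corresponds to the REMOVE JAR keyword used by the SQL client). After this FLIP they are executed by the table environment itself, for example via TableEnvironment#executeSql, rather than only inside the SQL Client:

ADD JAR '/tmp/udfs/my_udf.jar';
SHOW JARS;
REMOVE JAR '/tmp/udfs/my_udf.jar';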

> -----Original Message-----
> From: "godfrey he" 
> Sent: 2022-04-22 12:26:44 (Friday)
> To: dev 
> Cc: 
> Subject: Re: [VOTE] FLIP-214: Support Advanced Function DDL
> 
> hi Ron,
> 
> I don't see any section mentioned `delete jar`, could you update it?
> 
> Best,
> Godfrey
> 
> Jing Zhang  wrote on Thu, Apr 21, 2022 at 17:57:
> >
> > Ron,
> > +1 (binding)
> >
> > Thanks for driving this FLIP.
> >
> > Best,
> > Jing Zhang
> >
> > Jark Wu  wrote on Thu, Apr 21, 2022 at 11:31:
> >
> > > Thanks for driving this work @Ron,
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Jark
> > >
> > > On Thu, 21 Apr 2022 at 10:42, Mang Zhang  wrote:
> > >
> > > > +1
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2022-04-20 18:28:28, "刘大龙"  wrote:
> > > > >Hi, everyone
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >I'd like to start a vote on FLIP-214: Support Advanced Function DDL [1]
> > > > which has been discussed in [2].
> > > > >
> > > > >The vote will be open for at least 72 hours unless there is an 
> > > > >objection
> > > > or not enough votes.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >[1]
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > > > >
> > > > >[2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >Best,
> > > > >
> > > > >Ron
> > > >
> > >


--
Best,
Ron


[VOTE] FLIP-214: Support Advanced Function DDL

2022-04-20 Thread
Hi, everyone




I'd like to start a vote on FLIP-214: Support Advanced Function DDL [1] which 
has been discussed in [2].

The vote will be open for at least 72 hours unless there is an objection or not 
enough votes.




[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL

[2] https://lists.apache.org/thread/7m5md150qgodgz1wllp5plx15j1nowx8




Best,

Ron

Re: Re: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-04-19 Thread
Thanks again to everyone for the discussion on this FLIP. I will open a vote tomorrow.

Best,
Ron


> -----Original Message-----
> From: "Jark Wu" 
> Sent: 2022-04-19 16:03:22 (Tuesday)
> To: dev 
> Cc: 
> Subject: Re: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced 
> Function DDL
> 
> Thank Ron for updating the FLIP.
> 
> I think the updated FLIP has addressed Martijn's concern.
> I don't have other feedback. So +1 for a vote.
> 
> Best,
> Jark
> 
> On Fri, 15 Apr 2022 at 16:36, 刘大龙  wrote:
> 
> > Hi, Jingsong
> >
> > Thanks for your feedback, we will use flink FileSytem abstraction, so HDFS
> > S3 OSS will be supported.
> >
> > Best,
> >
> > Ron
> >
> > > -----Original Message-----
> > > From: "Jingsong Li" 
> > > Sent: 2022-04-14 17:55:03 (Thursday)
> > > To: dev 
> > > Cc:
> > > Subject: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> > Function DDL
> > >
> > > I agree with Martijn.
> > >
> > > At least, HDFS S3 OSS should be supported.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser 
> > wrote:
> > > >
> > > > Hi Ron,
> > > >
> > > > The FLIP mentions that the priority will be set to support HDFS as a
> > > > resource provider. I'm concerned that we end up with a partially
> > > > implemented FLIP which only supports local and HDFS and then we move
> > on to
> > > > other features, as we see happen with others. I would argue that we
> > should
> > > > not focus on one resource provider, but that at least S3 support is
> > > > included in the same Flink release as HDFS support is.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > >
> > > > On Thu, 14 Apr 2022 at 08:50, 刘大龙  wrote:
> > > >
> > > > > Hi, everyone
> > > > >
> > > > > First of all, thanks for the valuable suggestions received about this
> > > > > FLIP. After some discussion, it looks like all concerns have been
> > addressed
> > > > > for now, so I will start a vote about this FLIP in two or three days
> > later.
> > > > > Also, further feedback is very welcome.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ron
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "刘大龙" 
> > > > > > Sent: 2022-04-08 10:09:46 (Friday)
> > > > > > To: dev@flink.apache.org
> > > > > > Cc:
> > > > > > Subject: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> > Function
> > > > > DDL
> > > > > >
> > > > > > Hi, Martijn
> > > > > >
> > > > > > Do you have any question about this FLIP? looking forward to your
> > more
> > > > > feedback.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Ron
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: "刘大龙" 
> > > > > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > > > > To: dev@flink.apache.org
> > > > > > > Cc:
> > > > > > > Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced
> > Function DDL
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: "Martijn Visser" 
> > > > > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > > > > To: dev 
> > > > > > > > Cc:
> > > > > > > > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > DDL
> > > > > > > >
> > > > > > > > Hi Ron,
> > > > > > > >
> > > > > > > > Thanks for creating the FLIP. You're talking about both local
> > and
> > > > > remote
> > > > > > > > resources. With regards to r

Re: Re: Re: Discussion about enhancing the partitioned table syntax

2022-04-15 Thread
Hi, Martijn

Thanks to Jingsong for the reminder. This issue is part of FLIP-63 [1], which 
proposes the related partition syntax. This issue introduces no new DDL/DML 
statements for partitioned tables, so the syntax has already been discussed and 
voted on in the community. I think we don't need a new FLIP.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support

Best,

Ron

> -----Original Message-----
> From: "刘大龙" 
> Sent: 2022-04-15 17:15:36 (Friday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: Re: Discussion about enhancing the partitioned table syntax
> 
> Hi, Jingsong
> 
> Thanks for your reply, this FLIP [1] is good to me, I have not found it. The 
> enhanced partition syntax in FLINK-27237 is a part of FLIP [1], there is no 
> new added DDL/DMLs, so these syntax have been discussed and voted in 
> community. This issue is continue to finish the work of syntax part in FLIP 
> [1].
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> 
> Best,
> 
> Ron
> 
> > -----Original Message-----
> > From: "Jingsong Li" 
> > Sent: 2022-04-15 16:02:37 (Friday)
> > To: dev 
> > Cc: 
> > Subject: Re: Discussion about enhancing the partitioned table syntax
> > 
> > Thanks for taking this.
> > 
> > There is a FLIP [1] to define some partition DMLs.
> > 
> > If you have added DMLs, we need to vote again. (Create a new FLIP)
> > 
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> > 
> > Best,
> > Jingsong
> > 
> > On Fri, Apr 15, 2022 at 3:20 PM Martijn Visser  
> > wrote:
> > >
> > > Hi Ron,
> > >
> > > Thanks for reaching out to the mailing list. As I mentioned in the Jira
> > > ticket, a SQL syntax change requires a FLIP [1]. A FLIP requires a
> > > discussion and a vote, which is explained in more detail in the Flink
> > > Bylaws [2]
> > >
> > > Please create a FLIP and follow the process for this.
> > >
> > > Best regards,
> > >
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > https://github.com/MartijnVisser
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> > >
> > >
> > >
> > > On Fri, 15 Apr 2022 at 09:08, 刘大龙  wrote:
> > >
> > > > Hi, everyone
> > > >
> > > >
> > > >
> > > >
> > > > For partitioned table, there are still some related syntaxes that are 
> > > > not
> > > > supported, which will be very useful for partitioned table. In batch
> > > > analysis scenarios, such as Hive table, partitioned table is a very 
> > > > common
> > > > case. Since the current Flink is mainly for streaming processing, not 
> > > > much
> > > > work has been done on partitioned table. In order to enhance the 
> > > > experience
> > > > of using Flink in batch job, as well as to improve the DDL syntax. So I
> > > > created the issue FLINK-27237 to discuss about enhancing the partition
> > > > table syntax. You are welcome to discuss on the issue and give comments 
> > > > so
> > > > that this proposal can be improved.
> > > >
> > > > Best,
> > > >
> > > >
> > > >
> > > >
> > > > Ron


Re: Re: Discussion about enhancing the partitioned table syntax

2022-04-15 Thread
Hi, Jingsong

Thanks for your reply. FLIP-63 [1] looks good to me; I had not found it before. The 
enhanced partition syntax in FLINK-27237 is part of FLIP-63 [1], and there are no newly 
added DDL/DML statements, so the syntax has already been discussed and voted on in the 
community. This issue simply continues the syntax work of FLIP-63 [1].


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support

Best,

Ron

> -----Original Message-----
> From: "Jingsong Li" 
> Sent: 2022-04-15 16:02:37 (Friday)
> To: dev 
> Cc: 
> Subject: Re: Discussion about enhancing the partitioned table syntax
> 
> Thanks for taking this.
> 
> There is a FLIP [1] to define some partition DMLs.
> 
> If you have added DMLs, we need to vote again. (Create a new FLIP)
> 
> [1] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support
> 
> Best,
> Jingsong
> 
> On Fri, Apr 15, 2022 at 3:20 PM Martijn Visser  
> wrote:
> >
> > Hi Ron,
> >
> > Thanks for reaching out to the mailing list. As I mentioned in the Jira
> > ticket, a SQL syntax change requires a FLIP [1]. A FLIP requires a
> > discussion and a vote, which is explained in more detail in the Flink
> > Bylaws [2]
> >
> > Please create a FLIP and follow the process for this.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> >
> >
> >
> > On Fri, 15 Apr 2022 at 09:08, 刘大龙  wrote:
> >
> > > Hi, everyone
> > >
> > >
> > >
> > >
> > > For partitioned table, there are still some related syntaxes that are not
> > > supported, which will be very useful for partitioned table. In batch
> > > analysis scenarios, such as Hive table, partitioned table is a very common
> > > case. Since the current Flink is mainly for streaming processing, not much
> > > work has been done on partitioned table. In order to enhance the 
> > > experience
> > > of using Flink in batch job, as well as to improve the DDL syntax. So I
> > > created the issue FLINK-27237 to discuss about enhancing the partition
> > > table syntax. You are welcome to discuss on the issue and give comments so
> > > that this proposal can be improved.
> > >
> > > Best,
> > >
> > >
> > >
> > >
> > > Ron

Re: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-04-15 Thread
Hi, Jingsong

Thanks for your feedback. We will use Flink's FileSystem abstraction, so HDFS, S3, 
and OSS will be supported.

Best,

Ron

> -----Original Message-----
> From: "Jingsong Li" 
> Sent: 2022-04-14 17:55:03 (Thursday)
> To: dev 
> Cc: 
> Subject: [SPAM] Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced 
> Function DDL
> 
> I agree with Martijn.
> 
> At least, HDFS S3 OSS should be supported.
> 
> Best,
> Jingsong
> 
> On Thu, Apr 14, 2022 at 4:46 PM Martijn Visser  wrote:
> >
> > Hi Ron,
> >
> > The FLIP mentions that the priority will be set to support HDFS as a
> > resource provider. I'm concerned that we end up with a partially
> > implemented FLIP which only supports local and HDFS and then we move on to
> > other features, as we see happen with others. I would argue that we should
> > not focus on one resource provider, but that at least S3 support is
> > included in the same Flink release as HDFS support is.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> >
> > On Thu, 14 Apr 2022 at 08:50, 刘大龙  wrote:
> >
> > > Hi, everyone
> > >
> > > First of all, thanks for the valuable suggestions received about this
> > > FLIP. After some discussion, it looks like all concerns have been 
> > > addressed
> > > for now, so I will start a vote about this FLIP in two or three days 
> > > later.
> > > Also, further feedback is very welcome.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙" 
> > > > Sent: 2022-04-08 10:09:46 (Friday)
> > > > To: dev@flink.apache.org
> > > > Cc:
> > > > Subject: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > > DDL
> > > >
> > > > Hi, Martijn
> > > >
> > > > Do you have any question about this FLIP? looking forward to your more
> > > feedback.
> > > >
> > > > Best,
> > > >
> > > > Ron
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: "刘大龙" 
> > > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > > To: dev@flink.apache.org
> > > > > Cc:
> > > > > Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: "Martijn Visser" 
> > > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > > To: dev 
> > > > > > Cc:
> > > > > > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > > >
> > > > > > Hi Ron,
> > > > > >
> > > > > > Thanks for creating the FLIP. You're talking about both local and
> > > remote
> > > > > > resources. With regards to remote resources, how do you see this
> > > work with
> > > > > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > > dependencies are not packaged, but I would hope that we do that for
> > > all
> > > > > > filesystem implementation. I don't think it's a good idea to have
> > > any tight
> > > > > > coupling to file system implementations, especially if at some point
> > > we
> > > > > > could also externalize file system implementations (like we're doing
> > > for
> > > > > > connectors already). I think the FLIP would be better by not only
> > > > > > referring to "Hadoop" as a remote resource provider, but a more
> > > generic
> > > > > > term since there are more options than Hadoop.
> > > > > >
> > > > > > I'm also thinking about security/operations implications: would it 
> > > > > > be
> > > > > > possible for bad actor X to create a JAR that either influences 
> > > > > > other
> > > > > > running jobs, leaks data or credentials or anything else? If so, I
> > > think it
> > > > > > would also be good to have an option to disable this feature
> > > completely. I
> > > > > > think there are roughly two types of companies who run Flink:

Re: Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-04-15 Thread
Hi, Martijn

My description in the FLIP was not very clear: we will use Flink's FileSystem 
abstraction to download resources, so HDFS, S3, OSS, etc. will be supported in 
the first version.

Best,

Ron


> -----Original Message-----
> From: "Martijn Visser" 
> Sent: 2022-04-14 16:46:24 (Thursday)
> To: dev 
> Cc: 
> Subject: Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron,
> 
> The FLIP mentions that the priority will be set to support HDFS as a
> resource provider. I'm concerned that we end up with a partially
> implemented FLIP which only supports local and HDFS and then we move on to
> other features, as we see happen with others. I would argue that we should
> not focus on one resource provider, but that at least S3 support is
> included in the same Flink release as HDFS support is.
> 
> Best regards,
> 
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
> 
> 
> On Thu, 14 Apr 2022 at 08:50, 刘大龙  wrote:
> 
> > Hi, everyone
> >
> > First of all, thanks for the valuable suggestions received about this
> > FLIP. After some discussion, it looks like all concerns have been addressed
> > for now, so I will start a vote about this FLIP in two or three days later.
> > Also, further feedback is very welcome.
> >
> > Best,
> >
> > Ron
> >
> >
> > > -----Original Message-----
> > > From: "刘大龙" 
> > > Sent: 2022-04-08 10:09:46 (Friday)
> > > To: dev@flink.apache.org
> > > Cc:
> > > Subject: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function
> > DDL
> > >
> > > Hi, Martijn
> > >
> > > Do you have any question about this FLIP? looking forward to your more
> > feedback.
> > >
> > > Best,
> > >
> > > Ron
> > >
> > >
> > > > -----Original Message-----
> > > > From: "刘大龙" 
> > > > Sent: 2022-03-29 19:33:58 (Tuesday)
> > > > To: dev@flink.apache.org
> > > > Cc:
> > > > Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: "Martijn Visser" 
> > > > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > > > To: dev 
> > > > > Cc:
> > > > > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for creating the FLIP. You're talking about both local and
> > remote
> > > > > resources. With regards to remote resources, how do you see this
> > work with
> > > > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > > > dependencies are not packaged, but I would hope that we do that for
> > all
> > > > > filesystem implementation. I don't think it's a good idea to have
> > any tight
> > > > > coupling to file system implementations, especially if at some point
> > we
> > > > > could also externalize file system implementations (like we're doing
> > for
> > > > > connectors already). I think the FLIP would be better by not only
> > > > > referring to "Hadoop" as a remote resource provider, but a more
> > generic
> > > > > term since there are more options than Hadoop.
> > > > >
> > > > > I'm also thinking about security/operations implications: would it be
> > > > > possible for bad actor X to create a JAR that either influences other
> > > > > running jobs, leaks data or credentials or anything else? If so, I
> > think it
> > > > > would also be good to have an option to disable this feature
> > completely. I
> > > > > think there are roughly two types of companies who run Flink: those
> > who
> > > > > open it up for everyone to use (here the feature would be welcomed)
> > and
> > > > > those who need to follow certain minimum standards/have a more
> > closed Flink
> > > > > ecosystem). They usually want to validate a JAR upfront before
> > making it
> > > > > available, even at the expense of speed, because it gives them more
> > control
> > > > > over what will be running in their environment.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
> > > &

Discussion about enhancing the partitioned table syntax

2022-04-15 Thread
Hi, everyone




For partitioned tables, there are still some related syntaxes that are not yet 
supported but would be very useful. In batch analysis scenarios, such as Hive tables, 
partitioned tables are a very common case. Since Flink has so far focused mainly on 
stream processing, not much work has been done on partitioned tables. To improve the 
experience of using Flink for batch jobs, and to round out the DDL syntax, I created 
the issue FLINK-27237 to discuss enhancing the partitioned table syntax. You are 
welcome to discuss on the issue and give comments so that this proposal can be 
improved.
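
To make the scope concrete, these are the kinds of partition statements meant here (the examples are illustrative only; the exact list is defined in FLIP-63 and tracked in FLINK-27237):

ALTER TABLE my_table ADD PARTITION (dt = '2022-04-15');
ALTER TABLE my_table DROP PARTITION (dt = '2022-04-15');
SHOW PARTITIONS my_table;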

Best,




Ron

Re: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-04-14 Thread
Hi, everyone

First of all, thanks for the valuable suggestions received about this FLIP. 
After some discussion, it looks like all concerns have been addressed for now, 
so I will start a vote on this FLIP in two or three days. Also, further 
feedback is very welcome.

Best,

Ron


> -----Original Message-----
> From: "刘大龙" 
> Sent: 2022-04-08 10:09:46 (Friday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi, Martijn
> 
> Do you have any question about this FLIP? looking forward to your more 
> feedback.
> 
> Best,
> 
> Ron
> 
> 
> > -----Original Message-----
> > From: "刘大龙" 
> > Sent: 2022-03-29 19:33:58 (Tuesday)
> > To: dev@flink.apache.org
> > Cc: 
> > Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > 
> > 
> > 
> > 
> > > -----Original Message-----
> > > From: "Martijn Visser" 
> > > Sent: 2022-03-24 16:18:14 (Thursday)
> > > To: dev 
> > > Cc: 
> > > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > 
> > > Hi Ron,
> > > 
> > > Thanks for creating the FLIP. You're talking about both local and remote
> > > resources. With regards to remote resources, how do you see this work with
> > > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > > dependencies are not packaged, but I would hope that we do that for all
> > > filesystem implementation. I don't think it's a good idea to have any 
> > > tight
> > > coupling to file system implementations, especially if at some point we
> > > could also externalize file system implementations (like we're doing for
> > > connectors already). I think the FLIP would be better by not only
> > > referring to "Hadoop" as a remote resource provider, but a more generic
> > > term since there are more options than Hadoop.
> > > 
> > > I'm also thinking about security/operations implications: would it be
> > > possible for bad actor X to create a JAR that either influences other
> > > running jobs, leaks data or credentials or anything else? If so, I think 
> > > it
> > > would also be good to have an option to disable this feature completely. I
> > > think there are roughly two types of companies who run Flink: those who
> > > open it up for everyone to use (here the feature would be welcomed) and
> > > those who need to follow certain minimum standards/have a more closed 
> > > Flink
> > > ecosystem). They usually want to validate a JAR upfront before making it
> > > available, even at the expense of speed, because it gives them more 
> > > control
> > > over what will be running in their environment.
> > > 
> > > Best regards,
> > > 
> > > Martijn Visser
> > > https://twitter.com/MartijnVisser82
> > > 
> > > 
> > > On Wed, 23 Mar 2022 at 16:47, 刘大龙  wrote:
> > > 
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: "Peter Huang" 
> > > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > > To: dev 
> > > > > Cc:
> > > > > Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > > >
> > > > > Hi Ron,
> > > > >
> > > > > Thanks for reviving the discussion of the work. The design looks 
> > > > > good. A
> > > > > small typo in the FLIP is that currently it is marked as released in
> > > > 1.16.
> > > > >
> > > > >
> > > > > Best Regards
> > > > > Peter Huang
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  
> > > > > wrote:
> > > > >
> > > > > > hi Yuxia,
> > > > > >
> > > > > >
> > > > > > Thanks for your reply. Your reminder is very important !
> > > > > >
> > > > > >
> > > > > > Since we download the file to the local, remember to clean it up 
> > > > > > when
> > > > the
> > > > > > flink client exits
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > 

Re: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-04-07 Thread
Hi, Martijn

Do you have any questions about this FLIP? Looking forward to more of your feedback.

Best,

Ron


> -----Original Message-----
> From: "刘大龙" 
> Sent: 2022-03-29 19:33:58 (Tuesday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> 
> 
> 
> > -----Original Message-----
> > From: "Martijn Visser" 
> > Sent: 2022-03-24 16:18:14 (Thursday)
> > To: dev 
> > Cc: 
> > Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > 
> > Hi Ron,
> > 
> > Thanks for creating the FLIP. You're talking about both local and remote
> > resources. With regards to remote resources, how do you see this work with
> > Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> > dependencies are not packaged, but I would hope that we do that for all
> > filesystem implementation. I don't think it's a good idea to have any tight
> > coupling to file system implementations, especially if at some point we
> > could also externalize file system implementations (like we're doing for
> > connectors already). I think the FLIP would be better by not only
> > referring to "Hadoop" as a remote resource provider, but a more generic
> > term since there are more options than Hadoop.
> > 
> > I'm also thinking about security/operations implications: would it be
> > possible for bad actor X to create a JAR that either influences other
> > running jobs, leaks data or credentials or anything else? If so, I think it
> > would also be good to have an option to disable this feature completely. I
> > think there are roughly two types of companies who run Flink: those who
> > open it up for everyone to use (here the feature would be welcomed) and
> > those who need to follow certain minimum standards/have a more closed Flink
> > ecosystem). They usually want to validate a JAR upfront before making it
> > available, even at the expense of speed, because it gives them more control
> > over what will be running in their environment.
> > 
> > Best regards,
> > 
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > 
> > 
> > On Wed, 23 Mar 2022 at 16:47, 刘大龙  wrote:
> > 
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: "Peter Huang" 
> > > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > > To: dev 
> > > > Cc:
> > > > Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > > >
> > > > Hi Ron,
> > > >
> > > > Thanks for reviving the discussion of the work. The design looks good. A
> > > > small typo in the FLIP is that currently it is marked as released in
> > > 1.16.
> > > >
> > > >
> > > > Best Regards
> > > > Peter Huang
> > > >
> > > >
> > > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  wrote:
> > > >
> > > > > hi Yuxia,
> > > > >
> > > > >
> > > > > Thanks for your reply. Your reminder is very important !
> > > > >
> > > > >
> > > > > Since we download the file to the local, remember to clean it up when
> > > the
> > > > > flink client exits
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Best regards,
> > > > > Mang Zhang
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > > >  wrote:
> > > > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will
> > > > > benefit from it. The flip looks good to me. I just have two minor
> > > questions:
> > > > > >1. For synax explanation, I see it's "Create  function as
> > > > > identifier", I think the word "identifier" may not be
> > > > > self-dedescriptive for actually it's not a random name but the name of
> > > the
> > > > > class that provides the implementation for function to be create.
> > > > > >May be it'll be more clear to use "class_name&qu

Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-29 Thread
.
> >
> > I'm also thinking about security/operations implications: would it be
> > possible for bad actor X to create a JAR that either influences other
> > running jobs, leaks data or credentials or anything else? If so, I think it
> > would also be good to have an option to disable this feature completely. I
> > think there are roughly two types of companies who run Flink: those who
> > open it up for everyone to use (here the feature would be welcomed) and
> > those who need to follow certain minimum standards/have a more closed Flink
> > ecosystem). They usually want to validate a JAR upfront before making it
> > available, even at the expense of speed, because it gives them more control
> > over what will be running in their environment.
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> >
> >
> > On Wed, 23 Mar 2022 at 16:47, 刘大龙  wrote:
> >
> >>
> >>
> >>> -----Original Message-----
> >>> From: "Peter Huang" 
> >>> Sent: 2022-03-23 11:13:32 (Wednesday)
> >>> To: dev 
> >>> Cc:
> >>> Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> >>>
> >>> Hi Ron,
> >>>
> >>> Thanks for reviving the discussion of the work. The design looks good. A
> >>> small typo in the FLIP is that currently it is marked as released in
> >> 1.16.
> >>>
> >>> Best Regards
> >>> Peter Huang
> >>>
> >>>
> >>> On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  wrote:
> >>>
> >>>> hi Yuxia,
> >>>>
> >>>>
> >>>> Thanks for your reply. Your reminder is very important !
> >>>>
> >>>>
> >>>> Since we download the file to the local, remember to clean it up when
> >> the
> >>>> flink client exits
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best regards,
> >>>> Mang Zhang
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> >>>>  wrote:
> >>>>> Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will
> >>>> benefit from it. The flip looks good to me. I just have two minor
> >> questions:
> >>>>> 1. For synax explanation, I see it's "Create  function as
> >>>> identifier", I think the word "identifier" may not be
> >>>> self-dedescriptive for actually it's not a random name but the name of
> >> the
> >>>> class that provides the implementation for function to be create.
> >>>>> May be it'll be more clear to use "class_name" replace "identifier"
> >> just
> >>>> like what Hive[1]/Spark[2] do.
> >>>>> 2.  >> If the resource used is a remote resource, it will first
> >> download
> >>>> the resource to a local temporary directory, which will be generated
> >> using
> >>>> UUID, and then register the local path to the user class loader.
> >>>>> For the above explanation in this FLIP, It seems for such statement
> >> sets,
> >>>>> ""
> >>>>> Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> >>>>> Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> >>>>> ""
> >>>>> it'll download the resource 'hdfs://myudfs.jar' for twice. So is it
> >>>> possible to provide some cache mechanism that we won't need to
> >> download /
> >>>> store for twice?
> >>>>>
> >>>>> Best regards,
> >>>>> Yuxia
> >>>>> [1]
> >> https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> >>>>> [2]
> >> https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html--
> >>>>> From: Mang Zhang
> >>>>> Date: 2022-03-22 11:35:24
> >>>>> To:
> >>>>> 

Re: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-29 Thread



> -----Original Message-----
> From: "Martijn Visser" 
> Sent: 2022-03-24 16:18:14 (Thursday)
> To: dev 
> Cc: 
> Subject: Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron,
> 
> Thanks for creating the FLIP. You're talking about both local and remote
> resources. With regards to remote resources, how do you see this work with
> Flink's filesystem abstraction? I did read in the FLIP that Hadoop
> dependencies are not packaged, but I would hope that we do that for all
> filesystem implementation. I don't think it's a good idea to have any tight
> coupling to file system implementations, especially if at some point we
> could also externalize file system implementations (like we're doing for
> connectors already). I think the FLIP would be better by not only
> referring to "Hadoop" as a remote resource provider, but a more generic
> term since there are more options than Hadoop.
> 
> I'm also thinking about security/operations implications: would it be
> possible for bad actor X to create a JAR that either influences other
> running jobs, leaks data or credentials or anything else? If so, I think it
> would also be good to have an option to disable this feature completely. I
> think there are roughly two types of companies who run Flink: those who
> open it up for everyone to use (here the feature would be welcomed) and
> those who need to follow certain minimum standards/have a more closed Flink
> ecosystem). They usually want to validate a JAR upfront before making it
> available, even at the expense of speed, because it gives them more control
> over what will be running in their environment.
> 
> Best regards,
> 
> Martijn Visser
> https://twitter.com/MartijnVisser82
> 
> 
> On Wed, 23 Mar 2022 at 16:47, 刘大龙  wrote:
> 
> >
> >
> >
> > > -----Original Message-----
> > > From: "Peter Huang" 
> > > Sent: 2022-03-23 11:13:32 (Wednesday)
> > > To: dev 
> > > Cc:
> > > Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > > Hi Ron,
> > >
> > > Thanks for reviving the discussion of the work. The design looks good. A
> > > small typo in the FLIP is that currently it is marked as released in
> > 1.16.
> > >
> > >
> > > Best Regards
> > > Peter Huang
> > >
> > >
> > > On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  wrote:
> > >
> > > > hi Yuxia,
> > > >
> > > >
> > > > Thanks for your reply. Your reminder is very important !
> > > >
> > > >
> > > > Since we download the file to the local, remember to clean it up when
> > the
> > > > flink client exits
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Best regards,
> > > > Mang Zhang
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> > > >  wrote:
> > > > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will
> > > > benefit from it. The flip looks good to me. I just have two minor
> > questions:
> > > > >1. For synax explanation, I see it's "Create  function as
> > > > identifier", I think the word "identifier" may not be
> > > > self-dedescriptive for actually it's not a random name but the name of
> > the
> > > > class that provides the implementation for function to be create.
> > > > >May be it'll be more clear to use "class_name" replace "identifier"
> > just
> > > > like what Hive[1]/Spark[2] do.
> > > > >
> > > > >2.  >> If the resource used is a remote resource, it will first
> > download
> > > > the resource to a local temporary directory, which will be generated
> > using
> > > > UUID, and then register the local path to the user class loader.
> > > > >For the above explanation in this FLIP, It seems for such statement
> > sets,
> > > > >""
> > > > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > > > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > > > >""
> > > > > it'll download the resou

Re: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-23 Thread



> -----Original Message-----
> From: "Peter Huang" 
> Sent: 2022-03-23 11:13:32 (Wednesday)
> To: dev 
> Cc: 
> Subject: Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron,
> 
> Thanks for reviving the discussion of the work. The design looks good. A
> small typo in the FLIP is that currently it is marked as released in 1.16.
> 
> 
> Best Regards
> Peter Huang
> 
> 
> On Tue, Mar 22, 2022 at 10:58 PM Mang Zhang  wrote:
> 
> > hi Yuxia,
> >
> >
> > Thanks for your reply. Your reminder is very important !
> >
> >
> > Since we download the file to the local, remember to clean it up when the
> > flink client exits
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Best regards,
> > Mang Zhang
> >
> >
> >
> >
> >
> > At 2022-03-23 10:02:26, "罗宇侠(莫辞)"
> >  wrote:
> > >Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will
> > benefit from it. The flip looks good to me. I just have two minor questions:
> > >1. For synax explanation, I see it's "Create  function as
> > identifier", I think the word "identifier" may not be
> > self-dedescriptive for actually it's not a random name but the name of the
> > class that provides the implementation for function to be create.
> > >May be it'll be more clear to use "class_name" replace "identifier" just
> > like what Hive[1]/Spark[2] do.
> > >
> > >2.  >> If the resource used is a remote resource, it will first download
> > the resource to a local temporary directory, which will be generated using
> > UUID, and then register the local path to the user class loader.
> > >For the above explanation in this FLIP, It seems for such statement sets,
> > >""
> > >Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> > >Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> > >""
> > > it'll download the resource 'hdfs://myudfs.jar' for twice. So is it
> > possible to provide some cache mechanism that we won't need to download /
> > store for twice?
> > >
> > >
> > >Best regards,
> > >Yuxia
> > >[1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> > >[2]
> > https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html--
> > >From: Mang Zhang
> > >Date: 2022-03-22 11:35:24
> > >To:
> > >Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> > >
> > >Hi Ron, Thank you so much for this suggestion, this is so good.
> > >In our company, when users use custom UDF, it is very inconvenient, and
> > the code needs to be packaged into the job jar,
> > >and cannot refer to the existing udf jar through the existing udf jar.
> > >Or pass in the jar reference in the startup command.
> > >If we implement this feature, users can focus on their own business
> > development.
> > >I can also contribute if needed.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >--
> > >
> > >Best regards,
> > >Mang Zhang
> > >
> > >
> > >
> > >
> > >
> > >At 2022-03-21 14:57:32, "刘大龙"  wrote:
> > >>Hi, everyone
> > >>
> > >>
> > >>
> > >>
> > >>I would like to open a discussion for support advanced Function DDL,
> > this proposal is a continuation of FLIP-79 in which Flink Function DDL is
> > defined. Until now it is partially released as the Flink function DDL with
> > user defined resources is not clearly discussed and implemented. It is an
> > important feature for support to register UDF with custom jar resource,
> > users can use UDF more more easily without having to put jars under the
> > classpath in advance.
> > >>
> > >>Looking forward to your feedback.
> > >>
> > >>
> > >>
> > >>
> > >>[1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> > >>
> > >>
> > >>
> > >>
> > >>Best,
> > >>
> > >>Ron
> > >>
> > >>
> > >
> >

Hi Peter, thanks for your feedback. This work also contains your earlier effort; thank 
you very much.


Re: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-23 Thread



> -----Original Message-----
> From: "罗宇侠(莫辞)" 
> Sent: 2022-03-23 10:02:26 (Wednesday)
> To: "Mang Zhang" , "Flink Dev" 
> Cc: 
> Subject: 回复:Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron, Thanks for starting this dicuss, some Spark/Hive users will benefit 
> from it. The flip looks good to me. I just have two minor questions:
> 1. For synax explanation, I see it's "Create  function as 
> identifier", I think the word "identifier" may not be self-dedescriptive 
> for actually it's not a random name but the name of the class that provides 
> the implementation for function to be create.
> May be it'll be more clear to use "class_name" replace "identifier" just like 
> what Hive[1]/Spark[2] do.
> 
> 2.  >> If the resource used is a remote resource, it will first download the 
> resource to a local temporary directory, which will be generated using UUID, 
> and then register the local path to the user class loader.
> For the above explanation in this FLIP, It seems for such statement sets,
> ""
> Create  function as org.apache.udf1 using jar 'hdfs://myudfs.jar';
> Create  function as org.apache.udf2 using jar 'hdfs://myudfs.jar';
> ""
>  it'll download the resource 'hdfs://myudfs.jar' for twice. So is it possible 
> to provide some cache mechanism that we won't need to download / store for 
> twice?
> ​ 
> 
> Best regards,
> Yuxia
> [1] https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl
> [2] 
> https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-ddl-create-function.html--
> From: Mang Zhang
> Date: 2022-03-22 11:35:24
> To:
> Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron, Thank you so much for this suggestion, this is so good.
> In our company, when users use custom UDF, it is very inconvenient, and the 
> code needs to be packaged into the job jar, 
> and cannot refer to the existing udf jar through the existing udf jar.
> Or pass in the jar reference in the startup command.
> If we implement this feature, users can focus on their own business 
> development.
> I can also contribute if needed.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> --
> 
> Best regards,
> Mang Zhang
> 
> 
> 
> 
> 
> At 2022-03-21 14:57:32, "刘大龙"  wrote:
> >Hi, everyone
> >
> >
> >
> >
> >I would like to open a discussion for support advanced Function DDL, this 
> >proposal is a continuation of FLIP-79 in which Flink Function DDL is 
> >defined. Until now it is partially released as the Flink function DDL with 
> >user defined resources is not clearly discussed and implemented. It is an 
> >important feature for support to register UDF with custom jar resource, 
> >users can use UDF more more easily without having to put jars under the 
> >classpath in advance.
> >
> >Looking forward to your feedback.
> >
> >
> >
> >
> >[1] 
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> >
> >
> >
> >
> >Best,
> >
> >Ron
> >
> >
>

Hi Yuxia, thanks for your feedback. Your advice is very helpful.

1. I think you are right: the "identifier" must be the name of the class that provides 
the implementation for the function. We should replace "identifier" with 
"class_name".

2. Yes, we should cache a resource locally when the same URL is used more than once. 
This will be considered in the code implementation.
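
Applied to the example quoted above (class names and jar URI from that mail; the function names are added here only for illustration), the revised wording would read as follows, and with the cache in place the second statement would reuse the already downloaded jar:

CREATE FUNCTION udf1 AS 'org.apache.udf1' USING JAR 'hdfs://myudfs.jar';
CREATE FUNCTION udf2 AS 'org.apache.udf2' USING JAR 'hdfs://myudfs.jar';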



Re: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-23 Thread



> -----Original Message-----
> From: "Mang Zhang" 
> Sent: 2022-03-22 11:35:24 (Tuesday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re:[DISCUSS] FLIP-214 Support Advanced Function DDL
> 
> Hi Ron, Thank you so much for this suggestion, this is so good.
> In our company, when users use custom UDF, it is very inconvenient, and the 
> code needs to be packaged into the job jar, 
> and cannot refer to the existing udf jar through the existing udf jar.
> Or pass in the jar reference in the startup command.
> If we implement this feature, users can focus on their own business 
> development.
> I can also contribute if needed.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> --
> 
> Best regards,
> Mang Zhang
> 
> 
> 
> 
> 
> At 2022-03-21 14:57:32, "刘大龙"  wrote:
> >Hi, everyone
> >
> >
> >
> >
> >I would like to open a discussion for support advanced Function DDL, this 
> >proposal is a continuation of FLIP-79 in which Flink Function DDL is 
> >defined. Until now it is partially released as the Flink function DDL with 
> >user defined resources is not clearly discussed and implemented. It is an 
> >important feature for support to register UDF with custom jar resource, 
> >users can use UDF more more easily without having to put jars under the 
> >classpath in advance.
> >
> >Looking forward to your feedback.
> >
> >
> >
> >
> >[1] 
> >https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL
> >
> >
> >
> >
> >Best,
> >
> >Ron
> >
> >


Hi, Mang
Glad to receive your feedback. The advice you gave from the perspective of a real 
production use case within your company makes this proposal even more meaningful 
to me. Thank you very much. You are very welcome to contribute together.

Best,

Ron

[DISCUSS] FLIP-214 Support Advanced Function DDL

2022-03-21 Thread
Hi, everyone




I would like to open a discussion on supporting advanced function DDL [1]. This 
proposal is a continuation of FLIP-79, in which the Flink function DDL was defined. 
Until now it has only been partially released, because the function DDL with 
user-defined resources was never clearly discussed and implemented. It is an important 
feature: it allows registering a UDF together with a custom jar resource, so users can 
use UDFs much more easily without having to put jars on the classpath in advance.
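
As a rough sketch of the DDL shape being proposed (function names, classes, and paths are invented for illustration; the authoritative grammar is in the FLIP [1]):

CREATE FUNCTION my_lower AS 'com.example.LowerUdf' LANGUAGE JAVA USING JAR 'hdfs:///udfs/lower-udf.jar';
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.UpperUdf' USING JAR 's3://my-bucket/udfs/upper-udf.jar';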

Looking forward to your feedback.




[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL




Best,

Ron




Re: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li

2021-04-22 Thread
Congratulations Rui!

Best

> -----Original Message-----
> From: "Benchao Li" 
> Sent: 2021-04-22 14:43:33 (Thursday)
> To: dev 
> Cc: 
> Subject: Re: Re: [ANNOUNCE] New Apache Flink Committer - Rui Li
> 
> Congratulations Rui!
> 
> Jingsong Li  wrote on Thu, Apr 22, 2021 at 2:33 PM:
> 
> > Congratulations Rui!
> >
> > Best,
> > Jingsong
> >
> > On Thu, Apr 22, 2021 at 11:52 AM Yun Gao 
> > wrote:
> >
> > > Congratulations Rui!
> > >
> > > Best,
> > >  Yun
> > >
> > >
> > > --
> > > Sender:Nicholas Jiang
> > > Date:2021/04/22 11:26:05
> > > Recipient:
> > > Theme:Re: [ANNOUNCE] New Apache Flink Committer - Rui Li
> > >
> > > Congrats, Rui!
> > >
> > > Best,
> > > Nicholas Jiang
> > >
> > >
> > >
> > > --
> > > Sent from:
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
> 
> 
> -- 
> 
> Best,
> Benchao Li


Re: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function (2nd)

2020-11-11 Thread

+1

> -----Original Message-----
> From: "Timo Walther" 
> Sent: 2020-11-11 18:55:06 (Wednesday)
> To: dev@flink.apache.org
> Cc: 
> Subject: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function (2nd)
> 
> +1 (binding)
> 
> Thanks,
> Timo
> 
> On 11.11.20 07:14, Pengcheng Liu wrote:
> > +1 (binding)
> > 
> > Jark Wu  wrote on Wed, Nov 11, 2020 at 10:13 AM:
> > 
> >> +1 (binding)
> >>
> >> On Tue, 10 Nov 2020 at 14:59, Jark Wu  wrote:
> >>
> >>> Hi all,
> >>>
> >>> There is new feedback on the FLIP-145. So I would like to start a new
> >> vote
> >>> for FLIP-145 [1],
> >>> which has been discussed and reached consensus in the discussion thread
> >>> [2].
> >>>
> >>> The vote will be open until 15:00 (UTC+8) 13th Nov. (72h), unless there
> >> is
> >>> an objection or not enough votes.
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> [1]:
> >>>
> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
> >>> [2]:
> >>>
> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html
> >>>
> >>
> >


Re: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function

2020-10-10 Thread


+1
> -----Original Message-----
> From: "Jark Wu" 
> Sent: 2020-10-10 18:50:20 (Saturday)
> To: dev 
> Cc: 
> Subject: Re: [VOTE] FLIP-145: Support SQL windowing table-valued function
> 
> +1
> 
> On Sat, 10 Oct 2020 at 18:41, Benchao Li  wrote:
> 
> > +1
> >
> > Jark Wu  wrote on Sat, Oct 10, 2020 at 6:06 PM:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-145 [1], which is discussed and
> > > reached consensus in the discussion thread [2].
> > >
> > > The vote will be open until 13th Oct. (72h), unless there is an objection
> > > or not enough votes.
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function
> > > [2]:
> > >
> > >
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-145-Support-SQL-windowing-table-valued-function-td45269.html
> > >
> >
> >
> > --
> >
> > Best,
> > Benchao Li
> >


--
Best


Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

2020-09-20 Thread

+1

> -----Original Message-----
> From: "Benchao Li" 
> Sent: 2020-09-20 16:28:20 (Sunday)
> To: dev 
> Cc: 
> Subject: Re: [DISCUSS] Support source/sink parallelism config in Flink sql
> 
> Hi admin,
> 
> Thanks for bringing up this discussion.
> IMHO, it's a valuable feature. We also added this feature for our internal
> SQL engine.
> And our way is very similar to your proposal.
> 
> Regarding the implementation, there is one shorthand that we should modify
> each connector
> to support this property.
> We can wait for others' opinion whether this is a valid proposal. If yes,
> then we can discuss
> the implementation detailedly.
> 
> admin <17626017...@163.com> wrote on Thu, Sep 10, 2020 at 1:19 AM:
> 
> > Hi devs:
> > Currently,Flink sql does not support source/sink parallelism config.So,it
> > will result in wasting or lacking resources in some cases.
> > I think it is necessary to introduce configuration of source/sink
> > parallelism in sql.
> > From my side,i have the solution for this feature.Add parallelism config
> > in ‘with’ properties of DDL.
> >
> > Before 1.11,we can get parallelism and then set it to
> > StreamTableSink#consumeDataStream or StreamTableSource#getDataStream
> > After 1.11,we can get parallelism from catalogTable and then set it to
> > transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink.
> >
> > What do you think?
> >
> >
> >
> >
> >
> 
> -- 
> 
> Best,
> Benchao Li
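
For context, a minimal sketch of the "with"-property approach proposed in the quoted mail (the table definition and the property key 'sink.parallelism' are illustrative, not decided names):

CREATE TABLE kafka_sink (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'output',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'sink.parallelism' = '4'
);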


Re: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He

2020-09-16 Thread

Congratulations!

> -----Original Message-----
> From: "Benchao Li" 
> Sent: 2020-09-16 14:22:25 (Wednesday)
> To: dev 
> Cc: "贺小令" 
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Godfrey He
> 
> Congratulations!
> 
> Zhu Zhu  wrote on Wed, Sep 16, 2020 at 1:36 PM:
> 
> > Congratulations!
> >
> > Thanks,
> > Zhu
> >
> > Leonard Xu  wrote on Wed, Sep 16, 2020 at 1:32 PM:
> >
> > > Congratulations! Godfrey
> > >
> > > Best,
> > > Leonard
> > >
> > > > On Sep 16, 2020, at 13:12, Yangze Guo  wrote:
> > > >
> > > > Congratulations! Xiaoling.
> > > >
> > > > Best,
> > > > Yangze Guo
> > > >
> > > > On Wed, Sep 16, 2020 at 12:45 PM Dian Fu 
> > wrote:
> > > >>
> > > >> Congratulations, well deserved!
> > > >>
> > > >> Regards,
> > > >> Dian
> > > >>
> > > >>> On Sep 16, 2020, at 12:36 PM, Guowei Ma  wrote:
> > > >>>
> > > >>> Congratulations :)
> > > >>>
> > > >>> Best,
> > > >>> Guowei
> > > >>>
> > > >>>
> > > >>> On Wed, Sep 16, 2020 at 12:19 PM Jark Wu  wrote:
> > > >>>
> > >  Hi everyone,
> > > 
> > >  It's great seeing many new Flink committers recently, and on behalf
> > > of the
> > >  PMC,
> > >  I'd like to announce one more new committer: Godfrey He.
> > > 
> > >  Godfrey is a very long time contributor in the Flink community since
> > > the
> > >  end of 2016.
> > >  He has been a very active contributor in the Flink SQL component
> > with
> > > 153
> > >  PRs and more than 571,414 lines which is quite outstanding.
> > >  Godfrey has paid essential effort with SQL optimization and helped a
> > > lot
> > >  during the blink merging.
> > >  Besides that, he is also quite active with community work especially
> > > in
> > >  Chinese mailing list.
> > > 
> > >  Please join me in congratulating Godfrey for becoming a Flink
> > > committer!
> > > 
> > >  Cheers,
> > >  Jark Wu
> > > 
> > > >>
> > >
> > >
> >
> 
> 
> -- 
> 
> Best,
> Benchao Li



Re: Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang

2020-09-15 Thread
Congratulations!


> -----Original Message-----
> From: "Dawid Wysakowicz" 
> Sent: 2020-09-15 20:34:34 (Tuesday)
> To: dev@flink.apache.org, tang...@apache.org, "Yun Tang" 
> Cc: 
> Subject: Re: [ANNOUNCE] New Apache Flink Committer - Yun Tang
> 
> Congratulations!
> 
> On 15/09/2020 12:19, Yu Li wrote:
> > Hi all,
> >
> > It's great seeing many new Flink committers recently, and on behalf of the
> > PMC, I'd like to announce one more new committer: Yun Tang!
> >
> > Yun has been an active contributor for more than two years, with 132
> > contributions including 72 commits and many PR reviews.
> >
> > Yun mainly works on state backend and checkpoint modules, and is one of the
> > main maintainers of RocksDB state backend, involved in critical features
> > like RocksDB memory management, etc.
> >
> > Besides that, Yun is very actively involved in QA and discussions in the
> > user and dev mailing lists (more than 300 replies since Jul. 2018).
> >
> > Please join me in congratulating Yun for becoming a Flink committer!
> >
> > Cheers,
> > Yu
> >
>


--
刘大龙

Institute of Intelligent Systems and Control, Department of Control Science and Engineering, Zhejiang University, Room 217, New Industrial Control Building
Address: Yuquan Campus, Zhejiang University, 38 Zheda Road, Hangzhou, Zhejiang
Tel:18867547281


Fw: Re: Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation

2020-05-06 Thread



-----Original Message-----
From: "刘大龙" 
Sent: 2020-05-06 17:55:25 (Wednesday)
To: "Jark Wu" 
Cc:
Subject: Re: Re: Re: The use of state ttl incremental cleanup strategy in sql 
deduplication resulting in significant performance degradation

Thanks for your tuning ideas; I will test them later. Just to emphasize: I use 
non-mini-batch deduplication for these tests.


-----Original Message-----
From: "Jark Wu" 
Sent: 2020-05-05 10:48:27 (Tuesday)
To: dev 
Cc: "刘大龙" , "Yu Li" , "Yun Tang" 

Subject: Re: Re: The use of state ttl incremental cleanup strategy in sql 
deduplication resulting in significant performance degradation


Hi Andrey, 



Thanks for the tuning ideas. I will explain the design of deduplication. 


The mini-batch implementation of deduplication buffers a bundle of input data 
in heap (Java Map),

when the bundle size hit the trigger size or trigger time, the buffered data 
will be processed together. 
So we only need to access the state once per key. This is designed for rocksdb 
statebackend to reduce the
frequently accessing, (de)serialization. And yes, this may slow down the 
checkpoint, but the suggested 
mini-batch timeout is <= 10s. From our production experience, it doesn't have 
much impact on checkpoint.
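
For reference, the mini-batch behavior described above is switched on through table config options; a small sketch using the SQL client's SET syntax (option keys as documented for the blink planner, values illustrative):

SET 'table.exec.mini-batch.enabled' = 'true';
SET 'table.exec.mini-batch.allow-latency' = '5 s';
SET 'table.exec.mini-batch.size' = '5000';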


Best,
Jark


On Tue, 5 May 2020 at 06:48, Andrey Zagrebin  wrote:

Hi lsyldliu,

You can try to tune the StateTtlConfig. As the documentation suggests [1]
the TTL incremental cleanup can decrease the per record performance. This
is the price of the automatic cleanup.
If the only thing, which happens mostly in your operator, is working with
state then even checking one additional record to cleanup is two times more
actions to do.
Timer approach was discussed in TTL feature design. It needs an additional
implementation and keeps more state but performs only one cleanup action
exactly when needed so it is a performance/storage trade-off.

Anyways, 20x degradation looks indeed a lot.
As a first step, I would suggest to configure the incremental cleanup
explicitly in `StateTtlConfigUtil#createTtlConfig` with a less entries to
check, e.g. 1 because processFirstRow/processLastRow already access the
state twice and do cleanup:

.cleanupIncrementally(1, false)


Also not sure but depending on the input data, finishBundle can happen
mostly during the snapshotting which slows down taking the checkpoint.
Could this fail the checkpoint accumulating the backpressure and slowing
down the pipeline?

Not sure why to keep the deduplication data in a Java map and in Flink
state at the same time, why not to keep it only in Flink state and
deduplicate on each incoming record?

Best,
Andrey

[1] note 2 in
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#incremental-cleanup

On Wed, Apr 29, 2020 at 11:53 AM 刘大龙  wrote:

>
>
>
> > -----Original Message-----
> > From: "Jark Wu" 
> > Sent: 2020-04-29 14:09:44 (Wednesday)
> > To: dev , "Yu Li" ,
> myas...@live.com
> > Cc: azagre...@apache.org
> > Subject: Re: The use of state ttl incremental cleanup strategy in sql
> deduplication resulting in significant performance degradation
> >
> > Hi lsyldliu,
> >
> > Thanks for investigating this.
> >
> > First of all, if you are using mini-batch deduplication, it doesn't
> support
> > state ttl in 1.9. That's why the tps looks the same with 1.11 disable
> state
> > ttl.
> > We just introduce state ttl for mini-batch deduplication recently.
> >
> > Regarding to the performance regression, it looks very surprise to me.
> The
> > performance is reduced by 19x when StateTtlConfig is enabled in 1.11.
> > I don't have much experience of the underlying of StateTtlConfig. So I
> loop
> > in @Yu Li  @YunTang in CC who may have more insights
> on
> > this.
> >
> > For more information, we use the following StateTtlConfig [1] in blink
> > planner:
> >
> > StateTtlConfig
> >   .newBuilder(Time.milliseconds(retentionTime))
> >   .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
> >   .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
> >   .build();
> >
> >
> > Best,
> > Jark
> >
> >
> > [1]:
> >
> https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/util/StateTtlConfigUtil.java#L27
> >
> >
> >
> >
> >
> > On Wed, 29 Apr 2020 at 11:53, 刘大龙  wrote:
> >
> > > Hi, all!
> > >
> > > At flink master branch, we have supported state ttl  for sql mini-batch
> > > deduplication using incremental cleanup strategy on heap backend,
> refer to
> > > FLINK-16581. Because I want to test the performance of this feature,
> so I
> > 

Re: Re: The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation

2020-04-29 Thread



> -Original Message-
> From: "Jark Wu" 
> Sent: 2020-04-29 14:09:44 (Wednesday)
> To: dev , "Yu Li" , myas...@live.com
> Cc: azagre...@apache.org
> Subject: Re: The use of state ttl incremental cleanup strategy in sql 
> deduplication resulting in significant performance degradation
> 
> Hi lsyldliu,
> 
> Thanks for investigating this.
> 
> First of all, if you are using mini-batch deduplication, it doesn't support
> state ttl in 1.9. That's why the tps looks the same with 1.11 disable state
> ttl.
> We just introduce state ttl for mini-batch deduplication recently.
> 
> Regarding to the performance regression, it looks very surprise to me. The
> performance is reduced by 19x when StateTtlConfig is enabled in 1.11.
> I don't have much experience of the underlying of StateTtlConfig. So I loop
> in @Yu Li  @YunTang in CC who may have more insights on
> this.
> 
> For more information, we use the following StateTtlConfig [1] in blink
> planner:
> 
> StateTtlConfig
>   .newBuilder(Time.milliseconds(retentionTime))
>   .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
>   .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
>   .build();
> 
> 
> Best,
> Jark
> 
> 
> [1]:
> https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime-blink/src/main/java/org/apache/flink/table/runtime/util/StateTtlConfigUtil.java#L27
> 
> 
> 
> 
> 
> On Wed, 29 Apr 2020 at 11:53, 刘大龙  wrote:
> 
> > Hi, all!
> >
> > At flink master branch, we have supported state ttl  for sql mini-batch
> > deduplication using incremental cleanup strategy on heap backend, refer to
> > FLINK-16581. Because I want to test the performance of this feature, so I
> > compile master branch code and deploy the jar to production
> > environment,then run three types of tests, respectively:
> >
> >
> >
> >
> > flink 1.9.0 release version enable state ttl
> > flink 1.11-snapshot version disable state ttl
> > flink 1.11-snapshot version enable state ttl
> >
> >
> >
> >
> > The test query sql as follows:
> >
> > select order_date,
> > sum(price * amount - goods_all_fav_amt - virtual_money_amt +
> > goods_carriage_amt) as saleP,
> > sum(amount) as saleN,
> > count(distinct parent_sn) as orderN,
> > count(distinct user_id) as cusN
> >from(
> > select order_date, user_id,
> > order_type, order_status, terminal, last_update_time,
> > goods_all_fav_amt,
> > goods_carriage_amt, virtual_money_amt, price, amount,
> > order_quality, quality_goods_cnt, acture_goods_amt
> > from (select *, row_number() over(partition by order_id,
> > order_goods_id order by proctime desc) as rownum from dm_trd_order_goods)
> > where rownum=1
> > and (order_type in (1,2,3,4,5) or order_status = 70)
> > and terminal = 'shop' and price > 0)
> > group by order_date
> >
> >
> > At runtime, this query will generate two operators which include
> > Deduplication and GroupAgg. In the test, the configuration is same,
> > parallelism is 20, set kafka consumer from the earliest, and disable
> > mini-batch function, The test results as follows:
> >
> > flink 1.9.0 enable state ttl:this test lasted 44m, flink receive 1374w
> > records, average tps at 5200/s, Flink UI picture link back pressure,
> > checkpoint
> > flink 1.11-snapshot version disable state ttl:this test lasted 28m, flink
> > receive 883w records, average tps at 5200/s, Flink UI picture link back
> > pressure, checkpoint
> > flink 1.11-snapshot version enable state ttl:this test lasted 1h 43m,
> > flink only receive 168w records because of deduplication operator serious
> > back pressure, average tps at 270/s, moreover, checkpoint always fail
> > because of deduplication operator serious back pressure, Flink UI picture
> > link back pressure, checkpoint
> >
> > Deduplication state clean up implement in flink 1.9.0 use timer, but
> > 1.11-snapshot version use StateTtlConfig, this is the main difference.
> > Comparing the three tests comprehensively, we can see that if disable state
> > ttl in 1.11-snapshot the performance is the same with 1.9.0 enable state
> > ttl. However, if enable state ttl in 1.11-snapshot, performance down is
> > nearly 20 times, so I think incremental cleanup strategy cause this
> > problem, what do you think about it? @azagrebin, @jark.
> >
> > Thanks.
> >
> > lsyldliu
> >
> > Zhejiang University, College of Control Science and engineer, CSC


--
刘大龙

Zhejiang University, Department of Control Science and Engineering, Institute of Intelligent Systems and Control, Room 217, New Industrial Control Building
Address: Yuquan Campus, Zhejiang University, No. 38 Zheda Road, Hangzhou, Zhejiang
Tel: 18867547281
Hi Jark,
I use non-mini-batch deduplication and group agg for the tests. The non-mini-batch
deduplication state ttl implementation has been refactored in the current 1.11
master branch to use StateTtlConfig instead of a timer (that PR is my work), and I
am also surprised by the 19x performance drop.
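
For readers following the thread, a minimal sketch of what "use StateTtlConfig
instead of a timer" means for the state declaration (the state name, element type,
and helper are placeholders, not the actual refactored deduplication code):

import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.typeinfo.Types;

public class TtlDescriptorSketch {

    public static ValueStateDescriptor<String> dedupStateDescriptor(long retentionMs) {
        // builder defaults: UpdateType.OnCreateAndWrite, StateVisibility.NeverReturnExpired
        StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.milliseconds(retentionMs))
            .build();

        // the TTL is attached to the state descriptor; the state backend expires
        // entries itself, so no per-key cleanup timer has to be registered
        ValueStateDescriptor<String> descriptor =
            new ValueStateDescriptor<>("dedup-previous-row", Types.STRING);
        descriptor.enableTimeToLive(ttlConfig);
        return descriptor;
    }
}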

The use of state ttl incremental cleanup strategy in sql deduplication resulting in significant performance degradation

2020-04-28 Thread
Hi, all!

On the flink master branch, we have supported state ttl for sql mini-batch
deduplication using the incremental cleanup strategy on the heap backend; refer to
FLINK-16581. Because I wanted to test the performance of this feature, I compiled
the master branch code, deployed the jar to a production environment, and then ran
three types of tests, respectively:




flink 1.9.0 release version with state ttl enabled
flink 1.11-snapshot version with state ttl disabled
flink 1.11-snapshot version with state ttl enabled




The test query SQL is as follows:

select order_date,
sum(price * amount - goods_all_fav_amt - virtual_money_amt + 
goods_carriage_amt) as saleP,
sum(amount) as saleN,
count(distinct parent_sn) as orderN,
count(distinct user_id) as cusN
   from(
select order_date, user_id, 
order_type, order_status, terminal, last_update_time, 
goods_all_fav_amt, 
goods_carriage_amt, virtual_money_amt, price, amount, 
order_quality, quality_goods_cnt, acture_goods_amt 
from (select *, row_number() over(partition by order_id, 
order_goods_id order by proctime desc) as rownum from dm_trd_order_goods) 
where rownum=1 
and (order_type in (1,2,3,4,5) or order_status = 70) 
and terminal = 'shop' and price > 0)
group by order_date


At runtime, this query generates two operators, Deduplication and GroupAgg. In
all tests the configuration is the same: parallelism is 20, the kafka consumer
starts from the earliest offset, and the mini-batch function is disabled. The
test results are as follows:

flink 1.9.0 with state ttl enabled: this test lasted 44m, flink received 1374w
(13.74 million) records, average tps around 5200/s (Flink UI screenshots: back
pressure, checkpoint)
flink 1.11-snapshot with state ttl disabled: this test lasted 28m, flink received
883w (8.83 million) records, average tps around 5200/s (Flink UI screenshots:
back pressure, checkpoint)
flink 1.11-snapshot with state ttl enabled: this test lasted 1h 43m, flink
received only 168w (1.68 million) records because of serious back pressure on the
deduplication operator, average tps around 270/s; moreover, checkpoints always
failed because of that back pressure (Flink UI screenshots: back pressure,
checkpoint)

Deduplication state cleanup in flink 1.9.0 is implemented with a timer, while the
1.11-snapshot version uses StateTtlConfig; this is the main difference. Comparing
the three tests, we can see that with state ttl disabled in 1.11-snapshot the
performance is the same as in 1.9.0 with state ttl enabled. However, with state
ttl enabled in 1.11-snapshot, performance drops by nearly 20 times, so I think the
incremental cleanup strategy causes this problem. What do you think about it?
@azagrebin, @jark.
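
For context on the difference described above, a simplified sketch of the
timer-based cleanup pattern (an illustrative KeyedProcessFunction, not the actual
Flink 1.9 deduplication operator; the state name and retention are made up):

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerCleanupSketch extends KeyedProcessFunction<String, String, String> {

    private static final long RETENTION_MS = 30 * 60 * 1000L; // made-up retention
    private transient ValueState<String> lastRow;

    @Override
    public void open(Configuration parameters) {
        lastRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-row", Types.STRING));
    }

    @Override
    public void processElement(String row, Context ctx, Collector<String> out) throws Exception {
        if (lastRow.value() == null) {
            out.collect(row); // emit only the first row per key in this simplified version
        }
        lastRow.update(row);
        // register a cleanup timer for this key; the real operator tracks the
        // registered time so it does not pile up one timer per record
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + RETENTION_MS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) {
        // cleanup happens exactly once, exactly when needed, at the cost of
        // keeping an extra timer per key
        lastRow.clear();
    }
}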

Thanks.

lsyldliu

Zhejiang University, College of Control Science and Engineering, CSC