RE: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-04-09 Thread David Radley
From: Hao Li Date: Wednesday, 3 April 2024 at 18:58 To: dev@flink.apache.org Subject: [EXTERNAL] Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL Cross post David Radley's comments here from voting thread: > I don’t think this counts as an objection, I have some comments. I should have

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-04-03 Thread Hao Li
Cross post David Radley's comments here from voting thread: > I don’t think this counts as an objection, I have some comments. I should have put this on the discussion thread earlier but have just got to this. > - I suggest we can put a model version in the model resource. Versions are

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-28 Thread Hao Li
Thanks Timo. I'll start a vote tomorrow if no further discussion. Thanks, Hao On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote: > Hi everyone, > > I updated the FLIP according to this discussion. > > @Hao Li: Let me know if I made a mistake somewhere. I added some > additional explaining

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-28 Thread Timo Walther
Hi everyone, I updated the FLIP according to this discussion. @Hao Li: Let me know if I made a mistake somewhere. I added some additional explaining comments about the new PTF syntax. There are no further objections from my side. If nobody objects, Hao feel free to start the voting tomorrow.

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-28 Thread Jark Wu
Thanks, Hao, Sounds good to me. Best, Jark On Thu, 28 Mar 2024 at 01:02, Hao Li wrote: > Hi Jark, > > I think we can start with supporting popular model providers such as > openai, azureml, sagemaker for remote models. > > Thanks, > Hao > > On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > >

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-27 Thread Hao Li
Hi Jark, I think we can start with supporting popular model providers such as openai, azureml, sagemaker for remote models. Thanks, Hao On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > Thanks for the PoC and updating, > > The final syntax looks good to me, at least it is a nice and concise

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Jark Wu
Thanks for the PoC and updating, The final syntax looks good to me, at least it is a nice and concise first step. SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); Besides, what built-in models
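The named-argument form above treats ML_PREDICT as a table-in, table-out function. As a rough mental model only (plain Python, not Flink's planner or runtime), the sketch below mirrors `input =>`, `model =>` and `args => DESCRIPTOR(f1, f2)` with keyword arguments: each input row passes through unchanged and the model's output columns, here a single `label`, are appended. The dict-per-row representation, `ml_predict`, and the toy `classifier_model` are hypothetical stand-ins, not part of the FLIP.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical row representation: column name -> value.
Row = Dict[str, Any]


def ml_predict(input_table: List[Row],
               model: Callable[[Sequence[float]], Row],
               args: Sequence[str]) -> List[Row]:
    """Table in, table out: keep every input column and append the model's outputs."""
    output: List[Row] = []
    for row in input_table:
        # args plays the role of DESCRIPTOR(f1, f2): the columns fed to the model, in order.
        features = [row[name] for name in args]
        prediction = model(features)
        # New dict per row, so the input table is not mutated.
        output.append({**row, **prediction})
    return output


# Hypothetical stand-in for `classifier_model`: labels a row by the sign of f1 + f2.
def classifier_model(features: Sequence[float]) -> Row:
    return {"label": "positive" if sum(features) > 0 else "negative"}


my_data = [{"f1": 1.0, "f2": 2.0}, {"f1": -3.0, "f2": 1.0}]
result = ml_predict(input_table=my_data, model=classifier_model, args=["f1", "f2"])

# Equivalent of: SELECT f1, f2, label FROM ML_PREDICT(...)
print([(r["f1"], r["f2"], r["label"]) for r in result])
# prints [(1.0, 2.0, 'positive'), (-3.0, 1.0, 'negative')]
```

Keeping the input columns and only appending new ones is what lets the outer `SELECT f1, f2, label` refer to both the original fields and the prediction.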

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Hao Li
Hi Timo, Yeah. For `primary key` and `from table(...)` those are explicitly matched in parser: [1]. > SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); This named argument syntax looks good to

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-26 Thread Timo Walther
Hi Hao, > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > work since `TABLE` and `MODEL` are already key words This argument doesn't count. The parser supports introducing keywords that are still non-reserved. For example, this enables using "key" for both primary key

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-25 Thread Hao Li
Hi Timo, > Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument? I checked this, currently the validator throws an exception trying to get the full qualifier name for `classifier_model`. But since

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-25 Thread Hao Li
Hi Ahmed, Looks like the feature freeze time for 1.20 release is June 15th. We can definitely get the model DDL into 1.20. For predict and evaluate functions, if we can't get into the 1.20 release, we can get them into the 1.21 release for sure. Thanks, Hao On Mon, Mar 25, 2024 at 1:25 AM

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-25 Thread Timo Walther
Hi Jark and Hao, thanks for the information, Jark! Great that the Calcite community already fixed the problem for us. +1 to adopt the simplified syntax asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if upgrading Calcite is too much work right now? > Is `DESCRIPTOR` a

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-23 Thread Ahmed Hamdy
Hi everyone, +1 for this proposal, I believe it is very useful at a minimum. It would be great to even have "ML_PREDICT" and "ML_EVALUATE" as built-in PTFs in this FLIP as discussed. IIUC this will be included in the 1.20 roadmap? Best Regards Ahmed Hamdy On Fri, 22 Mar 2024 at 23:54, Hao Li

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-22 Thread Hao Li
Hi Timo and Jark, I agree Oracle's syntax seems concise and more descriptive. For the built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with Jark we can support them as built-in PTF using `SqlTableFunction` for this FLIP. We can have a different FLIP discussing user defined PTF and adopt

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-22 Thread Jark Wu
Sorry, I mean we can bump the Calcite version if needed in Flink 1.20. On Fri, 22 Mar 2024 at 22:19, Jark Wu wrote: > Hi Timo, > > Introducing user-defined PTF is very useful in Flink, I'm +1 for this. > But I think the ML model FLIP is not blocked by this, because we > can introduce ML_PREDICT

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-22 Thread Jark Wu
Hi Timo, Introducing user-defined PTF is very useful in Flink, I'm +1 for this. But I think the ML model FLIP is not blocked by this, because we can introduce ML_PREDICT and ML_EVALUATE as built-in PTFs just like TUMBLE/HOP. And support user-defined ML functions as a future FLIP. Regarding the
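Jark's point is that ML_PREDICT can follow the same pattern as the existing built-in window table-valued functions: a table goes in and the same rows come out with extra columns appended. As a rough illustration of that pattern (not Flink's implementation), here is a Python sketch of TUMBLE-style window assignment, assuming integer timestamps and windows aligned to 0 with no offset. The `tumble` function and the `events` data are hypothetical; Flink's TUMBLE also returns a `window_time` column, which this sketch omits.

```python
from typing import Any, Dict, List

# Hypothetical row representation: column name -> value.
Row = Dict[str, Any]


def tumble(input_table: List[Row], time_col: str, size: int) -> List[Row]:
    """Append window_start/window_end for fixed-size, non-overlapping windows."""
    if size <= 0:
        raise ValueError("window size must be positive")
    out: List[Row] = []
    for row in input_table:
        ts = row[time_col]
        # Align down to a multiple of `size`; Python's % keeps this correct for negative ts.
        start = ts - (ts % size)
        out.append({**row, "window_start": start, "window_end": start + size})
    return out


events = [{"id": "a", "ts": 3}, {"id": "b", "ts": 10}, {"id": "c", "ts": 14}]
for r in tumble(events, time_col="ts", size=10):
    print(r["id"], r["window_start"], r["window_end"])
# prints:
# a 0 10
# b 10 20
# c 10 20
```

Because the function only adds columns and never changes the row count, downstream queries can group by `window_start`/`window_end`, just as they would select `label` from an ML_PREDICT call.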

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-22 Thread Timo Walther
Hi everyone, this is a very important change to the Flink SQL syntax but we can't wait until the SQL standard is ready for this. So I'm +1 on introducing the MODEL concept as a first class citizen in Flink. For your information: Over the past months I have already spent a significant amount

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-20 Thread Mingge Deng
Thanks Jark for all the insightful comments. We have updated the proposal per our offline discussions: 1. Model will be treated as a new relation in FlinkSQL. 2. Include the common ML predict and evaluate functions into the open source flink to complete the user journey. And we should be able

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-18 Thread Jark Wu
Hi Hao, > I meant how the table name in window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVF specially in parser/planner so that catalog is looked up when they are seen? The table names are

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-14 Thread Hao Li
Hi Jark, Thanks for the pointer. Sorry for the confusion: I meant how the table name in window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVF specially in parser/planner so that catalog is looked

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-14 Thread Jark Wu
Hi Hao, > Can you send me some pointers where the function gets the table information? Here is the code of cumulate window type checking [1]. > Also is it possible to support in window functions in addition to table? Yes. It is not allowed in TVF. Thanks for the syntax links of other

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-14 Thread Hao Li
Hi Jark, Thanks for the pointers. It's very helpful. 1. Looks like `tumble`, `hopping` are keywords in calcite parser. And the syntax `cumulate(Table my_table, ...)` needs to get table information from catalog somewhere for type validation etc. Can you send me some pointers where the function

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-13 Thread Jark Wu
Hi Mingge, Hao, Thanks for your replies. > PTF is actually the ideal approach for model functions, and we do have the plans to use PTF for all model functions (including prediction, evaluation etc..) once the PTF is supported in FlinkSQL confluent extension. It sounds like PTF is the ideal way

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-13 Thread Hao Li
Hi Jark, Thanks for your questions. These are good questions! 1. The polymorphic table function I was referring to takes a table as input and outputs a table. So the syntax would be like ``` SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table)) ``` As far as I know, this is not supported

Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-12 Thread Jark Wu
Hi Mingge, Chris, Hao, Thanks for proposing this interesting idea. I think this is a nice step towards the AI world for Apache Flink. I don't know much about AI/ML, so I may have some stupid questions. 1. Could you tell more about why polymorphic table function (PTF) doesn't work and do we have

[DISCUSS] FLIP-437: Support ML Models in Flink SQL

2024-03-12 Thread Hao Li
Hi, Dev Mingge, Chris and I would like to start a discussion about FLIP-437: Support ML Models in Flink SQL. This FLIP is proposing to support machine learning models in Flink SQL syntax so that users can CRUD models with Flink SQL and use models on Flink to do prediction with Flink data. The