Re: [DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-06-07 Thread Jim Hughes
Hi Alexey,

Responses inline below:

On Mon, May 13, 2024 at 7:18 PM Alexey Leonov-Vendrovskiy <
vendrov...@gmail.com> wrote:

> Thanks Jim.
>
> > 1. For the testing, I'd call the tests "execution" tests rather than
> > "restore" tests.  For streaming execution, restore tests have the
> compiled
> > plan and intermediate state; the tests verify that those can work
> together
> > and continue processing.
>
> Agree that we don't need to store and restore the intermediate state. So
> the most critical part is that the CompiledPlan for batch can be executed.
>

On the FLIP, can you be more specific about what we are checking during
execution?  I'd suggest that `executeSql(_)` and
`executePlan(compilePlanSql(_))` should be compared.
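
To make that concrete, a rough sketch of the comparison I have in mind (this assumes
the FLIP lands and `compilePlanSql` works in batch mode; the table names, connector
options, and query below are placeholders of my own, not from the FLIP):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class BatchPlanExecutionCheck {
    public static void main(String[] args) throws Exception {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Deterministic bounded source so both execution paths see the same data.
        env.executeSql(
                "CREATE TABLE src (a INT) WITH ("
                        + " 'connector' = 'datagen',"
                        + " 'fields.a.kind' = 'sequence',"
                        + " 'fields.a.start' = '1',"
                        + " 'fields.a.end' = '10')");
        env.executeSql("CREATE TABLE snk (s INT) WITH ('connector' = 'blackhole')");

        String insert = "INSERT INTO snk SELECT SUM(a) FROM src";

        // Path 1: execute the SQL text directly.
        env.executeSql(insert).await();

        // Path 2: compile the same statement into a plan, then execute the plan.
        env.compilePlanSql(insert).execute().await();

        // A real execution test would use an observable sink (e.g. the planner's
        // test "values" connector) and assert that both paths produced the same rows.
    }
}
```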


> 2. The FLIP implicitly suggests "completeness tests" (to use FLIP-190's
> > words).  Do we need "change detection tests"?  I'm a little unsure if
> that
> > is presently happening in an automatic way for streaming operators.
>
>
>  We might need to elaborate more on this, but the idea is that  we need to
> make sure that compiled plans created by an older version of SQL Planner
> are executable on newer runtimes.
>
> 3.  Can we remove old versions of batch operators eventually?  Or do we
> > need to keep them forever like we would for streaming operators?
> >
>
> We could have deprecation paths for old operator nodes in some cases. It is
> a matter of the time window: what the practical "time distance" could be
> between the query planner and the Flink runtime against which the query can
> be resubmitted.
> Note, here we don't have continuous queries, so there is always an option
> to "re-plan" the original SQL query text into a newer version of the
> CompiledPlan.
> With this in mind, a time window of 1yr+ would allow deprecation of older
> batch exec nodes, though I don't see this as a frequent event.
>

As I read the JavaDocs for `TableEnvironment.loadPlan`, it looks like the
compiled plan ought to be sufficient to run a job at a later time.
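
Roughly the round trip I have in mind (a sketch only; the file path and tables are
placeholders, and compiling plans in batch mode is exactly what the FLIP would add):

```java
import org.apache.flink.table.api.CompiledPlan;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.PlanReference;
import org.apache.flink.table.api.TableEnvironment;

public class CompiledPlanRoundTrip {
    public static void main(String[] args) throws Exception {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        env.executeSql(
                "CREATE TABLE src (a INT) WITH ('connector' = 'datagen', 'number-of-rows' = '5')");
        env.executeSql("CREATE TABLE snk (a INT) WITH ('connector' = 'blackhole')");

        String planFile = "/tmp/insert-plan.json"; // placeholder path

        // With planner version N: compile the statement once and persist the plan.
        CompiledPlan plan = env.compilePlanSql("INSERT INTO snk SELECT * FROM src");
        plan.writeToFile(planFile);

        // With runtime version N+1: load and run the persisted plan,
        // without re-planning the original SQL text.
        env.loadPlan(PlanReference.fromFile(planFile)).execute();
    }
}
```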

I think the FLIP should be clear on the backwards support strategy here.
The strategy for streaming is "forever".  This may be the most interesting
part of the FLIP to discuss.

Can you let us know when you've updated the FLIP?

Cheers,

Jim


> -Alexey
>
>
>
> On Mon, May 13, 2024 at 1:52 PM Jim Hughes 
> wrote:
>
> > Hi Alexey,
> >
> > After some thought, I have a question about deprecations:
> >
> > 3.  Can we remove old versions of batch operators eventually?  Or do we
> > need to keep them forever like we would for streaming operators?
> >
> > Cheers,
> >
> > Jim
> >
> > On Thu, May 9, 2024 at 11:29 AM Jim Hughes  wrote:
> >
> > > Hi Alexey,
> > >
> > > Overall, the FLIP looks good and makes sense to me.
> > >
> > > 1. For the testing, I'd call the tests "execution" tests rather than
> > > "restore" tests.  For streaming execution, restore tests have the
> > compiled
> > > plan and intermediate state; the tests verify that those can work
> > together
> > > and continue processing.
> > >
> > > For batch execution, I think we just want that all existing compiled
> > plans
> > > can be executed in future versions.
> > >
> > > 2. The FLIP implicitly suggests "completeness tests" (to use FLIP-190's
> > > words).  Do we need "change detection tests"?  I'm a little unsure if
> > that
> > > is presently happening in an automatic way for streaming operators.
> > >
> > > In RestoreTestBase, generateTestSetupFiles is disabled and has to be
> run
> > > manually when tests are being written.
> > >
> > > Cheers,
> > >
> > > Jim
> > >
> > > On Tue, May 7, 2024 at 5:11 AM Paul Lam  wrote:
> > >
> > >> Hi Alexey,
> > >>
> > >> Thanks a lot for bringing up the discussion. I’m big +1 for the FLIP.
> > >>
> > >> I suppose the goal doesn’t involve the interchangeability of json
> plans
> > >> between batch mode and streaming mode, right?
> > >> In other words, a json plan compiled in a batch program can’t be run
> in
> > >> streaming mode without a migration (which is not yet supported).
> > >>
> > >> Best,
> > >> Paul Lam
> > >>
> > >> > 2024年5月7日 14:38,Alexey Leonov-Vendrovskiy 
> 写道:
> > >> >
> > >> > Hi everyone,
> > >> >
> > >> > PTAL at the proposed FLIP-456: CompiledPlan support for Batch
> > Execution
> > >> > Mode. It is pretty self-describing.
> > >> >
> > >> > Any thoughts are welcome!
> > >> >
> > >> > Thanks,
> > >> > Alexey
> > >> >
> > >> > [1]
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
> > >> > .
> > >>
> > >>
> >
>


Re: [VOTE] FLIP-459: Support Flink hybrid shuffle integration with Apache Celeborn

2024-06-07 Thread Jim Hughes
Hi all,

+1 (non-binding)

Cheers,

Jim

On Fri, Jun 7, 2024 at 4:03 AM Yuxin Tan  wrote:

> Hi everyone,
>
> Thanks for all the feedback about the FLIP-459 Support Flink
> hybrid shuffle integration with Apache Celeborn[1].
> The discussion thread is here [2].
>
> I'd like to start a vote for it. The vote will be open for at least
> 72 hours unless there is an objection or insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-459%3A+Support+Flink+hybrid+shuffle+integration+with+Apache+Celeborn
> [2] https://lists.apache.org/thread/gy7sm7qrf7yrv1rl5f4vtk5fo463ts33
>
> Best,
> Yuxin
>


Re: [DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-13 Thread Jim Hughes
Hi Alexey,

After some thought, I have a question about deprecations:

3.  Can we remove old versions of batch operators eventually?  Or do we
need to keep them forever like we would for streaming operators?

Cheers,

Jim

On Thu, May 9, 2024 at 11:29 AM Jim Hughes  wrote:

> Hi Alexey,
>
> Overall, the FLIP looks good and makes sense to me.
>
> 1. For the testing, I'd call the tests "execution" tests rather than
> "restore" tests.  For streaming execution, restore tests have the compiled
> plan and intermediate state; the tests verify that those can work together
> and continue processing.
>
> For batch execution, I think we just want that all existing compiled plans
> can be executed in future versions.
>
> 2. The FLIP implicitly suggests "completeness tests" (to use FLIP-190's
> words).  Do we need "change detection tests"?  I'm a little unsure if that
> is presently happening in an automatic way for streaming operators.
>
> In RestoreTestBase, generateTestSetupFiles is disabled and has to be run
> manually when tests are being written.
>
> Cheers,
>
> Jim
>
> On Tue, May 7, 2024 at 5:11 AM Paul Lam  wrote:
>
>> Hi Alexey,
>>
>> Thanks a lot for bringing up the discussion. I’m big +1 for the FLIP.
>>
>> I suppose the goal doesn’t involve the interchangeability of json plans
>> between batch mode and streaming mode, right?
>> In other words, a json plan compiled in a batch program can’t be run in
>> streaming mode without a migration (which is not yet supported).
>>
>> Best,
>> Paul Lam
>>
>> > 2024年5月7日 14:38,Alexey Leonov-Vendrovskiy  写道:
>> >
>> > Hi everyone,
>> >
>> > PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
>> > Mode. It is pretty self-describing.
>> >
>> > Any thoughts are welcome!
>> >
>> > Thanks,
>> > Alexey
>> >
>> > [1]
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
>> > .
>>
>>


Re: Alignment with FlinkSQL Standards | Feedback requested

2024-05-13 Thread Jim Hughes
Hi Kanchi,

One can already directly query Kafka topics using Flink SQL.  The Flink
Kafka connector allows one to use Kafka to back a Flink table.

As a concrete example, this blog post should explain things:
https://flink.apache.org/2020/07/28/flink-sql-demo-building-an-end-to-end-streaming-application/#creating-a-kafka-table-using-ddl
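
For example, a table backed by a Kafka topic can be declared and queried like this
today (topic name, brokers, and schema are placeholders, and the Kafka connector needs
to be on the classpath):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class KafkaTableExample {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Back a Flink table with an existing Kafka topic.
        env.executeSql(
                "CREATE TABLE user_behavior ("
                        + "  user_id BIGINT,"
                        + "  behavior STRING,"
                        + "  ts TIMESTAMP(3)"
                        + ") WITH ("
                        + "  'connector' = 'kafka',"
                        + "  'topic' = 'user_behavior',"
                        + "  'properties.bootstrap.servers' = 'localhost:9092',"
                        + "  'scan.startup.mode' = 'earliest-offset',"
                        + "  'format' = 'json'"
                        + ")");

        // After that, the topic is queryable like any other table
        // (this is a continuous query and prints results as they arrive).
        env.executeSql("SELECT behavior, COUNT(*) FROM user_behavior GROUP BY behavior").print();
    }
}
```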

Is there some capability which is missing from the existing set of features?

Cheers,

Jim

On Fri, May 10, 2024 at 2:21 AM Kanchi Masalia
 wrote:

> Dear Flink Community,
>
> I hope this message finds you well. I am considering the idea of extending
> FlinkSQL's capabilities to include direct queries from Kafka topics,
> specifically through the use of "SELECT * FROM kafka_topic". Currently, as
> we are aware, FlinkSQL permits queries like "SELECT * FROM
> flink_sql_table".
>
> I am writing to seek your valued opinions and insights on whether extending
> this capability to include direct Kafka topic queries would align with the
> guidelines and architectural philosophy of FlinkSQL. This potential feature
> could offer a more streamlined approach for users to access and analyze
> data directly from Kafka potentially simplifying workflow processes.
>
> Here are a few points I would like your input on:
>
>1. Does this idea align with the current and future usage philosophy of
>FlinkSQL?
>2. Are there potential challenges or conflicts this feature might pose
>against existing FlinkSQL guidelines?
>3. Would this enhancement align with how you currently use or plan to
>use Flink in your data processing tasks?
>
> Your feedback will be instrumental in guiding the direction of this
> potential feature.
>
> Thank you very much for considering my inquiry. I am looking forward to
> your insights and hope this can spark a constructive discussion within our
> community.
>
> Thanks,
> Kanchi Masalia
>


Re: [DISCUSSION] FLIP-456: CompiledPlan support for Batch Execution Mode

2024-05-09 Thread Jim Hughes
Hi Alexey,

Overall, the FLIP looks good and makes sense to me.

1. For the testing, I'd call the tests "execution" tests rather than
"restore" tests.  For streaming execution, restore tests have the compiled
plan and intermediate state; the tests verify that those can work together
and continue processing.

For batch execution, I think we just want that all existing compiled plans
can be executed in future versions.

2. The FLIP implicitly suggests "completeness tests" (to use FLIP-190's
words).  Do we need "change detection tests"?  I'm a little unsure if that
is presently happening in an automatic way for streaming operators.

In RestoreTestBase, generateTestSetupFiles is disabled and has to be run
manually when tests are being written.

Cheers,

Jim

On Tue, May 7, 2024 at 5:11 AM Paul Lam  wrote:

> Hi Alexey,
>
> Thanks a lot for bringing up the discussion. I’m big +1 for the FLIP.
>
> I suppose the goal doesn’t involve the interchangeability of json plans
> between batch mode and streaming mode, right?
> In other words, a json plan compiled in a batch program can’t be run in
> streaming mode without a migration (which is not yet supported).
>
> Best,
> Paul Lam
>
> > 2024年5月7日 14:38,Alexey Leonov-Vendrovskiy  写道:
> >
> > Hi everyone,
> >
> > PTAL at the proposed FLIP-456: CompiledPlan support for Batch Execution
> > Mode. It is pretty self-describing.
> >
> > Any thoughts are welcome!
> >
> > Thanks,
> > Alexey
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-456%3A+CompiledPlan+support+for+Batch+Execution+Mode
> > .
>
>


Re: [DISCUSS] FLIP-XXX: Introduce Flink SQL variables

2024-03-27 Thread Jim Hughes
Hi Ferenc,

Looks like a good idea.

I'd prefer sticking to the SQL standard if possible.  Would it be possible
/ sensible to allow for each syntax, perhaps managed by a config setting?

Cheers,

Jim

On Tue, Mar 26, 2024 at 6:59 AM Ferenc Csaky 
wrote:

> Hello devs,
>
> I would like to start a discussion about FLIP-XXX: Introduce Flink SQL
> variables [1].
>
> The main motivation behind this change is to be able to abstract Flink SQL
> from
> environment-specific configuration and provide a way to carry jobs between
> environments (e.g. dev-stage-prod) without the need to make changes in the
> code.
> It can also be a way to decouple sensitive information from the job code,
> or help
> with redundant literals.
>
> The main decision regarding the proposed solution is to handle the
> variable resolution
> as early as possible on the given string statement, so the whole operation
> is an easy and
> lightweight string replace. But this approach introduces some limitations
> as well:
>
> - The executed SQL will always be the unresolved, raw string, so in case
> of secrets
> a DESC operation would show them.
> - Changing the value of a variable can break code that uses that variable.
>
> For more details, please check the FLIP [1]. There is also a stale Jira
> about this [2].
>
> Looking forward to any comments and opinions!
>
> Thanks,
> Ferenc
>
> [1]
> https://docs.google.com/document/d/1-eUz-PBCdqNggG_irDT0X7fdL61ysuHOaWnrkZHb5Hc/edit?usp=sharing
> [2] https://issues.apache.org/jira/browse/FLINK-17377


Re: [DISCUSS] FLIP-434: Support optimizations for pre-partitioned data sources

2024-03-11 Thread Jim Hughes
Hi Jeyhun,

I like the idea!  Given FLIP-376[1], I wonder if it'd make sense to
generalize FLIP-434 to be about "pre-divided" data to cover "buckets" and
"partitions" (and maybe even situations where a data source is partitioned
and bucketed).

Separate from that, the page mentions TPC-H Q1 as an example.  For a join,
any two tables joined on the same bucket key would provide a concrete
example.  Systems like Kafka Streams/ksqlDB call this
"co-partitioning"; for those systems, it is a requirement placed on the
input sources.  For Flink, with FLIP-434, the proposed planner rule
could remove the shuffle.
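
To make the join case concrete, a sketch along these lines (the DDL uses the
FLIP-376 DISTRIBUTED BY syntax, which may still evolve, and the connector option is
just a placeholder, so this is not runnable as-is without a bucket-aware connector):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CoPartitionedJoinSketch {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Both tables are bucketed on the join key with the same bucket count.
        env.executeSql(
                "CREATE TABLE orders (order_id BIGINT, customer_id BIGINT) "
                        + "DISTRIBUTED BY (customer_id) INTO 4 BUCKETS "
                        + "WITH ('connector' = '...')");
        env.executeSql(
                "CREATE TABLE customers (customer_id BIGINT, name STRING) "
                        + "DISTRIBUTED BY (customer_id) INTO 4 BUCKETS "
                        + "WITH ('connector' = '...')");

        // Joining on the bucket key: with FLIP-434, the proposed planner rule
        // could recognize the co-partitioning and drop the shuffle on both inputs.
        env.executeSql(
                "EXPLAIN SELECT o.order_id, c.name "
                        + "FROM orders o JOIN customers c ON o.customer_id = c.customer_id")
                .print();
    }
}
```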

Definitely a fun idea; I look forward to hearing more!

Cheers,

Jim


1.
https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
2.
https://docs.ksqldb.io/en/latest/developer-guide/joins/partition-data/#co-partitioning-requirements

On Sun, Mar 10, 2024 at 3:38 PM Jeyhun Karimov  wrote:

> Hi devs,
>
> I’d like to start a discussion on FLIP-434: Support optimizations for
> pre-partitioned data sources [1].
>
> The FLIP introduces taking advantage of pre-partitioned data sources for
> SQL/Table API (it is already supported as experimental feature in
> DataStream API [2]).
>
>
> Please find more details in the FLIP wiki document [1].
> Looking forward to your feedback.
>
> Regards,
> Jeyhun
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-434%3A+Support+optimizations+for+pre-partitioned+data+sources
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/experimental/
>


[jira] [Created] (FLINK-34173) Implement CatalogTable.Builder

2024-01-19 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-34173:
--

 Summary: Implement CatalogTable.Builder
 Key: FLINK-34173
 URL: https://issues.apache.org/jira/browse/FLINK-34173
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-34172) Add support for altering a distribution via ALTER TABLE

2024-01-19 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-34172:
--

 Summary: Add support for altering a distribution via ALTER TABLE 
 Key: FLINK-34172
 URL: https://issues.apache.org/jira/browse/FLINK-34172
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-419: Optimize multi-sink query plan generation

2024-01-16 Thread Jim Hughes
Hi Jeyhun,


Generally, I like the idea of speeding up the optimizer in the case of
multiple queries!


I am new to the optimizer, but I have a few comments / questions.



   1. StreamOptimizeContext may still be needed to pass the fact that we
   are optimizing a streaming query.  I don't think this class will go away
   completely.  (I agree it may become more simple if the kind or
   mini-batch configuration can be removed.)
   2. How are the mini-batch and changelog inference rules tightly coupled?
   I looked a little bit and I haven't seen any connection between them.  It
   seems like the changelog inference is what needs to run multiple times.
   3. I think the point about code complexity is unnecessary:
   StreamOptimizeContext extends org.apache.calcite.plan.Context, which is used
   as an interface to pass information and objects through the Calcite stack.
   4. Is there an alternative where the complexity of the changelog optimization
   can be moved into the `FlinkChangelogModeInferenceProgram`?  (If there is
   coupling between the mini-batch and changelog rules, then this would not
   make sense.)
   5. There are some other smaller refactorings.  I tried some of them
   here: https://github.com/apache/flink/pull/24108 Mostly, it is syntax
   and using lazy vals to avoid recomputing various things.  (Feel free to
   take whatever actually works; I haven't run the tests.)

Separately, folks on the Calcite dev list are thinking about multi-query
optimization:
https://lists.apache.org/thread/mcdqwrtpx0os54t2nn9vtk17spkp5o5k
https://issues.apache.org/jira/browse/CALCITE-6188

Cheers,


Jim

On Tue, Jan 16, 2024 at 5:45 PM Jeyhun Karimov  wrote:

> Hi devs,
>
> I’d like to start a discussion on FLIP-419: Optimize multi-sink query plan
> generation [1].
>
>
> Currently, the optimization process for multi-sink query plans is
> suboptimal: 1) it requires going through the optimization process several
> times, and 2) as a result, some low-level code complexity is
> introduced in high-level optimization classes such
> as StreamCommonSubGraphBasedOptimizer.
>
>
> To address this issue, this FLIP proposes to decouple changelog and
> mini-batch interval inference from the main optimization process.
>
> Please find more details in the FLIP wiki document [1]. Looking forward to
> your feedback.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-419%3A+Optimize+multi-sink+query+plan+generation
>
>
> Regards,
> Jeyhun Karimov
>


[jira] [Created] (FLINK-34067) Fix javacc warnings in flink-sql-parser

2024-01-11 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-34067:
--

 Summary: Fix javacc warnings in flink-sql-parser
 Key: FLINK-34067
 URL: https://issues.apache.org/jira/browse/FLINK-34067
 Project: Flink
  Issue Type: Improvement
Reporter: Jim Hughes
Assignee: Jim Hughes


While extending the Flink SQL parser, I noticed these two warnings:

```
[INFO] --- javacc:2.4:javacc (javacc) @ flink-sql-parser ---
Java Compiler Compiler Version 4.0 (Parser Generator)
(type "javacc" with no arguments for help)
Reading from file .../flink-table/flink-sql-parser/target/generated-sources/javacc/Parser.jj . . .
Note: UNICODE_INPUT option is specified. Please make sure you create the parser/lexer using a Reader with the correct character encoding.
Warning: Choice conflict involving two expansions at
         line 2043, column 13 and line 2052, column 9 respectively.
         A common prefix is: "IF"
         Consider using a lookahead of 2 for earlier expansion.
Warning: Choice conflict involving two expansions at
         line 2097, column 13 and line 2105, column 8 respectively.
         A common prefix is: "IF"
         Consider using a lookahead of 2 for earlier expansion.
```

As the warnings suggest, adding `LOOKAHEAD(2)` in a few places addresses the
warnings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-21 Thread Jim Hughes
Hi Alan,

+1 (non binding)

Cheers,

Jim

On Wed, Dec 20, 2023 at 2:41 PM Alan Sheinberg
 wrote:

> Hi everyone,
>
> I'd like to start a vote on FLIP-400 [1]. It covers introducing a new UDF
> type, AsyncScalarFunction for completing invocations asynchronously.  It
> has been discussed in this thread [2].
>
> I would like to start a vote.  The vote will be open for at least 72 hours
> (until December 28th 18:00 GMT) unless there is an objection or
> insufficient votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support
> [2] https://lists.apache.org/thread/q3st6t1w05grd7bthzfjtr4r54fv4tm2
>
> Thanks,
> Alan
>


[jira] [Created] (FLINK-33918) Fix AsyncSinkWriterThrottlingTest test failure

2023-12-20 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33918:
--

 Summary: Fix AsyncSinkWriterThrottlingTest test failure
 Key: FLINK-33918
 URL: https://issues.apache.org/jira/browse/FLINK-33918
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.19.0
Reporter: Jim Hughes


From
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55700=logs=1c002d28-a73d-5309-26ee-10036d8476b4=d1c117a6-8f13-5466-55f0-d48dbb767fcd]

 

```
Dec 20 03:09:03 03:09:03.411 [ERROR] 
org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize
 -- Time elapsed: 0.879 s <<< ERROR! 
Dec 20 03:09:03 java.lang.IllegalStateException: Illegal thread detected. This 
method must be called from inside the mailbox thread! 
Dec 20 03:09:03 at 
org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.checkIsMailboxThread(TaskMailboxImpl.java:262)
 
Dec 20 03:09:03 at 
org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.take(TaskMailboxImpl.java:137)
 
Dec 20 03:09:03 at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.yield(MailboxExecutorImpl.java:84)
 
Dec 20 03:09:03 at 
org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.flush(AsyncSinkWriter.java:367)
 
Dec 20 03:09:03 at 
org.apache.flink.connector.base.sink.writer.AsyncSinkWriter.lambda$registerCallback$3(AsyncSinkWriter.java:315)
 
Dec 20 03:09:03 at 
org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService$CallbackTask.onProcessingTime(TestProcessingTimeService.java:199)
 
Dec 20 03:09:03 at 
org.apache.flink.streaming.runtime.tasks.TestProcessingTimeService.setCurrentTime(TestProcessingTimeService.java:76)
 
Dec 20 03:09:03 at 
org.apache.flink.connector.base.sink.writer.AsyncSinkWriterThrottlingTest.testSinkThroughputShouldThrottleToHalfBatchSize(AsyncSinkWriterThrottlingTest.java:64)
 
Dec 20 03:09:03 at java.lang.reflect.Method.invoke(Method.java:498) 
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33805) Implement restore tests for OverAggregate node

2023-12-12 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33805:
--

 Summary: Implement restore tests for OverAggregate node
 Key: FLINK-33805
 URL: https://issues.apache.org/jira/browse/FLINK-33805
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-11 Thread Jim Hughes
Hi Xuyang,

On Sun, Dec 10, 2023 at 10:41 PM Xuyang  wrote:

> Hi, Jim.
> >As a clarification, since FLINK-24204 is finishing up work from
> >FLIP-145[1], do we need to discuss anything before you work out the
> details
> >of FLINK-24024 as a PR?
> Which issue do you mean? It seems that FLINK-24204[1] is the issue with
> table api type system.
>

Ah, I mean to ask if you can contribute the new SESSION Table support
without needing FLIP-392 completely settled.  I was trying to see if that
is separate work which can be done or if there is some dependency on this
FLIP.


> I've got a PR up [3] for moving at least one of the classes you are
> touching.
> Nice work! Since we are not going to delete the legacy group window agg
> operator actually, the only compatibility issue
> may be that when using flink sql, the legacy group window agg operator
> will be rewritten into new operators. Will these tests be affected by
> this rewriting?
>

The tests should not be impacted.  Depending on what order our work lands
in, one of the tests you've added/updated would likely move to the
RestoreTests that Bonnie and I are working on.  Just mentioning that ahead
of time.

Cheers,

Jim



>
> [1] https://issues.apache.org/jira/browse/FLINK-24204
>
>
>
>
>
>
> --
>
> Best!
> Xuyang
>
>
>
>
>
> At 2023-12-09 06:25:30, "Jim Hughes"  wrote:
> >Hi Xuyang,
> >
> >As a clarification, since FLINK-24204 is finishing up work from
> >FLIP-145[1], do we need to discuss anything before you work out the
> details
> >of FLINK-24024 as a PR?
> >
> >Relatedly, as that goes up for a PR, as part of FLINK-33421 [2], Bonnie
> and
> >I are working through migrating some of the JsonPlan Tests and ITCases to
> >RestoreTests.  I've got a PR up [3] for moving at least one of the classes
> >you are touching.  Let me know if I can share any details about that work.
> >
> >Cheers,
> >
> >Jim
> >
> >1.
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> >
> >2. https://issues.apache.org/jira/browse/FLINK-33421
> >3. https://github.com/apache/flink/pull/23886
> >https://issues.apache.org/jira/browse/FLINK-33676
> >
> >On Tue, Nov 28, 2023 at 7:31 AM Xuyang  wrote:
> >
> >> Hi all.
> >> I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> >> Window Aggregation.
> >>
> >>
> >> Although the current Flink SQL Window Aggregation documentation[1]
> >> indicates that the legacy Group Window Aggregation
> >> syntax has been deprecated, the new Window TVF Aggregation syntax has
> not
> >> fully covered all of the features of the legacy one.
> >>
> >>
> >> Compared to Group Window Aggregation, Window TVF Aggregation has several
> >> advantages, such as two-stage optimization,
> >> support for standard GROUPING SET syntax, and so on. However, it needs
> to
> >> supplement and enrich the following features.
> >>
> >>
> >> 1. Support for SESSION Window TVF Aggregation
> >> 2. Support for consuming CDC stream
> >> 3. Support for HOP window size with non-integer step length
> >> 4. Support for configurations such as early fire, late fire and allow
> >> lateness
> >> (which are internal experimental configurations in Group Window
> >> Aggregation and not public to users yet.)
> >> 5. Unification of the Window TVF Aggregation operator in runtime at the
> >> implementation layer
> >> (In the long term, the cost to maintain the operators about Window TVF
> >> Aggregation and Group Window Aggregation is too expensive.)
> >>
> >>
> >> This flip aims to continue the unfinished work in FLIP-145[2], which is
> to
> >> fully enable the capabilities of Window TVF Aggregation
> >>  and officially deprecate the legacy syntax Group Window Aggregation, to
> >> prepare for the removal of the legacy one in Flink 2.0.
> >>
> >>
> >> I have already done some preliminary POC to validate the feasibility of
> >> the related work in this flip as follows.
> >> 1. POC for SESSION Window TVF Aggregation [3]
> >> 2. POC for CUMULATE in Group Window Aggregation operator [4]
> >> 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >>
> >>
> >> Looking forward to your feedback and thoughts!
> >>
> >>
> >>
> >> [1]
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
> >>
> >> [2]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> >> [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> >> [4]
> >>
> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> >> [5]
> >>
> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
> >>
> >>
> >>
> >> --
> >>
> >> Best!
> >> Xuyang
>


Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-08 Thread Jim Hughes
Hi Xuyang,

As a clarification, since FLINK-24204 is finishing up work from
FLIP-145[1], do we need to discuss anything before you work out the details
of FLINK-24024 as a PR?

Relatedly, as that goes up for a PR, as part of FLINK-33421 [2], Bonnie and
I are working through migrating some of the JsonPlan Tests and ITCases to
RestoreTests.  I've got a PR up [3] for moving at least one of the classes
you are touching.  Let me know if I can share any details about that work.

Cheers,

Jim

1.
https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows

2. https://issues.apache.org/jira/browse/FLINK-33421
3. https://github.com/apache/flink/pull/23886
https://issues.apache.org/jira/browse/FLINK-33676

On Tue, Nov 28, 2023 at 7:31 AM Xuyang  wrote:

> Hi all.
> I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> Window Aggregation.
>
>
> Although the current Flink SQL Window Aggregation documentation[1]
> indicates that the legacy Group Window Aggregation
> syntax has been deprecated, the new Window TVF Aggregation syntax has not
> fully covered all of the features of the legacy one.
>
>
> Compared to Group Window Aggregation, Window TVF Aggregation has several
> advantages, such as two-stage optimization,
> support for standard GROUPING SET syntax, and so on. However, it needs to
> supplement and enrich the following features.
>
>
> 1. Support for SESSION Window TVF Aggregation
> 2. Support for consuming CDC stream
> 3. Support for HOP window size with non-integer step length
> 4. Support for configurations such as early fire, late fire and allow
> lateness
> (which are internal experimental configurations in Group Window
> Aggregation and not public to users yet.)
> 5. Unification of the Window TVF Aggregation operator in runtime at the
> implementation layer
> (In the long term, the cost to maintain the operators about Window TVF
> Aggregation and Group Window Aggregation is too expensive.)
>
>
> This flip aims to continue the unfinished work in FLIP-145[2], which is to
> fully enable the capabilities of Window TVF Aggregation
>  and officially deprecate the legacy syntax Group Window Aggregation, to
> prepare for the removal of the legacy one in Flink 2.0.
>
>
> I have already done some preliminary POC to validate the feasibility of
> the related work in this flip as follows.
> 1. POC for SESSION Window TVF Aggregation [3]
> 2. POC for CUMULATE in Group Window Aggregation operator [4]
> 3. POC for consuming CDC stream in Window Aggregation operator [5]
>
>
> Looking forward to your feedback and thoughts!
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
>
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> [4]
> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> [5]
> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
>
>
>
> --
>
> Best!
> Xuyang


[jira] [Created] (FLINK-33777) ParquetTimestampITCase>FsStreamingSinkITCaseBase failing in CI

2023-12-07 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33777:
--

 Summary: ParquetTimestampITCase>FsStreamingSinkITCaseBase failing 
in CI
 Key: FLINK-33777
 URL: https://issues.apache.org/jira/browse/FLINK-33777
 Project: Flink
  Issue Type: Bug
Reporter: Jim Hughes


From this CI run:
[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=55334=logs=2e8cb2f7-b2d3-5c62-9c05-cd756d33a819=2dd510a3-5041-5201-6dc3-54d310f68906]

```
Dec 07 19:57:30 19:57:30.026 [ERROR] Errors: 
Dec 07 19:57:30 19:57:30.026 [ERROR] 
ParquetTimestampITCase>FsStreamingSinkITCaseBase.testNonPart:84->FsStreamingSinkITCaseBase.testPartitionCustomFormatDate:151->FsStreamingSinkITCaseBase.test:186
 » Validation 
Dec 07 19:57:30 19:57:30.026 [ERROR] 
ParquetTimestampITCase>FsStreamingSinkITCaseBase.testPart:89->FsStreamingSinkITCaseBase.testPartitionCustomFormatDate:151->FsStreamingSinkITCaseBase.test:186
 » Validation 
Dec 07 19:57:30 19:57:30.026 [ERROR] 
ParquetTimestampITCase>FsStreamingSinkITCaseBase.testPartitionWithBasicDate:126->FsStreamingSinkITCaseBase.test:186
 » Validation 
```
The errors each appear somewhat similar:
```

Dec 07 19:54:43 19:54:43.934 [ERROR] 
org.apache.flink.formats.parquet.ParquetTimestampITCase.testPartitionWithBasicDate
 Time elapsed: 1.822 s <<< ERROR! 
Dec 07 19:54:43 org.apache.flink.table.api.ValidationException: Unable to find 
a field named 'f0' in the physical data type derived from the given type 
information for schema declaration. Make sure that the type information is not 
a generic raw type. Currently available fields are: [a, b, c, d, e] 
Dec 07 19:54:43 at 
org.apache.flink.table.catalog.SchemaTranslator.patchDataTypeFromColumn(SchemaTranslator.java:350)
 
Dec 07 19:54:43 at 
org.apache.flink.table.catalog.SchemaTranslator.patchDataTypeFromDeclaredSchema(SchemaTranslator.java:337)
 
Dec 07 19:54:43 at 
org.apache.flink.table.catalog.SchemaTranslator.createConsumingResult(SchemaTranslator.java:235)
 
Dec 07 19:54:43 at 
org.apache.flink.table.catalog.SchemaTranslator.createConsumingResult(SchemaTranslator.java:180)
 
Dec 07 19:54:43 at 
org.apache.flink.table.api.bridge.internal.AbstractStreamTableEnvironmentImpl.fromStreamInternal(AbstractStreamTableEnvironmentImpl.java:141)
 
Dec 07 19:54:43 at 
org.apache.flink.table.api.bridge.scala.internal.StreamTableEnvironmentImpl.createTemporaryView(StreamTableEnvironmentImpl.scala:121)
 
Dec 07 19:54:43 at 
org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.test(FsStreamingSinkITCaseBase.scala:186)
 
Dec 07 19:54:43 at 
org.apache.flink.table.planner.runtime.stream.FsStreamingSinkITCaseBase.testPartitionWithBasicDate(FsStreamingSinkITCaseBase.scala:126)
```
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-06 Thread Jim Hughes
Hi Alan,

Nicely written and makes sense.  The only feedback I have is around the
naming of the generalization, e.g. "Specifically, PythonCalcSplitRuleBase
will be generalized into RemoteCalcSplitRuleBase."  This naming seems to
imply/suggest that all Async functions are remote.  I wonder if we can find
another name which doesn't carry that connotation; maybe
AsyncCalcSplitRuleBase.  (An AsyncCalcSplitRuleBase which handles Python
and Async functions seems reasonable.)
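
For readers following the thread, the FLIP boils down to a UDF shaped roughly like
this (the class name and eval signature come from the proposal, so they may change
before the feature lands; the function body is just an illustration):

```java
import org.apache.flink.table.functions.AsyncScalarFunction;

import java.util.concurrent.CompletableFuture;

// The first argument is a future that the UDF completes asynchronously,
// e.g. from a non-blocking call to an external service.
public class RemoteLookupFunction extends AsyncScalarFunction {

    public void eval(CompletableFuture<String> result, String key) {
        // Placeholder for an async client call; here we simply complete inline.
        result.complete("value-for-" + key);
    }
}
```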

Cheers,

Jim

On Wed, Dec 6, 2023 at 5:45 PM Alan Sheinberg
 wrote:

> I'd like to start a discussion of FLIP-400: AsyncScalarFunction for
> asynchronous scalar function support [1]
>
> This feature proposes adding a new UDF type AsyncScalarFunction which is
> invoked just like a normal ScalarFunction, but is implemented with an
> asynchronous eval method.  I had brought this up including the motivation
> in a previous discussion thread [2].
>
> The purpose is to achieve high throughput scalar function UDFs while
> allowing that an individual call may have high latency.  It allows scaling
> up the parallelism of just these calls without having to increase the
> parallelism of the whole query (which could be rather resource
> inefficient).
>
> In practice, it should enable SQL integration with external services and
> systems, which Flink has limited support for at the moment. It should also
> allow easier integration with existing libraries which use asynchronous
> APIs.
>
> Looking forward to your feedback and suggestions.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support
> >
>
> [2] https://lists.apache.org/thread/bn153gmcobr41x2nwgodvmltlk810hzs
> 
>
> Thanks,
> Alan
>


[jira] [Created] (FLINK-33767) Implement restore tests for TemporalJoin node

2023-12-06 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33767:
--

 Summary: Implement restore tests for TemporalJoin node
 Key: FLINK-33767
 URL: https://issues.apache.org/jira/browse/FLINK-33767
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33758) Implement restore tests for TemporalSort node

2023-12-05 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33758:
--

 Summary: Implement restore tests for TemporalSort node
 Key: FLINK-33758
 URL: https://issues.apache.org/jira/browse/FLINK-33758
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33757) Implement restore tests for Rank node

2023-12-05 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33757:
--

 Summary: Implement restore tests for Rank node
 Key: FLINK-33757
 URL: https://issues.apache.org/jira/browse/FLINK-33757
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33756) Missing record with CUMULATE/HOP windows using an optimization

2023-12-05 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33756:
--

 Summary: Missing record with CUMULATE/HOP windows using an 
optimization
 Key: FLINK-33756
 URL: https://issues.apache.org/jira/browse/FLINK-33756
 Project: Flink
  Issue Type: Bug
Reporter: Jim Hughes


I have seen an optimization cause a window to fail to emit a record.

With the optimization `TABLE_OPTIMIZER_DISTINCT_AGG_SPLIT_ENABLED` set to true, 
the configuration AggregatePhaseStrategy.TWO_PHASE set, and a HOP or CUMULATE 
window with an offset, a record can be sent which causes one of the multiple 
active windows to fail to emit a record.
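
For anyone trying to reproduce, the combination described above corresponds roughly
to the following settings (string keys as documented; the actual reproduction is in
the linked test changes):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DistinctSplitRepro {
    public static void main(String[] args) {
        TableEnvironment env = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // The two settings involved in the observed behavior.
        env.getConfig().getConfiguration()
                .setString("table.optimizer.distinct-agg.split.enabled", "true");
        env.getConfig().getConfiguration()
                .setString("table.optimizer.agg-phase-strategy", "TWO_PHASE");

        // Combined with a CUMULATE or HOP window TVF that uses an offset and a
        // distinct aggregate, one of the concurrently open windows can fail to
        // emit a record (see the modified WindowAggregateJsonITCase).
    }
}
```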

The linked code modifies the `WindowAggregateJsonITCase` to demonstrate the case.

The test `testDistinctSplitDisabled` shows the expected behavior.  The test 
`testDistinctSplitEnabled` tests the above configurations and shows that one 
record is missing from the output.  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33676) Implement restore tests for WindowAggregate node

2023-11-28 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33676:
--

 Summary: Implement restore tests for WindowAggregate node
 Key: FLINK-33676
 URL: https://issues.apache.org/jira/browse/FLINK-33676
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33667) Implement restore tests for MatchRecognize node

2023-11-27 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33667:
--

 Summary: Implement restore tests for MatchRecognize node
 Key: FLINK-33667
 URL: https://issues.apache.org/jira/browse/FLINK-33667
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes
Assignee: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-21 Thread Jim Hughes
+1 (non-binding)

Thanks, Dawid!

Jim

On Tue, Nov 21, 2023 at 7:20 AM Gyula Fóra  wrote:

> +1 (binding)
>
> Gyula
>
> On Tue, 21 Nov 2023 at 13:11, xiangyu feng  wrote:
>
> > +1 (non-binding)
> >
> > Thanks for driving this.
> >
> > Best,
> > Xiangyu Feng
> >
> >
> > Ferenc Csaky  于2023年11月21日周二 20:07写道:
> >
> > > +1 (non-binding)
> > >
> > > Looking forward to this!
> > >
> > > Best,
> > > Ferenc
> > >
> > >
> > >
> > >
> > > On Tuesday, November 21st, 2023 at 12:21, Martijn Visser <
> > > martijnvis...@apache.org> wrote:
> > >
> > >
> > > >
> > > >
> > > > +1 (binding)
> > > >
> > > > Thanks for driving this.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Tue, Nov 21, 2023 at 12:18 PM Benchao Li libenc...@apache.org
> > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Dawid Wysakowicz wysakowicz.da...@gmail.com 于2023年11月21日周二
> 18:56写道:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > Thank you to everyone for the feedback on FLIP-393: Make
> > > QueryOperations
> > > > > > SQL serializable[1]
> > > > > > which has been discussed in this thread [2].
> > > > > >
> > > > > > I would like to start a vote for it. The vote will be open for at
> > > least 72
> > > > > > hours unless there is an objection or not enough votes.
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > > > > > [2]
> > https://lists.apache.org/thread/ztyk68brsbmwwo66o1nvk3f6fqqhdxgk
> > > > >
> > > > > --
> > > > >
> > > > > Best,
> > > > > Benchao Li
> > >
> >
>


[jira] [Created] (FLINK-33601) Implement restore tests for Expand node

2023-11-20 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33601:
--

 Summary: Implement restore tests for Expand node
 Key: FLINK-33601
 URL: https://issues.apache.org/jira/browse/FLINK-33601
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33521) Implement restore tests for PythonCalc node

2023-11-10 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33521:
--

 Summary: Implement restore tests for PythonCalc node
 Key: FLINK-33521
 URL: https://issues.apache.org/jira/browse/FLINK-33521
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33488) Implement restore tests for Deduplicate node

2023-11-08 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33488:
--

 Summary: Implement restore tests for Deduplicate node
 Key: FLINK-33488
 URL: https://issues.apache.org/jira/browse/FLINK-33488
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] FLIP-376: Add DISTRIBUTED BY clause

2023-11-07 Thread Jim Hughes
Hi all,

+1 (non-binding)

Cheers,

Jim

On Mon, Nov 6, 2023 at 6:39 AM Timo Walther  wrote:

> Hi everyone,
>
> I'd like to start a vote on FLIP-376: Add DISTRIBUTED BY clause[1] which
> has been discussed in this thread [2].
>
> The vote will be open for at least 72 hours unless there is an objection
> or not enough votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> [2] https://lists.apache.org/thread/z1p4f38x9ohv29y4nlq8g1wdxwfjwxd1
>
> Cheers,
> Timo
>


[jira] [Created] (FLINK-33470) Implement restore tests for Join node

2023-11-06 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33470:
--

 Summary: Implement restore tests for Join node
 Key: FLINK-33470
 URL: https://issues.apache.org/jira/browse/FLINK-33470
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-33469) Implement restore tests for Limit node

2023-11-06 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-33469:
--

 Summary: Implement restore tests for Limit node 
 Key: FLINK-33469
 URL: https://issues.apache.org/jira/browse/FLINK-33469
 Project: Flink
  Issue Type: Sub-task
Reporter: Jim Hughes






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-09-11 Thread Jim Hughes
Hi Jing and Jark!

I can definitely appreciate the desire to have fewer configurations.

Do you have a suggested alternative for platform providers to limit or
restrict the hints that Bonnie is talking about?

As one possibility, maybe one configuration could be set to control all
hints.

Cheers,

Jim

On Sat, Sep 9, 2023 at 6:16 AM Jark Wu  wrote:

> I agree with Jing,
>
> My biggest concern is this makes the boundary of adding an option very
> unclear.
> It's not a strong reason to add a config just because of it doesn't affect
> existing
> users. Does this mean that in the future we might add an option to disable
> each feature?
>
> Flink already has a very long list of configurations [1][2] and this is
> very scary
> and not easy to use. We should try to remove the unnecessary configuration
> from
> the list in Flink 2.0. However, from my perspective, adding this option
> makes us far
> away from this direction.
>
> Best,
> Jark
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/
>
> On Sat, 9 Sept 2023 at 17:33, Jing Ge  wrote:
>
> > Hi,
> >
> > Thanks for bringing this to our attention. At the first glance, it looks
> > reasonable to offer a new configuration to enable/disable SQL hints
> > globally. However, IMHO, it is not the right timing to do it now, because
> > we should not only think as platform providers but also as end users(the
> > number of end users are much bigger than platform providers):
> >
> > 1. Users don't need it because users have the choice to use hints or not,
> > just like Jark pointed out. With this configuration, there will be a
> fight
> > between platform providers and users which will cause more confusions and
> > conflicts. And users will probably win, IMHO, because they are the end
> > customers that use Flink to create business values.
> > 2. SQL hints could be considered as an additional feature for users to
> > control, to optimize the execution plan without touching the internal
> > logic, i.e. features for advanced use cases and i.e. don't use it if you
> > don't understand it.
> > 3. Before the system is smart enough to take over(where we are now,
> > fortunately and unfortunately :-))), there should be a way for users to
> do
> > such tuning, even if it is a temporary phase from a
> > long-term's perspective, i.e. just because it is a temporary solution,
> does
> > not mean it is not necessary for now.
> > 4. What if users write wrong hints? Well, the code review process is
> > recommended. Someone who truly understands hints should double check it
> > before hints are merged to the master or submitted to the production env.
> > Just like a common software development process.
> >
> > Just my two cents.
> >
> > Best regards,
> > Jing
> >
> > On Thu, Sep 7, 2023 at 10:02 PM Bonnie Arogyam Varghese
> >  wrote:
> >
> > > Hi Liu,
> > >  The default will be set to enabled which is the current behavior. The
> > > option will allow users/platform providers to disable it if they want
> to.
> > >
> > > On Wed, Sep 6, 2023 at 6:39 PM liu ron  wrote:
> > >
> > > > Hi, Boonie
> > > >
> > > > I'm with Jark on why disable hint is needed if it won't affect
> > security.
> > > If
> > > > users don't need to use hint, then they won't care about it and I
> don't
> > > > think it's going to be a nuisance. On top of that, Lookup Join Hint
> is
> > > very
> > > > useful for streaming jobs, and disabling the hint would result in
> users
> > > not
> > > > being able to use it.
> > > >
> > > > Best,
> > > > Ron
> > > >
> > > > Bonnie Arogyam Varghese 
> 于2023年9月6日周三
> > > > 23:52写道:
> > > >
> > > > > Hi Liu Ron,
> > > > >  To answer your question,
> > > > >Security might not be the main reason for disabling this option
> > but
> > > > > other arguments brought forward by Timo. Let me know if you have
> any
> > > > > further questions or concerns.
> > > > >
> > > > > On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese <
> > > > > bvargh...@confluent.io> wrote:
> > > > >
> > > > > > It looks like it will be nice to have a config to disable hints.
> > Any
> > > > > other
> > > > > > thoughts/concerns before we can close this discussion?
> > > > > >
> > > > > > On Fri, Aug 18, 2023 at 7:43 AM Timo Walther  >
> > > > wrote:
> > > > > >
> > > > > >>  > lots of the streaming SQL syntax are extensions of SQL
> standard
> > > > > >>
> > > > > >> That is true. But hints are kind of a special case because they
> > are
> > > > not
> > > > > >> even "part of Flink SQL" that's why they are written in a
> comment
> > > > > syntax.
> > > > > >>
> > > > > >> Anyway, I feel hints could be sometimes confusing for users
> > because
> > > > most
> > > > > >> of them have no effect for streaming and long-term we could also
> > set
> > > > > >> some hints via the CompiledPlan. And if you have multiple teams,
> > > > > >> non-skilled users should not play around with hints and leave

[jira] [Created] (FLINK-32691) SELECT fcn does not work with an unset catalog or database

2023-07-26 Thread Jim Hughes (Jira)
Jim Hughes created FLINK-32691:
--

 Summary: SELECT fcn does not work with an unset catalog or database
 Key: FLINK-32691
 URL: https://issues.apache.org/jira/browse/FLINK-32691
 Project: Flink
  Issue Type: Bug
Reporter: Jim Hughes
 Fix For: 1.18.0


Relative to https://issues.apache.org/jira/browse/FLINK-32584, function lookup 
fails without the catalog and database set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread Jim Hughes
Hi Yuxia,

On Mon, Apr 10, 2023 at 10:35 AM yuxia  wrote:

> Hi, Jim.
>
> 1: I'm expecting all DynamicTableSinks to support it. But it's hard to
> support them all in one shot. For the DynamicTableSinks that haven't implemented
> the SupportsTruncate interface, we'll throw an exception
> like 'The truncate statement for the table is not supported as it hasn't
> implemented the interface SupportsTruncate'. Also, some sinks that
> don't support deleting data can also implement it but throw a more
> concrete exception like "xxx doesn't support truncating a table as delete
> is impossible for xxx". It depends on the external connector's
> implementation.
> Thanks for your advice, I updated it to the FLIP.
>

Makes sense.
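
As I read it, the connector-side contract described above would look roughly like
the sketch below (interface and method names follow the FLIP proposal and may
change; the sink itself is hypothetical):

```java
import org.apache.flink.table.connector.sink.DynamicTableSink;
import org.apache.flink.table.connector.sink.abilities.SupportsTruncate;

// A sink opts into TRUNCATE TABLE by implementing the ability interface;
// sinks that cannot delete data can throw a concrete exception instead.
public abstract class ExampleTruncatableSink implements DynamicTableSink, SupportsTruncate {

    @Override
    public void executeTruncation() {
        // Connector-specific logic for removing all rows from the backing storage
        // would go here; this example simply rejects the operation.
        throw new UnsupportedOperationException(
                "Example sink does not support deleting data, so TRUNCATE TABLE is rejected.");
    }
}
```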


> 2: What do you mean by saying "truncate an input to a streaming query"?
> This FLIP is aimed at supporting the TRUNCATE TABLE statement, which is for
> truncating a table. In which case will it interoperate with streaming queries?
>

Let's take a source like Kafka as an example.  Suppose I have an input
topic Foo, and query which uses it as an input.

When Foo is truncated, if the truncation works as a delete and create, then
the connector may need to be made aware (otherwise it may try to use
offsets from the previous topic).  On the other hand, one may have to ask
Kafka to delete records up to a certain point.

Also, savepoints for the query may contain information from the truncated
table.  Should this FLIP involve invalidating that information in some
manner?  Or does truncating a source table for a query cause undefined
behavior on that query?

Basically, I'm trying to think through the implications of a truncate
operation for streaming sources and queries.

Cheers,

Jim


> Best regards,
> Yuxia
>
> - 原始邮件 -
> 发件人: "Jim Hughes" 
> 收件人: "dev" 
> 发送时间: 星期一, 2023年 4 月 10日 下午 9:32:28
> 主题: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi Yuxia,
>
> Two questions:
>
> 1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
> could use some explanation for what supporting and not supporting the
> operation means.
>
> 2.  How will truncate interoperate with streaming queries?  That is, if I
> truncate an input to a streaming query, is there any defined behavior?
>
> Cheers,
>
> Jim
>
> On Wed, Mar 22, 2023 at 9:13 AM yuxia  wrote:
>
> > Hi, devs.
> >
> > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > statement [1].
> >
> > The TRUNCATE TABLE statement is a SQL command that allows users to quickly
> > and efficiently delete all rows from a table without dropping the table
> > itself. This statement is commonly used in data warehouses, where large data
> > sets are frequently loaded and unloaded from tables.
> > So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly,
> > this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with
> > which the corresponding connectors can implement their own logic for
> > truncating tables.
> >
> > Looking forwards to your feedback.
> >
> > [1]: [
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > |
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > ]
> >
> >
> > Best regards,
> > Yuxia
> >
>


Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

2023-04-10 Thread Jim Hughes
Hi Yuxia,

Two questions:

1.  Are you expecting all DynamicTableSinks to support Truncate?  The FLIP
could use some explanation for what supporting and not supporting the
operation means.

2.  How will truncate interoperate with streaming queries?  That is, if I
truncate an input to a streaming query, is there any defined behavior?

Cheers,

Jim

On Wed, Mar 22, 2023 at 9:13 AM yuxia  wrote:

> Hi, devs.
>
> I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> statement [1].
>
> The TRUNCATE TABLE statement is a SQL command that allows users to quickly
> and efficiently delete all rows from a table without dropping the table
> itself. This statement is commonly used in data warehouses, where large data
> sets are frequently loaded and unloaded from tables.
> So, this FLIP is meant to support the TRUNCATE TABLE statement. More exactly,
> this FLIP will bring Flink the TRUNCATE TABLE syntax and an interface with
> which the corresponding connectors can implement their own logic for
> truncating tables.
>
> Looking forwards to your feedback.
>
> [1]: [
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> |
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> ]
>
>
> Best regards,
> Yuxia
>


Re: [DISCUSS] Add support for Apache Arrow format

2023-03-30 Thread Jim Hughes
Hi all,

How do Flink formats relate to or interact with Paimon (formerly
Flink-Table-Store)?  If the Flink format interface is used there, then it
may be useful to consider Arrow along with other columnar formats.

Separately, from previous experience, I've seen the Arrow format be useful
as an output format for clients to read efficiently.  Arrow does support
returning batches of records, so there may be some options to use the
format in a streaming situation where a sufficient collection of records
can be gathered.
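
To make that concrete, here is a tiny sketch with the Arrow Java API (nothing
Flink-specific, and the schema and batch sizes are made up; it just illustrates
that record batches map naturally onto a "gather then flush" pattern):

```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;

import java.io.ByteArrayOutputStream;
import java.util.List;

public class ArrowBatchSketch {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema(List.of(
                new Field("id", FieldType.nullable(new ArrowType.Int(32, true)), null)));

        try (RootAllocator allocator = new RootAllocator();
                VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, out)) {
                writer.start();
                // Each iteration flushes one record batch; in a streaming sink this
                // would happen once enough records have been gathered.
                for (int batch = 0; batch < 3; batch++) {
                    IntVector id = (IntVector) root.getVector("id");
                    id.allocateNew(2);
                    id.setSafe(0, batch);
                    id.setSafe(1, batch + 100);
                    root.setRowCount(2);
                    writer.writeBatch();
                }
                writer.end();
            }
            System.out.println("bytes written: " + out.size());
        }
    }
}
```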

Cheers,

Jim



On Thu, Mar 30, 2023 at 8:32 AM Martijn Visser 
wrote:

> Hi,
>
> To be honest, I haven't seen that much demand for supporting the Arrow
> format directly in Flink as a flink-format. I'm wondering if there's really
> much benefit for the Flink project to add another file format, over
> properly supporting the format that we already have in the project.
>
> Best regards,
>
> Martijn
>
> On Thu, Mar 30, 2023 at 2:21 PM Ran Tao  wrote:
>
> > It is a good point that flink integrates apache arrow as a format.
> > Arrow can take advantage of SIMD-specific or vectorized optimizations,
> > which should be of great benefit to batch tasks.
> > However, as mentioned in the issue you listed, it may take a lot of work
> > and the community's consideration for integrating Arrow.
> >
> > I think you can try to make a simple poc for verification and some
> specific
> > plans.
> >
> >
> > Best Regards,
> > Ran Tao
> >
> >
> > Aitozi  于2023年3月29日周三 19:12写道:
> >
> > > Hi guys
> > >  I'm opening this thread to discuss supporting the Apache Arrow
> > format
> > > in Flink.
> > >  Arrow is a language-independent columnar memory format that has
> > become
> > > widely used in different systems, and It can also serve as an
> > > inter-exchange format between other systems.
> > > So, using it directly in the Flink system will be nice. We also
> received
> > > some requests from slack[1][2] and jira[3].
> > >  In our company's internal usage, we have used flink-python
> module's
> > > ArrowReader and ArrowWriter to support this work. But it still can not
> > > integrate with the current flink-formats framework closely.
> > > So, I'd like to introduce the flink-arrow formats module to support the
> > > arrow format naturally.
> > >  Looking forward to some suggestions.
> > >
> > >
> > > Best,
> > > Aitozi
> > >
> > >
> > > [1]:
> > https://apache-flink.slack.com/archives/C03GV7L3G2C/p1677915016551629
> > >
> > > [2]:
> > https://apache-flink.slack.com/archives/C03GV7L3G2C/p1666326443152789
> > >
> > > [3]: https://issues.apache.org/jira/browse/FLINK-10929
> > >
> >
>


Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

2022-11-30 Thread Jim Hughes
Hi Yu,

Thanks for moving my comments to this thread!  Also, thank you for
answering my questions; it is helping me understand the SQL Gateway
better.

5.
> Our idea is to introduce a new session option (like
> 'sql-client.result.fetch-interval') to control
> how frequently fetch requests are sent. What do you think?

Should this configuration be mentioned in the FLIP?

One slight concern I have with having 'sql-client.result.fetch-interval' as
a session configuration is that users could set it low and cause the client
to send a large volume of requests to the SQL gateway.

Generally, I'd like to see some way for the server to be able to limit the
number of requests it receives.  If that really needs to be done by a proxy
in front of the SQL gateway, that is fine as well.  (To be clear, I don't
think my concern here should be blocking in any way.)
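
To make the concern concrete, the client-side fetch loop amounts to something
like the sketch below; the ResultFetcher type and the option key are
placeholders rather than actual Flink classes, and the point is simply that
the sleep interval is the only thing throttling the request rate:

```java
import java.time.Duration;

public class FetchLoopSketch {

    /** Placeholder for whatever client-side type ends up issuing fetch requests. */
    interface ResultFetcher {
        /** Returns true while more result batches are available from the gateway. */
        boolean fetchNextBatch();
    }

    static void pollResults(ResultFetcher fetcher, Duration fetchInterval)
            throws InterruptedException {
        // The interval would come from the proposed session option, e.g.
        // 'sql-client.result.fetch-interval'. A very small value here is the
        // only thing standing between the client and a flood of requests
        // against the gateway.
        while (fetcher.fetchNextBatch()) {
            Thread.sleep(fetchInterval.toMillis());
        }
    }
}
```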

7.
> What is the serialization lifecycle for results?

I wonder if two other options are possible:
3) Could the Gateway just forward the result byte array?  (Or does the
Gateway need to deserialize the response in order to understand it for some
reason?)
4) Could the JobManager prepare the results in JSON?  (Or similarly could
the Client read the format which the JobManager sends?)

Thanks again!

Cheers,

Jim

On Wed, Nov 30, 2022 at 9:40 AM yu zelin  wrote:

> Hi, all
>
> Thanks for Jim’s questions below. I’d like to reply to them here.
>
> >   1. For the Client Parser, is it going to work with the extended syntax
> >   from the Flink Table Store?
> >
> >   2. Relatedly, what will happen if an older Client tries to handle
> syntax
> >   that a newer service supports?  (Suppose I use a 1.17 client with a
> 1.18
> >   Gateway/system which has a new keyword.  Is there anything we should be
> >   designing for upfront?)
> >
> >   3. How will client and server version mismatches be handled?  Will a
> >   single gateway be able to support multiple endpoint versions?
> >   4. How are commands which change a session handled?  Are those sent via
> >   an ExecuteStatementRequest?
> >
> >   5. The remote POC uses polling for getting back status and getting back
> >   results.  Would it be possible to switch to web sockets or some other
> >   mechanism to avoid polling?  If polling is used for both, the polling
> >   frequency should be different between local and remote configurations.
> >
> >   6. What does this sentence mean?  "The reason why we didn't get the sql
> >   type in client side is because it's hard for the lightweight
> client-level
> >   parser to recognize some sql type  sql, such as query with CTE.  "
> >
> >   7. What is the serialization lifecycle for results?  It makes sense to
> >   have some control over whether the gateway returns results as SQL or
> JSON.
> >   I'd love to see a way to avoid needing to serialize and deserialize
> results
> >   on the SQL Gateway if possible.  I'm still new enough to the project
> that
> >   I'm not sure if that's readily possible.  Maybe the SQL Gateway's
> return
> >   type can be sent as part of the request so that the JobManager can send
> >   back results in an advantageous format?
> >
> >   8. Does ErrorType need to be marked as @PublicEvolving?
> >
> > I'm excited for the SQL client to support gateway mode!  Given the change
> > in design, do you think it'll still be part of the Flink 1.17 release?
>
> 1. ClientParser can work with new (and unknown) SQL syntax. This is because
> if the SQL type is not recognized, the SQL will be submitted to the gateway
> directly.
>
> For more information: actually, the proposed ClientParser only does two
> things:
> (1) It tells client commands (help, clear, etc.) and SQL statements apart.
> (2) It parses several SQL types (e.g. for a SHOW CREATE statement, we can
> print the raw string of the result instead of a table). Here the recognition
> of SQL types mostly affects the print style, and unrecognized SQL can still
> be submitted to the cluster.
> So a Client with the new ClientParser remains compatible with new syntax.
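>
> To make this concrete, the dispatch is roughly like the sketch below (the
> names are made up for illustration, not the actual implementation):
>
> ```java
> import java.util.Locale;
>
> /** Rough sketch of the client-side dispatch described above. */
> public class ClientParserSketch {
>
>     enum Kind { CLIENT_COMMAND, SHOW_CREATE, OTHER_SQL }
>
>     static Kind classify(String statement) {
>         String s = statement.trim().toUpperCase(Locale.ROOT);
>         // (1) Tell client commands and SQL statements apart.
>         if (s.equals("HELP") || s.equals("CLEAR") || s.equals("QUIT")) {
>             return Kind.CLIENT_COMMAND;
>         }
>         // (2) Recognize a few SQL types that only change the print style.
>         if (s.startsWith("SHOW CREATE")) {
>             return Kind.SHOW_CREATE;
>         }
>         // Everything else, including unknown syntax, goes to the gateway as-is.
>         return Kind.OTHER_SQL;
>     }
> }
> ```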
>
> 2. First, I'd like to explain that the gateway APIs and the supported syntax
> are two different things.
> For example, 'configureSession' and 'completeStatement' are APIs. As
> mentioned in #1, SQL statements whose syntax is unknown will be submitted to
> the gateway, and whether they can be executed normally depends on whether
> the execution environment supports the syntax.
>
> > Is there anything we should be designing for upfront?
>
> The 'SqlGatewayRestAPIVersion' has been introduced, but it applies to the
> SQL Gateway APIs.
>
> 3.
> > How will client and server version mismatches be handled?
>
> A lower-version client can work with a higher-version gateway because the
> old interfaces won’t be deleted. When a higher-version client connects to
> a lower-version gateway, the client should notify the users if they try to
> use unsupported features. For example, the client start option ‘-i’ means
> using an initialization file to initialize the session.
> We plan to use 

Re: SQL Gateway and SQL Client

2022-11-28 Thread Jim Hughes
Hi Shengkai, Yu,

Thanks for the FLIP!  I have had a chance to read it, and it looks good.  I
do have some questions:

I do like the idea of unifying the approaches so that the code doesn't get
out of step.

   1. For the Client Parser, is it going to work with the extended syntax
   from the Flink Table Store?

   2. Relatedly, what will happen if an older Client tries to handle syntax
   that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
   Gateway/system which has a new keyword.  Is there anything we should be
   designing for upfront?)

   3. How will client and server version mismatches be handled?  Will a
   single gateway be able to support multiple endpoint versions?
   4. How are commands which change a session handled?  Are those sent via
   an ExecuteStatementRequest?

   5. The remote POC uses polling for getting back status and getting back
   results.  Would it be possible to switch to web sockets or some other
   mechanism to avoid polling?  If polling is used for both, the polling
   frequency should be different between local and remote configurations.

   6. What does this sentence mean?  "The reason why we didn't get the sql
   type in client side is because it's hard for the lightweight client-level
   parser to recognize some sql type  sql, such as query with CTE.  "

   7. What is the serialization lifecycle for results?  It makes sense to
   have some control over whether the gateway returns results as SQL or JSON.
   I'd love to see a way to avoid needing to serialize and deserialize results
   on the SQL Gateway if possible.  I'm still new enough to the project that
   I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
   type can be sent as part of the request so that the JobManager can send
   back results in an advantageous format?

   8. Does ErrorType need to be marked as @PublicEvolving?

I'm excited for the SQL client to support gateway mode!  Given the change
in design, do you think it'll still be part of the Flink 1.17 release?

Cheers,

Jim

On Sun, Nov 27, 2022 at 8:54 PM Shengkai Fang  wrote:

> Hi, Jim and Alexey.
>
> We have written the proposal[1]. It would be appreciated if you can give us
> some feedback.
>
> Best,
> Shengkai
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-275%3A+Support+Remote+SQL+Client+Based+on+SQL+Gateway
>
> yu zelin  于2022年11月24日周四 12:06写道:
>
> > Hi Jim,
> > Sorry for the incorrect message in my last reply.
> > > Shengkai will help to your PR
> > I meant Shengkai will help to review the PRs. And I will add some of your
> > suggestions to my design.
> >
> > I think the POC code will be cherry-picked to my new design of SQL
> Client.
> > > 2022年11月23日 12:18,yu zelin  写道:
> > >
> > > Hi Jim,
> > > Sorry for the late response. Just another busy week :)
> > > Last week, I discussed my design within my team. My teammates think
> > > it’s better to unify the local and remote modes, so I’ve investigated
> > > and redesigned the plan. I’ll inform you after the rewrite of the FLIP
> > > is finished (it will be soon), and Shengkai will help to review your PR.
> > >
> > > Best,
> > >
> > > Yu Zelin
> > >
> > >> 2022年11月22日 02:59,Jim Hughes  写道:
> > >>
> > >> Hi Yu, Shengkai,
> > >>
> > >> As a quick update, I've had a chance to try out Yu's POC and it is
> > working
> > >> for me.  (Admittedly, I haven't tried it too extensively; I only tried
> > >> basic operations.)
> > >>
> > >> From my experiments, I did leave a few comments on
> > >> https://github.com/apache/flink/pull/20958.
> > >>
> > >> Overall, the PRs I see look pretty good.  Are they going to be merged
> > >> soon?  Anything else I can do to help?
> > >>
> > >> Cheers,
> > >>
> > >> Jim
> > >
> >
> >
>


Re: SQL Gateway and SQL Client

2022-11-21 Thread Jim Hughes
Hi Yu, Shengkai,

As a quick update, I've had a chance to try out Yu's POC and it is working
for me.  (Admittedly, I haven't tried it too extensively; I only tried
basic operations.)

From my experiments, I did leave a few comments on
https://github.com/apache/flink/pull/20958.

Overall, the PRs I see look pretty good.  Are they going to be merged
soon?  Anything else I can do to help?

Cheers,

Jim

On Mon, Nov 14, 2022 at 12:52 PM Jim Hughes  wrote:

> Hi Yu,
>
> The PR looks good to me; the only thing I noticed is that the
> RemoveJarOperation may need to be added.
>
> I asked some other questions; I don't think any of them are particularly
> large issues.  Let me know when the next PR goes up and I'll take a look.
>
> Cheers,
>
> Jim
>
> On Sun, Nov 13, 2022 at 10:45 PM yu zelin  wrote:
>
>> Hi Jim,
>>
>> It would be nice if you can take a look on
>> https://github.com/apache/flink/pull/21133 and give me some feedback.
>>
>> Best,
>> Yu Zelin
>>
>> > 2022年11月12日 00:44,Jim Hughes  写道:
>> >
>> > Hi Shengkai,
>> >
>> > I think there is an additional case where a proxy is between the client
>> and
>> > gateway.  In that case, being able to pass headers would allow for
>> > additional options / features.
>> >
>> > I see several PRs from Yu Zelin.  Is there a first one to review?
>> >
>> > Cheers,
>> >
>> > Jim
>> >
>> > On Thu, Nov 10, 2022 at 9:42 PM Shengkai Fang 
>> wrote:
>> >
>> >> Hi, Jim.
>> >>
>> >>> how to pass additional headers when sending REST requests
>> >>
>> >> Could you share what headers do you want to send when using SQL
>> Client?  I
>> >> think there are two cases we need to consider. Please correct me if I
>> am
>> >> wrong.
>> >>
>> >> # Case 1
>> >>
>> >> If users wants to connect to the SQL Gateway with its password, I
>> think the
>> >> users should type
>> >> ```
>> >> ./sql-client.sh --user xxx --password xxx
>> >> ```
>> >> in the terminal and the OpenSessionRequest should be enough.
>> >>
>> >> # Case 2
>> >>
>> >> If users  wants to modify the execution config, users should type
>> >> ```
>> >> Flink SQL> SET  `` = ``;
>> >> ```
>> >> in the terminal. The Client can send ExecuteStatementRequest to the
>> >> Gateway.
>> >>
>> >>> As you have FLIPs or PRs, feel free to let me, Jamie, and Alexey know.
>> >>
>> >> It would be nice you can join us to finish the feature. I think the
>> >> modification about the SQL Gateway side is ready to review.
>> >>
>> >> Best,
>> >> Shengkai
>> >>
>> >>
>> >> Jim Hughes  于2022年11月11日周五 05:19写道:
>> >>
>> >>> Hi Yu Zelin,
>> >>>
>> >>> I have read through your draft and it looks good.  I am new to Flink,
>> so
>> >> I
>> >>> haven't learned about everything which needs to be done yet.
>> >>>
>> >>> One of the considerations that I'm interested in understanding is how
>> to
>> >>> pass additional headers when sending REST requests.  From looking at
>> the
>> >>> code, it looks like a custom `OutboundChannelHandlerFactory` could be
>> >>> created to read additional configuration and set headers.  Does that
>> make
>> >>> sense?
>> >>>
>> >>> Thank you very much for sharing the proof of concept code and the
>> >>> document.  As you have FLIPs or PRs, feel free to let me, Jamie, and
>> >> Alexey
>> >>> know.  We'll be happy to review them.
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Jim
>> >>>
>> >>> On Wed, Nov 9, 2022 at 11:43 PM yu zelin 
>> wrote:
>> >>>
>> >>>> Hi, all
>> >>>> Sorry for late response. As Shengkai mentioned, Currently I’m working
>> >>> with
>> >>>> him on SQL Client, dedicating to implement the Remote Mode of SQL
>> >>> Client. I
>> >>>> have written a draft of implementation plan and will write a FLIP
>> about
>> >>> it
>> >>>> ASAP. If you are interested in, please take a look at the draft and
>> >> it’s
>> &

Re: SQL Gateway and SQL Client

2022-11-14 Thread Jim Hughes
Hi Shengkai,

Thanks!  Can you say more about how the RestClient sets the headers?  I
looked at the code, and I only see headers being set in the RestClient
here[1].

I'm hoping to set arbitrary headers from configuration.  I'm not seeing a
way to do that presently.

[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/rest/RestClient.java#L410-L425

Cheers,

Jim

On Sun, Nov 13, 2022 at 10:42 PM Shengkai Fang  wrote:

> Hi Jim,
>
> Thanks for your input. You can look here[1] on the server side. You can
> modify the return type of the handleRequest to Tuple2 to
> get session-specific headers. RestClient has already exposed a method to
> send requests with headers for the client side.
>
> [1]
>
> https://github.com/apache/flink/blob/master/flink-table/flink-sql-gateway/src/main/java/org/apache/flink/table/gateway/rest/handler/AbstractSqlGatewayRestHandler.java#L98
>
> Jim Hughes  于2022年11月12日周六 00:45写道:
>
> > Hi Shengkai,
> >
> > I think there is an additional case where a proxy is between the client
> and
> > gateway.  In that case, being able to pass headers would allow for
> > additional options / features.
> >
> > I see several PRs from Yu Zelin.  Is there a first one to review?
> >
> > Cheers,
> >
> > Jim
> >
> > On Thu, Nov 10, 2022 at 9:42 PM Shengkai Fang  wrote:
> >
> > > Hi, Jim.
> > >
> > > > how to pass additional headers when sending REST requests
> > >
> > > Could you share what headers do you want to send when using SQL Client?
> > I
> > > think there are two cases we need to consider. Please correct me if I
> am
> > > wrong.
> > >
> > > # Case 1
> > >
> > > If users wants to connect to the SQL Gateway with its password, I think
> > the
> > > users should type
> > > ```
> > > ./sql-client.sh --user xxx --password xxx
> > > ```
> > > in the terminal and the OpenSessionRequest should be enough.
> > >
> > > # Case 2
> > >
> > > If users  wants to modify the execution config, users should type
> > > ```
> > > Flink SQL> SET  `` = ``;
> > > ```
> > > in the terminal. The Client can send ExecuteStatementRequest to the
> > > Gateway.
> > >
> > > > As you have FLIPs or PRs, feel free to let me, Jamie, and Alexey
> know.
> > >
> > > It would be nice you can join us to finish the feature. I think the
> > > modification about the SQL Gateway side is ready to review.
> > >
> > > Best,
> > > Shengkai
> > >
> > >
> > > Jim Hughes  于2022年11月11日周五 05:19写道:
> > >
> > > > Hi Yu Zelin,
> > > >
> > > > I have read through your draft and it looks good.  I am new to Flink,
> > so
> > > I
> > > > haven't learned about everything which needs to be done yet.
> > > >
> > > > One of the considerations that I'm interested in understanding is how
> > to
> > > > pass additional headers when sending REST requests.  From looking at
> > the
> > > > code, it looks like a custom `OutboundChannelHandlerFactory` could be
> > > > created to read additional configuration and set headers.  Does that
> > make
> > > > sense?
> > > >
> > > > Thank you very much for sharing the proof of concept code and the
> > > > document.  As you have FLIPs or PRs, feel free to let me, Jamie, and
> > > Alexey
> > > > know.  We'll be happy to review them.
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > > On Wed, Nov 9, 2022 at 11:43 PM yu zelin 
> > wrote:
> > > >
> > > > > Hi, all
> > > > > Sorry for late response. As Shengkai mentioned, Currently I’m
> working
> > > > with
> > > > > him on SQL Client, dedicating to implement the Remote Mode of SQL
> > > > Client. I
> > > > > have written a draft of implementation plan and will write a FLIP
> > about
> > > > it
> > > > > ASAP. If you are interested in, please take a look at the draft and
> > > it’s
> > > > > nice if you give me some feedback.
> > > > > The doc is at:
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/14cS4VBSamMUnlM_PZuK6QKLfriUuQU51iqET5oiYy_c/edit?usp=sharing
> > > > >
> > > > > > 2022年11月7日 11:19,Sh

Re: SQL Gateway and SQL Client

2022-11-14 Thread Jim Hughes
Hi Yu,

The PR looks good to me; the only thing I noticed is that the
RemoveJarOperation may need to be added.

I asked some other questions; I don't think any of them are particularly
large issues.  Let me know when the next PR goes up and I'll take a look.

Cheers,

Jim

On Sun, Nov 13, 2022 at 10:45 PM yu zelin  wrote:

> Hi Jim,
>
> It would be nice if you can take a look on
> https://github.com/apache/flink/pull/21133 and give me some feedback.
>
> Best,
> Yu Zelin
>
> > 2022年11月12日 00:44,Jim Hughes  写道:
> >
> > Hi Shengkai,
> >
> > I think there is an additional case where a proxy is between the client
> and
> > gateway.  In that case, being able to pass headers would allow for
> > additional options / features.
> >
> > I see several PRs from Yu Zelin.  Is there a first one to review?
> >
> > Cheers,
> >
> > Jim
> >
> > On Thu, Nov 10, 2022 at 9:42 PM Shengkai Fang  wrote:
> >
> >> Hi, Jim.
> >>
> >>> how to pass additional headers when sending REST requests
> >>
> >> Could you share what headers do you want to send when using SQL
> Client?  I
> >> think there are two cases we need to consider. Please correct me if I am
> >> wrong.
> >>
> >> # Case 1
> >>
> >> If users wants to connect to the SQL Gateway with its password, I think
> the
> >> users should type
> >> ```
> >> ./sql-client.sh --user xxx --password xxx
> >> ```
> >> in the terminal and the OpenSessionRequest should be enough.
> >>
> >> # Case 2
> >>
> >> If users  wants to modify the execution config, users should type
> >> ```
> >> Flink SQL> SET  `` = ``;
> >> ```
> >> in the terminal. The Client can send ExecuteStatementRequest to the
> >> Gateway.
> >>
> >>> As you have FLIPs or PRs, feel free to let me, Jamie, and Alexey know.
> >>
> >> It would be nice you can join us to finish the feature. I think the
> >> modification about the SQL Gateway side is ready to review.
> >>
> >> Best,
> >> Shengkai
> >>
> >>
> >> Jim Hughes  于2022年11月11日周五 05:19写道:
> >>
> >>> Hi Yu Zelin,
> >>>
> >>> I have read through your draft and it looks good.  I am new to Flink,
> so
> >> I
> >>> haven't learned about everything which needs to be done yet.
> >>>
> >>> One of the considerations that I'm interested in understanding is how
> to
> >>> pass additional headers when sending REST requests.  From looking at
> the
> >>> code, it looks like a custom `OutboundChannelHandlerFactory` could be
> >>> created to read additional configuration and set headers.  Does that
> make
> >>> sense?
> >>>
> >>> Thank you very much for sharing the proof of concept code and the
> >>> document.  As you have FLIPs or PRs, feel free to let me, Jamie, and
> >> Alexey
> >>> know.  We'll be happy to review them.
> >>>
> >>> Cheers,
> >>>
> >>> Jim
> >>>
> >>> On Wed, Nov 9, 2022 at 11:43 PM yu zelin 
> wrote:
> >>>
> >>>> Hi, all
> >>>> Sorry for late response. As Shengkai mentioned, Currently I’m working
> >>> with
> >>>> him on SQL Client, dedicating to implement the Remote Mode of SQL
> >>> Client. I
> >>>> have written a draft of implementation plan and will write a FLIP
> about
> >>> it
> >>>> ASAP. If you are interested in, please take a look at the draft and
> >> it’s
> >>>> nice if you give me some feedback.
> >>>> The doc is at:
> >>>>
> >>>
> >>
> https://docs.google.com/document/d/14cS4VBSamMUnlM_PZuK6QKLfriUuQU51iqET5oiYy_c/edit?usp=sharing
> >>>>
> >>>>> 2022年11月7日 11:19,Shengkai Fang  写道:
> >>>>>
> >>>>> Hi, all. Sorry for the late reply.
> >>>>>
> >>>>>> Is the gateway mode planned to be supported for SQL Client in 1.17?
> >>>>>> Do you have anything you can already share so we can start with
> >> your
> >>>> work or just play around with it.
> >>>>>
> >>>>> Yes. @yzl is working on it and he will list the implementation plan
> >>>> later and share the progress. I think the change is not very large and
> >> I
> &g

Re: SQL Gateway and SQL Client

2022-11-11 Thread Jim Hughes
Hi Shengkai,

I think there is an additional case where a proxy is between the client and
gateway.  In that case, being able to pass headers would allow for
additional options / features.

I see several PRs from Yu Zelin.  Is there a first one to review?

Cheers,

Jim

On Thu, Nov 10, 2022 at 9:42 PM Shengkai Fang  wrote:

> Hi, Jim.
>
> > how to pass additional headers when sending REST requests
>
> Could you share what headers do you want to send when using SQL Client?  I
> think there are two cases we need to consider. Please correct me if I am
> wrong.
>
> # Case 1
>
> If users want to connect to the SQL Gateway with a password, I think the
> users should type
> ```
> ./sql-client.sh --user xxx --password xxx
> ```
> in the terminal and the OpenSessionRequest should be enough.
>
> # Case 2
>
> If users want to modify the execution config, they should type
> ```
> Flink SQL> SET  `` = ``;
> ```
> in the terminal. The Client can send ExecuteStatementRequest to the
> Gateway.
>
> > As you have FLIPs or PRs, feel free to let me, Jamie, and Alexey know.
>
> It would be nice if you could join us to finish the feature. I think the
> modification about the SQL Gateway side is ready to review.
>
> Best,
> Shengkai
>
>
> Jim Hughes  于2022年11月11日周五 05:19写道:
>
> > Hi Yu Zelin,
> >
> > I have read through your draft and it looks good.  I am new to Flink, so
> I
> > haven't learned about everything which needs to be done yet.
> >
> > One of the considerations that I'm interested in understanding is how to
> > pass additional headers when sending REST requests.  From looking at the
> > code, it looks like a custom `OutboundChannelHandlerFactory` could be
> > created to read additional configuration and set headers.  Does that make
> > sense?
> >
> > Thank you very much for sharing the proof of concept code and the
> > document.  As you have FLIPs or PRs, feel free to let me, Jamie, and
> Alexey
> > know.  We'll be happy to review them.
> >
> > Cheers,
> >
> > Jim
> >
> > On Wed, Nov 9, 2022 at 11:43 PM yu zelin  wrote:
> >
> > > Hi, all
> > > Sorry for late response. As Shengkai mentioned, Currently I’m working
> > with
> > > him on SQL Client, dedicating to implement the Remote Mode of SQL
> > Client. I
> > > have written a draft of implementation plan and will write a FLIP about
> > it
> > > ASAP. If you are interested in, please take a look at the draft and
> it’s
> > > nice if you give me some feedback.
> > > The doc is at:
> > >
> >
> https://docs.google.com/document/d/14cS4VBSamMUnlM_PZuK6QKLfriUuQU51iqET5oiYy_c/edit?usp=sharing
> > >
> > > > 2022年11月7日 11:19,Shengkai Fang  写道:
> > > >
> > > > Hi, all. Sorry for the late reply.
> > > >
> > > > > Is the gateway mode planned to be supported for SQL Client in 1.17?
> > > > > Do you have anything you can already share so we can start with
> your
> > > work or just play around with it.
> > > >
> > > > Yes. @yzl is working on it and he will list the implementation plan
> > > later and share the progress. I think the change is not very large and
> I
> > > think it's not a big problem to finish this in the release-1.17. I will
> > > join to develop this in the mid of November.
> > > >
> > > > Best,
> > > > Shengkai
> > > >
> > > >
> > > >
> > > >
> > > > Jamie Grier <jgr...@apache.org> 于2022年11月5日周六 00:48写道:
> > > >> Hi Shengkai,
> > > >>
> > > >> We're doing more and more Flink development at Confluent these days
> > and
> > > we're currently trying to bootstrap a prototype that relies on the SQL
> > > Client and Gateway.  We will be using the components in some of our
> > > projects and would like to co-develop them with you and the rest of the
> > > Flink community.
> > > >>
> > > >> As of right now it's a pretty big blocker for our upcoming milestone
> > > that the SQL Client has not yet been modified to talk to the SQL
> Gateway
> > > and we want to help with this effort ASAP!  We would be even willing to
> > > take over the work if it's not yet started but I suspect it already is.
> > > >>
> > > >> Anyway, rather than start working immediately on the SQL Client and
> > > adding the new Gateway mode ourselves we wanted to start a
> conve

Re: SQL Gateway and SQL Client

2022-11-10 Thread Jim Hughes
Hi Yu Zelin,

I have read through your draft and it looks good.  I am new to Flink, so I
haven't learned about everything which needs to be done yet.

One of the considerations that I'm interested in understanding is how to
pass additional headers when sending REST requests.  From looking at the
code, it looks like a custom `OutboundChannelHandlerFactory` could be
created to read additional configuration and set headers.  Does that make
sense?
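
For what it's worth, here is a rough sketch of what such a handler might look
like. I'm showing plain Netty types for readability (Flink relocates Netty
under org.apache.flink.shaded.netty4), and the header name/value would
presumably come from whatever configuration keys the factory reads; the
factory wiring and those keys are assumptions on my part:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import io.netty.handler.codec.http.HttpRequest;

/** Sketch: adds a configured header to every outgoing REST request. */
public class HeaderInjectingHandler extends ChannelOutboundHandlerAdapter {

    private final String headerName;
    private final String headerValue;

    public HeaderInjectingHandler(String headerName, String headerValue) {
        this.headerName = headerName;
        this.headerValue = headerValue;
    }

    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
            throws Exception {
        if (msg instanceof HttpRequest) {
            // e.g. an Authorization header read from client configuration.
            ((HttpRequest) msg).headers().set(headerName, headerValue);
        }
        super.write(ctx, msg, promise);
    }
}
```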

Thank you very much for sharing the proof of concept code and the
document.  As you have FLIPs or PRs, feel free to let me, Jamie, and Alexey
know.  We'll be happy to review them.

Cheers,

Jim

On Wed, Nov 9, 2022 at 11:43 PM yu zelin  wrote:

> Hi, all
> Sorry for the late response. As Shengkai mentioned, I’m currently working
> with him on the SQL Client, working to implement the Remote Mode of the SQL
> Client. I have written a draft of the implementation plan and will write a
> FLIP about it ASAP. If you are interested, please take a look at the draft;
> it would be nice if you could give me some feedback.
> The doc is at:
> https://docs.google.com/document/d/14cS4VBSamMUnlM_PZuK6QKLfriUuQU51iqET5oiYy_c/edit?usp=sharing
>
> > 2022年11月7日 11:19,Shengkai Fang  写道:
> >
> > Hi, all. Sorry for the late reply.
> >
> > > Is the gateway mode planned to be supported for SQL Client in 1.17?
> > > Do you have anything you can already share so we can start with your
> work or just play around with it.
> >
> > Yes. @yzl is working on it and he will list the implementation plan
> later and share the progress. I think the change is not very large and I
> think it's not a big problem to finish this in the release-1.17. I will
> join to develop this in the mid of November.
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> > Jamie Grier <jgr...@apache.org> 于2022年11月5日周六 00:48写道:
> >> Hi Shengkai,
> >>
> >> We're doing more and more Flink development at Confluent these days and
> we're currently trying to bootstrap a prototype that relies on the SQL
> Client and Gateway.  We will be using the components in some of our
> projects and would like to co-develop them with you and the rest of the
> Flink community.
> >>
> >> As of right now it's a pretty big blocker for our upcoming milestone
> that the SQL Client has not yet been modified to talk to the SQL Gateway
> and we want to help with this effort ASAP!  We would be even willing to
> take over the work if it's not yet started but I suspect it already is.
> >>
> >> Anyway, rather than start working immediately on the SQL Client and
> adding the new Gateway mode ourselves we wanted to start a conversation
> with you and see where you're at with things and offer to help.
> >>
> >> Do you have anything you can already share so we can start with your
> work or just play around with it.  Like I said, we just want to get started
> and are very able to help in this area.  We see both the SQL Client and
> Gateway being very important for us and have a good team to help develop it.
> >>
> >> Let me know if there is a branch you can share, etc.  It would be much
> appreciated!
> >>
> >> -Jamie Grier
> >>
> >>
> >> On 2022/10/28 06:06:49 Shengkai Fang wrote:
> >> > Hi.
> >> >
> >> > > Is there a possibility for us to get engaged and at least introduce
> >> > initial changes to support authentication/authorization?
> >> >
> >> > Yes. You can write a FLIP about the design and change. We can discuss
> this
> >> > in the dev mail. If the FLIP passes, we can develop it together.
> >> >
> >> > > Another question about persistent Gateway: did you have any specific
> >> > thoughts about it or some draft design?
> >> >
> >> > We don't have any detailed plan about this. But I know Livy has a
> similar
> >> > feature.
> >> >
> >> > Best,
> >> > Shengkai
> >> >
> >> >
> >> > Alexey Leonov-Vendrovskiy <vendrov...@gmail.com> 于2022年10月27日周四 15:12写道:
> >> >
> >> > > Apologies from the delayed response on my side.
> >> > >
> >> > >  I think the authentication module is not part of our plan in 1.17
> because
> >> > >> of the busy work. I think we'll start the design at the end of the
> >> > >> release-1.17.
> >> > >
> >> > >
> >> > > Is there a possibility for us to get engaged and at least introduce
> >> > > initial changes to support authentication/authorization?
> Specifically,
> >> > > changes in the API and in SQL Client.
> >> > >
> >> > > We expect the following authentication flow:
> >> > >
> >> > > On the SQL gateway we want to be able to use a delegation token.
> >> > > SQL client should be able to supply an API key.
> >> > > The SQL Gateway *would not *be submitting jobs on behalf of the
> client.
> >> > >
> >> > > Ideally it would be nice to introduce some interfaces in the SQL
> Gateway
> >> > > that would allow implementing custom authentication and
> authorization.
> >> > >
> >> > > Another question about persistent Gateway: did you have any specific
> >> > > thoughts about it or some draft design?
> >> > >
> >> > > Thanks,
> >> > > Alexey
> >> > >
> >>