Re:Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2024-01-01 Thread Xuyang
Hi, Martijn.
Thank you for your reminder :)


My idea is that in the current 1.x version, we can automatically convert the 
agg operator in the old grammar into the agg operator
in the new grammar. Huge changes will be introduced in version 2.0 that old 
syntax will be directly deleted at the code level.
>That would imply that we will never be able to remove the old SQL
>language from the code base, since we would always rewrite that old
>language to the new implementation under the hood.
I'm a little curious why the old syntax can't be removed in the code in 2.0. If 
you have any better ideas, let’s discuss it together.




--

Best!
Xuyang





At 2023-12-27 23:20:06, "Martijn Visser"  wrote:
>Hi Xuyang,
>
>It's currently the holiday season in Europe so do expect some slow responses.
>
>> The key reason the original FLIP is called "Deprecate the Legacy Group 
>> Window Aggregation" is that we also plan to automatically rewrite the group 
>> window agg corresponding the old syntax into the window agg corresponding 
>> the new window TVF syntax (will provide a fallback option from a 
>> compatibility perspective). Whether the window agg corresponding the new 
>> syntax is actively used by user or automatically rewritten, we all rely on 
>> the alignment of the functionality between the window agg and the legacy 
>> group window agg.
>
>That would imply that we will never be able to remove the old SQL
>language from the code base, since we would always rewrite that old
>language to the new implementation under the hood. I don't think
>that's necessarily a good idea, especially given that Flink 2.0 is
>coming next year and we could make a clean break there.
>
>Best regards,
>
>Martijn
>
>On Thu, Dec 21, 2023 at 12:44 PM Xuyang  wrote:
>>
>> Hi, Timo. Sorry to bother you. There's something I really need to hear your 
>> thoughts on.
>>
>>
>>
>>
>> When I'm trying to split this flip, having reviewed this discussion and the 
>> FLIP document again, I realized that there is still a key issue that hasn't 
>> been clarified. The key reason the original FLIP is called "Deprecate the 
>> Legacy Group Window Aggregation" is that we also plan to automatically 
>> rewrite the group window agg corresponding the old syntax into the window 
>> agg corresponding the new window TVF syntax (will provide a fallback option 
>> from a compatibility perspective). Whether the window agg corresponding the 
>> new syntax is actively used by user or automatically rewritten, we all rely 
>> on the alignment of the functionality between the window agg and the legacy 
>> group window agg.
>>
>>
>>
>>
>> To explain in detail, the original flip has the following two core parts.
>>
>>
>>
>>
>> 1. Automatically rewrite the legacy group window agg into the new window agg 
>> during plan optimization. (Corresponding to Section 5 in the Proposed 
>> Changes of the original FLIP)
>>
>>
>>
>>
>> 2. The alignment subtasks that the rewriting work depends on, involve 
>> aligning the features of the two operators. (No one had objections to this 
>> part of the work, and some of them are WIP) (Corresponding to Section 1-4 in 
>> the Proposed Changes of the original FLIP)
>>
>>
>>
>>
>> Currently, there are two ways to deal with this flip.
>>
>>
>>
>>
>> 1. According to your previous suggestion, split the subtasks of the two 
>> alignment features of supporting cdc stream and supporting HOP window size 
>> with non-integer step length into independent flips. Moreover, an additional 
>> Flip should be added to describe the work of automatic plan rewriting. In 
>> the discussion email, associate these three flips together. I'm not sure 
>> that's a bit trivial about doing this because the first two flips are small 
>> features actually. (And we need to discuss it separately and vote on them 
>> individually, right?)
>>
>>
>>
>>
>> 2. Modify the original flip title and change the content to the following 
>> style to highlight the automatic plan rewriting.
>>
>>
>>
>>
>> title: FLIP-392: Automatically rewrite the old syntax's agg to the window 
>> TVF agg
>>
>> part 1: automatic plan rewriting
>>
>> part 2: alignment subtasks the rewriting work depends on
>>
>>
>>
>>
>> I'm looking forward to your reply.
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>> 在 2023-12-19 22:30:27,"Timo Walther"  写道:
>> >Hi Xuyang,
>> >
>> >sorry I missed the ping. Sounds reasonable to me. One FLIP about
>> >changelog semantics, the other about SQL semantics.
>> >
>> >Regards,
>> >Timo
>> >
>> >On 19.12.23 02:39, Xuyang wrote:
>> >> Hi, Timo. Sorry for this noise.
>> >> What do you think about splitting the flip like this?
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >>  Best!
>> >>  Xuyang
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> At 2023-12-15 10:05:32, "Xuyang"  wrote:
>> >>> Hi, Timo, thanks for your advice.
>> >>>
>> >>>
>> >>> I am considering splitting the existing flip into two while leaving the 
>> >>> existing flip (or without).
>> >>> One 

Re:Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-11 Thread Xuyang
Hi, Jim.
Thanks for your explaination.
>Ah, I mean to ask if you can contribute the new SESSION Table support
>without needing FLIP-392 completely settled.  I was trying to see if that
>is separate work which can be done or if there is some dependency on this
>FLIP.
The pr available about session window tvf belongs to this flip I think, and is 
part of the work about this flip. Actually the poc pr is not ready completely 
yet, 
I'll try to update it to implement the session window tvf in window agg 
operator instead of using legacy group window agg operator.
>The tests should not be impacted.  Depending on what order our work lands
>in, one of the tests you've added/updated would likely move to the
>RestoreTests that Bonnie and I are working on.  Just mentioning that ahead
>of time

Got it! I will pay attention to it.




--

Best!
Xuyang





在 2023-12-11 21:35:00,"Jim Hughes"  写道:
>Hi Xuyang,
>
>On Sun, Dec 10, 2023 at 10:41 PM Xuyang  wrote:
>
>> Hi, Jim.
>> >As a clarification, since FLINK-24204 is finishing up work from
>> >FLIP-145[1], do we need to discuss anything before you work out the
>> details
>> >of FLINK-24024 as a PR?
>> Which issue do you mean? It seems that FLINK-24204[1] is the issue with
>> table api type system.
>>
>
>Ah, I mean to ask if you can contribute the new SESSION Table support
>without needing FLIP-392 completely settled.  I was trying to see if that
>is separate work which can be done or if there is some dependency on this
>FLIP.
>
>
>> I've got a PR up [3] for moving at least one of the classes you are
>> touching.
>> Nice work! Since we are not going to delete the legacy group window agg
>> operator actually, the only compatibility issue
>> may be that when using flink sql, the legacy group window agg operator
>> will be rewritten into new operators. Will these tests be affected about
>> this rewritten?
>>
>
>The tests should not be impacted.  Depending on what order our work lands
>in, one of the tests you've added/updated would likely move to the
>RestoreTests that Bonnie and I are working on.  Just mentioning that ahead
>of time
>
>Cheers,
>
>Jim
>
>
>
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-24204
>>
>>
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>> At 2023-12-09 06:25:30, "Jim Hughes"  wrote:
>> >Hi Xuyang,
>> >
>> >As a clarification, since FLINK-24204 is finishing up work from
>> >FLIP-145[1], do we need to discuss anything before you work out the
>> details
>> >of FLINK-24024 as a PR?
>> >
>> >Relatedly, as that goes up for a PR, as part of FLINK-33421 [2], Bonnie
>> and
>> >I are working through migrating some of the JsonPlan Tests and ITCases to
>> >RestoreTests.  I've got a PR up [3] for moving at least one of the classes
>> >you are touching.  Let me know if I can share any details about that work.
>> >
>> >Cheers,
>> >
>> >Jim
>> >
>> >1.
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
>> >
>> >2. https://issues.apache.org/jira/browse/FLINK-33421
>> >3. https://github.com/apache/flink/pull/23886
>> >https://issues.apache.org/jira/browse/FLINK-33676
>> >
>> >On Tue, Nov 28, 2023 at 7:31 AM Xuyang  wrote:
>> >
>> >> Hi all.
>> >> I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
>> >> Window Aggregation.
>> >>
>> >>
>> >> Although the current Flink SQL Window Aggregation documentation[1]
>> >> indicates that the legacy Group Window Aggregation
>> >> syntax has been deprecated, the new Window TVF Aggregation syntax has
>> not
>> >> fully covered all of the features of the legacy one.
>> >>
>> >>
>> >> Compared to Group Window Aggergation, Window TVF Aggergation has several
>> >> advantages, such as two-stage optimization,
>> >> support for standard GROUPING SET syntax, and so on. However, it needs
>> to
>> >> supplement and enrich the following features.
>> >>
>> >>
>> >> 1. Support for SESSION Window TVF Aggregation
>> >> 2. Support for consuming CDC stream
>> >> 3. Support for HOP window size with non-integer step length
>> >> 4. Support for configurations such as early fire, late fire and allow
>> >> lateness
>> >> (which are internal experimental configurations in Group Window
>> >> Aggregation and not public to users yet.)
>> >> 5. Unification of the Window TVF Aggregation operator in runtime at the
>> >> implementation layer
>> >> (In the long term, the cost to maintain the operators about Window TVF
>> >> Aggregation and Group Window Aggregation is too expensive.)
>> >>
>> >>
>> >> This flip aims to continue the unfinished work in FLIP-145[2], which is
>> to
>> >> fully enable the capabilities of Window TVF Aggregation
>> >>  and officially deprecate the legacy syntax Group Window Aggregation, to
>> >> prepare for the removal of the legacy one in Flink 2.0.
>> >>
>> >>
>> >> I have already done some preliminary POC to validate the feasibility of
>> >> the related work 

Re:Re: Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-06 Thread Xuyang
Hi, Shengkai. Thanks to share your thought. Let me answer related questions。


> Could you give an example about the pass-through column. A session window may 
> contain multiple rows, which value is selected by the windowoperator?


The table function make the entire inpyt row available in the output. Take an 
example in Flink, if the input row is  with schema ,
the output of the window tvf can also access the  with schema 
. The session window always output these multi rows with attached 
window_start,
window_end and window_time column.


> What's the behavior if all the data in the window have been removed?


Align the behavior of existing group window agg, and do not output data when 
the window is empty.


> Could you explain more details about the how window process CDC? For example, 
> what's the behavior if the window only gets the DELETE data from the upstream 
> Operator.


Also align the behavior of existing group window agg. However, there is a bug 
in group window agg that currently group window agg has different results when 
only
consuming -D records while using or not using minibatch. Refs more at 
FLINK-33760.


> The subtitle is not correct here.


Updated the doc to fix it.


> It's better if we can introduce a syntax like the `emit` keyword to set the 
> emit strategy for every window.
I agree with you. But I don't recommend using this syntax SESSION(data 
[PARTITION BY (keycols, ...)], DESCRIPTOR(timecol), gap, emit='') because it 
breaks the syntax in Flip-145.
I think using query hint is a better idea. Anyway, this work should belong to 
another flip.


> I think more work should be mentioned in the FLIP. What's the behavior if the 
> input schema contains a column named `window_start`?
> In the current design, `window_start` is a reserved keyword in the window TVF 
> syntax, but it is legal in the legacy implementation.


This is a good question. Perhaps we can introduce a config to add a specific 
config (named "table.window-additional-columns.prefix") to add a prefix 
of all window addition columns to solve this situation. For example, user can 
set the conf to "$", and the additional column from window will become 
"$window_start", "$window_end" and "$window_time". WDYT?


> In the FLIP, you mention the FLIP should introduce an option to fall back to 
> the legacy behavior. Could you tell us what's the name of the option?
> BTW, I think we should unify the implementation when window TVF can do all 
> the work that the legacy operator can do and there is no need to introduce an 
> option to fallback.


What about using config named "table.optimizer.window-rewrite-enabled"? If
I agree with you that util all features are aligned and everything is ok about 
window tvf, we should also remove this config about fallback.
But I think to be on the safe side, we can observe one or two versions of this 
rewrite, and allows users to roll back when problems arise.


> If we remove the legacy window operator in the future, how users upgrade 
> their jobs? Do you have any plan to support state migration from the legacy 
> window to Windows TVF?
IIRC, currently compatibility across middle versions of SQL is not guaranteed. 
Should we add constraints on this part?


Look for your feedback!







--

Best!
Xuyang





At 2023-12-05 12:06:27, "liu ron"  wrote:
>Hi, xuyang
>
>Thanks for starting this FLIP discussion, currently there are two types of
>window aggregation in Flink SQL, namely legacy group window aggregation and
>window tvf aggregation, these two types of window aggregation are not fully
>aligned in behavior, which will bring a lot of confusion to the users, so
>there is a need to unify and align them. I think the final ideal state
>should be that there is only one window tvf aggregation, which supports
>Tumble, HOP, Cumulate and Session windows, and supports consuming CDC data
>streams. There is also support for configuring EARLY-FIRE and LATER-FIRE.
>
>This FLIP is a continuation of FLIP-145, and also supports legacy group
>window aggregation to flat-migrate to the new window tvf agregation, which
>is very useful, especially for the support of CDC streams, a pain point
>that users often feedback. Big +1 for this FLIP.
>
>Best,
>Ron
>
>Xuyang  于2023年12月5日周二 11:11写道:
>
>> Hi, Feng and David.
>>
>>
>> Thank you very much to share your thoughts.
>>
>>
>> This flip does not include the official exposure of these experimental
>> conf to users. Thus there is not adetailed description of this part.
>> However, in view that some technical users may have added these
>> experimental conf in actual production jobs, the processing
>> of these conf while using window tvf syntax has been added to this flip.
>>
>>
>> Overall, the behavior of using these experimental parameters is no
>> different from before, and I think we should provide the compatibility
>> about using these experimental conf.
>>
>>
>> Look for your thoughs.
>>
>>
>>
>>
>> --
>>
>> Best!
>> Xuyang
>>
>>
>>
>>
>>
>>