[jira] [Created] (FLINK-34477) support capture groups in REGEXP_REPLACE

2024-02-20 Thread David Anderson (Jira)
David Anderson created FLINK-34477:
--

 Summary: support capture groups in REGEXP_REPLACE
 Key: FLINK-34477
 URL: https://issues.apache.org/jira/browse/FLINK-34477
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: David Anderson


For example, I would expect this query
{code:java}
select REGEXP_REPLACE('ERR1,ERR2', '([^,]+)', 'AA$1AA');
{code}
to produce
{code:java}
AAERR1AA,AAERR2AA{code}
but instead it produces
{code:java}
AA$1AA,AA$1AA{code}
With FLINK-9990, support was added for REGEXP_EXTRACT, which does provide 
access to capture groups, but for many use cases, supporting capture groups 
in REGEXP_REPLACE in the way users expect would be more natural and convenient.
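For illustration, plain Java regex replacement already honors {{$1}} backreferences -- the behavior this ticket asks for. A minimal sketch (assuming java.util.regex semantics, via String.replaceAll):

```java
public class CaptureGroupDemo {
    public static void main(String[] args) {
        // Java's replaceAll honors $1 backreferences -- the semantics
        // this ticket asks REGEXP_REPLACE to support.
        String result = "ERR1,ERR2".replaceAll("([^,]+)", "AA$1AA");
        System.out.println(result); // AAERR1AA,AAERR2AA
    }
}
```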



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] FLIP-413: Enable unaligned checkpoints by default

2024-01-07 Thread David Anderson
Piotr, I think the situation is more nuanced than what you've described.

One concern I have is that unaligned checkpoints are somewhat less flexible
in terms of which operational tasks can be safely performed with them --
i.e., if you look at the table in the docs [1], aligned checkpoints support
arbitrary job upgrades and flink minor version upgrades, and unaligned
checkpoints do not.

The change you propose makes the situation here more delicate, because for
most users, most of their checkpoints will actually be aligned checkpoints
(since their checkpoints will typically not contain any on-the-wire state),
and so these unsupported operations would actually work -- but they could
fail. So if a user is in the habit of doing job upgrades with checkpoints,
is unaware of the danger posed by the change you propose, and continues to
do these operations afterwards, their upgrades will probably continue to
work -- until someday when they mysteriously fail.

On a separate point, in the sentence below it seems to me it would be
clearer to say that in the unlikely scenario you've described, the change
would "significantly increase checkpoint sizes" -- assuming I understand
things correctly.

> For those users [the] change to the unaligned checkpoints will
significantly increase state size, without any benefits.

It seems to me that the worst case would be situations where this
increase in checkpoint size causes checkpoint failures because the
available throughput to the checkpoint storage is insufficient to handle
the increase in size, resulting in timeouts where it was (perhaps just
barely) okay before.

Admittedly, this is perhaps a contrived scenario, but it is possible.

I haven't made up my mind about this proposal. Overall I'm unhappy about
the level of complexity we've created, and am trying to figure out if this
proposal makes things better or worse overall. At the moment I'm guessing
it makes things better for a significant minority of users, and worse for a
smaller minority.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/checkpoints_vs_savepoints/#capabilities-and-limitations

David

On Fri, Jan 5, 2024 at 5:42 AM Piotr Nowojski  wrote:

> Ops, fixing the topic.
>
> Hi!
> >
> > I would like to propose by default to enable unaligned checkpoints and
> > also simultaneously increase the aligned checkpoints timeout from 0ms to
> > 5s. I think this change is the right one to do for the majority of Flink
> > users.
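In configuration terms, the proposed defaults would amount to roughly the following (a sketch; the option keys are assumed from the Flink configuration reference, not quoted from the FLIP):

```yaml
# Assumed option keys -- a sketch of the proposed defaults, not the FLIP text:
execution.checkpointing.unaligned: true                  # currently false
execution.checkpointing.aligned-checkpoint-timeout: 5 s  # currently 0 ms
```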
> >
> > For more rationale please take a look into the short FLIP-413 [1].
> >
> > What do you all think?
> >
> > Best,
> > Piotrek
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-413%3A+Enable+unaligned+checkpoints+by+default
> >
>


Re: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov

2024-01-02 Thread David Anderson
That's great news. Congratulations, Alex!

David

On Tue, Jan 2, 2024 at 9:00 AM Ryan Skraba 
wrote:

> Awesome news for the community -- congratulations Alex (and Happy New
> Year everyone!)
>
> Ryan
>
> On Tue, Jan 2, 2024 at 2:55 PM Yun Tang  wrote:
> >
> > Congratulation to Alex and Happy New Year everyone!
> >
> > Best
> > Yun Tang
> > 
> > From: Rui Fan <1996fan...@gmail.com>
> > Sent: Tuesday, January 2, 2024 21:33
> > To: dev@flink.apache.org 
> > Cc: Alexander Fedulov 
> > Subject: Re: [ANNOUNCE] New Apache Flink Committer - Alexander Fedulov
> >
> > Happy new year!
> >
> > Hmm, sorry for the typo in the last email.
> > Congratulations Alex, well done!
> >
> > Best,
> > Rui
> >
> > On Tue, 2 Jan 2024 at 20:23, Rui Fan <1996fan...@gmail.com> wrote:
> >
> > > Configurations Alexander!
> > >
> > > Best,
> > > Rui
> > >
> > > On Tue, Jan 2, 2024 at 8:15 PM Maximilian Michels 
> wrote:
> > >
> > >> Happy New Year everyone,
> > >>
> > >> I'd like to start the year off by announcing Alexander Fedulov as a
> > >> new Flink committer.
> > >>
> > >> Alex has been active in the Flink community since 2019. He has
> > >> contributed more than 100 commits to Flink, its Kubernetes operator,
> > >> and various connectors [1][2].
> > >>
> > >> Especially noteworthy are his contributions on deprecating and
> > >> migrating the old Source API functions and test harnesses, the
> > >> enhancement to flame graphs, the dynamic rescale time computation in
> > >> Flink Autoscaling, as well as all the small enhancements Alex has
> > >> contributed which make a huge difference.
> > >>
> > >> Beyond code contributions, Alex has been an active community member
> > >> with his activity on the mailing lists [3][4], as well as various
> > >> talks and blog posts about Apache Flink [5][6].
> > >>
> > >> Congratulations Alex! The Flink community is proud to have you.
> > >>
> > >> Best,
> > >> The Flink PMC
> > >>
> > >> [1]
> > >>
> https://github.com/search?type=commits=author%3Aafedulov+org%3Aapache
> > >> [2]
> > >>
> https://issues.apache.org/jira/browse/FLINK-28229?jql=status%20in%20(Resolved%2C%20Closed)%20AND%20assignee%20in%20(afedulov)%20ORDER%20BY%20resolved%20DESC%2C%20created%20DESC
> > >> [3]
> https://lists.apache.org/list?dev@flink.apache.org:lte=100M:Fedulov
> > >> [4]
> https://lists.apache.org/list?u...@flink.apache.org:lte=100M:Fedulov
> > >> [5]
> > >>
> https://flink.apache.org/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/
> > >> [6]
> > >>
> https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
> > >>
> > >
>


Re: [DISCUSS] FLIP-400: AsyncScalarFunction for asynchronous scalar function support

2023-12-20 Thread David Anderson
I'm delighted to see the progress on this. This is going to be a major
enabler for some important use cases.

The proposed simplifications (global config and ordered mode) for V1 make a
lot of sense to me. +1

David

On Wed, Dec 20, 2023 at 12:31 PM Alan Sheinberg
 wrote:

> Thanks for that feedback Lincoln,
>
> Only one question with the async `timeout` parameter[1](since I
> > haven't seen the POC code), current description is: 'The time which can
> > pass before a restart strategy is triggered',
> > but in the previous flip-232[2] and flip-234[3], in retry scenario, this
> > timeout is the total time, do we keep the behavior of the parameter
> > consistent?
>
> That's a good catch.  I was intending to use *AsyncWaitOperator*, and to
> pass this timeout directly.  Looking through the code a bit, it appears
> that it doesn't restart the timer on a retry, and this timeout is total, as
> you're saying.  I do intend on being consistent with the other FLIPs and
> retaining this behavior, so I will update the wording on my FLIP to reflect
> that.
>
> -Alan
>
> On Wed, Dec 20, 2023 at 1:36 AM Lincoln Lee 
> wrote:
>
> > +1 for this useful feature!
> > Hope this reply isn't too late. Agree that we start with global
> > async-scalar configuration and ordered mode first.
> >
> > @Alan Only one question with the async `timeout` parameter[1](since I
> > haven't seen the POC code), current description is: 'The time which can
> > pass before a restart strategy is triggered',
> > but in the previous flip-232[2] and flip-234[3], in retry scenario, this
> > timeout is the total time, do we keep the behavior of the parameter
> > consistent?
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support
> > [2]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211883963
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-234%3A+Support+Retryable+Lookup+Join+To+Solve+Delayed+Updates+Issue+In+External+Systems
> >
> > Best,
> > Lincoln Lee
> >
> >
> > On Wed, Dec 20, 2023 at 08:41, Alan Sheinberg wrote:
> >
> > > Thanks for the comments Timo.
> > >
> > >
> > > > Can you remove the necessary parts? Esp.:
> > >
> > >  @Override
> > > >  public Set getRequirements() {
> > > >  return Collections.singleton(FunctionRequirement.ORDERED);
> > > >  }
> > >
> > >
> > > I removed this section from the FLIP since presumably, there's no use
> in
> > > adding to the public API if it's ignored, with handling just ORDERED
> for
> > > the first version.  I'm not sure how quickly I'll want to add UNORDERED
> > > support, but I guess I can always do another FLIP.
> > >
> > > Otherwise I have no objections to start a VOTE soonish. If others are
> > > > fine as well?
> > >
> > > That would be great.  Any areas that people are interested in
> discussing
> > > further before a vote?
> > >
> > > -Alan
> > >
> > > On Tue, Dec 19, 2023 at 5:49 AM Timo Walther 
> wrote:
> > >
> > > >  > I would be totally fine with the first version only having ORDERED
> > > >  > mode. For a v2, we could attempt to do the next most conservative
> > > >  > thing
> > > >
> > > > Sounds good to me.
> > > >
> > > > I also checked AsyncWaitOperator and could not find an access of
> > > > StreamRecord's timestamp, only watermarks. But as we said, let's
> > > > focus on ORDERED first.
> > > >
> > > > Can you remove the necessary parts? Esp.:
> > > >
> > > >  @Override
> > > >  public Set getRequirements() {
> > > >  return Collections.singleton(FunctionRequirement.ORDERED);
> > > >  }
> > > >
> > > > Otherwise I have no objections to start a VOTE soonish. If others are
> > > > fine as well?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 19.12.23 07:32, Alan Sheinberg wrote:
> > > > > Thanks for the helpful comments, Xuyang and Timo.
> > > > >
> > > > > @Timo, @Alan: IIUC, there seems to be something wrong here. Take
> > kafka
> > > as
> > > > >> source and mysql as sink as an example.
> > > > >> Although kafka is an append-only source, one of its fields is used
> > as
> > > pk
> > > > >> when writing to mysql. If async udx is executed
> > > > >>   in an unordered mode, there may be problems with the data in
> mysql
> > > in
> > > > the
> > > > >> end. In this case, we need to ensure that
> > > > >> the sink-based pk is in order actually.
> > > > >
> > > > >
> > > > > @Xuyang: That's a great point.  If some node downstream of my
> > operator
> > > > > cares about ordering, there's no way for it to reconstruct the
> > original
> > > > > ordering of the rows as they were input to my operator.  So even if
> > > they
> > > > > want to preserve ordering by key, the order in which they see it
> may
> > > > > already be incorrect.  Somehow I thought that maybe the analysis of
> > the
> > > > > changelog mode at a given operator was aware of downstream
> > operations,
> > > > but
> > > > > it seems not.
> > > > >
> 

Re: [DISCUSS] FLIP-392: Deprecate the Legacy Group Window Aggregation

2023-12-04 Thread David Anderson
The current situation (where we have both the legacy windows and the
TVF-based windows) is confusing for users, and I'd like to see us move
forward as rapidly as possible.

Since the early fire, late fire, and allowed lateness features were never
documented or exposed to users, I don't feel that we need to provide
replacements for these internal, experimental features before officially
deprecating the legacy group window aggregations, and I'd rather not wait.

However, I'd be delighted to see a proposal for what that might look like.

Best,
David

On Mon, Dec 4, 2023 at 12:45 PM Feng Jin  wrote:

> Hi xuyang,
>
> Thank you for initiating this proposal.
>
> I'm glad to see that TVF's functionality can be fully supported.
>
> Regarding the early fire, late fire, and allow lateness features, how will
> they be provided to users? The documentation doesn't seem to provide a
> detailed description of this part.
>
> Since this FLIP will also involve a lot of feature development, I am more
> than willing to help, including development and code review.
>
> Best,
> Feng
>
> On Tue, Nov 28, 2023 at 8:31 PM Xuyang  wrote:
>
> > Hi all.
> > I'd like to start a discussion of FLIP-392: Deprecate the Legacy Group
> > Window Aggregation.
> >
> >
> > Although the current Flink SQL Window Aggregation documentation[1]
> > indicates that the legacy Group Window Aggregation
> > syntax has been deprecated, the new Window TVF Aggregation syntax has not
> > fully covered all of the features of the legacy one.
> >
> >
> > Compared to Group Window Aggregation, Window TVF Aggregation has several
> > advantages, such as two-stage optimization,
> > support for standard GROUPING SET syntax, and so on. However, it needs to
> > supplement and enrich the following features.
> >
> >
> > 1. Support for SESSION Window TVF Aggregation
> > 2. Support for consuming CDC stream
> > 3. Support for HOP window size with non-integer step length
> > 4. Support for configurations such as early fire, late fire and allow
> > lateness
> > (which are internal experimental configurations in Group Window
> > Aggregation and not public to users yet.)
> > 5. Unification of the Window TVF Aggregation operator in runtime at the
> > implementation layer
> > (In the long term, the cost to maintain the operators about Window TVF
> > Aggregation and Group Window Aggregation is too expensive.)
> >
> >
> > This flip aims to continue the unfinished work in FLIP-145[2], which is
> to
> > fully enable the capabilities of Window TVF Aggregation
> >  and officially deprecate the legacy syntax Group Window Aggregation, to
> > prepare for the removal of the legacy one in Flink 2.0.
> >
> >
> > I have already done some preliminary POC to validate the feasibility of
> > the related work in this flip as follows.
> > 1. POC for SESSION Window TVF Aggregation [3]
> > 2. POC for CUMULATE in Group Window Aggregation operator [4]
> > 3. POC for consuming CDC stream in Window Aggregation operator [5]
> >
> >
> > Looking forward to your feedback and thoughts!
> >
> >
> >
> > [1]
> >
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/
> >
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-SessionWindows
> > [3] https://github.com/xuyangzhong/flink/tree/FLINK-24024
> > [4]
> >
> https://github.com/xuyangzhong/flink/tree/poc_legacy_group_window_agg_cumulate
> > [5]
> >
> https://github.com/xuyangzhong/flink/tree/poc_window_agg_consumes_cdc_stream
> >
> >
> >
> > --
> >
> > Best!
> > Xuyang
>


Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-11-17 Thread David Anderson
Rui,

I don't have any direct experience with this topic, but given the
motivation you shared, the proposal makes sense to me. Given that the new
default feels more complex than the current behavior, if we decide to do
this I think it will be important to include the rationale you've shared in
the documentation.

David

On Wed, Nov 15, 2023 at 10:17 PM Rui Fan <1996fan...@gmail.com> wrote:

> Hi dear flink users and devs:
>
> FLIP-364[1] intends to make some improvements to restart-strategy
> and discuss updating some of the default values of exponential-delay,
> and whether exponential-delay can be used as the default restart-strategy.
> After discussing at dev mail list[2], we hope to collect more feedback
> from Flink users.
>
> # Why does the default restart-strategy need to be updated?
>
> If checkpointing is enabled, the default value is fixed-delay with
> Integer.MAX_VALUE restart attempts and '1 s' delay[3]. It means
> the job will restart infinitely with high frequency when a job
> continues to fail.
>
> When the Kafka cluster fails, a large number of flink jobs will be
> restarted frequently. After the kafka cluster is recovered, a large
> number of high-frequency restarts of flink jobs may cause the
> kafka cluster to avalanche again.
>
> Considering the exponential-delay as the default strategy with
> a couple of reasons:
>
> - The exponential-delay can reduce the restart frequency when
>   a job continues to fail.
> - It can restart a job quickly when a job fails occasionally.
> - The restart-strategy.exponential-delay.jitter-factor can avoid
>   restarting multiple jobs at the same time. It’s useful to prevent
>   avalanches.
>
> # What are the current default values[4] of exponential-delay?
>
> restart-strategy.exponential-delay.initial-backoff : 1s
> restart-strategy.exponential-delay.backoff-multiplier : 2.0
> restart-strategy.exponential-delay.jitter-factor : 0.1
> restart-strategy.exponential-delay.max-backoff : 5 min
> restart-strategy.exponential-delay.reset-backoff-threshold : 1h
>
> backoff-multiplier=2 means that the delay time of each restart
> will be doubled. The delay times are:
> 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s, 300s, 300s, etc.
>
> The delay time increases rapidly, which will affect the recovery
> time for Flink jobs.
>
> # Option improvements
>
> We think the backoff-multiplier between 1 and 2 is more sensible,
> such as:
>
> restart-strategy.exponential-delay.backoff-multiplier : 1.2
> restart-strategy.exponential-delay.max-backoff : 1 min
>
> After updating, the delay times are:
>
> 1s, 1.2s, 1.44s, 1.728s, 2.073s, 2.488s, 2.985s, 3.583s, 4.299s,
> 5.159s, 6.191s, 7.430s, 8.916s, 10.699s, 12.839s, 15.407s, 18.488s,
> 22.186s, 26.623s, 31.948s, 38.337s, etc
>
> They achieve the following goals:
> - When restarts are infrequent in a short period of time, flink can
>   quickly restart the job. (For example: the retry delay time when
>   restarting 5 times is 2.073s)
> - When restarting frequently in a short period of time, flink can
>   slightly reduce the restart frequency to prevent avalanches.
>   (For example: the retry delay time when retrying 10 times is 5.1 s,
>   and the retry delay time when retrying 20 times is 38s, which is not very
> large.)
>
> As @Mingliang Liu mentioned on the dev mail list: the one-size-fits-all
> default values do not exist. So our goal is that the default values
> can be suitable for most jobs.
>
> Looking forward to your thoughts and feedback, thanks~
>
> [1] https://cwiki.apache.org/confluence/x/uJqzDw
> [2] https://lists.apache.org/thread/5cgrft73kgkzkgjozf9zfk0w2oj7rjym
> [3]
>
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#restart-strategy-type
> [4]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/#exponential-delay-restart-strategy
>
> Best,
> Rui
>
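The delay schedules quoted above can be reproduced with a small calculation. A sketch -- the formula is inferred from the listed values (it is not taken from the Flink source), and jitter is ignored:

```java
public class ExponentialDelaySchedule {
    // Assumed formula behind the listed delays (jitter ignored):
    // delay(n) = min(initialBackoff * multiplier^n, maxBackoff), n = 0, 1, 2, ...
    static double delaySeconds(int n, double initial, double multiplier, double max) {
        return Math.min(initial * Math.pow(multiplier, n), max);
    }

    public static void main(String[] args) {
        // Proposed settings: initial-backoff 1s, backoff-multiplier 1.2, max-backoff 1 min
        for (int n = 0; n < 21; n++) {
            System.out.printf("restart %2d: %.3f s%n", n + 1, delaySeconds(n, 1.0, 1.2, 60.0));
        }
        // Current defaults (multiplier 2.0, max-backoff 5 min) reproduce the
        // 1s, 2s, 4s, ..., 256s, 300s, 300s sequence from the mail above.
    }
}
```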


Re: Pointers to computational models of Flink CEP

2023-10-31 Thread David Anderson
The implementation of Flink CEP was largely based on Efficient Pattern
Matching over Event Streams by Jagrati Agrawal, Yanlei Diao, Daniel
Gyllstrom, and Neil Immerman from UMass Amherst [1].

[1] https://people.cs.umass.edu/~yanlei/publications/sase-sigmod08.pdf

Cheers,
David

On Tue, Oct 31, 2023 at 10:04 AM Vishwas Kalani 
wrote:

> Hey,
> I am a computer science student working on a project related to flink cep.
> I want to understand the theoretical foundations of flink cep. Could I get
> some pointers to the computational models used by Flink CEP and how do
> various elements of flink CEP function.
> Thanking you in advance
>


Re: Re: [DISCUSS] FLIP-357: Deprecate Iteration API of DataStream

2023-09-01 Thread David Anderson
+1

Keeping the legacy implementation in place is confusing and encourages
adoption of something that really shouldn't be used.

Thanks for driving this,
David

On Fri, Sep 1, 2023 at 8:45 AM Jing Ge  wrote:
>
> Hi Wencong,
>
> Thanks for your clarification! +1
>
> Best regards,
> Jing
>
> On Fri, Sep 1, 2023 at 12:36 PM Wencong Liu  wrote:
>
> > Hi Jing,
> >
> >
> > Thanks for your reply!
> >
> >
> > > Or the "independent module extraction" mentioned in the FLIP does mean an
> > independent module in Flink?
> >
> >
> > Yes. If there are submodules in Flink repository needs the iteration
> > (currently not),
> > we could consider extracting them to a new submodule of Flink.
> >
> >
> > > users will have to add one more dependency of Flink ML. If iteration is
> > the
> > only feature they need, it will look a little bit weird.
> >
> >
> > If users only need to execute iteration jobs, they can simply remove the
> > Flink
> > dependency and add the necessary dependencies related to Flink ML.
> > However,
> > they can still utilize the DataStream API as it is also a dependency of
> > Flink ML.
> >
> >
> > Keeping an iteration submodule in the Flink repository and making Flink ML
> > depend on it is another solution. But the current implementation of
> > Iteration in DataStream should definitely be removed due to its
> > incompleteness.
> >
> >
> > The placement of the Iteration API in the repository is a topic that has
> > multiple
> > potential solutions. WDYT?
> >
> >
> > Best,
> > Wencong
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > At 2023-09-01 17:59:34, "Jing Ge"  wrote:
> > >Hi Wencong,
> > >
> > >Thanks for the proposal!
> > >
> > >"The Iteration API in DataStream is planned be deprecated in Flink 1.19
> > and
> > >then finally removed in Flink 2.0. For the users that rely on the
> > Iteration
> > >API in DataStream, they will have to migrate to Flink ML."
> > >- Does it make sense to migrate the iteration module into Flink directly?
> > >Or the "independent module extraction" mentioned in the FLIP does mean an
> > >independent module in Flink? Since the iteration will be removed in Flink,
> > >users will have to add one more dependency of Flink ML. If iteration is
> > the
> > >only feature they need, it will look a little bit weird.
> > >
> > >
> > >Best regards,
> > >Jing
> > >
> > >On Fri, Sep 1, 2023 at 11:05 AM weijie guo 
> > >wrote:
> > >
> > >> Thanks, +1 for this.
> > >>
> > >> Best regards,
> > >>
> > >> Weijie
> > >>
> > >>
> > >> On Fri, Sep 1, 2023 at 14:29, Yangze Guo wrote:
> > >>
> > >> > +1
> > >> >
> > >> > Thanks for driving this.
> > >> >
> > >> > Best,
> > >> > Yangze Guo
> > >> >
> > >> > On Fri, Sep 1, 2023 at 2:00 PM Xintong Song 
> > >> wrote:
> > >> > >
> > >> > > +1
> > >> > >
> > >> > > Best,
> > >> > >
> > >> > > Xintong
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Fri, Sep 1, 2023 at 1:11 PM Dong Lin 
> > wrote:
> > >> > >
> > >> > > > Thanks Wencong for initiating the discussion.
> > >> > > >
> > >> > > > +1 for the proposal.
> > >> > > >
> > >> > > > On Fri, Sep 1, 2023 at 12:00 PM Wencong Liu  > >
> > >> > wrote:
> > >> > > >
> > >> > > > > Hi devs,
> > >> > > > >
> > >> > > > > I would like to start a discussion on FLIP-357: Deprecate
> > Iteration
> > >> > API
> > >> > > > of
> > >> > > > > DataStream [1].
> > >> > > > >
> > >> > > > > Currently, the Iteration API of DataStream is incomplete. For
> > >> > instance,
> > >> > > > it
> > >> > > > > lacks support
> > >> > > > > for iteration in sync mode and exactly once semantics.
> > >> Additionally,
> > >> > it
> > >> > > > > does not offer the
> > >> > > > > ability to set iteration termination conditions. As a result,
> > it's
> > >> > hard
> > >> > > > > for developers to
> > >> > > > > build an iteration pipeline by DataStream in the practical
> > >> > applications
> > >> > > > > such as machine learning.
> > >> > > > >
> > >> > > > > FLIP-176: Unified Iteration to Support Algorithms [2] has
> > >> introduced
> > >> > a
> > >> > > > > unified iteration library
> > >> > > > > in the Flink ML repository. This library addresses all the
> > issues
> > >> > present
> > >> > > > > in the Iteration API of
> > >> > > > > DataStream and could provide solution for all the iteration
> > >> > use-cases.
> > >> > > > > However, maintaining two
> > >> > > > > separate implementations of iteration in both the Flink
> > repository
> > >> > and
> > >> > > > the
> > >> > > > > Flink ML repository
> > >> > > > > would introduce unnecessary complexity and make it difficult to
> > >> > maintain
> > >> > > > > the Iteration API.
> > >> > > > >
> > >> > > > > As such I propose deprecating the Iteration API of DataStream
> > and
> > >> > > > removing
> > >> > > > > it completely in the next
> > >> > > > > major version. In the future, if other modules in the Flink
> > >> > repository
> > >> > > > > require the use of the
> > >> > > > > Iteration API, we can consider extracting all Iteration
> > >> > implementations
> > >> > > > > from the Flink 

Re: log4j2 integration with flink 1.10

2023-08-28 Thread David Anderson
Flink switched to using log4j2 with the 1.11 release. Looking at the
ticket involved [1] should give you some idea of the effort involved
in backporting that to 1.10.

My initial impression is that you might not have to do much to get it
working. The comments in [2] appear to outline what's needed.
David

[1] https://issues.apache.org/jira/browse/FLINK-15672
[2] https://issues.apache.org/jira/browse/FLINK-5339

On Mon, Aug 21, 2023 at 4:21 AM Y SREEKARA BHARGAVA REDDY
 wrote:
>
> Hi Team,
>
> Has anyone done an integration of Flink 1.10 with log4j2 instead of log4j1?
>
> If yes, please share the reference for that.
>
> Any suggestion, greatly appreciated.
>
>
> Regards,
> Nagireddy Y.


Re: [DISCUSS] FLIP-326: Enhance Watermark to Support Processing-Time Temporal Join

2023-07-25 Thread David Anderson
Dong,

Thank you for the careful analysis of my proposal. Your conclusions
make sense to me.

David

On Mon, Jul 24, 2023 at 8:37 PM Dong Lin  wrote:
>
> Hi David,
>
> Thank you for the detailed comments and the suggestion of this alternative 
> approach.
>
> I agree with you that this alternative can also address the target use-case 
> with the same correctness. In comparison to the current FLIP, this 
> alternative indeed introduces much less complexity to the Flink runtime 
> internal implementation.
>
> At a high level, this alternative is simulating a one-time emission of 
> Watermark(useProcessingTime=true) with periodic emission of 
> Watermark(timestamp=wall-clock-time).
>
> One downside of this alternative is that it can introduce a bit of extra 
> per-record runtime overhead. This is because the ingestion time watermark 
> will be emitted periodically according to pipeline.auto-watermark-interval 
> (200 ms by default). Thus there is still a short period where the watermark 
> from the HybridSource can be lagging behind wall-clock time. For operators 
> whose logic depends on the watermark, such as TemporalRowTimeJoinOperator, 
> they will need to check build-side watermark and delay/buffer records on the 
> probe-side until it receives the next ingestion-time watermark.
>
> The impact of this overhead probably depends on the throughput/watermark of 
> the probe-side records. On the other hand, given that join operator is 
> typically already heavy (due to state backend access and build-side buffer), 
> and the watermark from probe-side (e.g. Kafka) is probably also lagging 
> behind wall-clock time, it is probably not an issue in most cases. Therefore 
> I agree that it is worth trying this approach. We can revisit this if we 
> see any issues around the performance or usability of this approach.
>
> Another potential concern is that it requires the user to use ingestion time. 
> I am not sure we are able to do this in a backward-compatible way yet. We 
> probably need to go through the existing APIs around ingestion time watermark 
> to validate this.
>
> BTW, with the introduction of RecordAttributes(isBacklog=true/false) from 
> FLIP-327, another short-term approach is to let 
> TemporalProcessTimeJoinOperator keep buffering records from 
> MySQL/HybridSource as long as isBacklog=true, and process them in a 
> processing-time manner once it receives isBacklog=false. This should also 
> address the use-case targeted by FLIP-326. The only caveat with this approach 
> is that it is a bit hacky, because it requires the JoinOperator to always buffer 
> records when isBacklog=true, whereas isBacklog's semantics only says it is 
> "optional" to buffer records, which can be an issue in the long term.
>
> Thanks,
> Dong
>
> On Tue, Jul 25, 2023 at 2:37 AM David Anderson  wrote:
>>
>> I'm delighted to see interest in developing support for
>> processing-time temporal joins.
>>
>> The proposed implementation seems rather complex, and I'm not
>> convinced this complexity is justified/necessary. I'd like to outline
>> a simpler alternative that I think would satisfy the key objectives.
>>
>> Key ideas:
>>
>> 1. Limit support to the HybridSource (or a derivative thereof). (E.g.,
>> I'm guessing the MySQL CDC Source could be reworked to be a hybrid
>> source.)
>> 2. Have this HybridSource wait to begin emitting watermarks until it
>> has handled all events from the bounded sources. (I'm not sure how the
>> HybridSource handles this now; if this is an incompatible change, we
>> can find a way to deal with that.)
>> 3. Instruct users to use an ingestion time watermarking strategy for
>> their unbounded source (the source the HybridSource handles last) if
>> they want to do something like a processing time temporal join.
>>
>> One objection to this is the limitation of only supporting the
>> HybridSource -- what about cases where the user has a single source,
>> e.g., a Kafka topic? I'm suggesting the user would divide their
>> build-side stream into two parts -- a bounded component that is fully
>> ingested by the hybrid source before watermarking begins, followed by
>> an unbounded component.
>>
>> I think this alternative handles use cases like processing-time
>> temporal join rather nicely, without requiring any changes to
>> watermarks or the core runtime.
>>
>> David
>>
>> On Thu, Jun 29, 2023 at 1:39 AM Martijn Visser  
>> wrote:
>> >
>> > Hi Dong and Xuannan,
>> >
>> > I'm excited to see this FLIP. I think support for processing-time
>> > temporal joins is something that the Flink users will

Re: [DISCUSS] FLIP-326: Enhance Watermark to Support Processing-Time Temporal Join

2023-07-24 Thread David Anderson
I'm delighted to see interest in developing support for
processing-time temporal joins.

The proposed implementation seems rather complex, and I'm not
convinced this complexity is justified/necessary. I'd like to outline
a simpler alternative that I think would satisfy the key objectives.

Key ideas:

1. Limit support to the HybridSource (or a derivative thereof). (E.g.,
I'm guessing the MySQL CDC Source could be reworked to be a hybrid
source.)
2. Have this HybridSource wait to begin emitting watermarks until it
has handled all events from the bounded sources. (I'm not sure how the
HybridSource handles this now; if this is an incompatible change, we
can find a way to deal with that.)
3. Instruct users to use an ingestion time watermarking strategy for
their unbounded source (the source the HybridSource handles last) if
they want to do something like a processing time temporal join.

One objection to this is the limitation of only supporting the
HybridSource -- what about cases where the user has a single source,
e.g., a Kafka topic? I'm suggesting the user would divide their
build-side stream into two parts -- a bounded component that is fully
ingested by the hybrid source before watermarking begins, followed by
an unbounded component.

I think this alternative handles use cases like processing-time
temporal join rather nicely, without requiring any changes to
watermarks or the core runtime.
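
For concreteness, here is a rough sketch of the wiring I have in mind,
assuming Flink's existing FileSource/KafkaSource/HybridSource APIs. Note
that point 2 above (deferring watermarks until the bounded part is done)
is not something HybridSource does today, so this only illustrates the
ingestion-time watermarking part; the source configuration and names are
illustrative, not part of any proposal:

```java
// Bounded phase: historical data the HybridSource consumes first.
FileSource<String> historical =
        FileSource.forRecordStreamFormat(
                        new TextLineInputFormat(), new Path("/data/history"))
                .build();

// Unbounded phase: the live stream the HybridSource switches to afterwards.
KafkaSource<String> live =
        KafkaSource.<String>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

HybridSource<String> source =
        HybridSource.builder(historical).addSource(live).build();

// Ingestion-time watermarking: timestamps are assigned on arrival, so
// watermarks only become meaningful once the bounded phase has been read.
DataStream<String> buildSide =
        env.fromSource(
                source,
                WatermarkStrategy.<String>forMonotonousTimestamps()
                        .withTimestampAssigner(
                                (event, ts) -> System.currentTimeMillis()),
                "hybrid-build-side");
```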

David

On Thu, Jun 29, 2023 at 1:39 AM Martijn Visser  wrote:
>
> Hi Dong and Xuannan,
>
> I'm excited to see this FLIP. I think support for processing-time
> temporal joins is something that the Flink users will greatly benefit
> off. I specifically want to call-out that it's great to see the use
> cases that this enables. From a technical implementation perspective,
> I defer to the opinion of others with expertise on this topic.
>
> Best regards,
>
> Martijn
>
> On Sun, Jun 25, 2023 at 9:03 AM Xuannan Su  wrote:
> >
> > Hi all,
> >
> > Dong(cc'ed) and I are opening this thread to discuss our proposal to
> > enhance the watermark to properly support processing-time temporal
> > join, which has been documented in FLIP-326 [1].
> >
> > We want to support the use case where the records from the probe side
> > of the processing-time temporal join need to wait until the build side
> > finishes the snapshot phrase by enhancing the expressiveness of the
> > Watermark. Additionally, these changes lay the groundwork for
> > simplifying the DataStream APIs, eliminating the need for users to
> > explicitly differentiate between event-time and processing-time,
> > resulting in a more intuitive user experience.
> >
> > Please refer to the FLIP document for more details about the proposed
> > design and implementation. We welcome any feedback and opinions on
> > this proposal.
> >
> > Best regards,
> >
> > Dong and Xuannan
> >
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-326%3A+Enhance+Watermark+to+Support+Processing-Time+Temporal+Join


[jira] [Created] (FLINK-32099) create flink_data volume for operations playground

2023-05-15 Thread David Anderson (Jira)
David Anderson created FLINK-32099:
--

 Summary: create flink_data volume for operations playground
 Key: FLINK-32099
 URL: https://issues.apache.org/jira/browse/FLINK-32099
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.17.0
Reporter: David Anderson


The docker-based operations playground instructs the user to create temp 
directories on the host machine for checkpoints and savepoints that are then 
mounted in the containers. This can be problematic on Windows machines. It 
would be better to use a docker volume.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [NOTICE] Flink master branch now uses Maven 3.8.6

2023-05-12 Thread David Anderson
Chesnay, thank you for all your hard work on this!

David

On Fri, May 12, 2023 at 4:03 PM Chesnay Schepler  wrote:
>
>
>   What happened?
>
> I have just merged the last commits to properly support Maven 3.3+ on
> the Flink master branch.
>
> mvnw and CI have been updated to use Maven 3.8.6.
>
>
>   What does this mean for me?
>
>   * You can now use Maven versions beyond 3.2.5 (duh).
>   o Most versions should work, but 3.8.6 was the most tested and is
> thus recommended.
>   o 3.8.*5* is known to *NOT* work.
>   * Starting from 1.18.0 you need to use Maven 3.8.6 for releases.
>   o This may change to a later version until the release of 1.18.0.
>   o There have been too many issues with recent Maven releases to
> make a range acceptable.
>   * *All dependencies that are bundled by a module must be marked as
> optional.*
>   o *This is verified on CI.*
>   o *Background info can be found in the wiki.*
>
>
>   Can I continue using Maven 3.2.5?
>
> For now, yes, but support will eventually be removed.
>
>
>   Does this affect users?
>
> No.
>
>
> Please ping me if you run into any issues.


what happened to the images in FLIP-24: SQL Client?

2023-04-04 Thread David Anderson
Does anyone know what happened to the diagrams that used to be in
FLIP-24: SQL Client? The last time I looked at this FLIP -- a few
weeks ago -- there were architecture diagrams for Gateway Mode and
Embedded Mode, but now those images are missing.

David


Re: Unit Testing onTimer() event function with TestHarness - onTimer() not being called

2023-03-25 Thread David Anderson
1. The timestamp passed to testHarness.processElement should be the
timestamp that would have been extracted from the element by the
timestamp extractor in your watermark strategy.

2. Your tests should call testHarness.processWatermark and pass in the
watermark(s) you want to work with.

processBroadcastWatermark is used for testing the behavior of a
(Keyed)BroadcastProcessFunction when a watermark arrives on the
broadcast channel.

Your test might look something like this:

// send in some data
testHarness.processElement(6L, 10L);

// verify that a timer was created
assertThat(testHarness.numEventTimeTimers(), is(1));

// should cause the fire timer to fire
testHarness.processWatermark(new Watermark(20L));
assertThat(testHarness.numEventTimeTimers(), is(0));

// verify the results
assertThat(testHarness.getOutput(), containsInExactlyThisOrder(6L));
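
For completeness, a hedged sketch of the harness setup that the snippet
above assumes, using the test utilities from flink-streaming-java's
test-jar. MyTimerFunction stands in for your own KeyedProcessFunction and
the key selector is a placeholder:

```java
// Assumes a KeyedProcessFunction<String, Long, Long> under test.
KeyedProcessFunction<String, Long, Long> fn = new MyTimerFunction();

KeyedOneInputStreamOperatorTestHarness<String, Long, Long> testHarness =
        ProcessFunctionTestHarnesses.forKeyedProcessFunction(
                fn,
                value -> "constant-key",   // key selector
                Types.STRING);             // key type info

testHarness.open();
// ... processElement / processWatermark calls as in the snippet above ...
testHarness.close();
```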

Best,
David


On Wed, Mar 22, 2023 at 6:09 AM Gabriel Angel Amarista Rodrigues
 wrote:
>
> Hello dear Flink team
>
> We are rather new to Flink and having problems understanding how unit
> testing works overall.
> We want to test the* onTimer()* method after registering a timer in the
> *processElement()* function. However, the *onTimer() *is never called.
>
> We were investigating the documentation, blog posts, books, anything we
> could find and collected a few questions/doubts that we would like your
> input on:
>
> 1. We are using event time, and when calling *testHarness.processElement*, the
> TestHarness requires a *timestamp*.
> Is this *timestamp *supposed to be set as the *processingTimestamp*?
> We are, however, interested in an *EventTimeFunction*, not a
> *ProcessingTimerFunction.* Would it make sense in our use case to call
> testHarness.processElement(event, event.getTimestamp())?
>
> 2. We have added a debug call to *ctx.currentWaterMark()* call in the
> original *processElement* function, but this always returns *Long.MIN_VALUE*
> even though we call *processWaterMark* before the *processElement* call.
>
> Is the *processWatermark *constrained in tests somehow?
>
> We noticed there is also a *processBroadcastWatermark*. Is this necessary
> when we are working with* KeyedProcessingFunctions? *Is it analogous to
> *processBroadcastElement()*?
>
> We are really excited about working with Flink, it is a great tool.
> Great job!
>
> We will be waiting for your input.
> Thanks beforehand for your support.
>
> Kind regards,
> Gabriel Rodrigues


[jira] [Created] (FLINK-31388) restart from savepoint fails with "userVisibleTail should not be larger than offset. This is a bug."

2023-03-09 Thread David Anderson (Jira)
David Anderson created FLINK-31388:
--

 Summary: restart from savepoint fails with "userVisibleTail should 
not be larger than offset. This is a bug."
 Key: FLINK-31388
 URL: https://issues.apache.org/jira/browse/FLINK-31388
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Affects Versions: 1.16.1
Reporter: David Anderson


I took a savepoint, then used 
{code:java}
SET 'execution.savepoint.path' = ...
{code}
to set the savepoint path, and then re-executed the query that had been running 
before the stop-with-savepoint.

It was not an INSERT INTO job, but rather a "collect" job running a SELECT 
query.

It then failed with
{code:java}
userVisibleTail should not be larger than offset. This is a bug.
{code}

Perhaps there is an unstated requirement that using the sql-client to restart 
from a savepoint only works with INSERT INTO jobs?





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-31361) job created by sql-client can't authenticate to kafka, can't find org.apache.kafka.common.security.plain.PlainLoginModule

2023-03-07 Thread David Anderson (Jira)
David Anderson created FLINK-31361:
--

 Summary: job created by sql-client can't authenticate to kafka, 
can't find org.apache.kafka.common.security.plain.PlainLoginModule
 Key: FLINK-31361
 URL: https://issues.apache.org/jira/browse/FLINK-31361
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.16.1
Reporter: David Anderson


I'm working with this SQL DDL:
{noformat}
CREATE TABLE pageviews_sink (
  `url` STRING,
  `user_id` STRING,
  `browser` STRING,
  `ts` TIMESTAMP_LTZ(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'pageviews',
  'properties.bootstrap.servers' = 'xxx.confluent.cloud:9092',
  'properties.security.protocol'='SASL_SSL',
  'properties.sasl.mechanism'='PLAIN',
  
'properties.sasl.jaas.config'='org.apache.kafka.common.security.plain.PlainLoginModule
 required username="xxx" password="xxx";',
  'key.format' = 'json',
  'key.fields' = 'url',
  'value.format' = 'json'
);
{noformat}
With {{flink-sql-connector-kafka-1.16.1.jar}} in the lib directory, this fails 
with 
{noformat}
Caused by: javax.security.auth.login.LoginException: No LoginModule found for 
org.apache.kafka.common.security.plain.PlainLoginModule{noformat}
As a workaround I've found that it does work if I provide both
 
{{flink-connector-kafka-1.16.1.jar}}
{{kafka-clients-3.2.3.jar}}
 
in the lib directory. It seems like the relocation applied in the SQL connector 
isn't working properly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Incorporate performance regression monitoring into routine process

2023-02-03 Thread David Anderson
+1

I don't have anything substantive to add, but I want to express how pleased
I am to see this conversation happening.

David

On Thu, Feb 2, 2023 at 5:09 AM Martijn Visser 
wrote:

> Hi all,
>
> +1 for the overall proposal. My feedback matches with what Matthias
> has already provided earlier.
>
> Best regards,
>
> Martijn
>
> On Tue, Jan 31, 2023 at 19:01 Matthias Pohl wrote:
> >
> > Thanks for the effort you put into this discussion, Yanfei.
> >
> > For me, the regression tests are similar to test instabilities. In this
> > sense, I agree with what Piotr and Jing said about it:
> > - It needs to be identified and fixed as soon as possible to avoid the
> > change affecting other contributions (e.g. hiding other regressions) and
> > making it harder to revert them.
> > - Contributors should be made aware of regressions that are caused by
> their
> > commits during the daily development. They should be enabled to
> > pro-actively react to instabilities/regressions that were caused by their
> > contributions. The available Slack channel and ML lists provide good
> tools
> > to enable contributors to be proactive.
> >
> > My experience is that contributors are quite proactive when it comes to
> > resolving test instabilities on master. Still, a dedicated role for
> > watching over CI is necessary to identify issues that slipped through.
> The
> > same applies to performance tests. Eventhough, I could imagine that for
> > performance tests, the pro-activeness of contributors is lower because
> > quite a few changes are just not affecting the performance. One idea  to
> > raise awareness might be to mention the performance tests in the PR
> > template (maybe next to some of the yes/no questions where a yes might
> > indicate that the performance is affected).
> >
> > About making monitoring of performance tests part of the release
> > management: Right now, watching CI for test instabilities and pinging
> > contributors on issues is already a task for release managers. Extending
> > this responsibility to also check the regression tests seems natural.
> >
> > Yanfei's write-up is a good proposal for general performance test
> > guidelines. It will help contributors and release managers alike. We
> could
> > integrate it into the release testing documentation [1]. I'm wondering
> > whether we would want to add a new Jira label to group these kinds of
> Jira
> > issues analogously to test-instabilities [2].
> >
> > But that said, I want to repeat the idea of organizing the release
> > management documentation in a way that we have not only general release
> > managers with a bunch of different tasks but dedicated roles within the
> > release management. Roles I could imagine based on our experience from
> the
> > past releases are:
> > - CI monitoring: Watching the #builds Slack channel for test
> instabilities
> > - Regression test monitoring: Watching the #flink-dev-benchmarks for
> > regressions
> > - Coordination: Announcements, documentation, organizational stuff (e.g.
> > release call)
> >
> > Having dedicated roles might help finding volunteers for the release
> > management. Volunteers could sign up for a dedicated role. Each role
> would
> > provide a clear set of tasks/responsibilities. They don't restrict us in
> > any way because multiple roles can be also fulfilled by a single person.
> > It's just about cutting the tasks into smaller chunks to make it less
> > overwhelming.
> >
> > Matthias
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/Release+Management+and+Feature+Plan
> > [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Jira+Process
> >
> > On Tue, Jan 31, 2023 at 5:16 PM Jing Ge 
> wrote:
> >
> > > Hi Yanfei,
> > >
> > > Thanks for your proposal and effort of driving it.
> > >
> > > I really like the document on how to deal with the performance
> regressions.
> > > This will coach more developers to be able to work with it. I would
> suggest
> > > that more developers will be aware of the performance regressions
> during
> > > the daily development, not only for release managers. The document
> like you
> > > drafted is the key to get them involved. It would be great if the
> document
> > > could be extended and cover all types of regressions, case by case. It
> will
> > > be a one-time effort and benefit for a long time.
> > >
> > > Best regards,
> > > Jing
> > >
> > > On Tue, Jan 31, 2023 at 12:10 PM Piotr Nowojski 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > @Dong, I think the drawback of your proposal is that it wouldn't
> detect
> > > if
> > > > there is visible performance regression within benchmark noise. While
> > > this
> > > > should be do-able with large enough number of samples
> > > >
> > > > @Yanfei, Gathering medians would have basically the same problems
> with
> > > how
> > > > to deal with two consecutive performance changes. Another issue is
> that I
> > > > think it would be great to have an algorithm, where you can enter
> what's
> 

[jira] [Created] (FLINK-30563) Update training exercises to use Flink 1.16

2023-01-04 Thread David Anderson (Jira)
David Anderson created FLINK-30563:
--

 Summary: Update training exercises to use Flink 1.16
 Key: FLINK-30563
 URL: https://issues.apache.org/jira/browse/FLINK-30563
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.16.0
Reporter: David Anderson


The training exercises in the 
[flink-training|https://github.com/apache/flink-training] repo need to be 
updated to use Flink 1.16.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Extending the feature freezing date of Flink 1.17

2023-01-02 Thread David Anderson
I'm also in favor of extending the feature freeze to Jan 31st.

David

On Thu, Dec 29, 2022 at 9:01 AM Leonard Xu  wrote:

> Thanks Qingsheng for the proposal, the pandemic has really impacted
> development schedules.
>
> Jan 31st makes sense to me.
>
>
> Best,
> Leonard
>
>


Re: [VOTE] Update Flink's Scala 2.12 support from 2.12.7 to 2.12.16

2022-12-17 Thread David Anderson
+1 (binding)



On Fri, Dec 16, 2022 at 12:22 PM Martijn Visser 
wrote:

> Hi all,
>
> I'm bumping this old vote thread once more.
>
> If we want to add Java 17 support at some point, we will need to update
> our Scala 2.12 version (see
> https://issues.apache.org/jira/browse/FLINK-25000). As explained before
> in the discussion thread, that means that we would break binary & savepoint
> compatibility for Flink's Scala users.
>
> Best regards,
>
> Martijn
>
> On 2022/06/11 13:39:54 Konstantin Knauf wrote:
> > +1 (binding)
> >
> > Am Fr., 10. Juni 2022 um 18:20 Uhr schrieb Ran Tao <
> chucheng...@gmail.com>:
> >
> > > +1 non-binding
> > >
> > > for that we can't always block at 2.12.7. but we should let users know
> > > about this binary incompatibility if execute this update.
> > > btw, the configuration
> > > of scala-maven-plugin may need to update to match target-8, target-11
> jvm
> > > format.
> > >
> > >
> > > On Fri, Jun 10, 2022 at 18:32 Martijn Visser wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > As previously discussed [1] I would like to open a vote thread for
> > > updating
> > > > Flink from Scala 2.12.7 to Scala 2.12.16 and accept that there will
> be a
> > > > binary incompatibility when this upgrade is executed.
> > > >
> > > > For this vote, I'm following the FLIP criteria since I think that
> best
> > > fits
> > > > with this proposal. That means that the vote will be open for at
> least 72
> > > > hours unless there is an objection or not enough votes. Any
> committer can
> > > > provide a binding vote.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn Visser
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser
> > > >
> > > > [1] https://lists.apache.org/thread/9ft0tbhj45fplt5j3gtb7jzbt4tdr6rh
> > > >
> > > --
> > > Best,
> > > Ran Tao
> > >
> >
> >
> > --
> > https://twitter.com/snntrable
> > https://github.com/knaufk
> >
>


[jira] [Created] (FLINK-30442) Update table walkthrough playground for 1.16

2022-12-16 Thread David Anderson (Jira)
David Anderson created FLINK-30442:
--

 Summary: Update table walkthrough playground for 1.16
 Key: FLINK-30442
 URL: https://issues.apache.org/jira/browse/FLINK-30442
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30440) Update operations playground for 1.16

2022-12-16 Thread David Anderson (Jira)
David Anderson created FLINK-30440:
--

 Summary: Update operations playground for 1.16
 Key: FLINK-30440
 URL: https://issues.apache.org/jira/browse/FLINK-30440
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30441) Update pyflink walkthrough playground for 1.16

2022-12-16 Thread David Anderson (Jira)
David Anderson created FLINK-30441:
--

 Summary: Update pyflink walkthrough playground for 1.16
 Key: FLINK-30441
 URL: https://issues.apache.org/jira/browse/FLINK-30441
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-30439) Update playgrounds for 1.16

2022-12-16 Thread David Anderson (Jira)
David Anderson created FLINK-30439:
--

 Summary: Update playgrounds for 1.16
 Key: FLINK-30439
 URL: https://issues.apache.org/jira/browse/FLINK-30439
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training
Reporter: David Anderson
 Fix For: 1.16.0


All of the playgrounds should be updated for Flink 1.16. This should include 
reworking the code as necessary to avoid using anything that has been 
deprecated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Release Flink 1.16.1

2022-12-15 Thread David Anderson
Martijn,

Thank you for bringing this up. From my (admittedly narrow) perspective,
I'd like to see a release sooner rather than later, as there's an
already merged bug fix I'd like to get released.

David



On Thu, Dec 15, 2022 at 1:53 PM Martijn Visser 
wrote:

> Hi everyone,
>
> I would like to open a discussion about releasing Flink 1.16.1. We've
> released Flink 1.16 at the end of October, but we already have 58 fixes
> listed for 1.16.1, including a blocker [1] on the environment variables and
> a number of critical issues. Some of the critical issues are related to the
> bugs on the Sink API, on PyFlink and some correctness issues.
>
> There are also a number of open issues with a fixVersion set to 1.16.1, so
> it would be good to understand what the community thinks of starting a
> release or if there are some fixes that should be included with 1.16.1.
>
> Best regards,
>
> Martijn
>
> [1] https://issues.apache.org/jira/browse/FLINK-30116
>


Re: [DISCUSS] Remove FlinkKafkaConsumer and FlinkKafkaProducer in the master for 1.17 release

2022-11-02 Thread David Anderson
>
> For the partition
> idleness problem could you elaborate more about it? I assume both
> FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> whether to mark the partition as idle.


As a matter of fact, no, that's not the case -- which is why I mentioned it.

The FlinkKafkaConsumer automatically treats all initially empty (or
non-existent) partitions as idle, while the KafkaSource only does this if
the WatermarkStrategy specifies that idleness handling is desired by
configuring withIdleness. This can be a source of confusion for folks
upgrading to the new connector. It most often shows up in situations where
the number of Kafka partitions is less than the parallelism of the
connector, which is a rather common occurrence in development and testing
environments.

I believe this change in behavior was made deliberately, so as to create a
more consistent experience across all FLIP-27 connectors. This isn't
something that needs to be fixed, but does need to be communicated more
clearly. Unfortunately, the whole idleness mechanism remained significantly
broken until 1.16 (considering the impact of [1] and [2]), further
complicating the situation. Because of FLINK-28975 [2], users with
partitions that are initially empty may have problems with versions before
1.15.3 (still unreleased) and 1.16.0. See [3] for an example of this
confusion.

[1] https://issues.apache.org/jira/browse/FLINK-18934 (idleness didn't work
with connected streams)
[2] https://issues.apache.org/jira/browse/FLINK-28975 (idle streams could
never become active again)
[3]
https://stackoverflow.com/questions/70096166/parallelism-in-flink-kafka-source-causes-nothing-to-execute/70101290#70101290
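
To make the behavioral difference concrete, this is roughly what users
migrating to KafkaSource now need to add themselves (a sketch; the Event
type and the specific durations are placeholders):

```java
// With the legacy FlinkKafkaConsumer, initially empty partitions were
// treated as idle automatically. With KafkaSource, idleness handling
// must be requested explicitly via the WatermarkStrategy:
WatermarkStrategy<Event> strategy =
        WatermarkStrategy
                .<Event>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                .withIdleness(Duration.ofMinutes(1));

DataStream<Event> events =
        env.fromSource(kafkaSource, strategy, "kafka-source");
```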

Best,
David

On Wed, Nov 2, 2022 at 5:26 AM Qingsheng Ren  wrote:

> Thanks Jing for starting the discussion.
>
> +1 for removing FlinkKafkaConsumer, as KafkaSource has evolved for many
> release cycles and should be stable enough. I have some concerns about the
> new Kafka sink based on sink v2, as sink v2 still has some ongoing work in
> 1.17 (maybe Yun Gao could provide some inputs). Also we found some issues
> of KafkaSink related to the internal mechanism of sink v2, like
> FLINK-29492.
>
> @David
> About the ability of DeserializationSchema#isEndOfStream, FLIP-208 is
> trying to complete this piece of the puzzle, and Hang Ruan (
> ruanhang1...@gmail.com) plans to work on it in 1.17. For the partition
> idleness problem could you elaborate more about it? I assume both
> FlinkKafkaConsumer and KafkaSource need a WatermarkStrategy to decide
> whether to mark the partition as idle.
>
> Best,
> Qingsheng
> Ververica (Alibaba)
>
> On Thu, Oct 27, 2022 at 8:06 PM Jing Ge  wrote:
>
> > Hi Dev,
> >
> > I'd like to start a discussion about removing FlinkKafkaConsumer and
> > FlinkKafkaProducer in 1.17.
> >
> > Back in the past, it was originally announced to remove it with Flink
> 1.15
> > after Flink 1.14 had been released[1]. And then postponed to the next
> 1.15
> > release which meant to remove it with Flink 1.16 but forgot to change the
> > doc[2]. I have created a PRs to fix it. Since the 1.16 release branch has
> > code freeze, it makes sense to, first of all, update the doc to say that
> > FlinkKafkaConsumer will be removed with Flink 1.17 [3][4] and second
> start
> > the discussion about removing them with the current master branch i.e.
> for
> > the coming 1.17 release. I'm all ears and looking forward to your
> feedback.
> > Thanks!
> >
> > Best regards,
> > Jing
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > [1]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > [2]
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > [3] https://github.com/apache/flink/pull/21172
> > [4] https://github.com/apache/flink/pull/21171
> >
>


Re: [DISCUSS] Remove FlinkKafkaConsumer and FlinkKafkaProducer in the master for 1.17 release

2022-11-01 Thread David Anderson
>
> [H]ow one can migrate from the
> FlinkKafkaConsumer/FlinkKafkaProducer to KafkaSource/KafkaSink, while
> preserving exactly-once guarantees etc?


The responses from Fabian Paul in [1] and [2] address the question of how
to handle the migration in terms of managing the state (where the short
answer is "arrange for Kafka to be the source of truth").

Those threads don't get into the differences in behavior between the two
implementations. Here I'm thinking about (1) the loss of
DeserializationSchema#isEndOfStream, and the fact that you can no longer
dynamically determine when the input stream has finished, and (2) the
change to how empty partitions are handled on startup (they used to be
marked idle automatically, whereas now you must use withIdleness in the
WatermarkStrategy).

[1] https://www.mail-archive.com/user@flink.apache.org/msg44618.html
[2] https://www.mail-archive.com/user@flink.apache.org/msg45864.html

On Mon, Oct 31, 2022 at 7:32 PM Piotr Nowojski  wrote:

> Hi,
>
> Maybe a stupid question, but how one can migrate from the
> FlinkKafkaConsumer/FlinkKafkaProducer to KafkaSource/KafkaSink, while
> preserving exactly-once guarantees etc? Is it possible? I've tried a quick
> search and couldn't find it, but maybe I was looking in wrong places.
>
> Best,
> Piotrek
>
> On Mon, Oct 31, 2022 at 16:40 Jing Ge wrote:
>
> > Thanks Martijn. What you said makes a lot of sense. I figure we should do
> > it in 2 steps.
> >
> >  Step 1 (with 1.17):
> > - Remove FlinkKafkaConsumer.
> > - Graduate Kafka Source from @PublicEvolving to @Public.
> > - Update doc and leave hints for customers as the reference.
> >
> > According to [1], the Kafka Sink should also be graduated with 1.17, i.e.
> > after 1.15 and 1.16 two release cycles. But since the design change from
> > SinkV1 to SinkV2 were significant and there were many change requests
> since
> > then, we'd better give the sink one more release cycle time to become
> more
> > stable. The other reason for giving the Sink more time is that the
> > experimental phase was only covered by one release cycle instead of two
> as
> > [1] suggested.
> >
> > Step 2 (with 1.18 ):
> > - Remove FlinkKafkaProducer.
> > - Graduate Kafka Sink from @PublicEvolving to @Public.
> > - Update doc and leave hints for customers as the reference.
> >
> > Best regards,
> > Jing
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
> >
> > On Thu, Oct 27, 2022 at 3:01 PM Martijn Visser  >
> > wrote:
> >
> > > Hi Jing,
> > >
> > > Thanks for opening the discussion. I see no issue with removing the
> > > FlinkKafkaConsumer, since it has been marked as deprecated and the
> Source
> > > API (which is used by the KafkaSource) is marked as @Public (at least
> the
> > > Base implementation)
> > >
> > > The successor of the FlinkKafkaProducer is the KafkaSink, which is
> using
> > > the Sink V2 API which is still marked as @PublicEvolving (Base
> > > implementation). I think that we should only remove the
> > FlinkKafkaProducer
> > > if we also mark the Sink V2 as @Public. I don't think that should be a
> > > problem (since it's based on the first Sink implementation, which was
> > > Experimental in 1.14 and got replaced with Sink V2 as PublicEvolving in
> > > 1.15).
> > >
> > > Thanks,
> > >
> > > Martijn
> > >
> > > On Thu, Oct 27, 2022 at 2:06 PM Jing Ge  wrote:
> > >
> > > > Hi Dev,
> > > >
> > > > I'd like to start a discussion about removing FlinkKafkaConsumer and
> > > > FlinkKafkaProducer in 1.17.
> > > >
> > > > Back in the past, it was originally announced to remove it with Flink
> > > 1.15
> > > > after Flink 1.14 had been released[1]. And then postponed to the next
> > > 1.15
> > > > release which meant to remove it with Flink 1.16 but forgot to change
> > the
> > > > doc[2]. I have created a PRs to fix it. Since the 1.16 release branch
> > has
> > > > code freeze, it makes sense to, first of all, update the doc to say
> > that
> > > > FlinkKafkaConsumer will be removed with Flink 1.17 [3][4] and second
> > > start
> > > > the discussion about removing them with the current master branch
> i.e.
> > > for
> > > > the coming 1.17 release. I'm all ears and looking forward to your
> > > feedback.
> > > > Thanks!
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > > [2]
> > > >
> > > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#kafka-sourcefunction
> > > > [3] https://github.com/apache/flink/pull/21172
> > > > [4] https://github.com/apache/flink/pull/21171
> > > >
> > >
> >
>


Re: [VOTE] Drop Gelly

2022-10-19 Thread David Anderson
+1

On Wed, Oct 12, 2022 at 10:59 PM Martijn Visser 
wrote:

> Hi everyone,
>
> I would like to open a vote for dropping Gelly, which was discussed a long
> time ago but never put to a vote [1].
>
> Voting will be open for at least 72 hours.
>
> Best regards,
>
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> [1] https://lists.apache.org/thread/2m6wtgjvxcogbf9d5q7mqt4ofqjf2ojc
>


Re: [VOTE] FLIP-265 Deprecate and remove Scala API support

2022-10-19 Thread David Anderson
+1



On Mon, Oct 17, 2022 at 3:39 PM Martijn Visser 
wrote:

> Hi everyone,
>
> I'm hereby opening a vote for FLIP-265 Deprecate and remove Scala API
> support. The related discussion can be found here [1].
>
> Voting will be open for at least 72 hours.
>
> Best regards,
>
> Martijn
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> [1] https://lists.apache.org/thread/d3borhdzj496nnggohq42fyb6zkwob3h
>


[jira] [Created] (FLINK-28975) withIdleness marks all streams from FLIP-27 sources as idle

2022-08-15 Thread David Anderson (Jira)
David Anderson created FLINK-28975:
--

 Summary: withIdleness marks all streams from FLIP-27 sources as 
idle
 Key: FLINK-28975
 URL: https://issues.apache.org/jira/browse/FLINK-28975
 Project: Flink
  Issue Type: Bug
  Components: API / DataStream
Affects Versions: 1.15.1
Reporter: David Anderson
 Fix For: 1.16.0


Using withIdleness with a FLIP-27 source leads to all of the streams from the 
source being marked idle, which in turn leads to incorrect results, e.g., from 
joins that rely on watermarks.

Quoting from the user ML thread:

In org.apache.flink.streaming.api.operators.SourceOperator, there are separate 
instances of WatermarksWithIdleness created for each split output and the main 
output. There is multiplexing of watermarks between split outputs but no 
multiplexing between split output and main output.
 
For a source such as org.apache.flink.connector.kafka.source.KafkaSource, 
there is only output from splits and no output from main. Hence 
the main output will (after an initial timeout) be marked as idle.

The implementation of WatermarksWithIdleness is such that 
once an output is idle, it will periodically re-mark the output as idle. Since 
there is no multiplexing between split outputs and main output, the idle marks 
coming from main output will repeatedly set the output to idle even though 
there are events from the splits. Result is that the entire source is 
repeatedly marked as idle.


See this ML thread for more details: 
[https://lists.apache.org/thread/bbokccohs16tzkdtybqtv1vx76gqkqj4]

This probably affects older versions of Flink as well, but that needs to be 
verified.
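The interaction described above can be sketched with a tiny self-contained model. Everything below is hypothetical (the class and method names are invented and the logic is heavily simplified; this is not the actual Flink implementation): each output gets its own independent idleness detector, and because the detectors for split outputs and the main output are not multiplexed, an event-free main output drags the combined status to idle.

```java
// Simplified model of the per-output idleness detection described above.
// All names here are hypothetical; this is only a sketch of why an
// event-free main output poisons the overall source status.
public class IdlenessSketch {

    // Each output independently reports IDLE once no event has been seen
    // for `timeout` time units (mirroring the WatermarksWithIdleness idea).
    static String status(long now, long lastEventTime, long timeout) {
        return (now - lastEventTime) >= timeout ? "IDLE" : "ACTIVE";
    }

    public static void main(String[] args) {
        long timeout = 5;

        // Split outputs keep receiving Kafka records...
        String splitStatus = status(10, 9, timeout);
        // ...but the main output never sees an event after startup.
        String mainStatus = status(10, 0, timeout);

        // With no multiplexing between split outputs and the main output,
        // the periodic idle marks from main win, so the whole source is
        // repeatedly marked idle despite the active splits.
        String combined = mainStatus.equals("IDLE") ? "IDLE" : splitStatus;

        System.out.println("split=" + splitStatus
                + " main=" + mainStatus
                + " combined=" + combined);
    }
}
```

In the real source the split detectors are multiplexed with each other, but not with the main output's detector; that missing multiplexing is the gap this ticket tracks.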



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Bump Kafka to 3.2.1 for 1.16.0

2022-08-09 Thread David Anderson
I'm in favor of adopting this fix in 1.16.0.

+1

On Tue, Aug 9, 2022 at 7:13 AM tison  wrote:

> +1
>
> This looks reasonable.
>
> Best,
> tison.
>
>
> Thomas Weise  于2022年8月9日周二 21:33写道:
>
> > +1 for bumping the Kafka dependency.
> >
> > Flink X.Y.0 releases require thorough testing, so considering the
> severity
> > of the problem this is still good timing, even that close to the first
> RC.
> >
> > Thanks for bringing this up.
> >
> > Thomas
> >
> > On Tue, Aug 9, 2022 at 7:51 AM Chesnay Schepler 
> > wrote:
> >
> > > Hello,
> > >
> > > The Kafka upgrade in 1.15.0 resulted in a regression
> > > (https://issues.apache.org/jira/browse/FLINK-28060) where offsets are
> > > not committed to Kafka, impeding monitoring and the starting offsets
> > > functionality of the connector.
> > >
> > > This was fixed about a week ago in Kafka 3.2.1.
> > >
> > > The question is whether we want to upgrade Kafka so close to the
> feature
> > > freeze. I'm usually not a friend of doing that in general, but in this
> > > case there is a specific issue we'd like to get fixed and we still have
> > > the entire duration of the feature freeze to observe the behavior.
> > >
> > > I'd like to know what you think about this.
> > >
> > > For reference, our current Kafka version is 3.1.1, and our CI is
> passing
> > > with 3.2.1.
> > >
> > >
> > >
> >
>


[jira] [Created] (FLINK-28754) document that Java 8 is required to build table store

2022-07-30 Thread David Anderson (Jira)
David Anderson created FLINK-28754:
--

 Summary: document that Java 8 is required to build table store
 Key: FLINK-28754
 URL: https://issues.apache.org/jira/browse/FLINK-28754
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table Store
Reporter: David Anderson


The table store cannot be built with Java 11, but the "build from source" 
instructions don't mention this restriction.

https://nightlies.apache.org/flink/flink-table-store-docs-master/docs/engines/build/





Re: [VOTE] FLIP-251: Support collecting arbitrary number of streams

2022-07-20 Thread David Anderson
+1

Thank you Chesnay.

On Tue, Jul 19, 2022 at 3:09 PM Alexander Fedulov 
wrote:

> +1
> Looking forward to using the API to simplify tests setups.
>
> Best,
> Alexander Fedulov
>
> On Tue, Jul 19, 2022 at 2:31 PM Martijn Visser 
> wrote:
>
> > Thanks for creating the FLIP and opening the vote Chesnay.
> >
> > +1 (binding)
> >
> > Op di 19 jul. 2022 om 10:26 schreef Chesnay Schepler  >:
> >
> > > I'd like to proceed with the vote for FLIP-251 [1], as no objections or
> > > issues were raised in [2].
> > >
> > > The vote will last for at least 72 hours unless there is an objection
> or
> > > insufficient votes.
> > >
> > >   [1]
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams
> > >   [2] https://lists.apache.org/thread/ksv71m7rvcwslonw07h2qnw77zpqozvh
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-251: Support collecting arbitrary number of streams

2022-07-08 Thread David Anderson
I've found that with our current tooling it's frustrating to try to write
good end-to-end tests for real-world jobs with multiple sinks.
DataStream#executeAndCollect() is okay for simple pipelines with one sink,
but in my opinion we do need something like FLIP-251.

The proposed interface looks good to me. I look forward to trying it.

Chesnay, thanks for putting this together!

David

On Thu, Jul 7, 2022 at 7:35 AM Chesnay Schepler  wrote:

> Hello,
>
> I have created a FLIP for slightly extending the collect() API to
> support collecting an arbitrary number of streams, to make it more
> useful for testing complex workflows.
>
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-251%3A+Support+collecting+arbitrary+number+of+streams
>
> Looking forward to any feedback.
>
> Regards,
>
> Chesnay
>
>


[ANNOUNCE] Apache Flink 1.15.1 released

2022-07-07 Thread David Anderson
The Apache Flink community is very happy to announce the release of Apache
Flink 1.15.1, which is the first bugfix release for the Apache Flink 1.15
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:

https://flink.apache.org/news/2022/07/06/release-1.15.1.html

The full release notes are available in Jira:


https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351546

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,
David Anderson


[RESULT][VOTE] Release 1.15.1, release candidate #1

2022-07-05 Thread David Anderson
I am pleased to announce that we have approved this release candidate.

There are 7 approving votes, 4 of which are binding:
- Chesnay Schepler (+1 binding)
- Xingbo Huang (+1 non-binding)
- Qingsheng Ren (+1 non-binding)
- Robert Metzger (+1 binding)
- Konstantin Knauf (+1 binding)
- Jingsong Li (+0 binding)
- Danny Cranmer (+0 non-binding)

There are no disapproving votes.

Thank you for verifying the release candidate. I will now proceed to
finalize the release and announce it once everything is published.

Best regards,
David


[VOTE] Release 1.15.1, release candidate #1

2022-06-22 Thread David Anderson
Hi everyone,

Please review and vote on release candidate #1 for version 1.15.1, as
follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint E982F098 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.15.1-rc1" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
David

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351546
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.15.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1511/
[5] https://github.com/apache/flink/tree/release-1.15.1-rc1
[6] https://github.com/apache/flink-web/pull/554


Re: [DISCUSS] FLIP-238: Introduce FLIP-27-based Data Generator Source

2022-06-20 Thread David Anderson
I'm very happy with this. +1

A lot of SourceFunction implementations used in demos/POC implementations
include a call to sleep(), so adding rate limiting is a good idea, in my
opinion.

Best,
David

On Mon, Jun 20, 2022 at 10:10 AM Qingsheng Ren  wrote:

> Hi Alexander,
>
> Thanks for creating this FLIP! I’d like to share some thoughts.
>
> 1. About the “generatorFunction” I’m expecting an initializer on it
> because it’s hard to require all fields in the generator function are
> serializable in user’s implementation. Providing a function like “open” in
> the interface could let the function to make some initializations in the
> task initializing stage.
>
> 2. As for the throttling functionality you mentioned, there’s a
> FlinkConnectorRateLimiter under flink-core and maybe we could reuse this
> interface. Actually I prefer to make rate limiting as a common feature
> provided in the Source API, but this requires another FLIP and a lot of
> discussions so I’m OK to have it in the DataGen source first.
>
> Best regards,
> Qingsheng
>
>
> > On Jun 17, 2022, at 01:47, Alexander Fedulov 
> wrote:
> >
> > Hi Jing,
> >
> > thanks for your thorough analysis. I agree with the points you make and
> > also with the idea to approach the larger task of providing a universal
> > (DataStream + SQL) data generator base iteratively.
> > Regarding the name, the SourceFunction-based *DataGeneratorSource*
> resides
> > in the *org.apache.flink.streaming.api.functions.source.datagen*. I think
> > it is OK to simply place the new one (with the same name) next to the
> > *NumberSequenceSource* into *org.apache.flink.api.connector.source.lib*.
> >
> > One more thing I wanted to discuss:  I noticed that *DataGenTableSource
> *has
> > built-in throttling functionality (*rowsPerSecond*). I believe it is
> > something that could be also useful for the DataStream users of the
> > stateless data generator and since we want to eventually converge on the
> > same implementation for DataStream and Table/SQL it sounds like a good
> idea
> > to add it to the FLIP. What do you think?
> >
> > Best,
> > Alexander Fedulov
> >
> >
> > On Tue, Jun 14, 2022 at 7:17 PM Jing Ge  wrote:
> >
> >> Hi,
> >>
> >> After reading all discussions posted in this thread and the source code
> of
> >> DataGeneratorSource which unfortunately used "Source" instead of
> >> "SourceFunction" in its name, issues could summarized as following:
> >>
> >> 1. The current DataGeneratorSource based on SourceFunction is a stateful
> >> source connector and built for Table/SQL.
> >> 2. The right name for the new data generator source i.e.
> >> DataGeneratorSource has been used for the current implementation based
> on
> >> SourceFunction.
> >> 3. A new data generator source should be developed based on the new
> Source
> >> API.
> >> 4. The new data generator source should be used both for DataStream and
> >> Table/SQL, which means the current DataGeneratorSource should be
> replaced
> >> with the new one.
> >> 5. The core event generation logic should be pluggable to support
> various
> >> (test) scenarios, e.g. random stream, changelog stream, controllable
> events
> >> per checkpoint, etc.
> >>
> >> which turns out that
> >>
> >> To solve 1+3+4 -> we will have to make a big effort to replace the
> current
> >> DataGeneratorSource since the new Source API has a very different
> >> concept, especially for the stateful part.
> >> To solve 2+3 -> we have to find another name for the new implementation.
> >> To solve 1+3+4+5 -> It gets even more complicated to support stateless
> and
> >> stateful scenarios simultaneously with one solution.
> >>
> >> If we want to solve all of these issues in one shot, It might take
> months.
> >> Therefore, I would suggest starting from small and growing up
> iteratively.
> >>
> >> The proposal for the kickoff is to focus on stateless event generation
> >> with e.g. random stream and use the name "StatelessDataGeneratorSource".
> >> There will be a period of time that both DataGeneratorSources will be used
> by
> >> the developer. The current DataGeneratorSource will then be deprecated,
> >> once we can(iteratively):
> >> 1. either enlarge the scope of StatelessDataGeneratorSource to be able
> to
> >> cover stateful scenarios and renaming it to
> "DataGeneratorSourceV2"(follow
> >> the naming convention of SinkV2) or
> >> 2. develop a new "StatefulDataGeneratorSource" based on Source API which
> >> can handle the stateful scenarios, if it is impossible to support both
> >> stateless and stateful scenarios with one GeneratorSource
> implementation.
> >>
> >> Best regards,
> >> Jing
> >>
> >> On Thu, Jun 9, 2022 at 2:48 PM Martijn Visser  >
> >> wrote:
> >>
> >>> Hey Alex,
> >>>
> >>> Yes, I think we need to make sure that we're not causing confusion (I
> know
> >>> I already was confused). I think the DataSupplierSource is already
> better,
> >>> but perhaps there are others who have an even better idea.
> >>>
> >>> Thanks,
> >>>
> >>> Martijn
> >>>

Re: [DISCUSS] Releasing 1.15.1

2022-06-17 Thread David Anderson
After hitting pause on the 1.15.1 release a couple of days ago because of
FLINK-28060 [1], I want to hit resume now. You should go read that ticket
if you want the details, but the summary is that the upgrade of the Kafka
Clients to 2.8.1 that was done in Flink 1.15.0 is causing a bug that means
that after a Kafka broker restarts, subsequent attempts by Flink to commit
Kafka offsets will fail until Flink is restarted. The only good fix
available to us appears to be upgrading the Kafka Clients to version 3.1.1.

It doesn't seem wise to upgrade the Kafka Clients from 2.8.1 to 3.1.1 for
Flink 1.15.1. That's a big change to make this close to our release, and we
shouldn't delay this release any further. So here's a proposal:

   - We bump the Kafka Clients to 3.1.1 in master now.
   - We don't try to fix FLINK-28060 for 1.15.1.


   - We create the Flink 1.15.1 release straight away, noting that there's
   a known issue with Kafka (FLINK-28060).
   - We reach out to the Kafka community to see if they're willing to
   create a 2.8.2 release with a backport of the fix for this bug.
   - In parallel, we merge a bump of the Kafka Clients to 3.1.1 after
   1.15.1 is done, to see how it behaves on the CI for the next few weeks, and
   plan a quick Flink 1.15.2 release (most likely something like a month
   later).


[1] https://issues.apache.org/jira/browse/FLINK-28060

Best,
David

On Wed, Jun 15, 2022 at 11:37 AM David Anderson 
wrote:

> I'm now thinking we should delay 1.15.1 long enough to see if we can
> include a fix for FLINK-28060 [1], which is a serious regression affecting
> several Kafka users.
>
> [1] https://issues.apache.org/jira/browse/FLINK-28060
>
> On Fri, Jun 10, 2022 at 12:15 PM David Anderson 
> wrote:
>
>> Since no one has brought up any blockers, I'll plan to start the release
>> process on Monday unless I hear otherwise.
>>
>> Best,
>> David
>>
>> On Thu, Jun 9, 2022 at 10:20 AM Yun Gao 
>> wrote:
>>
>>> Hi David,
>>>
>>> Very thanks for driving the new version, also +1 since we already
>>> accumulated some fixes.
>>>
>>> Regarding https://issues.apache.org/jira/browse/FLINK-27492, currently
>>> there are still some
>>> controversy with how to deal with the artifacts. I also agree we may not
>>> hold up the release
>>> with this issue. We'll try to reach to the consensus as soon as possible
>>> to try best catching
>>> up with the release.
>>>
>>> Best,
>>> Yun
>>>
>>>
>>>
>>> --
>>> From:LuNing Wang 
>>> Send Time:2022 Jun. 9 (Thu.) 10:10
>>> To:dev 
>>> Subject:Re: [DISCUSS] Releasing 1.15.1
>>>
>>> Hi David,
>>>
>>> +1
>>> Thank you for driving this.
>>>
>>> Best regards,
>>> LuNing Wang
>>>
>>> Jing Ge  于2022年6月8日周三 20:45写道:
>>>
>>> > +1
>>> >
>>> > Thanks David for driving it!
>>> >
>>> > Best regards,
>>> > Jing
>>> >
>>> >
>>> > On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang 
>>> wrote:
>>> >
>>> > > Hi David,
>>> > >
>>> > > +1
>>> > > Thank you for driving this.
>>> > >
>>> > > Best,
>>> > > Xingbo
>>> > >
>>> > > Chesnay Schepler  于2022年6月8日周三 18:41写道:
>>> > >
>>> > > > +1
>>> > > >
>>> > > > Thank you for proposing this. I can take care of the PMC-side of
>>> > things.
>>> > > >
>>> > > > On 08/06/2022 12:37, Jingsong Li wrote:
>>> > > > > +1
>>> > > > >
>>> > > > > Thanks David for volunteering to manage the release.
>>> > > > >
>>> > > > > Best,
>>> > > > > Jingsong
>>> > > > >
>>> > > > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
>>> > > > >> Hi David, thank you for driving the release.
>>> > > > >>
>>> > > > >> +1 for the 1.15.1 release. The release-1.15 branch
>>> > > > >> already contains many bug fixes and some SQL
>>> > > > >> issues are quite critical.
>>> > > > >>
>>> > > > >> Btw, FLINK-27606 has been merged just now.
>>> > > > >>
>>> > > > >> Best,
>>> > > > >> Jark

Re: [DISCUSS] Releasing 1.15.1

2022-06-15 Thread David Anderson
I'm now thinking we should delay 1.15.1 long enough to see if we can
include a fix for FLINK-28060 [1], which is a serious regression affecting
several Kafka users.

[1] https://issues.apache.org/jira/browse/FLINK-28060

On Fri, Jun 10, 2022 at 12:15 PM David Anderson 
wrote:

> Since no one has brought up any blockers, I'll plan to start the release
> process on Monday unless I hear otherwise.
>
> Best,
> David
>
> On Thu, Jun 9, 2022 at 10:20 AM Yun Gao 
> wrote:
>
>> Hi David,
>>
>> Very thanks for driving the new version, also +1 since we already
>> accumulated some fixes.
>>
>> Regarding https://issues.apache.org/jira/browse/FLINK-27492, currently
>> there are still some
>> controversy with how to deal with the artifacts. I also agree we may not
>> hold up the release
>> with this issue. We'll try to reach to the consensus as soon as possible
>> to try best catching
>> up with the release.
>>
>> Best,
>> Yun
>>
>>
>>
>> --
>> From:LuNing Wang 
>> Send Time:2022 Jun. 9 (Thu.) 10:10
>> To:dev 
>> Subject:Re: [DISCUSS] Releasing 1.15.1
>>
>> Hi David,
>>
>> +1
>> Thank you for driving this.
>>
>> Best regards,
>> LuNing Wang
>>
>> Jing Ge  于2022年6月8日周三 20:45写道:
>>
>> > +1
>> >
>> > Thanks David for driving it!
>> >
>> > Best regards,
>> > Jing
>> >
>> >
>> > On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang  wrote:
>> >
>> > > Hi David,
>> > >
>> > > +1
>> > > Thank you for driving this.
>> > >
>> > > Best,
>> > > Xingbo
>> > >
>> > > Chesnay Schepler  于2022年6月8日周三 18:41写道:
>> > >
>> > > > +1
>> > > >
>> > > > Thank you for proposing this. I can take care of the PMC-side of
>> > things.
>> > > >
>> > > > On 08/06/2022 12:37, Jingsong Li wrote:
>> > > > > +1
>> > > > >
>> > > > > Thanks David for volunteering to manage the release.
>> > > > >
>> > > > > Best,
>> > > > > Jingsong
>> > > > >
>> > > > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
>> > > > >> Hi David, thank you for driving the release.
>> > > > >>
>> > > > >> +1 for the 1.15.1 release. The release-1.15 branch
>> > > > >> already contains many bug fixes and some SQL
>> > > > >> issues are quite critical.
>> > > > >>
>> > > > >> Btw, FLINK-27606 has been merged just now.
>> > > > >>
>> > > > >> Best,
>> > > > >> Jark
>> > > > >>
>> > > > >>
>> > > > >> On Wed, 8 Jun 2022 at 17:40, David Anderson <
>> dander...@apache.org>
>> > > > wrote:
>> > > > >>
>> > > > >>> I would like to start a discussion on releasing 1.15.1. Flink
>> 1.15
>> > > was
>> > > > >>> released on the 5th of May [1] and so far 43 issues have been
>> > > resolved,
>> > > > >>> including several user-facing issues with blocker and critical
>> > > > priorities
>> > > > >>> [2]. (The recent problem with FileSink rolling policies not
>> working
>> > > > >>> properly in 1.15.0 got me thinking it might be time for bug-fix
>> > > > release.)
>> > > > >>>
>> > > > >>> There currently remain 16 unresolved tickets with a fixVersion
>> of
>> > > > 1.15.1
>> > > > >>> [3], five of which are about CI infrastructure and tests. There
>> is
>> > > > only one
>> > > > >>> such ticket marked Critical:
>> > > > >>>
>> > > > >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table
>> > scala
>> > > > >>> example
>> > > > >>> does not including the scala-api jars
>> > > > >>>
>> > > > >>> I'm not convinced we should hold up a release for this issue,
>> but
>> > on
>> > > > the
>> > > > >>> other hand, it seems that this issue c

Re: [DISCUSS ] Make state.backend.incremental as true by default

2022-06-14 Thread David Anderson
Thank you for bringing this up!

+1


On Mon, Jun 13, 2022 at 1:48 PM Rui Fan <1996fan...@gmail.com> wrote:

> Strongly +1
>
> Best,
> Rui Fan
>
> On Mon, Jun 13, 2022 at 7:35 PM Martijn Visser 
> wrote:
>
> > > BTW, from my knowledge, nothing would happen for HashMapStateBackend,
> > which does not support incremental checkpoint yet, when enabling
> > incremental checkpoints.
> >
> > Thanks Yun, if no errors would occur then definitely +1 to enable it by
> > default
> >
> > Op ma 13 jun. 2022 om 12:42 schreef Alexander Fedulov <
> > alexan...@ververica.com>:
> >
> > > +1
> > >
> > > From my experience, it is actually hard to come up with use cases where
> > > incremental checkpoints should explicitly not be enabled with the
> RocksDB
> > > state backend. If the state is so small that the full snapshots do not
> > > have any negative impact, one should consider using HashMapStateBackend
> > > anyway.
> > >
> > > Best,
> > > Alexander Fedulov
> > >
> > >
> > > On Mon, Jun 13, 2022 at 12:26 PM Jing Ge  wrote:
> > >
> > > > +1
> > > >
> > > > Glad to see the kickoff of this discussion. Thanks Lihe for driving
> > this!
> > > >
> > > > We have actually already discussed it internally a few months ago.
> > After
> > > > considering some corner cases, all agreed on enabling the incremental
> > > > checkpoint as default.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Mon, Jun 13, 2022 at 12:17 PM Yun Tang  wrote:
> > > >
> > > > > Strongly +1 for making incremental checkpoints as default. Many
> users
> > > > have
> > > > > ever been asking why this configuration is not enabled by default.
> > > > >
> > > > > BTW, from my knowledge, nothing would happen for
> HashMapStateBackend,
> > > > > which does not support incremental checkpoint yet, when enabling
> > > > > incremental checkpoints.
> > > > >
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > > 
> > > > > From: Martijn Visser 
> > > > > Sent: Monday, June 13, 2022 18:05
> > > > > To: dev@flink.apache.org 
> > > > > Subject: Re: [DISCUSS ] Make state.backend.incremental as true by
> > > default
> > > > >
> > > > > Hi Lihe,
> > > > >
> > > > > What happens if we enable incremental checkpoints by default while
> > the
> > > > used
> > > > > memory backend is HashMapStateBackend, which doesn't support
> > > incremental
> > > > > checkpoints?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > Op ma 13 jun. 2022 om 11:59 schreef Lihe Ma :
> > > > >
> > > > > > Hi, Everyone,
> > > > > >
> > > > > > I would like to open a discussion on setting incremental
> checkpoint
> > > as
> > > > > > default behavior.
> > > > > >
> > > > > > Currently, the configuration of state.backend.incremental is set
> as
> > > > false
> > > > > > by default. Incremental checkpoint has been adopted widely in
> > > industry
> > > > > > community for many years, and it is also well-tested from the
> > > feedback
> > > > > in
> > > > > > the community discussion. Incremental checkpointing is more
> > > > > lightweight:
> > > > > > shorter checkpoint duration, less uploaded data and less resource
> > > > > > consumption.
> > > > > >
> > > > > > In terms of backward compatibility, enabling incremental
> > checkpointing
> > > > > would
> > > > > > not cause any data loss, whether restoring from a full
> > > > > checkpoint/savepoint
> > > > > > or an incremental checkpoint.
> > > > > >
> > > > > > FLIP-193 (Snapshot ownership)[1] has been released in 1.15,
> > > incremental
> > > > > > checkpoint no longer depends on a previous restored checkpoint in
> > > > default
> > > > > > NO_CLAIM mode, which makes the checkpoint lineage much cleaner,
> it
> > > is a
> > > > > > good chance to change the configuration state.backend.incremental
> > to
> > > > true
> > > > > > as default.
> > > > > >
> > > > > > Thus, based on the above discussion, I suggest to make
> > > > > > state.backend.incremental as true by default. What do you think
> of
> > > this
> > > > > > proposal?
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
> > > > > >
> > > > > > Best regards,
> > > > > > Lihe Ma
> > > > > >
> > > > >
> > > >
> > >
> >
>
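For readers following along, the option under discussion is set in flink-conf.yaml. A minimal illustrative sketch (the checkpoint directory is hypothetical, and `state.backend.incremental` only takes effect with a backend that supports incremental snapshots, such as RocksDB):

```yaml
# flink-conf.yaml (illustrative values)
state.backend: rocksdb
state.backend.incremental: true      # the proposal: make true the default
state.checkpoints.dir: s3://my-bucket/checkpoints   # hypothetical location
```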


Re: [DISCUSS] Releasing 1.15.1

2022-06-10 Thread David Anderson
Since no one has brought up any blockers, I'll plan to start the release
process on Monday unless I hear otherwise.

Best,
David

On Thu, Jun 9, 2022 at 10:20 AM Yun Gao 
wrote:

> Hi David,
>
> Very thanks for driving the new version, also +1 since we already
> accumulated some fixes.
>
> Regarding https://issues.apache.org/jira/browse/FLINK-27492, currently
> there are still some
> controversy with how to deal with the artifacts. I also agree we may not
> hold up the release
> with this issue. We'll try to reach to the consensus as soon as possible
> to try best catching
> up with the release.
>
> Best,
> Yun
>
>
>
> --
> From:LuNing Wang 
> Send Time:2022 Jun. 9 (Thu.) 10:10
> To:dev 
> Subject:Re: [DISCUSS] Releasing 1.15.1
>
> Hi David,
>
> +1
> Thank you for driving this.
>
> Best regards,
> LuNing Wang
>
> Jing Ge  于2022年6月8日周三 20:45写道:
>
> > +1
> >
> > Thanks David for driving it!
> >
> > Best regards,
> > Jing
> >
> >
> > On Wed, Jun 8, 2022 at 2:32 PM Xingbo Huang  wrote:
> >
> > > Hi David,
> > >
> > > +1
> > > Thank you for driving this.
> > >
> > > Best,
> > > Xingbo
> > >
> > > Chesnay Schepler  于2022年6月8日周三 18:41写道:
> > >
> > > > +1
> > > >
> > > > Thank you for proposing this. I can take care of the PMC-side of
> > things.
> > > >
> > > > On 08/06/2022 12:37, Jingsong Li wrote:
> > > > > +1
> > > > >
> > > > > Thanks David for volunteering to manage the release.
> > > > >
> > > > > Best,
> > > > > Jingsong
> > > > >
> > > > > On Wed, Jun 8, 2022 at 6:21 PM Jark Wu  wrote:
> > > > >> Hi David, thank you for driving the release.
> > > > >>
> > > > >> +1 for the 1.15.1 release. The release-1.15 branch
> > > > >> already contains many bug fixes and some SQL
> > > > >> issues are quite critical.
> > > > >>
> > > > >> Btw, FLINK-27606 has been merged just now.
> > > > >>
> > > > >> Best,
> > > > >> Jark
> > > > >>
> > > > >>
> > > > >> On Wed, 8 Jun 2022 at 17:40, David Anderson  >
> > > > wrote:
> > > > >>
> > > > >>> I would like to start a discussion on releasing 1.15.1. Flink
> 1.15
> > > was
> > > > >>> released on the 5th of May [1] and so far 43 issues have been
> > > resolved,
> > > > >>> including several user-facing issues with blocker and critical
> > > > priorities
> > > > >>> [2]. (The recent problem with FileSink rolling policies not
> working
> > > > >>> properly in 1.15.0 got me thinking it might be time for bug-fix
> > > > release.)
> > > > >>>
> > > > >>> There currently remain 16 unresolved tickets with a fixVersion of
> > > > 1.15.1
> > > > >>> [3], five of which are about CI infrastructure and tests. There
> is
> > > > only one
> > > > >>> such ticket marked Critical:
> > > > >>>
> > > > >>> https://issues.apache.org/jira/browse/FLINK-27492 Flink table
> > scala
> > > > >>> example
> > > > >>> does not including the scala-api jars
> > > > >>>
> > > > >>> I'm not convinced we should hold up a release for this issue, but
> > on
> > > > the
> > > > >>> other hand, it seems that this issue can be resolved by making a
> > > > decision
> > > > >>> about how to handle the missing dependencies. @Timo Walther
> > > > >>>  @yun_gao can you give an update?
> > > > >>>
> > > > >>> Two other open issues seem to have made significant progress
> > (listed
> > > > >>> below). Would it make sense to wait for either of these? Are
> there
> > > any
> > > > >>> other open tickets we should consider waiting for?
> > > > >>>
> > > > >>> https://issues.apache.org/jira/browse/FLINK-27420 Suspended
> > > > SlotManager
> > > > >>> fail to reregister metrics when started again
> > > > >>> https://issues.apache.org/jira/browse/FLINK-27606
> CompileException
> > > > when
> > > > >>> using UDAF with merge() method
> > > > >>>
> > > > >>> I would volunteer to manage the release. Is there a PMC member
> who
> > > > would
> > > > >>> join me to help, as needed?
> > > > >>>
> > > > >>> Best,
> > > > >>> David
> > > > >>>
> > > > >>> [1]
> > https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > > >>>
> > > > >>> [2]
> > > > >>>
> > > > >>>
> > > >
> > >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > > >>>
> > > > >>> [3]
> > > > >>>
> > > > >>>
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC
> > > > >>>
> > > >
> > > >
> > >
> >
>
>


Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-10 Thread David Anderson
> > > >>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
> > > > >>>>> [2]
> > > https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs
> > > > >>>>>
> > > > >>>>> On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov <
> > > > >>>> alexan...@ververica.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hey Austin,
> > > > >>>>>>
> > > > >>>>>> Since we are getting deeper into the implementation details of
> > the
> > > > >>>>>> DataGeneratorSource
> > > > >>>>>> and it is not the main topic of this thread, I propose to move
> > our
> > > > >>>>>> discussion to where it belongs: [DISCUSS] FLIP-238 [1]. Could
> > you
> > > > >>>> please
> > > > >>>>>> briefly formulate your requirements to make it easier for the
> > > > >> others
> > > > >>> to
> > > > >>>>>> follow? I am happy to continue this conversation there.
> > > > >>>>>>
> > > > >>>>>> [1]
> > > > >> https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Alexander Fedulov
> > > > >>>>>>
> > > > >>>>>> On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <
> > > > >>>>>> austin.caw...@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>>> @Austin, in the FLIP I mentioned above [1], the user is
> > > > >> expected
> > > > >>> to
> > > > >>>>>>> pass a MapFunction > > > >>>>>>> OUT>
> > > > >>>>>>> to the generator. I wonder if you could have your external
> > client
> > > > >>> and
> > > > >>>>>>> polling logic wrapped in a custom
> > > > >>>>>>> MapFunction implementation class? Would that answer your
> needs
> > or
> > > > >>> do
> > > > >>>>> you
> > > > >>>>>>> have some
> > > > >>>>>>> more sophisticated scenario in mind?
> > > > >>>>>>>
> > > > >>>>>>> At first glance, the FLIP looks good but for this case in
> > regards
> > > > >>> to
> > > > >>>>> the
> > > > >>>>>>> map function, but leaves out 1) ability to control polling
> > > > >>> intervals
> > > > >>>>> and
> > > > >>>>>> 2)
> > > > >>>>>>> ability to produce an unknown number of records, both
> per-poll
> > > > >> and
> > > > >>>>>> overall
> > > > >>>>>>> boundedness. Do you think something like this could be built
> > from
> > > > >>> the
> > > > >>>>>> same
> > > > >>>>>>> pieces?
> > > > >>>>>>> I'm also wondering what handles threading, is that on the
> user
> > or
> > > > >>> is
> > > > >>>>> that
> > > > >>>>>>> part of the DataGeneratorSource?
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Austin
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <
> > > > >>>>>> alexan...@ververica.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi everyone,
> > > > >>>>>>>>
>

[DISCUSS] Releasing 1.15.1

2022-06-08 Thread David Anderson
I would like to start a discussion on releasing 1.15.1. Flink 1.15 was
released on the 5th of May [1] and so far 43 issues have been resolved,
including several user-facing issues with blocker and critical priorities
[2]. (The recent problem with FileSink rolling policies not working
properly in 1.15.0 got me thinking it might be time for a bug-fix release.)

There currently remain 16 unresolved tickets with a fixVersion of 1.15.1
[3], five of which are about CI infrastructure and tests. There is only one
such ticket marked Critical:

https://issues.apache.org/jira/browse/FLINK-27492 Flink table scala example
does not including the scala-api jars

I'm not convinced we should hold up a release for this issue, but on the
other hand, it seems that this issue can be resolved by making a decision
about how to handle the missing dependencies. @Timo Walther
 @yun_gao can you give an update?

Two other open issues seem to have made significant progress (listed
below). Would it make sense to wait for either of these? Are there any
other open tickets we should consider waiting for?

https://issues.apache.org/jira/browse/FLINK-27420 Suspended SlotManager
fail to reregister metrics when started again
https://issues.apache.org/jira/browse/FLINK-27606 CompileException when
using UDAF with merge() method

I would volunteer to manage the release. Is there a PMC member who would
join me to help, as needed?

Best,
David

[1] https://flink.apache.org/news/2022/05/05/1.15-announcement.html

[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC

[3]
https://issues.apache.org/jira/browse/FLINK-27492?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.15.1%20ORDER%20BY%20priority%20DESC%2C%20created%20DESC


Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-06 Thread David Anderson
>
> David, can you elaborate why you need watermark generation in the source
> for your data generators?


The training exercises should strive to provide examples of best practices.
If the exercises and their solutions use

env.fromSource(source, WatermarkStrategy.noWatermarks(), "name-of-source")
  .map(...)
  .assignTimestampsAndWatermarks(...)

this will help establish this anti-pattern as the normal way of doing
things.

Most new Flink users are using a KafkaSource with a noWatermarks strategy
and a SimpleStringSchema, followed by a map that does the real
deserialization, followed by the real watermarking -- because they aren't
seeing examples that teach how these interfaces are meant to be used.

When we redo the sources used in training exercises, I want to avoid these
pitfalls.
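
For anyone wondering what "watermarking in the sources" actually has to track, the core bounded-out-of-orderness bookkeeping is tiny. Here is a minimal sketch in plain Java, with no Flink dependency; the class name and the bound of 5 are illustrative assumptions on my part, not Flink API:

```java
import java.util.List;

/**
 * Plain-Java sketch of bounded-out-of-orderness watermarking, the
 * per-record bookkeeping a source performs when it generates
 * watermarks itself instead of relying on a downstream
 * assignTimestampsAndWatermarks. Illustrative only, not Flink API.
 */
public class BoundedOutOfOrdernessSketch {
    private long maxTimestamp = Long.MIN_VALUE;
    private final long maxOutOfOrderness;

    public BoundedOutOfOrdernessSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Track the highest event timestamp seen so far. */
    public void onEvent(long eventTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    /** Watermark = highest timestamp seen, minus the allowed lateness, minus 1. */
    public long currentWatermark() {
        return maxTimestamp - maxOutOfOrderness - 1;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(5);
        for (long ts : List.of(10L, 7L, 14L, 12L)) {
            wm.onEvent(ts);
        }
        // highest timestamp seen is 14, so the watermark is 14 - 5 - 1 = 8
        System.out.println(wm.currentWatermark());
    }
}
```

In a real source this logic would run per split, with the source emitting the minimum watermark across its splits.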

David

On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf  wrote:

> Hi everyone,
>
> very interesting thread. The proposal for deprecation seems to have sparked
> a very important discussion. Do we know what users struggle with specifically?
>
> Speaking for myself, when I upgraded flink-faker to the new Source API, an
> unbounded version of the NumberSequenceSource would have been all I needed,
> but that's just the data generator use case. I think, that one could be
> solved quite easily. David, can you elaborate why you need watermark
> generation in the source for your data generators?
>
> Cheers,
>
> Konstantin
>
>
>
>
>
> On Sun, Jun 5, 2022 at 17:48, Piotr Nowojski <
> pnowoj...@apache.org>:
>
> > Also +1 to what David has written. But it doesn't mean we should be
> waiting
> > indefinitely to deprecate SourceFunction.
> >
> > Best,
> > Piotrek
> >
> > > On Sun, Jun 5, 2022 at 16:46, Jark Wu  wrote:
> >
> > > +1 to David's point.
> > >
> > > Usually, when we deprecate some interfaces, we should point users to
> use
> > > the recommended alternatives.
> > > However, implementing the new Source interface for some simple
> scenarios
> > is
> > > too challenging and complex.
> > > We also found it isn't easy to push the internal connector to upgrade
> to
> > > the new Source because
> > > "FLIP-27 are hard to understand, while SourceFunction is easy".
> > >
> > > +1 to make implementing a simple Source easier before deprecating
> > > SourceFunction.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Sun, 5 Jun 2022 at 07:29, Jingsong Lee 
> > wrote:
> > >
> > > > +1 to David and Ingo.
> > > >
> > > > Before deprecate and remove SourceFunction, we should have some
> easier
> > > APIs
> > > > to wrap new Source, the cost to write a new Source is too high now.
> > > >
> > > >
> > > >
> > > > > > On Sun, Jun 5, 2022 at 05:32, Ingo Bürk wrote:
> > > >
> > > > > I +1 everything David said. The new Source API raised the
> complexity
> > > > > significantly. It's great to have such a rich, powerful API that
> can
> > do
> > > > > everything, but in the process we lost the ability to onboard
> people
> > to
> > > > > the APIs.
> > > > >
> > > > >
> > > > > Best
> > > > > Ingo
> > > > >
> > > > > On 04.06.22 21:21, David Anderson wrote:
> > > > > > I'm in favor of this, but I think we need to make it easier to
> > > > implement
> > > > > > data generators and test sources. As things stand in 1.15, unless
> > you
> > > > can
> > > > > > be satisfied with using a NumberSequenceSource followed by a map,
> > > > things
> > > > > > get quite complicated. I looked into reworking the data
> generators
> > > used
> > > > > in
> > > > > > the training exercises, and got discouraged by the amount of work
> > > > > involved.
> > > > > > (The sources used in the training want to be unbounded, and need
> > > > > > watermarking in the sources, which means that using
> > > > NumberSequenceSource
> > > > > > isn't an option.)
> > > > > >
> > > > > > I think the proposed deprecation will be better received if it
> can
> > be
> > > > > > accompanied by something that makes implementing a simple Source
> > > easier
> > > > > > than it is now. People are continuing to implement new
> > > Sou

Re: [DISCUSS] Deprecate SourceFunction APIs

2022-06-04 Thread David Anderson
I'm in favor of this, but I think we need to make it easier to implement
data generators and test sources. As things stand in 1.15, unless you can
be satisfied with using a NumberSequenceSource followed by a map, things
get quite complicated. I looked into reworking the data generators used in
the training exercises, and got discouraged by the amount of work involved.
(The sources used in the training want to be unbounded, and need
watermarking in the sources, which means that using NumberSequenceSource
isn't an option.)

I think the proposed deprecation will be better received if it can be
accompanied by something that makes implementing a simple Source easier
than it is now. People are continuing to implement new SourceFunctions
because the interfaces defined by FLIP-27 are hard to understand, while
SourceFunction is easy. Alex, I believe you were looking into implementing
an easier-to-use building block that could be used in situations like this.
Can we get something like that in place first?

David

On Fri, Jun 3, 2022 at 4:52 PM Jing Ge  wrote:

> Hi,
>
> Thanks Alex for driving this!
>
> +1 To give the Flink developers, especially Connector developers the clear
> signal that the new Source API is recommended according to FLIP-27, we
> should mark them as deprecated.
>
> There are some open questions to discuss:
>
> 1. Do we need to mark all subinterfaces/subclasses as deprecated? e.g.
> FromElementsFunction, etc. there are many. What are the replacements?
> 2. Do we need to mark all subclasses that have replacement as deprecated?
> e.g. ExternallyInducedSource whose replacement class, if I am not mistaken,
> ExternallyInducedSourceReader is @Experimental
> 3. Do we need to mark all related test utility classes as deprecated?
>
> I think it might make sense to create an umbrella ticket to cover all of
> these with the following process:
>
> 1. Mark SourceFunction as deprecated asap.
> 2. Mark subinterfaces and subclasses as deprecated, if there are graduated
> replacements. Good example is that KafkaSource replaced KafkaConsumer which
> has been marked as deprecated.
> 3. Do not mark subinterfaces and subclasses as deprecated, if replacement
> classes are still experimental, check if it is time to graduate them. After
> graduation, go to step 2. It might take a while for graduation.
> 4. Do not mark subinterfaces and subclasses as deprecated, if the
> replacement classes are experimental and are too young to graduate. We have
> to wait. But in this case we could create new tickets under the umbrella
> ticket.
> 5. Do not mark subinterfaces and subclasses as deprecated, if there is no
> replacement at all. We have to create new tickets and wait until the new
> implementation has been done and graduated. It will take a longer time,
> roughly 1,5 years.
> 6. For test classes, we could follow the same rule. But I think for some
> cases, we could consider doing the replacement directly without going
> through the deprecation phase.
>
> When we look back on all of these, we can realize it is a big epic (even
> bigger than an epic). It needs someone to drive it and keep focus on it
> continuously with support from the community and push the development
> towards the new Source API of FLIP-27.
>
> If we could have consensus for this,  Alex and I could create the umbrella
> ticket to kick it off.
>
> Best regards,
> Jing
>
>
> On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov 
> wrote:
>
> > Hi everyone,
> >
> > I would like to start the discussion about marking SourceFunction-based
> > interfaces as deprecated. With the FLIP-27 APIs becoming the new
> standard,
> > the old ones have to be eventually phased out. Although this state is
> well
> > known within the community and no new connectors based on the old
> > interfaces can be accepted into the project, the footprint of
> > SourceFunction in the user code still keeps growing (primarily for data
> > generators and test utilities). I believe it is best to mark
> SourceFunction
> > as deprecated as soon as possible. What do you think?
> >
> > Best,
> > Alexander Fedulov
> >
>


Re: Request for Review: FLINK-27507 and FLINK-27509

2022-05-23 Thread David Anderson
I've taken care of this.

David

On Sun, May 22, 2022 at 4:12 AM Shubham Bansal 
wrote:

> Hi Everyone,
>
> I am not sure who to reach out for the reviews of these changesets, so I
> am putting this on the mailing list here.
>
> I have raised the review for
> FLINK-27507 - https://github.com/apache/flink-playgrounds/pull/30
> FLINK-27509 - https://github.com/apache/flink-playgrounds/pull/29
>
> I would appreciate it if somebody can review these change sets as I want
> to make the similar changes for 1.15 version after that.
> Let me know if you need more information regarding this PR, I would be
> happy to connect with you and explain the changes.
>
> Thanks,
> Shubham Bansal
>


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-10 Thread David Anderson
>>> >> "soft no". We may also consider putting that in the code of conduct.
>>> >>
>>> >> Concerning using Slack for user QAs, it seems the major concern is
>>> that, we
>>> >> may end up repeatedly answering the same questions from different
>>> users,
>>> >> due to lack of capacity for archiving and searching historical
>>> >> conversations. TBH, I don't have a good solution for the
>>> archivability and
>>> >> searchability. I investigated some tools like Zapier [1], but none of
>>> them
>>> >> seems suitable for us. However, I'd like to share 2 arguments.
>>> >> - The purpose of Slack is to make the communication more efficient? By
>>> >> *efficient*, I mean saving time for both question askers and helpers
>>> with
>>> >> instance messages, file transmissions, even voice / video calls, etc.
>>> >> (Especially for cases where back and forth is needed, as David
>>> mentioned.)
>>> >> It does not mean questions that do not get enough attentions on MLs
>>> are
>>> >> now
>>> >> guaranteed to be answered immediately. We can probably put that into
>>> the
>>> >> code of conduct, and kindly guide users to first search and initiate
>>> >> questions on MLs.
>>> >> - I'd also like to share some experience from the Flink China
>>> community.
>>> >> We
>>> >> have 3 DingTalk groups with totally 25k members (might be less, I
>>> didn't
>>> >> do
>>> >> deduplication), posting hundreds of messages daily. What I'm really
>>> >> excited
>>> >> about is that, there are way more interactions between users & users
>>> than
>>> >> between users & developers. Users are helping each other, sharing
>>> >> experiences, sending screenshots / log files / documentations and
>>> solving
>>> >> problems together. We the developers seldom get pinged, if not
>>> proactively
>>> >> joined the conversations. The DingTalk groups are way more active
>>> compared
>>> >> to the user-zh@ ML, which I'd attribute to the improvement of
>>> interaction
>>> >> experiences. Admittedly, there are questions being repeatedly asked &
>>> >> answered, but TBH I don't think that compares to the benefit of a
>>> >> self-driven user community. I'd really love to see if we can bring
>>> such
>>> >> success to the global English-speaking community.
>>> >>
>>> >> Concerning StackOverFlow, it definitely worth more attention from the
>>> >> community. Thanks for the suggestion / reminder, Piotr & David. I
>>> think
>>> >> Slack and StackOverFlow are probably not mutual exclusive.
>>> >>
>>> >> Thank you~
>>> >>
>>> >> Xintong Song
>>> >>
>>> >>
>>> >> [1] https://zapier.com/
>>> >>
>>> >>
>>> >>
>>> >> On Sat, May 7, 2022 at 9:50 AM Jingsong Li 
>>> >> wrote:
>>> >>
>>> >> > Most of the open source communities I know have set up their slack
>>> >> > channels, such as Apache Iceberg [1], Apache Druid [2], etc.
>>> >> > So I think slack can be worth trying.
>>> >> >
>>> >> > David is right, there are some cases that need to communicate back
>>> and
>>> >> > forth, slack communication will be more effective.
>>> >> >
>>> >> > But back to the question, ultimately it's about whether there are
>>> >> > enough core developers willing to invest time in the slack, to
>>> >> > discuss, to answer questions, to communicate.
>>> >> > And whether there will be enough time to reply to the mailing list
>>> and
>>> >> > stackoverflow after we put in the slack (which we need to do).
>>> >> >
>>> >> > [1] https://iceberg.apache.org/community/#slack
>>> >> > [2] https://druid.apache.org/community/
>>> >> >
>>> >> > On Fri, May 6, 2022 at 10:06 PM David Anderson <
>>> dander...@apache.org>
>>> >> > wrote:
>>> >> > >
>>> >> > > I have mixed feelings about this.
>>> >> > >
>>

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread David Anderson
I have mixed feelings about this.

I have been rather visible on stack overflow, and as a result I get a lot
of DMs asking for help. I enjoy helping, but want to do it on a platform
where the responses can be searched and shared.

It is currently the case that good questions on stack overflow frequently
go unanswered because no one with the necessary expertise takes the time to
respond. If the Flink community has the collective energy to do more user
outreach, more involvement on stack overflow would be a good place to
start. Adding slack as another way for users to request help from those who
are already actively providing support on the existing communication
channels might just lead to burnout.

On the other hand, there are rather rare, but very interesting cases where
considerable back and forth is needed to figure out what's going on. This
can happen, for example, when the requirements are unusual, or when a
difficult to diagnose bug is involved. In these circumstances, something
like slack is much better suited than email or stack overflow.

David

On Fri, May 6, 2022 at 3:04 PM Becket Qin  wrote:

> Thanks for the proposal, Xintong.
>
> While I share the same concerns as those mentioned in the previous
> discussion thread, admittedly there are benefits of having a slack channel
> as a supplementary way to discuss Flink. The fact that this topic is raised
> once a while indicates lasting interests.
>
> Personally I am open to having such a slack channel. Although it has
> drawbacks, it serves a different purpose. I'd imagine that for people who
> prefer instant messaging, in absence of the slack channel, a lot of
> discussions might just take place offline today, which leaves no public
> record at all.
>
> One step further, if the channel is maintained by the Flink PMC, some kind
> of code of conduct might be necessary. I think the suggestions of ad-hoc
> conversations, reflecting back to the emails are good starting points. I
> am +1 to give it a try and see how it goes. In the worst case, we can just
> stop doing this and come back to where we are right now.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, May 6, 2022 at 8:55 PM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> While I see Slack having a major downside (the results are not indexed by
>> external search engines, you can't link directly to Slack content unless
>> you've signed up), I do think that the open source space has progressed and
>> that Slack is considered as something that's invaluable to users. There are
>> other Apache programs that also run it, like Apache Airflow [1]. I also see
>> it as a potential option to create a more active community.
>>
>> A concern I can see is that users will start DMing well-known
>> reviewers/committers to get a review or a PR merged. That can cause a lot
>> of noise. I can go +1 for Slack, but then we need to establish a set of
>> community rules.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://airflow.apache.org/community/
>>
>> On Fri, 6 May 2022 at 13:59, Piotr Nowojski  wrote:
>>
>>> Hi Xintong,
>>>
>>> I'm not sure if slack is the right tool for the job. IMO it works great
>>> as
>>> an adhoc tool for discussion between developers, but it's not searchable
>>> and it's not persistent. Between devs, it works fine, as long as the
>>> result
>>> of the ad hoc discussions is backported to JIRA/mailing list/design doc.
>>> For users, that simply would be extremely difficult to achieve. In the
>>> result, I would be afraid we are answering the same questions over, and
>>> over and over again, without even a way to provide a link to the previous
>>> thread, because nobody can search for it .
>>>
>>> I'm +1 for having an open and shared slack space/channel for the
>>> contributors, but I think I would be -1 for such channels for the users.
>>>
>>> For users, I would prefer to focus more on, for example, stackoverflow.
>>> With upvoting, clever sorting of the answers (not the oldest/newest at
>>> top)
>>> it's easily searchable - those features make it fit our use case much
>>> better IMO.
>>>
>>> Best,
>>> Piotrek
>>>
>>>
>>>
>>> On Fri, May 6, 2022 at 11:08, Xintong Song  wrote:
>>>
>>> > Thank you~
>>> >
>>> > Xintong Song
>>> >
>>> >
>>> >
>>> > -- Forwarded message -
>>> > From: Xintong Song 
>>> > Date: Fri, May 6, 2022 at 5:07 PM
>>> > Subject: Re: [Discuss] Creating an Apache Flink slack workspace
>>> > To: private 
>>> > Cc: Chesnay Schepler 
>>> >
>>> >
>>> > Hi Chesnay,
>>> >
>>> > Correct me if I'm wrong, I don't find this is *repeatedly* discussed
>>> on the
>>> > ML. The only discussions I find are [1] & [2], which are 4 years ago.
>>> On
>>> > the other hand, I do find many users are asking questions about whether
>>> > Slack should be supported [2][3][4]. Besides, I also find a recent
>>> > discussion thread from ComDev [5], where alternative communication
>>> channels
>>> > are being discussed. It seems to me ASF is quite open to having such
>>> > additional 

[jira] [Created] (FLINK-27513) Update table walkthrough playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27513:
--

 Summary: Update table walkthrough playground for 1.15
 Key: FLINK-27513
 URL: https://issues.apache.org/jira/browse/FLINK-27513
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27512) Update pyflink walkthrough playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27512:
--

 Summary: Update pyflink walkthrough playground for 1.15
 Key: FLINK-27512
 URL: https://issues.apache.org/jira/browse/FLINK-27512
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: David Anderson








[jira] [Created] (FLINK-27511) Update operations playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27511:
--

 Summary: Update operations playground for 1.15
 Key: FLINK-27511
 URL: https://issues.apache.org/jira/browse/FLINK-27511
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson








[jira] [Created] (FLINK-27510) update playgrounds for Flink 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27510:
--

 Summary: update playgrounds for Flink 1.15
 Key: FLINK-27510
 URL: https://issues.apache.org/jira/browse/FLINK-27510
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.15.0
Reporter: David Anderson


All of the playgrounds should be updated for Flink 1.15. This should include 
reworking the code as necessary to avoid using anything that has been 
deprecated.





[jira] [Created] (FLINK-27509) Update table walkthrough playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27509:
--

 Summary: Update table walkthrough playground for 1.14
 Key: FLINK-27509
 URL: https://issues.apache.org/jira/browse/FLINK-27509
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson








[jira] [Created] (FLINK-27508) Update pyflink walkthrough playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27508:
--

 Summary: Update pyflink walkthrough playground for 1.14
 Key: FLINK-27508
 URL: https://issues.apache.org/jira/browse/FLINK-27508
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson








[jira] [Created] (FLINK-27507) Update operations playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27507:
--

 Summary: Update operations playground for 1.14
 Key: FLINK-27507
 URL: https://issues.apache.org/jira/browse/FLINK-27507
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.14.4
Reporter: David Anderson


The operations playground has yet to be updated for 1.14. At this point, it may 
as well be configured to use the latest 1.14.x release.





[jira] [Created] (FLINK-27506) update playgrounds for Flink 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27506:
--

 Summary: update playgrounds for Flink 1.14
 Key: FLINK-27506
 URL: https://issues.apache.org/jira/browse/FLINK-27506
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.14.4
Reporter: David Anderson


All of the flink-playgrounds need to be updated for 1.14.





[jira] [Created] (FLINK-27456) mistake and confusion with CEP example in docs

2022-04-29 Thread David Anderson (Jira)
David Anderson created FLINK-27456:
--

 Summary: mistake and confusion with CEP example in docs
 Key: FLINK-27456
 URL: https://issues.apache.org/jira/browse/FLINK-27456
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Library / CEP
Affects Versions: 1.14.4
Reporter: David Anderson


[https://nightlies.apache.org/flink/flink-docs-master/docs/libs/cep/#contiguity-within-looping-patterns]

In the section of the docs on contiguity within looping patterns, what it says 
about strict contiguity for the given example is either incorrect or very 
confusing (or both). It doesn't help that the example code doesn't precisely 
match the scenario described in the text.

To study this, I implemented the example in the text and found it produces no
output for strict contiguity (as I expected), which contradicts what the text 
says.
{code:java}
public class StreamingJob {

    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> events =
                env.fromElements("a", "b1", "d1", "b2", "d2", "b3", "c");

        AfterMatchSkipStrategy skipStrategy =
                AfterMatchSkipStrategy.skipPastLastEvent();
        Pattern<String, String> pattern =
                Pattern.<String>begin("a", skipStrategy)
                        .where(
                                new SimpleCondition<String>() {
                                    @Override
                                    public boolean filter(String element)
                                            throws Exception {
                                        return element.startsWith("a");
                                    }
                                })
                        .next("b+")
                        .where(
                                new SimpleCondition<String>() {
                                    @Override
                                    public boolean filter(String element)
                                            throws Exception {
                                        return element.startsWith("b");
                                    }
                                })
                        .oneOrMore().consecutive()
                        .next("c")
                        .where(
                                new SimpleCondition<String>() {
                                    @Override
                                    public boolean filter(String element)
                                            throws Exception {
                                        return element.startsWith("c");
                                    }
                                });

        PatternStream<String> patternStream =
                CEP.pattern(events, pattern).inProcessingTime();
        patternStream.select(new SelectSegment())
                .addSink(new PrintSinkFunction<>(true));
        env.execute();
    }

    public static class SelectSegment
            implements PatternSelectFunction<String, String> {
        public String select(Map<String, List<String>> pattern) {
            return String.join("", pattern.get("a"))
                    + String.join("", pattern.get("b+"))
                    + String.join("", pattern.get("c"));
        }
    }
}
 {code}
 
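A rough analogy (my framing, not Flink CEP semantics verbatim): under strict contiguity, the pattern a b+ c over this input behaves like the regex ab+c over the string of leading event letters, "abdbdbc" — and that regex has no match, because the d symbols break the required adjacency. This agrees with the "no output" result above:

```java
import java.util.regex.Pattern;

/**
 * Illustrative analogy, not Flink CEP: map each event to its leading
 * letter (a, b1, d1, b2, d2, b3, c -> "abdbdbc") and check the regex
 * ab+c, which mirrors strict contiguity for the pattern a b+ c.
 */
public class StrictContiguityAnalogy {
    public static void main(String[] args) {
        String symbols = "abdbdbc";
        boolean anyMatch = Pattern.compile("ab+c").matcher(symbols).find();
        // prints false: no strictly contiguous match exists
        System.out.println(anyMatch);
    }
}
```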





Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
Yun Tang and Jingsong,

Some flavor of OrderedMapState is certainly feasible, and I do see some
appeal in supporting Binary**State.

However, I haven't seen a motivating use case for this generalization, and
would rather keep this as simple as possible. By handling Longs we can
already optimize a wide range of use cases.

David


On Tue, Apr 12, 2022 at 9:21 AM Yun Tang  wrote:

>  Hi David,
>
> Could you share some explanations why SortedMapState cannot work in
> details? I just cannot catch up what the statement below means:
>
> This was rejected as being overly difficult to implement in a way that
> would cleanly leverage RocksDB’s iterators.
>
>
> Best
> Yun Tang
> 
> From: Aitozi 
> Sent: Tuesday, April 12, 2022 15:00
> To: dev@flink.apache.org 
> Subject: Re: [DISCUSS] FLIP-220: Temporal State
>
> Hi David
>  I have looked through the doc, and I think it will be a good improvement
> to this usage pattern; I'm interested in it. Do you have some POC work to
> share for a closer look?
> Besides, I have one question: can we support exposing the namespace in
> the other state types, not limited to `TemporalState`? That way, users can
> specify the namespace,
> and TemporalState is just a special case that uses the timestamp as
> the namespace. I think it would be more extensible.
> What do you think about this?
>
> Best,
> Aitozi.
>
> On Mon, Apr 11, 2022 at 20:54, David Anderson  wrote:
>
> > Greetings, Flink developers.
> >
> > I would like to open up a discussion of a proposal [1] to add a new kind
> of
> > state to Flink.
> >
> > The goal here is to optimize a fairly common pattern, which is using
> >
> > MapState<Long, List<Event>>
> >
> > to store lists of events associated with timestamps. This pattern is used
> > internally in quite a few operators that implement sorting and joins, and
> > it also shows up in user code, for example, when implementing custom
> > windowing in a KeyedProcessFunction.
> >
> > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
> > more than 2x improvement in throughput when performing these operations
> on
> > RocksDB by better leveraging the capabilities of the RocksDB state
> backend.
> >
> > See FLIP-220 [1] for details.
> >
> > Best,
> > David
> >
> > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> >
>


Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
Aitozi,

Our POC can be seen at [1].

My personal opinion is the namespace is an implementation detail that is
better not exposed directly. Flink state is already quite complex to
understand, and I fear that if we expose the namespaces the additional
flexibility this offers will be more confusing than helpful.

[1] https://github.com/dataArtisans/flink/tree/temporal-state

On Tue, Apr 12, 2022 at 9:00 AM Aitozi  wrote:

> Hi David
>  I have looked through the doc, and I think it will be a good improvement
> to this usage pattern; I'm interested in it. Do you have some POC work to
> share for a closer look?
> Besides, I have one question: can we support exposing the namespace in
> the other state types, not limited to `TemporalState`? That way, users can
> specify the namespace,
> and TemporalState is just a special case that uses the timestamp as
> the namespace. I think it would be more extensible.
> What do you think about this?
>
> Best,
> Aitozi.
>
> On Mon, Apr 11, 2022 at 20:54, David Anderson  wrote:
>
> > Greetings, Flink developers.
> >
> > I would like to open up a discussion of a proposal [1] to add a new kind
> of
> > state to Flink.
> >
> > The goal here is to optimize a fairly common pattern, which is using
> >
> > MapState<Long, List<Event>>
> >
> > to store lists of events associated with timestamps. This pattern is used
> > internally in quite a few operators that implement sorting and joins, and
> > it also shows up in user code, for example, when implementing custom
> > windowing in a KeyedProcessFunction.
> >
> > Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
> > more than 2x improvement in throughput when performing these operations
> on
> > RocksDB by better leveraging the capabilities of the RocksDB state
> backend.
> >
> > See FLIP-220 [1] for details.
> >
> > Best,
> > David
> >
> > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> >
>


Re: [DISCUSS] FLIP-220: Temporal State

2022-04-13 Thread David Anderson
C work to
> > > share for a closer look.
> > > Besides, I have one question that can we support expose the namespace
> in
> > > the different state type not limited to `TemporalState`. By this, user
> > can
> > > specify the namespace
> > > and the TemporalState is one of the special case that it use timestamp
> as
> > > the namespace. I think it will be more extendable.
> > > What do you think about this ?
> > >
> > > Best,
> > > Aitozi.
> > >
> > > On Mon, Apr 11, 2022 at 20:54, David Anderson  wrote:
> > >
> > > > Greetings, Flink developers.
> > > >
> > > > I would like to open up a discussion of a proposal [1] to add a new
> > kind of
> > > > state to Flink.
> > > >
> > > > The goal here is to optimize a fairly common pattern, which is using
> > > >
> > > > MapState<Long, List<Event>>
> > > >
> > > > to store lists of events associated with timestamps. This pattern is
> > used
> > > > internally in quite a few operators that implement sorting and joins,
> > and
> > > > it also shows up in user code, for example, when implementing custom
> > > > windowing in a KeyedProcessFunction.
> > > >
> > > > Nico Kruber, Seth Wiesman, and I have implemented a POC that
> achieves a
> > > > more than 2x improvement in throughput when performing these
> > operations on
> > > > RocksDB by better leveraging the capabilities of the RocksDB state
> > backend.
> > > >
> > > > See FLIP-220 [1] for details.
> > > >
> > > > Best,
> > > > David
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/Xo_FD
> > > >
> >
>


[DISCUSS] FLIP-220: Temporal State

2022-04-11 Thread David Anderson
Greetings, Flink developers.

I would like to open up a discussion of a proposal [1] to add a new kind of
state to Flink.

The goal here is to optimize a fairly common pattern, which is using

MapState<Long, List<Event>>

to store lists of events associated with timestamps. This pattern is used
internally in quite a few operators that implement sorting and joins, and
it also shows up in user code, for example, when implementing custom
windowing in a KeyedProcessFunction.
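
Stripped of Flink specifics, the access pattern at issue can be sketched with a plain sorted map (event type simplified to String; all names below are illustrative, not part of the proposal):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Plain-Java sketch of the MapState<Long, List<Event>> pattern:
 * events are buffered per timestamp, then drained in timestamp
 * order once a watermark arrives. Illustrative only; the real
 * implementation sits behind Flink's keyed-state interfaces.
 */
public class TimestampedBufferSketch {
    private final TreeMap<Long, List<String>> buffer = new TreeMap<>();

    public void add(long timestamp, String event) {
        buffer.computeIfAbsent(timestamp, k -> new ArrayList<>()).add(event);
    }

    /** Remove and return all events with timestamp <= watermark, in order. */
    public List<String> drainUpTo(long watermark) {
        List<String> out = new ArrayList<>();
        NavigableMap<Long, List<String>> ready = buffer.headMap(watermark, true);
        for (List<String> events : ready.values()) {
            out.addAll(events);
        }
        ready.clear(); // the view's clear() also removes the entries from buffer
        return out;
    }

    public static void main(String[] args) {
        TimestampedBufferSketch sketch = new TimestampedBufferSketch();
        sketch.add(12L, "b");
        sketch.add(5L, "a1");
        sketch.add(5L, "a2");
        System.out.println(sketch.drainUpTo(10L)); // prints [a1, a2]
    }
}
```

On the heap a TreeMap makes the ordered drain cheap; with MapState on RocksDB there is no equivalent timestamp-ordered iteration, which is the gap temporal state closes by leaning on RocksDB's sorted iterators.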

Nico Kruber, Seth Wiesman, and I have implemented a POC that achieves a
more than 2x improvement in throughput when performing these operations on
RocksDB by better leveraging the capabilities of the RocksDB state backend.

See FLIP-220 [1] for details.

Best,
David

[1] https://cwiki.apache.org/confluence/x/Xo_FD


[jira] [Created] (FLINK-27184) Optimize IntervalJoinOperator by using temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27184:
--

 Summary: Optimize IntervalJoinOperator by using temporal state
 Key: FLINK-27184
 URL: https://issues.apache.org/jira/browse/FLINK-27184
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson


The performance of interval joins on RocksDB can be optimized by using temporal 
state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27183) Optimize CepOperator by using temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27183:
--

 Summary: Optimize CepOperator by using temporal state
 Key: FLINK-27183
 URL: https://issues.apache.org/jira/browse/FLINK-27183
 Project: Flink
  Issue Type: Sub-task
  Components: Library / CEP
Reporter: David Anderson


The performance of CEP on RocksDB can be significantly improved by having it 
use temporal state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27182) Optimize RowTimeSortOperator by using temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27182:
--

 Summary: Optimize RowTimeSortOperator by using temporal state
 Key: FLINK-27182
 URL: https://issues.apache.org/jira/browse/FLINK-27182
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: David Anderson


The performance of the RowTimeSortOperator can be significantly improved by 
using temporal state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27181) Optimize TemporalRowTimeJoinOperator by using temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27181:
--

 Summary: Optimize TemporalRowTimeJoinOperator by using temporal 
state
 Key: FLINK-27181
 URL: https://issues.apache.org/jira/browse/FLINK-27181
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Runtime
Reporter: David Anderson


The throughput of the TemporalRowTimeJoinOperator can be significantly improved 
by using temporal state in its implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27180) Docs for temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27180:
--

 Summary: Docs for temporal state
 Key: FLINK-27180
 URL: https://issues.apache.org/jira/browse/FLINK-27180
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: David Anderson


Update

[https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/state/#using-keyed-state]
 

to include temporal state.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27179) Update training example to use temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27179:
--

 Summary: Update training example to use temporal state
 Key: FLINK-27179
 URL: https://issues.apache.org/jira/browse/FLINK-27179
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training
Reporter: David Anderson


[https://nightlies.apache.org/flink/flink-docs-master/docs/learn-flink/event_driven/#example]
 is a good use case for temporal state (this example is doing windowing in a 
keyed process function).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27178) create examples that use temporal state

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27178:
--

 Summary: create examples that use temporal state
 Key: FLINK-27178
 URL: https://issues.apache.org/jira/browse/FLINK-27178
 Project: Flink
  Issue Type: Sub-task
  Components: Examples
Reporter: David Anderson


Add examples showing how to use temporal state. E.g., sorting and/or a temporal 
join.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27177) Implement Temporal State

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27177:
--

 Summary: Implement Temporal State
 Key: FLINK-27177
 URL: https://issues.apache.org/jira/browse/FLINK-27177
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / State Backends
Reporter: David Anderson


Following the plan in 
[FLIP-220|https://cwiki.apache.org/confluence/x/Xo_FD]:
 * add methods to the RuntimeContext and KeyedStateStore interfaces for 
registering TemporalValueState and TemporalListState
 * implement TemporalValueState and TemporalListState



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-27176) FLIP-220: Temporal State

2022-04-11 Thread David Anderson (Jira)
David Anderson created FLINK-27176:
--

 Summary: FLIP-220: Temporal State
 Key: FLINK-27176
 URL: https://issues.apache.org/jira/browse/FLINK-27176
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: David Anderson


Task for implementing [FLIP-220: Temporal State|
https://cwiki.apache.org/confluence/x/Xo_FD]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Enable scala formatting check

2022-03-08 Thread David Anderson
+1

For flink-training we initially tried cloning the scalastyle setup from
flink, but we decided to use spotless + scalafmt instead.

David

On Mon, Mar 7, 2022 at 1:12 PM Timo Walther  wrote:

> Big +1
>
> This will improve the contribution experience. Even though we stopped
> adding more Scala code, it is still necessary from time to time.
>
> Regards,
> Timo
>
> On 02.03.22 at 09:29, 刘首维 wrote:
> > +1
> >
> >
> > I still remember my first pr. Lack of experience, I had to pay attention
> to Scala code format and corrected the format manually, which made me a
> little embarrassed (though I'm a big fan of Scala). I think this
> proposal will lighten the burden of writing Scala code.
> >
> >
> > Shouwei Liu
> >
> >
> > -- Original Message --
> > From: "dev" <kna...@apache.org>
> > Sent: Wednesday, March 2, 2022, 3:01 PM
> > To: "dev"
> > Subject: Re: [DISCUSS] Enable scala formatting check
> >
> >
> >
> > +1 I've never written any Scala in Flink, but this makes a lot of sense
> to
> > me. Converging on a smaller set of tools and simplifying the build is
> > always a good idea and the Community already concluded before that
> spotless
> > is generally a good approach.
> >
> > On Tue, Mar 1, 2022 at 5:52 PM Francesco Guardiani <
> france...@ververica.com
> > wrote:
> >
> >  Hi all,
> > 
> >  I want to propose to enable the spotless scalafmt integration and
> remove
> >  the scalastyle plugin.
> > 
> >  From an initial analysis, scalafmt can do everything scalastyle can
> do, and
> >  the integration with spotless looks easy to enable:
> >  https://github.com/diffplug/spotless/tree/main/plugin-maven#scala.
> The
> >  scalafmt conf file gets picked up automatically from every IDE, and
> it can
> >  be heavily tuned.
> > 
> >  This way we can unify the formatting and integrate with our CI
> without any
> >  additional configurations. And we won't need scalastyle anymore, as
> >  scalafmt will take care of the checks:
> > 
> >  * mvn spotless:check will check both java and scala
> >  * mvn spotless:apply will format both java and scala
> > 
> >  WDYT?
> > 
> >  FG
> > 
> > 
> > 
> >  --
> > 
> >  Francesco Guardiani | Software Engineer
> > 
> >  france...@ververica.com
> > 
> > 
> >  Follow us @VervericaData
> > 
> >  --
> > 
> >  Join Flink Forward - The Apache Flink Conference
> > 
> >  Stream Processing | Event Driven | Real Time
> > 
> >  --
> > 
> >  Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > 
> >  --
> > 
> >  Ververica GmbH
> > 
> >  Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > 
> >  Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung
> Jason,
> >  Jinwei (Kevin) Zhang
> > 
> >
> >
>
>


status of Apple Silicon (M1) as Flink dev platform?

2022-03-08 Thread David Anderson
What's the current status of using the Apple Silicon (M1) platform for
Flink development? Have we reached the point where everything "just works",
or do there remain lingering annoyances (or worse)?

In the past, I've seen reports of issues involving, e.g., RocksDB, nodejs,
protobuf, and pyflink. Looking in Jira, I see these issues haven't been
resolved yet, but I'm not sure what to read into that:

https://issues.apache.org/jira/browse/FLINK-24932 (Frocksdb cannot run on
Apple M1)
https://issues.apache.org/jira/browse/FLINK-25188 (Cannot install PyFlink
on MacOS with M1 chip)

Best,
David


Re: [VOTE] Remove Twitter connector

2022-02-03 Thread David Anderson
+1

On Mon, Jan 31, 2022 at 11:47 AM Martijn Visser 
wrote:

> Hi everyone,
>
> I would like to open up a vote to remove the Twitter connector in Flink
> 1.15. This was brought up previously for a discussion [1].
>
> The vote will last for at least 72 hours, and will be accepted by
> a consensus of active committers.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1] https://lists.apache.org/thread/7sdvp4hj93rh0cz8r50800stzrpgkdm2
>


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-30 Thread David Anderson
I agree.

The Twitter connector is used in a few (unofficial) tutorials, so if we
remove it that will make it more difficult for those tutorials to be
maintained. On the other hand, if I recall correctly, that connector uses
V1 of the Twitter API, which has been deprecated, so it's really not very
useful even for that purpose.

David



On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
wrote:

> Hi everyone,
>
> I would like to discuss deprecating Flinks' Twitter connector [1]. This
> was one of the first connectors that was added to Flink, which could be
> used to access the tweets from Twitter. Given the evolution of Flink over
> Twitter, I don't think that:
>
> * Users are still using this connector at all
> * That the code for this connector should be in the main Flink codebase.
>
> Given the circumstances, I would propose to deprecate and remove this
> connector. I'm looking forward to your thoughts. If you agree, please also
> let me know if you think we should first deprecate it in Flink 1.15 and
> remove it in a version after that, or if you think we can remove it
> directly.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>
>


Re: Stack Overflow Question - Deserialization schema for multiple topics

2022-01-28 Thread David Anderson
For questions like this one, please address them to either Stack Overflow
or the user mailing list, but not both at once. Those two forums are
appropriate places to get help with using Flink's APIs. And once you've
asked a question, please allow some days for folks to respond before trying
again.

The dev and community mailing lists are dedicated to other topics, and
aren't suitable for getting help. The community list is for discussions
related to meetups and conferences, and the dev list is for discussions and
decision making about the ongoing development of Flink itself.

In the interest of not further spamming the dev and community lists, let's
limit the follow-up on deserializers to the user ML.

Best regards,
David

On Fri, Jan 28, 2022 at 12:07 PM Hussein El Ghoul 
wrote:

> Hello,
>
> How to specify the deserialization schema for multiple Kafka topics using
> Flink (python)
>
> I want to read from multiple Kafka topics with JSON schema using
> FlinkKafkaConsumer, and I assume that I need to use
> JsonRowDeserializationSchema to deserialize the data. The schema of the
> topics is very large (around 1500 lines for each topic), so I want to read
> it from a file instead of manually typing the types in the program. How can
> I do that?
>
> 1. How to specify deserialization schema for multiple topics (3 topics)
> 2. How to read the JSON schema from a file?
>
>
> https://stackoverflow.com/q/70892579/13067721?sem=2
>
> Thanks in advance,
> Hussein
> Quiqup - Data Engineer


Re: [VOTE] FLIP-203: Incremental savepoints

2022-01-26 Thread David Anderson
+1 (non-binding)

I'm pleased to see this significant improvement coming along, as well as
the effort made in the FLIP to document what is and isn't supported (and
where ??? remain).

On Wed, Jan 26, 2022 at 10:58 AM Yu Li  wrote:

> +1 (binding)
>
> Thanks for driving this Piotr! Just one more (belated) suggestion: in the
> "Checkpoint vs savepoint guarantees" section, there are still question
> marks scattered in the table, and I suggest putting all TODO works into the
> "Limitations" section, or adding a "Future Work" section, for easier later
> tracking.
>
> Best Regards,
> Yu
>
>
> On Mon, 24 Jan 2022 at 18:48, Konstantin Knauf  wrote:
>
> > Thanks, Piotr. Proposal looks good.
> >
> > +1 (binding)
> >
> > On Mon, Jan 24, 2022 at 11:20 AM David Morávek  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > D.
> > >
> > > On Mon, Jan 24, 2022 at 10:54 AM Dawid Wysakowicz <
> > dwysakow...@apache.org>
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > Best,
> > > >
> > > > Dawid
> > > >
> > > > On 24/01/2022 09:56, Piotr Nowojski wrote:
> > > > > Hi,
> > > > >
> > > > > As there seems to be no further questions about the FLIP-203 [1] I
> > > would
> > > > > propose to start a voting thread for it.
> > > > >
> > > > > For me there are still two unanswered questions, whether we want to
> > > > support
> > > > > schema evolution and State Processor API with native format
> snapshots
> > > or
> > > > > not. But I would propose to tackle them as follow ups, since those
> > are
> > > > > pre-existing issues of the native format checkpoints, and could be
> > done
> > > > > completely independently of providing the native format support in
> > > > > savepoints.
> > > > >
> > > > > Best,
> > > > > Piotrek
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
> >
>


Re: [DISCUSS] FLIP-203: Incremental savepoints

2022-01-14 Thread David Anderson
> I have a very similar question to State Processor API. Is it the same
scenario in this case?
> Should it also be working with checkpoints but might be just untested?

I have used the State Processor API with aligned, full checkpoints. There
it has worked just fine.

David

On Thu, Jan 13, 2022 at 12:40 PM Piotr Nowojski 
wrote:

> Hi,
>
> Thanks for the comments and questions. Starting from the top:
>
> Seth: good point about schema evolution. Actually, I have a very similar
> question to State Processor API. Is it the same scenario in this case?
> Should it also be working with checkpoints but might be just untested?
>
> And next question, should we commit to supporting those two things (State
> Processor API and schema evolution) for native savepoints? What about
> aligned checkpoints? (please check [1] for that).
>
> Yu Li: 1, 2 and 4 done.
>
> > 3. How about changing the description of "the default configuration of
> the
> > checkpoints will be used to determine whether the savepoint should be
> > incremental or not" to something like "the `state.backend.incremental`
> > setting now denotes the type of native format snapshot and will take
> effect
> > for both checkpoint and savepoint (with native type)", to prevent concept
> > confusion between checkpoint and savepoint?
>
> Is `state.backend.incremental` the only configuration parameter that can be
> used in this context? I would guess not? What about for example
> "state.storage.fs.memory-threshold" or all of the Advanced RocksDB State
> Backends Options [2]?
>
> David:
>
> > does this mean that we need to keep the checkpoints compatible across
> minor
> > versions? Or can we say, that the minor version upgrades are only
> > guaranteed with canonical savepoints?
>
> Good question. Frankly I was always assuming that this is implicitly given.
> Otherwise users would not be able to recover jobs that are failing because
> of bugs in Flink. But I'm pretty sure that was never explicitly stated.
>
> As Konstantin suggested, I've written down the pre-existing guarantees of
> checkpoints and savepoints followed by two proposals on how they should be
> changed [1]. Could you take a look?
>
> I'm especially unsure about the following things:
> a) What about RocksDB upgrades? If we bump RocksDB version between Flink
> versions, do we support recovering from a native format snapshot
> (incremental checkpoint)?
> b) State Processor API - both pre-existing and what do we want to provide
> in the future
> c) Schema Evolution - both pre-existing and what do we want to provide in
> the future
>
> Best,
> Piotrek
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-203%3A+Incremental+savepoints#FLIP203:Incrementalsavepoints-Checkpointvssavepointguarantees
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#advanced-rocksdb-state-backends-options
>
> On Tue, 11 Jan 2022 at 09:45, Konstantin Knauf wrote:
>
> > Hi Piotr,
> >
> > would it be possible to provide a table that shows the
> > compatibility guarantees provided by the different snapshots going
> forward?
> > Like type of change (Topology. State Schema, Parallelism, ..) in one
> > dimension, and type of snapshot as the other dimension. Based on that, it
> > would be easier to discuss those guarantees, I believe.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Mon, Jan 3, 2022 at 9:11 AM David Morávek  wrote:
> >
> > > Hi Piotr,
> > >
> > > does this mean that we need to keep the checkpoints compatible across
> > minor
> > > versions? Or can we say, that the minor version upgrades are only
> > > guaranteed with canonical savepoints?
> > >
> > > My concern is especially if we'd want to change layout of the
> checkpoint.
> > >
> > > D.
> > >
> > >
> > >
> > > On Wed, Dec 29, 2021 at 5:19 AM Yu Li  wrote:
> > >
> > > > Thanks for the proposal Piotr! Overall I'm +1 for the idea, and below
> > are
> > > > my two cents:
> > > >
> > > > 1. How about adding a "Term Definition" section and clarify what
> > "native
> > > > format" (the "native" data persistence format of the current state
> > > backend)
> > > > and "canonical format" (the "uniform" format that supports switching
> > > state
> > > > backends) means?
> > > >
> > > > 2. IIUC, currently the FLIP proposes to only support incremental
> > > savepoint
> > > > with native format, and there's no plan to add such support for
> > canonical
> > > > format, right? If so, how about writing this down explicitly in the
> > FLIP
> > > > doc, maybe in a "Limitations" section, plus the fact that
> > > > `HashMapStateBackend` cannot support incremental savepoint before
> > > FLIP-151
> > > > is done? (side note: @Roman just a kindly reminder, that please take
> > > > FLIP-203 into account when implementing FLIP-151)
> > > >
> > > > 3. How about changing the description of "the default configuration
> of
> > > the
> > > > checkpoints will be used to determine whether the savepoint should be
> > > > incremental or not" to 

Re: [DISCUSS] Drop Gelly

2022-01-03 Thread David Anderson
Most of the inquiries I've had about Gelly in recent memory have been from
folks looking for a streaming solution, and it's only been a handful.

+1 for dropping Gelly

David

On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann  wrote:

> I haven't seen any changes or requests to/for Gelly in ages. Hence, I
> would assume that it is not really used and can be removed.
>
> +1 for dropping Gelly.
>
> Cheers,
> Till
>
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> Flink is bundled with Gelly, a Graph API library [1]. This has been
>> marked as approaching end-of-life for quite some time [2].
>>
>> Gelly is built on top of Flink's DataSet API, which is deprecated and
>> slowly being phased out [3]. It only works on batch jobs. Based on the
>> activity in the Dev and User mailing lists, I don't see a lot of questions
>> popping up regarding the usage of Gelly. Removing Gelly would reduce CI
>> time and resources because we won't need to run tests for this anymore.
>>
>> I'm cross-posting this to the User mailing list to see if there are any
>> users of Gelly at the moment.
>>
>> Let me know your thoughts.
>>
>> Martijn Visser | Product Manager
>>
>> mart...@ververica.com
>>
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
>>
>> [2] https://flink.apache.org/roadmap.html
>>
>> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
>>
>> 
>>
>>
>> Follow us @VervericaData
>>
>> --
>>
>> Join Flink Forward  - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>>


Re: [DISCUSS] Conventions on assertions to use in tests

2021-11-12 Thread David Anderson
For what it's worth, I recently rewrote all of the tests in flink-training
to use assertj, removing a mixture of junit4 assertions and hamcrest in the
process. I chose assertj because I found it to be more expressive and made
the tests more readable.

+1 from me

David

On Fri, Nov 12, 2021 at 10:03 AM Francesco Guardiani <
france...@ververica.com> wrote:

> Hi all,
>
> I wonder If we have a convention of the testing tools (in particular
> assertions) to use in our tests. If not, are modules free to decide on a
> convention on their own?
>
> In case of table, we have a mixed bag of different assertions of all kinds,
> sometimes mixed even in the same test:
>
>- Assertions from junit 4
>- Assertions from junit 5
>- Hamcrest
>- Some custom assertions classes (e.g. RowDataHarnessAssertor)
>- assert instructions
>
> The result is that most tests are very complicated to read and understand,
> and we have a lot of copy pasted "assertion methods" all around our
> codebase.
>
> For table in particular, I propose to introduce assertj [1] and develop a
> couple of custom assertions [2] for the types we use most in our tests,
> e.g. Row, RowData, DataType, LogicalType, etc... For example:
>
> assertFalse(row.isNullAt(1));
> assert row instanceof GenericRowData;
> assertEquals(row.getField(1), TimestampData.ofEpochMillis(expectedMillis));
>
> Could be:
>
> assertThat(row)
>   .getField(1, TimestampData.class)
>   .isEqualToEpochMillis(expectedMillis)
>
> We could have these in table-common so every part of the table stack can
> benefit from it. Of course we can't take all our tests and convert them to
> the new assertions, but as a policy we can enforce to use the new
> assertions convention for every new test or for every test we modify in
> future PRs.
>
> What's your opinion about it? Do you agree to have such kind of policy of
> using the same assertions? If yes, do you like the idea of using assertj to
> implement such policy?
>
> FG
>
> [1] A library for assertions https://assertj.github.io, already used by
> the
> pulsar connector
> [2] https://assertj.github.io/doc/#assertj-core-custom-assertions-creation
> --
>
> Francesco Guardiani | Software Engineer
>
> france...@ververica.com
>
>
> 
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward  - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
>
> Ververica GmbH
>
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>
> Managing Directors: Karl Anton Wehner, Holger Temme, Yip Park Tung Jason,
> Jinwei (Kevin) Zhang
>


[jira] [Created] (FLINK-24478) gradle quickstart is out-of-date

2021-10-07 Thread David Anderson (Jira)
David Anderson created FLINK-24478:
--

 Summary: gradle quickstart is out-of-date
 Key: FLINK-24478
 URL: https://issues.apache.org/jira/browse/FLINK-24478
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.14.0
Reporter: David Anderson
Assignee: Nico Kruber


The gradle quickstart, as described in the docs, and produced by
  
 {{bash -c "$(curl https://flink.apache.org/q/gradle-quickstart.sh)" -- 1.14.0 
_2.11}}

is out of date, and it has some obvious errors. E.g., it defines 
scalaBinaryVersion as '_2.11', and then has

{{flinkShadowJar 
"org.apache.flink:flink-connector-kafka-0.11_${scalaBinaryVersion}:${flinkVersion}"}}

which is both ancient and includes the _ again. (I realize now that the extra _ 
actually comes from the bash command I copied from the docs, so the docs need 
to be fixed as well.)

The quickstart also doesn't produce a gradlew script, and if I try

{{gradle build}}

I get this output:

{{$ gradle build
 Starting a Gradle Daemon (subsequent builds will be faster)

FAILURE: Build failed with an exception.
 * Where:
 Build file '/Users/david/stuff/quickstart/build.gradle' line: 41

 * What went wrong:
 A problem occurred evaluating root project 'quickstart'.
 > Cannot add task 'wrapper' as a task with that name already exists}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-24118) enable TaxiFareGenerator to produce a bounded stream

2021-09-01 Thread David Anderson (Jira)
David Anderson created FLINK-24118:
--

 Summary: enable TaxiFareGenerator to produce a bounded stream
 Key: FLINK-24118
 URL: https://issues.apache.org/jira/browse/FLINK-24118
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: David Anderson
Assignee: David Anderson


I would like to use the TaxiFareGenerator in tests for the training exercises. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23926) change TaxiRide data model to have a single timestamp

2021-08-23 Thread David Anderson (Jira)
David Anderson created FLINK-23926:
--

 Summary: change TaxiRide data model to have a single timestamp
 Key: FLINK-23926
 URL: https://issues.apache.org/jira/browse/FLINK-23926
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: David Anderson
Assignee: David Anderson


The current TaxiRide events have two timestamps – the startTime and endTime. 
Which timestamp applies to a given event depends on the value of the isStart 
field. This is awkward, and unnecessary. It would be better to have a single 
eventTime field. This will make the exercises better examples, and allow for 
more straightforward conversion from DataStream to Table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Were Bundles meant to be internal?

2021-08-19 Thread David Anderson
Most of the table/bundle related classes such as
MapBundleFunction, MapBundleOperator, and CountBundleTrigger aren't marked
as either @Internal or @Public. What was the intention?

I ask because I'm starting to see some interest in using them for
implementing pre-aggregation via the DataStream API, and I'm not convinced
this is a good idea. E.g., see
https://stackoverflow.com/questions/68811184/pre-shuffle-aggregation-in-flink
.
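For context, the count-bundle idea behind these classes can be sketched without any Flink dependencies. The class below is a plain-Java illustration of pre-aggregation under that assumption; the names are hypothetical and this is not the actual operator API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Sketch of what MapBundleOperator + CountBundleTrigger do conceptually:
// accumulate per-key partial aggregates and flush them downstream once the
// bundle reaches a fixed record count.
class CountBundle {
    private final Map<String, Long> bundle = new HashMap<>();
    private final int maxCount;
    private final BiConsumer<String, Long> downstream;
    private int count = 0;

    CountBundle(int maxCount, BiConsumer<String, Long> downstream) {
        this.maxCount = maxCount;
        this.downstream = downstream;
    }

    // Pre-aggregate locally; only emit when the bundle is full.
    void add(String key, long value) {
        bundle.merge(key, value, Long::sum);
        if (++count >= maxCount) {
            flush();
        }
    }

    // Emit one partial aggregate per key seen in this bundle, then reset.
    void flush() {
        bundle.forEach(downstream);
        bundle.clear();
        count = 0;
    }
}
```

The appeal for DataStream users is that this reduces the number of records crossing the shuffle; the open question raised above is whether relying on these unannotated internal classes for that purpose is advisable.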

David


[jira] [Created] (FLINK-23840) Confusing message from MemCheckpointStreamFactory#checkSize

2021-08-17 Thread David Anderson (Jira)
David Anderson created FLINK-23840:
--

 Summary: Confusing message from 
MemCheckpointStreamFactory#checkSize
 Key: FLINK-23840
 URL: https://issues.apache.org/jira/browse/FLINK-23840
 Project: Flink
  Issue Type: Technical Debt
  Components: Runtime / State Backends
Affects Versions: 1.13.2
Reporter: David Anderson
 Fix For: 1.14.0


After the refactoring of the state backends and checkpoint storage done in 
1.13, some folks who were using either the filesystem state backend or the 
rocksdb state backend find themselves accidentally using 
JobManagerCheckpointStorage (because it is the default), and then are very 
confused by this error message:
 
{{throw new IOException(
    "Size of the state is larger than the maximum permitted memory-backed state. Size="
        + size
        + " , maxSize="
        + maxSize
        + " . Consider using a different state backend, like the File System State backend.");}}

This should instead say something like
{quote}Consider using FileSystemCheckpointStorage instead.
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23653) improve training exercises and tests so they are better examples

2021-08-05 Thread David Anderson (Jira)
David Anderson created FLINK-23653:
--

 Summary: improve training exercises and tests so they are better 
examples
 Key: FLINK-23653
 URL: https://issues.apache.org/jira/browse/FLINK-23653
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: David Anderson


The tests for the training exercises are implemented in a way that permits the 
same tests to be used for both the exercises and the solutions, and for both 
the Java and Scala implementations. The way that this was done is a bit 
awkward. 

It would be better to
 * eliminate the ExerciseBase class and its mechanisms for setting the 
source(s) and sink and parallelism
 * have tests that run with parallelism > 1
 * speed up the tests by using MiniClusterWithClientResource

It's also the case that the watermarking is done by calling emitWatermark in 
the sources. This is confusing; the watermarking should be visibly implemented 
in the exercises and solutions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] RocksDB Version Upgrade and Performance

2021-08-04 Thread David Anderson
I am hearing quite often from users who are struggling to manage memory
usage, and these are all users using RocksDB. While I don't know for
certain that RocksDB is the cause in every case, from my perspective,
getting the better memory stability of version 6.20 in place is critical.

Regards,
David

On Wed, Aug 4, 2021 at 8:08 AM Stephan Ewen  wrote:

> Hi all!
>
> *!!!  If you are a big user of the Embedded RocksDB State Backend and have
> performance sensitive workloads, please read this !!!*
>
> I want to quickly raise some awareness for a RocksDB version upgrade we
> plan to do, and some possible impact on application performance.
>
> *We plan to upgrade RocksDB to version 6.20.* That version of RocksDB
> unfortunately introduces some non-trivial performance regression. In our
> Nexmark Benchmark, at least one query is up to 13% slower.
> With some fixes, this can be improved, but even then there is an overall 
> *regression
> up to 6% in some queries*. (See attached table for results from relevant
> Nexmark Benchmark queries).
>
> We would do this update nonetheless, because we need to get new features
> and bugfixes from RocksDB in.
>
> Please respond to this mail thread if you have major concerns about this.
>
>
> *### Fallback Plan*
>
> Optionally, we could fall back to Plan B, which is to upgrade RocksDB only
> to version 5.18.4.
> Which has no performance regression (after applying a custom patch).
>
> While this spares us the performance degradation of RocksDB 6.20.x, this
> has multiple disadvantages:
>   - Does not include the better memory stability (strict cache control)
>   - Misses out on some new features which some users asked about
>   - Does not have the latest RocksDB bugfixes
>
> The latest point is especially bad in my opinion. While we can cherry-pick
> some bugfixes back (and have done this in the past), users typically run
> into an issue first and need to trace it back to RocksDB, then one of the
> committers can find the relevant patch from RocksDB master and backport it.
> That isn't the greatest user experience.
>
> Because of those disadvantages, we would prefer to do the upgrade to the
> newer RocksDB version despite the unfortunate performance regression.
>
> Best,
> Stephan
>
>
>


[jira] [Created] (FLINK-23128) Translate update to operations playground docs to Chinese

2021-06-23 Thread David Anderson (Jira)
David Anderson created FLINK-23128:
--

 Summary: Translate update to operations playground docs to Chinese
 Key: FLINK-23128
 URL: https://issues.apache.org/jira/browse/FLINK-23128
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training
Affects Versions: 1.13.1
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23100) Update pyflink walkthrough playground for 1.13

2021-06-22 Thread David Anderson (Jira)
David Anderson created FLINK-23100:
--

 Summary: Update pyflink walkthrough playground for 1.13
 Key: FLINK-23100
 URL: https://issues.apache.org/jira/browse/FLINK-23100
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23099) Update table walkthrough playground for 1.13

2021-06-22 Thread David Anderson (Jira)
David Anderson created FLINK-23099:
--

 Summary: Update table walkthrough playground for 1.13
 Key: FLINK-23099
 URL: https://issues.apache.org/jira/browse/FLINK-23099
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson
Assignee: David Anderson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-23098) Update operations playground for 1.13

2021-06-22 Thread David Anderson (Jira)
David Anderson created FLINK-23098:
--

 Summary: Update operations playground for 1.13
 Key: FLINK-23098
 URL: https://issues.apache.org/jira/browse/FLINK-23098
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Affects Versions: 1.13.0
Reporter: David Anderson
Assignee: David Anderson






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


trying (and failing) to update pyflink-walkthrough for Flink 1.13

2021-06-21 Thread David Anderson
I've been trying to upgrade the pyflink-walkthrough to Flink 1.13.1, but
without any success.

Unless I give it a lot of resources the data generator times out trying to
connect to Kafka. If I give it 6 cores and 11GB (which is about all I can
offer it) it does manage to connect, but then fails trying to write to
kafka.

Not sure what's wrong. Any suggestions?

See [1] to review what I tried.

Best,
David

[1]
https://github.com/alpinegizmo/flink-playgrounds/commit/777274355ba04de6d8c8f1308b24be99ec86a0d6

21:40 $ docker-compose logs -f generator

Attaching to pyflink-walkthrough_generator_1

generator_1  | Connecting to Kafka brokers

generator_1  | Waiting for brokers to become available

generator_1  | Waiting for brokers to become available

generator_1  | Connected to Kafka

generator_1  | Traceback (most recent call last):

generator_1  |   File "./generate_source_data.py", line 61, in


generator_1  | write_data(producer)

generator_1  |   File "./generate_source_data.py", line 42, in
write_data

generator_1  | producer.send(topic, value=cur_data)

generator_1  |   File
"/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 576,
in send

generator_1  | self._wait_on_metadata(topic,
self.config['max_block_ms'] / 1000.0)

generator_1  |   File
"/usr/local/lib/python3.7/site-packages/kafka/producer/kafka.py", line 703,
in _wait_on_metadata

generator_1  | "Failed to update metadata after %.1f secs."
% (max_wait,))

generator_1  | kafka.errors.KafkaTimeoutError:
KafkaTimeoutError: Failed to update metadata after 60.0 secs.
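For what it's worth, one generic way to cope with brokers that come up slowly is to wrap the first send in a bounded retry with backoff. This is a hypothetical sketch of that pattern only, not the playground's actual generator code; `retry_with_backoff` and `flaky_send` are illustrative names:

```python
import time

def retry_with_backoff(fn, attempts=5, initial_delay=1.0):
    """Call fn() until it succeeds, sleeping between attempts.

    The delay doubles after each failure; the final failure is re-raised.
    """
    delay = initial_delay
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            delay *= 2

# Example: a callable that fails twice (like a producer whose broker
# metadata is not yet available) and then succeeds.
calls = {"count": 0}

def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("metadata not available yet")
    return "sent"

print(retry_with_backoff(flaky_send, attempts=5, initial_delay=0.01))  # prints "sent"
```

In the playground, the same idea would wrap the first `producer.send` call so a slow broker startup does not kill the generator outright.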


[jira] [Created] (FLINK-23059) Update playgrounds for Flink 1.13

2021-06-21 Thread David Anderson (Jira)
David Anderson created FLINK-23059:
--

 Summary: Update playgrounds for Flink 1.13
 Key: FLINK-23059
 URL: https://issues.apache.org/jira/browse/FLINK-23059
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.13.0
Reporter: David Anderson
Assignee: David Anderson


The various playgrounds in apache/flink-playgrounds all need an update for the 
1.13 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22948) Scala example for toDataStream does not compile

2021-06-09 Thread David Anderson (Jira)
David Anderson created FLINK-22948:
--

 Summary: Scala example for toDataStream does not compile
 Key: FLINK-22948
 URL: https://issues.apache.org/jira/browse/FLINK-22948
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Table SQL / API
Affects Versions: 1.13.1
Reporter: David Anderson


The scala example at 
[https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/table/data_stream_api/#examples-for-todatastream]
 does not compile – {{User.class}} should be {{classOf[User]}}.

It would also be better to show the table DDL as

{{tableEnv.executeSql(}}
{{  """}}
{{  CREATE TABLE GeneratedTable (}}
{{    name STRING,}}
{{    score INT,}}
{{    event_time TIMESTAMP_LTZ(3),}}
{{    WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND}}
{{  )}}
{{  WITH ('connector'='datagen')}}
{{  """}}
{{)}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22894) Window Top-N should allow n=1

2021-06-06 Thread David Anderson (Jira)
David Anderson created FLINK-22894:
--

 Summary: Window Top-N should allow n=1
 Key: FLINK-22894
 URL: https://issues.apache.org/jira/browse/FLINK-22894
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Runtime
Affects Versions: 1.13.1
Reporter: David Anderson


I tried to reimplement the Hourly Tips exercise from the DataStream training 
using Flink SQL. The objective of this exercise is to find the one taxi driver 
who earned the most in tips during each hour, and report that driver's driverId 
and the sum of their tips. 

This can be expressed as a window top-n query, where n=1, as in

{{FROM (}}
{{  SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY sumOfTips DESC) as rownum}}
{{  FROM ( }}
{{    SELECT driverId, window_start, window_end, sum(tip) as sumOfTips}}
{{    FROM TABLE( }}
{{      TUMBLE(TABLE fares, DESCRIPTOR(startTime), INTERVAL '1' HOUR))}}
{{    GROUP BY driverId, window_start, window_end}}
{{  )}}
{{) WHERE rownum = 1;}}

 

This fails because the {{WindowRankOperatorBuilder}} insists on {{rankEnd > 1}}. So, in other words, while it is possible to report the top 2 drivers, or the 
driver in 2nd place, it's not possible to report only the top driver.

This appears to be an off-by-one error in the range checking.
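To illustrate the kind of off-by-one suspected here (a hypothetical sketch of the range validation, not the actual {{WindowRankOperatorBuilder}} code), the difference is whether the check rejects a rank end of exactly 1:

```python
def validate_rank_end_buggy(rank_end):
    # Rejects rank_end == 1, so a top-1 (n = 1) query cannot be built.
    if not rank_end > 1:
        raise ValueError(f"rankEnd must be > 1, got {rank_end}")
    return rank_end

def validate_rank_end_fixed(rank_end):
    # Accepts any positive rank_end, so top-1 is allowed.
    if not rank_end >= 1:
        raise ValueError(f"rankEnd must be >= 1, got {rank_end}")
    return rank_end

validate_rank_end_buggy(2)  # top-2 passes both checks
validate_rank_end_fixed(1)  # top-1 only passes the fixed check
```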

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22868) Update training exercises for 1.13

2021-06-03 Thread David Anderson (Jira)
David Anderson created FLINK-22868:
--

 Summary: Update training exercises for 1.13
 Key: FLINK-22868
 URL: https://issues.apache.org/jira/browse/FLINK-22868
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.13.1, 1.13.0
Reporter: David Anderson
Assignee: David Anderson


The exercises in the flink-training repo need to be updated for 1.13. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22737) Add support for CURRENT_WATERMARK to SQL

2021-05-21 Thread David Anderson (Jira)
David Anderson created FLINK-22737:
--

 Summary: Add support for CURRENT_WATERMARK to SQL
 Key: FLINK-22737
 URL: https://issues.apache.org/jira/browse/FLINK-22737
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API
Reporter: David Anderson


With a built-in function returning the current watermark, one could operate on 
late events without resorting to the DataStream API.

Called with zero parameters, this function returns the current watermark for 
the current row – if there is an event time attribute. Otherwise, it returns 
NULL. 
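As a sketch of how such a function could be used (with a hypothetical helper name, not the eventual SQL surface), an event would count as late when its timestamp is at or behind the current watermark, and a NULL watermark means nothing can be classified:

```python
def is_late(event_time, current_watermark):
    """Return True if the event arrived behind the watermark.

    current_watermark is None when there is no event-time attribute,
    mirroring the NULL return value described above.
    """
    if current_watermark is None:
        return False  # no watermark, so nothing can be classified as late
    return event_time <= current_watermark

# (event_time, watermark) pairs; late events could be routed to a
# side channel instead of being silently dropped.
events = [(90, 100), (100, 100), (120, 100), (50, None)]
late = [e for e, w in events if is_late(e, w)]
print(late)  # prints [90, 100]
```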



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-22543) layout of exception history tab isn't very usable with Flink SQL

2021-05-01 Thread David Anderson (Jira)
David Anderson created FLINK-22543:
--

 Summary: layout of exception history tab isn't very usable with 
Flink SQL
 Key: FLINK-22543
 URL: https://issues.apache.org/jira/browse/FLINK-22543
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Web Frontend
Affects Versions: 1.13.0
Reporter: David Anderson
 Attachments: image-2021-05-01-12-38-38-178.png

With Flink SQL, the name field can be very long, in which case the Time and 
Exception columns of the Exception History view become very narrow and hard to 
read.

Also, the Cancel Job button is covered over with other text.

 

!image-2021-05-01-12-38-38-178.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.13.0, release candidate #2

2021-04-29 Thread David Anderson
+1 (non-binding)

Checks:
- I built from source, successfully.
- I tested the new backpressure metrics and UI. I found one non-critical
bug that's been around for years, and for which a fix has already been
merged for 1.13.1 (https://issues.apache.org/jira/browse/FLINK-22489).
- I tested flame graphs.

On Thu, Apr 29, 2021 at 2:17 PM Robert Metzger  wrote:

> Thanks for creating the RC and managing the release process so far Guowei
> and Dawid!
>
> +1 (binding)
>
> Checks:
> - I deployed the RC on AWS EMR (session cluster and per-job cluster). I
> confirmed a minor issue Arvid Heise told me offline about:
> https://issues.apache.org/jira/browse/FLINK-22509. I believe we can ignore
> this issue.
> - I tested reactive mode extensively on Kubernetes, letting it scale up and
> down for a very long time (multiple weeks)
> - Checked the changes to the pom files: all dependency changes seem to be
> reflected properly in the NOTICE files
>   - netty bump to 4.1.46
>   - Elasticsearch to 1.15.1
>   - hbase dependency changes
>   - the new aws glue schema doesn't deploy anything foreign to maven
>   -flink-sql-connector-hbase-1.4 excludes fewer hbase classes, but seems
> fine
> - the license checker has not been changed in this release cycle (there are
> two exclusion lists in there)
>
>
>
>
>
> On Thu, Apr 29, 2021 at 12:05 PM Yun Tang  wrote:
>
> > +1 (non-binding)
> >
> > - built from source code with scala 2.11 succeeded
> > - submitted the state machine example and it ran well with the expected commit id
> > shown in UI.
> > - enable state latency tracking with slf4j metrics reporter and all
> > behaves as expected.
> > - Clicked 'FlameGraph' but found the web UI did not give a friendly hint to
> > tell me to enable it via setting rest.flamegraph.enabled: true; will create an
> > issue later.
> >
> > Best
> > Yun Tang
> > 
> > From: Leonard Xu 
> > Sent: Thursday, April 29, 2021 16:52
> > To: dev 
> > Subject: Re: [VOTE] Release 1.13.0, release candidate #2
> >
> > +1 (non-binding)
> >
> > - verified signatures and hashes
> > - built from source code with scala 2.11 succeeded
> > - started a cluster, WebUI was accessible, ran some simple SQL jobs, no
> > suspicious log output
> > - tested time functions and time zone usage in SQL Client, the query
> > result is as expected
> > - the web PR looks good
> > - found one minor exception message typo, will improve it later
> >
> > Best,
> > Leonard Xu
> >
> > > 在 2021年4月29日,16:11,Xingbo Huang  写道:
> > >
> > > +1 (non-binding)
> > >
> > > - verified checksum and signature
> > > - test upload `apache-flink` and `apache-flink-libraries` to test.pypi
> > > - pip install `apache-flink-libraries` and `apache-flink` in mac os
> > > - started cluster and run row-based operation test
> > > - started cluster and test python general group window agg
> > >
> > > Best,
> > > Xingbo
> > >
> > > Dian Fu  于2021年4月29日周四 下午4:05写道:
> > >
> > >> +1 (binding)
> > >>
> > >> - Verified the signature and checksum
> > >> - Installed PyFlink successfully using the source package
> > >> - Run a few PyFlink examples: Python UDF, Pandas UDF, Python
> DataStream
> > >> API with state access, Python DataStream API with batch execution mode
> > >> - Reviewed the website PR
> > >>
> > >> Regards,
> > >> Dian
> > >>
> > >>> 2021年4月29日 下午3:11,Jark Wu  写道:
> > >>>
> > >>> +1 (binding)
> > >>>
> > >>> - checked/verified signatures and hashes
> > >>> - started cluster and run some e2e sql queries using SQL Client,
> > results
> > >>> are as expect:
> > >>> * read from kafka source, window aggregate, lookup mysql database,
> > write
> > >>> into elasticsearch
> > >>> * window aggregate using legacy window syntax and new window TVF
> > >>> * verified web ui and log output
> > >>> - reviewed the release PR
> > >>>
> > >>> I found the log contains some verbose information when using window
> > >>> aggregate,
> > >>> but I think this doesn't block the release, I created FLINK-22522 to
> > fix
> > >>> it.
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>>
> > >>> On Thu, 29 Apr 2021 at 14:46, Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > >>> wrote:
> > >>>
> >  Hey Matthias,
> > 
> >  I'd like to double confirm what Guowei said. The dependency is
> Apache
> > 2
> >  licensed and we do not bundle it in our jar (as it is in the runtime
> >  scope) thus we do not need to mention it in the NOTICE file (btw,
> the
> >  best way to check what is bundled is to check the output of maven
> > shade
> >  plugin). Thanks for checking it!
> > 
> >  Best,
> > 
> >  Dawid
> > 
> >  On 29/04/2021 05:25, Guowei Ma wrote:
> > > Hi, Matthias
> > >
> > > Thank you very much for your careful inspection.
> > > I checked the flink-python_2.11-1.13.0.jar and we do not bundle
> > > org.conscrypt:conscrypt-openjdk-uber:2.5.1 to 
