[jira] [Created] (FLINK-27520) Use admission-controller-framework in Webhook

2022-05-05 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27520:
-

 Summary: Use admission-controller-framework in Webhook
 Key: FLINK-27520
 URL: https://issues.apache.org/jira/browse/FLINK-27520
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: Matyas Orhidi


Use the released 
[https://github.com/java-operator-sdk/admission-controller-framework]

instead of the borrowed source code in the Webhook module.

 





Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Lincoln Lee
Congratulations Yang!

Best,
Lincoln Lee


Őrhidi Mátyás wrote on Fri, May 6, 2022 at 12:46:

> Congrats Yang! Well deserved!
> Best,
> Matyas
>
> On Fri, May 6, 2022 at 5:30 AM huweihua  wrote:
>
> > Congratulations Yang!
> >
> > Best,
> > Weihua
> >
> >
>


Re: Source alignment for Iceberg

2022-05-05 Thread Piotr Nowojski
Option 1 sounds reasonable, but I would be tempted to wait for a second 
motivating use case before generalizing the framework. However, I wouldn't 
oppose this extension if others feel it's useful and a good thing to do

Piotrek

> Message written by Becket Qin on 06.05.2022 at 03:50:
> 
> I think the key point here is essentially what information Flink should
> expose to the user pluggables. Apparently the split / local task watermark is
> something many user pluggables would be interested in. Right now it is
> calculated by the Flink framework but not exposed to the user space, i.e.
> SourceReader / SplitEnumerator. So it looks like we can at least offer this
> information in some way so users can leverage it to do
> things.
> 
> That said, I am not sure if this would help in the Iceberg alignment case.
> Because at this point, FLIP-182 reports source reader watermarks
> periodically, which may not align with the RequestSplitEvent. So if we
> really want to leverage the FLIP-182 mechanism here, I see a few ways, just
> to name two of them:
> 1. we can expose the source reader watermark in the SourceReaderContext, so
> the source readers can put the local watermark in a custom operator event.
> This will effectively bypass the existing RequestSplitEvent. Or we can also
> extend the RequestSplitEvent to add an additional info field of byte[]
> type, so users can piggy-back additional information there, be it watermark
> or other stuff.
> 2. Simply piggy-back the local watermark in the RequestSplitEvent and pass
> that info to the SplitEnumerator as well.
> 
> If we are going to do this, personally I'd prefer the first way, as it
> provides a mechanism to allow future extension. So it would be easier to
> expose other framework information to the user space in the future.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> 
> 
>> On Fri, May 6, 2022 at 6:15 AM Thomas Weise  wrote:
>> 
>>> On Wed, May 4, 2022 at 11:03 AM Steven Wu  wrote:
>>> Any opinion on different timestamp for source alignment (vs Flink
>> application watermark)? For Iceberg source, we might want to enforce
>> alignment on kafka timestamp but Flink application watermark may use event
>> time field from payload.
>> 
>> I imagine that more generally the question is alignment based on the
>> iceberg partition/file metadata vs. individual rows? I think that
>> should work as long as there is a guarantee for out of orderness
>> within the split?
>> 
>> Thomas
>> 
>>> 
>>> Thanks,
>>> Steven
>>> 
>>> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
 
 Hey Piotr,
 
 I think the mechanism FLIP-182 provided is a reasonable default one,
>> which
 ensures the watermarks are only drifted by an upper bound. However,
 admittedly there are also other strategies for different purposes.
 
 In the Iceberg case, I am not sure if a static strictly allowed
>> watermark
 drift is desired. The source might just want to finish reading the
>> assigned
 splits as fast as possible. And it is OK to have a drift of "one split",
 instead of a fixed time period.
 
 As another example, if there are some fast readers whose splits are
>> always
 throttled, while the other slow readers are struggling to keep up with
>> the
 rest of the splits, the split enumerator may decide to reassign the slow
 splits so all the readers have something to read. This would need the
 SplitEnumerator to be aware of the watermark progress on each reader.
>> So it
 seems useful to expose the WatermarkAlignmentEvent information to the
 SplitEnumerator as well.
 
 Thanks,
 
 Jiangjie (Becket) Qin
 
 
 
 On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
>> wrote:
 
> Hi Steven,
> 
> Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just
>> emit
> all splits and let FLIP-182/FLIP-217 handle the watermark alignment
>> and
> block the splits that are too much into the future? I can see this
>> being an
> issue if the existence of too many blocked splits is occupying too
>> many
> resources.
> 
> If that's the case, indeed SourceCoordinator/SplitEnumerator would
>> have to
> decide on some basis how many and which splits to assign in what
>> order. But
> in that case I'm not sure how much you could use from FLIP-182 and
> FLIP-217. They seem somehow orthogonal to me, operating on different
> levels. FLIP-182 and FLIP-217 are working with whatever splits have
>> already
> been generated and assigned. You could leverage FLIP-182 and FLIP-217
>> and
> take care of only the problem to limit the number of parallel active
> splits. And here I'm not sure if it would be worth generalising a
>> solution
> across different connectors.
> 
> Regarding the global watermark, I made a related comment sometime ago
> about it [1]. It sounds to me like you also need to solve this
>> problem,
> 

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Őrhidi Mátyás
Congrats Yang! Well deserved!
Best,
Matyas

On Fri, May 6, 2022 at 5:30 AM huweihua  wrote:

> Congratulations Yang!
>
> Best,
> Weihua
>
>


Re: [DISCUSS] DockerHub repository maintainers

2022-05-05 Thread Xintong Song
It seems to me we at least don't have a consensus on dropping the use of the
apache namespace, which means we need to decide on a list of maintainers
anyway. So maybe we can get the discussion back to the maintainers. We may
continue the official-image vs. apache-namespace discussion in a separate
thread if necessary.

As mentioned previously, we need to reduce the number of maintainers from
20 to 5, as required by INFRA. Jingsong and I would like to volunteer as 2
of the 5, and we would like to learn who else wants to join us. Of course
the list of maintainers can be modified later.

*This also means the current maintainers may be removed from the list.*
Please let us know if you still need that privilege. CC-ed all the current
maintainers for attention.

Thank you~

Xintong Song



On Wed, May 4, 2022 at 3:14 PM Chesnay Schepler  wrote:

> One advantage is that the images are periodically rebuilt to get
> security fixes.
>
> The operator is a different story anyway because it is AFAIK only
> supposed to be used via docker
> (i.e., no standalone mode), which alleviates concerns about keeping the
> logic within the image
> to a minimum (which bit us in the past on the flink side).
>
> On 03/05/2022 16:09, Yang Wang wrote:
> > The flink-kubernetes-operator project is only published
> > via apache/flink-kubernetes-operator on docker hub and github packages.
> > We do not see obvious advantages in using Docker Hub official
> > images.
> >
> > Best,
> > Yang
> >
> > Xintong Song wrote on Thu, Apr 28, 2022 at 19:27:
> >
> >> I agree with you that doing QA for the image after the release has been
> >> finalized doesn't feel right. IIUR, that is mostly because official
> image
> >> PR needs 1) the binary release being deployed and propagated and 2) the
> >> corresponding git commit being specified. I'm not completely sure about
> >> this. Maybe we can improve the process by investigating more about the
> >> feasibility of pre-verifying an official image PR before finalizing the
> >> release. It's definitely a good thing to do if possible.
> >>
> >> I also agree that QA from DockerHub folks is valuable to us.
> >>
> >> I'm not against publishing official-images, and I'm not against working
> >> closely with the DockerHub folks to improve the process of delivering
> the
> >> official image. However, I don't think these should become reasons that
> we
> >> don't release our own apache/flink images.
> >>
> >> Taking 1.12.0 as an example, admittedly it would be nice for us to
> >> comply with the DockerHub folks' standards and not have a
> >> just-for-kubernetes command in our entrypoint. However, this is IMO far
> >> less important than delivering the image to our users in a timely manner. I
> >> guess that's where the DockerHub folks and we have different
> >> priorities, and that's why I think we should have a path that is fully
> >> controlled by this community to deliver images. We could take their
> >> valuable inputs and improve afterwards. Actually, that's what we did for
> >> 1.12.0 by starting to release to apache/flink.
> >>
> >> Thank you~
> >>
> >> Xintong Song
> >>
> >>
> >>
> >> On Thu, Apr 28, 2022 at 6:30 PM Chesnay Schepler 
> >> wrote:
> >>
> >>> I still think that's mostly a process issue.
> >>> Of course we can be blind-sided if we do the QA for a release artifact
> >>> after the release has been finalized.
> >>> But that's a clearly broken process from the get-go.
> >>>
> >>> At the very least we should already open a PR when the RC is created to
> >>> get earlier feedback.
> >>>
> >>> Moreover, nowadays the docker images are way slimmer and we are much
> >>> more careful on what is actually added to the scripts.
> >>>
> >>> Finally, the problems they found did show that their QA is very
> valuable
> >>> to us. And side-stepping that for such an essential piece of a release
> >>> isn't a good idea imo.
> >>>
> >>> On 28/04/2022 11:31, Xintong Song wrote:
>  I'm overall against only releasing to official-images.
> 
>  We started releasing to apache/flink, in addition to the
> >> official-image,
> >>> in
>  1.12.0. That was because releasing the official-image needs approval
> >> from
>  the DockerHub folks, which is not under control of the Flink
> community.
> >>> For
>  1.12.0 there were unfortunately some divergences between us and the
>  DockerHub folks, and it ended-up taking us nearly 2 months to get that
>  official-image PR merged [1][2]. Many users, especially those who need
>  Flink's K8s & Native-K8s deployment modes, were asking for the image
> >>> after
>  1.12.0 was announced.
> 
>  One could argue that what happened for 1.12.0 is not a regular case.
>  However, I'd like to point out that the docker images are not
> something
>  nice-to-have, but a practically necessary piece of the release for the
> >>> k8s
>  / native-k8s deployments to work. I'm strongly against a release
> >> process
>  where such an important piece depends on the approval 

[jira] [Created] (FLINK-27519) Fix duplicate names when there are multiple levels of over window aggregate

2022-05-05 Thread jinfeng (Jira)
jinfeng created FLINK-27519:
---

 Summary: Fix duplicate names when there are multiple levels of 
over window aggregate
 Key: FLINK-27519
 URL: https://issues.apache.org/jira/browse/FLINK-27519
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Affects Versions: 1.15.0
Reporter: jinfeng


A similar issue to 
[FLINK-22121|https://issues.apache.org/jira/browse/FLINK-22121].

It can be reproduced by adding this unit test to 
org.apache.flink.table.planner.plan.stream.sql.agg.GroupWindowTest:
{code:java}
@Test
def testWindowAggregateWithAnotherWindowAggregate(): Unit = {
  val sql =
    """
      |SELECT CAST(pv AS INT) AS pv, CAST(uv AS INT) AS uv FROM (
      |  SELECT *, count(distinct(c)) over (partition by a order by b desc) AS uv
      |  FROM (
      |    SELECT *, count(*) over (partition by a, c order by b desc) AS pv
      |    FROM MyTable
      |  )
      |)
      |""".stripMargin
  util.verifyExecPlan(sql)
}
{code}
The error message:

{code:java}
org.apache.flink.table.api.ValidationException: Field names must be unique. Found duplicates: [w0$o0]
    at org.apache.flink.table.types.logical.RowType.validateFields(RowType.java:273)
    at org.apache.flink.table.types.logical.RowType.<init>(RowType.java:158)
    at org.apache.flink.table.types.logical.RowType.of(RowType.java:298)
    at org.apache.flink.table.types.logical.RowType.of(RowType.java:290)
    at org.apache.flink.table.planner.calcite.FlinkTypeFactory$.toLogicalRowType(FlinkTypeFactory.scala:663)
    at org.apache.flink.table.planner.plan.nodes.physical.stream.StreamPhysicalOverAggregate.translateToExecNode(StreamPhysicalOverAggregate.scala:57)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraphGenerator.generate(ExecNodeGraphGenerator.java:74)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeGraphGenerator.generate(ExecNodeGraphGenerator.java:71)
{code}
 

I think we can add some logic in FlinkLogicalOverAggregate to avoid duplicate 
names in the output rowType, e.g. along the lines of the sketch below.
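
For illustration, a minimal sketch of such renaming logic (the helper and the 
suffix scheme are hypothetical, not existing planner code):

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FieldNameDeduplicator {

    /** Renames later duplicates, e.g. [a, w0$o0, w0$o0] -> [a, w0$o0, w0$o0_0]. */
    public static List<String> deduplicate(List<String> names) {
        Map<String, Integer> seen = new HashMap<>();
        return names.stream()
                .map(name -> {
                    int count = seen.merge(name, 1, Integer::sum);
                    // first occurrence keeps its name; later ones get a suffix
                    return count == 1 ? name : name + "_" + (count - 2);
                })
                .collect(Collectors.toList());
    }
}
{code}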

 

 

 





[jira] [Created] (FLINK-27518) Refactor migration tests to support generating snapshots of new versions automatically

2022-05-05 Thread Yun Gao (Jira)
Yun Gao created FLINK-27518:
---

 Summary: Refactor migration tests to support generating snapshots of 
new versions automatically
 Key: FLINK-27518
 URL: https://issues.apache.org/jira/browse/FLINK-27518
 Project: Flink
  Issue Type: Bug
  Components: Test Infrastructure
Affects Versions: 1.16.0
Reporter: Yun Gao


Currently, on releasing each version, we need to manually generate the snapshots 
for every migration test and update the current versions. As more and more 
migration tests are added, this has become more and more intractable. It would be 
better if we could make it happen automatically when cutting new branches, e.g. 
along the lines of the sketch below.
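
For illustration, a minimal sketch of what such a self-generating migration 
test could look like (the class name and system property are hypothetical):

{code:java}
import static org.junit.Assume.assumeTrue;

import org.junit.Test;

public class StatefulOperatorMigrationTest {

    // Hypothetical switch the release tooling would flip on the new branch.
    private static final boolean GENERATE_SNAPSHOTS =
            Boolean.getBoolean("generate.migration.snapshots");

    @Test
    public void writeSnapshotForCurrentVersion() throws Exception {
        // Runs only in "generate" mode: produce and persist the snapshot of
        // the current version under a version-specific resource path.
        assumeTrue(GENERATE_SNAPSHOTS);
        // ... run the job and take the savepoint/snapshot ...
    }

    @Test
    public void restoreFromPreviousVersionSnapshots() throws Exception {
        // Normal CI mode: restore from each stored snapshot and verify.
        assumeTrue(!GENERATE_SNAPSHOTS);
        // ... restore and assert behavior ...
    }
}
{code}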





[jira] [Created] (FLINK-27517) Introduce rolling file writer to write one record each time for append-only table.

2022-05-05 Thread Zheng Hu (Jira)
Zheng Hu created FLINK-27517:


 Summary: Introduce rolling file writer to write one record each 
time for append-only table.
 Key: FLINK-27517
 URL: https://issues.apache.org/jira/browse/FLINK-27517
 Project: Flink
  Issue Type: Sub-task
Reporter: Zheng Hu


Currently, the table store has introduced a `RollingFile` to write an iterator 
of rows into the underlying files. That's suitable for the memory store flush 
processing, but an append-only table usually doesn't have any memory store to 
cache a batch of rows temporarily. The ideal approach is to write one record 
into the columnar writer each time; the columnar writer will cache the records 
and write them into the underlying file in one batch.
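
A minimal sketch of what such a record-at-a-time writer could look like (the 
interface name and methods are assumptions, not the final API):

{code:java}
import java.io.IOException;

public interface RollingRecordWriter<T> {

    /** Writes a single record; the underlying columnar writer caches it. */
    void write(T record) throws IOException;

    /** Length of the file being written, used to decide when to roll over. */
    long currentFileLength() throws IOException;

    /** Flushes the cached batch and closes the current file. */
    void close() throws IOException;
}
{code}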





Re:Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)

2022-05-05 Thread Mang Zhang
Hi Yuxia,
Thanks for your reply!
About question 1: we will not support it. FLIP-218[1] aims to reduce the 
complexity of user DDL and make it easier for users to use. I have never 
encountered this case in big data scenarios.
About question 2: we will provide a public API like the one below; a small 
sketch follows after it.

public void cleanUp();

Regarding the mechanism of cleanUp, people who are familiar with the 
runtime module need to provide professional advice, which is what we need to 
focus on.
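
For illustration, a minimal sketch of such a hook (the interface name is 
hypothetical; the final FLIP-218 API may differ):

{code:java}
/**
 * Hypothetical clean-up hook for FLIP-218: a sink implementing this interface
 * lets the framework drop the target table created by CTAS if the job fails.
 */
public interface SupportsCleanUp {

    /** Drops resources (e.g. the created target table) on job failure. */
    void cleanUp();
}
{code}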

--

Best regards,
Mang Zhang





At 2022-04-29 17:00:03, "yuxia"  wrote:
>Thanks for driving this work; it's going to be a useful feature.
>About FLIP-218, I have some questions.
>
>1: Does our CTAS syntax support specifying the target table's schema, including column 
>names and data types? I think it may be a useful feature in case we want to change 
>the data types in the target table instead of always copying the source table's 
>schema. It'll be more flexible with this feature.
>Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature.
>
>2: Seems it'll require the sink to implement a public interface to drop the table, so 
>what will the interface look like? 
>
>[1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html
>
>Best regards,
>Yuxia
>
>- Original Message -
>From: "Mang Zhang" 
>To: "dev" 
>Sent: Thursday, April 28, 2022, 4:57:24 PM
>Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
>
>Hi, everyone
>
>
>I would like to open a discussion on supporting a SELECT clause in CREATE 
>TABLE (CTAS).
>With the development of business and the enhancement of Flink SQL 
>capabilities, queries become more and more complex.
>Today the user needs to use the CREATE TABLE statement to create the target 
>table first, and then execute the INSERT statement.
>However, the target table may have many columns, which brings a lot of 
>work outside the business logic to the user.
>At the same time, the user must ensure that the schema of the created target 
>table is consistent with the schema of the query result.
>A CTAS syntax like Hive's/Spark's would greatly help the user.
>
>
>
>You can find more details in FLIP-218[1]. Looking forward to your feedback.
>
>
>
>[1] 
>https://cwiki.apache.org/confluence/display/FLINK/FLIP-218%3A+Support+SELECT+clause+in+CREATE+TABLE(CTAS)
>
>
>
>
>--
>
>Best regards,
>Mang Zhang




Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread huweihua
Congratulations Yang!

Best,
Weihua



Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Leonard Xu
Congratulations Yang!

Best,
Leonard
> On May 5, 2022, at 10:36 PM, Zhu Zhu wrote:
> 
> Congratulations, Yang!
> 
> Thanks,
> Zhu
> 
> Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
>> 
>> Congratulations Yang!
>> 
>> Best,
>> Weiqing
>> 
>> On Thu, May 5, 2022 at 4:18 AM Xintong Song  wrote:
>> 
>>> Hi all,
>>> 
>>> I'm very happy to announce that Yang Wang has joined the Flink PMC!
>>> 
>>> Yang has been consistently contributing to our community, by contributing
>>> codes, participating in discussions, mentoring new contributors, answering
>>> questions on mailing lists, and giving talks on Flink at
>>> various conferences and events. He is one of the main contributors and
>>> maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
>>> Kubernetes Operator.
>>> 
>>> Congratulations and welcome, Yang!
>>> 
>>> Thank you~
>>> 
>>> Xintong Song (On behalf of the Apache Flink PMC)
>>> 



[DISCUSS] FLIP-229: Introduces Join Hint for Flink SQL Batch Job

2022-05-05 Thread Xuyang
Hi, all.
I want to start a discussion about FLIP-229: Introduce Join Hints for Flink 
SQL Batch Jobs (the cwiki[1] is not complete yet, but you can see the full 
details in the doc[2]).
Join hints are a common solution in many popular computing engines and 
databases for working around shortcomings of the optimizer by intervening in 
plan optimization. With join hints, users can intervene in the optimizer's 
selection of the join strategy and manually tune the execution plan to improve 
query performance.
In this FLIP, we propose join hints based on the existing join strategies in 
Flink SQL for batch jobs; a short usage sketch follows below.
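
For illustration, a minimal sketch of how such a hint could be used (the 
BROADCAST hint name follows the proposal docs and may still change during this 
discussion; tables t1 and t2 are assumed to be registered already):

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JoinHintExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inBatchMode());
        // Ask the optimizer to broadcast t1 instead of letting it pick a
        // join strategy on its own (hint syntax as proposed in the docs).
        tEnv.executeSql(
                "SELECT /*+ BROADCAST(t1) */ * FROM t1 JOIN t2 ON t1.id = t2.id");
    }
}
{code}
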
I'm looking forward to your feedback on FLIP-229.




--

Best!
Xuyang


[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
[2] 
https://docs.google.com/document/d/1IL00ME0Z0nlXGDWTUPODobVQMAm94PAPr9pw9EdGkoQ/edit?usp=sharing

[jira] [Created] (FLINK-27516) The config execution.attached doesn't take effect because it is overridden by CLI params

2022-05-05 Thread Liu (Jira)
Liu created FLINK-27516:
---

 Summary: The config execution.attached doesn't take effect because 
it is overridden by CLI params
 Key: FLINK-27516
 URL: https://issues.apache.org/jira/browse/FLINK-27516
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Affects Versions: 1.16.0
Reporter: Liu


The config execution.attached's default value is false, but no matter what 
value we set, it takes no effect. After digging in, we find that it is only 
affected by the CLI params, as follows:
 # If we don't specify -d or -yd, the member detachedMode in ProgramOptions is 
set to false.
 # In the method applyToConfiguration, execution.attached is then set to true.
 # So no matter what value is set for execution.attached, it takes no effect.

If -d or -yd is not set, we should fall back to the config execution.attached, 
as sketched below. Since attached mode has effectively been the default for a 
long time, we may need to change execution.attached's default value to true 
after the modification.
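
A minimal sketch of the intended behavior (the method is hypothetical; 
DeploymentOptions.ATTACHED is the existing option backing execution.attached):

{code:java}
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.DeploymentOptions;

public class AttachedModeResolution {

    /**
     * Hypothetical sketch: only let the CLI override execution.attached when
     * the user actually passed -d / -yd; otherwise keep the configured value.
     */
    static void applyDetachedOption(boolean detachedFlagPresent, Configuration conf) {
        if (detachedFlagPresent) {
            // an explicit CLI flag wins over the configuration
            conf.set(DeploymentOptions.ATTACHED, false);
        }
        // else: leave execution.attached as configured in flink-conf.yaml
    }
}
{code}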





Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread godfrey he
Congratulations, Yang!

Best,
Godfrey

Yangze Guo wrote on Fri, May 6, 2022 at 10:17:
>
> Congratulations Yang!
>
> Best,
> Yangze Guo
>
> On Fri, May 6, 2022 at 10:11 AM Forward Xu  wrote:
> >
> > Congratulations, Yang!
> >
> >
> > Best,
> >
> > Forward
> >
> > Jingsong Li wrote on Fri, May 6, 2022 at 10:07:
> >
> > > Congratulations, Yang!
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, May 6, 2022 at 10:04 AM yuxia  wrote:
> > > >
> > > > Congratulations, Yang!
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > - Original Message -
> > > > From: "Zhu Zhu" 
> > > > To: "dev" 
> > > > Cc: "Yang Wang" 
> > > > Sent: Thursday, May 5, 2022, 10:36:19 PM
> > > > Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > >
> > > > Congratulations, Yang!
> > > >
> > > > Thanks,
> > > > Zhu
> > > >
> > > > Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
> > > > >
> > > > > Congratulations Yang!
> > > > >
> > > > > Best,
> > > > > Weiqing
> > > > >
> > > > > On Thu, May 5, 2022 at 4:18 AM Xintong Song 
> > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > > > >
> > > > > > Yang has been consistently contributing to our community, by
> > > contributing
> > > > > > codes, participating in discussions, mentoring new contributors,
> > > answering
> > > > > > questions on mailing lists, and giving talks on Flink at
> > > > > > various conferences and events. He is one of the main contributors
> > > and
> > > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > > Flink
> > > > > > Kubernetes Operator.
> > > > > >
> > > > > > Congratulations and welcome, Yang!
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > > >
> > >


[jira] [Created] (FLINK-27515) Support 'ALTER TABLE ... COMPACT' in TableStore

2022-05-05 Thread Jane Chan (Jira)
Jane Chan created FLINK-27515:
-

 Summary: Support 'ALTER TABLE ... COMPACT' in TableStore
 Key: FLINK-27515
 URL: https://issues.apache.org/jira/browse/FLINK-27515
 Project: Flink
  Issue Type: New Feature
  Components: Table Store
Reporter: Jane Chan


Implement 'ALTER TABLE ... COMPACT'[1] in TableStore. 

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage





Emitting metrics from Flink SQL LookupTableSource

2022-05-05 Thread santhosh venkat
Hi,

We are trying to develop Flink SQL connectors at my company for proprietary
data stores. One problem we observed is that the Flink SQL
LookupTableSource/LookupFunction does not seem to have the capability to emit
any metrics (i.e., there is no metric group wired in through either
LookupSourceContext or DynamicSourceContext). It would be great to expose
latency and throughput metrics from these lookup sources for monitoring.

I looked at the existing LookupTableSource implementations in open-source
Flink and noticed that none of them emit any metrics. Does such a
capability exist? Please let me know if I'm missing something.

Thanks.


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Yangze Guo
Congratulations Yang!

Best,
Yangze Guo

On Fri, May 6, 2022 at 10:11 AM Forward Xu  wrote:
>
> Congratulations, Yang!
>
>
> Best,
>
> Forward
>
> Jingsong Li wrote on Fri, May 6, 2022 at 10:07:
>
> > Congratulations, Yang!
> >
> > Best,
> > Jingsong
> >
> > On Fri, May 6, 2022 at 10:04 AM yuxia  wrote:
> > >
> > > Congratulations, Yang!
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > - Original Message -
> > > From: "Zhu Zhu" 
> > > To: "dev" 
> > > Cc: "Yang Wang" 
> > > Sent: Thursday, May 5, 2022, 10:36:19 PM
> > > Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > >
> > > Congratulations, Yang!
> > >
> > > Thanks,
> > > Zhu
> > >
> > > Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
> > > >
> > > > Congratulations Yang!
> > > >
> > > > Best,
> > > > Weiqing
> > > >
> > > > On Thu, May 5, 2022 at 4:18 AM Xintong Song 
> > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > > >
> > > > > Yang has been consistently contributing to our community, by
> > contributing
> > > > > codes, participating in discussions, mentoring new contributors,
> > answering
> > > > > questions on mailing lists, and giving talks on Flink at
> > > > > various conferences and events. He is one of the main contributors
> > and
> > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > Flink
> > > > > Kubernetes Operator.
> > > > >
> > > > > Congratulations and welcome, Yang!
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > >
> >


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Forward Xu
Congratulations, Yang!


Best,

Forward

Jingsong Li wrote on Fri, May 6, 2022 at 10:07:

> Congratulations, Yang!
>
> Best,
> Jingsong
>
> On Fri, May 6, 2022 at 10:04 AM yuxia  wrote:
> >
> > Congratulations, Yang!
> >
> > Best regards,
> > Yuxia
> >
> > - Original Message -
> > From: "Zhu Zhu" 
> > To: "dev" 
> > Cc: "Yang Wang" 
> > Sent: Thursday, May 5, 2022, 10:36:19 PM
> > Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> >
> > Congratulations, Yang!
> >
> > Thanks,
> > Zhu
> >
> > Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
> > >
> > > Congratulations Yang!
> > >
> > > Best,
> > > Weiqing
> > >
> > > On Thu, May 5, 2022 at 4:18 AM Xintong Song 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > >
> > > > Yang has been consistently contributing to our community, by
> contributing
> > > > codes, participating in discussions, mentoring new contributors,
> answering
> > > > questions on mailing lists, and giving talks on Flink at
> > > > various conferences and events. He is one of the main contributors
> and
> > > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> Flink
> > > > Kubernetes Operator.
> > > >
> > > > Congratulations and welcome, Yang!
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Jingsong Li
Congratulations, Yang!

Best,
Jingsong

On Fri, May 6, 2022 at 10:04 AM yuxia  wrote:
>
> Congratulations, Yang!
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Zhu Zhu" 
> To: "dev" 
> Cc: "Yang Wang" 
> Sent: Thursday, May 5, 2022, 10:36:19 PM
> Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
>
> Congratulations, Yang!
>
> Thanks,
> Zhu
>
> Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
> >
> > Congratulations Yang!
> >
> > Best,
> > Weiqing
> >
> > On Thu, May 5, 2022 at 4:18 AM Xintong Song  wrote:
> >
> > > Hi all,
> > >
> > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > >
> > > Yang has been consistently contributing to our community, by contributing
> > > codes, participating in discussions, mentoring new contributors, answering
> > > questions on mailing lists, and giving talks on Flink at
> > > various conferences and events. He is one of the main contributors and
> > > maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> > > Kubernetes Operator.
> > >
> > > Congratulations and welcome, Yang!
> > >
> > > Thank you~
> > >
> > > Xintong Song (On behalf of the Apache Flink PMC)
> > >


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread yuxia
Congratulations, Yang!

Best regards,
Yuxia

- Original Message -
From: "Zhu Zhu" 
To: "dev" 
Cc: "Yang Wang" 
Sent: Thursday, May 5, 2022, 10:36:19 PM
Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang

Congratulations, Yang!

Thanks,
Zhu

Weiqing Yang wrote on Thu, May 5, 2022 at 22:28:
>
> Congratulations Yang!
>
> Best,
> Weiqing
>
> On Thu, May 5, 2022 at 4:18 AM Xintong Song  wrote:
>
> > Hi all,
> >
> > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> >
> > Yang has been consistently contributing to our community, by contributing
> > codes, participating in discussions, mentoring new contributors, answering
> > questions on mailing lists, and giving talks on Flink at
> > various conferences and events. He is one of the main contributors and
> > maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> > Kubernetes Operator.
> >
> > Congratulations and welcome, Yang!
> >
> > Thank you~
> >
> > Xintong Song (On behalf of the Apache Flink PMC)
> >


Re: [DISCUSS] Keep the Community Member Information Up-to-Date

2022-05-05 Thread Xintong Song
FYI, I've created FLINK-27514 to add the notice and link.

https://issues.apache.org/jira/browse/FLINK-27514

Thank you~

Xintong Song



On Thu, May 5, 2022 at 2:51 PM Xintong Song  wrote:

> Thanks Tison and Yun for sharing your opinions.
>
> @Tison,
> Keep both a potential out-of-date table and a link to the up-to-date list
> sounds good to me. We can also add a notice about the page being
> potentially out-of-date.
>
> @Yun,
> - I'm not sure about using someone's github avatar *by default*. I think
> we should at least get permission from that person, which IMO makes little
> difference than asking that person to update the website.
> - Adding a reminder in the invitation letter sounds good. However, I'm not
> sure whether we have such a place holding the template of an
> invitation letter. It seems to me most PMC members would just randomly pick
> a previous invitation letter from the mailing list archive and replace the
> name.
> - Adding a notice on the website definitely makes sense. We can also link
> to the up-to-date list in that notice.
>
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Apr 29, 2022 at 2:59 PM Yun Tang  wrote:
>
>> Hi Xintong,
>>
>>
>> +1 to add a link for the full member information.
>>
>> And I think the avatars are very friendly for website viewers. The lazy
>> updates might be because the invitation letters to new committers/PMC members
>> do not have a hint telling them to update the website.
>> If we can:
>>
>>   1.  Manually update the information on the website when someone
>> volunteers; we can use the avatars from their GitHub accounts by default.
>>   2.  Add such reminder information in the invitation letters.
>>   3.  Add descriptions in the website to tell viewers that the
>> information might not be up-to-date.
>>
>> I think this looks like a better solution.
>>
>> Best
>> Yun Tang
>> 
>> From: tison 
>> Sent: Thursday, April 28, 2022 21:52
>> To: dev 
>> Subject: Re: [DISCUSS] Keep the Community Member Information Up-to-Date
>>
>> Hi Xintong,
>>
>> Thanks for starting this discussion.
>>
>> +0 to replace the information with link to
>> https://projects.apache.org/committee.html?flink.
>> +1 to add such a link.
>>
My opinion is that the community doesn't have to keep the page up to
date, since Apache has a member page[1]
that isn't up to date either.

We can add one line redirecting to the whole list so that those who are
too "lazy" to add themselves to the page
don't have to do it, and keep the table so that those who are proud to
announce their membership, or want to try a commit
with their new commit access, can do so.
>>
>> Best,
>> tison.
>>
>> [1] https://www.apache.org/foundation/members.html
>>
>>
>> Xintong Song wrote on Thu, Apr 28, 2022 at 21:26:
>>
>> > >
>> > > Personally I'm tempted to just link to
>> > > https://projects.apache.org/committee.html?flink, if at all.
>> > >
>> >
>> > Despite its fashionless look, I'm thinking the same thing...
>> >
>> >
>> > Thank you~
>> >
>> > Xintong Song
>> >
>> >
>> >
>> > On Thu, Apr 28, 2022 at 8:41 PM Jingsong Li 
>> > wrote:
>> >
>> > > One value is that this page has avatars. :-)
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Thu, Apr 28, 2022 at 8:27 PM Chesnay Schepler 
>> > > wrote:
>> > > >
>> > > > Personally I'm tempted to just link to
>> > > > https://projects.apache.org/committee.html?flink, if at all.
>> > > >
>> > > > I'm not sure overall whether this listing really provides value in
>> the
>> > > > first place.
>> > > >
>> > > > On 28/04/2022 13:58, Xintong Song wrote:
>> > > > > Hi Flink Committers & PMC members,
>> > > > >
>> > > > > I just noticed that the list of community members on our website
>> [1]
>> > is
>> > > > > quite out-of-date. According to the ASF roster [2], this project
>> > > currently
>> > > > > has 87 committers, 39 of which are PMC members. However, there's
>> only
>> > > 62
>> > > > > people listed on our website, and many (e.g., me) have outdated
>> > roles.
>> > > > >
>> > > > > I believe the list on the website is supposed to be updated by
>> each
>> > new
>> > > > > committer / PMC member. I remember reading somewhere that
>> suggested
>> > new
>> > > > > committers to add themselves to this list as the first trying-out
>> for
>> > > > > merging changes. Unfortunately I couldn't find it anymore.
>> > > > >
>> > > > > Do you think we should keep the page manually updated, or shall we
>> > > > > investigate some way to keep it automatically synchronized?
>> > > > >
>> > > > > Thank you~
>> > > > >
>> > > > > Xintong Song
>> > > > >
>> > > > >
>> > > > > [1] https://flink.apache.org/community.html
>> > > > >
>> > > > > [2] https://whimsy.apache.org/roster/committee/flink
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: Source alignment for Iceberg

2022-05-05 Thread Becket Qin
I think the key point here is essentially what information Flink should
expose to the user pluggables. Apparently the split / local task watermark is
something many user pluggables would be interested in. Right now it is
calculated by the Flink framework but not exposed to the user space, i.e.
SourceReader / SplitEnumerator. So it looks like we can at least offer this
information in some way so users can leverage it to do
things.

That said, I am not sure if this would help in the Iceberg alignment case.
Because at this point, FLIP-182 reports source reader watermarks
periodically, which may not align with the RequestSplitEvent. So if we
really want to leverage the FLIP-182 mechanism here, I see a few ways, just
to name two of them:
1. we can expose the source reader watermark in the SourceReaderContext, so
the source readers can put the local watermark in a custom operator event.
This will effectively bypass the existing RequestSplitEvent. Or we can also
extend the RequestSplitEvent to add an additional info field of byte[]
type, so users can piggy-back additional information there, be it watermark
or other stuff.
2. Simply piggy-back the local watermark in the RequestSplitEvent and pass
that info to the SplitEnumerator as well.

If we are going to do this, personally I'd prefer the first way, as it
provides a mechanism to allow future extension. So it would be easier to
expose other framework information to the user space in the future.
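
For illustration, here is a minimal sketch of the first way: a custom operator
event carrying the reader's local watermark, assuming SourceReaderContext
exposed that watermark (the event class itself is hypothetical, not an
existing Flink API):

{code:java}
import org.apache.flink.api.connector.source.SourceEvent;

/**
 * Hypothetical custom operator event that a SourceReader could send to the
 * SplitEnumerator to piggy-back its local watermark, instead of overloading
 * RequestSplitEvent.
 */
public class LocalWatermarkEvent implements SourceEvent {

    private static final long serialVersionUID = 1L;

    private final long localWatermark;

    public LocalWatermarkEvent(long localWatermark) {
        this.localWatermark = localWatermark;
    }

    public long getLocalWatermark() {
        return localWatermark;
    }
}

// Reader side (sketch):
//   readerContext.sendSourceEventToCoordinator(new LocalWatermarkEvent(wm));
// Enumerator side: the event arrives in handleSourceEvent(subtaskId, event).
{code}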

Thanks,

Jiangjie (Becket) Qin



On Fri, May 6, 2022 at 6:15 AM Thomas Weise  wrote:

> On Wed, May 4, 2022 at 11:03 AM Steven Wu  wrote:
> > Any opinion on different timestamp for source alignment (vs Flink
> application watermark)? For Iceberg source, we might want to enforce
> alignment on kafka timestamp but Flink application watermark may use event
> time field from payload.
>
> I imagine that more generally the question is alignment based on the
> iceberg partition/file metadata vs. individual rows? I think that
> should work as long as there is a guarantee for out of orderness
> within the split?
>
> Thomas
>
> >
> > Thanks,
> > Steven
> >
> > On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
> >>
> >> Hey Piotr,
> >>
> >> I think the mechanism FLIP-182 provided is a reasonable default one,
> which
> >> ensures the watermarks are only drifted by an upper bound. However,
> >> admittedly there are also other strategies for different purposes.
> >>
> >> In the Iceberg case, I am not sure if a static strictly allowed
> watermark
> >> drift is desired. The source might just want to finish reading the
> assigned
> >> splits as fast as possible. And it is OK to have a drift of "one split",
> >> instead of a fixed time period.
> >>
> >> As another example, if there are some fast readers whose splits are
> always
> >> throttled, while the other slow readers are struggling to keep up with
> the
> >> rest of the splits, the split enumerator may decide to reassign the slow
> >> splits so all the readers have something to read. This would need the
> >> SplitEnumerator to be aware of the watermark progress on each reader.
> So it
> >> seems useful to expose the WatermarkAlignmentEvent information to the
> >> SplitEnumerator as well.
> >>
> >> Thanks,
> >>
> >> Jiangjie (Becket) Qin
> >>
> >>
> >>
> >> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
> wrote:
> >>
> >> > Hi Steven,
> >> >
> >> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just
> emit
> >> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment
> and
> >> > block the splits that are too much into the future? I can see this
> being an
> >> > issue if the existence of too many blocked splits is occupying too
> many
> >> > resources.
> >> >
> >> > If that's the case, indeed SourceCoordinator/SplitEnumerator would
> have to
> >> > decide on some basis how many and which splits to assign in what
> order. But
> >> > in that case I'm not sure how much you could use from FLIP-182 and
> >> > FLIP-217. They seem somehow orthogonal to me, operating on different
> >> > levels. FLIP-182 and FLIP-217 are working with whatever splits have
> already
> >> > been generated and assigned. You could leverage FLIP-182 and FLIP-217
> and
> >> > take care of only the problem to limit the number of parallel active
> >> > splits. And here I'm not sure if it would be worth generalising a
> solution
> >> > across different connectors.
> >> >
> >> > Regarding the global watermark, I made a related comment sometime ago
> >> > about it [1]. It sounds to me like you also need to solve this
> problem,
> >> > otherwise Iceberg users will encounter late records in case of some
> race
> >> > conditions between assigning new splits and completions of older.
> >> >
> >> > Best,
> >> > Piotrek
> >> >
> >> > [1]
> >> >
> https://issues.apache.org/jira/browse/FLINK-21871?focusedCommentId=17495545&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17495545
> >> >
> >> > Mon, May 2, 2022 at 

[jira] [Created] (FLINK-27514) Website links to up-to-date committer and PMC member lists

2022-05-05 Thread Xintong Song (Jira)
Xintong Song created FLINK-27514:


 Summary: Website links to up-to-date committer and PMC member lists
 Key: FLINK-27514
 URL: https://issues.apache.org/jira/browse/FLINK-27514
 Project: Flink
  Issue Type: Improvement
  Components: Project Website
Reporter: Xintong Song
Assignee: Xintong Song


According to the [ML 
discussion|https://lists.apache.org/thread/679ds6lfqs8f4q8lnt7tnlofl58str4y], 
we are going to add a link to the up-to-date [committer and PMC member 
list|https://projects.apache.org/committee.html?flink] from the community page 
of our project website, as well as a notice that the current list could be 
outdated.





Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

2022-05-05 Thread Jingsong Li
Thanks Shengkai for driving this, and everyone for the discussion.

> The reason why we introduce the gateway with pluggable endpoints is that many 
> users have their own preferences. For example, HiveServer2 users prefer to use 
> the gateway with a HiveServer2-style API, which has numerous tools. However, 
> some Flink-native users may prefer to use the REST API. Therefore, we hope to 
> learn from Kyuubi's design and expose multiple endpoints with different 
> APIs for users to choose from.

My understanding is that we need multiple endpoints, but I don't quite
understand why we need both the REST API and the SQLGatewayService
API; maybe I'm missing something. What's the difference between them?
Is it possible to use one set of REST APIs to solve all the problems?

> Gateway to support multiple Flink versions

I think this is a good question to consider.
- First of all, I think it is absolutely impossible for the gateway to
support multiple versions of Flink under the current architecture,
because the gateway relies on Flink SQL, and a lot of SQL compilation and
optimization code is bound to the Flink version.
- The other way is that the gateway does not rely on Flink SQL, and each
time it loads the Jar of a different Flink version to compile the job;
frankly speaking, streaming jobs actually prefer this model.

The benefit of gateway support for multiple versions is that it's
really more user-friendly. I've seen cases where users must have
multiple versions coexisting in a cluster, and if each version needs to
run its own gateway, the operations burden will be heavy.

> I don't think that the Gateway is a 'core' function of Flink which should be 
> included with Flink.

First, I think the Gateway is a 'core' function in Flink.
Why?
I think our architecture should be consistent, which means that the Flink
sql-client should use the implementation of the gateway, i.e.,
sql-client depends on the gateway.
And sql-client is the basic tool of Flink SQL; it must exist in the Flink
repository, otherwise Flink SQL loses its most important entry point.
So the gateway itself should be our core functionality as well.

Best,
Jingsong

On Thu, May 5, 2022 at 10:06 PM Jark Wu  wrote:
>
> Hi Martijn,
>
> Regarding maintaining Gateway inside or outside Flink code base,
> I would like to share my thoughts:
>
> > I would like to understand why it's complicated to make the upgrades
> problematic. Is it because of relying on internal interfaces? If so, should
> we not consider making them public?
>
> It's not about internal interfaces. Flink itself doesn't provide backward
> compatibility for public APIs.
>
>
> > a) it will not be possible to have separate releases of the Gateway,
> they will be tied to individual Flink releases
> I don't think it's a problem. On the contrary, maintaining a separate repo
> for Gateway will take a lot of
> extra community efforts, e.g., individual CICD, docs, releases.
>
>
> > b) if you want the Gateway to support multiple Flink versions
> Sorry, I don't see any users requesting this feature for such a long time
> for SQL Gateway.
> Users can build services on Gateway to easily support multi Flink versions
> (a Gateway for a Flink version).
> It's difficult for Gateway to support multi-version because Flink doesn't
> provide an API that supports backward and forward compatibility.
> If Gateway wants to support multi-version, it has to invent an
> inner-gateway for each version, and the Gateway acts as a proxy to communicate
> with the inner-gateway.
> So you have to have a gateway to couple with the Flink version.
>
> In fact, Gateway is the layer to support multi Flink versions for
> higher-level applications because its API (REST, gRpc) provides backward
> and forward compatibility.
> The gateway itself doesn't need to support multi Flink versions. Besides,
> Trino/Presto also provides servers[1] for each version.
>
>
> > I don't think that the Gateway is a 'core' function of Flink which should
> be included with Flink.
> Sorry, I can't agree with this. If I remember correctly, Flink SQL has been
> promoted to first-class citizen for a long time.
> The community also aims to make Flink a truly batch-stream unified
> computing platform, and Gateway would be the entry and center of the
> platform.
> From my point of view, Gateway is a very "core" function and must be
> included in Flink to have better cooperation with SQL and provide an
> out-of-box experience.
>
> Best,
> Jark
>
> [1]: https://trino.io/download.html
>
> On Thu, 5 May 2022 at 19:57, godfrey he  wrote:
>
> > Hi Shengkai.
> >
> > Thanks for driving the proposal, it's been silent too long.
> >
> > I have a few questions:
> > about the Architecture
> > > The architecture of the Gateway is in the following graph.
> > Is the TableEnvironment shared for all sessions ?
> >
> > about the REST Endpoint
> > > /v1/sessions
> > Are both local file and remote file supported for `libs` and `jars`?
> > Does sql gateway support upload file?
> >
> > 

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Thomas Weise
Thank you to all who contributed for making this release happen!

Thomas

On Thu, May 5, 2022 at 7:41 AM Zhu Zhu  wrote:
>
> Thanks Yun, Till and Joe for the great work and thanks everyone who
> makes this release possible!
>
> Cheers,
> Zhu
>
> Jiangang Liu wrote on Thu, May 5, 2022 at 21:09:
> >
> > Congratulations! This version is really helpful for us. We will explore it
> > and help to improve it.
> >
> > Best
> > Jiangang Liu
> >
> > Yu Li wrote on Thu, May 5, 2022 at 18:53:
> >
> > > Hurray!
> > >
> > > Thanks Yun Gao, Till and Joe for all the efforts as our release managers.
> > > And thanks all contributors for making this happen!
> > >
> > > Best Regards,
> > > Yu
> > >
> > >
> > > On Thu, 5 May 2022 at 18:01, Sergey Nuyanzin  wrote:
> > >
> > > > Great news!
> > > > Congratulations!
> > > > Thanks to the release managers, and everyone involved.
> > > >
> > > > On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:
> > > >
> > > > > Congratulations~
> > > > >
> > > > > Thanks Yun, Till and Joe for driving this release
> > > > > and everyone who made this release happen.
> > > > >
> > > > > Best,
> > > > > Godfrey
> > > > >
> > > > > Becket Qin wrote on Thu, May 5, 2022 at 17:39:
> > > > > >
> > > > > > Hooray! Thanks Yun, Till and Joe for driving the release!
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > JIangjie (Becket) Qin
> > > > > >
> > > > > > On Thu, May 5, 2022 at 5:20 PM Timo Walther 
> > > > wrote:
> > > > > >
> > > > > > > It took a bit longer than usual. But I'm sure the users will love
> > > > this
> > > > > > > release.
> > > > > > >
> > > > > > > Big thanks to the release managers!
> > > > > > >
> > > > > > > Timo
> > > > > > >
> > > > > > > On 05.05.22 at 10:45, Yuan Mei wrote:
> > > > > > > > Great!
> > > > > > > >
> > > > > > > > Thanks, Yun Gao, Till, and Joe for driving the release, and
> > > thanks
> > > > to
> > > > > > > > everyone for making this release happen!
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Yuan
> > > > > > > >
> > > > > > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu 
> > > > wrote:
> > > > > > > >
> > > > > > > >> Congratulations!
> > > > > > > >>
> > > > > > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> > > > > manager
> > > > > > > and
> > > > > > > >> everyone who involved.
> > > > > > > >>
> > > > > > > >> Best,
> > > > > > > >> Leonard
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> On May 5, 2022, at 4:30 PM, Yang Wang wrote:
> > > > > > > >>>
> > > > > > > >>> Congratulations!
> > > > > > > >>>
> > > > > > > >>> Thanks Yun Gao, Till and Joe for driving this release and
> > > > everyone
> > > > > who
> > > > > > > >> made
> > > > > > > >>> this release happen.
> > > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Sergey
> > > >
> > >


Re: Source alignment for Iceberg

2022-05-05 Thread Thomas Weise
On Wed, May 4, 2022 at 11:03 AM Steven Wu  wrote:
> Any opinion on different timestamp for source alignment (vs Flink application 
> watermark)? For Iceberg source, we might want to enforce alignment on kafka 
> timestamp but Flink application watermark may use event time field from 
> payload.

I imagine that more generally the question is alignment based on the
iceberg partition/file metadata vs. individual rows? I think that
should work as long as there is a guarantee for out of orderness
within the split?

Thomas

>
> Thanks,
> Steven
>
> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
>>
>> Hey Piotr,
>>
>> I think the mechanism FLIP-182 provided is a reasonable default one, which
>> ensures the watermarks are only drifted by an upper bound. However,
>> admittedly there are also other strategies for different purposes.
>>
>> In the Iceberg case, I am not sure if a static strictly allowed watermark
>> drift is desired. The source might just want to finish reading the assigned
>> splits as fast as possible. And it is OK to have a drift of "one split",
>> instead of a fixed time period.
>>
>> As another example, if there are some fast readers whose splits are always
>> throttled, while the other slow readers are struggling to keep up with the
>> rest of the splits, the split enumerator may decide to reassign the slow
>> splits so all the readers have something to read. This would need the
>> SplitEnumerator to be aware of the watermark progress on each reader. So it
>> seems useful to expose the WatermarkAlignmentEvent information to the
>> SplitEnumerator as well.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>>
>> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski  wrote:
>>
>> > Hi Steven,
>> >
>> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just emit
>> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
>> > block the splits that are too much into the future? I can see this being an
>> > issue if the existence of too many blocked splits is occupying too many
>> > resources.
>> >
>> > If that's the case, indeed SourceCoordinator/SplitEnumerator would have to
>> > decide on some basis how many and which splits to assign in what order. But
>> > in that case I'm not sure how much you could use from FLIP-182 and
>> > FLIP-217. They seem somehow orthogonal to me, operating on different
>> > levels. FLIP-182 and FLIP-217 are working with whatever splits have already
>> > been generated and assigned. You could leverage FLIP-182 and FLIP-217 and
>> > take care of only the problem to limit the number of parallel active
>> > splits. And here I'm not sure if it would be worth generalising a solution
>> > across different connectors.
>> >
>> > Regarding the global watermark, I made a related comment sometime ago
>> > about it [1]. It sounds to me like you also need to solve this problem,
>> > otherwise Iceberg users will encounter late records in case of some race
>> > conditions between assigning new splits and completions of older.
>> >
>> > Best,
>> > Piotrek
>> >
>> > [1]
>> > https://issues.apache.org/jira/browse/FLINK-21871?focusedCommentId=17495545&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17495545
>> >
>> > On Mon, May 2, 2022 at 04:26, Steven Wu wrote:
>> >
>> >> add dev@ group to the thread as Thomas suggested
>> >>
>> >> Arvid,
>> >>
>> >> The scenario 3 (Dynamic assignment + temporary no split) in the FLIP-180
>> >> (idleness) can happen to Iceberg source alignment, as readers can be
>> >> temporarily starved due to the holdback by the enumerator when assigning
>> >> new splits upon request.
>> >>
>> >> Totally agree that we should decouple this discussion from FLIP-217,
>> >> which addresses the split level watermark alignment problem as a follow-up
>> >> of FLIP-182
>> >>
>> >> Becket,
>> >>
>> >> Yes, currently Iceberg source implemented the alignment leveraging the
>> >> dynamic split assignment from FLIP-27 design. Basically, the enumerator
>> >> can
>> >> hold back split assignments to readers when necessary. Everything is
>> >> centralized in the enumerator: (1) watermark extraction and aggregation,
>> >> (2) alignment decision and execution
>> >>
>> >> The motivation of this discussion is to see if Iceberg source can leverage
>> >> some of the watermark alignment solutions (like FLIP-182) from Flink
>> >> framework. E.g., as mentioned in the doc, Iceberg source can potentially
>> >> leverage the FLIP-182 framework to do the watermark extraction and
>> >> aggregation. For the alignment decision and execution, we can keep them in
>> >> the centralized enumerator.
>> >>
>> >> Thanks,
>> >> Steven
>> >>
>> >> On Thu, Apr 28, 2022 at 2:05 AM Becket Qin  wrote:
>> >>
>> >> > Hi Steven,
>> >> >
>> >> > Thanks for pulling me into this thread. I think the timestamp
>> >> > alignment use case here is a good example of what FLIP-27 was designed
>> >> for.
>> >> >
>> >> > Technically speaking, Iceberg source can already 

Re: Source alignment for Iceberg

2022-05-05 Thread Thomas Weise
It seems that the Iceberg source benefits from performing the
alignment in the enumerator and holding back the splits until they
can actually be processed. That is probably true for any similar
source that assigns work in smaller increments, as centralizing the
"ready to process" decision in the enumerator allows for further
optimizations. Maybe there is an opportunity to come up with building
blocks for enumerators that benefit similar sources? It's probably
less compelling to sacrifice that for what appears to be limited reuse
of reader functionality.

Thomas


On Thu, May 5, 2022 at 12:55 PM Piotr Nowojski  wrote:
>
> Hi Steven,
>
> Ok, thanks for the clarification. I'm not sure how much could be leveraged? 
> Maybe just re-using the watermark alignment configuration? Please correct me 
> if I'm wrong, but I think for the sole purpose of this use case, I don't see 
> a good motivation behind expanding our APIs. Clearly this feature can be 
> implemented already (you did it in the Iceberg connector after all).
>
> > As another example, if there are some fast readers whose splits are always 
> > throttled, while the other slow readers
> > are struggling to keep up with the rest of the splits, the split enumerator 
> > may decide to reassign the slow
> > splits so all the readers have something to read. This would need the 
> > SplitEnumerator to be aware of the
> > watermark progress on each reader. So it seems useful to expose the 
> > WatermarkAlignmentEvent information to the
> > SplitEnumerator as well.
>
> It seems like a valid potential use case. But do we have a good enough 
> motivation to work on it right now?
>
> Piotrek
>
> On Thu, May 5, 2022 at 16:21, Steven Wu wrote:
>>
>> Piotr,
>>
>> With FLIP-27, Iceberg source already implemented alignment by tracking
>> watermark and holding back split assignment when necessary.
>>
>> The purpose of this discussion is to see if Iceberg source can leverage
>> some of the watermark alignment work from Flink framework.
>>
>> Thanks,
>> Steven
>>
>> On Thu, May 5, 2022 at 1:10 AM Piotr Nowojski  wrote:
>>
>> > Ok, I see. Thanks to both of you for the explanation.
>> >
>> > Do we need changes to Apache Flink for this feature? Can it be implemented
>> > in the Sources without changes in the framework? I presume source can
>> > access min/max watermark from the split, so as long as it also knows
>> > exactly which splits have finished, it would know which splits to hold 
>> > back.
>> >
>> > Best,
>> > Piotrek
>> >
>> > On Wed, May 4, 2022 at 20:03, Steven Wu wrote:
>> >
>> >> Piotr, thanks a lot for your feedback.
>> >>
>> >> > I can see this being an issue if the existence of too many blocked
>> >> splits is occupying too many resources.
>> >>
>> >> This is not desirable. Eagerly assigning many splits to a reader can
>> >> defeat the benefits of pull based dynamic split assignments. Iceberg
>> >> readers request one split at a time upon start or completion of a split.
>> >> Dynamic split assignment is better for work sharing/stealing as Becket
>> >> mentioned. Limiting number of active splits can be handled by the FLIP-27
>> >> Iceberg source and is somewhat orthogonal to watermark alignment.
>> >>
>> >> > Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle
>> >> the watermark alignment and block the splits that are too much into the
>> >> future?
>> >>
>> >> The enumerator just assigns the next split to the requesting reader
>> >> instead of holding back the split assignment. Let the reader handle the
>> >> pause (if the file split requires alignment wait).  This strategy might
>> >> work and leverage more from the framework.
>> >>
>> >> We probably need the following to make this work
>> >> * extract watermark/timestamp only at the completion of a split (not at
>> >> record level). Because records in a file are probably not sorted by the
>> >> timestamp field, the pause or watermark advancement is probably better 
>> >> done
>> >> at file level.
>> >> * source readers checkpoint the watermark. otherwise, upon restart
>> >> readers won't be able to determine the local watermark and pause for
>> >> alignment. We don't want to emit records upon restart due to unknown
>> >> watermark info.
>> >>
>> >> All,
>> >>
>> >> Any opinion on different timestamp for source alignment (vs Flink
>> >> application watermark)? For Iceberg source, we might want to enforce
>> >> alignment on kafka timestamp but Flink application watermark may use event
>> >> time field from payload.
>> >>
>> >> Thanks,
>> >> Steven
>> >>
>> >> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
>> >>
>> >>> Hey Piotr,
>> >>>
>> >>> I think the mechanism FLIP-182 provided is a reasonable default one,
>> >>> which
>> >>> ensures the watermarks are only drifted by an upper bound. However,
>> >>> admittedly there are also other strategies for different purposes.
>> >>>
>> >>> In the Iceberg case, I am not sure if a static strictly allowed watermark
>> >>> drift is desired. The 

Re: Source alignment for Iceberg

2022-05-05 Thread Piotr Nowojski
Hi Steven,

Ok, thanks for the clarification. I'm not sure how much could be leveraged?
Maybe just re-using the watermark alignment configuration? Please correct
me if I'm wrong, but I think for the sole purpose of this use case, I don't
see a good motivation behind expanding our APIs. Clearly this feature can
be implemented already (you did it in the Iceberg connector after all).
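
For reference, reusing the FLIP-182 configuration on the user side would look
roughly like this (a minimal sketch against the 1.15 API; the record type and
the group name are placeholders, not from the Iceberg connector):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    // Readers in the same alignment group pause once their watermark drifts
    // more than maxAllowedWatermarkDrift ahead of the group minimum.
    WatermarkStrategy<MyRecord> strategy =          // MyRecord is a placeholder
            WatermarkStrategy
                    .<MyRecord>forBoundedOutOfOrderness(Duration.ofSeconds(30))
                    .withWatermarkAlignment(
                            "iceberg-group",        // placeholder group name
                            Duration.ofMinutes(1),  // max allowed drift
                            Duration.ofSeconds(1)); // update interval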

> As another example, if there are some fast readers whose splits are
always throttled, while the other slow readers
> are struggling to keep up with the rest of the splits, the split
enumerator may decide to reassign the slow
> splits so all the readers have something to read. This would need the
SplitEnumerator to be aware of the
> watermark progress on each reader. So it seems useful to expose the
WatermarkAlignmentEvent information to the
> SplitEnumerator as well.

It seems like a valid potential use case. But do we have a good enough
motivation to work on it right now?

Piotrek

czw., 5 maj 2022 o 16:21 Steven Wu  napisał(a):

> Piotr,
>
> With FLIP-27, Iceberg source already implemented alignment by tracking
> watermark and holding back split assignment when necessary.
>
> The purpose of this discussion is to see if Iceberg source can leverage
> some of the watermark alignment work from Flink framework.
>
> Thanks,
> Steven
>
> On Thu, May 5, 2022 at 1:10 AM Piotr Nowojski 
> wrote:
>
> > Ok, I see. Thanks to both of you for the explanation.
> >
> > Do we need changes to Apache Flink for this feature? Can it be
> implemented
> > in the Sources without changes in the framework? I presume source can
> > access min/max watermark from the split, so as long as it also knows
> > exactly which splits have finished, it would know which splits to hold
> back.
> >
> > Best,
> > Piotrek
> >
> > śr., 4 maj 2022 o 20:03 Steven Wu  napisał(a):
> >
> >> Piotr, thanks a lot for your feedback.
> >>
> >> > I can see this being an issue if the existence of too many blocked
> >> splits is occupying too many resources.
> >>
> >> This is not desirable. Eagerly assigning many splits to a reader can
> >> defeat the benefits of pull based dynamic split assignments. Iceberg
> >> readers request one split at a time upon start or completion of a split.
> >> Dynamic split assignment is better for work sharing/stealing as Becket
> >> mentioned. Limiting number of active splits can be handled by the
> FLIP-27
> >> Iceberg source and is somewhat orthogonal to watermark alignment.
> >>
> >> > Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle
> >> the watermark alignment and block the splits that are too much into the
> >> future?
> >>
> >> The enumerator just assigns the next split to the requesting reader
> >> instead of holding back the split assignment. Let the reader handle the
> >> pause (if the file split requires alignment wait).  This strategy might
> >> work and leverage more from the framework.
> >>
> >> We probably need the following to make this work
> >> * extract watermark/timestamp only at the completion of a split (not at
> >> record level). Because records in a file probably aren't sorted by
> the
> >> timestamp field, the pause or watermark advancement is probably better
> done
> >> at file level.
> >> * source readers checkpoint the watermark. otherwise, upon restart
> >> readers won't be able to determine the local watermark and pause for
> >> alignment. We don't want to emit records upon restart due to unknown
> >> watermark info.
> >>
> >> All,
> >>
> >> Any opinion on different timestamp for source alignment (vs Flink
> >> application watermark)? For Iceberg source, we might want to enforce
> >> alignment on kafka timestamp but Flink application watermark may use
> event
> >> time field from payload.
> >>
> >> Thanks,
> >> Steven
> >>
> >> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
> >>
> >>> Hey Piotr,
> >>>
> >>> I think the mechanism FLIP-182 provided is a reasonable default one,
> >>> which
> >>> ensures the watermarks are only drifted by an upper bound. However,
> >>> admittedly there are also other strategies for different purposes.
> >>>
> >>> In the Iceberg case, I am not sure if a static strictly allowed
> watermark
> >>> drift is desired. The source might just want to finish reading the
> >>> assigned
> >>> splits as fast as possible. And it is OK to have a drift of "one
> split",
> >>> instead of a fixed time period.
> >>>
> >>> As another example, if there are some fast readers whose splits are
> >>> always
> >>> throttled, while the other slow readers are struggling to keep up with
> >>> the
> >>> rest of the splits, the split enumerator may decide to reassign the
> slow
> >>> splits so all the readers have something to read. This would need the
> >>> SplitEnumerator to be aware of the watermark progress on each reader.
> So
> >>> it
> >>> seems useful to expose the WatermarkAlignmentEvent information to the
> >>> SplitEnumerator as well.
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) 

Re: [DISCUSS] FLIP-227: Support overdraft buffer

2022-05-05 Thread Piotr Nowojski
Hi fanrui,

> How do we identify a legacy source?

legacy sources are always using the SourceStreamTask class and
SourceStreamTask is used only for legacy sources. But I'm not sure how to
enable/disable that. Adding a `disableOverdraft()` call in SourceStreamTask
would be better than relying on the `getAvailableFuture()` call
(isn't it used for the back pressure metric anyway?). Ideally we should
enable/disable it in the constructors, but that might be tricky.
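
To make that concrete, the constructor-based variant could look roughly like
this (disableOverdraft() is a hypothetical API, nothing like it exists yet):

    import org.apache.flink.runtime.execution.Environment;
    import org.apache.flink.runtime.io.network.api.writer.ResultPartitionWriter;

    // Hypothetical sketch: SourceStreamTask opts its outputs out of overdraft.
    public SourceStreamTask(Environment env) throws Exception {
        super(env);
        for (ResultPartitionWriter writer : env.getAllWriters()) {
            // Legacy sources emit from their own thread and never check
            // isAvailable(), so overdraft must stay disabled for them.
            writer.disableOverdraft(); // hypothetical method
        }
    }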

> I prefer it to be between 5 and 10

I would vote for a smaller value because of FLINK-13203

Piotrek



czw., 5 maj 2022 o 11:49 rui fan <1996fan...@gmail.com> napisał(a):

> Hi,
>
> Thanks a lot for your discussion.
>
> After several discussions, I think it's clear now. I updated the
> "Proposed Changes" of FLIP-227[1]. If I have something
> missing, please help to add it to FLIP, or add it in the mail
> and I can add it to FLIP. If everything is OK, I will create a
> new JIRA for the first task, and use FLINK-26762[2] as the
> second task.
>
> About the legacy source, do we set maxOverdraftBuffersPerGate=0
> directly? How do we identify a legacy source? Or could we add
> an overdraftEnabled flag in LocalBufferPool? The default value
> is false; if getAvailableFuture is called, we change
> overdraftEnabled=true.
> The flag indicates whether the caller checks isAvailable at all.
> It might be more general and cover more cases.
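
To make the quoted idea concrete, here is a self-contained model (all names
are illustrative; this is not the real LocalBufferPool):

    import java.util.concurrent.CompletableFuture;

    class OverdraftAwareBufferPool {
        private final int maxOverdraftBuffers;
        private int availableBuffers;
        // Stays false until someone calls getAvailableFuture(), i.e. until we
        // know the caller checks availability (legacy sources never do).
        private boolean overdraftEnabled = false;

        OverdraftAwareBufferPool(int availableBuffers, int maxOverdraftBuffers) {
            this.availableBuffers = availableBuffers;
            this.maxOverdraftBuffers = maxOverdraftBuffers;
        }

        CompletableFuture<?> getAvailableFuture() {
            overdraftEnabled = true; // caller checks availability, overdraft is safe
            return availableBuffers > 0
                    ? CompletableFuture.completedFuture(null)
                    : new CompletableFuture<>();
        }

        boolean mayRequestBuffer() {
            // Without overdraft the pool stops at 0; with overdraft it may go
            // up to maxOverdraftBuffers into the negative.
            int floor = overdraftEnabled ? -maxOverdraftBuffers : 0;
            return availableBuffers > floor;
        }
    }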
>
> Also, I think the default value of 'max-overdraft-buffers-per-gate'
> needs to be confirmed. I prefer it to be between 5 and 10. What
> do you think?
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer
> [2] https://issues.apache.org/jira/browse/FLINK-26762
>
> Thanks
> fanrui
>
> On Thu, May 5, 2022 at 4:41 PM Piotr Nowojski 
> wrote:
>
>> Hi again,
>>
>> After sleeping over this, if both versions (reserve and overdraft) have
>> the same complexity, I would also prefer the overdraft.
>>
>> > `Integer.MAX_VALUE` as default value was my idea as well but now, as
>> > Dawid mentioned, I think it is dangerous since it is too implicit for
>> > the user and if the user submits one more job for the same TaskManger
>>
>> As I mentioned, it's not only an issue with multiple jobs. The same
>> problem can happen with different subtasks from the same job, potentially
>> leading to the FLINK-13203 deadlock [1]. With FLINK-13203 fixed, I would be
>> in favour of Integer.MAX_VALUE to be the default value, but as it is, I
>> think we should indeed play on the safe side and limit it.
>>
>> > I still don't understand how the "reserve" implementation should be limited.
>> > I mean if we have X buffers in total and the user sets overdraft equal
>> > to X we obviously can not reserve all buffers, but how many we are
>> > allowed to reserve? Should it be a different configuration like
>> > percentageForReservedBuffers?
>>
>> The reserve could be defined as percentage, or as a fixed number of
>> buffers. But yes. In normal operation subtask would not use the reserve, as
>> if numberOfAvailableBuffers < reserve, the output would not be available.
>> Only in the flatMap/timers/huge records case the reserve could be used.
>>
>> > 1. If the total buffers of LocalBufferPool <= the reserve buffers, will
>> LocalBufferPool never be available? Can't process data?
>>
>> Of course we would need to make sure that never happens. So the reserve
>> should be < total buffer size.
>>
>> > 2. If the overdraft buffer uses the extra buffers, when the downstream
>> > task inputBuffer is insufficient, it should fail to start the job, and
>> then
>> > restart? When the InputBuffer is initialized, it will apply for enough
>> > buffers, right?
>>
>> The failover if downstream can not allocate buffers is already
>> implemented in FLINK-14872 [2]. There is a timeout for how long the task is
>> waiting for buffer allocation. However this doesn't prevent many
>> (potentially infinitely many) deadlock/restart cycles. IMO the proper
>> solution for [1] would be 2b described in the ticket:
>>
>> > 2b. Assign extra buffers only once all of the tasks are RUNNING. This
>> is a simplified version of 2a, without tracking the tasks sink-to-source.
>>
>> But that's a pre-existing problem and I don't think we have to solve it
>> before implementing overdraft. I think we would need to solve it only
>> before setting Integer.MAX_VALUE as the default for the overdraft. Maybe I
>> would hesitate setting the overdraft to anything more than a couple of
>> buffers by default for the same reason.
>>
>> > Actually, I totally agree that we don't need a lot of buffers for
>> overdraft
>>
>> and
>>
>> > Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
>> > When we finish this feature and after users use it, if users feedback
>> > this issue we can discuss again.
>>
>> +1
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13203
>> [2] https://issues.apache.org/jira/browse/FLINK-14872
>>
>> czw., 5 maj 2022 o 05:52 rui fan 

Re: Supporting Collector API in Kinesis Connector

2022-05-05 Thread Danny Cranmer
Hello Blake,

Sorry for the delay, I have posted a few comments to your PR.

Thanks for the contribution!
Danny

On Fri, Apr 15, 2022 at 6:03 PM Blake Wilson 
wrote:

> Great to know! Thanks for the reference.
>
> On Fri, Apr 15, 2022 at 6:15 AM Danny Cranmer 
> wrote:
>
> > Yes, the Flink Kinesis Consumer detects aggregated records and seamlessly
> > de-aggregates them for you [1].
> >
> > Thanks,
> >
> > [1]
> >
> >
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/internals/publisher/RecordBatch.java#L93
> >
> >
> >
> > On Thu, 14 Apr 2022, 23:56 Blake Wilson, 
> wrote:
> >
> > > Thanks for offering to review, Danny.
> > >
> > > Thanks also for pointing out that KCL can de-aggregate records
> aggregated
> > > by KPL. Several applications I've worked on batch multiple records
> > without
> > > using the KPL unfortunately.
> > >
> > > Is de-aggregation supported by the Kinesis Connector Source? I found
> > > mention of aggregation only in the FlinkKinesisProducer when searching
> > > online for this feature.
> > >
> > > On Thu, Apr 14, 2022 at 12:51 AM Danny Cranmer <
> dannycran...@apache.org>
> > > wrote:
> > >
> > > > Just to clarify, the native KCL/KPL aggregation [1] handles the
> > partition
> > > > key rebalancing for you out of the box.
> > > >
> > > >
> > > > [1] https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation
> > > >
> > > > On Thu, Apr 14, 2022 at 8:48 AM Danny Cranmer <
> dannycran...@apache.org
> > >
> > > > wrote:
> > > >
> > > > > Hey Blake,
> > > > >
> > > > > I am happy to take a look, but I will not have capacity until next
> > > week.
> > > > >
> > > > > The current way to achieve multiple records per PUT is to use the
> > > native
> > > > > KCL/KPL aggregation [1], which is supported by the Flink
> connector. A
> > > > > downside of aggregation is that the sender has to manage the
> > > partitioning
> > > > > strategy. For example, each record in your list will be sent to the
> > > same
> > > > > shard. If the sender implements grouping of records by partition
> key,
> > > > then
> > > > > care needs to be taken during shard scaling.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation
> > > > >
> > > > >
> > > > > On Tue, Apr 12, 2022 at 3:52 AM Blake Wilson <
> > bl...@yellowpapersun.com
> > > >
> > > > > wrote:
> > > > >
> > > > >> Hello, I recently submitted a pull request to support the
> Collector
> > > API
> > > > >> for
> > > > >> the Kinesis Streams Connector.
> > > > >>
> > > > >> The ability to use this API would save a great deal of shuttling
> > bytes
> > > > >> around in multiple Flink programs I've worked on. This is because
> to
> > > > >> construct a stream of the desired type without Collector support,
> > the
> > > > >> Kinesis source must emit a List[Type], and this must be flattened
> > to a
> > > > >> Type
> > > > >> stream.
> > > > >>
> > > > >> Because of the way Kinesis pricing works, it rarely makes sense to
> > > send
> > > > >> one
> > > > >> value per Kinesis record. In provisioned mode, Kinesis PUTs are
> > priced
> > > > to
> > > > >> the nearest 25KB (
> > > https://aws.amazon.com/kinesis/data-streams/pricing/
> > > > ),
> > > > >> so
> > > > >> records are more sensibly packed with multiple values unless these
> > > > values
> > > > >> are quite large. Therefore, I suspect the need to handle multiple
> > > values
> > > > >> per Kinesis record is quite common.
> > > > >>
> > > > >> The PR is located at https://github.com/apache/flink/pull/19417,
> > and
> > > > I'd
> > > > >> love to get some feedback on Github or here.
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >
> > > >
> > >
> >
>
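
A minimal sketch of the Collector-based shape discussed above, modelled on
DeserializationSchema#deserialize(byte[], Collector); the exact signature in
the PR may differ, and the newline packing format is an assumption:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.common.serialization.AbstractDeserializationSchema;
    import org.apache.flink.util.Collector;

    // One Kinesis record carries many packed values; emit each value through
    // the Collector instead of returning a List.
    public class PackedLineSchema extends AbstractDeserializationSchema<String> {
        @Override
        public String deserialize(byte[] message) throws IOException {
            throw new UnsupportedOperationException("use the Collector variant");
        }

        @Override
        public void deserialize(byte[] message, Collector<String> out) throws IOException {
            for (String value : new String(message, StandardCharsets.UTF_8).split("\n")) {
                out.collect(value);
            }
        }
    }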


[jira] [Created] (FLINK-27513) Update table walkthrough playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27513:
--

 Summary: Update table walkthrough playground for 1.15
 Key: FLINK-27513
 URL: https://issues.apache.org/jira/browse/FLINK-27513
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27512) Update pyflink walkthrough playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27512:
--

 Summary: Update pyflink walkthrough playground for 1.15
 Key: FLINK-27512
 URL: https://issues.apache.org/jira/browse/FLINK-27512
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27511) Update operations playground for 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27511:
--

 Summary: Update operations playground for 1.15
 Key: FLINK-27511
 URL: https://issues.apache.org/jira/browse/FLINK-27511
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27510) update playgrounds for Flink 1.15

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27510:
--

 Summary: update playgrounds for Flink 1.15
 Key: FLINK-27510
 URL: https://issues.apache.org/jira/browse/FLINK-27510
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.15.0
Reporter: David Anderson


All of the playgrounds should be updated for Flink 1.15. This should include 
reworking the code as necessary to avoid using anything that has been 
deprecated.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27509) Update table walkthrough playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27509:
--

 Summary: Update table walkthrough playground for 1.14
 Key: FLINK-27509
 URL: https://issues.apache.org/jira/browse/FLINK-27509
 Project: Flink
  Issue Type: Sub-task
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27508) Update pyflink walkthrough playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27508:
--

 Summary: Update pyflink walkthrough playground for 1.14
 Key: FLINK-27508
 URL: https://issues.apache.org/jira/browse/FLINK-27508
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation / Training / Exercises
Reporter: David Anderson






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27507) Update operations playground for 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27507:
--

 Summary: Update operations playground for 1.14
 Key: FLINK-27507
 URL: https://issues.apache.org/jira/browse/FLINK-27507
 Project: Flink
  Issue Type: Sub-task
Affects Versions: 1.14.4
Reporter: David Anderson


The operations playground has yet to be updated for 1.14. At this point, it may 
as well be configured to use the latest 1.14.x release.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27506) update playgrounds for Flink 1.14

2022-05-05 Thread David Anderson (Jira)
David Anderson created FLINK-27506:
--

 Summary: update playgrounds for Flink 1.14
 Key: FLINK-27506
 URL: https://issues.apache.org/jira/browse/FLINK-27506
 Project: Flink
  Issue Type: Improvement
  Components: Documentation / Training / Exercises
Affects Versions: 1.14.4
Reporter: David Anderson


All of the flink-playgrounds need to be updated for 1.14.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27505) Add javadoc comments to AsyncSinkBase

2022-05-05 Thread Zichen Liu (Jira)
Zichen Liu created FLINK-27505:
--

 Summary: Add javadoc comments to AsyncSinkBase
 Key: FLINK-27505
 URL: https://issues.apache.org/jira/browse/FLINK-27505
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Common
Affects Versions: 1.15.0
Reporter: Zichen Liu


Currently the javadocs describing each of the parameters on the constructor for 
AsyncSinkBase exist in AsyncSinkBaseBuilder. Since we are not enforcing the use 
of the builder, it makes more sense to have these descriptions in the 
AsyncSinkBase.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-227: Support overdraft buffer

2022-05-05 Thread Anton Kalashnikov

Hi,

Thanks Fanrui, it looks correct to me.

I vote for 5 as 'max-overdraft-buffers-per-gate'.


If I understand correctly, a legacy source can be detected by the operator,
which is an instance of StreamSource, and also by the invokable, which is an
instance of SourceStreamTask. We create ResultPartitionWriters in the
constructor of Task and theoretically, we already know the invokable type at
that point. So we can create ResultPartitionWriters with a BufferPoolFactory
that produces the correct LocalBufferPool. But honestly, it looks a little
dirty and I don't actually know which type of invokable we have in case of
operator chaining. Roughly, though, the idea is to create the LocalBufferPool
with/without overdraft based on knowledge of the operator type.
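
Roughly, with illustrative names only (this is not a real patch):

    // In the Task constructor: nameOfInvokableClass is an existing field,
    // everything else here is hypothetical.
    boolean isLegacySource =
            "org.apache.flink.streaming.runtime.tasks.SourceStreamTask"
                    .equals(nameOfInvokableClass);
    int maxOverdraftBuffersPerGate =
            isLegacySource ? 0 : configuredOverdraftBuffers; // hypothetical config
    ResultPartitionWriter[] writers =
            createResultPartitionWriters(maxOverdraftBuffersPerGate); // hypothetical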


--

Best regards,
Anton Kalashnikov

05.05.2022 11:49, rui fan пишет:

Hi,

Thanks a lot for your discussion.

After several discussions, I think it's clear now. I updated the
"Proposed Changes" of FLIP-227[1]. If I have something
missing, please help to add it to FLIP, or add it in the mail
and I can add it to FLIP. If everything is OK, I will create a
new JIRA for the first task, and use FLINK-26762[2] as the
second task.

About the legacy source, do we set maxOverdraftBuffersPerGate=0
directly? How do we identify a legacy source? Or could we add
an overdraftEnabled flag in LocalBufferPool? The default value
is false; if getAvailableFuture is called, we change
overdraftEnabled=true.

The flag indicates whether the caller checks isAvailable at all.
It might be more general and cover more cases.

Also, I think the default value of 'max-overdraft-buffers-per-gate'
needs to be confirmed. I prefer it to be between 5 and 10. What
do you think?

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer

[2] https://issues.apache.org/jira/browse/FLINK-26762

Thanks
fanrui

On Thu, May 5, 2022 at 4:41 PM Piotr Nowojski  
wrote:


Hi again,

After sleeping over this, if both versions (reserve and overdraft)
have the same complexity, I would also prefer the overdraft.

> `Integer.MAX_VALUE` as default value was my idea as well but now, as
> Dawid mentioned, I think it is dangerous since it is too
implicit for
> the user and if the user submits one more job for the same
TaskManger

As I mentioned, it's not only an issue with multiple jobs. The
same problem can happen with different subtasks from the same job,
potentially leading to the FLINK-13203 deadlock [1]. With
FLINK-13203 fixed, I would be in favour of Integer.MAX_VALUE to be
the default value, but as it is, I think we should indeed play on
the safe side and limit it.

> I still don't understand how the "reserve" implementation
should be limited.
> I mean if we have X buffers in total and the user sets overdraft
equal
> to X we obviously can not reserve all buffers, but how many we are
> allowed to reserve? Should it be a different configuration like
> percentageForReservedBuffers?

The reserve could be defined as percentage, or as a fixed number
of buffers. But yes. In normal operation subtask would not use the
reserve, as if numberOfAvailableBuffers < reserve, the output
would not be available. Only in the flatMap/timers/huge records
case the reserve could be used.

> 1. If the total buffers of LocalBufferPool <= the reserve
buffers, will LocalBufferPool never be available? Can't process data?

Of course we would need to make sure that never happens. So the
reserve should be < total buffer size.

> 2. If the overdraft buffer uses the extra buffers, when the
downstream
> task inputBuffer is insufficient, it should fail to start the
job, and then
> restart? When the InputBuffer is initialized, it will apply for
enough
> buffers, right?

The failover if downstream can not allocate buffers is already
implemented in FLINK-14872 [2]. There is a timeout for how long the
task is waiting for buffer allocation. However this doesn't
prevent many (potentially infinitely many) deadlock/restart
cycles. IMO the proper solution for [1] would be 2b described in
the ticket:

> 2b. Assign extra buffers only once all of the tasks are RUNNING.
This is a simplified version of 2a, without tracking the tasks
sink-to-source.

But that's a pre-existing problem and I don't think we have to
solve it before implementing overdraft. I think we would need to
solve it only before setting Integer.MAX_VALUE as the default for
the overdraft. Maybe I would hesitate setting the overdraft to
anything more than a couple of buffers by default for the same reason.

> Actually, I totally agree that we don't need a lot of buffers
for overdraft

and

> Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
> When we finish this feature and after users use it, if users
feedback
> this issue we can 

Re: [DISCUSS] FLIP-224: Blacklist Mechanism

2022-05-05 Thread Zhu Zhu
Thank you for all your feedback!

Besides the answers from Lijie, I'd like to share some of my thoughts:
1. Whether to enable automatic blocklisting
Generally speaking, it is not a goal of FLIP-224.
The automatic way should be something built upon the blocklist
mechanism and well decoupled. It was designed to be a configurable
blocklist strategy, but I think we can further decouple it by
introducing an abnormal node detector, as Becket suggested, which just
uses the blocklist mechanism once bad nodes are detected. However, it
should be a separate FLIP with further dev discussions and feedback
from users. I also agree with Becket that different users have different
requirements, and we should listen to them.
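
One possible shape of that decoupling, purely as a sketch (none of these
names are part of FLIP-224):

    public interface AbnormalNodeDetector {
        /** Started by the framework; the detector reports nodes it considers bad. */
        void start(BlocklistContext context);

        interface BlocklistContext {
            /** Hands a detected node over to the blocklist mechanism. */
            void blockNode(String nodeId, BlockAction action, long endTimestamp);
        }

        enum BlockAction {
            MARK_BLOCKLISTED,                   // stop new slot allocations only
            MARK_BLOCKLISTED_AND_EVACUATE_TASKS // also move running tasks away
        }
    }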

2. Is it enough to just take away abnormal nodes externally
My answer is no. As Lijie has mentioned, we need a way to avoid
deploying tasks to temporarily hot nodes. In this case, users may just
want to limit the load of the node and do not want to kill all the
processes on it. Another case is the speculative execution[1] which
may also leverage this feature to avoid starting mirror tasks on slow
nodes.

Thanks,
Zhu

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-168%3A+Speculative+execution+for+Batch+Job

Lijie Wang  于2022年5月5日周四 15:56写道:

>
> Hi everyone,
>
>
> Thanks for your feedback.
>
>
> There's one detail that I'd like to re-emphasize here because it can affect 
> the value and design of the blocklist mechanism (perhaps I should highlight 
> it in the FLIP). We propose two actions in FLIP:
>
> 1) MARK_BLOCKLISTED: Just mark the task manager or node as blocked. Future 
> slots should not be allocated from the blocked task manager or node. But 
> slots that are already allocated will not be affected. A typical application 
> scenario is to mitigate machine hotspots. In this case, we hope that 
> subsequent resource allocations will not be on the hot machine, but tasks 
> currently running on it should not be affected.
>
> 2) MARK_BLOCKLISTED_AND_EVACUATE_TASKS: Mark the task manager or node as 
> blocked, and evacuate all tasks on it. Evacuated tasks will be restarted on 
> non-blocked task managers.
>
> For the above 2 actions, the former may better highlight the value of this
> FLIP, because the external system cannot do that.
>
>
> Regarding *Manually* and *Automatically*, I basically agree with @Becket Qin: 
> different users have different answers. Not all users’ deployment 
> environments have a special external system that can perform the anomaly 
> detection. In addition, adding pluggable/optional auto-detection doesn't 
> require much extra work on top of manual specification.
>
>
> I will answer your other questions one by one.
>
>
> @Yangze
>
> a) I think you are right, we do not need to expose the 
> `cluster.resource-blocklist.item.timeout-check-interval` to users.
>
> b) We can abstract the `notifyException` to a separate interface (maybe 
> BlocklistExceptionListener), and the ResourceManagerBlocklistHandler can 
> implement it in the future.
>
>
> @Martijn
>
> a) I also think the manual blocking should be done by cluster operators.
>
> b) I think manual blocking makes sense, because according to my experience, 
> users are often the first to perceive the machine problems (because of job 
> failover or delay), and they will contact cluster operators to solve it, or 
> even tell the cluster operators which machine is problematic. From this point 
> of view, I think the people who really need the manual blocking are the 
> users, and it’s just performed by the cluster operator, so I think the manual 
> blocking makes sense.
>
>
> @Chesnay
>
> We need to touch the logic of JM/SlotPool, because for MARK_BLOCKLISTED, we
> need to know whether the slot is blocklisted when the task is 
> FINISHED/CANCELLED/FAILED. If so,  SlotPool should release the slot directly 
> to avoid assigning other tasks (of this job) on it. If we only maintain the 
> blocklist information on the RM, JM needs to retrieve it by RPC. I think the 
> performance overhead of that is relatively large, so I think it's worth 
> maintaining the blocklist information on the JM side and syncing them.
>
>
> @Роман
>
> a) “Probably storing inside Zookeeper/Configmap might be helpful here.”  
> Can you explain it in detail? I don't fully understand that. In my opinion, 
> non-active and active are the same, and no special treatment is required.
>
> b) I agree with you, the `endTimestamp` makes sense, I will add it to FLIP.
>
>
> @Yang
>
> As mentioned above, AFAIK, the external system cannot support the
> MARK_BLOCKLISTED action.
>
>
> Looking forward to your further feedback.
>
>
> Best,
>
> Lijie
>
>
> Yang Wang  于2022年5月3日周二 21:09写道:
>>
>> Thanks Lijie and Zhu for creating the proposal.
>>
>> I want to share some thoughts about Flink cluster operations.
>>
>> In the production environment, the SRE(aka Site Reliability Engineer)
>> already has many tools to detect the unstable nodes, which could take the

[jira] [Created] (FLINK-27504) State compaction not happening with sliding window and incremental RocksDB backend

2022-05-05 Thread Alexis Sarda-Espinosa (Jira)
Alexis Sarda-Espinosa created FLINK-27504:
-

 Summary: State compaction not happening with sliding window and 
incremental RocksDB backend
 Key: FLINK-27504
 URL: https://issues.apache.org/jira/browse/FLINK-27504
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.14.4
 Environment: Local Flink cluster on Arch Linux.
Reporter: Alexis Sarda-Espinosa
 Attachments: duration_trend_52ca77c.png, size_growth_52ca77c.png

Hello,

I'm trying to estimate an upper bound for RocksDB's state size in my 
application. For that purpose, I have created a small job with faster timings 
whose code you can find on GitHub: 
[https://github.com/asardaes/flink-rocksdb-ttl-test]. You can see some of the 
results there, but I summarize here as well:
 * Approximately 20 events per second, 10 unique keys for partitioning are 
pre-specified.
 * Sliding window of 11 seconds with a 1-second slide.
 * Allowed lateness of 11 seconds.
 * State TTL configured to 1 minute and compaction after 1000 entries.
 * Both window-specific and window-global state used.

The goal is to let the job run and analyze state compaction behavior with 
RocksDB.

I have been running the job on a local cluster (outside IDE), the configuration 
YAML is also available in the repository. After running for approximately 1.6 
days, state size is currently 2.3 GiB (see attachments). I understand state can 
retain expired data for a while, but since TTL is 1 minute, this seems 
excessive to me.
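
For reference, the TTL setup described above corresponds roughly to the
following (a sketch; the descriptor name and state type are illustrative):

    import org.apache.flink.api.common.state.StateTtlConfig;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.time.Time;

    // TTL of 1 minute; the RocksDB compaction filter re-reads the current
    // timestamp after every 1000 processed state entries.
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.minutes(1))
            .cleanupInRocksdbCompactFilter(1000)
            .build();

    ValueStateDescriptor<Long> descriptor =
            new ValueStateDescriptor<>("window-global-state", Long.class);
    descriptor.enableTimeToLive(ttlConfig);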



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Zhu Zhu
Thanks Yun, Till and Joe for the great work and thanks everyone who
makes this release possible!

Cheers,
Zhu

Jiangang Liu  于2022年5月5日周四 21:09写道:
>
> Congratulations! This version is really helpful for us . We will explore it
> and help to improve it.
>
> Best
> Jiangang Liu
>
> Yu Li  于2022年5月5日周四 18:53写道:
>
> > Hurray!
> >
> > Thanks Yun Gao, Till and Joe for all the efforts as our release managers.
> > And thanks all contributors for making this happen!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 5 May 2022 at 18:01, Sergey Nuyanzin  wrote:
> >
> > > Great news!
> > > Congratulations!
> > > Thanks to the release managers, and everyone involved.
> > >
> > > On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:
> > >
> > > > Congratulations~
> > > >
> > > > Thanks Yun, Till and Joe for driving this release
> > > > and everyone who made this release happen.
> > > >
> > > > Best,
> > > > Godfrey
> > > >
> > > > Becket Qin  于2022年5月5日周四 17:39写道:
> > > > >
> > > > > Hooray! Thanks Yun, Till and Joe for driving the release!
> > > > >
> > > > > Cheers,
> > > > >
> > > > > JIangjie (Becket) Qin
> > > > >
> > > > > On Thu, May 5, 2022 at 5:20 PM Timo Walther 
> > > wrote:
> > > > >
> > > > > > It took a bit longer than usual. But I'm sure the users will love
> > > this
> > > > > > release.
> > > > > >
> > > > > > Big thanks to the release managers!
> > > > > >
> > > > > > Timo
> > > > > >
> > > > > > Am 05.05.22 um 10:45 schrieb Yuan Mei:
> > > > > > > Great!
> > > > > > >
> > > > > > > Thanks, Yun Gao, Till, and Joe for driving the release, and
> > thanks
> > > to
> > > > > > > everyone for making this release happen!
> > > > > > >
> > > > > > > Best
> > > > > > > Yuan
> > > > > > >
> > > > > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu 
> > > wrote:
> > > > > > >
> > > > > > >> Congratulations!
> > > > > > >>
> > > > > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> > > > manager
> > > > > > and
> > > > > > >> everyone who involved.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Leonard
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>> 2022年5月5日 下午4:30,Yang Wang  写道:
> > > > > > >>>
> > > > > > >>> Congratulations!
> > > > > > >>>
> > > > > > >>> Thanks Yun Gao, Till and Joe for driving this release and
> > > everyone
> > > > who
> > > > > > >> made
> > > > > > >>> this release happen.
> > > > > > >>
> > > > > >
> > > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
> >


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Zhu Zhu
Congratulations, Yang!

Thanks,
Zhu

Weiqing Yang  于2022年5月5日周四 22:28写道:
>
> Congratulations Yang!
>
> Best,
> Weiqing
>
> On Thu, May 5, 2022 at 4:18 AM Xintong Song  wrote:
>
> > Hi all,
> >
> > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> >
> > Yang has been consistently contributing to our community, by contributing
> > codes, participating in discussions, mentoring new contributors, answering
> > questions on mailing lists, and giving talks on Flink at
> > various conferences and events. He is one of the main contributors and
> > maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> > Kubernetes Operator.
> >
> > Congratulations and welcome, Yang!
> >
> > Thank you~
> >
> > Xintong Song (On behalf of the Apache Flink PMC)
> >


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Weiqing Yang
Congratulations Yang!

Best,
Weiqing

On Thu, May 5, 2022 at 4:18 AM Xintong Song  wrote:

> Hi all,
>
> I'm very happy to announce that Yang Wang has joined the Flink PMC!
>
> Yang has been consistently contributing to our community, by contributing
> codes, participating in discussions, mentoring new contributors, answering
> questions on mailing lists, and giving talks on Flink at
> various conferences and events. He is one of the main contributors and
> maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> Kubernetes Operator.
>
> Congratulations and welcome, Yang!
>
> Thank you~
>
> Xintong Song (On behalf of the Apache Flink PMC)
>


Re: Source alignment for Iceberg

2022-05-05 Thread Steven Wu
Piotr,

With FLIP-27, Iceberg source already implemented alignment by tracking
watermark and holding back split assignment when necessary.

The purpose of this discussion is to see if Iceberg source can leverage
some of the watermark alignment work from Flink framework.
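
Conceptually, that enumerator-side alignment looks like the following sketch
(all names are illustrative, not the actual Iceberg source code):

    // Hold back a split whose minimum timestamp is too far ahead of the
    // global low watermark; otherwise assign it to the requesting reader.
    void handleSplitRequest(int subtask) {
        PendingSplit next = pendingSplits.peek();            // illustrative type
        if (next != null
                && next.minTimestamp() <= globalLowWatermark + maxAllowedDrift) {
            context.assignSplit(pendingSplits.poll(), subtask);
        } else {
            awaitingReaders.add(subtask); // retried when the watermark advances
        }
    }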

Thanks,
Steven

On Thu, May 5, 2022 at 1:10 AM Piotr Nowojski  wrote:

> Ok, I see. Thanks to both of you for the explanation.
>
> Do we need changes to Apache Flink for this feature? Can it be implemented
> in the Sources without changes in the framework? I presume source can
> access min/max watermark from the split, so as long as it also knows
> exactly which splits have finished, it would know which splits to hold back.
>
> Best,
> Piotrek
>
> śr., 4 maj 2022 o 20:03 Steven Wu  napisał(a):
>
>> Piotr, thanks a lot for your feedback.
>>
>> > I can see this being an issue if the existence of too many blocked
>> splits is occupying too many resources.
>>
>> This is not desirable. Eagerly assigning many splits to a reader can
>> defeat the benefits of pull based dynamic split assignments. Iceberg
>> readers request one split at a time upon start or completion of a split.
>> Dynamic split assignment is better for work sharing/stealing as Becket
>> mentioned. Limiting number of active splits can be handled by the FLIP-27
>> Iceberg source and is somewhat orthogonal to watermark alignment.
>>
>> > Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle
>> the watermark alignment and block the splits that are too much into the
>> future?
>>
>> The enumerator just assigns the next split to the requesting reader
>> instead of holding back the split assignment. Let the reader handle the
>> pause (if the file split requires alignment wait).  This strategy might
>> work and leverage more from the framework.
>>
>> We probably need the following to make this work
>> * extract watermark/timestamp only at the completion of a split (not at
>> record level). Because records in a file probably aren't sorted by the
>> timestamp field, the pause or watermark advancement is probably better done
>> at file level.
>> * source readers checkpoint the watermark. otherwise, upon restart
>> readers won't be able to determine the local watermark and pause for
>> alignment. We don't want to emit records upon restart due to unknown
>> watermark info.
>>
>> All,
>>
>> Any opinion on different timestamp for source alignment (vs Flink
>> application watermark)? For Iceberg source, we might want to enforce
>> alignment on kafka timestamp but Flink application watermark may use event
>> time field from payload.
>>
>> Thanks,
>> Steven
>>
>> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
>>
>>> Hey Piotr,
>>>
>>> I think the mechanism FLIP-182 provided is a reasonable default one,
>>> which
>>> ensures the watermarks are only drifted by an upper bound. However,
>>> admittedly there are also other strategies for different purposes.
>>>
>>> In the Iceberg case, I am not sure if a static strictly allowed watermark
>>> drift is desired. The source might just want to finish reading the
>>> assigned
>>> splits as fast as possible. And it is OK to have a drift of "one split",
>>> instead of a fixed time period.
>>>
>>> As another example, if there are some fast readers whose splits are
>>> always
>>> throttled, while the other slow readers are struggling to keep up with
>>> the
>>> rest of the splits, the split enumerator may decide to reassign the slow
>>> splits so all the readers have something to read. This would need the
>>> SplitEnumerator to be aware of the watermark progress on each reader. So
>>> it
>>> seems useful to expose the WatermarkAlignmentEvent information to the
>>> SplitEnumerator as well.
>>>
>>> Thanks,
>>>
>>> Jiangjie (Becket) Qin
>>>
>>>
>>>
>>> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
>>> wrote:
>>>
>>> > Hi Steven,
>>> >
>>> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just
>>> emit
>>> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
>>> > block the splits that are too much into the future? I can see this
>>> being an
>>> > issue if the existence of too many blocked splits is occupying too many
>>> > resources.
>>> >
>>> > If that's the case, indeed SourceCoordinator/SplitEnumerator would
>>> have to
>>> > decide on some basis how many and which splits to assign in what
>>> order. But
>>> > in that case I'm not sure how much you could use from FLIP-182 and
>>> > FLIP-217. They seem somehow orthogonal to me, operating on different
>>> > levels. FLIP-182 and FLIP-217 are working with whatever splits have
>>> already
>>> > been generated and assigned. You could leverage FLIP-182 and FLIP-217
>>> and
>>> > take care of only the problem to limit the number of parallel active
>>> > splits. And here I'm not sure if it would be worth generalising a
>>> solution
>>> > across different connectors.
>>> >
>>> > Regarding the global watermark, I made a related comment sometime ago
>>> > about 

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Aitozi
Congratulations!

Best,
Aitozi

Guowei Ma  于2022年5月5日周四 21:51写道:

> Congratulations!
> Best,
> Guowei
>
>
> On Thu, May 5, 2022 at 9:01 PM Jiangang Liu 
> wrote:
>
> > Congratulations!
> >
> > Best
> > Liu Jiangang
> >
> > Marios Trivyzas  于2022年5月5日周四 20:47写道:
> >
> > > Congrats Yang!
> > >
> > > On Thu, May 5, 2022, 15:29 Yuan Mei  wrote:
> > >
> > > > Congrats and well Deserved, Yang!
> > > >
> > > > Best,
> > > > Yuan
> > > >
> > > > On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang <
> > nicholasji...@apache.org>
> > > > wrote:
> > > >
> > > > > Congrats Yang!
> > > > >
> > > > > Best regards,
> > > > > Nicholas Jiang
> > > > >
> > > > > On 2022/05/05 11:18:10 Xintong Song wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'm very happy to announce that Yang Wang has joined the Flink
> PMC!
> > > > > >
> > > > > > Yang has been consistently contributing to our community, by
> > > > contributing
> > > > > > codes, participating in discussions, mentoring new contributors,
> > > > > answering
> > > > > > questions on mailing lists, and giving talks on Flink at
> > > > > > various conferences and events. He is one of the main
> contributors
> > > and
> > > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and
> > the
> > > > > Flink
> > > > > > Kubernetes Operator.
> > > > > >
> > > > > > Congratulations and welcome, Yang!
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

2022-05-05 Thread Jark Wu
Hi Martijn,

Regarding maintaining Gateway inside or outside Flink code base,
I would like to share my thoughts:

> I would like to understand why it's complicated to make the upgrades
problematic. Is it because of relying on internal interfaces? If so, should
we not consider making them public?

It's not about internal interfaces. Flink itself doesn't provide
cross-version compatibility for its APIs.


> a) it will not be possible to have separate releases of the Gateway,
they will be tied to individual Flink releases
I don't think it's a problem. On the contrary, maintaining a separate repo
for Gateway will take a lot of
extra community effort, e.g., individual CI/CD, docs, releases.


> b) if you want the Gateway to support multiple Flink versions
Sorry, I haven't seen any users requesting this feature for the SQL
Gateway in all this time.
Users can build services on Gateway to easily support multi Flink versions
(a Gateway for a Flink version).
It's difficult for Gateway to support multi-version because Flink doesn't
provide an API that supports backward and forward compatibility.
If the Gateway wants to support multiple versions, it has to invent an
inner-gateway for each version, with the Gateway acting as a proxy that
communicates with each inner-gateway.
So you still end up with a gateway coupled to the Flink version.

In fact, the Gateway is the layer that supports multiple Flink versions for
higher-level applications, because its API (REST, gRPC) provides backward
and forward compatibility.
The gateway itself doesn't need to support multiple Flink versions. Besides,
Trino/Presto also provide servers[1] for each version.
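
That compatibility surface is plain HTTP. For illustration, opening a session
via the proposed /v1/sessions endpoint could look like this from any JVM
client (the gateway address and the request body are assumptions):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest openSession = HttpRequest.newBuilder()
            .uri(URI.create("http://gateway-host:8083/v1/sessions")) // assumed address
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString("{\"session_name\":\"demo\"}"))
            .build();
    HttpResponse<String> response =
            client.send(openSession, HttpResponse.BodyHandlers.ofString());
    // The response carries the session_handle used by later calls such as
    // /v1/sessions/:session_handle/statements.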


> I don't think that the Gateway is a 'core' function of Flink which should
be included with Flink.
Sorry, I can't agree with this. If I remember correctly, Flink SQL has been
promoted to a first-class citizen for a long time.
The community also aims to make Flink a truly batch-stream unified
computing platform, and Gateway would be the entry and center of the
platform.
From my point of view, the Gateway is a very "core" function and must be
included in Flink to have better cooperation with SQL and provide an
out-of-the-box experience.

Best,
Jark

[1]: https://trino.io/download.html

On Thu, 5 May 2022 at 19:57, godfrey he  wrote:

> Hi Shengkai.
>
> Thanks for driving the proposal, it's been silent too long.
>
> I have a few questions:
> about the Architecture
> > The architecture of the Gateway is in the following graph.
> Is the TableEnvironment shared for all sessions ?
>
> about the REST Endpoint
> > /v1/sessions
> Are both local file and remote file supported for `libs` and `jars`?
> Does the sql gateway support uploading files?
>
> >/v1/sessions/:session_handle/configure_session
> Can this api be replaced with `/v1/sessions/:session_handle/statements` ?
>
> >/v1/sessions/:session_id/operations/:operation_handle/status
> `:session_id` is a typo, it should be `:session_handle`
>
> >/v1/sessions/:session_handle/statements
> >The statement must be a single command
> Does this api support `begin statement set ... end` or `statement set
> begin ... end`?
> Are `ADD JAR` and `REMOVE JAR` supported? If yes, how are the jars managed?
>
> >/v1/sessions/:session_handle/operations/:operation_handle/result/:token
> >"type": # string value of LogicalType
>  Some LogicalTypes can not be serialized, such as: CharType(0)
>
> about Options
> > endpoint.protocol
> I think REST is not a kind of protocol[1], but is an architectural style.
> The value should be `HTTP`.
>
> about SQLGatewayService API
> >  Catalog API
> > ...
> I think we should avoid providing such an api, because once the catalog api
> is changed or extended,
> this class must also be changed. SQL statements are a more general
> interface.
>
> > Options
> > sql-gateway.session.idle.timeout
> >sql-gateway.session.check.interval
> >sql-gateway.worker.keepalive.time
> It's better to keep the option style consistent with Flink; the nesting
> level should not be too deep.
> sql-gateway.session.idle.timeout -> sql-gateway.session.idle-timeout
> sql-gateway.session.check.interval -> sql-gateway.session.check-interval
> sql-gateway.worker.keepalive.time -> sql-gateway.worker.keepalive-time
>
> [1] https://restfulapi.net/
>
> Best,
> Godfrey
>
> Nicholas Jiang  于2022年5月5日周四 14:58写道:
> >
> > Hi Shengkai,
> >
> > I have another concern about the submission of batch jobs. Does the Flink
> SQL gateway support submitting batch jobs? In Kyuubi, BatchProcessBuilder is
> used to submit batch jobs. What about the Flink SQL gateway?
> >
> > Best regards,
> > Nicholas Jiang
> >
> > On 2022/04/24 03:28:36 Shengkai Fang wrote:
> > > Hi. Jiang.
> > >
> > > Thanks for your feedback!
> > >
> > > > Do the public interfaces of GatewayService refer to any service?
> > >
> > > We will only expose one GatewayService implementation. We will put the
> > > interface into the common package and the developer who wants to
> implement
> > > a new endpoint can just rely on the interface package rather than the
> > > implementation.
> > >
> > > > What's 

[jira] [Created] (FLINK-27503) Rename sst file to data file

2022-05-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-27503:


 Summary: Rename sst file to data file
 Key: FLINK-27503
 URL: https://issues.apache.org/jira/browse/FLINK-27503
 Project: Flink
  Issue Type: Sub-task
  Components: Table Store
Reporter: Jingsong Lee
Assignee: Jingsong Lee
 Fix For: table-store-0.2.0


If we support append-only data, the records have no primary key, so the file
should be a data file instead of an sst file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Guowei Ma
Congratulations!
Best,
Guowei


On Thu, May 5, 2022 at 9:01 PM Jiangang Liu 
wrote:

> Congratulations!
>
> Best
> Liu Jiangang
>
> Marios Trivyzas  于2022年5月5日周四 20:47写道:
>
> > Congrats Yang!
> >
> > On Thu, May 5, 2022, 15:29 Yuan Mei  wrote:
> >
> > > Congrats and well Deserved, Yang!
> > >
> > > Best,
> > > Yuan
> > >
> > > On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang <
> nicholasji...@apache.org>
> > > wrote:
> > >
> > > > Congrats Yang!
> > > >
> > > > Best regards,
> > > > Nicholas Jiang
> > > >
> > > > On 2022/05/05 11:18:10 Xintong Song wrote:
> > > > > Hi all,
> > > > >
> > > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > > >
> > > > > Yang has been consistently contributing to our community, by
> > > contributing
> > > > > codes, participating in discussions, mentoring new contributors,
> > > > answering
> > > > > questions on mailing lists, and giving talks on Flink at
> > > > > various conferences and events. He is one of the main contributors
> > and
> > > > > maintainers in Flink's Native Kubernetes / Yarn integrations and
> the
> > > > Flink
> > > > > Kubernetes Operator.
> > > > >
> > > > > Congratulations and welcome, Yang!
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (FLINK-27502) Add file Suffix to data files

2022-05-05 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-27502:


 Summary: Add file Suffix to data files
 Key: FLINK-27502
 URL: https://issues.apache.org/jira/browse/FLINK-27502
 Project: Flink
  Issue Type: Improvement
  Components: Table Store
Reporter: Jingsong Lee
 Fix For: table-store-0.2.0


For example, adding .orc to an orc format file.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Jark Wu
It's great to see the active discussion! I want to share my ideas:

1) implement the cache in framework vs. connectors base
I don't have a strong opinion on this. Both ways should work (e.g., cache
pruning, compatibility).
The framework way can provide more concise interfaces.
The connector base way can define more flexible cache
strategies/implementations.
We are still investigating a way to see if we can have both advantages.
We should reach a consensus on which way is the final state, and make
sure we are on a path toward it.

2) filters and projections pushdown:
I agree with Alex that the filter pushdown into cache can benefit a lot for
ALL cache.
However, this is not true for LRU cache. Connectors use cache to reduce IO
requests to databases for better throughput.
If a filter can prune 90% of the data in the cache, we will have 90% of lookup
requests that can never be served from the cache
and will hit the databases directly. That means the cache is meaningless in
this case.

IMO, Flink SQL has provided a standard way to do filter and projection
pushdown, i.e., SupportsFilterPushDown and SupportsProjectionPushDown.
JDBC/Hive/HBase haven't implemented these interfaces yet, but that doesn't
mean they are hard to implement.
They should implement the pushdown interfaces to reduce IO and the cache
size.
The final state should be that the scan source and the lookup source share
the exact same pushdown implementation.
I don't see why we need to duplicate the pushdown logic in caches, which
would complicate the lookup join design.
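
For completeness, the standard hook looks like this on the source side (a
sketch; the class name is illustrative and the translation of filters into a
JDBC query is omitted):

    import java.util.List;
    import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
    import org.apache.flink.table.expressions.ResolvedExpression;

    // Normally implemented alongside DynamicTableSource / LookupTableSource.
    public class MyJdbcTableSource implements SupportsFilterPushDown {
        private List<ResolvedExpression> pushedFilters;

        @Override
        public Result applyFilters(List<ResolvedExpression> filters) {
            this.pushedFilters = filters; // later translated into the query
            // Conservative: report every filter as remaining so the planner
            // keeps a runtime filter on top of the source.
            return Result.of(filters, filters);
        }
    }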

3) ALL cache abstraction
The ALL cache might be the most challenging part of this FLIP. We have never
provided a reload-lookup public interface.
Currently, we put the reload logic in the "eval" method of TableFunction.
That's hard for some sources (e.g., Hive).
Ideally, connector implementation should share the logic of reload and
scan, i.e. ScanTableSource with InputFormat/SourceFunction/FLIP-27 Source.
However, InputFormat/SourceFunction are deprecated, and the FLIP-27 source
is deeply coupled with SourceOperator.
If we want to invoke the FLIP-27 source in LookupJoin, this may make the
scope of this FLIP much larger.
We are still investigating how to abstract the ALL cache logic and reuse
the existing source interfaces.


Best,
Jark



On Thu, 5 May 2022 at 20:22, Roman Boyko  wrote:

> It's a much more complicated activity and lies outside the scope of this
> improvement, because such pushdowns should be done for all ScanTableSource
> implementations (not only for lookup ones).
>
> On Thu, 5 May 2022 at 19:02, Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> One question regarding "And Alexander correctly mentioned that filter
>> pushdown still is not implemented for jdbc/hive/hbase." -> Would an
>> alternative solution be to actually implement these filter pushdowns? I
>> can
>> imagine that there are many more benefits to doing that, outside of lookup
>> caching and metrics.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>> https://github.com/MartijnVisser
>>
>>
>> On Thu, 5 May 2022 at 13:58, Roman Boyko  wrote:
>>
>> > Hi everyone!
>> >
>> > Thanks for driving such a valuable improvement!
>> >
>> > I do think that a single cache implementation would be a nice opportunity
>> for
>> > users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
>> > anyway - no matter how it is implemented.
>> >
>> > Putting myself in the user's shoes, I can say that:
>> > 1) I would prefer to have the opportunity to cut off the cache size by
>> > simply filtering unnecessary data. And the most handy way to do it is
>> apply
>> > it inside LookupRunners. It would be a bit harder to pass it through the
>> > LookupJoin node to TableFunction. And Alexander correctly mentioned that
>> > filter pushdown still is not implemented for jdbc/hive/hbase.
>> > 2) The ability to set the different caching parameters for different
>> tables
>> > is quite important. So I would prefer to set it through DDL rather than
>> > have similar ttl, strategy and other options for all lookup tables.
>> > 3) Providing the cache into the framework really deprives us of
>> > extensibility (users won't be able to implement their own cache). But
>> most
>> > probably it might be solved by creating more different cache strategies
>> and
>> > a wider set of configurations.
>> >
>> > All these points are much closer to the schema proposed by Alexander.
>> > Qingshen Ren, please correct me if I'm not right and all these
>> facilities
>> > might be simply implemented in your architecture?
>> >
>> > Best regards,
>> > Roman Boyko
>> > e.: ro.v.bo...@gmail.com
>> >
>> > On Wed, 4 May 2022 at 21:01, Martijn Visser 
>> > wrote:
>> >
>> > > Hi everyone,
>> > >
>> > > I don't have much to chip in, but just wanted to express that I really
>> > > appreciate the in-depth discussion on this topic and I hope that
>> others
>> > > will join the conversation.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Tue, 3 May 2022 at 10:15, 

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Jiangang Liu
Congratulations! This version is really helpful for us . We will explore it
and help to improve it.

Best
Jiangang Liu

Yu Li  于2022年5月5日周四 18:53写道:

> Hurray!
>
> Thanks Yun Gao, Till and Joe for all the efforts as our release managers.
> And thanks all contributors for making this happen!
>
> Best Regards,
> Yu
>
>
> On Thu, 5 May 2022 at 18:01, Sergey Nuyanzin  wrote:
>
> > Great news!
> > Congratulations!
> > Thanks to the release managers, and everyone involved.
> >
> > On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:
> >
> > > Congratulations~
> > >
> > > Thanks Yun, Till and Joe for driving this release
> > > and everyone who made this release happen.
> > >
> > > Best,
> > > Godfrey
> > >
> > > Becket Qin  于2022年5月5日周四 17:39写道:
> > > >
> > > > Hooray! Thanks Yun, Till and Joe for driving the release!
> > > >
> > > > Cheers,
> > > >
> > > > JIangjie (Becket) Qin
> > > >
> > > > On Thu, May 5, 2022 at 5:20 PM Timo Walther 
> > wrote:
> > > >
> > > > > It took a bit longer than usual. But I'm sure the users will love
> > this
> > > > > release.
> > > > >
> > > > > Big thanks to the release managers!
> > > > >
> > > > > Timo
> > > > >
> > > > > Am 05.05.22 um 10:45 schrieb Yuan Mei:
> > > > > > Great!
> > > > > >
> > > > > > Thanks, Yun Gao, Till, and Joe for driving the release, and
> thanks
> > to
> > > > > > everyone for making this release happen!
> > > > > >
> > > > > > Best
> > > > > > Yuan
> > > > > >
> > > > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu 
> > wrote:
> > > > > >
> > > > > >> Congratulations!
> > > > > >>
> > > > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> > > manager
> > > > > and
> > > > > >> everyone who involved.
> > > > > >>
> > > > > >> Best,
> > > > > >> Leonard
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>> 2022年5月5日 下午4:30,Yang Wang  写道:
> > > > > >>>
> > > > > >>> Congratulations!
> > > > > >>>
> > > > > >>> Thanks Yun Gao, Till and Joe for driving this release and
> > everyone
> > > who
> > > > > >> made
> > > > > >>> this release happen.
> > > > > >>
> > > > >
> > > > >
> > >
> >
> >
> > --
> > Best regards,
> > Sergey
> >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Jiangang Liu
Congratulations!

Best
Liu Jiangang

Marios Trivyzas  wrote on Thu, May 5, 2022 at 20:47:

> Congrats Yang!
>
> On Thu, May 5, 2022, 15:29 Yuan Mei  wrote:
>
> > Congrats and well Deserved, Yang!
> >
> > Best,
> > Yuan
> >
> > On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang 
> > wrote:
> >
> > > Congrats Yang!
> > >
> > > Best regards,
> > > Nicholas Jiang
> > >
> > > On 2022/05/05 11:18:10 Xintong Song wrote:
> > > > Hi all,
> > > >
> > > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > > >
> > > > Yang has been consistently contributing to our community, by
> > contributing
> > > > codes, participating in discussions, mentoring new contributors,
> > > answering
> > > > questions on mailing lists, and giving talks on Flink at
> > > > various conferences and events. He is one of the main contributors
> and
> > > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > > Flink
> > > > Kubernetes Operator.
> > > >
> > > > Congratulations and welcome, Yang!
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song (On behalf of the Apache Flink PMC)
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Marios Trivyzas
Congrats Yang!

On Thu, May 5, 2022, 15:29 Yuan Mei  wrote:

> Congrats and well Deserved, Yang!
>
> Best,
> Yuan
>
> On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang 
> wrote:
>
> > Congrats Yang!
> >
> > Best regards,
> > Nicholas Jiang
> >
> > On 2022/05/05 11:18:10 Xintong Song wrote:
> > > Hi all,
> > >
> > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > >
> > > Yang has been consistently contributing to our community, by
> contributing
> > > codes, participating in discussions, mentoring new contributors,
> > answering
> > > questions on mailing lists, and giving talks on Flink at
> > > various conferences and events. He is one of the main contributors and
> > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > Flink
> > > Kubernetes Operator.
> > >
> > > Congratulations and welcome, Yang!
> > >
> > > Thank you~
> > >
> > > Xintong Song (On behalf of the Apache Flink PMC)
> > >
> >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Yuan Mei
Congrats and well Deserved, Yang!

Best,
Yuan

On Thu, May 5, 2022 at 8:21 PM Nicholas Jiang 
wrote:

> Congrats Yang!
>
> Best regards,
> Nicholas Jiang
>
> On 2022/05/05 11:18:10 Xintong Song wrote:
> > Hi all,
> >
> > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> >
> > Yang has been consistently contributing to our community, by contributing
> > codes, participating in discussions, mentoring new contributors,
> answering
> > questions on mailing lists, and giving talks on Flink at
> > various conferences and events. He is one of the main contributors and
> > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> Flink
> > Kubernetes Operator.
> >
> > Congratulations and welcome, Yang!
> >
> > Thank you~
> >
> > Xintong Song (On behalf of the Apache Flink PMC)
> >
>


Re: [DISCUSS] Process regarding to new connectors

2022-05-05 Thread Jark Wu
Hi Martijn,

Thanks for bringing up this discussion.

From my point of view, the Flink Bylaws should also apply to the
connectors.
I don't think connectors are just implementations; they provide many APIs
for end-users, including the DataStream API, SQL DDL options, and SQL
metadata columns.

There are many FLIPs just discussing the connector APIs, e.g. FLIP-86[1],
FLIP-125[2], FLIP-105[3], FLIP-107[4].

Best,
Jark

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-86%3A+Improve+Connector+Properties
[2]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-125%3A+Confluent+Schema+Registry+Catalog
[3]:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289
[4]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors#FLIP107:HandlingofmetadatainSQLconnectors-Kafka




On Thu, 5 May 2022 at 20:00, Martijn Visser 
wrote:

> Hi everyone,
>
> I would like to open a discussion on the process regarding new connectors.
> As you know from previous updates [1] we are making a lot of progress on
> externalizing connectors that are currently hosted inside the Flink
> repository.
>
> One topic I would like to bring up for discussion is how the Flink
> community wants to deal with new connectors. I've been contacted by many
> contributors who are interested in working on one or more connectors. Most
> of these connectors are not yet made public or development hasn't started
> yet.
>
> When reading up on the Flink Bylaws [2] I would argue that connectors that
> are currently already existing (but not under a Flink project scope) would
> fall under 'Adoption of New Codebase' which would require a 2/3 majority
> vote by PMC members. Looking at the FLIP requirements [3] you could argue
> that any new connector is 'a major new feature, subsystem, or piece of
> functionality'. A pro of needing to create a (small) FLIP for a new
> connector is that someone needs to think about the design and implementation,
> and that it requires a vote, so there is more control. The downside is that
> having to write a FLIP can itself be seen as a drawback, given that a
> connector normally should be using only the public interfaces provided by
> Flink, so you could argue it's just an implementation.
>
> I'm looking forward to your input to see if we can reach consensus on this
> topic, so it can be included in the documentation for contributors that
> want to work on and maintain a new connector.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> [1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
> [3]
>
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Whatisconsidereda%22majorchange%22thatneedsaFLIP
> ?
>


Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Roman Boyko
It's a much more complicated activity and lies outside the scope of this
improvement, because such pushdowns would have to be done for all ScanTableSource
implementations (not only the Lookup ones).
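
For reference, the hook for such a pushdown would be Flink's
SupportsFilterPushDown ability interface, implemented on the connector's
DynamicTableSource. A minimal hedged sketch (the class name and the
equality-only handling are illustrative assumptions, not the actual
jdbc/hive/hbase code):

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.table.connector.source.abilities.SupportsFilterPushDown;
import org.apache.flink.table.expressions.CallExpression;
import org.apache.flink.table.expressions.ResolvedExpression;
import org.apache.flink.table.functions.BuiltInFunctionDefinitions;

/** Hypothetical source mixin that accepts only simple equality predicates. */
public class EqualityFilterPushDown implements SupportsFilterPushDown {

    private final List<ResolvedExpression> pushedFilters = new ArrayList<>();

    @Override
    public Result applyFilters(List<ResolvedExpression> filters) {
        List<ResolvedExpression> accepted = new ArrayList<>();
        List<ResolvedExpression> remaining = new ArrayList<>();
        for (ResolvedExpression filter : filters) {
            if (isEquality(filter)) {
                accepted.add(filter);  // later translated into the source's WHERE clause
            } else {
                remaining.add(filter); // Flink keeps evaluating these after the scan
            }
        }
        pushedFilters.addAll(accepted);
        // Tell the planner which filters the source consumed and which remain.
        return Result.of(accepted, remaining);
    }

    private static boolean isEquality(ResolvedExpression filter) {
        return filter instanceof CallExpression
                && ((CallExpression) filter).getFunctionDefinition()
                        == BuiltInFunctionDefinitions.EQUALS;
    }
}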

On Thu, 5 May 2022 at 19:02, Martijn Visser 
wrote:

> Hi everyone,
>
> One question regarding "And Alexander correctly mentioned that filter
> pushdown still is not implemented for jdbc/hive/hbase." -> Would an
> alternative solution be to actually implement these filter pushdowns? I can
> imagine that there are many more benefits to doing that, outside of lookup
> caching and metrics.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Thu, 5 May 2022 at 13:58, Roman Boyko  wrote:
>
> > Hi everyone!
> >
> > Thanks for driving such a valuable improvement!
> >
> > I do think that single cache implementation would be a nice opportunity
> for
> > users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
> > anyway - doesn't matter how it will be implemented.
> >
> > Putting myself in the user's shoes, I can say that:
> > 1) I would prefer to have the opportunity to cut off the cache size by
> > simply filtering unnecessary data. And the most handy way to do it is
> apply
> > it inside LookupRunners. It would be a bit harder to pass it through the
> > LookupJoin node to TableFunction. And Alexander correctly mentioned that
> > filter pushdown still is not implemented for jdbc/hive/hbase.
> > 2) The ability to set the different caching parameters for different
> tables
> > is quite important. So I would prefer to set it through DDL rather than
> > have similar TTL, strategy and other options for all lookup tables.
> > 3) Providing the cache into the framework really deprives us of
> > extensibility (users won't be able to implement their own cache). But
> most
> > probably it might be solved by creating more different cache strategies
> and
> > a wider set of configurations.
> >
> > All these points are much closer to the schema proposed by Alexander.
> > Qingsheng Ren, please correct me if I'm not right and all these facilities
> > might be simply implemented in your architecture?
> >
> > Best regards,
> > Roman Boyko
> > e.: ro.v.bo...@gmail.com
> >
> > On Wed, 4 May 2022 at 21:01, Martijn Visser 
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I don't have much to chip in, but just wanted to express that I really
> > > appreciate the in-depth discussion on this topic and I hope that others
> > > will join the conversation.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Tue, 3 May 2022 at 10:15, Александр Смирнов 
> > > wrote:
> > >
> > > > Hi Qingsheng, Leonard and Jark,
> > > >
> > > > Thanks for your detailed feedback! However, I have questions about
> > > > some of your statements (maybe I didn't get something?).
> > > >
> > > > > Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF
> > > proc_time”
> > > >
> > > > I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is
> not
> > > > fully implemented with caching, but as you said, users go on it
> > > > consciously to achieve better performance (no one proposed to enable
> > > > caching by default, etc.). Or by users do you mean other developers
> of
> > > > connectors? In this case developers explicitly specify whether their
> > > > connector supports caching or not (in the list of supported options),
> > > > no one makes them do that if they don't want to. So what exactly is
> > > > the difference between implementing caching in modules
> > > > flink-table-runtime and in flink-table-common from the considered
> > > > point of view? How does it affect breaking/non-breaking the
> > > > semantics of "FOR SYSTEM_TIME AS OF proc_time"?
> > > >
> > > > > confront a situation that allows table options in DDL to control
> the
> > > > behavior of the framework, which has never happened previously and
> > should
> > > > be cautious
> > > >
> > > > If we talk about main differences of semantics of DDL options and
> > > > config options("table.exec.xxx"), isn't it about limiting the scope
> of
> > > > the options + importance for the user business logic rather than
> > > > specific location of corresponding logic in the framework? I mean
> that
> > > > in my design, for example, putting an option with lookup cache
> > > > strategy in configurations would  be the wrong decision, because it
> > > > directly affects the user's business logic (not just performance
> > > > optimization) + touches just several functions of ONE table (there
> can
> > > > be multiple tables with different caches). Does it really matter for
> > > > the user (or someone else) where the logic is located, which is
> > > > affected by the applied option?
> > > > Also I can remember DDL option 'sink.parallelism', which in some way
> > > > "controls the behavior of the framework" and I don't see any problem
> > > > here.
> > > >
> > > > > introduce a new interface for this all-caching 

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Nicholas Jiang
Congrats Yang!

Best regards,
Nicholas Jiang

On 2022/05/05 11:18:10 Xintong Song wrote:
> Hi all,
> 
> I'm very happy to announce that Yang Wang has joined the Flink PMC!
> 
> Yang has been consistently contributing to our community, by contributing
> codes, participating in discussions, mentoring new contributors, answering
> questions on mailing lists, and giving talks on Flink at
> various conferences and events. He is one of the main contributors and
> maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> Kubernetes Operator.
> 
> Congratulations and welcome, Yang!
> 
> Thank you~
> 
> Xintong Song (On behalf of the Apache Flink PMC)
> 


Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Martijn Visser
Hi everyone,

One question regarding "And Alexander correctly mentioned that filter
pushdown still is not implemented for jdbc/hive/hbase." -> Would an
alternative solution be to actually implement these filter pushdowns? I can
imagine that there are many more benefits to doing that, outside of lookup
caching and metrics.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser


On Thu, 5 May 2022 at 13:58, Roman Boyko  wrote:

> Hi everyone!
>
> Thanks for driving such a valuable improvement!
>
> I do think that single cache implementation would be a nice opportunity for
> users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
> anyway - doesn't matter how it will be implemented.
>
> Putting myself in the user's shoes, I can say that:
> 1) I would prefer to have the opportunity to cut off the cache size by
> simply filtering unnecessary data. And the most handy way to do it is apply
> it inside LookupRunners. It would be a bit harder to pass it through the
> LookupJoin node to TableFunction. And Alexander correctly mentioned that
> filter pushdown still is not implemented for jdbc/hive/hbase.
> 2) The ability to set the different caching parameters for different tables
> is quite important. So I would prefer to set it through DDL rather than
> have similar TTL, strategy and other options for all lookup tables.
> 3) Providing the cache into the framework really deprives us of
> extensibility (users won't be able to implement their own cache). But most
> probably it might be solved by creating more different cache strategies and
> a wider set of configurations.
>
> All these points are much closer to the schema proposed by Alexander.
> Qingsheng Ren, please correct me if I'm not right and all these facilities
> might be simply implemented in your architecture?
>
> Best regards,
> Roman Boyko
> e.: ro.v.bo...@gmail.com
>
> On Wed, 4 May 2022 at 21:01, Martijn Visser 
> wrote:
>
> > Hi everyone,
> >
> > I don't have much to chip in, but just wanted to express that I really
> > appreciate the in-depth discussion on this topic and I hope that others
> > will join the conversation.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Tue, 3 May 2022 at 10:15, Александр Смирнов 
> > wrote:
> >
> > > Hi Qingsheng, Leonard and Jark,
> > >
> > > Thanks for your detailed feedback! However, I have questions about
> > > some of your statements (maybe I didn't get something?).
> > >
> > > > Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF
> > proc_time”
> > >
> > > I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not
> > > fully implemented with caching, but as you said, users go on it
> > > consciously to achieve better performance (no one proposed to enable
> > > caching by default, etc.). Or by users do you mean other developers of
> > > connectors? In this case developers explicitly specify whether their
> > > connector supports caching or not (in the list of supported options),
> > > no one makes them do that if they don't want to. So what exactly is
> > > the difference between implementing caching in modules
> > > flink-table-runtime and in flink-table-common from the considered
> > > point of view? How does it affect breaking/non-breaking the
> > > semantics of "FOR SYSTEM_TIME AS OF proc_time"?
> > >
> > > > confront a situation that allows table options in DDL to control the
> > > behavior of the framework, which has never happened previously and
> should
> > > be cautious
> > >
> > > If we talk about main differences of semantics of DDL options and
> > > config options("table.exec.xxx"), isn't it about limiting the scope of
> > > the options + importance for the user business logic rather than
> > > specific location of corresponding logic in the framework? I mean that
> > > in my design, for example, putting an option with lookup cache
> > > strategy in configurations would  be the wrong decision, because it
> > > directly affects the user's business logic (not just performance
> > > optimization) + touches just several functions of ONE table (there can
> > > be multiple tables with different caches). Does it really matter for
> > > the user (or someone else) where the logic is located, which is
> > > affected by the applied option?
> > > Also I can remember DDL option 'sink.parallelism', which in some way
> > > "controls the behavior of the framework" and I don't see any problem
> > > here.
> > >
> > > > introduce a new interface for this all-caching scenario and the
> design
> > > would become more complex
> > >
> > > This is a subject for a separate discussion, but actually in our
> > > internal version we solved this problem quite easily - we reused
> > > InputFormat class (so there is no need for a new API). The point is
> > > that currently all lookup connectors use InputFormat for scanning the
> > > data in batch mode: HBase, JDBC and even Hive - it uses class
> > > PartitionReader, that is actually just a 

[DISCUSS] Process regarding to new connectors

2022-05-05 Thread Martijn Visser
Hi everyone,

I would like to open a discussion on the process regarding new connectors.
As you know from previous updates [1] we are making a lot of progress on
externalizing connectors that are currently hosted inside the Flink
repository.

One topic I would like to bring up for discussion is how the Flink
community wants to deal with new connectors. I've been contacted by many
contributors who are interested in working on one or more connectors. Most
of these connectors are not yet made public or development hasn't started
yet.

When reading up on the Flink Bylaws [2] I would argue that connectors that
are currently already existing (but not under a Flink project scope) would
fall under 'Adoption of New Codebase' which would require a 2/3 majority
vote by PMC members. Looking at the FLIP requirements [3] you could argue
that any new connector is 'a major new feature, subsystem, or piece of
functionality'. A pro of needing to create a (small) FLIP for a new
connector is that someone needs to think about the design, implementation
and requires a vote, so there is more control. The downside of it is that a
FLIP is considered a drawback, given that a connector normally should be
using only the public interfaces provided by Flink so you could argue it's
just an implementation.

I'm looking forward to your input to see if we can reach consensus on this
topic, so it can be included in the documentation for contributors that
want to work on and maintain a new connector.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

[1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
[3]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Whatisconsidereda%22majorchange%22thatneedsaFLIP
?


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Anton Kalashnikov

Congrats Yang!

--

Best regards,
Anton Kalashnikov

On 05.05.2022 at 13:53, rui fan wrote:

Congratulations Yang!

Best
fanrui

On Thu, May 5, 2022 at 19:47 Martijn Visser  wrote:


Congratulations Yang Wang!

On Thu, 5 May 2022 at 13:23, Lijie Wang  wrote:


Congratulations Yang!

Best,
Lijie

Xintong Song  wrote on Thu, May 5, 2022 at 19:18:


Hi all,

I'm very happy to announce that Yang Wang has joined the Flink PMC!

Yang has been consistently contributing to our community, by contributing
codes, participating in discussions, mentoring new contributors, answering
questions on mailing lists, and giving talks on Flink at
various conferences and events. He is one of the main contributors and
maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
Kubernetes Operator.

Congratulations and welcome, Yang!

Thank you~

Xintong Song (On behalf of the Apache Flink PMC)



Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-05 Thread Roman Boyko
Hi everyone!

Thanks for driving such a valuable improvement!

I do think that single cache implementation would be a nice opportunity for
users. And it will break the "FOR SYSTEM_TIME AS OF proc_time" semantics
anyway - doesn't matter how it will be implemented.

Putting myself in the user's shoes, I can say that:
1) I would prefer to have the opportunity to cut off the cache size by
simply filtering unnecessary data. And the most handy way to do it is apply
it inside LookupRunners. It would be a bit harder to pass it through the
LookupJoin node to TableFunction. And Alexander correctly mentioned that
filter pushdown still is not implemented for jdbc/hive/hbase.
2) The ability to set the different caching parameters for different tables
is quite important. So I would prefer to set it through DDL rather than
have similar TTL, strategy and other options for all lookup tables (see the sketch below).
3) Providing the cache into the framework really deprives us of
extensibility (users won't be able to implement their own cache). But most
probably it might be solved by creating more different cache strategies and
a wider set of configurations.

All these points are much closer to the schema proposed by Alexander.
Qingsheng Ren, please correct me if I'm not right and all these facilities
might be simply implemented in your architecture?
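
To make point 2 concrete, here is a hedged sketch of what per-table cache
options could look like in DDL; the 'lookup.cache.*' option names are
illustrative placeholders for this discussion, not options that exist today:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PerTableCacheOptionsSketch {

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // A hot, frequently changing dimension table: bounded LRU cache, short TTL.
        tEnv.executeSql(
                "CREATE TABLE dim_users ("
                        + "  user_id BIGINT, user_name STRING"
                        + ") WITH ("
                        + "  'connector' = 'jdbc',"
                        + "  'url' = 'jdbc:mysql://example-host:3306/db',"
                        + "  'table-name' = 'users',"
                        + "  'lookup.cache.strategy' = 'LRU',"
                        + "  'lookup.cache.max-rows' = '10000',"
                        + "  'lookup.cache.ttl' = '10 min')");

        // A slow-changing dimension table: full-table ('ALL') cache, long TTL.
        tEnv.executeSql(
                "CREATE TABLE dim_products ("
                        + "  product_id BIGINT, price DECIMAL(10, 2)"
                        + ") WITH ("
                        + "  'connector' = 'jdbc',"
                        + "  'url' = 'jdbc:mysql://example-host:3306/db',"
                        + "  'table-name' = 'products',"
                        + "  'lookup.cache.strategy' = 'ALL',"
                        + "  'lookup.cache.ttl' = '1 h')");
    }
}

Each table keeps its own cache configuration, which a single global
"table.exec.xxx" option could not express.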

Best regards,
Roman Boyko
e.: ro.v.bo...@gmail.com

On Wed, 4 May 2022 at 21:01, Martijn Visser 
wrote:

> Hi everyone,
>
> I don't have much to chip in, but just wanted to express that I really
> appreciate the in-depth discussion on this topic and I hope that others
> will join the conversation.
>
> Best regards,
>
> Martijn
>
> On Tue, 3 May 2022 at 10:15, Александр Смирнов 
> wrote:
>
> > Hi Qingsheng, Leonard and Jark,
> >
> > Thanks for your detailed feedback! However, I have questions about
> > some of your statements (maybe I didn't get something?).
> >
> > > Caching actually breaks the semantic of "FOR SYSTEM_TIME AS OF
> proc_time”
> >
> > I agree that the semantics of "FOR SYSTEM_TIME AS OF proc_time" is not
> > fully implemented with caching, but as you said, users go on it
> > consciously to achieve better performance (no one proposed to enable
> > caching by default, etc.). Or by users do you mean other developers of
> > connectors? In this case developers explicitly specify whether their
> > connector supports caching or not (in the list of supported options),
> > no one makes them do that if they don't want to. So what exactly is
> > the difference between implementing caching in modules
> > flink-table-runtime and in flink-table-common from the considered
> > point of view? How does it affect breaking/non-breaking the
> > semantics of "FOR SYSTEM_TIME AS OF proc_time"?
> >
> > > confront a situation that allows table options in DDL to control the
> > behavior of the framework, which has never happened previously and should
> > be cautious
> >
> > If we talk about main differences of semantics of DDL options and
> > config options("table.exec.xxx"), isn't it about limiting the scope of
> > the options + importance for the user business logic rather than
> > specific location of corresponding logic in the framework? I mean that
> > in my design, for example, putting an option with lookup cache
> > strategy in configurations would  be the wrong decision, because it
> > directly affects the user's business logic (not just performance
> > optimization) + touches just several functions of ONE table (there can
> > be multiple tables with different caches). Does it really matter for
> > the user (or someone else) where the logic is located, which is
> > affected by the applied option?
> > Also I can remember DDL option 'sink.parallelism', which in some way
> > "controls the behavior of the framework" and I don't see any problem
> > here.
> >
> > > introduce a new interface for this all-caching scenario and the design
> > would become more complex
> >
> > This is a subject for a separate discussion, but actually in our
> > internal version we solved this problem quite easily - we reused
> > InputFormat class (so there is no need for a new API). The point is
> > that currently all lookup connectors use InputFormat for scanning the
> > data in batch mode: HBase, JDBC and even Hive - it uses class
> > PartitionReader, that is actually just a wrapper around InputFormat.
> > The advantage of this solution is the ability to reload cache data in
> > parallel (number of threads depends on number of InputSplits, but has
> > an upper limit). As a result cache reload time significantly reduces
> > (as well as time of input stream blocking). I know that usually we try
> > to avoid usage of concurrency in Flink code, but maybe this one can be
> > an exception. BTW I don't say that it's an ideal solution, maybe there
> > are better ones.
> >
> > > Providing the cache in the framework might introduce compatibility
> issues
> >
> > It's possible only in cases when the developer of the connector won't
> > 
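
For context on the InputFormat idea quoted above, a hedged sketch of a
parallel ALL-cache reload; the class, the key column choice and the
thread-pool handling are assumptions, not the internal implementation
referenced in the discussion:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import org.apache.flink.api.common.io.InputFormat;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.types.Row;

/** Hypothetical reloader: one task per InputSplit, bounded by maxThreads. */
final class AllCacheReloader {

    private final Map<Object, Row> cache = new ConcurrentHashMap<>();

    void reload(Supplier<InputFormat<Row, InputSplit>> formatFactory, int maxThreads)
            throws Exception {
        // configure() calls are omitted for brevity.
        InputSplit[] splits = formatFactory.get().createInputSplits(maxThreads);
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.min(splits.length, maxThreads));
        for (InputSplit split : splits) {
            // Each task gets its own InputFormat instance; formats are not thread-safe.
            pool.submit(() -> readSplit(formatFactory.get(), split));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    private void readSplit(InputFormat<Row, InputSplit> format, InputSplit split) {
        try {
            format.open(split);
            while (!format.reachedEnd()) {
                Row row = format.nextRecord(null);
                if (row != null) {
                    cache.put(row.getField(0), row); // key column 0 is an assumption
                }
            }
            format.close();
        } catch (Exception e) {
            throw new RuntimeException("Cache reload failed for split " + split, e);
        }
    }
}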

Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

2022-05-05 Thread godfrey he
Hi Shengkai.

Thanks for driving the proposal; it has been silent for too long.

I have a few questions:
about the Architecture
> The architecture of the Gateway is in the following graph.
Is the TableEnvironment shared for all sessions?

about the REST Endpoint
> /v1/sessions
Are both local and remote files supported for `libs` and `jars`?
Does the SQL gateway support uploading files?

>/v1/sessions/:session_handle/configure_session
Can this API be replaced with `/v1/sessions/:session_handle/statements`?

>/v1/sessions/:session_id/operations/:operation_handle/status
`:session_id` is a typo, it should be `:session_handle`

>/v1/sessions/:session_handle/statements
>The statement must be a single command
Does this API support `begin statement set ... end` or `statement set
begin ... end`?
Are `ADD JAR` and `REMOVE JAR` supported? If yes, how are the jars managed?

>/v1/sessions/:session_handle/operations/:operation_handle/result/:token
>"type": # string value of LogicalType
Some LogicalTypes cannot be serialized, such as CharType(0).

about Options
> endpoint.protocol
I think REST is not a kind of protocol [1], but an architectural style.
The value should be `HTTP`.

about SQLGatewayService API
>  Catalog API
> ...
I think we should avoid providing such an API, because once the catalog API
is changed or extended,
this class must also be changed. SQL statements are a more general interface.

> Options
> sql-gateway.session.idle.timeout
>sql-gateway.session.check.interval
>sql-gateway.worker.keepalive.time
It's better to keep the option style consistent with Flink; the nesting
should not be too deep:
sql-gateway.session.idle.timeout -> sql-gateway.session.idle-timeout
sql-gateway.session.check.interval -> sql-gateway.session.check-interval
sql-gateway.worker.keepalive.time -> sql-gateway.worker.keepalive-time

[1] https://restfulapi.net/
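
To make the endpoint discussion more tangible, a hedged sketch of how a
client could drive the proposed REST API; the port, the JSON field names
and the response handling are assumptions, not the finalized FLIP-91 wire
format:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class GatewayRestClientSketch {

    // The base URL and port are assumptions for illustration.
    private static final String BASE = "http://localhost:8083/v1";

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // 1) Open a session: POST /v1/sessions
        HttpRequest openSession = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/sessions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"session_name\": \"demo\"}"))
                .build();
        // Real code would parse a JSON body like {"session_handle": "..."};
        // shortened here to keep the sketch compact.
        String sessionHandle =
                client.send(openSession, HttpResponse.BodyHandlers.ofString()).body();

        // 2) Submit a single statement: POST /v1/sessions/:session_handle/statements
        HttpRequest submit = HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/sessions/" + sessionHandle + "/statements"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"statement\": \"SELECT 1\"}"))
                .build();
        System.out.println(
                client.send(submit, HttpResponse.BodyHandlers.ofString()).body());
    }
}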

Best,
Godfrey

Nicholas Jiang  wrote on Thu, May 5, 2022 at 14:58:
>
> Hi Shengkai,
>
> I have another concern about the submission of batch jobs. Does the Flink SQL
> gateway support submitting batch jobs? In Kyuubi, BatchProcessBuilder is used
> to submit batch jobs. What about the Flink SQL gateway?
>
> Best regards,
> Nicholas Jiang
>
> On 2022/04/24 03:28:36 Shengkai Fang wrote:
> > Hi. Jiang.
> >
> > Thanks for your feedback!
> >
> > > Do the public interfaces of GatewayService refer to any service?
> >
> > We will only expose one GatewayService implementation. We will put the
> > interface into the common package and the developer who wants to implement
> > a new endpoint can just rely on the interface package rather than the
> > implementation.
> >
> > > What's the behavior of SQL Client Gateway working on Yarn or K8S? Does
> > the SQL Client Gateway support application or session mode on Yarn?
> >
> > I think we can support SQL Client Gateway to submit the jobs in
> > application/session mode.
> >
> > > Is there any event trigger in the operation state machine?
> >
> > Yes. I have already updated the content and added more details about the
> > state machine. During the revision, I found that I mixed up two concepts:
> > job submission and job execution. In fact, we only control the submission
> > mode at the gateway layer. Therefore, we don't need to map the
> > JobStatus here. If the user expects that the synchronization behavior is to
> > wait for the completion of the job execution before allowing the next
> > statement to be executed, then the Operation lifecycle should also contain
> > the job's execution, which means users should set `table.dml-sync`.
> >
> > > What's the return schema for the public interfaces of GatewayService?
> > Like getTable interface, what's the return value schema?
> >
> > The API of the GatewayService returns Java objects, and the endpoint can
> > organize the objects into the expected schema. The return results are also
> > listed in the section ComponentAPI#GatewayService#API. The return type of the
> > GatewayService#getTable is `ContextResolvedTable`.
> >
> > > How does the user get the operation log?
> >
> > The OperationManager will register the LogAppender before the Operation
> > execution. The Log Appender will hijack the logger and also write the log
> > related to the Operation to separate files. When users want to fetch
> > the Operation log, the GatewayService will read the content of the file and
> > return it.
> >
> > Best,
> > Shengkai
> >
> >
> >
> >
> > Nicholas Jiang  wrote on Fri, Apr 22, 2022 at 16:21:
> >
> > > Hi Shengkai.
> > >
> > > Thanks for driving the proposal of SQL Client Gateway. I have some
> > > knowledge of Kyuubi and have some questions about the design:
> > >
> > > 1.Do the public interfaces of GatewayService refer to any service? If
> > > referring to HiveService, does GatewayService need interfaces like
> > > getQueryId etc.
> > >
> > > 2.What's the behavior of SQL Client Gateway working on Yarn or K8S? Does
> > > the SQL Client Gateway support application or session mode on Yarn?
> > >
> > > 3.Is there any event trigger in the operation state machine?
> > >

Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread rui fan
Congratulations Yang!

Best
fanrui

On Thu, May 5, 2022 at 19:47 Martijn Visser  wrote:

> Congratulations Yang Wang!
>
> On Thu, 5 May 2022 at 13:23, Lijie Wang  wrote:
>
> > Congratulations Yang!
> >
> > Best,
> > Lijie
> >
> > Xintong Song  wrote on Thu, May 5, 2022 at 19:18:
> >
> > > Hi all,
> > >
> > > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> > >
> > > Yang has been consistently contributing to our community, by
> contributing
> > > codes, participating in discussions, mentoring new contributors,
> > answering
> > > questions on mailing lists, and giving talks on Flink at
> > > various conferences and events. He is one of the main contributors and
> > > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> > Flink
> > > Kubernetes Operator.
> > >
> > > Congratulations and welcome, Yang!
> > >
> > > Thank you~
> > >
> > > Xintong Song (On behalf of the Apache Flink PMC)
> > >
> >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Martijn Visser
Congratulations Yang Wang!

On Thu, 5 May 2022 at 13:23, Lijie Wang  wrote:

> Congratulations Yang!
>
> Best,
> Lijie
>
> Xintong Song  wrote on Thu, May 5, 2022 at 19:18:
>
> > Hi all,
> >
> > I'm very happy to announce that Yang Wang has joined the Flink PMC!
> >
> > Yang has been consistently contributing to our community, by contributing
> > codes, participating in discussions, mentoring new contributors,
> answering
> > questions on mailing lists, and giving talks on Flink at
> > various conferences and events. He is one of the main contributors and
> > maintainers in Flink's Native Kubernetes / Yarn integrations and the
> Flink
> > Kubernetes Operator.
> >
> > Congratulations and welcome, Yang!
> >
> > Thank you~
> >
> > Xintong Song (On behalf of the Apache Flink PMC)
> >
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Lijie Wang
Congratulations Yang!

Best,
Lijie

Xintong Song  wrote on Thu, May 5, 2022 at 19:18:

> Hi all,
>
> I'm very happy to announce that Yang Wang has joined the Flink PMC!
>
> Yang has been consistently contributing to our community, by contributing
> codes, participating in discussions, mentoring new contributors, answering
> questions on mailing lists, and giving talks on Flink at
> various conferences and events. He is one of the main contributors and
> maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
> Kubernetes Operator.
>
> Congratulations and welcome, Yang!
>
> Thank you~
>
> Xintong Song (On behalf of the Apache Flink PMC)
>


Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Geng Biao
Congratulations, Yang!
Best,
Biao Geng

Get Outlook for iOS

From: Xintong Song 
Sent: Thursday, May 5, 2022 7:18:10 PM
To: dev ; Yang Wang 
Subject: [ANNOUNCE] New Flink PMC member: Yang Wang

Hi all,

I'm very happy to announce that Yang Wang has joined the Flink PMC!

Yang has been consistently contributing to our community, by contributing
codes, participating in discussions, mentoring new contributors, answering
questions on mailing lists, and giving talks on Flink at
various conferences and events. He is one of the main contributors and
maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
Kubernetes Operator.

Congratulations and welcome, Yang!

Thank you~

Xintong Song (On behalf of the Apache Flink PMC)


[ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-05 Thread Xintong Song
Hi all,

I'm very happy to announce that Yang Wang has joined the Flink PMC!

Yang has been consistently contributing to our community, by contributing
codes, participating in discussions, mentoring new contributors, answering
questions on mailing lists, and giving talks on Flink at
various conferences and events. He is one of the main contributors and
maintainers in Flink's Native Kubernetes / Yarn integrations and the Flink
Kubernetes Operator.

Congratulations and welcome, Yang!

Thank you~

Xintong Song (On behalf of the Apache Flink PMC)


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Yu Li
Hurray!

Thanks Yun Gao, Till and Joe for all the efforts as our release managers.
And thanks all contributors for making this happen!

Best Regards,
Yu


On Thu, 5 May 2022 at 18:01, Sergey Nuyanzin  wrote:

> Great news!
> Congratulations!
> Thanks to the release managers, and everyone involved.
>
> On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:
>
> > Congratulations~
> >
> > Thanks Yun, Till and Joe for driving this release
> > and everyone who made this release happen.
> >
> > Best,
> > Godfrey
> >
> > Becket Qin  于2022年5月5日周四 17:39写道:
> > >
> > > Hooray! Thanks Yun, Till and Joe for driving the release!
> > >
> > > Cheers,
> > >
> > > JIangjie (Becket) Qin
> > >
> > > On Thu, May 5, 2022 at 5:20 PM Timo Walther 
> wrote:
> > >
> > > > It took a bit longer than usual. But I'm sure the users will love
> this
> > > > release.
> > > >
> > > > Big thanks to the release managers!
> > > >
> > > > Timo
> > > >
> > > > On 05.05.22 at 10:45, Yuan Mei wrote:
> > > > > Great!
> > > > >
> > > > > Thanks, Yun Gao, Till, and Joe for driving the release, and thanks
> to
> > > > > everyone for making this release happen!
> > > > >
> > > > > Best
> > > > > Yuan
> > > > >
> > > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu 
> wrote:
> > > > >
> > > > >> Congratulations!
> > > > >>
> > > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> > manager
> > > > and
> > > > >> everyone who was involved.
> > > > >>
> > > > >> Best,
> > > > >> Leonard
> > > > >>
> > > > >>
> > > > >>
> > > > >>> On May 5, 2022 at 4:30 PM, Yang Wang  wrote:
> > > > >>>
> > > > >>> Congratulations!
> > > > >>>
> > > > >>> Thanks Yun Gao, Till and Joe for driving this release and
> everyone
> > who
> > > > >> made
> > > > >>> this release happen.
> > > > >>
> > > >
> > > >
> >
>
>
> --
> Best regards,
> Sergey
>


[jira] [Created] (FLINK-27500) Validation error handling inside controller blocks reconciliation

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27500:
--

 Summary: Validation error handling inside controller blocks 
reconciliation
 Key: FLINK-27500
 URL: https://issues.apache.org/jira/browse/FLINK-27500
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


Currently when using the operator without the Webhook (validating only within 
the controller) , the way we handle validation errors completely blocks 
reconciliation.

The reason for this is that validation happens between observe and
reconcile, and an error short-circuits the controller flow, thus skipping
the reconciler, which would be able to execute actions such as rollbacks,
deployment recovery, etc.

We also return an UpdateControl without a reschedule after an error, which
makes this even worse.

There are a few ways to get around this; some are more complex than others.
One possible solution:

If a validation error occurs, simply use the "old" FlinkDeployment in the
rest of the controller loop. We can restore the old valid deployment from the
lastReconciledSpec field; we just need to make sure to only update the status
at the end. From the observer/reconciler's perspective, this would work as if
the new broken spec had never been submitted.

Going this way we have to avoid repeatedly reporting the error caused by 
validation as we reschedule again and again.
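
A hedged, type-stubbed sketch of that control flow (illustrative only; the
real operator classes and fields differ):

import java.util.Optional;

class ValidationFallbackSketch {

    interface Spec {}

    static class Deployment {
        Spec spec;
        Spec lastReconciledSpec; // recovered from status.reconciliationStatus
        String error;
    }

    Optional<String> validate(Deployment d) { return Optional.empty(); } // stub

    void observe(Spec s) {}   // stub

    void reconcile(Spec s) {} // stub

    void reconcileLoop(Deployment d) {
        Optional<String> validationError = validate(d);
        Spec effective = d.spec;
        if (validationError.isPresent()) {
            // Record the error once, then keep working with the last valid spec,
            // so rollbacks and deployment recovery are not blocked.
            d.error = validationError.get();
            effective = d.lastReconciledSpec;
        }
        observe(effective);
        reconcile(effective);
        // Always reschedule here, even after an error (see description above).
    }
}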



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27501) [JUnit5 Migration] SerializerTestBase

2022-05-05 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-27501:


 Summary: [JUnit5 Migration] SerializerTestBase
 Key: FLINK-27501
 URL: https://issues.apache.org/jira/browse/FLINK-27501
 Project: Flink
  Issue Type: Sub-task
  Components: API / Type Serialization System, Tests
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.16.0






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27499) Bump base Flink version to 1.15.0

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27499:
--

 Summary: Bump base Flink version to 1.15.0
 Key: FLINK-27499
 URL: https://issues.apache.org/jira/browse/FLINK-27499
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


With the 1.15.0 release out, we should bump our Flink dependency to 1.15 if 
this does not interfere with the 1.14 compatibility.

[~wangyang0918] what do you think?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27498) Add E2E tests to cover Flink 1.15

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27498:
--

 Summary: Add E2E tests to cover Flink 1.15
 Key: FLINK-27498
 URL: https://issues.apache.org/jira/browse/FLINK-27498
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


We should extend our e2e test coverage to test all supported Flink versions;
initially these will be 1.14 and 1.15.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27497) Track terminal job states in the observer

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27497:
--

 Summary: Track terminal job states in the observer
 Key: FLINK-27497
 URL: https://issues.apache.org/jira/browse/FLINK-27497
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


With the improvements in FLINK-27468, Flink 1.15 app clusters will not be shut
down in case of terminal job states (failed, finished, etc.).

It is important to properly handle these states and let the user know about it.

We should always trigger events, and for terminally failed jobs record the 
error information in the FlinkDeployment status.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27496) Add test for rollback during application upgrade

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27496:
--

 Summary: Add test for rollback during application upgrade
 Key: FLINK-27496
 URL: https://issues.apache.org/jira/browse/FLINK-27496
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


With the improvements contained in FLINK-27468, it now should be possible to 
trigger a rollback during an ongoing savepoint upgrade (when we are waiting for 
the cluster to become upgradeable).

We should add a test case to cover this after FLINK-27468 is merged.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27495) Observer should update last savepoint information directly from cluster too

2022-05-05 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27495:
--

 Summary: Observer should update last savepoint information 
directly from cluster too
 Key: FLINK-27495
 URL: https://issues.apache.org/jira/browse/FLINK-27495
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: Gyula Fora


The observer should fetch the list of checkpoints from the observed job and
store the last savepoint in the status directly.

This is especially useful for terminal job states in Flink 1.15, as it allows
us to avoid corner cases such as the operator failing after calling
cancel-with-savepoint but before updating the status.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] Planning Flink 1.16

2022-05-05 Thread Konstantin Knauf
Hi everyone,

Xingbo, Becket, Chesnay, Martijn, Godfrey and myself briefly talked offline
and we would propose that Xingbo, Godfrey, Martijn and Chesnay take on the
release management for the next release. Six people seemed like overkill,
but having a bit of high availability in both Europe and Asia sounded like
a good idea. These four will follow up with next steps.

Thank you,

Konstantin



On Fri, Apr 29, 2022 at 15:17, Becket Qin  wrote:

> Thanks for kicking off the topic, Konstantin and Chesnay.
>
> Also thanks Martijn, Godfrey and Xingbo for volunteering to be the release
> manager. Given that release 1.16 would likely be a beefy release with a
> bunch of major features already on their way, it might make sense to have
> more release managers than we usually do. We can figure out how to
> collaborate, e.g. splitting by modules / FLIPs. In case we need someone to
> get some errands or coordination done, I am happy to help.
>
> Also, +1 for feature freeze by the end of July with a potential 2-week
> delay of contingency.
>
> Cheers,
>
> Jiangjie (Becket) Qin
>
> On Fri, Apr 29, 2022 at 5:14 PM Xingbo Huang  wrote:
>
> > Thanks Konstantin and Chesnay for starting the discussion. I'm also
> willing
> > to volunteer as the release manager if this is still open.
> >
> > Regarding the feature freeze date, +1 to mid August.
> >
> > Best,
> > Xingbo
> >
> > > Zhu Zhu  wrote on Fri, Apr 29, 2022 at 11:01:
> >
> > > +1 for a 5 months release cycle.
> > > +1 to targeting the feature freeze date 1.5 months before the release
> > > date. It can better guard the release date.
> > > Therefore, also +1 for mid August as the feature freeze date of 1.16
> > >
> > > Thanks,
> > > Zhu
> > >
> > > > Jark Wu  wrote on Thu, Apr 28, 2022 at 22:24:
> > >
> > > > I'm also fine with the end of July for the feature freeze.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Thu, 28 Apr 2022 at 21:00, Martijn Visser 
> > > > wrote:
> > > >
> > > > > +1 for continuing to strive for a 5 months release cycle.
> > > > >
> > > > > And +1 to have the planned feature freeze mid August, which I would
> > > > propose
> > > > > to have happen on Monday the 15th of August 2022. I would also
> > already
> > > > > state that even though we know this is a holiday period, we should
> > not
> > > > > extend this deadline for that reason :)
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn Visser
> > > > > https://twitter.com/MartijnVisser82
> > > > > https://github.com/MartijnVisser
> > > > >
> > > > >
> > > > > On Thu, 28 Apr 2022 at 14:37, Jingsong Li 
> > > > wrote:
> > > > >
> > > > > > Thanks for the check.
> > > > > >
> > > > > > > Continue to strive for a 5 months release cycle and 1.5 months
> > > before
> > > > > to
> > > > > > the desired release date
> > > > > >
> > > > > > Sounds good to me!
> > > > > >
> > > > > > Best,
> > > > > > Jingsong
> > > > > >
> > > > > > On Thu, Apr 28, 2022 at 7:06 PM Konstantin Knauf <
> > kna...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > thank you for the feedback so far. I've checked the lengths of the
> > > > > > > last few release cycles. Up until Flink 1.14, we've actually been
> > > > > > > incredibly good at maintaining a 5 months release cycle (see below).
> > > > > > > Interestingly, the community is officially targeting a 4 months
> > > > > > > release cycle [1].
> > > > > > >
> > > > > > > - 1.15.0 2022-05-01? (7 months, 2 days?)
> > > > > > > - 1.14.0: 2021-09-29 (4 months, 26 days)
> > > > > > > - 1.13.0: 2021-05-03 (4 months, 23 days)
> > > > > > > - 1.12.0: 2020-12-10 (5 months, 3 days)
> > > > > > > - 1.11.0: 2020-07-07 (4 months, 26 days)
> > > > > > > - 1.10.0: 2020-02-11
> > > > > > >
> > > > > > > The 1.15 release cycle has taken significantly longer. In my
> > > > > > > opinion we should try to get back into the 5 months cadence with
> > > > > > > the next release. Since empirically we often end up moving the
> > > > > > > feature freeze by a week or two, and we usually need about a month
> > > > > > > for release testing, stabilization and releasing, I don't think we
> > > > > > > should move the planned feature freeze to later than *mid August*.
> > > > > > > What do you think:
> > > > > > > 1. Should we continue to strive for a 5 months release cycle
> (and
> > > > > update
> > > > > > > [1] accordingly)?
> > > > > > > 2. Does it sound reasonable to target a feature freeze date,
> > which
> > > is
> > > > > 1.5
> > > > > > > months before the desired release date?
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Konstantin
> > > > > > >
> > > > > > >  [1]
> > > > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases
> > > > > > >
> > > > > > > On Thu, Apr 28, 2022 at 05:20, Jingsong Li <
> > > > > > > jingsongl...@gmail.com> wrote:
> > > > > > >
> > > > > > > > 

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Sergey Nuyanzin
Great news!
Congratulations!
Thanks to the release managers, and everyone involved.

On Thu, May 5, 2022 at 11:57 AM godfrey he  wrote:

> Congratulations~
>
> Thanks Yun, Till and Joe for driving this release
> and everyone who made this release happen.
>
> Best,
> Godfrey
>
> Becket Qin  wrote on Thu, May 5, 2022 at 17:39:
> >
> > Hooray! Thanks Yun, Till and Joe for driving the release!
> >
> > Cheers,
> >
> > JIangjie (Becket) Qin
> >
> > On Thu, May 5, 2022 at 5:20 PM Timo Walther  wrote:
> >
> > > It took a bit longer than usual. But I'm sure the users will love this
> > > release.
> > >
> > > Big thanks to the release managers!
> > >
> > > Timo
> > >
> > > On 05.05.22 at 10:45, Yuan Mei wrote:
> > > > Great!
> > > >
> > > > Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
> > > > everyone for making this release happen!
> > > >
> > > > Best
> > > > Yuan
> > > >
> > > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:
> > > >
> > > >> Congratulations!
> > > >>
> > > >> Thanks Yun Gao, Till and Joe for the great work as our release
> manager
> > > and
> > > >> everyone who involved.
> > > >>
> > > >> Best,
> > > >> Leonard
> > > >>
> > > >>
> > > >>
> > > >>> On May 5, 2022 at 4:30 PM, Yang Wang  wrote:
> > > >>>
> > > >>> Congratulations!
> > > >>>
> > > >>> Thanks Yun Gao, Till and Joe for driving this release and everyone
> who
> > > >> made
> > > >>> this release happen.
> > > >>
> > >
> > >
>


-- 
Best regards,
Sergey


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread godfrey he
Congratulations~

Thanks Yun, Till and Joe for driving this release
and everyone who made this release happen.

Best,
Godfrey

Becket Qin  wrote on Thu, May 5, 2022 at 17:39:
>
> Hooray! Thanks Yun, Till and Joe for driving the release!
>
> Cheers,
>
> JIangjie (Becket) Qin
>
> On Thu, May 5, 2022 at 5:20 PM Timo Walther  wrote:
>
> > It took a bit longer than usual. But I'm sure the users will love this
> > release.
> >
> > Big thanks to the release managers!
> >
> > Timo
> >
> > On 05.05.22 at 10:45, Yuan Mei wrote:
> > > Great!
> > >
> > > Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
> > > everyone for making this release happen!
> > >
> > > Best
> > > Yuan
> > >
> > > On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:
> > >
> > >> Congratulations!
> > >>
> > >> Thanks Yun Gao, Till and Joe for the great work as our release manager
> > and
> > >> everyone who was involved.
> > >>
> > >> Best,
> > >> Leonard
> > >>
> > >>
> > >>
> > >>> On May 5, 2022 at 4:30 PM, Yang Wang  wrote:
> > >>>
> > >>> Congratulations!
> > >>>
> > >>> Thanks Yun Gao, Till and Joe for driving this release and everyone who
> > >> made
> > >>> this release happen.
> > >>
> >
> >


Re: [DISCUSS] FLIP-227: Support overdraft buffer

2022-05-05 Thread rui fan
Hi,

Thanks a lot for your discussion.

After several discussions, I think it's clear now. I updated the
"Proposed Changes" section of FLIP-227 [1]. If I have missed
something, please add it to the FLIP, or mention it in this thread
and I can add it. If everything is OK, I will create a
new JIRA issue for the first task, and use FLINK-26762 [2] as the
second task.

About the legacy source: do we set maxOverdraftBuffersPerGate=0
directly? How do we identify a legacy source? Or could we add
an overdraftEnabled flag in LocalBufferPool? The default value
would be false, and it would switch to true once
getAvailableFuture is called.
The flag would indicate whether isAvailable is checked elsewhere,
which might be more general and could cover more cases.

Also, I think the default value of 'max-overdraft-buffers-per-gate'
needs to be confirmed. I prefer it to be between 5 and 10. What
do you think?
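
To illustrate the flag idea, a minimal hedged sketch; the class, field and
method names are assumptions for discussion, not the actual LocalBufferPool
code:

/** Hedged sketch of an overdraft-aware buffer pool (not real Flink code). */
final class OverdraftSketch {

    // Configured via 'max-overdraft-buffers-per-gate'; a default of 5-10 is
    // suggested above.
    private final int maxOverdraftBuffersPerGate;

    // Flipped to true the first time availability is queried. Legacy sources
    // never check getAvailableFuture(), so they never use the overdraft and
    // keep the old blocking behavior.
    private boolean overdraftEnabled;

    private int usedOverdraftBuffers;

    OverdraftSketch(int maxOverdraftBuffersPerGate) {
        this.maxOverdraftBuffersPerGate = maxOverdraftBuffersPerGate;
    }

    void onAvailabilityQueried() {
        overdraftEnabled = true;
    }

    /** Called when a subtask needs one more buffer while the pool is exhausted. */
    boolean tryRequestOverdraftBuffer() {
        if (overdraftEnabled && usedOverdraftBuffers < maxOverdraftBuffersPerGate) {
            usedOverdraftBuffers++;
            return true;  // lets flatMap/timers/large records finish the current record
        }
        return false;     // fall back to the normal blocking request path
    }
}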

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-227%3A+Support+overdraft+buffer
[2] https://issues.apache.org/jira/browse/FLINK-26762

Thanks
fanrui

On Thu, May 5, 2022 at 4:41 PM Piotr Nowojski  wrote:

> Hi again,
>
> After sleeping over this, if both versions (reserve and overdraft) have
> the same complexity, I would also prefer the overdraft.
>
> > `Integer.MAX_VALUE` as default value was my idea as well but now, as
> > Dawid mentioned, I think it is dangerous since it is too implicit for
> > the user and if the user submits one more job for the same TaskManager
>
> As I mentioned, it's not only an issue with multiple jobs. The same
> problem can happen with different subtasks from the same job, potentially
> leading to the FLINK-13203 deadlock [1]. With FLINK-13203 fixed, I would be
> in favour of Integer.MAX_VALUE to be the default value, but as it is, I
> think we should indeed play on the safe side and limit it.
>
> > I still don't understand how the "reserve" implementation should be limited.
> > I mean if we have X buffers in total and the user sets overdraft equal
> > to X we obviously can not reserve all buffers, but how many we are
> > allowed to reserve? Should it be a different configuration like
> > percentageForReservedBuffers?
>
> The reserve could be defined as percentage, or as a fixed number of
> buffers. But yes, in normal operation a subtask would not use the reserve:
> if numberOfAvailableBuffers < reserve, the output would not be available.
> Only in the flatMap/timers/huge records case could the reserve be used.
>
> > 1. If the total buffers of LocalBufferPool <= the reserve buffers, will
> LocalBufferPool never be available? Can't process data?
>
> Of course we would need to make sure that never happens. So the reserve
> should be < total buffer size.
>
> > 2. If the overdraft buffer use the extra buffers, when the downstream
> > task inputBuffer is insufficient, it should fail to start the job, and
> then
> > restart? When the InputBuffer is initialized, it will apply for enough
> > buffers, right?
>
> The failover if the downstream cannot allocate buffers is already implemented
> in FLINK-14872 [2]. There is a timeout for how long the task waits for
> buffer allocation. However this doesn't prevent many (potentially
> infinitely many) deadlock/restart cycles. IMO the proper solution for [1]
> would be 2b described in the ticket:
>
> > 2b. Assign extra buffers only once all of the tasks are RUNNING. This is
> a simplified version of 2a, without tracking the tasks sink-to-source.
>
> But that's a pre-existing problem and I don't think we have to solve it
> before implementing overdraft. I think we would need to solve it only
> before setting Integer.MAX_VALUE as the default for the overdraft. Maybe I
> would hesitate setting the overdraft to anything more than a couple of
> buffers by default for the same reason.
>
> > Actually, I totally agree that we don't need a lot of buffers for
> overdraft
>
> and
>
> > Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
> > When we finish this feature and after users use it, if users feedback
> > this issue we can discuss again.
>
> +1
>
> Piotrek
>
> [1] https://issues.apache.org/jira/browse/FLINK-13203
> [2] https://issues.apache.org/jira/browse/FLINK-14872
>
> czw., 5 maj 2022 o 05:52 rui fan <1996fan...@gmail.com> napisał(a):
>
>> Hi everyone,
>>
>> I still have some questions.
>>
>> 1. If the total buffers of LocalBufferPool <= the reserve buffers, will
>> LocalBufferPool never be available? Can't process data?
>> 2. If the overdraft buffer use the extra buffers, when the downstream
>> task inputBuffer is insufficient, it should fail to start the job, and
>> then
>> restart? When the InputBuffer is initialized, it will apply for enough
>> buffers, right?
>>
>> Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
>> When we finish this feature and after users use it, if users feedback
>> this issue we can discuss again.
>>
>> Thanks
>> fanrui
>>
>> On Wed, May 4, 2022 at 5:29 PM Dawid Wysakowicz 
>> wrote:
>>
>>> Hey all,
>>>
>>> I have not 

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Becket Qin
Hooray! Thanks Yun, Till and Joe for driving the release!

Cheers,

JIangjie (Becket) Qin

On Thu, May 5, 2022 at 5:20 PM Timo Walther  wrote:

> It took a bit longer than usual. But I'm sure the users will love this
> release.
>
> Big thanks to the release managers!
>
> Timo
>
> > On 05.05.22 at 10:45, Yuan Mei wrote:
> > Great!
> >
> > Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
> > everyone for making this release happen!
> >
> > Best
> > Yuan
> >
> > On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:
> >
> >> Congratulations!
> >>
> >> Thanks Yun Gao, Till and Joe for the great work as our release manager
> and
> >> everyone who was involved.
> >>
> >> Best,
> >> Leonard
> >>
> >>
> >>
> >>> On May 5, 2022 at 4:30 PM, Yang Wang  wrote:
> >>>
> >>> Congratulations!
> >>>
> >>> Thanks Yun Gao, Till and Joe for driving this release and everyone who
> >> made
> >>> this release happen.
> >>
>
>


Re: [VOTE] Apache Flink Table Store 0.1.0, release candidate #2

2022-05-05 Thread Jingsong Li
Hi, Konstantin and OpenInx,

I have updated https://github.com/apache/flink-web/pull/531 , you can
take a look~

Best,
Jingsong

On Thu, May 5, 2022 at 3:07 PM OpenInx  wrote:
>
> +1 (non-binding) for the release. I agree with Konstantin that we can add
> more materials about the table-store for the voting.
>
>
>1. Download the source tarball, signature (.asc), and checksum
>(.sha512):   OK
>2. Import gpg keys: download KEYS and run gpg --import
>/path/to/downloaded/KEYS (optional if this hasn’t changed) :  OK
>3. Verify the signature by running: gpg --verify
>flink-table-store-0.1.0-src.tgz.asc:  OK
>4. Verify the checksum by running: shasum -a 256 -c
>flink-table-store-0.1.0-src.tgz.sha512 flink-table-store-0.1.0-src.tgz :  
> OK
>5. Untar the archive and go into the source directory: tar xzf
>flink-table-store-0.1.0-src.tgz && cd flink-table-store-0.1.0:  OK
>6. Build and test the project: mvn clean package (use Java 8) :   All
>unit tests passed, except the e2e tests. Seems we will need to set up a
>docker environment to run those e2e tests successfully.
>    7. Verify that Apache Flink can access the table store:
>
> ./bin/sql-client.sh -j
> /Users/openinx/Downloads/flink-table-store-0.1.0/flink-table-store-dist/target/flink-table-store-dist-0.1.0.jar
> embedded shell
>
> SET 'table-store.path' = '/Users/openinx/test/table-store' ;
>
> SET 'execution.runtime-mode'='batch';
>
> SET 'sql-client.execution.result-mode' = 'tableau';
>
> CREATE TABLE MyTable (
>
> user_id BIGINT,
>
> item_id BIGINT,
>
> behavior STRING,
>
> dt STRING,
>
> PRIMARY KEY (dt, user_id) NOT ENFORCED
>
> ) PARTITIONED BY (dt) WITH (
>
> 'bucket' = '4'
>
> );
>
> INSERT INTO MyTable VALUES
>
> (100, 200, 'buy', '2022-05-04'),
>
> (101, 201, 'save', '2022-05-04'),
>
> (101, 201, 'purchase', '2022-05-04');
>
>
> SELECT * FROM MyTable;
>
> +---------+---------+----------+------------+
> | user_id | item_id | behavior |         dt |
> +---------+---------+----------+------------+
> |     100 |     200 |      buy | 2022-05-04 |
> |     101 |     201 | purchase | 2022-05-04 |
> +---------+---------+----------+------------+
>
> 2 rows in set
>
> On Thu, May 5, 2022 at 2:39 PM Nicholas Jiang 
> wrote:
>
> > Hi everyone,
> >
> > +1 for the release (non-binding).
> >
> > - Built and compiled source codes [PASSED]
> > - Went through quick start guide [PASSED]
> > - Checked README.md [PASSED]
> > - Checked that use the table store jar to build query table application
> > [PASSED]
> >
> > Best regards,
> >
> > Nicholas Jiang
> >
> > On 2022/04/29 02:24:09 Jingsong Li wrote:
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #2 for the version 0.1.0
> > of
> > > Apache Flink Table Store, as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > **Release Overview**
> > >
> > > As an overview, the release consists of the following:
> > > a) Table Store canonical source distribution, to be deployed to the
> > > release repository at dist.apache.org
> > > b) Maven artifacts to be deployed to the Maven Central Repository
> > >
> > > **Staging Areas to Review**
> > >
> > > The staging areas containing the above mentioned artifacts are as
> > follows,
> > > for your review:
> > > * All artifacts for a) and b) can be found in the corresponding dev
> > > repository at dist.apache.org [2]
> > > * All artifacts for c) can be found at the Apache Nexus Repository [3]
> > > * Pre Bundled Binaries Jar can work fine with quick start [4][5]
> > >
> > > All artifacts are signed with the key
> > > 2C2B6A653B07086B65E4369F7C76245E0A318150 [6]
> > >
> > > Other links for your review:
> > > * JIRA release notes [7]
> > > * source code tag "release-0.1.0-rc2" [8]
> > > * PR to update the website Downloads page to include Table Store
> > > links [9]
> > >
> > > **Vote Duration**
> > >
> > > The voting time will run for at least 72 hours.
> > > It is adopted by majority approval, with at least 3 PMC affirmative
> > votes.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/Verifying+a+Flink+Table+Store+Release
> > > [2]
> > https://dist.apache.org/repos/dist/dev/flink/flink-table-store-0.1.0-rc2/
> > > [3]
> > https://repository.apache.org/content/repositories/orgapacheflink-1502/
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1502/org/apache/flink/flink-table-store-dist/0.1.0/flink-table-store-dist-0.1.0.jar
> > > [5]
> > https://nightlies.apache.org/flink/flink-table-store-docs-release-0.1/docs/try-table-store/quick-start/
> > > [6] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [7]
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351234
> > > [8] https://github.com/apache/flink-table-store/tree/release-0.1.0-rc2
> > > [9] https://github.com/apache/flink-web/pull/531
> > >
> >


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Yun Tang
Great news!

Thanks to Yun Gao, Till and Joe for driving this release, and to everyone who
ever contributed to this release.

Best
Yun Tang

From: Timo Walther 
Sent: Thursday, May 5, 2022 17:20
To: dev@flink.apache.org 
Subject: Re: [ANNOUNCE] Apache Flink 1.15.0 released

It took a bit longer than usual. But I'm sure the users will love this
release.

Big thanks to the release managers!

Timo

Am 05.05.22 um 10:45 schrieb Yuan Mei:
> Great!
>
> Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
> everyone for making this release happen!
>
> Best
> Yuan
>
> On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:
>
>> Congratulations!
>>
>> Thanks Yun Gao, Till and Joe for the great work as our release manager and
>> everyone who involved.
>>
>> Best,
>> Leonard
>>
>>
>>
>>> 2022年5月5日 下午4:30,Yang Wang  写道:
>>>
>>> Congratulations!
>>>
>>> Thanks Yun Gao, Till and Joe for driving this release and everyone who
>> made
>>> this release happen.
>>



Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Timo Walther
It took a bit longer than usual. But I'm sure the users will love this 
release.


Big thanks to the release managers!

Timo

Am 05.05.22 um 10:45 schrieb Yuan Mei:

Great!

Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
everyone for making this release happen!

Best
Yuan

On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:


Congratulations!

Thanks Yun Gao, Till and Joe for the great work as our release manager and
everyone who involved.

Best,
Leonard




2022年5月5日 下午4:30,Yang Wang  写道:

Congratulations!

Thanks Yun Gao, Till and Joe for driving this release and everyone who

made

this release happen.






Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Yuan Mei
Great!

Thanks, Yun Gao, Till, and Joe for driving the release, and thanks to
everyone for making this release happen!

Best
Yuan

On Thu, May 5, 2022 at 4:40 PM Leonard Xu  wrote:

> Congratulations!
>
> Thanks Yun Gao, Till and Joe for the great work as our release manager and
> everyone who involved.
>
> Best,
> Leonard
>
>
>
> > 2022年5月5日 下午4:30,Yang Wang  写道:
> >
> > Congratulations!
> >
> > Thanks Yun Gao, Till and Joe for driving this release and everyone who
> made
> > this release happen.
>
>


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Leonard Xu
Congratulations! 

Thanks Yun Gao, Till and Joe for the great work as our release manager and 
everyone who involved.

Best,
Leonard



> 2022年5月5日 下午4:30,Yang Wang  写道:
> 
> Congratulations!
> 
> Thanks Yun Gao, Till and Joe for driving this release and everyone who made
> this release happen.



Re: [DISCUSS] FLIP-227: Support overdraft buffer

2022-05-05 Thread Piotr Nowojski
Hi again,

After sleeping over this, if both versions (reserve and overdraft) have the
same complexity, I would also prefer the overdraft.

> `Integer.MAX_VALUE` as default value was my idea as well but now, as
> Dawid mentioned, I think it is dangerous since it is too implicit for
> the user and if the user submits one more job for the same TaskManger

As I mentioned, it's not only an issue with multiple jobs. The same problem
can happen with different subtasks from the same job, potentially leading
to the FLINK-13203 deadlock [1]. With FLINK-13203 fixed, I would be in
favour of Integer.MAX_VALUE to be the default value, but as it is, I think
we should indeed play on the safe side and limit it.

> I still don't understand how should be limited "reserve" implementation.
> I mean if we have X buffers in total and the user sets overdraft equal
> to X we obviously can not reserve all buffers, but how many we are
> allowed to reserve? Should it be a different configuration like
> percentegeForReservedBuffers?

The reserve could be defined as a percentage, or as a fixed number of
buffers. But yes, in normal operation a subtask would not use the reserve:
if numberOfAvailableBuffers < reserve, the output would not be available.
Only in the flatMap/timers/huge-records case could the reserve be used.

> 1. If the total buffers of LocalBufferPool <= the reserve buffers, will
LocalBufferPool never be available? Can't process data?

Of course we would need to make sure that never happens. So the reserve
should be < total buffer size.

> 2. If the overdraft buffer uses the extra buffers, when the downstream
> task inputBuffer is insufficient, it should fail to start the job, and
then
> restart? When the InputBuffer is initialized, it will apply for enough
> buffers, right?

Failover when the downstream can not allocate buffers is already implemented
in FLINK-14872 [2]. There is a timeout for how long the task waits for
buffer allocation. However, this doesn't prevent many (potentially
infinitely many) deadlock/restart cycles. IMO the proper solution for [1]
would be 2b described in the ticket:

> 2b. Assign extra buffers only once all of the tasks are RUNNING. This is
a simplified version of 2a, without tracking the tasks sink-to-source.

But that's a pre-existing problem and I don't think we have to solve it
before implementing overdraft. I think we would need to solve it only
before setting Integer.MAX_VALUE as the default for the overdraft. Maybe I
would hesitate setting the overdraft to anything more than a couple of
buffers by default for the same reason.

> Actually, I totally agree that we don't need a lot of buffers for
overdraft

and

> Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
> When we finish this feature and after users use it, if users feedback
> this issue we can discuss again.

+1

Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-13203
[2] https://issues.apache.org/jira/browse/FLINK-14872

czw., 5 maj 2022 o 05:52 rui fan <1996fan...@gmail.com> napisał(a):

> Hi everyone,
>
> I still have some questions.
>
> 1. If the total buffers of LocalBufferPool <= the reserve buffers, will
> LocalBufferPool never be available? Can't process data?
> 2. If the overdraft buffer uses the extra buffers, when the downstream
> task inputBuffer is insufficient, it should fail to start the job, and
> then
> restart? When the InputBuffer is initialized, it will apply for enough
> buffers, right?
>
> Also I agree we ignore the overdraftBuffers=numberOfSubpartitions.
> When we finish this feature and after users use it, if users feedback
> this issue we can discuss again.
>
> Thanks
> fanrui
>
> On Wed, May 4, 2022 at 5:29 PM Dawid Wysakowicz 
> wrote:
>
>> Hey all,
>>
>> I have not replied in the thread yet, but I was following the discussion.
>>
>> Personally, I like Fanrui's and Anton's idea. As far as I understand it
>> the idea to distinguish between inside flatMap & outside would be fairly
>> simple, but maybe slightly indirect. The checkAvailability would remain
>> unchanged and it is always checked between separate invocations of the
>> UDF. Therefore the overdraft buffers would not apply there. However once
>> the pool says it is available, it means it has at least an initial
>> buffer. So any additional request without checking for availability can
>> be considered to be inside of processing a single record. This does not
>> hold just for the LegacySource as I don't think it actually checks for
>> the availability of buffers in the LocalBufferPool.
>>
>> In the offline chat with Anton, we also discussed if we need a limit of
>> the number of buffers we could overdraft (or in other words if the limit
>> should be equal to Integer.MAX_VALUE), but personally I'd prefer to stay
>> on the safe side and have it limited. The pool of network buffers is
>> shared for the entire TaskManager, so it means it can be shared even
>> across tasks of separate jobs. However, I might be just 
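
To make the reserve and overdraft variants discussed in this thread concrete,
here is a minimal Java sketch of a buffer pool request path with both knobs.
It is purely illustrative: the class, fields, and method names are not Flink's
actual LocalBufferPool internals.

    // Illustrative sketch only -- not Flink's LocalBufferPool.
    final class SketchBufferPool {
        private final int maxBuffers;    // buffers this pool may normally hold
        private final int reserve;       // must stay strictly below maxBuffers
        private final int maxOverdraft;  // small bounded allowance, not MAX_VALUE
        private int handedOut;           // buffers currently given to the task

        SketchBufferPool(int maxBuffers, int reserve, int maxOverdraft) {
            if (reserve >= maxBuffers) {
                throw new IllegalArgumentException(
                        "reserve must be smaller than the total buffers");
            }
            this.maxBuffers = maxBuffers;
            this.reserve = reserve;
            this.maxOverdraft = maxOverdraft;
        }

        /** What the task checks between records; the reserve is untouchable. */
        synchronized boolean isAvailable() {
            return handedOut < maxBuffers - reserve;
        }

        /** Request made while processing a single record (flatMap, timers,
         *  huge records): may dip into the reserve plus the overdraft. */
        synchronized boolean requestDuringRecord() {
            if (handedOut < maxBuffers + maxOverdraft) {
                handedOut++;
                return true;
            }
            return false; // caller must block and back-pressure
        }
    }

The key property is that only isAvailable() gates the mailbox between records,
so the extra allowance is reachable solely from within the processing of a
single record.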

[jira] [Created] (FLINK-27494) Documentation is unavailable

2022-05-05 Thread Jira
李伟高 created FLINK-27494:
---

 Summary: Documentation is unavailable
 Key: FLINK-27494
 URL: https://issues.apache.org/jira/browse/FLINK-27494
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.15.0
Reporter: 李伟高
 Attachments: image-2022-05-05-16-38-01-827.png, 
image-2022-05-05-16-38-21-499.png

!image-2022-05-05-16-38-01-827.png!

!image-2022-05-05-16-38-21-499.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] Maintain a Calcite repository for Flink to accelerate the development for Flink SQL features

2022-05-05 Thread godfrey he
Hi, Timo & Martijn,

Sorry for the late reply, thanks for the feedback.

I strongly agree that the best solution would be to cooperate more
with the Calcite community
and maintain all new features and bug fixes in the Calcite community,
without any forking.
It is a long-term process. I think it's difficult to change community
rules, because the Calcite
project is a neutral lib that serves multiple projects simultaneously.
I don't think forking Calcite is the perfect solution, but rather a
better balance within limited resources:
it's possible to introduce some necessary minor features and bug fixes
without having to
upgrade to the latest version.


I investigated other projects that use Calcite[1] and found that most of
them do not use
the latest version of Calcite. Even in the Kylin community, the fork
based on
Calcite-1.16.0 has been updated about 70 times[2]. (Similar projects are
Quark and Drill.)
My guess is that these projects chose a stable version
(or even chose to maintain a fork project) to maintain stability.
When Flink does not need to introduce new syntax anymore,
I guess it would be less expensive and more manageable to maintain a
Calcite fork.


Even if we don't end up going the fork calcite route,
I hope that we could discuss the options for subsequent calcite upgrades here.
Just like Timo mentioned, how to balance feature development and code
maintenance.
There are a few realistic questions about the Calcite upgrade
situation now, such as:
1. If we keep up with the latest version of Calcite, who is
responsible for each upgrade?
The current status is that no one has motivation to upgrade the version
unless he/she wants to drive new features.
2. Do we have the resources/energy to upgrade each version?
3. How do we ensure that each upgrade behaves as expected? It takes a lot of
effort to verify the correctness of the upgrade results. The test set for
uncommon SQL usage is not sufficient today.


> I still don't quite understand why we want to avoid Calcite upgrades.
Not every feature in Calcite is a feature we really need, while some
refactorings can be very burdensome
(lots of bugs, plan changes, and a lot of effort to fix).
Just as mentioned above, the "SEARCH operator" refactoring in
CALCITE-4173 did cause a lot of bugs.


[1] https://calcite.apache.org/docs/powered_by.html
[2] https://github.com/Kyligence/calcite/commits/kycalcite-1.16.0.x-4.x

Best,
Godfrey

Martijn Visser  于2022年4月25日周一 22:11写道:

>
> Hi all,
>
> Just a couple of remarks on some of things from this thread:
>
> > I think we will upgrade Calcite to 1.31 only when Flink depends on some
> significant features of Calcite.
> > Such as: new syntax PTF (CALCITE-4865).
>
> Like Timo also mentions, I think this is a bad practice. Calcite is a key
> dependency for Flink. We should upgrade as often as possible, not as little
> as possible. Any fork in the beginning is easy, but it becomes a bigger
> pain as time progresses.
>
> > >## Are the calcite repository costly to maintain?
> > From the experience of @Danny Chen (one PMC of Calcite), publishing
> > is much easier.
>
> Since Calcite is such a key dependency, I would really oppose forking it.
> There will only be very few maintainers of such a fork. The amount of
> people that know and can maintain both Calcite and Flink will be even less.
>
> > I'm just trying to find an approach which can avoid frequent Calcite
> upgrades,
> > but easily support bug fix and minor new feature development.
>
> I still don't quite understand why we want to avoid Calcite upgrades.
> Upgrading Calcite introduces new features, but it also resolves bugs that
> currently exist in Flink. Part of housekeeping is that we keep our codebase
> up-to-date and tidy, to avoid that it becomes a mess and unmaintainable. I
> understand that this is less preferred, because you can't spend this time
> working on new features. If I make a comparison with doing construction
> work on your house, you can't put in a new floor if you don't clean out the
> room first.
>
> > About Calcite version upgrading,  we should try not use the latest
> Calcite version to avoid the bugs introduced by the new version if possible.
>
> I can fully agree on that. But right now we're running multiple versions
> behind.
>
> Have we reached out to the Calcite community first with our problems, or
> have we gone straight into "let's fork it"?
>
> I still haven't seen an argument that would make me in favor of setting up
> a fork.
>
> Best regards,
>
> Martijn
>
> On Mon, 25 Apr 2022 at 15:55, Timo Walther  wrote:
>
> > Hi Godfrey,
> >
> > I'm also strictly against maintaining a Calcite fork. We had similar
> > discussions during the merge of the Blink code base in the past and I'm
> > happy that we could prevent a fork until today. Let me elaborate a bit
> > on my strict opinion here:
> >
> > 1) Calcite does not offer bugfix releases
> >
> > In the end, also Calcite is an Apache community. I'm sure we could
> > improve our collaboration and help releasing 

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Yang Wang
Congratulations!

Thanks Yun Gao, Till and Joe for driving this release and everyone who made
this release happen.



Best,
Yang

Jingsong Li  于2022年5月5日周四 16:04写道:

> Cheers! Congratulations!
>
> Thank you very much! And thank all who contributed to this release.
>
> Best,
> Jingsong
>
> On Thu, May 5, 2022 at 3:57 PM Xintong Song  wrote:
> >
> > Congratulations~!
> >
> > Thank the release managers, and thank all who contributed to this
> release.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Thu, May 5, 2022 at 3:45 PM Guowei Ma  wrote:
> >
> > > Hi, Yun
> > >
> > > Great job!
> > > Thank you very much for your efforts to release Flink-1.15 during this
> > > time.
> > > Thanks also to all the contributors who worked on this release!
> > >
> > > Best,
> > > Guowei
> > >
> > >
> > > On Thu, May 5, 2022 at 3:24 PM Peter Schrott 
> > > wrote:
> > >
> > > > Great!
> > > >
> > > > Will install it on the cluster asap! :)
> > > >
> > > > One thing I noticed: the linked release notes in the blog
> announcement
> > > > under "Upgrade Notes" result in a 404
> > > > (
> > > >
> > > >
> > >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> > > > )
> > > >
> > > > They are also not linked on the main page:
> > > > https://nightlies.apache.org/flink/flink-docs-release-1.15/
> > > >
> > > > Keep it up!
> > > > Peter
> > > >
> > > >
> > > > On Thu, May 5, 2022 at 8:43 AM Martijn Visser  >
> > > > wrote:
> > > >
> > > > > Thank you Yun Gao, Till and Joe for driving this release. Your
> efforts
> > > > are
> > > > > greatly appreciated!
> > > > >
> > > > > To everyone who has opened Jira tickets, provided PRs, reviewed
> code,
> > > > > written documentation or anything contributed in any other way,
> this
> > > > > release was (once again) made possible by you! Thank you.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Martijn
> > > > >
> > > > > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> > > > >
> > > > >> The Apache Flink community is very happy to announce the release
> of
> > > > >> Apache Flink 1.15.0, which is the first release for the Apache
> Flink
> > > > >> 1.15 series.
> > > > >>
> > > > >> Apache Flink® is an open-source stream processing framework for
> > > > >> distributed, high-performing, always-available, and accurate data
> > > > >> streaming applications.
> > > > >>
> > > > >> The release is available for download at:
> > > > >> https://flink.apache.org/downloads.html
> > > > >>
> > > > >> Please check out the release blog post for an overview of the
> > > > >> improvements for this release:
> > > > >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > > >>
> > > > >> The full release notes are available in Jira:
> > > > >>
> > > > >>
> > > >
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> > > > >>
> > > > >> We would like to thank all contributors of the Apache Flink
> community
> > > > >> who made this release possible!
> > > > >>
> > > > >> Regards,
> > > > >> Joe, Till, Yun Gao
> > > > >>
> > > > > --
> > > > >
> > > > > Martijn Visser | Product Manager
> > > > >
> > > > > mart...@ververica.com
> > > > >
> > > > > 
> > > > >
> > > > >
> > > > > Follow us @VervericaData
> > > > >
> > > > > --
> > > > >
> > > > > Join Flink Forward  - The Apache Flink
> > > > > Conference
> > > > >
> > > > > Stream Processing | Event Driven | Real Time
> > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-05 Thread Piotr Nowojski
Hi Guowei,

as Dawid wrote a couple of messages back:

> This is covered in the previous FLIP[1] which has been already
implemented in 1.15. In short, it must be enabled with the watermark
strategy which also configures drift and update interval

So by default watermark alignment is disabled, regardless if a source
supports it or not.

Best,
Piotrek

czw., 5 maj 2022 o 09:56 Guowei Ma  napisał(a):

> Hi,
>
> We know that in the case of Bounded input Flink supports the Batch
> execution mode. Currently in Batch execution mode, flink is executed on a
> stage-by-stage basis. In this way, perhaps watermark alignment might not
> gain much.
>
> So my question is: Is watermark alignment the default behavior(for
> implemented source only)? If so, have you considered evaluating the impact
> of this behavior on the Batch execution mode? Or thinks it is not
> necessary.
>
> Correct me if I miss something.
>
> Best,
> Guowei
>
>
> On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski 
> wrote:
>
> > Hi Becket and Dawid,
> >
> > > I feel that no matter which option we choose this can not be solved
> > entirely in either of the options, because of the point above and because
> > the signature of SplitReader#pauseOrResumeSplits and
> > SourceReader#pauseOrResumeSplits are slightly different (one identifies
> > splits with splitId the other one passes the splits directly).
> >
> > Yes, that's a good point in this case and for features that need to be
> > implemented in more than one place.
> >
> > > Is there any reason for pausing reading from a split an optional
> feature,
> > > other than that this was not included in the original interface?
> >
> > An additional argument in favor of making it optional is to simplify
> source
> > implementation. But on its own I'm not sure if that would be enough to
> > justify making this feature optional. Maybe.
> >
> > > I think it would be way simpler and clearer to just let end users and
> > Flink
> > > assume all the connectors will implement this feature.
> >
> > As I wrote above that would be an interesting choice to make (ease of
> > implementation for new users, vs system consistency). Regardless of that,
> > yes, for me the main argument is the API backward compatibility. But
> let's
> > clear a couple of points:
> > - The current proposal adding methods to the base interface with default
> > implementations is an OPTIONAL feature. Same as the decorative version
> > would be.
> > - Decorative version could implement "throw
> UnsupportedOperationException"
> > if user enabled watermark alignment just as well and I agree that's a
> > better option compared to logging a warning.
> >
> > Best,
> > Piotrek
> >
> >
> > śr., 4 maj 2022 o 15:40 Becket Qin  napisał(a):
> >
> > > Thanks for the reply and patient discussion, Piotr and Dawid.
> > >
> > > Is there any reason for pausing reading from a split an optional
> feature,
> > > other than that this was not included in the original interface?
> > >
> > > To be honest I am really worried about the complexity of the user story
> > > here. Optional features like this have a high overhead. Imagine this
> > > feature is optional, now a user enabled watermark alignment and
> defined a
> > > few watermark groups. Would it work? Hmm, that depends on whether the
> > > involved Source has implmemented this feature. If the Sources are well
> > > documented, good luck. Otherwise end users may have to look into the
> code
> > > of the Source to see whether the feature is supported. Which is
> something
> > > they shouldn't have to do.
> > >
> > > I think it would be way simpler and clearer to just let end users and
> > Flink
> > > assume all the connectors will implement this feature. After all the
> > > watermark group is not optinoal to the end users. If in some rare
> cases,
> > > the feature cannot be supported, a clear UnsupportedOperationException
> > will
> > > be thrown to tell users to explicitly remove this Source from the
> > watermark
> > > group. I don't think we should have a warning message here, as they
> tend
> > to
> > > be ignored in many cases. If we do this, we don't even need the
> > supportXXX
> > > method in the Source for this feature. In fact this is exactly how many
> > > interfaces works today. For example, SplitEnumerator#addSplitsBack() is
> > not
> > > supported by Pravega source because it does not support partial
> failover.
> > > In that case, it simply throws an exception to trigger a global
> recovery.
> > >
> > > The reason we add a default implementation in this case would just for
> > the
> > > sake of backwards compatibility so the old source can still compile.
> > Sure,
> > > in short term, this feature might not be supported by many existing
> > > sources. That is OK, and it is quite visible to the source developers
> > that
> > > they did not override the default impl which throws an
> > > UnsupportedOperationException.
> > >
> > > @Dawid,
> > >
> > > the Java doc of the SupportXXX() method in the Source would be the single
> > > source of truth regarding how to implement this feature.
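
Concretely, the opt-in Piotrek refers to at the top of this message looks
roughly like this on the user side (the event type and durations below are
illustrative; withWatermarkAlignment is the FLIP-182 API shipped in Flink
1.15):

    import java.time.Duration;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;

    final class AlignmentExample {
        // Hypothetical event type, just for the sketch.
        static final class MyEvent {
            long timestamp;
        }

        // Alignment stays disabled unless the strategy names a group.
        static WatermarkStrategy<MyEvent> alignedStrategy() {
            return WatermarkStrategy
                    .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                    .withTimestampAssigner((event, recordTs) -> event.timestamp)
                    .withWatermarkAlignment(
                            "alignment-group-1",
                            Duration.ofSeconds(20),  // max allowed drift
                            Duration.ofSeconds(1));  // update interval
        }
    }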

[jira] [Created] (FLINK-27493) Forward all numeric Kafka metrics to Flink's metrics system

2022-05-05 Thread Fabian Paul (Jira)
Fabian Paul created FLINK-27493:
---

 Summary: Forward all numeric Kafka metrics to Flink's metrics 
system
 Key: FLINK-27493
 URL: https://issues.apache.org/jira/browse/FLINK-27493
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Kafka
Affects Versions: 1.16.0
Reporter: Fabian Paul


With the upgrade of the Kafka version to 2.8, it is now possible to access more 
metrics from the KafkaConsumer/KafkaProducer. So far we only forward metrics 
that are of type Double but ignore other numeric values. It might be worthwhile 
to forward all numerics metrics.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
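
A minimal sketch of the proposed behavior, using the public Kafka Metric and
Flink MetricGroup/Gauge interfaces; the helper class itself is hypothetical:

    import org.apache.flink.metrics.Gauge;
    import org.apache.flink.metrics.MetricGroup;
    import org.apache.kafka.common.Metric;

    final class KafkaMetricForwarder {
        private KafkaMetricForwarder() {}

        /** Registers a gauge for every Kafka metric whose value is numeric,
         *  instead of only those of type Double. */
        static void forwardIfNumeric(MetricGroup group, Metric kafkaMetric) {
            if (kafkaMetric.metricValue() instanceof Number) {
                group.gauge(
                        kafkaMetric.metricName().name(),
                        (Gauge<Number>) () -> (Number) kafkaMetric.metricValue());
            }
        }
    }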


Re: Source alignment for Iceberg

2022-05-05 Thread Piotr Nowojski
Ok, I see. Thanks to both of you for the explanation.

Do we need changes to Apache Flink for this feature? Can it be implemented
in the Sources without changes in the framework? I presume source can
access min/max watermark from the split, so as long as it also knows
exactly which splits have finished, it would know which splits to hold back.

Best,
Piotrek

śr., 4 maj 2022 o 20:03 Steven Wu  napisał(a):

> Piotr, thanks a lot for your feedback.
>
> > I can see this being an issue if the existence of too many blocked
> splits is occupying too many resources.
>
> This is not desirable. Eagerly assigning many splits to a reader can
> defeat the benefits of pull based dynamic split assignments. Iceberg
> readers request one split at a time upon start or completion of a split.
> Dynamic split assignment is better for work sharing/stealing as Becket
> mentioned. Limiting number of active splits can be handled by the FLIP-27
> Iceberg source and is somewhat orthogonal to watermark alignment.
>
> > Can not Iceberg just emit all splits and let FLIP-182/FLIP-217 handle
> the watermark alignment and block the splits that are too much into the
> future?
>
> The enumerator just assigns the next split to the requesting reader
> instead of holding back the split assignment. Let the reader handle the
> pause (if the file split requires alignment wait).  This strategy might
> work and leverage more from the framework.
>
> We probably need the following to make this work
> * extract watermark/timestamp only at the completion of a split (not at
> record level). Because records in a file are probably not sorted by the
> timestamp field, the pause or watermark advancement is probably better done
> at file level.
> * source readers checkpoint the watermark. otherwise, upon restart readers
> won't be able to determine the local watermark and pause for alignment. We
> don't want to emit records upon restart due to unknown watermark info.
>
> All,
>
> Any opinion on different timestamp for source alignment (vs Flink
> application watermark)? For Iceberg source, we might want to enforce
> alignment on kafka timestamp but Flink application watermark may use event
> time field from payload.
>
> Thanks,
> Steven
>
> On Wed, May 4, 2022 at 7:02 AM Becket Qin  wrote:
>
>> Hey Piotr,
>>
>> I think the mechanism FLIP-182 provided is a reasonable default one, which
>> ensures the watermarks are only drifted by an upper bound. However,
>> admittedly there are also other strategies for different purposes.
>>
>> In the Iceberg case, I am not sure if a static strictly allowed watermark
>> drift is desired. The source might just want to finish reading the
>> assigned
>> splits as fast as possible. And it is OK to have a drift of "one split",
>> instead of a fixed time period.
>>
>> As another example, if there are some fast readers whose splits are always
>> throttled, while the other slow readers are struggling to keep up with the
>> rest of the splits, the split enumerator may decide to reassign the slow
>> splits so all the readers have something to read. This would need the
>> SplitEnumerator to be aware of the watermark progress on each reader. So
>> it
>> seems useful to expose the WatermarkAlignmentEvent information to the
>> SplitEnumerator as well.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>>
>> On Tue, May 3, 2022 at 7:58 PM Piotr Nowojski 
>> wrote:
>>
>> > Hi Steven,
>> >
>> > Isn't this redundant to FLIP-182 and FLIP-217? Can not Iceberg just emit
>> > all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
>> > block the splits that are too much into the future? I can see this
>> being an
>> > issue if the existence of too many blocked splits is occupying too many
>> > resources.
>> >
>> > If that's the case, indeed SourceCoordinator/SplitEnumerator would have
>> to
>> > decide on some basis how many and which splits to assign in what order.
>> But
>> > in that case I'm not sure how much you could use from FLIP-182 and
>> > FLIP-217. They seem somehow orthogonal to me, operating on different
>> > levels. FLIP-182 and FLIP-217 are working with whatever splits have
>> already
>> > been generated and assigned. You could leverage FLIP-182 and FLIP-217
>> and
>> > take care of only the problem to limit the number of parallel active
>> > splits. And here I'm not sure if it would be worth generalising a
>> solution
>> > across different connectors.
>> >
>> > Regarding the global watermark, I made a related comment sometime ago
>> > about it [1]. It sounds to me like you also need to solve this problem,
>> > otherwise Iceberg users will encounter late records in case of some race
>> > conditions between assigning new splits and completions of older.
>> >
>> > Best,
>> > Piotrek
>> >
>> > [1]
>> >
>> https://issues.apache.org/jira/browse/FLINK-21871?focusedCommentId=17495545&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17495545
>> >
>> > pon., 2 maj 2022 o 04:26 Steven Wu  
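
A rough reader-side sketch of the split-completion idea from this thread
(entirely hypothetical, not Iceberg's or Flink's API): the local watermark
advances only when a file split finishes, based on the split's max timestamp
from file metadata, and the value belongs in checkpoint state so a restarted
reader knows its watermark before emitting records.

    import java.util.HashMap;
    import java.util.Map;

    // Per-reader watermark derived from split metadata, advanced only at
    // split completion (records within a file are assumed unsorted).
    final class SplitWatermarkTracker {
        private final Map<String, Long> maxTimestampPerActiveSplit = new HashMap<>();
        private long localWatermark = Long.MIN_VALUE;

        void onSplitAssigned(String splitId, long maxTimestampFromMetadata) {
            maxTimestampPerActiveSplit.put(splitId, maxTimestampFromMetadata);
        }

        /** Called when the reader finishes a split; the returned value would
         *  also go into checkpoint state. */
        long onSplitFinished(String splitId) {
            Long finished = maxTimestampPerActiveSplit.remove(splitId);
            if (finished != null) {
                localWatermark = Math.max(localWatermark, finished);
            }
            return localWatermark;
        }
    }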

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Jingsong Li
Cheers! Congratulations!

Thank you very much! And thank all who contributed to this release.

Best,
Jingsong

On Thu, May 5, 2022 at 3:57 PM Xintong Song  wrote:
>
> Congratulations~!
>
> Thank the release managers, and thank all who contributed to this release.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, May 5, 2022 at 3:45 PM Guowei Ma  wrote:
>
> > Hi, Yun
> >
> > Great job!
> > Thank you very much for your efforts to release Flink-1.15 during this
> > time.
> > Thanks also to all the contributors who worked on this release!
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, May 5, 2022 at 3:24 PM Peter Schrott 
> > wrote:
> >
> > > Great!
> > >
> > > Will install it on the cluster asap! :)
> > >
> > > One thing I noticed: the linked release notes in the blog announcement
> > > under "Upgrade Notes" result in a 404
> > > (
> > >
> > >
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> > > )
> > >
> > > They are also not linked on the main page:
> > > https://nightlies.apache.org/flink/flink-docs-release-1.15/
> > >
> > > Keep it up!
> > > Peter
> > >
> > >
> > > On Thu, May 5, 2022 at 8:43 AM Martijn Visser 
> > > wrote:
> > >
> > > > Thank you Yun Gao, Till and Joe for driving this release. Your efforts
> > > are
> > > > greatly appreciated!
> > > >
> > > > To everyone who has opened Jira tickets, provided PRs, reviewed code,
> > > > written documentation or anything contributed in any other way, this
> > > > release was (once again) made possible by you! Thank you.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> > > >
> > > >> The Apache Flink community is very happy to announce the release of
> > > >> Apache Flink 1.15.0, which is the first release for the Apache Flink
> > > >> 1.15 series.
> > > >>
> > > >> Apache Flink® is an open-source stream processing framework for
> > > >> distributed, high-performing, always-available, and accurate data
> > > >> streaming applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > > >> improvements for this release:
> > > >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > >>
> > > >> The full release notes are available in Jira:
> > > >>
> > > >>
> > >
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink community
> > > >> who made this release possible!
> > > >>
> > > >> Regards,
> > > >> Joe, Till, Yun Gao
> > > >>
> > > > --
> > > >
> > > > Martijn Visser | Product Manager
> > > >
> > > > mart...@ververica.com
> > > >
> > > > 
> > > >
> > > >
> > > > Follow us @VervericaData
> > > >
> > > > --
> > > >
> > > > Join Flink Forward  - The Apache Flink
> > > > Conference
> > > >
> > > > Stream Processing | Event Driven | Real Time
> > > >
> > > >
> > >
> >


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Konstantin Knauf
Congratulations to everyone!

Thanks to Yun Gao, Joe Moser & Till Rohrmann for managing this release.
It's done :)

Am Do., 5. Mai 2022 um 09:57 Uhr schrieb Xintong Song :

> Congratulations~!
>
> Thank the release managers, and thank all who contributed to this release.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, May 5, 2022 at 3:45 PM Guowei Ma  wrote:
>
> > Hi, Yun
> >
> > Great job!
> > Thank you very much for your efforts to release Flink-1.15 during this
> > time.
> > Thanks also to all the contributors who worked on this release!
> >
> > Best,
> > Guowei
> >
> >
> > On Thu, May 5, 2022 at 3:24 PM Peter Schrott 
> > wrote:
> >
> > > Great!
> > >
> > > Will install it on the cluster asap! :)
> > >
> > > One thing I noticed: the linked release notes in the blog announcement
> > > under "Upgrade Notes" result in a 404
> > > (
> > >
> > >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> > > )
> > >
> > > They are also not linked on the main page:
> > > https://nightlies.apache.org/flink/flink-docs-release-1.15/
> > >
> > > Keep it up!
> > > Peter
> > >
> > >
> > > On Thu, May 5, 2022 at 8:43 AM Martijn Visser 
> > > wrote:
> > >
> > > > Thank you Yun Gao, Till and Joe for driving this release. Your
> efforts
> > > are
> > > > greatly appreciated!
> > > >
> > > > To everyone who has opened Jira tickets, provided PRs, reviewed code,
> > > > written documentation or anything contributed in any other way, this
> > > > release was (once again) made possible by you! Thank you.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> > > >
> > > >> The Apache Flink community is very happy to announce the release of
> > > >> Apache Flink 1.15.0, which is the first release for the Apache Flink
> > > >> 1.15 series.
> > > >>
> > > >> Apache Flink® is an open-source stream processing framework for
> > > >> distributed, high-performing, always-available, and accurate data
> > > >> streaming applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > > >> improvements for this release:
> > > >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > > >>
> > > >> The full release notes are available in Jira:
> > > >>
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink
> community
> > > >> who made this release possible!
> > > >>
> > > >> Regards,
> > > >> Joe, Till, Yun Gao
> > > >>
> > > > --
> > > >
> > > > Martijn Visser | Product Manager
> > > >
> > > > mart...@ververica.com
> > > >
> > > > 
> > > >
> > > >
> > > > Follow us @VervericaData
> > > >
> > > > --
> > > >
> > > > Join Flink Forward  - The Apache Flink
> > > > Conference
> > > >
> > > > Stream Processing | Event Driven | Real Time
> > > >
> > > >
> > >
> >
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Xintong Song
Congratulations~!

Thank the release managers, and thank all who contributed to this release.

Thank you~

Xintong Song



On Thu, May 5, 2022 at 3:45 PM Guowei Ma  wrote:

> Hi, Yun
>
> Great job!
> Thank you very much for your efforts to release Flink-1.15 during this
> time.
> Thanks also to all the contributors who worked on this release!
>
> Best,
> Guowei
>
>
> On Thu, May 5, 2022 at 3:24 PM Peter Schrott 
> wrote:
>
> > Great!
> >
> > Will install it on the cluster asap! :)
> >
> > One thing I noticed: the linked release notes in the blog announcement
> > under "Upgrade Notes" result in a 404
> > (
> >
> >
> https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> > )
> >
> > They are also not linked on the main page:
> > https://nightlies.apache.org/flink/flink-docs-release-1.15/
> >
> > Keep it up!
> > Peter
> >
> >
> > On Thu, May 5, 2022 at 8:43 AM Martijn Visser 
> > wrote:
> >
> > > Thank you Yun Gao, Till and Joe for driving this release. Your efforts
> > are
> > > greatly appreciated!
> > >
> > > To everyone who has opened Jira tickets, provided PRs, reviewed code,
> > > written documentation or anything contributed in any other way, this
> > > release was (once again) made possible by you! Thank you.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> > >
> > >> The Apache Flink community is very happy to announce the release of
> > >> Apache Flink 1.15.0, which is the first release for the Apache Flink
> > >> 1.15 series.
> > >>
> > >> Apache Flink® is an open-source stream processing framework for
> > >> distributed, high-performing, always-available, and accurate data
> > >> streaming applications.
> > >>
> > >> The release is available for download at:
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the
> > >> improvements for this release:
> > >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> > >>
> > >> The full release notes are available in Jira:
> > >>
> > >>
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> > >>
> > >> We would like to thank all contributors of the Apache Flink community
> > >> who made this release possible!
> > >>
> > >> Regards,
> > >> Joe, Till, Yun Gao
> > >>
> > > --
> > >
> > > Martijn Visser | Product Manager
> > >
> > > mart...@ververica.com
> > >
> > > 
> > >
> > >
> > > Follow us @VervericaData
> > >
> > > --
> > >
> > > Join Flink Forward  - The Apache Flink
> > > Conference
> > >
> > > Stream Processing | Event Driven | Real Time
> > >
> > >
> >
>


Re: [DISCUSS] FLIP-224: Blacklist Mechanism

2022-05-05 Thread Lijie Wang
Hi everyone,

Thanks for your feedback.


There's one detail that I'd like to re-emphasize here because it can affect
the value and design of the blocklist mechanism (perhaps I should highlight
it in the FLIP). We propose two actions in FLIP:

1) MARK_BLOCKLISTED: Just mark the task manager or node as blocked. Future
slots should not be allocated from the blocked task manager or node. But
slots that are already allocated will not be affected. A typical
application scenario is to mitigate machine hotspots. In this case, we hope
that subsequent resource allocations will not be on the hot machine, but
tasks currently running on it should not be affected.

2) MARK_BLOCKLISTED_AND_EVACUATE_TASKS: Mark the task manager or node as
blocked, and evacuate all tasks on it. Evacuated tasks will be restarted on
non-blocked task managers.

Of the above two actions, the former better highlights the value of this
FLIP, because an external system cannot do that.


Regarding *Manually* and *Automatically*, I basically agree with @Becket
Qin: different users have different answers. Not all users’ deployment
environments have a special external system that can perform the anomaly
detection. In addition, adding pluggable/optional auto-detection doesn't
require much extra work on top of manual specification.

I will answer your other questions one by one.


@Yangze

a) I think you are right, we do not need to expose the
`cluster.resource-blocklist.item.timeout-check-interval` to users.

b) We can abstract the `notifyException` to a separate interface (maybe
BlocklistExceptionListener), and the ResourceManagerBlocklistHandler can
implement it in the future.

@Martijn

a) I also think the manual blocking should be done by cluster operators.

b) I think manual blocking makes sense, because according to my experience,
users are often the first to perceive the machine problems (because of job
failover or delay), and they will contact cluster operators to solve it, or
even tell the cluster operators which machine is problematic. From this
point of view, I think the people who really need the manual blocking are
the users, and it’s just performed by the cluster operator, so I think the
manual blocking makes sense.

@Chesnay

We need to touch the logic of JM/SlotPool, because for MARK_BLOCKLISTED,
we need to know whether the slot is blocklisted when the task is
FINISHED/CANCELLED/FAILED. If so,  SlotPool should release the slot
directly to avoid assigning other tasks (of this job) on it. If we only
maintain the blocklist information on the RM, JM needs to retrieve it by
RPC. I think the performance overhead of that is relatively large, so I
think it's worth maintaining the blocklist information on the JM side and
syncing them.

@Роман

a) “Probably storing inside Zookeeper/Configmap might be helpful
here.”  Can you explain it in detail? I don't fully understand that. In my
opinion, non-active and active are the same, and no special treatment is
required.

b) I agree with you, the `endTimestamp` makes sense, I will add it to FLIP.

@Yang

As mentioned above, AFAIK, the external system cannot support the
MARK_BLOCKLISTED action.


Looking forward to your further feedback.


Best,

Lijie

Yang Wang  于2022年5月3日周二 21:09写道:

> Thanks Lijie and Zhu for creating the proposal.
>
> I want to share some thoughts about Flink cluster operations.
>
> In the production environment, the SRE(aka Site Reliability Engineer)
> already has many tools to detect the unstable nodes, which could take the
> system logs/metrics into consideration.
> Then they use graceful-decomission in YARN and taint in K8s to prevent new
> allocations on these unstable nodes.
> At last, they will evict all the containers and pods running on these
> nodes.
> This mechanism also works for planned maintenance. So I am afraid this is
> not the typical use case for FLIP-224.
>
> If we only support to block nodes manually, then I could not see
> the obvious advantages compared with current SRE's approach(via *yarn
> rmadmin or kubectl taint*).
> At least, we need to have a pluggable component which could expose the
> potential unstable nodes automatically and block them if enabled
> explicitly.
>
>
> Best,
> Yang
>
>
>
> Becket Qin  于2022年5月2日周一 16:36写道:
>
> > Thanks for the proposal, Lijie.
> >
> > This is an interesting feature and discussion, and somewhat related to
> the
> > design principle about how people should operate Flink.
> >
> > I think there are three things involved in this FLIP.
> >  a) Detect and report the unstable node.
> >  b) Collect the information of the unstable node and form a
> blocklist.
> >  c) Take the action to block nodes.
> >
> > My two cents:
> >
> > 1. It looks like people all agree that Flink should have c). It is not
> only
> > useful for cases of node failures, but also handy for some planned
> > maintenance.
> >
> > 2. People have different opinions on b), i.e. who should be the brain to
> > make the decision to block a node. 
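
For illustration, the two actions and the endTimestamp discussed in this
thread could be modeled roughly as below; the types are hypothetical, not the
FLIP's final API.

    import java.time.Instant;

    // Hypothetical model of a blocklist entry.
    enum BlockAction {
        /** Block future slot allocations only; running tasks keep their slots. */
        MARK_BLOCKLISTED,
        /** Additionally evacuate running tasks to non-blocked task managers. */
        MARK_BLOCKLISTED_AND_EVACUATE_TASKS
    }

    final class BlocklistItem {
        final String identifier;     // task manager id or node id
        final BlockAction action;
        final Instant endTimestamp;  // the item expires after this point

        BlocklistItem(String identifier, BlockAction action, Instant endTimestamp) {
            this.identifier = identifier;
            this.action = action;
            this.endTimestamp = endTimestamp;
        }

        boolean isExpired(Instant now) {
            return now.isAfter(endTimestamp);
        }
    }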

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-05 Thread Guowei Ma
Hi,

We know that in the case of Bounded input Flink supports the Batch
execution mode. Currently in Batch execution mode, flink is executed on a
stage-by-stage basis. In this way, perhaps watermark alignment might not
gain much.

So my question is: Is watermark alignment the default behavior(for
implemented source only)? If so, have you considered evaluating the impact
of this behavior on the Batch execution mode? Or do you think it is not necessary?

Correct me if I miss something.

Best,
Guowei


On Thu, May 5, 2022 at 1:01 PM Piotr Nowojski 
wrote:

> Hi Becket and Dawid,
>
> > I feel that no matter which option we choose this can not be solved
> entirely in either of the options, because of the point above and because
> the signature of SplitReader#pauseOrResumeSplits and
> SourceReader#pauseOrResumeSplits are slightly different (one identifies
> splits with splitId the other one passes the splits directly).
>
> Yes, that's a good point in this case and for features that need to be
> implemented in more than one place.
>
> > Is there any reason for pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
>
> An additional argument in favor of making it optional is to simplify source
> implementation. But on its own I'm not sure if that would be enough to
> justify making this feature optional. Maybe.
>
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature.
>
> As I wrote above that would be an interesting choice to make (ease of
> implementation for new users, vs system consistency). Regardless of that,
> yes, for me the main argument is the API backward compatibility. But let's
> clear a couple of points:
> - The current proposal adding methods to the base interface with default
> implementations is an OPTIONAL feature. Same as the decorative version
> would be.
> - Decorative version could implement "throw UnsupportedOperationException"
> if user enabled watermark alignment just as well and I agree that's a
> better option compared to logging a warning.
>
> Best,
> Piotrek
>
>
> śr., 4 maj 2022 o 15:40 Becket Qin  napisał(a):
>
> > Thanks for the reply and patient discussion, Piotr and Dawid.
> >
> > Is there any reason for pausing reading from a split an optional feature,
> > other than that this was not included in the original interface?
> >
> > To be honest I am really worried about the complexity of the user story
> > here. Optional features like this have a high overhead. Imagine this
> > feature is optional, now a user enabled watermark alignment and defined a
> > few watermark groups. Would it work? Hmm, that depends on whether the
> > involved Source has implmemented this feature. If the Sources are well
> > documented, good luck. Otherwise end users may have to look into the code
> > of the Source to see whether the feature is supported. Which is something
> > they shouldn't have to do.
> >
> > I think it would be way simpler and clearer to just let end users and
> Flink
> > assume all the connectors will implement this feature. After all the
> > watermark group is not optinoal to the end users. If in some rare cases,
> > the feature cannot be supported, a clear UnsupportedOperationException
> will
> > be thrown to tell users to explicitly remove this Source from the
> watermark
> > group. I don't think we should have a warning message here, as they tend
> to
> > be ignored in many cases. If we do this, we don't even need the
> supportXXX
> > method in the Source for this feature. In fact this is exactly how many
> > interfaces works today. For example, SplitEnumerator#addSplitsBack() is
> not
> > supported by Pravega source because it does not support partial failover.
> > In that case, it simply throws an exception to trigger a global recovery.
> >
> > The reason we add a default implementation in this case would just for
> the
> > sake of backwards compatibility so the old source can still compile.
> Sure,
> > in short term, this feature might not be supported by many existing
> > sources. That is OK, and it is quite visible to the source developers
> that
> > they did not override the default impl which throws an
> > UnsupportedOperationException.
> >
> > @Dawid,
> >
> > the Java doc of the SupportXXX() method in the Source would be the single
> > >> source of truth regarding how to implement this feature.
> > >
> > >
> >
> > I also don't find it entirely true. Half of the classes are theoretically
> > > optional and are utility classes from the point of view how the
> > interfaces
> > > are organized. Theoretically users do not need to use any of
> > > SourceReaderBase & SplitReader. Would be weird to list their methods in
> > the
> > > Source interface.
> >
> > I think the ultimate goal of java docs is to guide users to implement the
> > Source. If SourceReaderBase is the preferred way to implement a
> > SourceReader, it seems worth mentioning 
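
For reference, the option being debated above can be sketched in a few lines;
the method name follows the discussion, while the exact signature shown here
is illustrative:

    import java.util.Collection;

    // Sketch of the "default method that throws" option: existing sources
    // compile unchanged, and enabling watermark alignment against a source
    // that has not overridden the method fails fast and visibly.
    interface SourceReaderSketch {
        default void pauseOrResumeSplits(
                Collection<String> splitsToPause,
                Collection<String> splitsToResume) {
            throw new UnsupportedOperationException(
                    "This source does not support pausing or resuming splits, "
                            + "which watermark alignment of splits requires.");
        }
    }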

Re: [ANNOUNCE] Apache Flink 1.15.0 released

2022-05-05 Thread Guowei Ma
Hi, Yun

Great job!
Thank you very much for your efforts to release Flink-1.15 during this time.
Thanks also to all the contributors who worked on this release!

Best,
Guowei


On Thu, May 5, 2022 at 3:24 PM Peter Schrott  wrote:

> Great!
>
> Will install it on the cluster asap! :)
>
> One thing I noticed: the linked release notes in the blog announcement
> under "Upgrade Notes" result in a 404
> (
>
> https://nightlies.apache.org/flink/flink-docs-release-1.15/release-notes/flink-1.15/
> )
>
> They are also not linked on the main page:
> https://nightlies.apache.org/flink/flink-docs-release-1.15/
>
> Keep it up!
> Peter
>
>
> On Thu, May 5, 2022 at 8:43 AM Martijn Visser 
> wrote:
>
> > Thank you Yun Gao, Till and Joe for driving this release. Your efforts
> are
> > greatly appreciated!
> >
> > To everyone who has opened Jira tickets, provided PRs, reviewed code,
> > written documentation or anything contributed in any other way, this
> > release was (once again) made possible by you! Thank you.
> >
> > Best regards,
> >
> > Martijn
> >
> > Op do 5 mei 2022 om 08:38 schreef Yun Gao 
> >
> >> The Apache Flink community is very happy to announce the release of
> >> Apache Flink 1.15.0, which is the first release for the Apache Flink
> >> 1.15 series.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> >> distributed, high-performing, always-available, and accurate data
> >> streaming applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the
> >> improvements for this release:
> >> https://flink.apache.org/news/2022/05/05/1.15-announcement.html
> >>
> >> The full release notes are available in Jira:
> >>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350442
> >>
> >> We would like to thank all contributors of the Apache Flink community
> >> who made this release possible!
> >>
> >> Regards,
> >> Joe, Till, Yun Gao
> >>
> > --
> >
> > Martijn Visser | Product Manager
> >
> > mart...@ververica.com
> >
> > 
> >
> >
> > Follow us @VervericaData
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> >
>


  1   2   >