Re: Elastic Block Store as checkpoint storage

2023-07-18 Thread Konstantin Knauf
Hi Prabhu,

this should be possible, but it is quite expensive compared to AWS S3, and
you have to remount the EBS volumes to the new TaskManagers in case of a
failure, which takes non-trivial time and slows down recovery. So, overall
I don't think it's preferable to S3.

We do use EBS volumes, though, for the local RocksDB working directory. We
don't remount them on failure right now, due to the additional latency that
this would introduce.

Cheers,

Konstantin
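As a concrete illustration of the two options, the checkpoint storage setting in flink-conf.yaml would differ only in the target URI — the paths and bucket names below are made up:

```yaml
# Checkpoint storage on a shared EBS mount (path is hypothetical):
# every JobManager/TaskManager pod must mount the volume at this path.
state.checkpoints.dir: file:///mnt/ebs-checkpoints

# Checkpoint storage on S3 (bucket is hypothetical), the option
# recommended above:
# state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```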

Am Mi., 12. Juli 2023 um 18:55 Uhr schrieb Prabhu Joseph <
prabhujose.ga...@gmail.com>:

> Hi,
>
> We are investigating the feasibility of setting up an Elastic Block Store
> (EBS) as checkpoint storage by mounting a volume (a shared local file
> system path) to JobManager and all the TaskManager pods. I want to hear any
> feedback on this approach if anyone has already tried it.
>
>
> Thanks,
> Prabhu Joseph
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 Thread Konstantin Knauf
Congrats!

Am Mo., 27. März 2023 um 14:32 Uhr schrieb Leonard Xu :

> Congratulations!
>
>
> Best,
> Leonard
>
> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
>
> Dear Flinkers,
>
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
>
>
>
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
>
>
> Dear Flinkers,
>
>
> As you may have noticed, we are pleased to announce that Flink Table Store
> has officially joined the Apache Incubator for independent incubation [1] [2] [3].
> The new project is named Apache Paimon (incubating) and remains committed to
> building a next-generation streaming lakehouse platform supporting high-speed data
> ingestion, streaming data subscription, and efficient real-time analytics. In
> addition, the new project will support a richer ecosystem and build a vibrant and
> neutral open source community.
>
>
> Here we would like to thank everyone for their strong support of and investment
> in the Flink Table Store project, and warmly welcome everyone to join the
> development and community activities of the new project. Apache Flink will
> continue to be one of the primary compute engines supported by Paimon, and we
> believe the Flink and Paimon communities will continue to work closely together.
>
>
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> Best regards,
> Yu Li (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
>
> [1] https://paimon.apache.org/
> [2] https://github.com/apache/incubator-paimon
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>
>
>

-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.16.1 released

2023-02-01 Thread Konstantin Knauf
Great. Thanks, Martijn for managing the release.

Am Mi., 1. Feb. 2023 um 20:26 Uhr schrieb Martijn Visser <
martijnvis...@apache.org>:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.16.1, which is the first bugfix release for the Apache Flink 1.16
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2023/01/30/release-1.16.1.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352344
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Feel free to reach out to the release managers (or respond to this thread)
> with feedback on the release process. Our goal is to constantly improve the
> release process. Feedback on what could be improved or things that didn't
> go so well are appreciated.
>
> Best regards,
>
> Martijn Visser
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: Best way to perform look up with historical data in Flink

2022-06-06 Thread Konstantin Knauf
Hi Surendra,

where does your historical data reside? In a database? Files? Behind a REST
API?

Depending on the answer and which API you use, the AsyncIO [1] operator
(DataStream API) or a Lookup Table Join [2] might be good options. There
are other ways, too. A while back I did a webinar [3] on this. It is a bit
outdated, but still covers a few concepts and ideas around this.

Cheers,

Konstantin

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/operators/asyncio/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/table/sql/queries/joins/#lookup-join
[3] https://www.youtube.com/watch?v=cJS18iKLUIY&t=2s
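To sketch option [2], a lookup join in Flink SQL against a history table could look like this — all table and column names here are made up for illustration, and the lookup table is assumed to be backed by a connector such as JDBC:

```sql
-- Enrich a stream of orders with historical customer data.
-- `orders` has a processing-time attribute `proc_time`;
-- `customers_history` is a lookup table, e.g. backed by the JDBC connector.
SELECT o.order_id, o.amount, c.segment
FROM orders AS o
  JOIN customers_history FOR SYSTEM_TIME AS OF o.proc_time AS c
    ON o.customer_id = c.customer_id;
```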

Am Mo., 6. Juni 2022 um 07:36 Uhr schrieb Surendra Lalwani <
surendra.lalw...@swiggy.in>:

> Hi Team,
>
> Since Flink does not support stream-batch joins, we need to perform a
> lookup with some historical data. What would be the best way in Flink to do
> that? I am aware of one such approach using a broadcast stream, but I'm not
> sure if there are any other proper solutions. Any help with this would be
> appreciated.
>
> Regards ,
> Surendra Lalwani
>
>



-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: Community fork of Flink Scala API for Scala 2.12/2.13/3.x

2022-05-16 Thread Konstantin Knauf
Great work! Thank you for sharing.

Am Do., 12. Mai 2022 um 17:19 Uhr schrieb Jeff Zhang :

> That's true, the Scala shell is removed from Flink. Fortunately, Apache
> Zeppelin has its own Scala REPL for Flink. So if Flink can support Scala
> 2.13, I am wondering whether it is possible to integrate it into the Scala
> shell so that users can run Flink Scala code in a notebook, like Spark.
>
> On Thu, May 12, 2022 at 11:06 PM Roman Grebennikov  wrote:
>
>> Hi,
>>
>> AFAIK scala REPL was removed completely in Flink 1.15 (
>> https://issues.apache.org/jira/browse/FLINK-24360), so there is nothing
>> to cross-build.
>>
>> Roman Grebennikov | g...@dfdx.me
>>
>>
>> On Thu, May 12, 2022, at 14:55, Jeff Zhang wrote:
>>
>> Great work Roman, do you think it is possible to run in scala shell as
>> well?
>>
>> On Thu, May 12, 2022 at 10:43 PM Roman Grebennikov  wrote:
>>
>>
>> Hello,
>>
>> As far as I understand the discussions on this mailing list, there are almost
>> no people maintaining the official Scala API in Apache Flink now. Due to some
>> technical complexities it will probably be stuck for a very long time on
>> Scala 2.12 (which is not EOL yet, but quite close to it):
>> * The Traversable serializer relies a lot on CanBuildFrom (so it's read and
>> compiled on restore), which is missing in Scala 2.13 and 3.x - migrating
>> off this approach while maintaining savepoint compatibility can be quite a
>> complex task.
>> * Scala API uses an implicitly generated TypeInformation, which is
>> generated by a giant scary mkTypeInfo macro, which should be completely
>> rewritten for Scala 3.x.
>>
>> But even in the current state, Scala support in Flink has some issues
>> with ADTs (sealed traits, a popular data modelling pattern) not being natively
>> supported, so if you use them, you have to fall back to Kryo, which is not
>> that fast: we've seen 3x-4x throughput drops in performance tests.
>>
>> In my current company we made a library (
>> https://github.com/findify/flink-adt) which used Magnolia (
>> https://github.com/softwaremill/magnolia) to do all the compile-time
>> TypeInformation generation to make Scala ADT nice & fast in Flink. With a
>> couple of community contributions it was now possible to cross-build it
>> also for scala3.
>>
>> As Flink 1.15 core is scala free, we extracted the DataStream part of
>> Flink Scala API into a separate project, glued it together with flink-adt
>> and ClosureCleaner from Spark 3.2 (supporting Scala 2.13 and jvm17) and
>> cross-compiled it for 2.12/2.13/3.x. You can check out the result on this
>> github project: https://github.com/findify/flink-scala-api
>>
>> So technically speaking, now it's possible to migrate a scala flink job
>> from 2.12 to 3.x with:
>> * replace flink-streaming-scala dependency with flink-scala-api
>> (optional, both libs can co-exist in classpath on 2.12)
>> * replace all imports of org.apache.flink.streaming.api.scala._ with ones
>> from the new library
>> * rebuild the job for 3.x
>>
>> The main drawback is that there is no savepoint compatibility due to
>> CanBuildFrom and different way of handling ADTs. But if you can afford
>> re-bootstrapping the state - migration is quite straightforward.
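A sketch of the dependency swap described in the steps above, for sbt users — coordinates and versions are illustrative, check the project's README for the current ones:

```scala
// build.sbt: swap the official Scala API for the community fork.
// Before (Scala 2.12 only):
// libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.15.0" % Provided
// After (cross-built for 2.12 / 2.13 / 3.x; coordinates are hypothetical):
libraryDependencies += "io.findify" %% "flink-scala-api" % "1.15-1"
```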
>>
>> The README on github https://github.com/findify/flink-scala-api#readme
>> has some more details on how and why this project was done in this way. And
>> the project is a bit experimental, so if you're interested in scala3 on
>> Flink, you're welcome to share your feedback and ideas.
>>
>> with best regards,
>> Roman Grebennikov | g...@dfdx.me
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Konstantin Knauf
I don't think we can maintain two additional channels. Some people already
have concerns about covering one additional channel.

I think a forum provides a better user experience than a mailing list.
Information is structured better, you can edit messages, and signing up and
searching are easier.

To make some progress, maybe we should first decide on chat vs. forum vs.
none, and then go into a deeper discussion on the implementation. Or is there
anything about Slack that would be a complete blocker for the implementation?



Am Mi., 11. Mai 2022 um 07:35 Uhr schrieb Xintong Song <
tonysong...@gmail.com>:

> I agree with Robert on reworking the "Community" and "Getting Help" pages
> to emphasize how we position the mailing lists and Slack, and on revisiting
> in 6-12 months.
>
> Concerning dedicated Apache Flink Slack vs. ASF Slack, I'm with
> Konstantin. I'd expect it to be easier for having more channels and keeping
> them organized, managing permissions for different roles, adding bots, etc.
>
> IMO, having Slack is about improving the communication efficiency when you
> are already in a discussion, and we expect such improvement would motivate
> users to interact more with each other. From that perspective, forums are
> not much better than mailing lists.
>
> I'm also open to forums as well, but not as an alternative to Slack. I
> definitely see how forums help in keeping information organized and easy to
> find. However, I'm a bit concerned about the maintenance overhead. I'm not
> very familiar with Discourse or Reddit. My impression is that they are not
> as easy to set up and maintain as Slack.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://asktug.com/
>
> On Tue, May 10, 2022 at 4:50 PM Konstantin Knauf 
> wrote:
>
>> Thanks for starting this discussion again. I am pretty much with Timo
>> here. Slack or Discourse as an alternative for the user community, and
>> mailing list for the contributing, design discussion, etc. I definitely see
>> the friction of joining a mailing list and understand if users are
>> intimidated.
>>
>> I am leaning towards a forum aka Discourse over a chat aka Slack. This is
>> about asking for help, finding information and thoughtful discussion more
>> so than casual chatting, right? For this a forum, where it is easier to
>> find and comment on older threads and topics just makes more sense to me. A
>> well-done Discourse forum is much more inviting and vibrant than a mailing
>> list. Just from a tool perspective, discourse would have the advantage of
>> being Open Source and so we could probably self-host it on an ASF machine.
>> [1]
>>
>> When it comes to Slack, I definitely see the benefit of a dedicated
>> Apache Flink Slack compared to ASF Slack. For example, we could have more
>> channels (e.g. look how many channels Airflow is using
>> http://apache-airflow.slack-archives.org) and we could generally
>> customize the experience more towards Apache Flink.  If we go for Slack,
>> let's definitely try to archive it like Airflow did. If we do this, we
>> don't necessarily need infinite message retention in Slack itself.
>>
>> Cheers,
>>
>> Konstantin
>>
>> [1] https://github.com/discourse/discourse
>>
>>
>> Am Di., 10. Mai 2022 um 10:20 Uhr schrieb Timo Walther <
>> twal...@apache.org>:
>>
>>> I also think that a real-time channel is long overdue. The Flink
>>> community in China has shown that such a platform can be useful for
>>> improving the collaboration within the community. The DingTalk channel of
>>> 10k+ users collectively helping each other is great to see. It could also
>>> reduce the burden from committers for answering frequently asked questions.
>>>
>>> Personally, I'm a mailing list fan esp. when it comes to design
>>> discussions. In my opinion, the dev@ mailing list should definitely
>>> stay where and how it is. However, I understand that users might not want
>>> to subscribe to a mailing list for a single question and get their mailbox
>>> filled with unrelated discussions afterwards. Esp. in a company setting it
>>> might not be easy to setup a dedicated email address for mailing lists and
>>> setting up rules is also not convenient.
>>>
>>> It would be great if we could use the ASF Slack. We should find an
>>> official, accessible channel. I would be open for the right tool. It might
>>> make sense to also look into Discourse or even Reddit? The latter would
>>> definitely be easier to index by a search engine. Discourse is actually
>>> made for modern real-time forums.
>>>

Re: Migrating Flink apps across cloud with state

2022-05-10 Thread Konstantin Knauf
Hi there,

to me the simplest and most reliable solution still seems to be to split
the stream based on event time. It requires a bit of preparation and assumes
that you can tolerate some downtime during the migration.

1) For Cloud1 you chain a filter to your sources that filters out any
records with a timestamp >  t_migration. Best you make this timestamp
configurable.
2) For Cloud2, you chain a filter to your sources that filters out any
records with timestamp <= t_migration.  Also configurable.
3) When you do the migration you configure t_migration to be, let's say 1
hour in the future. You let the Job run in Cloud1 until you are sure that
no more data with an event timestamp <= t_migration will arrive. You take a
savepoint.
4) You start your application in cloud2 with the same value for t_migration
and manually configured Kafka offsets for which you are sure they contain
all records with a timestamp > t_migration.
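The filters in steps 1) and 2) are just two complementary predicates over the event timestamp. A minimal, framework-free sketch of that logic — the cutover value is made up, and in a Flink job these predicates would sit inside `filter()` functions chained to the sources:

```java
import java.util.List;
import java.util.stream.Collectors;

public class MigrationSplit {
    // Cutover point t_migration (epoch millis); value is hypothetical.
    static final long T_MIGRATION = 1_650_000_000_000L;

    // Cloud1 keeps only records at or before t_migration.
    static boolean keepInCloud1(long eventTimestamp) {
        return eventTimestamp <= T_MIGRATION;
    }

    // Cloud2 keeps only records strictly after t_migration.
    static boolean keepInCloud2(long eventTimestamp) {
        return eventTimestamp > T_MIGRATION;
    }

    public static void main(String[] args) {
        List<Long> timestamps = List.of(
                T_MIGRATION - 1, T_MIGRATION, T_MIGRATION + 1);
        List<Long> cloud1 = timestamps.stream()
                .filter(MigrationSplit::keepInCloud1)
                .collect(Collectors.toList());
        List<Long> cloud2 = timestamps.stream()
                .filter(MigrationSplit::keepInCloud2)
                .collect(Collectors.toList());
        // Every record lands in exactly one cloud, never both, never neither.
        System.out.println(cloud1.size() + " " + cloud2.size()); // prints "2 1"
    }
}
```

The `<=` / `>` pair is what guarantees no record is processed twice or dropped at the boundary.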

Could this work for you?

Cheers,

Konstantin




Am Mi., 4. Mai 2022 um 22:26 Uhr schrieb Andrew Otto :

> Have you tried MirrorMaker 2's consumer offset translation feature?  I
> have not used this myself, but it sounds like what you are looking for!
> https://issues.apache.org/jira/browse/KAFKA-9076
>
> https://kafka.apache.org/26/javadoc/org/apache/kafka/connect/mirror/Checkpoint.html
> https://strimzi.io/blog/2020/03/30/introducing-mirrormaker2/
>
> I tried to find some better docs to link for you, but that's the best I
> got :)  It looks like there is just the Java API.
>
>
>
> On Wed, May 4, 2022 at 3:29 PM Hemanga Borah 
> wrote:
>
>> Thank you for the suggestions, guys!
>>
>> @Austin Cawley-Edwards
>> Your idea is spot on! This approach would surely work. We could take a
>> savepoint of each of our apps, load it using state processor apis and
>> create another savepoint accounting for the delta on the offsets, and start
>> the app on the new cloud using this modified savepoint.
>> However, the solution will not be generic, and we have to do this for
>> each of our applications. This can be quite cumbersome as we have several
>> applications (around 25).
>>
>> We are thinking of overriding the FlinkKafkaConsumerBase to account for
>> the offset deltas during the start-up of any app. Do you think it is safe
>> to do that? Is there a better way of doing this?
>>
>> @Schwalbe Matthias
>> Thank you for your suggestion. We do use exactly-once semantics, but, our
>> apps can tolerate a few duplicates in rare cases like this one where we are
>> migrating clouds. However, your suggestion is really helpful and we will
>> use it in case some of the apps cannot tolerate duplicate data.
>>
>>
>> On Wed, May 4, 2022 at 12:00 AM Schwalbe Matthias <
>> matthias.schwa...@viseca.ch> wrote:
>>
>>> Hello Hemanga,
>>>
>>>
>>>
>>> MirrorMaker can cause havoc in many respects; for one, it does not have
>>> strict exactly-once semantics…
>>>
>>>
>>>
>>> The way I would tackle this problem (and have done in similar
>>> situations):
>>>
>>>
>>>
>>>- For the source topics that need to have exactly-once semantics
>>>and that are not intrinsically idempotent:
>>>- Add one extra operator after the source that deduplicates events
>>>by unique id for a rolling time range (on the source cloud provider)
>>>- Take a savepoint after the rolling time-range has passed (at least
>>>once completely)
>>>- Move your job to the target cloud provider
>>>- Reconfigure the resp. source with a new kafka consumer group.id,
>>>- Change the uid() of the resp. kafka source,
>>>- Configure start-by-timestamp for the resp. source with a timestamp
>>>that lies within the rolling time range (of above)
>>>- Configure the job to ignore recovery for state that does not have
>>>a corresponding operator in the job (the previous kafka source uid()s)
>>>- Start the job on new cloud provider, wait for it to pick
>>>up/back-fill
>>>- Take a savepoint
>>>- Remove deduplication operator if that causes too much
>>>load/latency/whatever
>>>
>>>
>>>
>>> This scheme sounds more complicated than it really is … and has saved my
>>> sanity quite a number of times 
>>>
>>>
>>>
>>> Good luck and ready to answer more details
>>>
>>>
>>>
>>> Thias
>>>
>>>
>>>
>>> *From:* Hemanga Borah 
>>> *Sent:* Tuesday, May 3, 2022 3:12 AM
>>> *To:* user@flink.apache.org
>>> *Subject:* Migrating Flink apps across cloud with state
>>>
>>>
>>>
>>> Hello,
>>>  We are attempting to port our Flink applications from one cloud
>>> provider to another.
>>>
>>>  These Flink applications consume data from Kafka topics and output to
>>> various destinations (Kafka or databases). The applications have states
>>> stored in them. Some of these stored states are aggregations, for example,
>>> at times we store hours (or days) worth of data to aggregate over time.
>>> Some other applications have cached information for data enrichment, for
>>> example, we store data in Flink state for days, so that we can 

Re: Flink job is throwing dependency issue when submitted to cluster

2022-05-10 Thread Konstantin Knauf
Hi there,

are you using any of Flink's S3 filesystems? If so, where do you load them from:

a) lib/
b) plugins/
c) bundled with your Job in a fat JAR

b) would be the right way to do it in Flink 1.13. I don't know if this
fixes the issue, but IIRC, since we introduced the plugin mechanism, we
no longer relocate dependencies in the filesystems.

Cheers,

Konstantin





Am Sa., 7. Mai 2022 um 07:47 Uhr schrieb 张立志 :

> Unsubscribe
>
>
>
> zh_ha...@163.com
> Email: zh_ha...@163.com
>
>
>
>
>  Original message replied to 
> | From | Great Info |
> | Date | 2022-05-07 13:21 |
> | To | d...@flink.apache.org, user<
> user@flink.apache.org> |
> | Cc | |
> | Subject | Flink job is throwing dependency issue when submitted to cluster |
> I have one flink job which reads files from s3 and processes them.
> Currently, it is running on flink 1.9.0, I need to upgrade my cluster to
> 1.13.5, so I have done the changes in my job pom and brought up the flink
> cluster using 1.13.5 dist.
>
> When I submit my application, I am getting the below error when it tries to
> connect to S3. I have updated the S3 SDK version to the latest, but am still
> getting the same error.
>
> caused by: java.lang.invoke.lambdaconversionexception: invalid receiver
> type interface org.apache.http.header; not a subtype of implementation type
> interface org.apache.http.namevaluepair
>
> It works when I just run it as a mini-cluster (running just java -jar
> ) and also when I submit to the Flink cluster with 1.9.0.
>
> I am not able to understand where the dependency mismatch is happening.
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-10 Thread Konstantin Knauf
Thanks for starting this discussion again. I am pretty much with Timo here.
Slack or Discourse as an alternative for the user community, and mailing
list for the contributing, design discussion, etc. I definitely see the
friction of joining a mailing list and understand if users are intimidated.

I am leaning towards a forum aka Discourse over a chat aka Slack. This is
about asking for help, finding information and thoughtful discussion more
so than casual chatting, right? For this a forum, where it is easier to
find and comment on older threads and topics just makes more sense to me. A
well-done Discourse forum is much more inviting and vibrant than a mailing
list. Just from a tool perspective, discourse would have the advantage of
being Open Source and so we could probably self-host it on an ASF machine.
[1]

When it comes to Slack, I definitely see the benefit of a dedicated Apache
Flink Slack compared to ASF Slack. For example, we could have more channels
(e.g. look how many channels Airflow is using
http://apache-airflow.slack-archives.org) and we could generally customize
the experience more towards Apache Flink.  If we go for Slack, let's
definitely try to archive it like Airflow did. If we do this, we don't
necessarily need infinite message retention in Slack itself.

Cheers,

Konstantin

[1] https://github.com/discourse/discourse


Am Di., 10. Mai 2022 um 10:20 Uhr schrieb Timo Walther :

> I also think that a real-time channel is long overdue. The Flink community
> in China has shown that such a platform can be useful for improving the
> collaboration within the community. The DingTalk channel of 10k+ users
> collectively helping each other is great to see. It could also reduce the
> burden from committers for answering frequently asked questions.
>
> Personally, I'm a mailing list fan esp. when it comes to design
> discussions. In my opinion, the dev@ mailing list should definitely stay
> where and how it is. However, I understand that users might not want to
> subscribe to a mailing list for a single question and get their mailbox
> filled with unrelated discussions afterwards. Esp. in a company setting it
> might not be easy to set up a dedicated email address for mailing lists and
> setting up rules is also not convenient.
>
> It would be great if we could use the ASF Slack. We should find an
> official, accessible channel. I would be open for the right tool. It might
> make sense to also look into Discourse or even Reddit? The latter would
> definitely be easier to index by a search engine. Discourse is actually
> made for modern real-time forums.
>
> Regards,
> Timo
>
>
> Am 10.05.22 um 09:59 schrieb David Anderson:
>
> Thank you @Xintong Song  for sharing the
> experience of the Flink China community.
>
> I've become convinced we should give Slack a try, both for discussions
> among the core developers, and as a place where the community can reach out
> for help. I am in favor of using the ASF slack, as we will need a paid
> instance for this to go well, and joining it is easy enough (took me about
> 2 minutes). Thanks, Robert, for suggesting we go down this route.
>
> David
>
> On Tue, May 10, 2022 at 8:21 AM Robert Metzger 
> wrote:
>
>> It seems that we'd have to use invite links on the Flink website for
>> people to join our Slack (1)
>> These links can be configured to have no time-expiration, but they will
>> expire after 100 guests have joined.
>> I guess we'd have to use a URL shortener (https://s.apache.org) that we
>> update once the invite link expires. It's not a nice solution, but it'll
>> work.
>>
>>
>> (1) https://the-asf.slack.com/archives/CBX4TSBQ8/p1652125017094159
>>
>>
>> On Mon, May 9, 2022 at 3:59 PM Robert Metzger 
>> wrote:
>>
>>> Thanks a lot for your answer. The onboarding experience to the ASF Slack
>>> is indeed not ideal:
>>> https://apisix.apache.org/docs/general/join#join-the-slack-channel
>>> I'll see if we can improve it
>>>
>>> On Mon, May 9, 2022 at 3:38 PM Martijn Visser 
>>> wrote:
>>>
 As far as I recall you can't sign up for the ASF instance of Slack, you
 can
 only get there if you're a committer or if you're invited by a
 committer.

 On Mon, 9 May 2022 at 15:15, Robert Metzger 
 wrote:

 > Sorry for joining this discussion late, and thanks for the summary
 Xintong!
 >
 > Why are we considering a separate slack instance instead of using the
 ASF
 > Slack instance?
 > The ASF instance is paid, so all messages are retained forever, and
 quite
 > a few people are already on that Slack instance.
 > There is already a #flink channel on that Slack instance, that we
 could
 > leave as passive as it is right now, or put some more effort into it,
 on a
 > voluntary basis.
 > We could add another #flink-dev channel to that Slack for developer
 > discussions, and a private flink-committer and flink-pmc chat.
 >
 > If we are going that path, we should rework 

Re: Alertmanager Sink Connector

2022-04-20 Thread Konstantin Knauf
cc user@, bcc dev@

Hi Dhruv,

Yes, this should be totally possible.

For 1, I would use a ProcessFunction to buffer the alerts and register
timers per alert for the repeated firing (or to clear it from state if it
is resolved). From this ProcessFunction you send records to an
AlertManagerSink.

For 2, the AsyncSink: Fabian Paul (cc) might be the best person to judge if
it is a good fit. There is a PR [1] to add a blog post about the Async Sink
that might be of help to you in the meantime.

Cheers,

Konstantin

[1] https://github.com/apache/flink-web/pull/517
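A framework-free sketch of the buffering logic that ProcessFunction would keep: re-emit every FIRING alert with a pushed-out endsAt on each timer tick, and drop RESOLVED alerts. Class, method, and interval names are made up for illustration; in Flink, the map would be keyed state and `onTimer` would be driven by registered timers:

```java
import java.util.HashMap;
import java.util.Map;

public class AlertBuffer {
    static final long REPEAT_INTERVAL_MS = 60_000L; // "X minutes"

    // Alerts currently in FIRING state: alert id -> startsAt timestamp.
    private final Map<String, Long> firingAlerts = new HashMap<>();

    // Apply a state update arriving from Kafka.
    void onStateUpdate(String alertId, boolean firing, long startsAt) {
        if (firing) {
            firingAlerts.put(alertId, startsAt);
        } else {
            firingAlerts.remove(alertId); // RESOLVED: stop re-sending
        }
    }

    // Called on each timer tick; returns endsAt per alert to POST to
    // Alertmanager. endsAt is pushed well past the next tick so the
    // alert never expires between sends.
    Map<String, Long> onTimer(long now) {
        Map<String, Long> batch = new HashMap<>();
        for (String id : firingAlerts.keySet()) {
            batch.put(id, now + 2 * REPEAT_INTERVAL_MS);
        }
        return batch;
    }

    public static void main(String[] args) {
        AlertBuffer buffer = new AlertBuffer();
        buffer.onStateUpdate("disk-full", true, 0L);
        buffer.onStateUpdate("high-cpu", true, 0L);
        buffer.onStateUpdate("high-cpu", false, 0L); // resolved: dropped
        System.out.println(buffer.onTimer(1_000L)); // prints {disk-full=121000}
    }
}
```

The sink itself then only has to POST each batch; the repeat-firing concern stays in the stateful operator.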


On Thu, Apr 14, 2022 at 9:43 AM Dhruv Patel 
wrote:

> Hi,
>We have a use case where we want to send alerts to Prometheus
> Alertmanager (https://prometheus.io/docs/alerting/latest/alertmanager/)
> from Flink. Each of these alerts have a startsAt, endsAt field along with
> alertMetadata. Alertmanager expects clients to send all the FIRING alerts
> every X minutes (X is configurable) with an updated endsAt time (greater
> than X since alertmanager would drop the alert once endsAt time is reached)
> and once the alert is in RESOLVED state stop sending it. The state updates
> of the alerts would come from Kafka. So the pipeline is something like this
> State Updates to Alerts (Kafka) -> Flink (Some Enrichment) -> Alertmanager
> Sink
>  They provide a REST endpoint where we can post these alerts. I have some
> questions to see if it's achievable to develop a sink connector for
> Alertmanager in Flink.
>
> 1. Is it possible to send data to a sink every X minutes from a custom sink
> connector, since I have to somehow simulate the behavior of continuously
> sending the same alerts, given that state updates are only received from
> Kafka for FIRING -> RESOLVED transitions and not for FIRING -> FIRING? I
> was thinking of having a state of active alerts and somehow the sink
> connector would get the state every X minutes, batch it and then send it to
> alertmanager. However, I am not able to find resources to write some
> implementation around it.
>
> 2. In Flink 1.15, there are Async Sinks (
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink)
> but not much documentation around them yet. I also don't know if it would be
> achievable to write the continuous firing logic in an Alertmanager sink.
>
> Any other ideas are welcomed.
>
> --
> *Thanks,*
> *Dhruv*
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.14.4 released

2022-03-16 Thread Konstantin Knauf
Hi Xingbo,

you are totally right. Thank you for noticing. This also affected Flink
1.13.6, the other release I was recently managing. I simply skipped a step
in the release guide.

It should be fixed now. Could you double-check?

Cheers,

Konstantin

On Wed, Mar 16, 2022 at 4:07 AM Xingbo Huang  wrote:

> Thanks a lot for being our release manager Konstantin and everyone who
> contributed. I have a question about pyflink. I see that there are no
> corresponding wheel packages uploaded on pypi, only the source package is
> uploaded. Is there something wrong with building the wheel packages?
>
> Best,
> Xingbo
>
> Leonard Xu wrote on Wed, Mar 16, 2022 at 01:02:
>
>> Thanks a lot for being our release manager Konstantin and everyone who
>> involved!
>>
>> Best,
>> Leonard
>>
>> On Mar 15, 2022, at 9:34 PM, Martijn Visser wrote:
>>
>> Thank you Konstantin and everyone who contributed!
>>
>>
>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Apache Flink 1.14.4 released

2022-03-15 Thread Konstantin Knauf
The Apache Flink community is very happy to announce the release of Apache
Flink 1.14.4, which is the third bugfix release for the Apache Flink 1.14
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:

https://flink.apache.org/news/2022/03/11/release-1.14.4.html

The full release notes are available in Jira:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351231

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,

Konstantin

--

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: Add libatomic1 to flink image

2022-03-11 Thread Konstantin Knauf
Moving dev@ to bcc, adding user@

Hi Julius,

the recommended approach would be to build your own Docker images from the
official images along the lines of

FROM apache/flink:1.14.3
RUN apt-get update && apt-get install -y libatomic1

Cheers,

Konstantin


On Fri, Mar 11, 2022 at 11:07 AM Almeida, Julius
 wrote:

> Hi Developers,
>
> I wish to add a new library to existing flink docker image since it’s a
> dependency requirement in my processor.
> Is it possible to add?
>
> apt-get install libatomic1
>
> Thanks,
> Julius
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Apache Flink 1.13.6 released

2022-02-18 Thread Konstantin Knauf
The Apache Flink community is very happy to announce the release of Apache
Flink 1.13.6, which is the fifth bugfix release for the Apache Flink 1.13
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:

https://flink.apache.org/news/2022/02/09/release-1.13.6.html

The full release notes are available in Jira:

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351074

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Regards,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[RESULT] [VOTE] Deprecate Per-Job Mode

2022-02-08 Thread Konstantin Knauf
Hi everyone,

The vote on deprecating per-job mode in Flink 1.15 has been
unanimously approved in [1].

I've created a ticket for deprecation [2] and dropping [3] and linked the
current blockers for dropping it to the latter.

Binding +1
Thomas Weise
Xintong Song
Yang Wang
Jing Zhang
Till Rohrmann

Non-Binding +1
Chenya Zhang
David Moravek
Gabor Somogyi

Cheers,

Konstantin

[1] https://lists.apache.org/thread/v6oz92dfp95qcox45l0f8393089oyjv4
[2] https://issues.apache.org/jira/browse/FLINK-25999
[3] https://issues.apache.org/jira/browse/FLINK-26000

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Future of Per-Job Mode

2022-01-21 Thread Konstantin Knauf
Thanks Thomas & Biao for your feedback.

Any additional opinions on how we should proceed with per job-mode? As you
might have guessed, I am leaning towards proposing to deprecate per-job
mode.

On Thu, Jan 13, 2022 at 5:11 PM Thomas Weise  wrote:

> Regarding session mode:
>
> ## Session Mode
> * main() method executed in client
>
> Session mode also supports execution of the main method on Jobmanager
> with submission through REST API. That's how Flinkk k8s operators like
> [1] work. It's actually an important capability because it allows for
> allocation of the cluster resources prior to taking down the previous
> job during upgrade when the goal is optimization for availability.
>
> Thanks,
> Thomas
>
> [1] https://github.com/lyft/flinkk8soperator
>
> On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf 
> wrote:
> >
> > Hi everyone,
> >
> > I would like to discuss and understand if the benefits of having Per-Job
> > Mode in Apache Flink outweigh its drawbacks.
> >
> >
> > *# Background: Flink's Deployment Modes*
> > Flink currently has three deployment modes. They differ in the following
> > dimensions:
> > * main() method executed on Jobmanager or Client
> > * dependencies shipped by client or bundled with all nodes
> > * number of jobs per cluster & relationship between job and cluster lifecycle
> > * supported resource providers
> >
> > ## Application Mode
> > * main() method executed on Jobmanager
> > * dependencies already need to be available on all nodes
> > * dedicated cluster for all jobs executed from the same main()-method
> > (Note: applications with more than one job, currently still significant
> > limitations like missing high-availability). Technically, a session
> cluster
> > dedicated to all jobs submitted from the same main() method.
> > * supported by standalone, native kubernetes, YARN
> >
> > ## Session Mode
> > * main() method executed in client
> > * dependencies are distributed from and by the client to all nodes
> > * cluster is shared by multiple jobs submitted from different clients,
> > independent lifecycle
> > * supported by standalone, Native Kubernetes, YARN
> >
> > ## Per-Job Mode
> > * main() method executed in client
> > * dependencies are distributed from and by the client to all nodes
> > * dedicated cluster for a single job
> > * supported by YARN only
> >
> >
> > *# Reasons to Keep*
> > * There are use cases where you might need the combination of a single
> > job per cluster, but main() method execution in the client. This
> > combination is only supported by per-job mode.
> > * It currently exists. Existing users will need to migrate to either
> > session or application mode.
> >
> >
> > *# Reasons to Drop*
> > * With Per-Job Mode and Application Mode we have two
> > modes that for most users probably do the same thing. Specifically, for
> > those users that don't care where the main() method is executed and want
> to
> > submit a single job per cluster. Having two ways to do the same thing is
> > confusing.
> > * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
> > work towards support in Kubernetes and Standalone, too, to reduce special
> > casing.
> > * Dropping per-job mode would reduce complexity in the code and allow us
> to
> > dedicate more resources to the other two deployment modes.
> > * I believe with session mode and application mode we have two easily
> > distinguishable and understandable deployment modes that cover Flink's
> use
> > cases:
> >* session mode: olap-style, interactive jobs/queries, short lived
> batch
> > jobs, very small jobs, traditional cluster-centric deployment mode (fits
> > the "Hadoop world")
> >* application mode: long-running streaming jobs, large scale &
> > heterogeneous jobs (resource isolation!), application-centric deployment
> > mode (fits the "Kubernetes world")
> >
> >
> > *# Call to Action*
> > * Do you use per-job mode? If so, why & would you be able to migrate to
> one
> > of the other methods?
> > * Am I missing any pros/cons?
> > * Are you in favor of dropping per-job mode midterm?
> >
> > Cheers and thank you,
> >
> > Konstantin
> >
> > --
> >
> > Konstantin Knauf
> >
> > https://twitter.com/snntrable
> >
> > https://github.com/knaufk
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Moving connectors from Flink to external connector repositories

2022-01-14 Thread Konstantin Knauf
 like bounded and unbounded scan, lookup, batch and
>> > streaming sink capabilities. In the end the quality should depend on the
>> > maintainers of the connector, not on where the code is maintained.
>> > * The Hybrid Source connector is a special connector because of its
>> > purpose.
>> > * The FileSystem, DataGen, Print and BlackHole connectors are important
>> for
>> > first time Flink users/testers. If you want to experiment with Flink,
>> you
>> > will most likely start with a local file before moving to one of the
>> other
>> > sources or sinks. These 4 connectors can help with either
>> reading/writing
>> > local files or generating/displaying/ignoring data.
>> > * Some of the connectors haven't been maintained in a long time (for
>> > example, NiFi and Google Cloud PubSub). An argument could be made that
>> we
>> > check if we actually want to move such a connector or make the decision
>> to
>> > drop the connector entirely.
>> >
>> > I'm looking forward to your thoughts!
>> >
>> > Best regards,
>> >
>> > Martijn Visser | Product Manager
>> >
>> > mart...@ververica.com
>> >
>> > [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
>> >
>> > <https://www.ververica.com/>
>> >
>> >
>> > Follow us @VervericaData
>> >
>> > --
>> >
>> > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> > Conference
>> >
>> > Stream Processing | Event Driven | Real Time
>>
>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Konstantin Knauf
Hi everyone,

I would like to discuss and understand if the benefits of having Per-Job
Mode in Apache Flink outweigh its drawbacks.


*# Background: Flink's Deployment Modes*
Flink currently has three deployment modes. They differ in the following
dimensions:
* main() method executed on Jobmanager or Client
* dependencies shipped by client or bundled with all nodes
* number of jobs per cluster & relationship between job and cluster lifecycle
* supported resource providers

## Application Mode
* main() method executed on Jobmanager
* dependencies already need to be available on all nodes
* dedicated cluster for all jobs executed from the same main()-method
(Note: applications with more than one job currently still have significant
limitations, like missing high-availability). Technically, a session cluster
dedicated to all jobs submitted from the same main() method.
* supported by standalone, native kubernetes, YARN

## Session Mode
* main() method executed in client
* dependencies are distributed from and by the client to all nodes
* cluster is shared by multiple jobs submitted from different clients,
independent lifecycle
* supported by standalone, Native Kubernetes, YARN

## Per-Job Mode
* main() method executed in client
* dependencies are distributed from and by the client to all nodes
* dedicated cluster for a single job
* supported by YARN only


*# Reasons to Keep*
* There are use cases where you might need the combination of a single job
per cluster, but main() method execution in the client. This combination is
only supported by per-job mode.
* It currently exists. Existing users will need to migrate to either
session or application mode.


*# Reasons to Drop*
* With Per-Job Mode and Application Mode we have two modes that for most
users probably do the same thing. Specifically, for those users that don't
care where the main() method is executed and want to submit a single job
per cluster. Having two ways to do the same thing is confusing.
* Per-Job Mode is only supported by YARN anyway. If we keep it, we should
work towards support in Kubernetes and Standalone, too, to reduce special
casing.
* Dropping per-job mode would reduce complexity in the code and allow us to
dedicate more resources to the other two deployment modes.
* I believe with session mode and application mode we have two easily
distinguishable and understandable deployment modes that cover Flink's use
cases:
   * session mode: olap-style, interactive jobs/queries, short lived batch
jobs, very small jobs, traditional cluster-centric deployment mode (fits
the "Hadoop world")
   * application mode: long-running streaming jobs, large scale &
heterogeneous jobs (resource isolation!), application-centric deployment
mode (fits the "Kubernetes world")


*# Call to Action*
* Do you use per-job mode? If so, why & would you be able to migrate to one
of the other methods?
* Am I missing any pros/cons?
* Are you in favor of dropping per-job mode midterm?

Cheers and thank you,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: Flink native k8s integration vs. operator

2022-01-13 Thread Konstantin Knauf
Hi Thomas,

Yes, I was referring to a separate repository under Apache Flink.

Cheers,

Konstantin

On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise  wrote:

> Hi everyone,
>
> Thanks for the feedback and discussion. A few additional thoughts:
>
> [Konstantin] > With respect to common lifecycle management operations:
> these features are
> > not available (within Apache Flink) for any of the other resource
> providers
> > (YARN, Standalone) either. From this perspective, I wouldn't consider
> this
> > a shortcoming of the Kubernetes integration.
>
> I think time and evolution of the ecosystem are factors to consider as
> well. The state and usage of Flink was much different when YARN
> integration was novel. Expectations are different today and the
> lifecycle functionality provided by an operator may as well be
> considered essential to support the concept of a Flink application on
> k8s. After few years learning from operator experience outside of
> Flink it might be a good time to fill the gap.
>
> [Konstantin] > I still believe that we should keep this focus on low
> > level composable building blocks (like Jobs and Snapshots) in Apache
> Flink
> > to make it easy for everyone to build fitting higher level abstractions
> > like a FlinkApplication Custom Resource on top of it.
>
> I completely agree that it is important that the basic functions of
> Flink are solid and continued focus is necessary. Thanks for sharing
> the pointers, these are great improvements. At the same time,
> ecosystem, contributor base and user spectrum are growing. There have
> been significant additions in many areas of Flink including connectors
> and higher level abstractions like statefun, SQL and Python. It's also
> evident from additional repositories/subprojects that we have in Flink
> today.
>
> [Konstantin] > Having said this, if others in the community have the
> capacity to push and
> > *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache
> > Flink, I don't see any blockers. If or when this happens, I'd see some
> > clear benefits of using a separate repository (easier independent
> > versioning and releases, different build system & tooling (go, I
> assume)).
>
> Naturally different contributors to the project have different focus.
> Let's find out if there is strong enough interest to take this on and
> strong enough commitment to maintain. As I see it, there is a
> tremendous amount of internal investment going into operationalizing
> Flink within many companies. Improvements to the operational side of
> Flink like the operator would complement Flink nicely. I assume that
> you are referring to a separate repository within Apache Flink, which
> would give it the chance to achieve better sustainability than the
> existing external operator efforts. There is also the fact that some
> organizations which are heavily invested in operationalizing Flink are
> allowing contributions to Apache Flink itself but less so to arbitrary
> GitHub projects. Regarding the tooling, it could well turn out that
> Java is a good alternative given the ecosystem focus and that there is
> an opportunity for reuse in certain aspects (metrics, logging etc.).
>
> [Yang] > I think Xintong has given a strong point why we introduced
> the native K8s integration, which is active resource management.
> > I have a concrete example for this in the production. When a K8s node is
> down, the standalone K8s deployment will take longer
> > recovery time based on the K8s eviction time(IIRC, default is 5
> minutes). For the native K8s integration, Flink RM could be aware of the
> > TM heartbeat lost and allocate a new one timely.
>
> Thanks for sharing this, we should evaluate it as part of a proposal.
> If we can optimize recovery or scaling with active resource management
> then perhaps it is worth to support it through the operator.
> Previously mentioned operators all rely on the standalone model.
>
> Cheers,
> Thomas
>
> On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf 
> wrote:
> >
> > cc dev@
> >
> > Hi Thomas, Hi everyone,
> >
> > Thank you for starting this discussion and sorry for chiming in late.
> >
> > I agree with Thomas' and David's assessment of Flink's "Native Kubernetes
> > Integration", in particular, it does actually not integrate well with the
> > Kubernetes ecosystem despite being called "native" (tooling, security
> > concerns).
> >
> > With respect to common lifecycle management operations: these features
> are
> > not available (within Apache Flink) for any of the other resource
> providers
> > (YARN, Standalone

Re: Flink native k8s integration vs. operator

2022-01-12 Thread Konstantin Knauf
tegration is really
>>>> suitable here. The direction that we're headed is with the standalone
>>>> deployment on Kubernetes + the reactive mode (adaptive scheduler).
>>>> >
>>>> > In theory, if we want to build a really cloud (Kubernetes) native
>>>> stream processor, deploying the pipeline should be as simple as deploying
>>>> any other application. It should be also simple to integrate with CI & CD
>>>> environment and the fast / frequent deploy philosophy.
>>>> >
>>>> > Let's see where we stand and where we can expand from there:
>>>> >
>>>> > a) Resource efficiency
>>>> >
>>>> > We already have the reactive mode in place. This allows you to add /
>>>> remove task managers by adjusting the TM deployment (`kubectl scale ...`)
>>>> and Flink will automatically react to the available resources. This is
>>>> currently only supported with the Application Mode, that is limited to a
>>>> single job (which should be enough for this kind of workload).
>>>> >
>>>> > The re-scaling logic is left completely up to the user and can be as
>>>> simple as setting up a HPA (Horizontal Pod Autoscaler). I tend to think in
>>>> the direction, that we might want to provide a custom k8s metrics server,
>>>> that allows HPA to query the metrics from JM, to make this more flexible
>>>> and easy to use.
>>>> >
>>>> > As this looks really great in theory, there are still some
>>>> shortcomings that we're actively working on addressing. For this feature to
>>>> be really widely adopted, we need to make the re-scaling experience as fast
>>>> as possible, so we can re-scale often to react to the input rate. This
>>>> could be currently a problem with large RocksDB states as this involves
>>>> full re-balance of the state (splitting / merging RocksDB instances). The
>>>> k8s operator approach has the same / even worse limitation as it involves
>>>> taking a savepoint a re-building the state from it.
>>>> >
>>>> > b) Fast recovery
>>>> >
>>>> > This is IMO not as different from the native mode (although I'd have
>>>> to check whether RM failover can reuse task managers). This involves
>>>> frequent and fast checkpointing, local recovery (which is still not
>>>> supported in reactive mode, but this will be hopefully addressed soon) and
>>>> working directory efforts [4].
>>>> >
>>>> > c) Application upgrades
>>>> >
>>>> > This is the topic I'm still struggling with a little. Historically
>>>> this involves external lifecycle management (savepoint + submitting a new
>>>> job). I think at the end of the day, with application mode on standalone
>>>> k8s, it could be as simple as updating the docker image of the JM
>>>> deployment.
>>>> >
>>>> > If I think about the simplest upgrade scenario, simple in-place
>>>> restore from the latest checkpoint, it may be fairly simple to implement.
>>>> What I'm struggling with are the more complex upgrade scenarios such as
>>>> dual, blue / green deployment.
>>>> >
>>>> >
>>>> > To sum this up, I'd really love if Flink could provide great out-of
>>>> the box experience with standalone mode on k8s, that makes the experience
>>>> as close to running / operating any other application as possible.
>>>> >
>>>> > I'd really appreciate to hear your thoughts on this topic.
>>>> >
>>>> > [1]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler
>>>> > [2] https://github.com/flink-extended/flink-remote-shuffle
>>>> > [3]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/elastic_scaling/
>>>> > [4]
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-198%3A+Working+directory+for+Flink+processes
>>>> >
>>>> > Best,
>>>> > D.
>>>> >
>>>> > On Tue, Jan 4, 2022 at 12:44 AM Thomas Weise  wrote:
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> I was recently looking at the Flink native Kubernetes integration [1]
>>>> >> to get an idea how it relates to existing operator based solutions
>>>> >> [2], [3].
>>>> >>
>>>> >> Part of the native integration's motivations was simplicity (no extra
>>>> >> component to install), but arguably that is also a shortcoming. The
>>>> >> k8s operator model can offer support for application lifecycle like
>>>> >> upgrade and rescaling, as well as job submission without a Flink
>>>> >> client.
>>>> >>
>>>> >> When using the Flink native integration it would still be necessary
>>>> to
>>>> >> provide that controller functionality. Is the idea to use the native
>>>> >> integration for task manager resource allocation in tandem with an
>>>> >> operator that provides the external controller functionality? If
>>>> >> anyone using the Flink native integration can share experience, I
>>>> >> would be curious to learn more about the specific setup and if there
>>>> >> are plans to expand the k8s native integration capabilities.
>>>> >>
>>>> >> For example:
>>>> >>
>>>> >> * Application upgrade with features such as [4]. Since the job
>>>> manager
>>>> >> is part of the deployment it cannot orchestrate the deployment. It
>>>> >> needs to be the responsibility of an external process. Has anyone
>>>> >> contemplated adding such a component to Flink itself?
>>>> >>
>>>> >> * Rescaling: Theoretically a parallelism change could be performed
>>>> w/o
>>>> >> restart of the job manager pod. Hence, building blocks to trigger and
>>>> >> apply rescaling could be part of Flink itself. Has this been explored
>>>> >> further?
>>>> >>
>>>> >> Yang kindly pointed me to [5]. Is the recommendation/conclusion that
>>>> >> when a k8s operator is already used, then let it be in charge of the
>>>> >> task manager resource allocation? If so, what scenario was the native
>>>> >> k8s integration originally intended for?
>>>> >>
>>>> >> Thanks,
>>>> >> Thomas
>>>> >>
>>>> >> [1]
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/deployment/resource-providers/native_kubernetes/#deployment-modes
>>>> >> [2] https://github.com/lyft/flinkk8soperator
>>>> >> [3] https://github.com/spotify/flink-on-k8s-operator
>>>> >> [4]
>>>> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>>>> >> [5] https://lists.apache.org/thread/8cn99f6n8nhr07n5vqfo880tpm624s5d
>>>>
>>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Advise on Apache Log4j Zero Day (CVE-2021-44228)

2021-12-10 Thread Konstantin Knauf
Dear Flink Community,

Yesterday, a new Zero Day for Apache Log4j was reported [1]. It is now
tracked under CVE-2021-44228 [2].

Apache Flink bundles a version of Log4j that is affected by this
vulnerability. We recommend users to follow the advisory [3] of the Apache
Log4j Community. For Apache Flink this currently translates to “setting
system property log4j2.formatMsgNoLookups to true” until Log4j has been
upgraded to 2.15.0 in Apache Flink.
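For a standalone deployment, one way to apply the mitigation is via Flink's
`env.java.opts` configuration key, which passes JVM options to the JobManager
and TaskManagers (shown here as an illustrative configuration fragment; the
property name comes from the Log4j advisory):

```yaml
# conf/flink-conf.yaml -- temporary mitigation until the bundled Log4j is upgraded
env.java.opts: -Dlog4j2.formatMsgNoLookups=true
```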

This effort is tracked in FLINK-25240 [4]. It will be included in Flink
1.15.0, Flink 1.14.1 and Flink 1.13.3. We expect Flink 1.14.1 to be
released in the next 1-2 weeks. The other releases will follow in their
regular cadence.

This advice has also been published on the Apache Flink blog
https://flink.apache.org/2021/12/10/log4j-cve.html.

Best,

Konstantin

[1]
https://www.cyberkendra.com/2021/12/apache-log4j-vulnerability-details-and.html
[2] https://nvd.nist.gov/vuln/detail/CVE-2021-44228
[3] https://logging.apache.org/log4j/2.x/security.html
[4] https://issues.apache.org/jira/browse/FLINK-25240

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Deprecate MapR FS

2021-12-09 Thread Konstantin Knauf
+1 (what Seth said)

On Thu, Dec 9, 2021 at 4:15 PM Seth Wiesman  wrote:

> +1
>
> I actually thought we had already dropped this FS. If anyone is still
> relying on it in production, the file system abstraction in Flink has been
> incredibly stable over the years. They should be able to use the 1.14 MapR
> FS with later versions of Flink.
>
> Seth
>
> On Wed, Dec 8, 2021 at 10:03 AM Martijn Visser 
> wrote:
>
>> Hi all,
>>
>> Flink supports multiple file systems [1], which include MapR FS. MapR as
>> a company has not existed since 2019; the technology and intellectual
>> property have been sold to Hewlett Packard.
>>
>> I don't think that there's anyone who's using MapR anymore and therefore
>> I think it would be good to deprecate this for Flink 1.15 and then remove
>> it in Flink 1.16. Removing this from Flink will slightly shrink the
>> codebase and CI runtime.
>>
>> I'm also cross posting this to the User mailing list, in case there's
>> still anyone who's using MapR.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.13.3 released

2021-10-21 Thread Konstantin Knauf
Thank you, Chesnay & Martijn, for managing this release!

On Thu, Oct 21, 2021 at 10:29 AM Chesnay Schepler 
wrote:

> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.13.3, which is the third bugfix release for the Apache
> Flink 1.13 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data
> streaming applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> https://flink.apache.org/news/2021/10/19/release-1.13.3.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350329
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Chesnay
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Creating an external connector repository

2021-10-20 Thread Konstantin Knauf
> >> released separately. Flink already has separate releases
> > >> (flink-shaded), so that by itself isn't a new thing. Per-connector
> > >> releases would need to allow for more frequent releases (without the
> > >> baggage that a full Flink release comes with).
> > >>
> > >> Separate releases would only make sense if the core Flink surface is
> > >> fairly stable though. As evident from Iceberg (and also Beam), that's
> > >> not the case currently. We should probably focus on addressing the
> > >> stability first, before splitting code. A success criteria could be
> > >> that we are able to build Iceberg and Beam against multiple Flink
> > >> versions w/o the need to change code. The goal would be that no
> > >> connector breaks when we make changes to Flink core. Until that's the
> > >> case, code separation creates a setup where 1+1 or N+1 repositories
> > >> need to move lock step.
> > >>
> > >> Regarding some connectors being more important for Flink than others:
> > >> That's a fact. Flink w/o Kafka connector (and few others) isn't
> > >> viable. Testability of Flink was already brought up, can we really
> > >> certify a Flink core release without Kafka connector? Maybe those
> > >> connectors that are used in Flink e2e tests to validate functionality
> > >> of core Flink should not be broken out?
> > >>
> > >> Finally, I think that the connectors that move into separate repos
> > >> should remain part of the Apache Flink project. Larger organizations
> > >> tend to approve the use of and contribution to open source at the
> > >> project level. Sometimes it is everything ASF. More often it is
> > >> "Apache Foo". It would be fatal to end up with a patchwork of projects
> > >> with potentially different licenses and governance to arrive at a
> > >> working Flink setup. This may mean we prioritize usability over
> > >> developer convenience, if that's in the best interest of Flink as a
> > >> whole.
> > >>
> > >> Thanks,
> > >> Thomas
> > >>
> > >>
> > >>
> > >> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
> > >> wrote:
> > >>> Generally, the issues are reproducibility and control.
> > >>>
> > >>> Stuffs completely broken on the Flink side for a week? Well then so
> are
> > >>> the connector repos.
> > >>> (As-is) You can't go back to a previous version of the snapshot.
> Which
> > >>> also means that checking out older commits can be problematic because
> > >>> you'd still work against the latest snapshots, and they may not be
> > >>> compatible with each other.
> > >>>
> > >>>
> > >>> On 18/10/2021 15:22, Arvid Heise wrote:
> > >>>> I was actually betting on snapshots versions. What are the limits?
> > >>>> Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > >>>> released.
> > >>>
> >
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Konstantin Knauf
Thank you, Arvid & team, for working on this.

I would also favor one connector repository under the ASF. This will
already force us to provide better tools and more stable APIs, which
connectors developed outside of Apache Flink will benefit from, too.

Besides simplifying the formal release process for connectors, I believe,
we can also be more liberal with Committership for connector maintainers.

I expect that this setup can scale better than the current one, but it
doesn't scale super well either. In addition, there is still the ASF
barrier to contributions/releases. So, we might have more connectors in
this repository than we have in Apache Flink right now, but not all
connectors will end up in this repository. For those "external" connectors,
we should still aim to improve visibility, documentation and tooling.

It feels like such a hybrid approach might be the only option given
competing requirements.

Thanks,

Konstantin

On Mon, Oct 18, 2021 at 10:22 PM Thomas Weise  wrote:

> Thanks for initiating this discussion.
>
> There are definitely a few things that are not optimal with our
> current management of connectors. I would not necessarily characterize
> it as a "mess" though. As the points raised so far show, it isn't easy
> to find a solution that balances competing requirements and leads to a
> net improvement.
>
> It would be great if we can find a setup that allows for connectors to
> be released independently of core Flink and that each connector can be
> released separately. Flink already has separate releases
> (flink-shaded), so that by itself isn't a new thing. Per-connector
> releases would need to allow for more frequent releases (without the
> baggage that a full Flink release comes with).
>
> Separate releases would only make sense if the core Flink surface is
> fairly stable though. As evident from Iceberg (and also Beam), that's
> not the case currently. We should probably focus on addressing the
> stability first, before splitting code. A success criteria could be
> that we are able to build Iceberg and Beam against multiple Flink
> versions w/o the need to change code. The goal would be that no
> connector breaks when we make changes to Flink core. Until that's the
> case, code separation creates a setup where 1+1 or N+1 repositories
> need to move lock step.
>
> Regarding some connectors being more important for Flink than others:
> That's a fact. Flink w/o Kafka connector (and few others) isn't
> viable. Testability of Flink was already brought up, can we really
> certify a Flink core release without Kafka connector? Maybe those
> connectors that are used in Flink e2e tests to validate functionality
> of core Flink should not be broken out?
>
> Finally, I think that the connectors that move into separate repos
> should remain part of the Apache Flink project. Larger organizations
> tend to approve the use of and contribution to open source at the
> project level. Sometimes it is everything ASF. More often it is
> "Apache Foo". It would be fatal to end up with a patchwork of projects
> with potentially different licenses and governance to arrive at a
> working Flink setup. This may mean we prioritize usability over
> developer convenience, if that's in the best interest of Flink as a
> whole.
>
> Thanks,
> Thomas
>
>
>
> On Mon, Oct 18, 2021 at 6:59 AM Chesnay Schepler 
> wrote:
> >
> > Generally, the issues are reproducibility and control.
> >
> > Stuffs completely broken on the Flink side for a week? Well then so are
> > the connector repos.
> > (As-is) You can't go back to a previous version of the snapshot. Which
> > also means that checking out older commits can be problematic because
> > you'd still work against the latest snapshots, and they may not be
> > compatible with each other.
> >
> >
> > On 18/10/2021 15:22, Arvid Heise wrote:
> > > I was actually betting on snapshots versions. What are the limits?
> > > Obviously, we can only do a release of a 1.15 connector after 1.15 is
> > > released.
> >
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.14.0 released

2021-09-30 Thread Konstantin Knauf
Big Thanks to David, Joe, Xintong and everyone who contributed to the
release! Well done!

Cheers,

Konstantin

On Thu, Sep 30, 2021 at 3:12 AM Yangze Guo  wrote:

> Thanks, Xintong, Joe, Dawid for the great work, thanks to everyone
> involved!
>
> Best,
> Yangze Guo
>
> On Thu, Sep 30, 2021 at 12:02 AM Rion Williams 
> wrote:
> >
> > Great news all! Looking forward to it!
> >
> > > On Sep 29, 2021, at 10:43 AM, Theo Diefenthal <
> theo.diefent...@scoop-software.de> wrote:
> > >
> > >
> > > Awesome, thanks for the release.
> > >
> > > - Original Message -
> > > From: "Dawid Wysakowicz" 
> > > To: "dev" , "user" ,
> annou...@apache.org
> > > Sent: Wednesday, 29 September 2021, 15:59:47
> > > Subject: [ANNOUNCE] Apache Flink 1.14.0 released
> > >
> > > The Apache Flink community is very happy to announce the release of
> > > Apache Flink 1.14.0.
> > >
> > > Apache Flink® is an open-source stream processing framework for
> > > distributed, high-performing, always-available, and accurate data
> > > streaming applications.
> > >
> > > The release is available for download at:
> > > https://flink.apache.org/downloads.html
> > >
> > > Please check out the release blog post for an overview of the
> > > improvements for this bugfix release:
> > > https://flink.apache.org/news/2021/09/29/release-1.14.0.html
> > >
> > > The full release notes are available in Jira:
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12349614
> > >
> > > We would like to thank all contributors of the Apache Flink community
> > > who made this release possible!
> > >
> > > Regards,
> > > Xintong, Joe, Dawid
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.13.0 released

2021-05-04 Thread Konstantin Knauf
Thank you Dawid and Guowei! Great job everyone :)

On Mon, May 3, 2021 at 7:11 PM Till Rohrmann  wrote:

> This is great news. Thanks a lot for being our release managers Dawid and
> Guowei! And also thanks to everyone who has made this release possible :-)
>
> Cheers,
> Till
>
> On Mon, May 3, 2021 at 5:46 PM vishalovercome  wrote:
>
>> This is a very big release! Many thanks to the flink developers for their
>> contributions to making Flink as good a framework that it is!
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Yip Park Tung Jason, Jinwei (Kevin) Zhang, Karl Anton
Wehner


[ANNOUNCE] Flink Jira Bot fully live (& Useful Filters to Work with the Bot)

2021-04-22 Thread Konstantin Knauf
Hi everyone,

all of the Jira Bot rules are live now. Particularly in the beginning the
Jira Bot will be very active, because the rules apply to a lot of old,
stale tickets. So, if you get a huge amount of emails from the Flink Jira
Bot right now, this will get better. In any case, the Flink Jira Bot (or
the rules that it implements) demand some changes to how we work with Jira.

Here are a few points to make this transition easier for us:

*1) Retrospective*

In 3-4 weeks I would like to collect feedback. What is working well? What
is not working well or getting in your way? Is the bot moving us closer to
the goals mentioned in the initial email? Specifically, the
initial parameterization [1] of the bot was kind of an educated guess. I
will open a [DISCUSS]ion thread to collect feedback and proposals for
changes around that time.

*2) Use Sub-Tasks*

The bot will ask you for an update on assigned tickets after quite a short
time by Flink standards. If you are working on a ticket that takes longer,
consider splitting it up in Sub-Tasks. Activity on Sub-Tasks counts as
activity for the parent ticket, too. So, as long as any subtask is moving
along you won't be nagged by the bot.


*3) Useful Filters*

You've probably received a lot of emails already, in particular if you are
watching many tickets. Here are a few JIRA filters to find the tickets,
that are probably most important to you and have been updated by the bot:

Tickets that *you are assigned to*, which are "stale-assigned"

https://issues.apache.org/jira/issues/?filter=12350499

Tickets that *you reported*, which are stale in anyway:

https://issues.apache.org/jira/issues/?filter=12350500

If you are a maintainer of some components, you might find the following
filters useful (replace with your own components):

*All tickets that are about to be closed*
project = FLINK AND component in ("Build System", "BuildSystem / Shaded",
"Deployment / Kubernetes", "Deployment / Mesos", "Deployment / YARN",
flink-docker, "Release System", "Runtime / Coordination", "Runtime /
Metrics", "Runtime / Queryable State", "Runtime / REST", Travis) AND
resolution = Unresolved AND labels in (stale-minor)

*Bugs that are about to be deprioritized or closed*
project = FLINK AND type = BUG AND component in ("Build System",
"BuildSystem / Shaded", "Deployment / Kubernetes", "Deployment / Mesos",
"Deployment / YARN", flink-docker, "Release System", "Runtime /
Coordination", "Runtime / Metrics", "Runtime / Queryable State", "Runtime /
REST", Travis) AND resolution = Unresolved AND labels in (stale-major,
stale-blocker, stale-critical, stale-minor)


*Tickets that are stale-assigned, but already have a PR available*
project =
FLINK AND component in ("Build System", "BuildSystem / Shaded", "Deployment
/ Kubernetes", "Deployment / Mesos", "Deployment / YARN", flink-docker,
"Release System", "Runtime / Coordination", "Runtime / Metrics", "Runtime /
Queryable State", "Runtime / REST", Travis) AND resolution = Unresolved AND
labels in (stale-assigned) AND labels in (pull-request-available)
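If you prefer scripting over the web UI, the same JQL can be run against Jira's REST search endpoint. A minimal sketch (assumes anonymous read access to issues.apache.org, which is publicly readable; helper names are made up):

```python
import json
import urllib.parse
import urllib.request

JIRA_SEARCH = "https://issues.apache.org/jira/rest/api/2/search"

def build_search_url(jql, max_results=50):
    # Jira's REST API accepts the same JQL strings as the web UI's issue search
    query = urllib.parse.urlencode({"jql": jql, "maxResults": max_results})
    return JIRA_SEARCH + "?" + query

def search_jira(jql):
    # returns the decoded JSON response (issues are under the "issues" key)
    with urllib.request.urlopen(build_search_url(jql)) as resp:
        return json.load(resp)

# e.g. all FLINK tickets the bot labelled stale-assigned:
# issues = search_jira("project = FLINK AND labels = stale-assigned")
```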

Cheers,

Konstantin

[1] https://github.com/apache/flink-jira-bot/blob/master/config.yaml


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: Flink Statefun Python Batch

2021-04-21 Thread Konstantin Knauf
Hi Igal, Hi Timothy,

this sounds very interesting. Both state introspection as well as
OpenTracing support have been requested by multiple users before, so
certainly something we are willing to invest into. Timothy, would you have
time for a 30min call in the next days to understand your use case and
requirements better? In the meantime, let's document these feature requests
in Jira.

* Exposing Batches to SDKs:
https://issues.apache.org/jira/browse/FLINK-22389
* Support for OpenTracing: https://issues.apache.org/jira/browse/FLINK-22390
* Support for State Introspection:
https://issues.apache.org/jira/browse/FLINK-22391

Please feel free to edit, comment on these issues directly, too.

Cheers,

Konstantin



Am Mi., 21. Apr. 2021 um 09:15 Uhr schrieb Igal Shilman :

> Hi Tim,
>
> Yes, I think that this feature can be implemented relatively fast.
> If this blocks you at the moment, I can prepare a branch for you to
> experiment with, in the following days.
>
> Regarding to open tracing integration, I think the community can benefit a
> lot out of this,
> and definitely contributions are welcome!
>
> @Konstantin Knauf  would you like to understand more
> in depth, Tim's use case with opentracing?
>
> Thanks,
> Igal.
>
>
>
> On Tue, Apr 20, 2021 at 8:10 PM Timothy Bess  wrote:
>
>> Hi Igal,
>>
>> Yes! that's exactly what I was thinking. The batching will naturally
>> happen as the model applies backpressure. We're using pandas and it's
>> pretty costly to create a dataframe and everything to process a single
>> event. Internally the SDK has access to the batch and is calling my
>> function, which creates a dataframe for each individual event. This causes
>> a ton of overhead since we basically get destroyed by the constant factors
>> around creating and operating on dataframes.
>>
>> Knowing how the SDK works, it seems like it'd be easy to do something
>> like your example and maybe have a different decorator for "batch
>> functions" where the SDK just passes in everything at once.
>>
>> Also just out of curiosity are there plans to build out more
>> introspection into statefun's flink state? I was thinking it would be super
>> useful to add either Queryable state or have some control topic that
>> statefun listens to that allows me to send events to introspect or modify
>> flink state.
>>
>> For example like:
>>
>> // control topic request
>> {"type": "FunctionIdsReq", "namespace": "foo", "function_type": "bar"}
>> // response
>> {"type": "FunctionIdsResp", "ids": [ "1", "2", "3", ... ] }
>>
>> Or
>>
>> {"type": "SetState", "namespace": "foo", "function_type": "bar", "id": "1",
>> "value": "base64bytes"}
>> {"type": "DeleteState", "namespace": "foo", "function_type": "bar", "id": "1"}
>>
>> Also having opentracing integration where Statefun passes b3 headers with
>> each request so we can trace a message's route through statefun would be
>> _super_ useful. We'd literally be able to see the entire path of an event
>> from ingress to egress and time spent in each function. Not sure if there
>> are any plans around that, but since we're live with a statefun project
>> now, it's possible we could contribute some if you guys are open to it.
>>
>> Thanks,
>>
>> Tim
>>
>> On Tue, Apr 20, 2021 at 9:25 AM Igal Shilman  wrote:
>>
>>> Hi Tim!
>>>
>>> Indeed the StateFun SDK / StateFun runtime has an internal concept of
>>> batching that kicks in in the presence of a slow
>>> /congested remote function. Keep in mind that under normal circumstances
>>> batching does not happen (effectively a batch of size 1 will be sent). [1]
>>> This batch is not currently exposed via the SDKs (both Java and Python)
>>> as it is an implementation detail (see [2]).
>>>
>>> The way I understand your message (please correct me if I'm wrong): is
>>> that evaluation of the ML model is costly, and it would benefit from some
>>> sort of batching (like pandas do i assume ?)
>>> instead of being applied for every event individually.
>>> If this is the case, perhaps exposing this batch can be a useful feature
>>> to add.
>>>
>>> For example:
>>>
>>> @functions.bind_tim(..)
>>> def ml(context, messages: typing.List[Message]):
>>>   ...
>>>
>>>
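To make the benefit concrete: the point of exposing the batch is to pay the per-invocation setup cost (e.g. constructing a pandas DataFrame) once per delivery batch instead of once per event. A rough plain-Python sketch of the two call patterns (illustrative only — these names are made up and are not part of the StateFun SDK):

```python
def score(values):
    # stand-in for one "expensive setup + vectorized model call" invocation,
    # e.g. building a DataFrame from `values` and evaluating the model on it
    return [v * 2 for v in values]

def handle_batch(messages):
    # batch interface: one setup + one model call for the whole delivery batch
    return score([m["value"] for m in messages])

def handle_per_event(messages):
    # per-event interface: same results, but the setup cost is paid per message,
    # so constant factors dominate for small per-call workloads
    out = []
    for m in messages:
        out.extend(score([m["value"]]))
    return out

batch = [{"value": v} for v in (1, 2, 3)]
assert handle_batch(batch) == handle_per_event(batch) == [2, 4, 6]
```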

Re: [BULK]Re: [SURVEY] Remove Mesos support

2021-03-25 Thread Konstantin Knauf
ergaard <
> >> lassenedergaardfl...@gmail.com>
> >> wrote:
> >>
> >> Hi
> >>
> >>
> >>
> >> At Trackunit we have been using Mesos for a long time but have now moved
> to
> >> k8s.
> >>
> >> Med venlig hilsen / Best regards
> >>
> >> Lasse Nedergaard
> >>
> >>
> >>
> >>
> >>
> >> On 23 Oct 2020 at 17:01, Robert Metzger  wrote:
> >>
> >>
> >>
> >> Hey Piyush,
> >>
> >> thanks a lot for raising this concern. I believe we should keep Mesos in
> >> Flink then in the foreseeable future.
> >>
> >> Your offer to help is much appreciated. We'll let you know once there is
> >> something.
> >>
> >>
> >>
> >> On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang  wrote:
> >>
> >> Thanks Kostas. If there's items we can help with, I'm sure we'd be able
> >> to find folks who would be excited to contribute / help in any way.
> >>
> >> -- Piyush
> >>
> >>
> >> On 10/23/20, 10:25 AM, "Kostas Kloudas"  wrote:
> >> Thanks Piyush for the message.
> >> After this, I revoke my +1. I agree with the previous opinions that
> we
> >> cannot drop code that is actively used by users, especially if it
> >> something that deep in the stack as support for cluster management
> >> framework.
> >>
> >> Cheers,
> >> Kostas
> >>
> >> On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang  wrote:
> >> > Hi folks,
> >> >
> >> >
> >> >
> >> > We at Criteo are active users of the Flink on Mesos resource
> >> management component. We are pretty heavy users of Mesos for scheduling
> >> workloads on our edge datacenters and we do want to continue to be able
> to
> >> run some of our Flink topologies (to compute machine learning short term
> >> features) on those DCs. If possible our vote would be not to drop Mesos
> >> support as that will tie us to an old release / have to maintain a fork
> as
> >> we’re not planning to migrate off Mesos anytime soon. Is the burden
> >> something that can be helped with by the community? (Or are you
> referring
> >> to having to ensure PRs handle the Mesos piece as well when they touch
> the
> >> resource managers?)
> >> >
> >> >
> >> >
> >> > Thanks,
> >> >
> >> >
> >> >
> >> > -- Piyush
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > From: Till Rohrmann 
> >> > Date: Friday, October 23, 2020 at 8:19 AM
> >> > To: Xintong Song 
> >> > Cc: dev , user 
> >> > Subject: Re: [SURVEY] Remove Mesos support
> >> >
> >> >
> >> >
> >> > Thanks for starting this survey Robert! I second Konstantin and
> >> Xintong in the sense that our Mesos users' opinions should matter most
> >> here. If our community is no longer using the Mesos integration, then I
> >> would be +1 for removing it in order to decrease the maintenance burden.
> >> >
> >> >
> >> >
> >> > Cheers,
> >> >
> >> > Till
> >> >
> >> >
> >> >
> >> > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song 
> >> > wrote:
> >> >
> >> > +1 for adding a warning in 1.12 about planning to remove Mesos
> >> support.
> >> >
> >> >
> >> >
> >> > With my developer hat on, removing the Mesos support would
> >> definitely reduce the maintaining overhead for the deployment and
> resource
> >> management related components. On the other hand, the Flink on Mesos
> users'
> >> voices definitely matter a lot for this community. Either way, it would
> be
> >> good to draw users attention to this discussion early.
> >> >
> >> >
> >> >
> >> > Thank you~
> >> >
> >> > Xintong Song
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf 
> >> > wrote:
> >> >
> >> > Hi Robert,
> >> >
> >> > +1 to the plan you outlined. If we were to drop support in Flink
> >> 1.13+, we
> >> > would still support it in Flink 1.12- with bug fixes for some time
> >> so that
> >> > users have time to move on.
> >> >
> >> > It would certainly be very interesting to hear from current Flink
> >> on Mesos
> >> > users, on how they see the evolution of this part of the
> ecosystem.
> >> >
> >> > Best,
> >> >
> >> > Konstantin
> >>
> >
>


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: Future of QueryableState

2021-03-09 Thread Konstantin Knauf
Hi Maciek,

Thank you for reaching out. I'll try to answer your questions separately.

- nothing comparable. You already mention the State Processor API. Besides
that, I can only think of a side channel (CoFunction) that is used to
request a certain state that is then sent to a side output and ultimately to
a sink, e.g. Kafka State Request Topic -> Flink -> Kafka State Response
Topic. This puts the complexity into the Flink job, though.
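For illustration, the side-channel pattern can be simulated in plain Python (a sketch of the idea only, not Flink code; in Flink this would typically be a KeyedCoProcessFunction whose query answers go to a side output, with class and method names below invented for the example):

```python
class StateQueryOperator:
    """Simulates a keyed two-input operator: one input updates per-key state,
    the other requests the current state for a key (side-channel pattern)."""

    def __init__(self):
        self.state = {}        # stand-in for Flink keyed state
        self.responses = []    # stand-in for the side output / response topic

    def process_data(self, key, value):
        # normal processing path: keep some aggregate per key
        self.state[key] = self.state.get(key, 0) + value

    def process_query(self, key):
        # state request from the side channel -> emit to the response "topic"
        self.responses.append({"key": key, "state": self.state.get(key)})

op = StateQueryOperator()
op.process_data("a", 3)
op.process_data("a", 4)
op.process_query("a")
print(op.responses)  # [{'key': 'a', 'state': 7}]
```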

- I think it is a combination of both. Queryable State works well within
its limitations. In the case of the RocksDBStatebackend this is mainly the
availability of the job and the fact that you might read "uncommitted"
state updates. In case of the heap-backed statebackends there are also
synchronization issues, e.g. you might read stale values. You also mention
the fact that queryable state has been an afterthought when it comes to
more recent deployment options. I am not aware of any Committer who
currently has the time to work on this to the degree that would be
required. So, we thought, it would be more fair and realistic to mark
Queryable State as "approaching end of life" in the sense that there is no
active development on that component anymore.

Best,

Konstantin

On Tue, Mar 9, 2021 at 7:08 AM Maciek Próchniak  wrote:

> Hello,
>
>
> We are using QueryableState in some of Nussknacker deployments as a nice
> addition, allowing end users to peek inside job state for a given key
> (we mostly use custom operators).
>
>
> Judging by mailing list and feature radar proposition by Stephan:
>
> https://github.com/StephanEwen/flink-web/blob/feature_radar/img/flink_feature_radar.svg
>
>
> this feature is not widely used/supported. I'd like to ask:
>
> - are there any alternative ways of accessing state during job
> execution? State API is very nice, but it operates on checkpoints and
> loading whole state to lookup one key seems a bit heavy?
>
> - are there any inherent problems in QueryableState design (e.g. it's
> not feasible to use it in K8 settings, performance considerations) or
> just lack of interest/support (in that case we may offer some help)?
>
>
> thanks,
>
> maciek
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: FlinkSQL kafka->dedup->kafka

2020-12-15 Thread Konstantin Knauf
HI Laurent,

Did you manage to find the error in your MATCH_RECOGNIZE statement? If I
had to take a guess, I'd say it's because you are accessing A, but due to
the quantifier of * there might actually be no event A.
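Independent of the pattern syntax, the logic the query tries to express — emit a record only when its value differs from the immediately preceding record for the same key, so that earlier values may reappear later — can be sketched in plain Python (illustrative only, not Flink code):

```python
def dedup_consecutive(records):
    """Drop records whose value equals the previous value for the same key,
    while still letting an earlier value reappear later (addr1, addr2, addr1)."""
    last_seen = {}  # key -> last emitted value
    out = []
    for key, value in records:
        if key not in last_seen or last_seen[key] != value:
            out.append((key, value))
            last_seen[key] = value
    return out

events = [(1, "addr1"), (1, "addr1"), (1, "addr2"),
          (1, "addr2"), (1, "addr1"), (1, "addr1")]
print(dedup_consecutive(events))  # [(1, 'addr1'), (1, 'addr2'), (1, 'addr1')]
```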

Cheers,

Konstantin



On Fri, Nov 27, 2020 at 10:03 PM Laurent Exsteens <
laurent.exste...@euranova.eu> wrote:

> Hi Leonard,
>
>
>> From my understanding, your case is not a pure deduplication case but you
>> want to keep both the previous record and the current record, thus the
>> deduplication query can not satisfy your requirement.
>>
>
> Indeed, that's what I came to realise during our discussion on this email
> chain. I'm sorry if it caused confusion. I'm still not sure how to express
> this requirement in a concise way: "the need to deduplicate but let
> previous values come back after a different value has appeared"
>
>
>> Keeping last row in Deduplication always produces a changelog stream,
>> because we need to retract the previous last value and send the new last
>> value. You could use a connector that supports an upsert sink, like HBase, JDBC
>> or the upsert-kafka connector, when sinking a changelog stream; the kafka connector
>> can only accept an append-only stream and thus you got the message.
>>
>
> That's what I understood indeed. But in my case I really do want to insert
> and not upsert.
> Just for information: the goal is to be able to historize kafka messages
> in real-time. Each message could potentially be splitted to store
> information in multiple tables (in my example: name and address would be
> inserted in 2 different tables), and the history should be kept and
> enriched with the ingestion date. The fact that the kafka message can be
> split to be stored in multiple tables creates that "deduplication"
> requirement (in my example the address could have changed but not the name,
> and we don't want to add a record with no business value in the table
> containing the names). And of course, a field can be changed twice and as a
> result have the same value again, and that's business information we do
> want to keep.
>
>
>> The LAG function is used in over window aggregation and should work in
>> your case, but unfortunately it looks like the LAG function is not implemented
>> correctly. I created an issue [1] to fix this.
>>
>
> Thanks a lot! I'll follow the issue.
> I would love to try to fix it... but quickly looking at that code, I'm not
> sure it's the best way to start contributing. I don't understand what
> should be changed in that code, let alone find what generated that code and
> how it should be fixed...
>
>
> In the meantime, I guess the only other option would be the
> MATCH_RECOGNIZE?
> Do you think you help me find what I did wrong in this query:
>
> SELECT *
> FROM customers
> MATCH_RECOGNIZE (
> PARTITION BY client_number
> ORDER BY proctime()
> MEASURES
> B.client_number as client_number,
> B.address as address
> PATTERN (A* B)
> DEFINE
> B AS LAST(A.address, 1) is NULL OR B.address <> LAST(A.address, 1)
> ) as T;
>
> I get the following error:
> SQL validation failed. Index 0 out of bounds for length 0
>
> Thanks a lot for your help!
>
> Laurent.
>
>
>>
>>
>> Best,
>> Leonard
>> [1] https://issues.apache.org/jira/browse/FLINK-20405
>>
>> On Fri, 27 Nov 2020 at 03:28, Leonard Xu  wrote:
>>
>>> Hi, Laurent
>>>
>>> Basically, I need to deduplicate, *but only keeping in the
>>> deduplication state the latest value of the changed column* to compare
>>> with. While here it seems to keep all previous values…
>>>
>>>
>>> You can use ` ORDER BY proctime() DESC`  in the deduplication query,
>>>  it will keep last row, I think that’s what you want.
>>>
>>> BTW, the deduplication has supported event time in 1.12, this will be
>>> available soon.
>>>
>>> Best,
>>> Leonard
>>>
>>>
>>
>> --
>> *Laurent Exsteens*
>> Data Engineer
>> (M) +32 (0) 486 20 48 36
>>
>> *EURA NOVA*
>> Rue Emile Francqui, 4
>> 1435 Mont-Saint-Guibert
>> (T) +32 10 75 02 00
>>
>>
>> *euranova.eu <http://euranova.eu/>*
>> *research.euranova.eu* <http://research.euranova.eu/>
>>
>> ♻ Be green, keep it on the screen
>>
>>
>>
>
> --
> *Laurent Exsteens*
> Data Engineer
> (M) +32 (0) 486 20 48 36
>
> *EURA NOVA*
>
> Rue Emile Francqui, 4
>
> 1435 Mont-Saint-Guibert
>
> (T) +32 10 75 02 00
>
> *euranova.eu <http://euranova.eu/>*
>
> *research.euranova.eu* <http://research.euranova.eu/>
>
> ♻ Be green, keep it on the screen



-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: FlinkSQL kafka->dedup->kafka

2020-11-27 Thread Konstantin Knauf
Hi Laurent,

With respect to Ververica Platform, we will support Flink 1.12 and add
"upsert-kafka" as a packaged connector in our next minor release which we
target for February.

Cheers,

Konstantin
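As background for readers of the archive: the "changelog-json" workaround mentioned in the quoted reply below encodes an updating stream as append-only records carrying an operation flag (Flink uses the RowKind notation +I, -U, +U, -D). A minimal plain-Python sketch of that encoding, assuming a keyed upsert stream as input:

```python
def to_changelog(upserts):
    """Encode a keyed upsert stream as append-only records with an op flag."""
    current = {}
    out = []
    for key, value in upserts:
        if key in current:
            out.append({"op": "-U", "key": key, "value": current[key]})  # retract old
            out.append({"op": "+U", "key": key, "value": value})         # emit new
        else:
            out.append({"op": "+I", "key": key, "value": value})         # first insert
        current[key] = value
    return out

print(to_changelog([(1, "addr1"), (1, "addr2")]))
# [{'op': '+I', 'key': 1, 'value': 'addr1'},
#  {'op': '-U', 'key': 1, 'value': 'addr1'},
#  {'op': '+U', 'key': 1, 'value': 'addr2'}]
```

Because every update becomes plain append records, a sink that only accepts append-only streams (like the pre-1.12 Kafka connector) can carry them, and a downstream consumer can replay the flags to rebuild the updating view.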

On Thu, Nov 12, 2020 at 3:43 AM Jark Wu  wrote:

> Hi Laurent,
>
> 1. Deduplication keeping the first row will generate an append-only
> stream. But I guess you are expecting to keep the last row, which generates
> an updating stream. An alternative is to
> use the "changelog-json" format in this repo [1]; it will convert the
> updating stream into append
> records with the change flag encoded.
> 2. Yes. It will replace records with the same key, i.e. upsert statement.
> 3. I think it will be available in one or two months. There will be a
> first release candidate soon.
> You can watch on the dev ML. I'm not sure the plan of Ververica
> platform, cc @Konstantin Knauf 
>
> Best,
> Jark
>
> [1]:
> https://github.com/ververica/flink-cdc-connectors/wiki/Changelog-JSON-Format
>
> On Wed, 11 Nov 2020 at 21:31, Laurent Exsteens <
> laurent.exste...@euranova.eu> wrote:
>
>> Hi Jark,
>>
>> thanks for your quick reply. I was indeed expecting it.
>>
>> But that triggers the following questions:
>>
>>1. Is there another way to do this deduplication and generate an
>>append-only stream? Match Recognize? UDF? ...?
>>2. If I would put Postgres as a sink, what would happen? Will the
>>events happen or will they replace the record with the same key?
>>3. When will release-1.12 be available? And when would it be
>>integrated in the Ververica platform?
>>
>> Thanks a lot for your help!
>>
>> Best Regards,
>>
>> Laurent.
>>
>>
>>
>> On Wed, 11 Nov 2020 at 03:31, Jark Wu  wrote:
>>
>>> Hi Laurent,
>>>
>>> This is because the deduplicate node generates an updating stream,
>>> however Kafka currently only supports append-only stream.
>>> This can be addressed in release-1.12, because we introduce a new
>>> connector "upsert-kafka" which supports writing updating
>>>  streams into Kafka compacted topics.
>>>
>>> Does the "Kafka ingestion date" refer to "kafka message timestamp", i.e.
>>> ConsumerRecord#timestamp()?
>>> If yes, this is also supported in release-1.12 via metadata syntax in
>>> DDL [1]:
>>>
>>> CREATE TABLE kafka_table (
>>>   id BIGINT,
>>>   name STRING,
>>>   `timestamp` BIGINT METADATA  -- read timestamp
>>> ) WITH (
>>>   'connector' = 'kafka',
>>>   'topic' = 'test-topic',
>>>   'format' = 'avro'
>>> )
>>>
>>> Best,
>>> Jark
>>>
>>> [1]:
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors
>>>
>>> On Tue, 10 Nov 2020 at 23:12, Laurent Exsteens <
>>> laurent.exste...@euranova.eu> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm getting an error  in Flink SQL when reading from kafka,
>>>> deduplicating records and sending them back to Kafka.
>>>>
>>>> The behavior I want is the following:
>>>>
>>>> *input:*
>>>> | client_number | address |
>>>> | --- | --- |
>>>> | 1  | addr1 |
>>>> | 1  | addr1 |
>>>> | 1  | addr2 |
>>>> | 1  | addr2 |
>>>> | 1  | addr1 |
>>>> | 1  | addr1 |
>>>>
>>>> *output:*
>>>> | client_number | address |
>>>> | --- | --- |
>>>> | 1  | addr1 |
>>>> | 1  | addr2 |
>>>> | 1  | addr1 |
>>>>
>>>> The error seems to say that the type of stream created by the
>>>> deduplication query is of "update & delete" type, while kafka only supports
>>>> append-only:
>>>>
>>>> Unsupported query
>>>> Table sink 'vvp.default.sat_customers_address' doesn't support
>>>> consuming update and delete changes which is produced by node
>>>> Rank(strategy=[UndefinedStrategy], rankType=[ROW_NUMBER],
>>>> rankRange=[rankStart=1, rankEnd=1], partitionBy=[client_number, address,
>>>> $2], orderBy=[$3 ASC], select=[client_numb

[ANNOUNCE] Weekly Community Update 2020/44-45

2020-11-11 Thread Konstantin Knauf
/apache-hudi-meets-apache-flink/
[14]
https://www.ververica.com/blog/how-mitigating-event-time-skewness-can-reduce-checkpoint-failures-and-task-manager-crashes

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/39-43

2020-10-27 Thread Konstantin Knauf
n3.nabble.com/DISCUSS-FLIP-146-Improve-new-TableSource-and-TableSink-interfaces-tp45161p45172.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-149-Introduce-the-KTable-Connector-tp45813.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Remove-Mesos-support-tp45974.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-MongoDB-connector-for-Flink-tp45380.html
[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Nebula-Connector-tp45885.html
[13]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-flink-connector-filesystem-module-tp45611.html
[14]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-148-Introduce-Sort-Merge-Based-Blocking-Shuffle-to-Flink-tp45717p45727.html
[15]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-docker-Adopt-Jemalloc-as-default-memory-allocator-for-debian-based-Flink-docker-image-tp45643.html
[16]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Enforce-common-opinionated-coding-style-using-Spotless-tp45450p45451.html


Notable Bugs
==

I've limited my research to the last seven days.

* [FLINK-19790] [1.11.2] Writing MAP column with flink-json produces
incorrect results. Fixed for 1.11.3. [17]
* [FLINK-19711] [1.11.2] [1.10.2] The "csv" format does not ignore all parsing
errors even when configured to do so. Fixed for 1.11.3 and 1.10.3. [18]
* [FLINK-19557] [1.11.2] [1.10.2] When a following job manager loses its
connection to Zookeeper and subsequently reconnects, it might miss the fact
that the leader has not changed in the meantime and get stuck in "leader
election". Please check the ticket for a more precise description. Fixed
for 1.10.3 and 1.11.3. [19]

[17] https://issues.apache.org/jira/browse/FLINK-19790
[18] https://issues.apache.org/jira/browse/FLINK-19711
[19] https://issues.apache.org/jira/browse/FLINK-19557

Events, Blog Posts, Misc
===

* Zhu Zhu is now a member of the Apache Flink PMC. [20]

* There are two newish blog posts on the Flink blog:

* Arvid and Stephan published the first part of a blog post series on the
motivation and development of unaligned checkpoints. [21]
   * Gordon published a blog post that looks under the hood of Apache Flink
stateful functions, e.g. explaining the communication protocol between
remote functions and the stateful functions runtime. [22]

* Flink Forward Global took place last week. The recordings of most of the
talks will be available soon.

[20]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-PMC-member-Zhu-Zhu-tp45418p45474.html
[21]
https://flink.apache.org/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1.html
[22]
https://flink.apache.org/news/2020/10/13/stateful-serverless-internals.html

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [SURVEY] Remove Mesos support

2020-10-23 Thread Konstantin Knauf
Hi Robert,

+1 to the plan you outlined. If we were to drop support in Flink 1.13+, we
would still support it in Flink 1.12- with bug fixes for some time so that
users have time to move on.

It would certainly be very interesting to hear from current Flink on Mesos
users, on how they see the evolution of this part of the ecosystem.

Best,

Konstantin


Re: [DISCUSS] Remove flink-connector-filesystem module.

2020-10-13 Thread Konstantin Knauf
Given that it has been deprecated for three releases now, I am +1 to
dropping it.

On Mon, Oct 12, 2020 at 9:38 PM Chesnay Schepler  wrote:

> Is there a way for us to change the module (in a reasonable way) that
> would allow users to continue using it?
> Is it an API problem, or one of semantics?
>
> On 10/12/2020 4:57 PM, Kostas Kloudas wrote:
> > Hi Chesnay,
> >
> > Unfortunately not from what I can see in the code.
> > This is the reason why I am opening a discussion. I think that if we
> > supported backwards compatibility, this would have been an easier
> > process.
> >
> > Kostas
> >
> > On Mon, Oct 12, 2020 at 4:32 PM Chesnay Schepler 
> wrote:
> >> Are older versions of the module compatible with 1.12+?
> >>
> >> On 10/12/2020 4:30 PM, Kostas Kloudas wrote:
> >>> Hi all,
> >>>
> >>> As the title suggests, this thread is to discuss the removal of the
> >>> flink-connector-filesystem module which contains (only) the deprecated
> >>> BucketingSink. The BucketingSink has been deprecated since Flink 1.9 [1] in
> >>> favor of the relatively recently introduced StreamingFileSink.
> >>>
> >>> For the sake of a clean and more manageable codebase, I propose to
> >>> remove this module for release-1.12, but of course we should see first
> >>> if there are any usecases that depend on it.
> >>>
> >>> Let's have a fruitful discussion.
> >>>
> >>> Cheers,
> >>> Kostas
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-13396
> >>>
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/38

2020-09-22 Thread Konstantin Knauf
Dear community,

happy to share a brief community update for the past week. A lot of FLIP
votes are currently ongoing on the dev@ mailing list; I've covered these
FLIPs previously, so I'm skipping them this time. Besides that, a couple of
release-related updates and again multiple new Committers.

Flink Development
==

* [releases] Apache Flink 1.11.2 was released. [1]

* [releases] The first release candidate for Stateful Functions 2.2.0 was
published and already cancelled :) A new release candidate will probably be
published today. [2]

* [releases] Robert has shared another update on blockers and build
instabilities for the upcoming release of Apache Flink 1.12. There are five
weeks left till feature freeze. [3]

* [releases] Chesnay started a discussion thread on releasing flink-shaded
12.0, containing upgrades to some of Apache Flink's core dependencies. [4]

* [cep, sql] Kosma has started a discussion on supporting timeouts in
MATCH_RECOGNIZE, which would allow a pattern to fire/match in the absence
of an event. [5]

[1] https://flink.apache.org/news/2020/09/17/release-1.11.2.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Apache-Flink-Stateful-Functions-2-2-0-release-candidate-1-tp45032.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-1-12-Stale-blockers-and-build-instabilities-tp43477.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-flink-shaded-12-0-td44968.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Timed-out-patterns-handling-using-MATCH-RECOGNIZE-tp45001.html

Notable Bugs
==

Nothing notable came to my attention.

Events, Blog Posts, Misc
===

* Godfrey He, Igal Shilman and Yun Tang are now Apache Flink Committers.
Congratulations! [6,7,8]

* The first keynote of Flink Forward Global has been announced: "Real-Time
Metrics at Fortnite Scale" by Ricky Saltzer of Epic Games. [9]

[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Godfrey-He-tp44830.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Igal-Shilman-tp44754p44865.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Yun-Tang-tp44777p44909.html
[9] https://twitter.com/FlinkForward/status/1306219099475902464

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/37

2020-09-15 Thread Konstantin Knauf
Dear community,

happy to share a belated update for the past week. This time with the
release of Flink 1.11.2, a couple of discussions and FLIPs on improving
Flink's APIs and dropping some baggage, most notably Scala 2.11, a new
unified sink API and a bit more.

Flink Development
==

* [releases] The vote for Flink 1.11.2 has passed today. Release
announcement will follow shortly. [1]

* [releases] Feature freeze for Stateful Functions 2.2.0 happened as planned
five days ago. A release candidate is expected soon. [2]

* [apis] Seth has started a discussion to drop Scala 2.11. Many in favor so
far. [3]

* [apis] Seth proposes to separate the concepts of "Statebackend" and
"Checkpointing" in Flink's APIs. So far, the statebackend (RocksDB vs
Heap-Based) and checkpointing storage (usually a distributed file system)
are handled by the same classes and configured together. He believes this
is the reason for some of the confusion of users around these concepts.
[4,5]

* [apis] Aljoscha started a discussion to deprecate and later
remove UnionList OperatorState. Some current users of UnionList
OperatorState have voiced concerns. No conclusion so far. [6]

* [connectors] Guowei has published FLIP-143 "Unified Sink API". This is
another follow-up FLIP on the way for the DataStream API to supersede the
DataSet API (FLIP-131). The basic idea is to provide an interface that allows
the development of sinks that provide exactly-once guarantees for both bounded
and unbounded workloads, but don't require the developer of the sink to
make this distinction. [7]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/RESULT-VOTE-Release-1-11-2-release-candidate-1-tp44731.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Feature-freeze-for-Stateful-Functions-2-2-0-tp44606.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Scala-2-11-tp44607.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-142-Disentangle-StateBackends-from-Checkpointing-tp44496.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-142-Disentangle-StateBackends-from-Checkpointing-tp44679.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Deprecate-and-remove-UnionList-OperatorState-tp44548p44650.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-143-Unified-Sink-API-tp44602.html

flink-packages.org
==

* Jark has published an implementation of the Nexmark benchmark for Apache
Flink. [8]

[8] https://flink-packages.org/packages/nexmark-benchmark


Notable Bugs
==

Nothing came to my attention.

Events, Blog Posts, Misc
===

* Arvid Heise & Niels Basjes are Apache Flink Committers now!
Congratulations to both. [9,10]

* The third video of my colleague Alexander's "Introduction to Flink"
series has been published. This time about building event-driven
applications with Apache Flink. [11]

* On the Ververica Blog, Jark and Qingshen give an outlook on their
upcoming talk at Flink Forward Global [12] on Change Data Capture with
Flink SQL. [13]

[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Arvid-Heise-tp44713.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Niels-Basjes-tp44653.html
[11] https://www.youtube.com/watch?v=t663p-qHijE
[12] https://www.flink-forward.org/global-2020
[13]
https://www.ververica.com/blog/a-deep-dive-on-change-data-capture-with-flink-sql-during-flink-forward

Cheers,

Konstantin


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/36

2020-09-07 Thread Konstantin Knauf
Dear community,

happy to share another community update for the past week. This time with
the upcoming release of Flink 1.11.2, a proposal for more efficient
aggregation for batch processing with the DataStream API, and the
comeback of two FLIPs that have been abandoned for a bit.

Flink Development
==

* [releases] Zhu Zhu proposes to release Flink 1.11.2 soon and has started
collecting blockers. Not too many open tickets, so I am anticipating a
first release candidate this week. [1]

* [releases] Robert has shared another update on blockers and instabilities
for the upcoming release. [2]

* [apis] FLIP-134 states the goal of replacing the DataSet API with the
DataStream API for bounded data processing. The implementation is handled in
follow-up FLIPs. One of these has now been published by Dawid as FLIP-140. It
proposes to replace hash-based aggregations with sort-based aggregations in
keyed operators if the input is bounded, in order to improve the performance
of the DataStream API for bounded data processing. [3]

* [connectors] Becket has revived the discussion on FLIP-33, which aims to
standardize common metrics across all connectors of Apache Flink to
facilitate the integration with 3rd party systems. [4]

* [connectors] Timo has updated FLIP-107 which builds the foundation to
read/write table columns from/to different parts of source records. For
example, this will allow you to read/write the key and timestamp
information of a Kafka record. [5]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-11-2-tp44323.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-1-12-Stale-blockers-and-build-instabilities-tp43477.html

[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Re-DISCUSS-FLIP-140-Introduce-bounded-style-execution-for-keyed-streams-tp44395.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-33-Standardize-connector-metrics-tp26869.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-107-Reading-table-columns-from-different-parts-of-source-records-tp38277.html

Notable Bugs
==

* [FLINK-19109] [1.11.1] [1.10.2] Watermarks generated (in the same
operator chain) after the ContinuousFileReaderOperator (env.readXXX) are
swallowed. Fixed for 1.11.2. [6]

* [FLINK-19133] [1.11.1] A bug in the KafkaSerializationSchemaWrapper can
lead to situations where only Kafka partition 0 ever receives data. Fixed
for 1.11.2. [7]

[6] https://issues.apache.org/jira/browse/FLINK-19109
[7] https://issues.apache.org/jira/browse/FLINK-19133

Events, Blog Posts, Misc
===

* In this blog post, Andrey describes the recent changes around memory
management and configuration of the JobManager process in Flink 1.11. [8]
Similar changes had already been released for the TaskManagers in Flink
1.10. [9]

* Marta has published a Flink Community Update blog post for August 2020.
[10]

* The second video of my colleague Alexander's "Introduction to Flink"
series has been published on YouTube. If you are looking for a concise
refresher on the basics of Apache Flink, this is a good place to start. [11]

[8]
https://flink.apache.org/2020/09/01/flink-1.11-memory-management-improvements.html
[9]
https://flink.apache.org/news/2020/04/21/memory-management-improvements-flink-1.10.html
[10] https://flink.apache.org/news/2020/09/04/community-update.html
[11]
https://www.youtube.com/watch?v=_G-hQfT02BA

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/35

2020-08-31 Thread Konstantin Knauf
Dear community,

happy to share a brief community update for the past week with configurable
memory sharing between Flink and its Python "side car", stateful Python
UDFs, an introduction of our GSoD participants and a little bit more.

Flink Development
==

* [datastream api] Dawid has started a vote to remove DataStream#fold and
DataStream#split (both already deprecated) in Flink 1.12. [1]

* [runtime] Xintong has started a discussion thread for "Intra-Slot
Management Memory Sharing", which lets users configure the fraction of
managed memory that should be used by Flink internally (RocksDB or Batch
algorithms) on the one side and the Python process on the other side. [2]

* [python] Wei Zhong has started a discussion for FLIP-139, which aims to add
support for *stateful* Python UDFs for the Table API/SQL. So far, only
stateless functions are supported. [3]

* [documentation] Kartik Khare and Mohammad Haseeb Asif will work with the
Apache Flink Community to improve the documentation of Flink SQL as part of
their participation in Google Season of Docs 2020. [4]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Remove-deprecated-DataStream-fold-and-DataStream-split-in-1-12-tp44229.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-141-Intra-Slot-Managed-Memory-Sharing-tp44146.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-139-General-Python-User-Defined-Aggregate-Function-on-Table-API-tp44139.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Introducing-the-GSoD-2020-Participants-tp44144.html

flink-packages.org
==

* fabricalab has published a DynamoDB streaming source on flink-packages.org.
[5]

[5] https://flink-packages.org/packages/streaming-flink-dynamodb-connector

Notable Bugs
==

* [FLINK-18934] [1.11.1] Flink's mechanism to deal with idle
sources/partitions introduced in Flink 1.11 [6] does not currently work
with co-functions, union or joins. [7]

[6]
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/event_timestamps_watermarks.html#dealing-with-idle-sources
[7] https://issues.apache.org/jira/browse/FLINK-18934

Events, Blog Posts, Misc
===

* Dian Fu is now part of the Apache Flink PMC. Congratulations! [8]

[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-PMC-member-Dian-Fu-tp44170p44240.html

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/31-34

2020-08-25 Thread Konstantin Knauf
e/pages/viewpage.action?pageId=158871522
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-134-DataStream-Semantics-for-Bounded-Input-tp43839p43965.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-136-Improve-interoperability-between-DataStream-and-Table-API-tp43993.html
[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Removing-deprecated-methods-from-DataStream-API-tp43938.html
[13]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-135-Approximate-Task-Local-Recovery-tp43930.html
[14]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-137-Support-Pandas-UDAF-in-PyFlink-tp44060.html
[15]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-Kafka-0-10-x-connector-and-possibly-0-11-x-tp44087.html
[16]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Upgrade-HBase-connector-to-2-2-x-tp42657.html

flink-packages.org
==

Jark has recently published a set of Flink connectors (DataStream & Table
API/SQL) that allow ingesting the changelog of MySQL and Postgres without
additional tools like Kafka or Debezium. [17]

[17] https://flink-packages.org/packages/cdc-connectors

Notable Bugs
==

To be honest, I did not search through every bug ticket created over the
last four weeks, only the last seven days, and I did not find anything
particularly notable. So, I'll leave you without any bug reports this time.

Events, Blog Posts, Misc
===

* David Anderson is now an Apache Flink committer. Congrats! [18]

* There have been a couple of blog posts on the Flink blog recently that
highlight some of the features added in the latest release:
    * PyFlink: The Integration of Pandas into PyFlink [19]
    * Accelerating your workload with GPU and other external resources [20]
    * Monitoring and Controlling Networks of IoT Devices with Flink
Stateful Functions [21]
    * The State of Flink on Docker [22]

* The schedule for Flink Forward Global is live [23]. The event is free and
you can already register under [24].

[18]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Flink-Committer-David-Anderson-tp43814p43847.html
[19]
https://flink.apache.org/2020/08/04/pyflink-pandas-udf-support-flink.html
[20] https://flink.apache.org/news/2020/08/06/external-resource.html
[21] https://flink.apache.org/2020/08/19/statefun.html
[22] https://flink.apache.org/news/2020/08/20/flink-docker.html
[23] https://www.flink-forward.org/global-2020/conference-program
[24]
https://www.eventbrite.com/e/flink-forward-global-virtual-2020-tickets-113775477516#tickets

Cheers,

Konstantin


-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Removing deprecated methods from DataStream API

2020-08-25 Thread Konstantin Knauf
>>>>
>>>> >
>>>>
>>>> > I also wanted to make it explicit that most of the changes above
>>>> would result in a binary incompatibility and require additional exclusions
>>>> in the japicmp. Those are:
>>>>
>>>> >
>>>>
>>>> > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
>>>>
>>>> > ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)
>>>>
>>>> > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>>>>
>>>> > ExecutionConfig#(get)/setNumberOfExecutionRetries() (deprecated in
>>>> 1.1?)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy,  (deprecated in 1.11)
>>>>
>>>> >
>>>> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>>>> (deprecated in 1.2)
>>>>
>>>> >
>>>>
>>>> > Looking forward to more opinions on the issue.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>>
>>>> >
>>>>
>>>> >
>>>>
>>>> > On 17/08/2020 12:49, Kostas Kloudas wrote:
>>>>
>>>> >
>>>>
>>>> > Thanks a lot for starting this Dawid,
>>>>
>>>> >
>>>>
>>>> > Big +1 for the proposed clean-up, and I would also add the deprecated
>>>>
>>>> > methods of the StreamExecutionEnvironment like:
>>>>
>>>> >
>>>>
>>>> > enableCheckpointing(long interval, CheckpointingMode mode, boolean
>>>> force)
>>>>
>>>> > enableCheckpointing()
>>>>
>>>> > isForceCheckpointing()
>>>>
>>>> >
>>>>
>>>> > readFile(FileInputFormat inputFormat,String
>>>>
>>>> > filePath,FileProcessingMode watchType,long interval, FilePathFilter
>>>>
>>>> > filter)
>>>>
>>>> > readFileStream(...)
>>>>
>>>> >
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter, long
>>>> maxRetry)
>>>>
>>>> > socketTextStream(String hostname, int port, char delimiter)
>>>>
>>>> >
>>>>
>>>> > There are more, like the (get)/setNumberOfExecutionRetries() that were
>>>>
>>>> > deprecated long ago, but I have not investigated to see if they are
>>>>
>>>> > actually easy to remove.
>>>>
>>>> >
>>>>
>>>> > Cheers,
>>>>
>>>> > Kostas
>>>>
>>>> >
>>>>
>>>> > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>>>>
>>>> > wrote:
>>>>
>>>> >
>>>>
>>>> > Hi devs and users,
>>>>
>>>> >
>>>>
>>>> > I wanted to ask you what do you think about removing some of the
>>>> deprecated APIs around the DataStream API.
>>>>
>>>> >
>>>>
>>>> > The APIs I have in mind are:
>>>>
>>>> >
>>>>
>>>> > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>>>>
>>>> > DataStream#fold and all related classes and methods such as
>>>> FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated in
>>>> 1.3/1.4)
>>>>
>>>> > StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>>>> (deprecated in 1.5)
>>>>
>>>> > DataStream#split (deprecated in 1.8)
>>>>
>>>> > Methods in (Connected)DataStream that specify keys as either indices
>>>> or field names such as DataStream#keyBy, DataStream#partitionCustom,
>>>> ConnectedStream#keyBy,  (deprecated in 1.11)
>>>>
>>>> >
>>>>
>>>> > I think the first three should be straightforward. They are long
>>>> deprecated. The getAccumulators method is not used very often in my
>>>> opinion. The same applies to the DataStream#fold which additionally is not
>>>> very performant. Lastly the setStateBackend has an alternative with a class
>>>> from the AbstractStateBackend hierarchy, therefore it will be still code
>>>> compatible. Moreover if we remove the
>>>> #setStateBackend(AbstractStateBackend) we will get rid off warnings users
>>>> have right now when setting a statebackend as the correct method cannot be
>>>> used without an explicit casting.
>>>>
>>>> >
>>>>
>>>> > As for the DataStream#split I know there were some objections against
>>>> removing the #split method in the past. I still believe the output tags can
>>>> replace the split method already.
>>>>
>>>> >
>>>>
>>>> > The only problem in the last set of methods I propose to remove is
>>>> that they were deprecated only in the last release and those method were
>>>> only partially deprecated. Moreover some of the methods were not deprecated
>>>> in ConnectedStreams. Nevertheless I'd still be inclined to remove the
>>>> methods in this release.
>>>>
>>>> >
>>>>
>>>> > Let me know what do you think about it.
>>>>
>>>> >
>>>>
>>>> > Best,
>>>>
>>>> >
>>>>
>>>> > Dawid
>>>
>>>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [DISCUSS] Remove Kafka 0.10.x connector (and possibly 0.11.x)

2020-08-25 Thread Konstantin Knauf
Hi Aljoscha,

I am assuming you're asking about dropping the
flink-connector-kafka-0.10/0.11 modules, right? Or are you talking about
removing support for Kafka 0.10/0.11 from the universal connector?

I am in favor of removing flink-connector-kafka-0.10/0.11 in the next
release. These modules would still be available in Flink 1.11- as a
reference, and could be used with Flink 1.12+ with small or no
modifications. To my knowledge, you also use the universal Kafka connector
with 0.10 brokers, but there might be a performance penalty if I remember
correctly. In general, I find it important to continuously reduce baggage
that accumulates over time and this seems like a good opportunity.

Best,

Konstantin

On Tue, Aug 25, 2020 at 4:59 AM Paul Lam  wrote:

> Hi Aljoscha,
>
> I'm lightly leaning towards keeping the 0.10 connector, for Kafka 0.10
> still has a steady user base in my observation.
>
> But if we drop 0.10 connector, can we ensure the users would be able to
> smoothly migrate to 0.11 connector/universal connector?
>
> If I remember correctly, the universal connector is compatible with 0.10
> brokers, but I want to double check that.
>
> Best,
> Paul Lam
>
> 2020年8月24日 22:46,Aljoscha Krettek  写道:
>
> Hi all,
>
> this thought came up on FLINK-17260 [1] but I think it would be a good
> idea in general. The issue reminded us that Kafka didn't have an
> idempotent/fault-tolerant Producer before Kafka 0.11.0. By now we have had
> the "modern" Kafka connector that roughly follows new Kafka releases for a
> while and this one supports Kafka cluster versions as far back as 0.10.2.0
> (I believe).
>
> What are your thoughts on removing support for older Kafka versions? And
> yes, I know that we had multiple discussions like this in the past but I'm
> trying to gauge the current sentiment.
>
> I'm cross-posting to the user-ml since this is important for both users
> and developers.
>
> Best,
> Aljoscha
>
> [1] https://issues.apache.org/jira/browse/FLINK-17260
>
>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/29-30

2020-07-27 Thread Konstantin Knauf
Dear community,

happy to share an update for the last two weeks with the release of Apache
Flink 1.11.1, planning for Flink 1.12, a proposal for better
interoperability with Microsoft Azure services, a few blog posts and more.

Flink Development
==

* [releases] Flink 1.11.1 was released as a quick follow up to the Flink
1.11.0 release mostly fixing some critical issues in the Table API/SQL
ecosystem. [1]

* [releases] Robert started a thread to collect the different
topics/features that are planned for Flink 1.12. Robert & Dian will be our
release managers for this one. They propose a feature freeze around the end
of September. [2]

* [connectors] Israel Ekpo started a thread to discuss the contribution of
multiple connectors for Microsoft Azure services including Data Lake Store
Gen 2 (Filesystem), Azure Cosmos DB  (DataStream) and Azure Event Hub
(DataStream). [3]

* [sql] Seth has started a small discussion on how to handle timestamps if
a "datagen" table is created based on an existing table using the LIKE
clause. [4]
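
The statement shape under discussion can be sketched as follows; table and
column names here are invented for illustration:

```sql
-- Hypothetical: derive a data generator table from an existing table.
-- With LIKE, computed and watermark/rowtime columns can be inherited,
-- which is where the timestamp-handling question comes from.
CREATE TEMPORARY TABLE orders_datagen
WITH ('connector' = 'datagen')
LIKE orders (EXCLUDING OPTIONS);
```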

* [connectors] Benchao raised the point that the semantics of
InputFormat#nextRecord returning null are inconsistent throughout the code
base and would like to align them. No feedback so far. [5]

* [development process] Andrey reminds everyone to assign the "starter"
label to Jira issues, which are a good pick for new contributors to Apache
Flink. [6]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-11-1-released-tp43335.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Planning-Flink-1-12-tp43348.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Adding-Azure-Platform-Support-in-DataStream-Table-and-SQL-Connectors-tp43342.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Handling-of-Timestamp-in-DataGen-table-created-via-LIKE-td43433.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Align-the-semantic-of-returning-null-from-InputFormat-nextRecord-tp43379.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/REMINDER-Use-starter-labels-for-Jira-issues-where-it-makes-sense-tp43284.html

Notable Bugs
==

* [FLINK-18705] [FLINK-18700] [1.11.1] For those of you who are trying out
the new Debezium format, check out the limitations reported in [7,8].
* [FLINK-18656] [1.11.1] The checkpoint start delay metric is always zero
when unaligned checkpoints are used. [9]

[7] https://issues.apache.org/jira/browse/FLINK-18705
[8] https://issues.apache.org/jira/browse/FLINK-18700
[9] https://issues.apache.org/jira/browse/FLINK-18656

Events, Blog Posts, Misc
===

 * Two new posts on the Flink blog:
*  Dawid gives an overview of (external) catalogs (e.g.
HiveMetastore, PostgreSQL) in Flink. [10]
*  Kostas introduces the newly added "Application Mode" and contrasts
it with the two existing modes: "Session Mode" & "Per-Job Mode". [11]

* In this blog post Eric J. Bruno of Dell explains in detail how Apache
Flink can be used for complex event processing and streaming analytics. [12]

* On the 23rd, the Apache Flink meetup group in Seoul hosted a virtual
session with talks by SK Telecom (Korean), HyperConnect (Korean) and
Ververica (English). It is available on Youtube [13].

* We have published the training program for Flink Forward Global taking
place on the 20th & 21st of October. [14] There will be six different
courses offered over these two days:
* Flink Development (2 days)
* SQL Development (2 days)
* Runtime & Operations (1 day)
* Stateful Functions (1 day)
* Tuning & Troubleshooting (introduction and advanced, 1 day each).

[10] https://flink.apache.org/2020/07/23/catalogs.html
[11] https://flink.apache.org/news/2020/07/14/application-mode.html
[12]
https://blogs.oracle.com/javamagazine/streaming-analytics-with-java-and-apache-flink
[13] https://www.youtube.com/watch?v=HWTb5kn4LvE
[14] https://www.flink-forward.org/global-2020/training-program

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Konstantin Knauf
Thank you for managing the quick follow up release. I think this was very
important for Table & SQL users.

On Wed, Jul 22, 2020 at 1:45 PM Till Rohrmann  wrote:

> Thanks for being the release manager for the 1.11.1 release, Dian. Thanks
> a lot to everyone who contributed to this release.
>
> Cheers,
> Till
>
> On Wed, Jul 22, 2020 at 11:38 AM Hequn Cheng  wrote:
>
>> Thanks Dian for the great work and thanks to everyone who makes this
>> release possible!
>>
>> Best, Hequn
>>
>> On Wed, Jul 22, 2020 at 4:40 PM Jark Wu  wrote:
>>
>> > Congratulations! Thanks Dian for the great work and to be the release
>> > manager!
>> >
>> > Best,
>> > Jark
>> >
>> > On Wed, 22 Jul 2020 at 15:45, Yangze Guo  wrote:
>> >
>> > > Congrats!
>> > >
>> > > Thanks Dian Fu for being release manager, and everyone involved!
>> > >
>> > > Best,
>> > > Yangze Guo
>> > >
>> > > On Wed, Jul 22, 2020 at 3:14 PM Wei Zhong 
>> > wrote:
>> > > >
>> > > > Congratulations! Thanks Dian for the great work!
>> > > >
>> > > > Best,
>> > > > Wei
>> > > >
>> > > > > On Jul 22, 2020, at 15:09, Leonard Xu wrote:
>> > > > >
>> > > > > Congratulations!
>> > > > >
>> > > > > Thanks Dian Fu for the great work as release manager, and thanks
>> > > everyone involved!
>> > > > >
>> > > > > Best
>> > > > > Leonard Xu
>> > > > >
>> > > > >> On Jul 22, 2020, at 14:52, Dian Fu wrote:
>> > > > >>
>> > > > >> The Apache Flink community is very happy to announce the release
>> of
>> > > Apache Flink 1.11.1, which is the first bugfix release for the Apache
>> > Flink
>> > > 1.11 series.
>> > > > >>
>> > > > >> Apache Flink® is an open-source stream processing framework for
>> > > distributed, high-performing, always-available, and accurate data
>> > streaming
>> > > applications.
>> > > > >>
>> > > > >> The release is available for download at:
>> > > > >> https://flink.apache.org/downloads.html
>> > > > >>
>> > > > >> Please check out the release blog post for an overview of the
>> > > improvements for this bugfix release:
>> > > > >> https://flink.apache.org/news/2020/07/21/release-1.11.1.html
>> > > > >>
>> > > > >> The full release notes are available in Jira:
>> > > > >>
>> > >
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12348323
>> > > > >>
>> > > > >> We would like to thank all contributors of the Apache Flink
>> > community
>> > > who made this release possible!
>> > > > >>
>> > > > >> Regards,
>> > > > >> Dian
>> > > > >
>> > > >
>> > >
>> >
>>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/28

2020-07-13 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with a discussion on
releasing Flink 1.11.1, FLIPs on a Python DataStream API, improvements
to the "Connect API" of the Table API, and a bit more.

Flink Development
==

* [releases] Flink 1.11.0 was released. Check out the release announcement
for an overview of the features and improvements that come with it. Thanks
to Piotr and Zhijiang for managing the release. [1,2]

* [releases] Jark started a discussion on releasing Flink 1.11.1 following
very soon after Flink 1.11.0 in order to resolve a couple of known issues
mostly in the Table API / SQL ecosystem. [3]

* [python] Shuiqiang Chen has started a discussion on introducing a Python
DataStream API along similar lines as the Python Table API. It is proposed
to initially only support stateless user-defined functions. [4]

* [table api] Jark has started a discussion thread for FLIP-129 to improve
the "Connect API" in the Table API, i.e. the API a user uses to
describe/register a Table with the environment. Many features (e.g.
computed columns) that are already supported in the SQL DDL are not yet
exposed in the Table API. [5]

[1] https://flink.apache.org/news/2020/07/06/release-1.11.0.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-11-0-released-tp43023.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-11-1-soon-tp43065.html

[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-130-Support-for-Python-DataStream-API-Stateless-Part-tp43035.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-129-Refactor-Descriptor-API-to-register-connector-in-Table-API-tp42995.html

Notable Bugs
==

Quite a few bugs reported for Flink 1.11.0. See also the discussion on a
quick patch release.

* [FLINK-18419] [1.11.0] You can not create a catalog from a user JAR, e.g.
a JAR passed via the sql-client CLI. Fixed for 1.11.1. [6]

* [FLINK-18583] [1.11.0] The _id field is incorrectly set to the index in
the ElasticSearch table sink. Trivial fix targeted for 1.11.1. [7]

* [FLINK-18573] [1.11.1] The InfluxDB metrics reporter does not work as a
plugin. Trivial fix targeted for 1.11.1. [8]

* [FLINK-18434] [1.10.0] You can not select fields when using the JDBC
catalog. Fixed for 1.11.1. [9]

* [FLINK-18461] [1.11.0] It is currently not possible to write a changelog
stream directly into an upsert sink (e.g. Elastic or JDBC). Fixed for
1.11.1. [10]

[6] https://issues.apache.org/jira/browse/FLINK-18419
[7] https://issues.apache.org/jira/browse/FLINK-18583
[8] https://issues.apache.org/jira/browse/FLINK-18573
[9] https://issues.apache.org/jira/browse/FLINK-18434
[10] https://issues.apache.org/jira/browse/FLINK-18461

Events, Blog Posts, Misc
===

* Piotr has joined the Apache Flink PMC. Congratulations! [10]

* My colleague Alexander Fedulov has started an introductory video series
on Apache Flink and stream processing on Youtube. [11,12]

[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-PMC-member-Piotr-Nowojski-tp42966p43022.html
[11]
https://www.ververica.com/blog/presenting-our-streaming-concepts-introduction-to-flink-video-series
[12] https://www.youtube.com/watch?v=ZU1r7uEAO7o

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/26-27

2020-07-06 Thread Konstantin Knauf
Dear community,

happy to share this (and last) week's community update. Flink 1.11 is
finally ready to be released. Besides that, there are a few design
discussions in different areas of Apache Flink, like enhanced fan out for
Flink's Kinesis source or Temporal Table support in pure Flink SQL, and of
course a bit more.

Flink Development
==

* [releases] Zhijiang has published release candidate #4 for Flink 1.11.0
last Tuesday. The vote [1] passed this morning, so we will see the release
of Flink 1.11 very soon.

* [sql] A while ago I started a discussion on supporting Temporal Table
Joins via pure Flink SQL. As of now, a user either needs to register a
Temporal Table Function in the Table API or the environment configuration
of the SQL CLI. This became a more involved discussion than anticipated,
which Leonard Xu is doing a great job of moving forward. It seems that we
are close to a FLIP document now. [2]

* [connectors] Danny Cranmer has started the discussion [3] and -
subsequently - the vote [4] on FLIP-128, which adds support for enhanced
fan out for Flink's Kinesis source. With enhanced fan out, each consumer
receives dedicated data output per shard, as opposed to competing for the
per-shard data output with other consumers.

* [apis] Aljoscha has started a discussion about what kind of compatibility
guarantees the community would like to give for the APIs that are commonly
used by packaged, third-party or custom connectors. Not too much feedback
so far, but right now it seems that we would like it to be safe to use
connectors across patch releases (1.x.y -> 1.x.z), but not across minor
releases (1.u -> 1.v). Based on the recent discussions [5] on the
guarantees for @PublicEvolving this means that connectors could only use
APIs that are annotated @Public or @PublicEvolving. [6]

* [state] Etienne Chauchot has published a design document for FLINK-17073,
which introduces a backpressure mechanism for checkpoints: when checkpoints
can not be cleaned up as quickly as they are created, triggering new
checkpoints will be delayed. This change was motivated by an OOME on the
Jobmanager resulting from too many queued checkpoint clean-up tasks. [7]
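
The proposed mechanism can be illustrated with a small simulation. This is
purely a sketch of the idea: all names here (e.g. max_pending_cleanups) are
made up for illustration and are not Flink classes or configuration options.

```python
# Sketch of checkpoint backpressure: stop triggering new checkpoints
# while too many completed checkpoints still await cleanup.
# Illustrative only; none of these names exist in Flink.

class CheckpointCoordinator:
    def __init__(self, max_pending_cleanups):
        self.max_pending_cleanups = max_pending_cleanups
        self.pending_cleanups = 0

    def maybe_trigger_checkpoint(self):
        """Trigger a checkpoint unless cleanup work has piled up."""
        if self.pending_cleanups >= self.max_pending_cleanups:
            return False  # backpressure: delay/skip this trigger
        self.pending_cleanups += 1  # this checkpoint will need cleanup later
        return True

    def cleanup_one(self):
        """Simulates one queued checkpoint cleanup finishing."""
        if self.pending_cleanups > 0:
            self.pending_cleanups -= 1

coord = CheckpointCoordinator(max_pending_cleanups=2)
print([coord.maybe_trigger_checkpoint() for _ in range(4)])
# -> [True, True, False, False]: backpressure after two pending cleanups
coord.cleanup_one()
print(coord.maybe_trigger_checkpoint())  # -> True, capacity freed up again
```

Bounding the pending cleanups this way is what prevents the unbounded queue
of cleanup tasks on the Jobmanager that motivated the change.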

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-11-0-release-candidate-4-tp42829.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLINK-16824-Creating-Temporal-Table-Function-via-DDL-tp40333.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-128-Enhanced-Fan-Out-for-AWS-Kinesis-Consumers-tp42728.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-128-Enhanced-Fan-Out-for-AWS-Kinesis-Consumers-tp42846.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Cross-version-compatibility-guarantees-of-Flink-Modules-Jars-tp42746.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stability-guarantees-for-PublicEvolving-classes-tp41459.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLINK-17073-checkpoint-backpressure-design-doc-tp42788.html

Notable Bugs
==

* [FLINK-18452] [1.11] [1.10.1] SQL queries that use "Top-N" can not be
restored from a savepoint due to an incorrectly implemented Object#equals in
one of the state objects. [8]

[8] https://issues.apache.org/jira/browse/FLINK-18452

Events, Blog Posts, Misc
===

* On the Ververica blog, Jaehyeuk Oh & Gihoon Yeom explain how HyperConnect
is using Apache Flink for matchmaking in their real-time communication app
Azar. [9]

* On the Flink blog, Jeff Zhang has published the second part of his blog
post series on Flink on Zeppelin. [10]

[9]
https://www.ververica.com/blog/data-driven-matchmaking-at-azar-with-apache-flink
[10]
https://flink.apache.org/ecosystem/2020/06/23/flink-on-zeppelin-part2.html

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/25

2020-06-22 Thread Konstantin Knauf
Dear community,

happy to share this week's community update: release testing for Flink
1.11.0 is slowly converging, and the first feature discussions for the
upcoming release cycle are coming up.

Flink Development
==

* [releases] The community has published another non-voting release
candidate for Flink 1.11.0 to facilitate further release testing. [1]

* [connectors] Jacky Lau proposes to drop support for ElasticSearch 5.x to
facilitate the implementation of FLIP-127, a source connector for
ElasticSearch. [2]

* [connectors] Gyula proposes to upgrade the HBase connector to 2.2.x and
to drop support for 1.4.x. The discussion indicates that there are still a
lot of HBase 1.x installations out there and hence there are reservations
about dropping support for it now. [3]

* [docs] "Flink Master" has been re-renamed to "JobManager" as part of a
larger discussion about the usage of the terms slave/master in Apache
Flink. [4,5]

* [sql] Fabian has started a discussion on a SQL syntax for a Table API's
StatementSet. A StatementSet allows users to group multiple INSERT INTO
statements, so that they are compiled into a single Flink job. This
functionality has been added in FLIP-84, but is not exposed in SQL yet. [6]

* [sql] Jingsong Li has started a discussion on how to ensure that new SQL
features are also supported by the Flink SQL client going forward. So far,
new features are often only available in the Flink SQL client with a delay,
as developers forget to explicitly add support for them. [7]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-11-0-release-candidate-2-tp42620.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Re-DISCUSS-Drop-connectors-for-5-x-and-restart-the-flink-es-source-connector-tp42671.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Upgrade-HBase-connector-to-2-2-x-tp42657.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Re-renaming-Flink-Master-back-to-JobManager-tp42510p42512.html
[5] https://issues.apache.org/jira/browse/FLINK-18209
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Sql-client-lack-support-for-new-features-tp42640.html

flink-packages.org
==

* lukess has published a MetricsReporter for SignalFX. [7]

[7] https://flink-packages.org/packages/flink-metrics-signalfx

Notable Bugs
==

Of course, there is still a lot of activity around release testing, but
nothing exciting for any of the released versions.

Events, Blog Posts, Misc
===

* Yu Li joined the Apache Flink PMC. Congratulations! [8]

* On the Ververica blog, Qian Yu explains how Weibo uses Apache Flink for
real time feature extraction and online model training. [9]

* Jeff Zhang has published the first part of a blog post series on Flink on
Zeppelin on the Apache Flink blog. [10]

* The Call for Proposals of Flink Forward Global taking place virtually,
October 19-21, has been extended by 1 week until the 28th of June. We are
looking forward to your submissions. If you feel unsure about submitting,
please do not hesitate to reach out to me or anyone at Ververica. We are
happy to help. [11]

[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Yu-Li-is-now-part-of-the-Flink-PMC-tp42567.html

[9]
https://www.ververica.com/blog/flink-for-online-machine-learning-and-real-time-processing-at-weibo
[10] https://flink.apache.org/news/2020/06/15/flink-on-zeppelin-part1.html
[11] https://www.flink-forward.org/global-2020/call-for-presentations


Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/23-24

2020-06-15 Thread Konstantin Knauf
Dear community,

happy to share this community update on the last two weeks including the
release of Stateful Functions 2.1, a table source for ElasticSearch, and a
bit more. The community is still working on release testing for Apache
Flink 1.11, so it is still comparably quiet. Expecting the first feature
discussions for Flink 1.12 to start soon, though.

Flink Development
==

* [releases] Apache Flink Stateful Functions 2.1.0 was released last
Tuesday. [1] For details checkout the release blog post [2].

* [connectors] Jacky Lau started a discussion on extending the
ElasticSearch connector by adding a TableSource. The source would be
bounded (similar to JDBC and Hbase) and scannable as well as lookupable. [3]

* [documentation] Robert has started a conversation on adding a Japanese
translation of the documentation. The discussion showed that a) there does
not seem to be anyone who could review these contributions and b) there are
a couple of issues in keeping the English and Chinese translation in sync
that would need to be addressed prior to adding another language. [4]

* [documentation] Seth proposed to move the code backing the "walkthroughs"
from the main *flink* repository to the *flink-playgrounds* repository. [5]

* [development process] Aljoscha proposes to update the checked-in
EditorConfig to match the current code and checkstyle configuration of
Apache Flink. [6]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-Stateful-Functions-2-1-0-released-tp42345.html
[2] https://flink.apache.org/news/2020/06/09/release-statefun-2.1.0.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-flink-elasticsearch-connector-supports-tp42082p42238.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-Japanese-translation-of-the-flink-apache-org-website-tp42279.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Migrate-walkthroughs-to-flink-playgrounds-for-1-11-tp42360.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Update-our-EditorConfig-file-tp42409.html

Notable Bugs
==

A lot of activity around release testing for Flink 1.11.0, but nothing
notable for any of our release versions.

Events, Blog Posts, Misc
===

* Benchao Li and Xintong Song are Apache Flink Committers now.
Congratulations! [7,8]

* Marta has published a new quarterly community update blog post on the
Flink blog. [9]

[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Flink-Committer-Benchao-Li-tp42312p42353.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Apache-Flink-Committer-Xintong-Song-tp42194p42207.html
[9] https://flink.apache.org/news/2020/06/11/community-update.html

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/22

2020-06-01 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with an update on the upcoming
Apache Flink releases: Flink 1.11 and Flink Stateful Functions 2.1. With
the community focused on release testing, the dev@ mailing list remains
relatively quiet.

Flink Development
==

* [releases] Release Testing for Flink 1.11 is progressing. To follow the
testing efforts check out the Flink 1.11 burndown board [1] in the Flink
Jira. Stephan proposes to backport FLIP-126 to Flink 1.11 (after the
feature freeze) as it is an isolated change and to avoid breaking the newly
added source interface again in the next release. [2]

* [releases] Apache Flink-shaded 11.0 was released. Flink 1.11 will depend
on it. [3]

* [releases] Gordon just published the first release candidate for Apache
Flink Stateful Functions 2.1. [4]

* [savepoints] I started a small discussion about documenting (breaking)
backwards compatibility of Apache Flink's savepoint format. [5]

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=364=FLINK

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Backpoint-FLIP-126-watermarks-integration-with-FLIP-27-tp41897p41999.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-shaded-11-0-released-tp42056.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Apache-Flink-Stateful-Functions-2-1-0-release-candidate-1-tp42061.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Document-Backwards-Compatibility-of-Savepoints-tp41903.html

Notable Bugs
==

* Flink 1.10.1 seems to have reintroduced an issue with Kerberos-secured
MapR environments. [6]

[6] https://issues.apache.org/jira/browse/FLINK-18045

Events, Blog Posts, Misc
===

* Ververica has added a second training to Flink Forward Global (Oct 20),
shifting the two conference days back by a day. The new dates are 19th/20th
Oct for training, and 21st/22nd Oct for conference/talks. [7]
Pre-registration has already opened. [8]

[7] https://twitter.com/FlinkForward/status/1265281578676166658
[8] https://www.flink-forward.org/global-2020

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/21

2020-05-25 Thread Konstantin Knauf
Dear community,

happy to share this week's community update!

Flink Development
==

* [releases]  Zhijiang has published a first release candidate for Apache
Flink 1.11.0. This is a "preview-only" release candidate to facilitate
current testing efforts. There will not be a vote on this RC. [1]

* [releases] Gordon proposes to release Apache Flink Stateful Function
2.1.0 soon. This would mean a faster release cycle compared to the main
project, which seems to make sense given the early state of the project.
The proposal received a lot of positive feedback and the feature freeze date
was set for this Wednesday (27th of May). [2]

* [deployment] Klou proposes to remove support for shipping nested JARs
during job submission, a so far hidden feature of Apache Flink from the
early days of the project. [3]

* [security] Weike Dong proposes to exclude "unsafe ports", i.e. ports
considered to be unsafe by major browsers, as candidates for the Flink REST
API, if no port was specified explicitly. [4]

* [development process] Robert has started a thread to clarify the
semantics and usage of JIRA fields in the Apache Flink project. If you are
unsure about some fields, check out this thread and join the conversation.
[5]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-11-0-release-candidate-1-tp41840.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Stateful-Functions-2-1-0-soon-tp41734.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-dependency-shipping-through-nested-jars-during-job-submission-tp41711.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Shall-we-avoid-binding-unsafe-ports-during-random-port-selection-tp41821.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Semantics-of-our-JIRA-fields-tp41786p41806.html

Notable Bugs
==

* [FLINK-17189] [1.10.1] HiveCatalog does not support processing time
columns. [6]
* [FLINK-17800] [1.10.1] There might be a state loss issue when using
RocksDB with optimizeForPointLookup. The community is still investigating
the bug report. [7]

[6] https://issues.apache.org/jira/browse/FLINK-17189
[7] https://issues.apache.org/jira/browse/FLINK-17800

Events, Blog Posts, Misc
===

* Ankit Jhalaria of GoDaddy has published a blog post about their
Flink-based streaming data platform on the Ververica Blog [8].
* If you have not heard about Lyft's streaming platform before, here is
another great talk by Sherin Thomas on InfoQ [9].

[8]
https://www.ververica.com/blog/how-godaddy-uses-flink-to-run-real-time-streaming-pipelines
[9] https://www.infoq.com/presentations/ml-streaming-lyft/

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/20

2020-05-18 Thread Konstantin Knauf
Dear community,

happy to share this week's community update. With the feature freeze for
Flink 1.11 today, there are not a lot of feature-related discussions right
now on the Apache Flink development mailing list. Still, I can share some
news on
Apache Flink 1.11 and 1.10.1, Flink Forward Global and a discussion on
stability guarantees for @PublicEvolving classes.

Note: Going forward I will publish the weekly Apache Flink community update
on Mondays for the preceding week.

Flink Development
==

* [releases] Apache Flink 1.11 feature freeze is today. Afterwards, release
testing will commence. [1]

* [releases] Apache Flink 1.10.1 was released early last week [2,3].
Shortly afterwards the community discovered a change [4] in 1.10.1 that
broke binary compatibility of the StreamingFileSink. More information in
the next item. [5]

* [development process] Triggered by the compatibility issue with the
StreamingFileSink, a class annotated with @PublicEvolving, the community
started discussing what stability guarantees this annotation gives for
patch releases. It seems like (ongoing vote) that going forward the
community will commit to ensure API and binary
compatibility of @PublicEvolving classes across patch releases. (x.y.u ->
x.y.v). [6]

* [security] The Apache Flink PMC has informed all users about a
vulnerability in the Apache Flink JMX Reporter (CVE-2020-1960) [7]. A fix is
contained in Apache Flink 1.9.3 and 1.10.1. For all other versions of Apache
Flink, Chesnay's email contains a commit that can be cherry-picked to fix
the vulnerability. [8]

* [deployment] Gyula proposes to extend the Apache Flink web user interface
to show information across multiple job or application clusters. [8]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Exact-feature-freeze-date-tp40624.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-10-1-released-tp41435.html
[3] https://flink.apache.org/news/2020/05/12/release-1.10.1.html
[4] https://issues.apache.org/jira/browse/FLINK-16684
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Breaking-API-Change-in-1-10-1-tp41377.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stability-guarantees-for-PublicEvolving-classes-tp41459.html
[7] https://nvd.nist.gov/vuln/detail/CVE-2020-1960
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/CVE-2020-1960-Apache-Flink-JMX-information-disclosure-vulnerability-tp41437.html

Notable Bugs
==

Nothing came to my attention.

Events, Blog Posts, Misc
===

* Yuan Mei started a discussion to create a platform to collect and share
learning resources around Apache Flink. The feedback was positive; the main
concern is maintainability. The discussion showed that we are looking for
something "more dynamic", where everyone (not just committers) can share
their material. Yuan will work on a prototype for such a page. [9]

* Ververica will host another global virtual conference 19th - 21st of
October 2020. The conference will consist of a training day (limited seats,
paid) and two days of keynotes and conference talks (free). You can already
pre-register and submit your talks at
https://www.flink-forward.org/global-2020.

[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-a-material-web-page-under-https-flink-apache-org-tp41298.html

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/19

2020-05-10 Thread Konstantin Knauf
Dear community,

with only a few days left until the planned feature freeze for Flink 1.11,
the dev@ mailing list is getting pretty quiet while everyone is working on
their final features. Hence, only a short update this week.

Flink Development
==

* [releases] Yu has published RC #3 for Flink 1.10.1. Voting is still
ongoing, no -1s so far. [1]

* [development process] Robert has revived a thread to update the
roadmap published on the Flink website, which led to a conversation on how
to keep it up-to-date as part of the regular release process. [2]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-1-release-candidate-3-tp41216.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Update-our-Roadmap-tp28378.html

Notable Bugs
==

Nothing came to my attention.

Events, Blog Posts, Misc
===

* Marta recaps the last months in a community update blog post on the
Flink blog [3].

* Apache Flink has applied to Google's Season of Docs. The announcement blog
post has been published on the Flink blog. [4]

* Upcoming Meetups
* May 16th, Apache Flink Hangzhou Online Meetup, Apache Flink China [5]
* May 27th, Apache Flink + FlinkSQL + FLaNK Talks, Future of Data, New
Jersey [6]

[3] https://flink.apache.org/news/2020/05/07/community-update.html
[4] https://flink.apache.org/news/2020/05/04/season-of-docs.html
[5] https://www.meetup.com/Flink-China/events/270310980/
[6] https://www.meetup.com/futureofdata-princeton/events/269933905/

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/18

2020-05-03 Thread Konstantin Knauf
Dear community,

happy to share a brief community update this week with an update on
Flink 1.10.1, our application to Google Season of Docs 2020, a discussion
on supporting Hadoop 3, a recap of Flink Forward Virtual 2020 and a bit more.

Flink Development
==

* [releases] Yu has published RC #2 for Flink 1.10.1. No -1s so far. [1]

* [docs] Apache Flink's application to Google Season of Docs 2020 is about
to be finalized. Marta has opened a PR for the announcement and Seth &
Aljoscha volunteered as mentors. Apache Flink is pitching a project to
improve the documentation of Table API & SQL. [2]

* [hadoop] Robert has started a discussion on adding support for Hadoop 3.
In particular, the thread discusses the questions of whether Hadoop 3 would
be supported via flink-shaded-hadoop or not. [3]

* [configuration] Timo has started a discussion on how we represent
configuration hierarchies in properties (Flink configuration as well as
Connector properties), so that the resulting files would be valid
JSON/YAML. [4]
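
The core problem can be demonstrated without Flink at all: with dotted,
hierarchical keys, a key such as "state.backend" cannot be both a leaf value
and a parent object in a nested JSON/YAML representation. A minimal
pure-Python illustration (the keys below are just examples):

```python
# Turn flat, dotted config keys into a nested structure, failing when a
# key would have to be both a scalar value and a parent object -- which
# is exactly what makes some flat key sets unrepresentable as valid
# JSON/YAML. Pure-Python demo; keys are illustrative.

def nest(flat):
    """Convert {'a.b.c': 1} style keys into nested dicts."""
    root = {}
    for key, value in flat.items():
        parts = key.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} conflicts: {part!r} is already a value")
        if isinstance(node.get(parts[-1]), dict):
            raise ValueError(f"{key!r} conflicts: already a parent object")
        node[parts[-1]] = value
    return root

print(nest({"state.backend.incremental": True}))
# -> {'state': {'backend': {'incremental': True}}}

try:
    nest({"state.backend": "rocksdb", "state.backend.incremental": True})
except ValueError as e:
    print("conflict:", e)
```

Resolving such clashes is one of the design questions the thread discusses.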

* [connectors] Leonard Xu proposes to refactor package, module and class
names of the Flink JDBC connector to be consistent with other connectors.
Details in [5].

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-1-release-candidate-2-tp41019.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PROPOSAL-Google-Season-of-Docs-2020-tp40264.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Adding-support-for-Hadoop-3-and-removing-flink-shaded-hadoop-tp40570p40601.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Hierarchies-in-ConfigOption-tp40920.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Refactor-flink-jdbc-connector-structure-tp40984.html

flink-packages.org
==

* Alibaba has published a preview version of its SpillableHeapStateBackend
on flink-packages.org. [6] This state backend is being contributed to Apache
Flink in FLINK-12692 [7]. The SpillableHeapStateBackend is a Java heap-based
state backend (like the FilesystemStateBackend) that spills the coldest
state to disk before the heap is exhausted.

[6] https://flink-packages.org/packages/spillable-state-backend-for-flink
[7] https://issues.apache.org/jira/browse/FLINK-12692
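
The spilling idea can be illustrated with a toy sketch. This is not the
FLINK-12692 implementation (which works on state partitions and heap-usage
metrics, not a fixed entry count); it only shows the hot-on-heap,
cold-on-disk principle:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpillableMap:
    """Toy key/value state that spills the coldest entries to disk."""

    def __init__(self, max_in_memory=2):
        self.max_in_memory = max_in_memory
        self.hot = OrderedDict()          # in-memory entries, LRU order
        self.spill_dir = tempfile.mkdtemp()

    def _spill_path(self, key):
        return os.path.join(self.spill_dir, f"{key}.bin")

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)         # mark as most recently used
        while len(self.hot) > self.max_in_memory:
            # evict the least recently used entry to disk
            cold_key, cold_value = self.hot.popitem(last=False)
            with open(self._spill_path(cold_key), "wb") as f:
                pickle.dump(cold_value, f)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        with open(self._spill_path(key), "rb") as f:  # read a spilled entry
            value = pickle.load(f)
        self.put(key, value)                          # promote back to heap
        return value
```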

Notable Bugs
==

I did not encounter anything particularly worth sharing.

Events, Blog Posts, Misc
===

* Fabian has published a recap of Flink Forward Virtual 2020 on the
Ververica blog. [8]

* All recordings of Flink Forward Virtual 2020 have been published on
Youtube. [9]

[8] https://www.ververica.com/blog/flink-forward-virtual-2020-recap
[9]
https://www.youtube.com/watch?v=NF0hXZfUyqE&list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: join in sql without time interval

2020-04-30 Thread Konstantin Knauf
Hi Lec Ssmi,

yes, DataStream#connect on two streams, both keyed on the productId, with a
KeyedCoProcessFunction is the way to go.
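
For the Orders/Product example from the question, the per-key state such a
KeyedCoProcessFunction would hold can be sketched in plain Python (field
names like productId are taken from the question; note that without a state
TTL this state grows without bound, which is exactly the state-size concern
raised):

```python
class KeyedJoin:
    """Sketch of a connect-style join: buffer per-key state from both
    inputs and emit a joined pair once both sides are present."""

    def __init__(self):
        self.products = {}          # productId -> latest product row
        self.pending_orders = {}    # productId -> orders waiting for product

    def on_product(self, product):
        pid = product["id"]
        self.products[pid] = product
        # flush orders that arrived before the product row
        return [(o, product) for o in self.pending_orders.pop(pid, [])]

    def on_order(self, order):
        pid = order["productId"]
        if pid in self.products:
            return [(order, self.products[pid])]
        # product not seen yet -- buffer the order in state
        self.pending_orders.setdefault(pid, []).append(order)
        return []
```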

Cheers,

Konstantin

On Thu, Apr 30, 2020 at 11:10 AM lec ssmi  wrote:

> Maybe,  the connect method?
>
> lec ssmi  于2020年4月30日周四 下午3:59写道:
>
>> Hi:
>>   As  the following sql:
>>
>>SELECT *  FROM Orders INNER JOIN Product ON Orders.productId =
>> Product.id
>>
>>  If we use datastream API instead  of sql, how should it be achieved?
>>  Because the APIs in DataStream only have Window Join and Interval
>> Join,the official website says that to solve the above state capacity
>> problem in sql is using TableConfig. But TableConfig itself can only solve
>> the state  ttl  problem of non-time operators. So I think the above sql's
>> implementation is neither tumble window join, nor sliding window join and
>> interval join.
>>
>> Best Regards
>> Lec Ssmi
>>
>

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


[ANNOUNCE] Weekly Community Update 2020/17

2020-04-26 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with an update on Flink 1.11,
Flink 1.10.1 and Flink 1.9.3, the revival of FLIP-36 to support interactive
programming, a new FLIP to unify (& separate) TimestampAssigners and a bit
more.


Flink Development
==

Releases
^

* [releases] Apache Flink 1.9.3 was released. [1,2]

* [releases] Stephan has proposed 15th of May as the feature freeze date
for Flink 1.11 [3]. Subsequently, Piotr also published a status update on
the development progress for the upcoming release. Check it out to get an
overview of all the features, which are still or not anymore planned for
this release. [4]

* [releases] The first release candidate for Flink 1.10.1 is out. [5]

FLIPs
^^

* [table api] Xuannan has revived the discussion on FLIP-36 to support
interactive programming in the Table API. In essence, the FLIP proposes to
support caching (intermediate) results of one (batch) job, so that they can
be used by following (batch) jobs in the same TableEnvironment. [6]

* [time] Aljoscha proposes to a) unify punctuated and periodic watermark
assigners and b) separate watermark assigners from timestamp extractors.
[7]

More
^

* [configuration] Yangze started a discussion to unify the way max/min are
used in the config options. Currently, there is a mix of different patterns
(**.max, max-**, and more). [8]

* [connectors] Karim Mansour proposes a change to the current RabbitMQ
connector in Apache Flink to make message deduplication more flexible. [9]

* [metrics] Jinhai would like to add additional labels to the metrics
reported by the PrometheusReporter. [10]

* [datastream api] Stephan proposes to remove a couple of
deprecated methods for state access in Flink 1.11. [11]


[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-3-released-tp40730.html
[2] https://flink.apache.org/news/2020/04/24/release-1.9.3.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Exact-feature-freeze-date-tp40624.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Development-progress-of-Apache-Flink-1-11-tp40718.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-1-release-candidate-1-tp40724.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-36-Support-Interactive-Programming-in-Flink-Table-API-td40592.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-126-Unify-and-separate-Watermark-Assigners-tp40525p40565.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Should-max-min-be-part-of-the-hierarchy-of-config-option-tp40578.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-flink-connector-rabbitmq-api-changes-tp40704.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Add-custom-labels-on-AbstractPrometheusReporter-like-PrometheusPushGatewayReporter-s-groupiny-tp40708.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Removing-deprecated-state-methods-in-1-11-tp40651.html

Notable Bugs
==

* [FLINK-17350] [1.10.0] [1.9.3] [1.8.3] Since Flink 1.5, users can choose
not to fail their Flink job on checkpoint errors. For the synchronous part
of a checkpoint the implementation of this feature was incorrect, leaving
operators in an inconsistent state following such a failure. Piotr proposes
to always fail tasks on failures in the synchronous part of a checkpoint
going forward. [12]

* [FLINK-17351] [1.10.0] [1.9.3] The CheckpointFailureManager ignores
checkpoint timeouts when checking against the maximally tolerable number of
checkpoint failures. So, checkpoint failures are not discovered when they
only surface in the CheckpointFailureManager as a checkpoint timeout
instead of an exception. [13]

Background: both issues were discovered based on a bug report by Jun Qin
[14].

[12] https://issues.apache.org/jira/browse/FLINK-17350
[13] https://issues.apache.org/jira/browse/FLINK-17351
[14] https://issues.apache.org/jira/browse/FLINK-17327

Events, Blog Posts, Misc
===

* Andrey recaps the changes and simplifications to Flink's memory
management (released in Flink 1.10) on the Apache Flink blog. [15] Closely
related, there is also a small tool to test different memory configurations
on flink-packages.org. [16]

[15]
https://flink.apache.org/news/2020/04/21/memory-management-improvements-flink-1.10.html
[16] https://flink-packages.org/packages/flink-memory-calculator

Cheers,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-26 Thread Konstantin Knauf
Thanks for managing this release!

On Sun, Apr 26, 2020 at 3:58 AM jincheng sun 
wrote:

> Thanks for your great job, Dian!
>
> Best,
> Jincheng
>
>
> Hequn Cheng  于2020年4月25日周六 下午8:30写道:
>
>> @Dian, thanks a lot for the release and for being the release manager.
>> Also thanks to everyone who made this release possible!
>>
>> Best,
>> Hequn
>>
>> On Sat, Apr 25, 2020 at 7:57 PM Dian Fu  wrote:
>>
>>> Hi everyone,
>>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.9.3, which is the third bugfix release for the Apache Flink
>>> 1.9 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2020/04/24/release-1.9.3.html
>>>
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/projects/FLINK/versions/12346867
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> Also great thanks to @Jincheng for helping finalize this release.
>>>
>>> Regards,
>>> Dian
>>>
>>

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: Flink

2020-04-24 Thread Konstantin Knauf
Hi Navneeth,

I think there might be some misunderstanding here. Let me try to clarify.

1) The so-called native Kubernetes support [1], which was added as an
experimental feature in Flink 1.10, is not used by Ververica Platform CE
nor by the Lyft K8s Operator, as far as I am aware.

2) The native Kubernetes support only covers session clusters [2] so far.
Support for job clusters is work in progress for Flink 1.11. In case of a
session cluster, Flink will request additional resources from Kubernetes
whenever necessary to deploy a Flink Job upon submission. So, there is
automatic scaling of the *cluster, *but no automatic scaling of the *jobs*.

3) There are multiple efforts within the Flink ecosystem working on auto
scaling jobs.

I hope this helps and please reach out if you have further questions.

Best,

Konstantin

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/glossary.html#flink-session-cluster


On Thu, Apr 23, 2020 at 7:11 AM Navneeth Krishnan 
wrote:

> Thanks a lot Timo. I will take a look at it. But does flink automatically
> scale up and down at this point with native integration?
>
> Thanks
>
> On Tue, Apr 14, 2020 at 11:27 PM Timo Walther  wrote:
>
>> Hi Navneeth,
>>
>> it might be also worth to look into Ververica Plaform for this. The
>> community edition was published recently is free of charge. It provides
>> first class K8s support [1].
>>
>> There is also a tutorial how to deploy it on EKS [2] (not the most
>> recent one through).
>>
>> Regards,
>> Timo
>>
>> [1]
>>
>> https://www.ververica.com/blog/announcing-ververica-platform-community-edition?utm_campaign=Ververica%20Platform%20-%20Community%20Edition_content=123140986_medium=social_source=twitter_channel=tw-2581958070
>> [2]
>>
>> https://www.ververica.com/blog/how-to-get-started-with-data-artisans-platform-on-aws-eks
>>
>>
>>
>> On 15.04.20 03:57, Navneeth Krishnan wrote:
>> > Hi All,
>> >
>> > I'm very new to EKS and trying to deploy a flink job in cluster mode.
>> > Are there any good documentations on what are the steps to deploy on
>> EKS?
>> >
>> >  From my understanding, with flink 1.10 running it on EKS will
>> > automatically scale up and down with kubernetes integration based on
>> the
>> > load. Is this correct? Do I have to do enable some configs to support
>> > this feature?
>> >
>> > How to use the lyft k8s operator when deploying on EKS?
>> >
>> > Thanks a lot, appreciate all the help.
>> >
>> >
>> >
>> >
>>
>>

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: Handling stale data enrichment

2020-04-24 Thread Konstantin Knauf
Hi Vinay,

I assume your subscription updates also have a timestamp and a watermark.
Otherwise, there is no way for Flink to tell that the subscription updates
are late.

If you use a "temporal table"-style join to join the two streams, and you
do not receive any subscription updates for 2 hours, the watermark will not
advance (it is the minimum of the two input streams) and hence all click
events will be buffered. No output. This has the advantage of not sending
out duplicate records, but the disadvantage that you do not make any
progress until you see fresh subscription data. Is this the desired
behavior for your use case?
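
The "minimum of the two input streams" behavior can be modeled with a small
sketch. This is a simplified stand-in for Flink's two-input operators, which
track the latest watermark per input and only release event-time results up
to the minimum of the two:

```python
class TwoInputWatermark:
    """Combined watermark of a two-input operator: the minimum of the
    latest watermark seen on each input. If one input (e.g. the
    subscription stream) stalls, the combined watermark stalls too, and
    buffered click events are not released."""

    def __init__(self):
        self.wm = [float("-inf"), float("-inf")]
        self.buffer = []   # (timestamp, event) pairs waiting to be emitted

    def add(self, timestamp, event):
        self.buffer.append((timestamp, event))

    def advance(self, input_index, watermark):
        self.wm[input_index] = max(self.wm[input_index], watermark)
        combined = min(self.wm)
        # release only events at or below the combined watermark
        ready = [e for t, e in self.buffer if t <= combined]
        self.buffer = [(t, e) for t, e in self.buffer if t > combined]
        return ready
```

Advancing only the click-stream input leaves the buffer untouched; the
buffered event is released once the subscription input catches up as well.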

Best,

Konstantin


On Thu, Apr 23, 2020 at 1:29 PM Vinay Patil  wrote:

> Hi,
>
> I went through Konstantin webinar on 99 ways you can do enrichment. One
> thing I am failing to understand is how do we efficiently handle stale data
> enrichment.
>
> Context: Let's say I want to enrich user data with the subscription data.
> Here subscription data is acting as reference data and will be used for
> joining these two streams based on event time. Consider the following
> scenario:
>
>
>1. We are going to enrich Click Stream event (containing user_info)
>with Subscription details
>2. Subscription Status for Alice user is FREE
>3. Current Internal State contains Alice with Subscription status as
>FREE
>4.
>
>Reference data is not flowing because of some issue for 2hrs
>5.
>
>Alice upgraded the subscription to Premium at 10.30 AM
>6.
>
>Watched video event comes for Alice at 10.40 AM
>-
>
>   flink pipeline looks up in internal state and writes to enrichment
>   topic
>   -
>
>   Enrichment topic now contains Alice -> FREE
>   7.
>
>Reference data starts flowing in at 11AM
>-
>
>   let's assume we consider late elements upto 2 hours, so the click
>   stream event of Alice is still buffered in the state
>   - The enrichment topic will now contain duplicate records for Alice
>   because of multiple firings of window
>1. Alice -> FREE -> 10 AM
>   2. Alice -> PREMIUM -> 11 AM
>
> Question is how do I avoid sending duplicate records ? I am not able to
> understand it. I can think of Low Level joins but not sure how do we know
> if it is stale data or not based on timestamp (watermark) as it can happen
> that a particular enriched record is not updated for 6 hrs.
>
> Regards,
> Vinay Patil
>


-- 

Konstantin Knauf


[ANNOUNCE] Weekly Community Update 2020/14

2020-04-05 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with Flink on Zeppelin, full
support for VIEWs in Flink SQL, Ververica Platform Community Edition and a
bit more.

Flink Development
==

* [releases] Four issues (3 Blockers, 1 Critical) left for Flink 1.10.1 at
this point. [1]

* [statefun] On Friday, Gordon published the sixth release candidate for
Apache Flink Stateful Functions 2.0.0. No votes so far. [2]

* [clients] With the release of Zeppelin 0.9, Flink is now available on
Zeppelin! The thread also contains a series of valuable, introductory blog
posts on the topic. [3]

* [sql] Kurt has formally started the discussion to make the Blink planner
the default planner in Flink 1.11+. A lot of positive feedback so far. [4]

* [sql] Zhenghua Gao has revived the discussion on FLIP-71 to fully support
VIEWs in Flink SQL (including persisting VIEWs in the Catalog). [5]

* [sql] Jark started a discussion on FLIP-122 to shorten & simplify the
connector property keys in Flink SQL and Table API. [6]

* [python] Dian Fu proposes to support the conversion between Flink Tables
and Pandas DataFrames. [7]

* [docker] Andrey has updated FLIP-111 to unify Flink Docker images based
on feedback by Ufuk and Patrick Lucas. It seems like this can soon go to a
vote. [8]

* [development process] Aljoscha reminds everyone to create Jira tickets
for all test failures if none exists for this test yet. [9]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-10-1-tp38689.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Apache-Flink-Stateful-Functions-Release-2-0-0-release-candidate-6-tp39776.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Flink-on-Zeppelin-Zeppelin-0-9-is-released-tp39498p39531.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-default-planner-to-blink-planner-in-1-11-tp39608p39686.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-71-E2E-View-support-in-Flink-SQL-tp33131p39787.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-122-New-Connector-Property-Keys-for-New-Factory-tp39759.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-120-Support-conversion-between-PyFlink-Table-and-Pandas-DataFrame-tp39611.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-111-Docker-image-unification-tp38444.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PSA-Please-report-all-occurrences-of-test-failures-tp39793.html

flink-packages.org
==

* On Tuesday, Ververica published Ververica Platform Community Edition on
flink-packages.org. The Community Edition is a no-cost version of our
enterprise offering and aims to make deploying, operating and managing
Apache Flink applications easier for everyone. It requires Kubernetes and a
distributed file system or object storage. [10]

[10]
https://flink-packages.org/packages/ververica-platform-community-edition

Notable Bugs
==

* [FLINK-16913] [1.10.0] You currently cannot configure the
RocksDBStateBackend in the native Kubernetes support of Flink 1.10.
Resolved for 1.10.1. [11]

[11] https://issues.apache.org/jira/browse/FLINK-16913

Events, Blog Posts, Misc
===

* Dawid and Zhijiang joined the Apache Flink PMC. Congratulations!
Additionally, I have joined as a Committer :) [12]

* Marta has published a Flink community update on the Flink blog which
among other things looks into the number of contributions to Apache Flink
over time as well as the responsiveness of the community in addressing
these. [13]

[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-New-Committers-and-PMC-member-tp39640.html
[13] https://flink.apache.org/news/2020/04/01/community-update.html

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/13

2020-03-29 Thread Konstantin Knauf
 Flink with Hive and gives
an introduction to the recent improvements in Flink 1.10. [18]

* Robert has published a first blogpost in the Flink "Engine Room" on the
migration of Flink's CI infrastructure from Travis CI to Azure Pipelines.
[19]

[14] https://www.flink-forward.org/sf-2020/conference-program
[15]
https://www.bigmarker.com/series/flink-forward-virtual-confer1/series_summit
[16] https://www.datadoghq.com/blog/monitor-apache-flink-with-datadog/
[17] https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html
[18]
https://flink.apache.org/features/2020/03/27/flink-for-data-warehouse.html
[19]
https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: Kafka - FLink - MongoDB using Scala

2020-03-29 Thread Konstantin Knauf
cc user@f.a.o

Hi Siva,

I am not aware of a Flink MongoDB Connector in either Apache Flink, Apache
Bahir or flink-packages.org. I assume that you are doing idempotent
upserts, and hence do not require a transactional sink to achieve
end-to-end exactly-once results.

To build one yourself, you implement
org.apache.flink.streaming.api.functions.sink.SinkFunction (better inherit
from org.apache.flink.streaming.api.functions.sink.RichSinkFunction).
Roughly speaking, you would instantiate the MongoDB client in the "open"
method and write records via this client in "invoke". Usually, such sinks
use some kind of batching to increase write performance.

I suggest you also have a look at the source code of the ElasticSearch or
Cassandra Sink.
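
In plain Python, that lifecycle looks roughly like the sketch below. It
mirrors the open/invoke/close shape of Flink's RichSinkFunction, but
MongoClientStub is a hypothetical stand-in for a real MongoDB driver, and
the whole class is illustrative, not the actual Flink API:

```python
class MongoClientStub:
    """Hypothetical stand-in for a real MongoDB client/driver."""
    def __init__(self):
        self.written = []

    def insert_many(self, docs):
        self.written.extend(docs)

class BatchingMongoSink:
    """Sketch of a sink: open() creates the client, invoke() buffers
    records, and full batches are flushed with a single insert_many call.
    Not transactional -- rely on idempotent upserts for effectively
    exactly-once results."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.buffer = []
        self.client = None

    def open(self):
        self.client = MongoClientStub()   # real code: connect to MongoDB here

    def invoke(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.client.insert_many(self.buffer)
            self.buffer = []

    def close(self):
        self.flush()                      # drain remaining records
```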

Best,

Konstantin

On Sat, Mar 28, 2020 at 1:47 PM Sivapragash Krishnan <
sivapragas...@gmail.com> wrote:

> Hi
>
> I'm working on creating a streaming pipeline which streams data from Kafka
> and stores in MongoDB using Flink scala.
>
> I'm able to successfully stream data from Kafka using FLink Scala. I'm not
> finding any support to store the data into MongoDB, could you please help
> me with the code snippet to store data into MongoDB.
>
> Thanks
> Siva
>


-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/12

2020-03-22 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest featuring "Flink Forward
Virtual Conference 2020", a small update on Flink 1.10.1, a better
Filesystem connector for the Table API & SQL, new source/sink interfaces
for the Table API and a bit more.

Flink Development
==

* [releases] For an update on the outstanding tickets
("Blocker"/"Critical") planned for Apache *Flink 1.10.1* please see the
overview posted by Yu Li in this release discussion thread [1].

* [sql] Timo has shared a proposal (FLIP-95) for *new TableSource and
TableSink interfaces*. It is based on discussions with Jark, Dawid,
Aljoscha, Kurt, Jingsong and many more. Its goals are to simplify the
current interface architecture, to support changelog sources (FLIP-105) and
to remove dependencies on the DataStream API as well as the planner
components. [2]

* [hadoop] Following up on a discussion [3] with Stephan and Till,
Sivaprasanna has shared an overview of Hadoop related utility components to
kick off a discussion on moving these into a separate module
"flink-hadoop-utils". [4]

* [sql] Jingsong Li has started a discussion on introducing a table source
that in essence generates a random stream of data of a given schema to
facilitate development and testing in Flink SQL [5].

* [sql] Jingsong Li has started a discussion on improving the filesystem
connector for the Table API. The current filesystem connector only supports
CSV format and can only be considered experimental for streaming use cases.
There seems to be a consensus to build on top of the existing
StreamingFileSink (DataStream API) and to focus on ORC, Parquet and better
Hive interoperability. [6]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-10-1-tp38689.html
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[3]
https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-a-new-module-flink-hadoop-utils-tp39107.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-TableFactory-for-StatefulSequenceSource-tp39116.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-115-Filesystem-connector-in-Table-tp38870.html


Notable Bugs
==

* [FLINK-16684] [1.10.0] [1.9.2] The builder of the StreamingFileSink does
not work in Scala. This is one of the blockers to drop support for the
BucketingSink (covered in last week's update). Resolved in Flink 1.10.1. [7]

[7] https://issues.apache.org/jira/browse/FLINK-16684

Events, Blog Posts, Misc
===

* Unfortunately, we had to cancel Flink Forward SF due to the spread of
SARS-CoV-2 two weeks ago. Instead, we will host a three-day virtual
Flink Forward conference, April 22 - 24. You can register for free at [8].

* Stefan Hausmann has published a blog post on how Apache Flink can be used
for streaming ETL on AWS (Kinesis, Kafka, ElasticSearch and S3
(StreamingFileSink)). [9]

* On the Ververica blog Nico Kruber presents a small benchmark comparing
the overhead of SSL encryption in Flink depending on the SSL provider (JDK
vs OpenSSL). The difference seems to be quite significant. [10]

* Upcoming Meetups: None.

[8] https://www.flink-forward.org/sf-2020
[9]
https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics
[10]
https://www.ververica.com/blog/how-openssl-in-ververica-platform-improves-your-flink-job-performance

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/11

2020-03-15 Thread Konstantin Knauf
bout the portable Apache Flink runner of Apache
Beam. [15]

* Upcoming Meetups: I personally believe all upcoming meetups in the
regions, I usually cover, will be cancelled. So, no update on this today.

[15]
https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/10

2020-03-08 Thread Konstantin Knauf
shop) [15]
* Zurich (March 10th, full-day workshop) [16]
* New Jersey (May 5th, meetup) [17]

[13]
https://www.meetup.com/apache-flink-aws-kinesis-hyd-india/events/268930388/
[14] https://www.meetup.com/Apache-Flink-Meetup/events/269005339/
[15] https://www.meetup.com/futureofdata-vienna/events/268418974/
[16] https://www.meetup.com/futureofdata-zurich/events/268423809/
[17] https://www.meetup.com/futureofdata-princeton/events/268830725/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: [DISCUSS] FLIP-111: Docker image unification

2020-03-08 Thread Konstantin Knauf
Hi Andrey,

thanks a lot for this proposal. The variety of Docker files in the project
has been causing quite some confusion.

For the entrypoint, have you considered also allowing configuration to be
set via environment variables, as in "docker run -e FLINK_REST_BIND_PORT=8081
..."? This is quite common and more flexible, e.g. it makes it very easy to
pass values of Kubernetes Secrets into the Flink configuration.
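
To illustrate, here is a sketch of how an entrypoint could translate such
environment variables into flink-conf.yaml entries. The FLINK_ prefix and
the underscore conventions (single underscore for ".", double underscore for
"-") are assumptions made up for this sketch, not an existing Flink
mechanism:

```python
def env_to_flink_conf(environ, prefix="FLINK_"):
    """Translate env vars such as FLINK_REST_BIND__PORT=8081 into
    flink-conf.yaml lines like 'rest.bind-port: 8081'.

    Convention (assumed): '__' becomes '-', remaining '_' becomes '.'.
    Variables without the prefix are ignored.
    """
    lines = []
    for name, value in sorted(environ.items()):
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):].lower().replace("__", "-").replace("_", ".")
        lines.append(f"{key}: {value}")
    return "\n".join(lines)
```

An entrypoint script would append the returned lines to flink-conf.yaml
before starting the JobManager or TaskManager process.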

With respect to logging, I would opt to keep this very basic and to only
support logging to the console (maybe with a fix for the web user
interface). For everything else, users can easily build their own images
based on library/flink (provide the dependencies, change the logging
configuration).

Cheers,

Konstantin


On Thu, Mar 5, 2020 at 11:01 AM Yang Wang  wrote:

> Hi Andrey,
>
>
> Thanks for driving this significant FLIP. From the user ML, we could also
> know there are
> many users running Flink in container environment. Then the docker image
> will be the
> very basic requirement. Just as you say, we should provide a unified place
> for all various
> usage(e.g. session, job, native k8s, swarm, etc.).
>
>
> > About docker utils
>
> I really like the idea to provide some utils for the docker file and entry
> point. The
> `flink_docker_utils` will help to build the image easier. I am not sure
> about the
> `flink_docker_utils start_jobmaster`. Do you mean when we build a docker
> image, we
> need to add `RUN flink_docker_utils start_jobmaster` in the docker file?
> Why do we need this?
>
>
> > About docker entry point
>
> I agree with you that the docker entry point could more powerful with more
> functionality.
> Mostly, it is about to override the config options. If we support dynamic
> properties, i think
> it is more convenient for users without any learning curve.
> `docker run flink session_jobmanager -D rest.bind-port=8081`
>
>
> > About the logging
>
> Updating the `log4j-console.properties` to support multiple appender is a
> better option.
> Currently, the native K8s is suggesting users to debug the logs in this
> way[1]. However,
> there is also some problems. The stderr and stdout of JM/TM processes
> could not be
> forwarded to the docker container console.
>
>
> [1].
> https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html#log-files
>
>
> Best,
> Yang
>
>
>
>
> Andrey Zagrebin  于2020年3月4日周三 下午5:34写道:
>
>> Hi All,
>>
>> If you have ever touched the docker topic in Flink, you
>> probably noticed that we have multiple places in docs and repos which
>> address its various concerns.
>>
>> We have prepared a FLIP [1] to simplify the perception of docker topic in
>> Flink by users. It mostly advocates for an approach of extending official
>> Flink image from the docker hub. For convenience, it can come with a set
>> of
>> bash utilities and documented examples of their usage. The utilities allow
>> to:
>>
>>- run the docker image in various modes (single job, session master,
>>task manager etc)
>>- customise the extending Dockerfile
>>- and its entry point
>>
>> Eventually, the FLIP suggests to remove all other user facing Dockerfiles
>> and building scripts from Flink repo, move all docker docs to
>> apache/flink-docker and adjust existing docker use cases to refer to this
>> new approach (mostly Kubernetes now).
>>
>> The first contributed version of Flink docker integration also contained
>> example and docs for the integration with Bluemix in IBM cloud. We also
>> suggest to maintain it outside of Flink repository (cc Markus Müller).
>>
>> Thanks,
>> Andrey
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification
>>
>

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/09

2020-03-01 Thread Konstantin Knauf
Dear community,

happy to share this week's community update. It was a relatively quiet week
on the dev@ mailing list (mostly votes on previously covered FLIPs), but
there is always something to share. Additionally, I have decided to also
feature *flink-packages.org <http://flink-packages.org>* in this newsletter
going forward. Depending on the level of activity, I will cover newly added
packages or introduce one of the existing packages.

Flink Development
==

* [sql] Dawid has started a discussion to enable Table API/SQL sources to
read columns from different parts of source records. With this it would,
for example, be possible to read the partition, timestamp or offset of a
Kafka source record. Similarly, it would be possible to override the
partitioning when writing to Kafka or Kinesis. [1]

* [sql, python] FLIP-58 introduced Python UDFs in SQL and Table API.
FLIP-79 added a Function DDL in Flink SQL to register Java & Scala UDFs in
pure SQL. Based on these two FLIPs, Wei Zhong published FLIP-106 to also
support Python UDFs in the SQL Function DDL. [2]

* [development] Chesnay started a discussion on Eclipse support for Apache
Flink (framework) development. If you are using Eclipse as an Apache Flink
contributor, please get involved in the thread. [3]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-107-Reading-table-columns-from-different-parts-of-source-records-tp38277.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-tp38107.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-Eclipse-specific-plugin-configurations-tp38255.html

Notable Bugs
==

[FLINK-16262] [1.10.0] The FlinkKafkaProducer cannot be used in
EXACTLY_ONCE mode when using the user code classloader. For application
clusters (per-job clusters) you can work around this issue by using the
system classloader (user jar in the lib/ directory). Will be fixed in 1.10.1.
[4]

[4] https://issues.apache.org/jira/browse/FLINK-16262

flink-packages.org
=

DTStack, a Chinese cloud technology company, has recently published FlinkX
[5] on flink-packages.org. The documentation is Chinese only, but it seems
to be a configuration-based integration framework based on Apache Flink
with an impressive set of connectors.

[5] https://flink-packages.org/packages/flinkx

Events, Blog Posts, Misc
===

* This week I stumbled across this Azure tutorial to use Event Hubs with
Apache Flink. [6]
* Gökce Sürenkök has written a blog post on setting up a highly available
Flink cluster on Kubernetes based on Zookeeper for Flink Master failover
and HDFS as checkpoint storage. [7]

* Upcoming Meetups
* On March 5th, Stephan Ewen will talk about Apache Flink Stateful
Functions at the Utrecht Data Engineering Meetup. [8]
* On March 12th, Prateep Kumar will host an online event comparing
Kafka Streams and Apache Flink [9].
* On April 22, Ververica will host the next Apache Flink meetup in
Berlin. [10]
* Cloudera is hosting a couple of "Future of Data" events on stream
processing with Apache Flink in
* Vienna (March 4th, full-day workshop) [11]
* Zurich (March 10th, full-day workshop) [12]
* New Jersey (May 5th, meetup) [13]

[6]
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-kafka-flink-tutorial
[7]
https://medium.com/hepsiburadatech/high-available-flink-cluster-on-kubernetes-setup-73b2baf9200e
[8] https://www.meetup.com/Data-Engineering-NL/events/268424399/
[9]
https://www.meetup.com/apache-flink-aws-kinesis-hyd-india/events/268930388/
[10] https://www.meetup.com/Apache-Flink-Meetup/events/269005339/
[11] https://www.meetup.com/futureofdata-vienna/events/268418974/
[12] https://www.meetup.com/futureofdata-zurich/events/268423809/
[13] https://www.meetup.com/futureofdata-princeton/events/268830725/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/07

2020-02-23 Thread Konstantin Knauf
Prateep Kumar will host an online event on log
analytics with Apache Flink [19].
* On March 5th, Stephan Ewen will talk about Apache Flink Stateful
Functions at the Utrecht Data Engineering Meetup. [20]
* Cloudera is hosting a couple of "Future of Data" events on stream
processing with Apache Flink in
* Budapest, (February 25th, meetup) [21]
* Vienna (March 4th, full-day workshop) [22]
* Zurich (March 10th, full-day workshop) [23]
* New Jersey (May 5th, meetup) [24]

[17]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Jingsong-Lee-becomes-a-Flink-committer-tp37938p38006.html
[18] https://flink.apache.org/news/2020/02/20/ddl.html
[19]
https://www.meetup.com/apache-flink-aws-kinesis-hyd-india/events/268907057/
[20] https://www.meetup.com/Data-Engineering-NL/events/268424399/
[21] https://www.meetup.com/futureofdata-budapest/events/268423538/
[22] https://www.meetup.com/futureofdata-vienna/events/268418974/
[23] https://www.meetup.com/futureofdata-zurich/events/268423809/
[24] https://www.meetup.com/futureofdata-princeton/events/268830725/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/07

2020-02-16 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with the release of Flink 1.10,
a proposal for better changelog support in Flink SQL, a documentation style
guide, the Flink Forward San Francisco schedule and a bit more.

Flink Development
==

* [releases] Apache Flink 1.10 has been released! [1] Read all about it in
Marta's release blog post [2]. With the release of Flink 1.10 (technically
already Flink 1.9.2), the maintenance of the "official" Flink Docker images
[3] has moved to the Apache Flink project. [4]

* [releases] Moreover, you can now also find apache-flink on PyPi [5]
  * https://pypi.org/project/apache-flink/1.9.2/#files
  * https://pypi.org/project/apache-flink/1.10.0/#files

* [releases] Chesnay published a third release candidate for flink-shaded
10.0 on Wednesday. Only +1s so far. [6]

* [sql] Jark has published FLIP-105, which proposes to support changelog
streams in Flink SQL. In essence, this means being able to interpret
changelogs (Debezium, compacted topics, ...) as a dynamic table in update
mode. Afterwards, the resulting continuously updating table could be
used directly in (temporal table) joins and aggregations. [7]

* [sql] Dawid has started a discussion on moving around packages in the
Table API in order to greatly simplify the required imports for users of
the Table API. [8]

* [cep] Shuai Xu proposes to support notFollowedBy() as the last part of a
pattern, if an additional "within" time interval is given. He is waiting
for feedback on his design document. [9]

* [development process] The documentation style guide [10] has finally been
merged. Thanks to Marta, Robert, Aljoscha and others for driving this.

* [connectors] Dawid proposes to drop the Elasticsearch 2.x and 5.x
connectors. There is a consensus to drop 2.x, but some uncertainty about the
current usage of the 5.x connector. Please get involved in that thread if you
are using the Elasticsearch 5.x sink right now. [11]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-10-0-released-tp37564.html
[2] https://flink.apache.org/news/2020/02/11/release-1.10.0.html
[3] https://hub.docker.com/_/flink
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/RESULT-VOTE-Integrate-Flink-Docker-image-publication-into-Flink-release-process-tp37096.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-Python-API-PyFlink-1-9-2-released-tp37597.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-flink-shaded-10-0-release-candidate-3-tp37567.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-105-Support-to-Interpret-and-Emit-Changelog-in-Flink-SQL-tp37665.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-TABLE-Issue-with-package-structure-in-the-Table-API-tp37623.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-notFollowedBy-with-interval-as-the-last-part-of-a-Pattern-tp37513.html
[10] https://flink.apache.org/contributing/docs-style.html
[11]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-connectors-for-Elasticsearch-2-x-and-5-x-tp37471.html

Notable Bugs
==

Not today.

Events, Blog Posts, Misc
===

* Andrew Torson of Salesforce has published a blog post on application log
analysis with *Apache Flink at Salesforce*. [12]

* The conference program for Flink Forward San Francisco is live now
including speakers from *AWS, Bird, Cloudera, Lyft, Netflix, Splunk, Uber,
Yelp, Alibaba, Ververica *and others! Use "FFSF20-MailingList" to get a 50%
discount on your conference pass. [13]

* Upcoming Meetups
   * On February 19th, Apache Flink Meetup London, "Monitoring and
Analysing Communication & Trade Events as Graphs", hosted by *Christos
Hadjinikolis* [14]

[12]
https://engineering.salesforce.com/application-log-intelligence-performance-insights-at-salesforce-using-flink-92955f30573f
[13] https://www.flink-forward.org/sf-2020/conference-program
[14] https://www.meetup.com/Apache-Flink-London-Meetup/events/268400545/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Head of Product

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/06

2020-02-09 Thread Konstantin Knauf
-for-unit-testing-in-apache-flink.html
[15] https://www.flink-forward.org/sf-2020/speakers
[16]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Community-Discounts-for-Flink-Forward-SF-2020-Registrations-td37055.html
[17] https://www.meetup.com/Apache-Flink-London-Meetup/events/268400545/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/05

2020-02-02 Thread Konstantin Knauf
Dear community,

comparatively quiet times on the dev@ mailing list, so I will keep it brief
and mostly share release-related updates today.

Flink Development
==

* [releases] Apache Flink 1.9.2 was released on Thursday. Check out the
release blog post [1] for details. [2]

* [releases] Gary has published and started a vote on the first release
candidate for Flink 1.10. The vote has failed due to incorrect license
information in one of the newly added modules. The community also found a
handful of additional issues, some of which might need to be fixed for the
next RC. [3]

* [docker] The Apache Flink community has unanimously approved the
integration of the Docker image publication into the Apache Flink release
process, which means https://github.com/docker-flink/docker-flink will move
under the Apache Flink project and the Apache Flink Docker images will
become official Flink releases approved by the Apache Flink PMC. [4]

[1] https://flink.apache.org/news/2020/01/30/release-1.9.2.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-2-released-tp37102.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-0-release-candidate-1-tp36985.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/RESULT-VOTE-Integrate-Flink-Docker-image-publication-into-Flink-release-process-tp37096.html

Notable Bugs
==

I am not aware of any notable user-facing bug in any of the released
versions.

Events, Blog Posts, Misc
===

* My colleague *Seth Wiesman* has published an excellent blog post on *state
evolution* in Flink covering state schema evolution, the State Processor
API and a look ahead. [5]

* Upcoming Meetups
  * On February 6 *Alexander Fedulov* will talk about Stateful Stream
Processing with Apache Flink at the R-Ladies meetup in Kyiv. [6]

[5]
https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html
[6] https://www.meetup.com/rladies-kyiv/events/267948988/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: Is there anything strictly special about sink functions?

2020-01-27 Thread Konstantin Knauf
Hi Andrew,

as far as I know there is nothing particularly special about the sink in
terms of how it handles state or time. You cannot leave the pipeline
"unfinished"; only sinks trigger the execution of the whole pipeline.

Cheers,

Konstantin



On Fri, Jan 24, 2020 at 5:59 PM Andrew Roberts  wrote:

> Hello,
>
> I’m trying to push some behavior that we’ve currently got in a large,
> stateful SinkFunction implementation into Flink’s windowing system. The
> task at hand is similar to what StreamingFileSink provides, but more
> flexible. I don’t want to re-implement that sink, because it uses the
> StreamingRuntimeContext.getProcessingTimeService() via a cast - that class
> is marked as internal, and I’d like to avoid the exposure to an interface
> that could change. Extending it similarly introduces complexity I would
> rather not add to our codebase.
>
> WindowedStream.process() provides more or less the pieces I need, but the
> stream continues on after a ProcessFunction - there’s no way to process()
> directly into a sink. I could use a ProcessFunction[In, Unit, Key, Window],
> and follow that immediately with a no-op sink that discards the Unit
> values, or I could just leave the stream “unfinished," with no sink.
>
> Is there a downside to either of these approaches? Is there anything
> special about doing sink-like work in a ProcessFunction or FlatMapFunction
> instead of a SinkFunction?
>
> Thanks,
>
> Andrew
>
>
>
> --
> *Confidentiality Notice: The information contained in this e-mail and any
> attachments may be confidential. If you are not an intended recipient, you
> are hereby notified that any dissemination, distribution or copying of this
> e-mail is strictly prohibited. If you have received this e-mail in error,
> please notify the sender and permanently delete the e-mail and any
> attachments immediately. You should not retain, copy or use this e-mail or
> any attachment for any purpose, nor disclose all or any part of the
> contents to any other person. Thank you.*
>


-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/04

2020-01-26 Thread Konstantin Knauf
Dear community,

happy to share a brief community digest after a rather quiet week, including
updates on Flink 1.9.2 and Flink 1.10 and the ongoing votes on FLIP-27
(New Source Interface) and FLIP-92 (N-Ary Stream Operator).

Flink Development
==

* [releases] Hequn has published the first release candidate for Apache
Flink 1.9.2 and is waiting for votes and feedback. [2]

* [releases] No feedback on the RC0 for Flink 1.10 so far. Still, I assume
a first (non-preview) release candidate for Flink 1.10 is just around the
corner, as only a few blockers remain on the release board [1].

* [runtime] Piotr has started a vote on adding an N-Ary Stream Operator
(FLIP-92). [3]

* [connectors] Becket has resumed the vote on the highly anticipated
FLIP-27, the new source interface. [4]

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=349=FLINK
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-9-2-release-candidate-1-tp36943.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-92-Add-N-Ary-Stream-Operator-in-Flink-tp36539.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-27-Refactor-Source-Interface-tp35569p36850.html

Notable Bugs
==

[FLINK-15575] [1.9.1] Incomplete shading/relocation of some dependencies of
the Azure filesystem can result in classloading conflicts. This only
applies to users using Application Clusters (Job Clusters) prior to Flink
1.10 as these do not use a dedicated classloader for user code. Resolved in
1.9.2 and 1.10. [5]

[FLINK-13758] [1.8.3] [1.9.1] As a user you can use Flink's distributed
cache to make files available to all user defined functions running in the
cluster. This did not work for files stored in HDFS for a while. Resolved
in 1.8.4, 1.9.2 and 1.10. [6]

[5] https://issues.apache.org/jira/browse/FLINK-15575
[6] https://issues.apache.org/jira/browse/FLINK-13758

Events, Blog Posts, Misc
===

* Yu Li is now an Apache Flink Committer. Congratulations! [7]

* Apache has published their "Apache in 2019 - By the Digits" blog post,
with Apache Flink ranking *3rd in #commits*, *1st in user@ mailing list
activity*, and *2nd in dev@ mailing list activity*. Congrats to the community! [8]

* Upcoming Meetups
  * On January 30th *Preetdeep Kumar* is hosting an introductory online
meetup in the Hyderabad Apache Flink Meetup group on Windows and Functions
in Apache Flink. [9]
  * On February 6 *Alexander Fedulov* will talk about Stateful Stream
Processing with Apache Flink at the R-Ladies meetup in Kyiv. [10]

[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Yu-Li-became-a-Flink-committer-tp36904.html
[8] https://blogs.apache.org/foundation/entry/apache-in-2019-by-the
[9]
https://www.meetup.com/Hyderabad-Apache-Flink-Meetup-Group/events/268080082/
[10] https://www.meetup.com/rladies-kyiv/events/267948988/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/03

2020-01-19 Thread Konstantin Knauf
Dear community,

happy to share this week's weekly community digest with a release candidate
for Flink 1.10, a Pulsar Catalog for Flink, a 50% discount code for Flink
Forward SF and bit more.

Flink Development
==

* [releases] The first (preview) *release candidate for Flink 1.10* has
been created. Every help in testing the release candidate is highly
appreciated. [1]

* [sql] I believe I have never covered the contribution of a *Pulsar
Catalog* to Flink's SQL ecosystem in the past. So here it is. Apache Pulsar
includes a schema registry out-of-the-box. With Flink's Pulsar catalog,
Pulsar topics will automatically be available as tables in Flink. Pulsar
namespaces are mapped to databases in Flink. [2,3]

* [deployment] End of last year Patrick Lucas had started a discussion on
integrating the *Flink Docker images *into the Apache Flink project. The
discussion has stalled a bit by now, but there seems to be a consensus that
a) the Flink Docker images should move to Apache Flink and b) the
Dockerfiles in apache/flink need to be consolidated. [4]

* [statebackends] When using the *RocksDBStateBackend*, *Timer* state is
currently - by default - still stored on the Java heap. Stephan proposes to
change this behaviour to store timers in RocksDB by default and has
received a lot of positive feedback. [5]
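For reference, the behaviour under discussion is already configurable today; a sketch of the relevant `flink-conf.yaml` entries (option name as documented for Flink 1.9/1.10):

```yaml
# flink-conf.yaml: opt in to storing timers in RocksDB instead of on the heap
state.backend: rocksdb
# HEAP is the current default; the proposal is to flip this to ROCKSDB
state.backend.rocksdb.timer-service.factory: ROCKSDB
```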

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-10-0-release-candidate-0-tp36770.html
[2]
https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-72-Introduce-Pulsar-Connector-tp33283.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Integrate-Flink-Docker-image-publication-into-Flink-release-process-tp36139.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-default-for-RocksDB-timers-Java-Heap-in-RocksDB-tp36720.html

Notable Bugs
==

* [FLINK-15577] [1.9.1] When using two different windows within one SQL
query, Flink might generate an invalid physical execution plan as it
incorrectly considers the two window transformations equivalent. More
details in the ticket. [6]

[6] https://issues.apache.org/jira/browse/FLINK-15577

Events, Blog Posts, Misc
===

* *Dian Fu* is now an Apache Flink Committer. Congratulations! [7]

* *Bird *has published an in-depth blog post on how they use Apache Flink
to detect offline scooters. [8]

* On the Flink Blog, Alexander Fedulov has published a first blog post of a
series on how to implement a *financial fraud detection use case* with
Apache Flink. [9]

* The extended *Call for Presentations for Flink Forward San Francisco*
ends today. Make sure to submit your talk in time. [10,11]

* Still on the fence whether to attend *Flink Forward San Francisco*? Let
me help you with that: when registering use discount code
*FFSF20-MailingList* to get a *50% discount* on your conference ticket.

* Upcoming Meetups
   * On January 22 my colleague *Alexander Fedulov *will talk about Fraud
Detection with Apache Flink at the Apache Flink Meetup in Madrid [12].
   * On February 6 *Alexander Fedulov* will talk about Stateful Stream
Processing with Apache Flink at the R-Ladies meetup in Kyiv. [13]

[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Dian-Fu-becomes-a-Flink-committer-tp36696p36760.html
[8]
https://medium.com/bird-engineering/replayable-process-functions-in-flink-time-ordering-and-timers-28007a0210e1
[9] https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Flink-Forward-San-Francisco-2020-Call-for-Presentation-extended-tp36595.html
[11] https://www.flink-forward.org/sf-2020
[12]
https://www.meetup.com/Meetup-de-Apache-Flink-en-Madrid/events/267744681/
[13] https://www.meetup.com/rladies-kyiv/events/267948988/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2020/02

2020-01-12 Thread Konstantin Knauf
Dear community,

happy new year from my side, too, and thanks a lot to Hequn for helping out
with the weekly updates during the last three weeks! I enjoyed reading
these myself for a change.

This week's community digest features an update on Flink 1.10 release
testing, a proposal for a SQL catalog to read the schema of relational
databases and the Call for Presentations of Flink Forward San Francisco.

Flink Development
==

* [releases] The community is still testing and fixing bugs for *Flink 1.10*.
You can follow the effort on the release burndown board [1]. Should not be
too long until a first RC is ready.

* [sql] Bowen proposes to add a *JDBC and Postgres Catalog *to the Table
API. By this, Flink could automatically create tables corresponding to the
tables in relational databases. Currently, users need to manually create
corresponding tables (incl. schema) on the Flink-side. [2,3]

* [configuration] Xintong proposes to change some of the default values for
Flink's memory configuration following his work on *FLIP-49 *and is looking
for feedback [4]

* [datastream api] Congxian proposes to unify the handling of adding "null"
values to *AppendingState* across statebackends. The proposed behavior is
to make all statebackends refuse "null" values. [5]

[1]
https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=349=FLINK
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+JDBC+catalog+and+Postgres+catalog
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-92-JDBC-catalog-and-Postgres-catalog-tp36505.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-Tuning-FLIP-49-configuration-default-values-td36528.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Make-AppendingState-add-refuse-to-add-null-element-tp36493.html

Notable Bugs
==

A lot of activity due to release testing, but I did not catch any new
notable bugs for already released versions.

Events, Blog Posts, Misc
===

* *Flink Forward San Francisco Call for Presentations *is ending soon, but
you still have a chance to submit your talk to the one (and possibly only)
Apache Flink community conference in North America. In case of questions or
if you are unsure whether to submit, feel free to reach out to me
personally. [6]

* Upcoming Meetups
  * On January 18th *Preetdeep Kumar* will introduce the basics of the
Flink DataStream API, followed by a hands-on demo. This will be an online
event; see the meetup link for more details. [7]
  * On January 22 my colleague *Alexander Fedulov *will talk about Fraud
Detection with Apache Flink at the Apache Flink Meetup in Madrid [8].

[6] https://www.flink-forward.org/sf-2020
[7]
https://www.meetup.com/Hyderabad-Apache-Flink-Meetup-Group/events/267610014/
[8] https://www.meetup.com/Meetup-de-Apache-Flink-en-Madrid/events/267744681/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


Re: [ANNOUNCE] Weekly Community Update 2019/50

2019-12-17 Thread Konstantin Knauf
Hi Hequn,

thanks, and thanks for the offer. Of course, you can cover the holiday
break, i.e. the next three weeks. Looking forward to your updates!

Cheers,

Konstantin

On Mon, Dec 16, 2019 at 5:53 AM Hequn Cheng  wrote:

> Hi Konstantin,
>
> Happy holidays and thanks a lot for your great job on the updates
> continuously.
> With the updates, it is easier for us to catch up with what's going on in
> the community, which I think is quite helpful.
>
> I'm wondering if I can help out and cover this during your vacation. :)
>
> Best,
> Hequn
>
> On Sun, Dec 15, 2019 at 11:36 PM Konstantin Knauf <
> konstan...@ververica.com> wrote:
>
>> Dear community,
>>
>> happy to share this week's brief community digest with updates on Flink
>> 1.8.3 and Flink 1.10, a discussion on how to facilitate easier Flink/Hive
>> setups, a couple of blog posts and a bit more.
>>
>> *Personal Note:* Thank you for reading these updates since I started
>> them early this year. I will take a three week Christmas break and will be
>> back with a Holiday season community update on the 12th of January.
>>
>> Flink Development
>> ==
>>
>> * [releases] Apache Flink 1.8.3 was released on Wednesday. [1,2]
>>
>> * [releases] The feature freeze for Apache Flink 1.10 took place on Monday.
>> The community is now working on testing, bug fixes and improving the
>> documentation in order to create a first release candidate soon. [3]
>>
>> * [development process] Seth has revived the discussion on a past PR by
>> Marta, which added a documentation style guide to the contributor guide.
>> Please check it [4] out, if you are contributing documentation to Apache
>> Flink. [5]
>>
>> * [security] Following a recent report to the Flink PMC of "exploiting"
>> the Flink Web UI for remote code execution, Robert has started a discussion
>> on how to improve the tooling/documentation to make users aware of this
>> possibility and recommend securing this interface in production setups. [6]
>>
>> * [sql] Bowen has started a discussion on how to simplify the Flink-Hive
>> setup for new users as currently users need to add some additional
>> dependencies to the classpath manually. The discussion seems to be
>> converging towards providing a single hive-uber jar, which contains all the
>> required dependencies. [7]
>>
>> [1] https://flink.apache.org/news/2019/12/11/release-1.8.3.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-3-released-tp35868.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Feature-freeze-for-Apache-Flink-1-10-0-release-tp35139.html
>> [4] https://github.com/apache/flink-web/pull/240
>> [5]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Docs-Style-Guide-Review-tp35758.html
>> [6]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-documentation-tooling-around-security-of-Flink-tp35898.html
>> [7]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-tp35918.html
>>
>> Notable Bugs
>> ==
>>
>> [FLINK-15152] [1.9.1] When a "stop" action on a job fails, because not
>> all tasks are in "RUNNING" state the job is not checkpointing afterwards.
>> [8]
>>
>> [8] https://issues.apache.org/jira/browse/FLINK-15152
>>
>> Events, Blog Posts, Misc
>> ===
>>
>> * Zhu Zhu is now an Apache Flink Comitter. Congratulations! [9]
>>
>> * Gerred Dillon has published a blog post on the Apache Flink blog on how
>> to run Flink on Kubernetes with a KUDO Flink operator. [10]
>>
>> * In this blog post Apache Flink PMC Sun Jincheng outlines the reasons
>> and motivation for his and his colleague's work to provide a world-class
>> Python support for Apache Flink's Table API. [11]
>>
>> * Upcoming Meetups
>> * On December 17th there will be the second Apache Flink meetup in
>> Seoul. [12] *Dongwon* has shared a detailed agenda in last weeks
>> community update. [13]
>> * On December 18th Alexander Fedulov will talk about Stateful Stream
>> Processing with Apache Flink at the Java Professionals Meetup in Minsk. [14]
>>
>> [9]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Zhu-Zhu-becomes-a-Flink-committer-tp35944.html
>> [10] https://flink.apache.org/news/2019/12/09/flink-kubernetes-kudo.html
&g

[ANNOUNCE] Weekly Community Update 2019/50

2019-12-15 Thread Konstantin Knauf
Dear community,

happy to share this week's brief community digest with updates on Flink
1.8.3 and Flink 1.10, a discussion on how to facilitate easier Flink/Hive
setups, a couple of blog posts and a bit more.

*Personal Note:* Thank you for reading these updates since I started them
early this year. I will take a three week Christmas break and will be back
with a Holiday season community update on the 12th of January.

Flink Development
==

* [releases] Apache Flink 1.8.3 was released on Wednesday. [1,2]

* [releases] The feature freeze for Apache Flink took place on Monday. The
community is now working on testing, bug fixes and improving the
documentation in order to create a first release candidate soon. [3]

* [development process] Seth has revived the discussion on a past PR by
Marta, which added a documentation style guide to the contributor guide.
Please check it out [4] if you are contributing documentation to Apache
Flink. [5]

* [security] Following a recent report to the Flink PMC of "exploiting" the
Flink Web UI for remote code execution, Robert has started a discussion on
how to improve the tooling/documentation to make users aware of this
possibility and recommend securing this interface in production setups. [6]

* [sql] Bowen has started a discussion on how to simplify the Flink-Hive
setup for new users as currently users need to add some additional
dependencies to the classpath manually. The discussion seems to conclude
towards providing a single additional hive-uber jar, which contains all the
required dependencies. [7]

[1] https://flink.apache.org/news/2019/12/11/release-1.8.3.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-3-released-tp35868.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Feature-freeze-for-Apache-Flink-1-10-0-release-tp35139.html
[4] https://github.com/apache/flink-web/pull/240
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Docs-Style-Guide-Review-tp35758.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-documentation-tooling-around-security-of-Flink-tp35898.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-tp35918.html

Notable Bugs
==

[FLINK-15152] [1.9.1] When a "stop" action on a job fails because not all
tasks are in "RUNNING" state, the job does not checkpoint afterwards. [8]

[8] https://issues.apache.org/jira/browse/FLINK-15152

Events, Blog Posts, Misc
===

* Zhu Zhu is now an Apache Flink Committer. Congratulations! [9]

* Gerred Dillon has published a blog post on the Apache Flink blog on how
to run Flink on Kubernetes with a KUDO Flink operator. [10]

* In this blog post, Apache Flink PMC member Sun Jincheng outlines the
reasons and motivation for his and his colleagues' work to provide
world-class Python support for Apache Flink's Table API. [11]

* Upcoming Meetups
* On December 17th there will be the second Apache Flink meetup in
Seoul. [12] *Dongwon* has shared a detailed agenda in last week's community
update. [13]
* On December 18th Alexander Fedulov will talk about Stateful Stream
Processing with Apache Flink at the Java Professionals Meetup in Minsk. [14]

[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Zhu-Zhu-becomes-a-Flink-committer-tp35944.html
[10] https://flink.apache.org/news/2019/12/09/flink-kubernetes-kudo.html
[11]
https://developpaper.com/why-will-apache-flink-1-9-0-support-the-python-api/
[12] https://www.meetup.com/Seoul-Apache-Flink-Meetup/events/266824815/
[13]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Weekly-Community-Update-2019-48-td35423.html
[14] https://www.meetup.com/Apache-Flink-Meetup-Minsk/events/267134296/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2019/49

2019-12-08 Thread Konstantin Knauf
ica blog including a short summary of the keynotes. [14]

* Upcoming Meetups
* On December 17th there will be the second Apache Flink meetup in
Seoul. [15] *Dongwon* has shared a detailed agenda in last week's community
update. [16]

[14] https://www.ververica.com/blog/flink-forward-asia-2019-summary
[15] https://www.meetup.com/Seoul-Apache-Flink-Meetup/events/266824815/
[16]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Weekly-Community-Update-2019-48-td35423.html

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2019/48

2019-12-01 Thread Konstantin Knauf
Dear community,

happy to share a short community update this week. With one week to go to
the planned feature freeze for Flink 1.10 and Flink Forward Asia in Beijing,
the dev@ mailing list has been pretty quiet these days.

Flink Development
==

* [releases] Hequn has started the vote on RC1 for Flink 1.8.3, which
unfortunately has already received a -1 due to wrong/missing license
information. Expecting a new RC soon. [1]

* [sql] In the past, timestamp fields in Flink SQL were internally
represented as longs, and it was recommended to use longs directly in
user-defined functions. With the introduction of a new TimestampType, the
situation has changed, and conversion between long and TIMESTAMP will be
disabled. [2]
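As a rough illustration of the distinction (a generic sketch, not code from the linked thread): making the conversion between an epoch-millis long and a proper timestamp value explicit keeps the time semantics visible, which is roughly what a dedicated TIMESTAMP type enforces.

```python
from datetime import datetime, timezone

def millis_to_timestamp(epoch_millis: int) -> datetime:
    """Explicitly turn an epoch-millis long into a timezone-aware timestamp."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)

def timestamp_to_millis(ts: datetime) -> int:
    """Explicitly turn a timestamp back into epoch millis."""
    return int(ts.timestamp() * 1000)

# Round-tripping makes the long <-> timestamp conversion explicit rather
# than relying on an implicit internal representation.
ts = millis_to_timestamp(1_575_158_400_000)
print(ts.isoformat())           # 2019-12-01T00:00:00+00:00
print(timestamp_to_millis(ts))  # 1575158400000
```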

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-8-3-release-candidate-1-tp35401p35407.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Disable-conversion-between-TIMESTAMP-and-Long-in-parameters-and-results-of-UDXs-tp35269p35271.html

Notable Bugs
==

* [FLINK-14930] [1.9.1] The OSS filesystem did not allow the configuration
of additional credentials providers due to a shading-related bug. Resolved
for 1.9.2. [3]

[3] https://issues.apache.org/jira/browse/FLINK-14930

Events, Blog Posts, Misc
===

* Flink Forward Asia took place this week at the National Congress Center
in Beijing organized by Alibaba. Talks by Ververica, Tencent, Baidu,
Alibaba, Dell, Lyft, Netflix, Cloudera and many other members of the
Chinese Apache Flink community, and more than 1500 attendees as far as I
heard. Impressive! [4]

* At Flink Forward Asia, Alibaba announced that it has open sourced Alink, a
machine learning library on top of Apache Flink. [5,6]

* Upcoming Meetups
* The first meetup of the Apache Flink Meetup Chicago on 5th of
December comes with four talks(!) highlighting different deployment methods
of Apache Flink (AWS EMR, AWS Kinesis Analytics, Ververica Platform, IBM
Kubernetes). Talks by *Trevor Grant*, *Seth Wiesman*, *Joe Olson* and
*Margarita
Uk*. [7]
* On December 17th there will be the second Apache Flink meetup in
Seoul. Maybe Dongwon can share the list of speakers in this thread; my
Korean is a bit rusty. [8]

[4] https://m.aliyun.com/markets/aliyun/developer/ffa2019
[5]
https://technode.com/2019/11/28/alibaba-cloud-machine-learning-platform-open-source/
[6] https://github.com/alibaba/Alink/blob/master/README.en-US.md
[7]
https://www.meetup.com/Chicago-Apache-Flink-Meetup-CHAF/events/266609828/
[8] https://www.meetup.com/Seoul-Apache-Flink-Meetup/events/266824815/

Cheers,

Konstantin (@snntrable)



[ANNOUNCE] Weekly Community Update 2019/47

2019-11-24 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with flink-packages.org,
an update on the release of Flink 1.8.3, Flink Forward San Francisco CfP
and a couple of smaller discussions and proposals.

Flink Development
==

* [ecosystem] Ververica has announced the launch of flink-packages.org. It
is a registry of Apache Flink ecosystem projects (connectors, metrics
reporters, operational tools, ...) to make it easier to find and promote
these projects in the community. Everyone can add their own projects and
vote on listed projects. [1]

* [releases] Now that the vote for flink-shaded 9.0 has passed [2], I
expect to see a first release candidate for Flink 1.8.3 soon as well. [3]

* [releases] The feature freeze for Flink 1.10 was announced for December
8th. The release branch will be cut that date. [4]

* [connectors] As part of Pulsar connector contribution, Yijie Shen has
shared a first design document for the integration of Pulsar's catalog with
Apache Flink [5]

* [sql] Dawid has published a FLIP (FlIP-87) to support primary key
constraints in Apache Flink SQL. Primary Keys (and unique constraints) can
be used for query optimization and primary keys are natural keys for upsert
streams. [6]

* [checkpointing] Shuwen Zhou has started a discussion on a more flexible
configuration of checkpointing times (e.g. cron-style). [7]

* [webui] Chesnay proposes to remove the old web user interface. It was
replaced by a new implementation in Flink 1.9 and was kept around as a
backup. [8]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Launch-of-flink-packages-org-A-website-to-foster-the-Flink-Ecosystem-tp35091.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-flink-shaded-9-0-release-candidate-1-tp35146.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-3-tp34811.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Feature-freeze-for-Apache-Flink-1-10-0-release-tp35139.html
[5]
https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Primary-keys-in-Table-API-tp35138.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Cron-style-for-checkpoint-tp35194.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-old-WebUI-tp35218.html

Notable Bugs
==

Nothing worth mentioning came to my attention.

Events, Blog Posts, Misc
===

* The Call for Presentation for Flink Forward San Francisco is now open.
The conference will happen March 23rd - 25th (two days conference, one day
training). It is the first time that we will have two full conference days
in San Francisco. [9]

* Upcoming Meetups
* We will have our next Apache Flink Meetup in Munich on November 27th
with talks by *Heiko Udluft & Giuseppe Sirigu*, Airbus, and *Konstantin
Knauf* (on Stateful Functions). [10]
* There will be an introduction to Apache Flink, use cases and best
practices at the next Uber Engineering meetup in Toronto. If you live in
Toronto, it's an excellent opportunity to get started with Flink or to meet
local Flink users. [11]
* The first meetup of the Apache Flink Meetup Chicago on 5th of
December comes with four talks(!) highlighting different deployment methods
of Apache Flink (AWS EMR, AWS Kinesis Analytics, Ververica Platform, IBM
Kubernetes). Talks by *Trevor Grant*, *Seth Wiesman*, *Joe Olson* and
*Margarita
Uk*. [12]

[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-Forward-North-America-2020-Call-for-Presentations-open-until-January-12th-2020-tp35187.html
[10] https://www.meetup.com/Apache-Flink-Meetup-Munich/events/266072196/
[11]
https://www.meetup.com/Uber-Engineering-Events-Toronto/events/266264176/
[12]
https://www.meetup.com/Chicago-Apache-Flink-Meetup-CHAF/events/266609828/

Cheers,

Konstantin (@snntrable)



Weekly Community Update 2019/46

2019-11-17 Thread Konstantin Knauf
Dear community,

happy to share this week's community update. While slowly approaching the
planned feature freeze for Flink 1.10, things have calmed down a bit on the
dev mailing list. Still, there are a couple of interesting topics to cover.
This week's digest includes an update on the Flink 1.8.3 release, a
proposal for a KafkaMetricsReporter, a FLIP to improve the configuration of
Table API connectors and a bit more.

Flink Development
==

* [releases] With respect to the release of *Flink 1.8.3* the main
question seems to be whether to wait for flink-shaded 9.0. If not, a first
release candidate can be expected within the next week. [1]

* [releases] Chesnay proposes to release *flink-shaded 9.0* soon and is
looking for someone to manage this release. [2]

* [connectors] Stephan has shared a quick update on the progress of *FLIP-27
(New Source Interface)*: a first version will likely be available in 1.10,
but it will probably take another release until connectors are migrated and
things settle down a bit. [3]

* [connectors, sql] Jark has started a FLIP discussion to *improve the
properties format* of Table API/SQL connectors. It contains a whole list of
smaller and larger improvements and it seems this will be targeted for the
1.11 release. [4]

* [metrics] Gyula has started a discussion on contributing a
*MetricsReporter* to write Flink's metrics to *Apache Kafka*. [5]

* [development process] Dian Fu proposes to introduce a *security@f.a.o*
mailing list for users to report security-related issues. There is a lot of
positive feedback, but also some concerns, e.g. because there is already a
cross-project secur...@apache.org mailing list. [6]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-3-tp34811.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Release-flink-shaded-9-0-tp35041.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-tp24952.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-86-Improve-Connector-Properties-tp34922.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSSION-Kafka-Metrics-Reporter-tp35067.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tp34950p34951.html

Notable Bugs
==

Not much is happening here, so let's look at two edge cases, which might
help some users.

* [FLINK-13184] [1.9.1] [1.8.2] If you are deploying to YARN and start a
lot of Taskmanagers (1000s), the Resourcemanager might be
blocked/unresponsive for quite some time, so that heartbeats of Taskmanagers
start timing out. Yang Wang is working on a fix, which might get into
1.8.3. [7]

* [FLINK-14066] If you try to *build* PyFlink on Windows, this will fail, as
we use a UNIX-specific path to the local Flink distribution's build target.
The fix is contained in this ticket. [8]

[7] https://issues.apache.org/jira/browse/FLINK-13184
[8] https://issues.apache.org/jira/browse/FLINK-14066
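As a generic illustration of this class of bug (a sketch, not the actual FLINK-14066 patch; the directory names are made up): building paths with `pathlib` instead of hard-coded "/" separators keeps a build script portable between UNIX and Windows.

```python
from pathlib import Path

def dist_target_dir(base: str) -> str:
    # Portable: Path inserts the platform's separator, instead of a
    # hard-coded UNIX-only "/" as in a string like base + "/flink-dist/target".
    return str(Path(base) / "flink-dist" / "target")

# On a POSIX system this prints "build/flink-dist/target"; on Windows it
# would use backslashes instead.
print(dist_target_dir("build"))
```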

Events, Blog Posts, Misc
===

* I missed this excellent *Flink Forward Europe* recap by my colleagues
*Robert* and *Fabian*, published on the 1st of November on the Ververica
blog, so here it is. [9]

* Upcoming Meetups
* *Bowen Li* will speak about "The Rise of Apache Flink and Stream
Processing" at the next Big Data Bellevue in Seattle on the 20th of
November. [10]
* The next edition of the Bay Area Apache Flink meetup will happen on
the 20th of November with talks by *Gyula Fora (Cloudera)* and *Lakshmi Rao
(Lyft)*.[11]
* We will have our next Apache Flink Meetup in Munich on November 27th
with talks by *Heiko Udluft & Giuseppe Sirigu*, Airbus, and *Konstantin
Knauf* (on Stateful Functions). [12]
* There will be an introduction to Apache Flink, use cases and best
practices at the next Uber Engineering meetup in Toronto. If you live in
Toronto, it's an excellent opportunity to get started with Flink or to meet
local Flink users. [13]

[9] https://www.ververica.com/blog/flink-forward-europe-2019-recap
[10] https://www.meetup.com/Big-Data-Bellevue-BDB/events/fxbnllyzpbbc/
[11] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/266226960/
[12] https://www.meetup.com/Apache-Flink-Meetup-Munich/events/266072196/
[13]
https://www.meetup.com/Uber-Engineering-Events-Toronto/events/266264176/

Cheers,

Konstantin (@snntrable)


[ANNOUNCE] Weekly Community Update 2019/45

2019-11-10 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with updates on stateful
functions & Flink 1.8.3, a discussion on a connector for
Hortonworks/Cloudera's schema registry, a couple of meetups and a bit more.

Flink Development
==

* [stateful functions] After the successful vote to accept *Stateful
Functions* into Flink, Igal has started a thread to discuss a few details of
the contribution like repository name, mailing lists, component name, etc.
[1]

* [releases] Jingcheng has started a conversation about the release of *Flink
1.8.3*. Probably still waiting for a few fixes to come in, but looks like
there could be a first RC soon. [2]

* [connectors] Őrhidi Mátyás and Gyula propose to contribute a connector
for *Hortonworks/Cloudera Schema Registry*, which can be used during
de-/serialization in Flink's Kafka Connector. [3]

* [python] Apache Flink will start Python processes to execute *Python UDFs*
in the Table API, planned for 1.10. Dian Fu has published a proposal how to
integrate the resource requirements of these Python processes into the
unified memory configuration framework, which is currently introduced in
FLIP-49. [4]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stateful-Functions-Contribution-Details-tp34737.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-3-tp34811.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Avro-Cloudera-Registry-FLINK-14577-tp34647.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tp34631.html

Notable Bugs
==

* [FLINK-14382] [1.9.1] [1.8.2] [yarn] In Flink 1.9+ filesystem
dependencies can be loaded via a plugin mechanism, each with its own
classloader. This currently does not work on YARN, where the plugin
directory is directly added to the classpath instead. [5]

[5] https://issues.apache.org/jira/browse/FLINK-14382

Events, Blog Posts, Misc
===

* *Jark Wu* is now a member of the Apache Flink PMC. Congratulations! [6]
* This blog post by *Sreekanth Krishnavajjala & Vinod Kataria (AWS)*
includes a hands-on introduction to Apache Flink on AWS EMR. [7]
* Upcoming Meetups
* At the next Athens Big Data Group on the 14th of November, *Chaoran
Yu* of Lightbend will talk about Flink and Spark on Kubernetes. [8]
* *Bowen Li* will speak about "The Rise of Apache Flink and Stream
Processing" at the next Big Data Bellevue in Seattle on the 20th of
November. [9]
* The next edition of the Bay Area Apache Flink meetup will happen on
the 20th of November with talks by *Gyula Fora (Cloudera)* and *Lakshmi Rao
(Lyft)*.[10]
* We will have our next Apache Flink Meetup in Munich on November 27th
with talks by *Heiko Udluft & Giuseppe Sirigu*, Airbus, and *Konstantin
Knauf* (on Stateful Functions). [11]

[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Jark-Wu-is-now-part-of-the-Flink-PMC-tp34768.html
[7]
https://idk.dev/extract-oracle-oltp-data-in-real-time-with-goldengate-and-query-from-amazon-athena/
[8] https://www.meetup.com/Athens-Big-Data/events/265957761/
[9] https://www.meetup.com/Big-Data-Bellevue-BDB/events/fxbnllyzpbbc/
[10] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/266226960/
[11] https://www.meetup.com/Apache-Flink-Meetup-Munich/events/266072196/

Cheers,

Konstantin (@snntrable)



[ANNOUNCE] Weekly Community Update 2019/44

2019-11-03 Thread Konstantin Knauf
Dear community,

happy to share this week's community digest with updates on the ongoing
release cycle, end-to-end performance testing, changes to the Table API and
a discussion on the Flink Per-Job Mode (aka Application Clusters).

Flink Development
==

* [releases] We have about one month until the planned feature freeze
for *Flink
1.10* and Gary has shared another status update. Check it out for a good
overview of the ongoing development threads. [1]

* [development process] Yu proposes to integrate *end-to-end performance
tests* into our build process. For this he has published FLIP-83 [2], which
describes two benchmark jobs and a list of configuration scenarios to
consider. In a first step the tests would focus on throughput in a small
standalone cluster. [3]

* [development process] Yu reminds all FLIP authors to keep the *FLIP
status* in Confluence up-to-date to facilitate release planning. [4]

* [sql] Terry Wang proposes a few changes to the *API of the
TableEnvironment* (TableEnvironment#execute/sqlQuery/sqlUpdate etc.) to
better support asynchronous submission and multi-line SQL statements and to
make the API more consistent overall. [5]

* [clients] Tison has started a discussion on *Flink's per-job mode* and
highlights two shortcomings of the current implementation. First, the
per-job mode only allows a single JobGraph to be executed. Second, on YARN,
the JobGraph is compiled on the client side, not in the Flink Master as in
the Standalone Per-Job mode. There has already been quite some feedback on
the proposal, which mostly emphasizes the need for multiple JobGraphs in
per-job mode and advantages of the current client-side compilation of the
JobGraph. [6]

* [connectors] *flink-orc* has been moved from "flink-connectors" to
"flink-formats". [7,8]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Progress-of-Apache-Flink-1-10-2-tp34585.html
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-83-Flink-End-to-end-Performance-Testing-Framework-tp34517.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Reminder-please-update-FLIP-document-to-latest-status-tp34555.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Improve-Refactor-API-of-Table-Module-tp34537.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Semantic-and-implementation-of-per-job-mode-tp34502p34520.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Move-flink-orc-to-flink-formats-from-flink-connectors-tp34438.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Move-flink-orc-to-flink-formats-from-flink-connectors-tp34496.html


Notable Bugs
==

* [FLINK-14546] [1.9.1] [1.8.2] The *RabbitMQSource *might leave open
consumers around after a job is cancelled or stopped. PR available. [9]

[9] https://issues.apache.org/jira/browse/FLINK-14546

Events, Blog Posts, Misc
===

* *Becket Qin* is now a member of the Apache Flink PMC. Congratulations!
[10]

* *Eura Nova* has published a Flink Forward Europe recap on their blog
including key takeaways and summaries of selected talks. [11]

* *Preetdeep Kumar* has published the first part of a series of articles on
Streaming ETL with Apache Flink on DZone. [12]

* Upcoming Meetups
* There will be a Flink/Spark talk at the next Chicago Big Data [13] on
the 7th of November. No idea what it will be about (can't join the group) :)
* At the next Athens Big Data Group on the 14th of November, *Chaoran
Yu* of Lightbend will talk about Flink and Spark on Kubernetes. [14]
* We will have our next Apache Flink Meetup in Munich on November 27th
with talks by *Heiko Udluft & Giuseppe Sirigu*, Airbus, and *Konstantin
Knauf* (on Stateful Functions). [15]

[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Becket-Qin-joins-the-Flink-PMC-tp34400p34452.html
[11] https://research.euranova.eu/flink-forward-the-key-takeaways/
[12]
https://dzone.com/articles/introduction-to-streaming-etl-with-apache-flink
[13]
https://www.meetup.com/Chicago-area-Hadoop-User-Group-CHUG/events/265675851/
[14] https://www.meetup.com/Athens-Big-Data/events/265957761/
[15] https://www.meetup.com/Apache-Flink-Meetup-Munich/events/266072196/

Cheers,

Konstantin (@snntrable)



[ANNOUNCE] Weekly Community Update 2019/43

2019-10-27 Thread Konstantin Knauf
Group-CHUG/events/265675851/
[12] https://www.meetup.com/Athens-Big-Data/events/265957761/
[13] https://www.meetup.com/Bangalore-Apache-Kafka-Group/events/265285812/

Cheers,

Konstantin (@snntrable)



[ANNOUNCE] Weekly Community Update 2019/42

2019-10-20 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with the release of Flink
1.9.1, a couple of threads about our development process, the SQL ecosystem
and more.

Flink Development
==

* [releases] *Apache Flink 1.9.1* was released. [1,2]

* [statefun] Stephan has started a separate discussion on whether to
maintain *Stateful Functions* in a separate repository or the Flink main
repository after its contribution to Apache Flink. The majority seems to
prefer a separate repository e.g. to enable quicker iterations on the new
code base and not to overwhelm new contributors to Stateful Functions. [3]

* [development process] The *NOTICE* file and the directory for licenses of
bundled dependencies for binary releases are now auto-generated during the
release process. [4]

* [development process] According to our *FLIP process* the introduction of
a new config option requires a FLIP (and vote). Aljoscha has started a
discussion to clarify this point, as this is currently not always the case.
Looks like the majority leans towards a vote for every configuration
change, but possibly making it more lightweight than a proper FLIP. [5]

* [development process] Xiyuan gave an update on *Flink's ARM support*.
Travis ARM support is now in alpha (an alternative to the previously
proposed OpenLab), and, regardless of the CI system, Xiyuan points the
community to a list of PRs/Jiras that need to be solved/reviewed. [6]

* [configuration] The discussion on FLIP-59 to make the execution
configuration (ExecutionConfig et al.) configurable via the Flink
Configuration has been revived a bit and now focuses on alignment with
FLIP-73 and the naming of the different configurations. [7]

* [sql] Based on feedback from the user community, Timo proposes to rename
the "ANY" data type to "OPAQUE", highlighting that a field of type "ANY" does
not hold just any type, but a data type that is unknown to Flink. [8]

* [sql] Jark has started a discussion on FLIP-80 [9] about how to
de/serialize expressions in catalogs. [10]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-9-1-released-tp34170.html
[2] https://flink.apache.org/news/2019/10/18/release-1.9.1.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Stateful-Functions-in-which-form-to-contribute-same-or-different-repository-tp34034.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/NOTICE-Binary-licensing-is-now-auto-generated-tp34121.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-policy-for-introducing-config-option-keys-tp34011p34045.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ARM-support-Travis-ARM-CI-is-now-in-Alpha-Release-tp34039.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-59-Enable-execution-configuration-from-Configuration-object-tp32359.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Rename-the-SQL-ANY-type-to-OPAQUE-type-tp34162.html
[9]
https://docs.google.com/document/d/1LxPEzbPuEVWNixb1L_USv0gFgjRMgoZuMsAecS_XvdE/edit?usp=sharing
[10]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISUCSS-FLIP-80-Expression-String-Serializable-and-Deserializable-tp34146.html

Notable Bugs
==

* [FLINK-14429] [1.9.1] [1.8.2] When you run a batch job on YARN in
non-detached mode, it will be reported as SUCCEEDED even when it actually
FAILED. [11]

[11] https://issues.apache.org/jira/browse/FLINK-14429

Events, Blog Posts, Misc
===

* This discussion on the dev@ mailing list might be interesting to follow
for people using the StreamingFileSink or BucketingSink with S3.  [12]

* There will be a full-day meetup with six talks in the Bangalore Kafka Group
on the 2nd of November, including at least three Flink talks by *Timo Walther*
(Ververica), *Shashank Agarwal* (Razorpay) and *Rasyid Hakim* (GoJek). [13]

[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/performances-of-S3-writing-with-many-buckets-in-parallel-tp34021p34050.html
[13] https://www.meetup.com/Bangalore-Apache-Kafka-Group/events/265285812/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Follow us @VervericaData Ververica <https://www.ververica.com/>


--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Tony) Cheng


[ANNOUNCE] Weekly Community Update 2019/41

2019-10-13 Thread Konstantin Knauf
Dear community,

happy to share a small community update this week with Flink Forward
Europe, "Stateful Functions" and a bit more.

Flink Development
==

* [api] Stephan proposes to contribute *Stateful Functions (statefun.io
<http://statefun.io>)* to Apache Flink. Stateful Functions were announced
by Ververica earlier this week at Flink Forward Europe. It is an Actor-like
API, which makes it easier to write general-purpose event-driven
applications on top of Flink. There has been a lot of positive feedback so
far. [1]

* [releases] The vote/verification of *Flink 1.9.1 RC1* is still ongoing.
[2]

* [python] Dian Fu started a discussion to *drop Python 2 support* in Flink
1.10. It looks like there is a consensus for this due to Python 2's EOL in
January 2020 and the additional external dependencies (e.g. Beam) added for
the 1.10 release, which increase the burden of continued support. [3]

* [config] In a previous update, I covered FLIP-54 "Evolve ConfigOption and
Configuration". In the course of the discussion this FLIP has turned out to
be intertwined with too many other areas and ongoing developments. Hence,
Timo proposed to split it up into three FLIPs. The first of these is *FLIP-77
"Introduce ConfigOptions with Data Types"*. It proposes to add information
about the described data type to ConfigOptions. [4]
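To illustrate the idea behind FLIP-77, here is a small toy sketch (deliberately *not* Flink's actual API; all names are made up) of a ConfigOption that carries its data type, so values can be parsed and validated consistently:

```python
# Toy sketch of a typed config option, in the spirit of FLIP-77.
# This is NOT Flink's real ConfigOptions API, just an illustration.
from dataclasses import dataclass
from typing import Generic, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ConfigOption(Generic[T]):
    """An option that knows its key, its data type and its default value."""
    key: str
    type: Type[T]
    default: T


class OptionBuilder:
    def __init__(self, name: str):
        self._name = name

    def int_type(self, default: int) -> "ConfigOption[int]":
        return ConfigOption(self._name, int, default)

    def string_type(self, default: str) -> "ConfigOption[str]":
        return ConfigOption(self._name, str, default)


def key(name: str) -> OptionBuilder:
    return OptionBuilder(name)


parallelism = key("parallelism.default").int_type(1)
print(parallelism.key, parallelism.type.__name__, parallelism.default)
# -> parallelism.default int 1
```

Because the type is attached to the option itself, a configuration layer can reject or convert mismatched values at lookup time instead of at the call site.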

* [checkpointing] Biao Liu has started a survey about the usage of the "
*ExternallyInducedSource*" and "*WithMasterCheckpointHook*" interfaces. He is
currently working on these and is looking for feedback from existing users.
[5]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PROPOSAL-Contribute-Stateful-Functions-to-Apache-Flink-tp33913.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-9-1-release-candidate-1-tp33637.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-tp33824.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-77-Introduce-ConfigOptions-with-Data-Types-tp33902.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-How-do-you-use-ExternallyInducedSource-or-WithMasterCheckpointHook-tp33864.html

Notable Bugs
==

Nothing came to my attention :)

Events, Blog Posts, Misc
===

* This week the community gathered in Berlin for Flink Forward Europe. The
recordings of the talks will probably be available next week already. Check
https://twitter.com/flinkforward for updates.

* There will be a full-day meetup with six talks in the Bangalore Kafka Group
on the 2nd of November, including at least three Flink talks by *Timo Walther*
(Ververica), *Shashank Agarwal* (Razorpay) and *Rasyid Hakim* (GoJek). [6]

[6] https://www.meetup.com/Bangalore-Apache-Kafka-Group/events/265285812/

Cheers,

Konstantin (@snntrable)



[ANNOUNCE] Weekly Community Update 2019/39

2019-09-29 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with news about Flink 1.10 &
1.9.1, two FLIPs for better programmatic job and cluster control,
improvements to the web user interface and a bit more.

Flink Development
==

* [releases] Jark has started a discussion on releasing a first patch
release for Flink 1.9. Looking at the discussion I expect a first release
candidate soonish. [1]

* [releases] Yu Li has shared a small progress report on Flink 1.10, which
gives a very good overview of ongoing FLIPs including their status. [2]

* [ui] Yadong has started a discussion on contributing improvements to the
Flink Web UI. It includes a variety of pretty cool changes such as a better
overview of the cluster resources, flame graphs, additional backpressure
monitoring and much more. Best you check it out yourself. [3]

* [python] FLIP-58 aims to support stateless Python UDFs in the Table API.
Wei has shared a design document to support the usage of Python
dependencies inside such UDFs and is looking for feedback. [4]

* [sql] There are currently three ongoing votes for previously covered FLIPs
in the SQL ecosystem (FLIP-66 (Time-Attribute in SQL DDL), FLIP-57
(Function Catalog), FLIP-68 (Modular Plugins)). [5,6,7]

* [client] A couple of weeks ago Zili had started a discussion on how to
improve Flink's API for job submission and job and cluster management. As a
result of this discussion Kostas has now shared a design document to
refactor the (Stream)ExecutionEnvironments. The basic idea is to move the
responsibility of job submission out of the ExecutionEnvironment into
Executors. This would result in only one ExecutionEnvironment per API in
the future, and one Executor per environment (YARN, Standalone, ...). The
ExecutionEnvironment would then choose the correct Executor based on its
configuration. [8]

* [client] The second FLIP that originated from this discussion was also
shared by Zili this week and proposes to expose JobClient, ClusterClient
and ClusterDescriptor to users to programmatically control Flink Jobs and
Clusters during their runtime. [9]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-9-1-tp33343.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Progress-of-Apache-Flink-1-10-1-tp33570.html
[3]
https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
[4]
https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit#heading=h.lvy7nudjmhjd
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-66-Support-Time-Attribute-in-SQL-DDL-2-tp33513.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-57-Rework-FunctionCatalog-tp33373.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-68-Extend-Core-Table-System-with-Modular-Plugins-tp33372.html
[8]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
[9]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-74:+Flink+JobClient+API

Notable Bugs
==

[FLINK-14145] [1.9.0] When preferring checkpoints (as opposed to
savepoints) for recovery, the wrong checkpoint could be chosen for recovery
in certain situations. Fixed for 1.9.1. [10]

[FLINK-13708] [1.9.0] When calling "execute" twice on the same table
environment, transformations from the first execution will also be part of
the second execution. PR pending, might get into 1.9.1. [11]

[10] https://issues.apache.org/jira/browse/FLINK-14145
[11] https://issues.apache.org/jira/browse/FLINK-13708

Events, Blog Posts, Misc
===

* Flink Forward Europe is getting close, 7th - 9th of October. Keynotes by
AWS, Cloudera, Cloud Foundry Foundation, Google and Ververica. [12]
* Upcoming Meetups
* Tomorrow *Dean Shaw* and *Max McKittrick* will talk about click
stream analysis at scale at Capital One at the next dataCouncil.ai NYC Data
Engineering meetup. [13]

[12] https://europe-2019.flink-forward.org/
[13]
https://www.meetup.com/DataCouncil-AI-NYC-Data-Engineering-Science/events/264748638/

Cheers,

Konstantin



[ANNOUNCE] Weekly Community Update 2019/38

2019-09-22 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with a FLIP for the Pulsar
Connector contribution, three FLIPs for the SQL Ecosystem (plugin system,
computed columns, extended support for views), and a bit more. Enjoy!

Flink Development
==

* [connectors]  After some discussion on the mailing list over the last
weeks, Sijie has opened a FLIP to add an exactly-once Pulsar Connector
(DataStream API, Table API, Catalog API) to Flink. [1]

* [sql] The discussion on supporting Hive built-in functions in Flink SQL
led to FLIP-68 to extend the core table system with modular plugins. [2]
While focusing on function modules as a first step, the FLIP proposes a
more general plugin system also covering user-defined types, operators,
rules, etc. As part of this FLIP the existing functions in Flink SQL would
also be migrated into a "CorePlugin". [3]

* [sql] Danny proposes to add support for computed columns in Flink SQL (as
FLIP-70). [4]
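For illustration, a computed column in a Flink SQL DDL could look roughly like the following (a hypothetical sketch; the exact syntax was still under discussion in FLIP-70, and the table and column names are made up):

```sql
CREATE TABLE orders (
  order_id BIGINT,
  price    DECIMAL(10, 2),
  quantity INT,
  -- computed column: derived from the physical columns,
  -- not read from the external source
  total AS price * quantity
);
```

The source system only needs to provide `price` and `quantity`; `total` is evaluated by Flink during reads.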

* [sql] Zhenghua has started a discussion on extending support for VIEWs in
Flink SQL (as FLIP-71).  He proposes to add support to store views in a
catalog and to add support for "SHOW VIEWS" and "DESCRIBE VIEW". [5]


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-72%3A+Introduce+Pulsar+Connector
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Modular+Plugins
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-68-Extend-Core-Table-System-with-Modular-Plugins-tp33161.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-70-Support-Computed-Column-for-Flink-SQL-tp33126.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-71-E2E-View-support-in-Flink-SQL-tp33131.html

Notable Bugs
==

* [FLINK-14010] [1.9.0] [1.8.2] [1.7.2] [yarn] When the Flink Yarn
Application Manager receives a shut down request from the YARN Resource
Manager, the Flink cluster can get into an inconsistent state, where
leadership for the JobManager, ResourceManager and Dispatcher components is
split between two master processes. Tison is working on a fix. [6]

* [FLINK-14107] [1.9.0] [1.8.2] [kinesis] When using event time alignment
with the Kinesis Consumer, the consumer might deadlock in one corner case.
Fixed for 1.9.1 and 1.8.3. [7]

[6] https://issues.apache.org/jira/browse/FLINK-14010
[7] https://issues.apache.org/jira/browse/FLINK-14107

Events, Blog Posts, Misc
===

* Upcoming Meetups
* *Enrico Canzonieri* of Yelp and *David Massart* of BNP Paribas (tentative)
will share the Apache Flink user stories of Yelp and BNP Paribas at the next *Bay
Area Apache Flink Meetup* on the 24th of September. [8]

* *Ana Esguerra* has published a blog post on how to run Flink on YARN with
Kerberos for Kafka & YARN. [9]

[8] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/
[9]
https://medium.com/@minyodev/apache-flink-on-yarn-with-kerberos-authentication-adeb62ef47d2

Cheers,

Konstantin



[ANNOUNCE] Weekly Community Update 2019/37

2019-09-15 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with the release of Flink
1.8.2, more work in the area of dynamic resource management, three
proposals in the SQL space and a bit more.

Flink Development
==

* [releases] *Flink 1.8.2* has been released. [1]

* [resource management] Xintong has started a discussion on FLIP-56 "*Dynamic
Slot Allocation*". With the upcoming FLIP-53 [2], different Tasks can have
different resource requirements, leading to different resource requirements
for the TaskSlots these Tasks are deployed into. On the other hand,
TaskSlots are currently statically configured during TaskManager creation.
FLIP-56 proposes to start TaskManagers without any TaskSlots initially and
to dynamically create/release TaskSlots based on the resource requirements
of the Tasks to be deployed. [3]

* [connectors] Stephan has started a discussion to drop the Kafka connector
for *Kafka 0.9* and *0.10.* If you are relying on these connectors, it's a
good idea to join the discussion. [4]

* [connectors] The discussion on contributing a *Pulsar connector to Flink*
seems to be converging towards adding the connector soon based on the
existing source interface, while clearly documenting (possibly as
experimental) that in the long term only a version based on the new source
interface will be supported. [5]

* [sql] As a spin-off of the discussion on reworking the function catalog,
Bowen has started a discussion to support loading *external built-in
functions via modules*. [6]

* [sql] Dawid proposes to rework the support for *temporary objects*
(tables/views, functions) in the Table API. [7]

* [sql] Jark started a discussion on FLIP-66 to add support for *time
attributes in Flink's SQL DDL*. This means you will be able to specify
event time and processing time columns (including watermarking strategy)
via the SQL DDL. [8]
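As a rough sketch of what FLIP-66 proposes (hypothetical table and column names; the final DDL syntax was still being discussed at the time):

```sql
CREATE TABLE clicks (
  user_id BIGINT,
  ts      TIMESTAMP(3),
  -- event time: declare ts as the rowtime attribute with a
  -- bounded-out-of-orderness watermark strategy
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND,
  -- processing time: a computed column backed by PROCTIME()
  proc AS PROCTIME()
);
```

With such a DDL, windowed aggregations over `ts` (event time) or `proc` (processing time) would work without any programmatic timestamp/watermark assignment.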

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Apache-Flink-1-8-2-released-tp33050.html
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Contribute-Pulsar-Flink-connector-back-to-Flink-tp32538p33055.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-tp32954.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-modular-built-in-functions-tp32918.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-64-Support-for-Temporary-Objects-in-Table-module-tp32684.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-66-Support-time-attribute-in-SQL-DDL-tp32766.html

Notable Bugs
==

Overall, there were only 20 bug tickets (excluding test instabilities)
updated this week, none of which seem to be particularly relevant for a
wider audience ;)

Events, Blog Posts, Misc
===

* *Zili Chen *is now an Apache Flink committer. Congrats! [9]
* *Fabian* and *Seth* have published a blog post on the newly added *State
Processor API* on the Flink blog [10]
* *Marta* has published a Flink Community Update on the Flink blog focusing
on stats, events and ongoing initiatives in the Flink Community. [11]
* *Google* has open-sourced a *Kubernetes Operator for Flink* [12], which
automates cluster creation and job submission on Kubernetes.

[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Zili-Chen-becomes-a-Flink-committer-tp32961.html
[10] https://flink.apache.org/feature/2019/09/13/state-processor-api.html
[11] https://flink.apache.org/news/2019/09/10/community-update.html
[12] https://github.com/GoogleCloudPlatform/flink-on-k8s-operator

Cheers,

Konstantin



[ANNOUNCE] Weekly Community Update 2019/33-36

2019-09-08 Thread Konstantin Knauf
Zagrebin-becomes-a-Flink-committer-tp31735p31931.html
[31] https://europe-2019.flink-forward.org/training-program
[32] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/
[33] https://www.meetup.com/Apache-Flink-London-Meetup/events/264123672/

Cheers,

Konstantin



[ANNOUNCE] Weekly Community Update 2019/32

2019-08-11 Thread Konstantin Knauf
 in Apache Flink by *Yun Tang*. [15]

[12]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Hequn-becomes-a-Flink-committer-tp31378.html
[13] https://www.meetup.com/seattle-flink/events/263782233
[14]
https://www.eventbrite.com/e/apache-pulsar-meetup-beijing-tickets-67849484635
[15] https://www.meetup.com/acm-sf/events/263768407/

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


Planned Absences: 10.08.2019 - 31.08.2019, 05.09. - 06.09.2019


--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen


[ANNOUNCE] Weekly Community Update 2019/31

2019-08-04 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with news about Flink on
PyPI, code style proposals, Flink 1.9.0 RC1, Flame Graphs in the WebUI and
a bit more.

As always, please feel free to add additional updates and news to this
thread!

Flink Development
===

* [development process] Andrey has opened three threads following up on the
recently published "code style and quality guide". They deal with the usage
of Java Optional [1] (tl;dr: only as return type in public APIs),
Collections with initial capacity [2] (tl;dr: only if trivial) and how to
wrap long argument lists and chained method calls [3] (tl;dr: yes, but
details still open).

* [python] Jingcheng has started a voting thread on publishing PyFlink to
PyPI. The name will be "apache-flink" and the account will be managed by the
Apache Flink PMC. The vote has passed unanimously. [4]

* [metrics] David proposed to add a CPU flame graph [5] for each Task to
the WebUI (similar to the backpressure monitor). A CPU flame graph is a
visualisation method for stack trace samples, which makes it easier to
determine hot code paths. This has been well received and David is looking
for a committer to shepherd the effort. [6]
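A flame graph is built from "folded" stack samples: identical stack traces are collapsed into a count per call path, and the width of each box is proportional to that count. As a minimal sketch of just that aggregation step (illustrative only; the frame names are made up and this is unrelated to the actual WebUI implementation):

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse raw stack-trace samples (outermost frame first) into
    flame-graph input lines of the form 'frame_a;frame_b;frame_c count'."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{path} {n}" for path, n in counts.most_common()]


# Hypothetical samples, e.g. obtained by periodically capturing thread stacks.
samples = [
    ["run", "processElement", "serialize"],
    ["run", "processElement", "serialize"],
    ["run", "processElement", "emit"],
]
print(fold_stacks(samples))
# The widest ("hottest") path comes first: 'run;processElement;serialize 2'
```

Tools such as flamegraph.pl consume exactly this folded format to render the interactive graph.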

* [releases] Kurt announced a second preview RC (RC1) for Flink 1.9.0 to
facilitate ongoing release testing. There will be no vote on this RC. [7]

* [filesystems] Aljoscha proposed to remove the flink-mapr-fs module from
Apache Flink due to recent problems pulling its dependencies. If removed,
the MapR filesystem could still be used via Flink's HadoopFilesystem. [8]

* [client] Tison has published a first design document on the recently
discussed improvements to Flink's client API. [9]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-CODE-STYLE-Usage-of-Java-Optional-tp31240.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-CODE-STYLE-Create-collections-always-with-initial-capacity-tp31229.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-CODE-STYLE-Breaking-long-function-argument-lists-and-chained-method-calls-tp31242.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-tp31201.html
[5] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-CPU-flame-graph-for-a-job-vertex-in-web-UI-tp31188.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PREVIEW-Apache-Flink-1-9-0-release-candidate-1-tp31233.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Removing-the-flink-mapr-fs-module-tp31080.html
[9]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-client-api-enhancement-for-downstream-project-tp25844.html

Notable Bugs
===

Nothing for 1.6/1.7/1.8 that came to my attention.

Events, Blog Posts, Misc


* *Nico Kruber* published the second part of his series on Flink's network
stack, this time about metrics, monitoring and backpressure. (This slipped
through last week.) [10]

[10] https://flink.apache.org/2019/07/23/flink-network-stack-2.html

Cheers,

Konstantin (@snntrable)

-- 

Konstantin Knauf | Solutions Architect

+49 160 91394525


--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen


[ANNOUNCE] Weekly Community Update 2019/30

2019-07-28 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with news about Flink on
PyPI, two proposals on the Table API/SQL and a bit more.

As always, please feel free to add additional updates and news to this
thread!

Flink Development
===

[sql] *Temporary tables* are always registered with the built-in catalog.
If the built-in catalog is not the default catalog, access to temporary
tables needs to be fully qualified, which breaks current functionality,
e.g. in the SQL Client. Dawid and Bowen stumbled across this during release
testing and are proposing three different solutions for the upcoming
release and beyond. [1]
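The problem can be illustrated with a sketch (hypothetical catalog and table names; the exact resolution behaviour is what the discussion is about):

```sql
-- my_temp_table was registered as a temporary table, i.e. it lives in the
-- built-in catalog rather than in hive_catalog
USE CATALOG hive_catalog;

-- fails: the unqualified name is resolved against hive_catalog
SELECT * FROM my_temp_table;

-- works: fully qualified against the built-in catalog
SELECT * FROM default_catalog.default_database.my_temp_table;
```

The proposals differ in how (and whether) the unqualified lookup should fall back to the built-in catalog.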

[python] The discussion on whether and how to release the *Python
Table API on PyPI* is converging towards publishing it under the name
"apache-flink" and including the binary Java/Scala release in the Python
package for convenience. [2]

[fault-tolerance] In the discussion thread about supporting *"at-most-once"*
delivery Stephan proposed to work towards this feature outside of Flink
core and to offer it to the community through the upcoming Flink ecosystem
website. This seems to get a lot of agreement. [3]

[table-api] Xuannan has started a discussion on *"FLIP-48: Pluggable
Intermediate Result Storage"*. This will make the result cache for
interactive programming (FLIP-36) pluggable. [4]

[releases] Flink 1.9.0 release testing is still ongoing. There is no RC1
yet. [5]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-temporary-tables-in-SQL-API-tp30831.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Allow-at-most-once-delivery-in-case-of-failures-tp29464.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-48-Pluggable-Intermediate-Result-Storage-tp31001.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Progress-updates-for-Apache-Flink-1-9-0-release-tp30565.html

Notable Bugs
===

Not too much. Still a lot of blockers and critical issues in the Flink
project due to release testing.

* [FLINK-13372] [1.8.1] [1.7.2] [1.6.4] There is a bug in the timestamp
conversion of the Table API (due to timezone handling, of course). The
issue contains a good explanation of the problem. [6]

[6] https://issues.apache.org/jira/browse/FLINK-13372

Events, Blog Posts, Misc


* *Kurt* is now part of the Apache Flink PMC. Congratulations! [7]
* *Zhijiang* is now an Apache Flink committer. Congrats! [8]
* Upcoming Meetups
* On 30th of July *Berecz Dániel* of Ekata talks about a "Parameter
Server on Flink, an approach for model-parallel machine learning" at the
Budapest Scala Meetup [9]
* On 30th of July *Grzegorz Liter* talks about “Stream processing with
Apache Flink” at the EPAM tech talks in Wroclaw. [10]

[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Kete-Young-is-now-part-of-the-Flink-PMC-tp30884p30998.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Zhijiang-Wang-has-been-added-as-a-committer-to-the-Flink-project-tp30830p30943.html
[9] https://www.meetup.com/budapest-scala/events/263025323/
[10] https://www.meetup.com/EPAM-Tech-Talks-Wroc%C5%82aw/events/263027190/

Cheers,

Konstantin (@snntrable)


