Re: Flink native k8s integration vs. operator

2022-01-13 Thread Xintong Song
Thanks for volunteering to drive this effort, Marton, Thomas and Gyula.

Looking forward to the public discussion. Please feel free to reach out if
there's anything you need from us.

Thank you~

Xintong Song



On Fri, Jan 14, 2022 at 8:27 AM Chenya Zhang 
wrote:

> Thanks Thomas, Gyula, and Marton for driving this effort! It would greatly
> ease the adoption of Apache Flink on Kubernetes and help to address the
> current operational pain points as mentioned. Look forward to the proposal
> and more discussions!
>
> Best,
> Chenya
>
> On Thu, Jan 13, 2022 at 12:15 PM Márton Balassi 
> wrote:
>
>> Hi All,
>>
>> I am pleased to see the level of enthusiasm and technical consideration
>> already emerging in this thread. I wholeheartedly support building an
>> operator and endorsing it by placing it under the Apache Flink umbrella
>> (as a separate repository) as the current lack of it is clearly becoming
>> an
>> adoption bottleneck for large scale Flink users. The next logical step is
>> to write a FLIP to agree on the technical details, so that we can put
>> forward the proposal to the Flink PMC for creating a new repository with a
>> clear purpose in mind. I volunteer to work with Thomas and Gyula on the
>> initial wording on the proposal which we will put up for public discussion
>> in the coming weeks.
>>
>> Best,
>> Marton
>>
>> On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf 
>> wrote:
>>
>> > Hi Thomas,
>> >
>> > Yes, I was referring to a separate repository under Apache Flink.
>> >
>> > Cheers,
>> >
>> > Konstantin
>> >
>> > On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise  wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> Thanks for the feedback and discussion. A few additional thoughts:
>> >>
>> >> [Konstantin] > With respect to common lifecycle management operations:
>> >> these features are
>> >> > not available (within Apache Flink) for any of the other resource
>> >> providers
>> >> > (YARN, Standalone) either. From this perspective, I wouldn't consider
>> >> this
>> >> > a shortcoming of the Kubernetes integration.
>> >>
>> >> I think time and evolution of the ecosystem are factors to consider as
>> >> well. The state and usage of Flink was much different when YARN
>> >> integration was novel. Expectations are different today and the
>> >> lifecycle functionality provided by an operator may as well be
>> >> considered essential to support the concept of a Flink application on
>> >> k8s. After a few years of learning from operator experience outside of
>> >> Flink, it might be a good time to fill the gap.
>> >>
>> >> [Konstantin] > I still believe that we should keep this focus on low
>> >> > level composable building blocks (like Jobs and Snapshots) in Apache
>> >> Flink
>> >> > to make it easy for everyone to build fitting higher level
>> abstractions
>> >> > like a FlinkApplication Custom Resource on top of it.
>> >>
>> >> I completely agree that it is important that the basic functions of
>> >> Flink are solid and continued focus is necessary. Thanks for sharing
>> >> the pointers, these are great improvements. At the same time,
>> >> ecosystem, contributor base and user spectrum are growing. There have
>> >> been significant additions in many areas of Flink including connectors
>> >> and higher level abstractions like statefun, SQL and Python. It's also
>> >> evident from additional repositories/subprojects that we have in Flink
>> >> today.
>> >>
>> >> [Konstantin] > Having said this, if others in the community have the
>> >> capacity to push and
>> >> > *maintain* a somewhat minimal "reference" Kubernetes Operator for
>> Apache
>> >> > Flink, I don't see any blockers. If or when this happens, I'd see
>> some
>> >> > clear benefits of using a separate repository (easier independent
>> >> > versioning and releases, different build system & tooling (go, I
>> >> assume)).
>> >>
>> >> Naturally different contributors to the project have different focus.
>> >> Let's find out if there is strong enough interest to take this on and
>> >> strong enough commitment to maintain. As I see it, there is a
>> >> tremendous amount of internal investment going into operationalizing
>> >> Flink within many companies. Improvements to the operational side of
>> >> Flink like the operator would complement Flink nicely. I assume that
>> >> you are referring to a separate repository within Apache Flink, which
>> >> would give it the chance to achieve better sustainability than the
>> >> existing external operator efforts. There is also the fact that some
>> >> organizations which are heavily invested in operationalizing Flink
>> >> allow contributions to Apache Flink itself but less so to arbitrary
>> >> GitHub projects. Regarding the tooling, it could well turn out that
>> >> Java is a good alternative given the ecosystem focus and that there is
>> >> an opportunity for reuse in certain aspects (metrics, logging etc.).
>> >>
>> >> [Yang] > I think Xintong has given a strong point why we introduced
>> >> the native K8s integration, which is active resource management.

Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread JIN FENG
Hi,

I am a software engineer at Xiaomi.

Last year we adopted Metacat (https://github.com/Netflix/metacat) to manage all
of our metadata, including Hive, Kudu, Doris, Iceberg, Elasticsearch, Talos
(Xiaomi's self-developed message queue), MySQL, TiDB, etc.

Metacat is compatible with the Hive metastore protocol, so we can directly use
Flink's HiveCatalog to connect to Metacat and create different tables,
including Hive tables and other generic table types.

All systems are abstracted into a catalog.database.table structure, so in
Flink SQL we can access any registered table through catalog.database.table.
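
For illustration, a minimal sketch of the Flink side; the catalog name,
database, table, and conf dir below are hypothetical. Since Metacat speaks
the Hive metastore protocol, Flink's HiveCatalog can point at it directly:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class MetacatCatalogSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Point HiveCatalog at a hive-site.xml whose metastore URI is the
        // Metacat endpoint (hypothetical conf dir and names).
        HiveCatalog metacat = new HiveCatalog("metacat", "default", "/etc/hive/conf");
        tableEnv.registerCatalog("metacat", metacat);

        // Every registered table is addressable as catalog.database.table.
        tableEnv.executeSql("SELECT * FROM metacat.sales_db.orders").print();
    }
}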

In addition, Metacat uniformly manages all table creation, deletion, and
partitioning operations. By analyzing Metacat's audit log, we can easily
obtain the DDL lineage of different tables.

At the same time, using Apache Ranger (https://github.com/apache/ranger), we
have added permission control to the Flink framework, and all permission
information is saved in the catalog.database.table form.

We also modified the logic related to Flink's JobListener: by exposing the
JobGraph to the listener, we can obtain the lineage information of a job by
parsing the JobGraph.
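
For context, a minimal sketch of registering a stock JobListener; the vanilla
interface shown below does not expose the JobGraph itself, which is why we
modified that logic (names here are illustrative):

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LineageListenerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.registerJobListener(new JobListener() {
            @Override
            public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
                if (jobClient != null) {
                    // e.g. report the job id (plus externally parsed lineage)
                    // to a metadata service -- hypothetical integration point
                    System.out.println("Submitted job " + jobClient.getJobID());
                }
            }

            @Override
            public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
                // invoked once a bounded job finishes
            }
        });
        env.fromElements(1, 2, 3).print();
        env.execute("lineage-demo");
    }
}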

To sum up, unified metadata management makes it convenient to manage different
systems and connect them to Flink, and at the same time it is convenient for
unified permission management and for obtaining table-related lineage
information.


On Fri, Jan 14, 2022 at 3:14 AM Maciej Obuchowski <
obuchowski.mac...@gmail.com> wrote:

> Hello,
>
> I'm an OpenLineage committer - and previously, a minor Flink contributor.
> The OpenLineage community is very interested in a conversation about Flink
> metadata, and we'll be happy to cooperate with the Flink community.
>
> Best,
> Maciej Obuchowski
>
>
>
> On Thu, 13 Jan 2022 at 18:12, Martijn Visser
> wrote:
> >
> > Hi all,
> >
> > @Andrew thanks for sharing that!
> >
> > @Tero good point, I should have clarified the purpose. I want to
> understand
> > what "metadata platforms" tools are used or evaluated by the Flink
> > community, what's their purpose for using such a tool (is it as a generic
> > catalogue, as a data discovery tool, is lineage the important part, etc.)
> and
> > what problems are people trying to solve with them. This space is
> > developing rapidly and there are many open source and commercial tools
> > popping up/growing, which is also why I'm trying to keep an open vision
> on
> > how this space is evolving.
> >
> > If the Flink community wants to integrate with metadata tools, I fully
> > agree that ideally we do that via standards. My perception is at this
> > moment that no clear standard has yet been established. You mentioned
> > open-metadata.org, but I believe https://openlineage.io/ is also an
> > alternative standard.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Thu, 13 Jan 2022 at 17:00, Tero Paananen 
> wrote:
> >
> > > > I'm currently checking out different metadata platforms, such as
> > > Amundsen [1] and Datahub [2]. In short, these types of tools try to
> address
> > > problems related to topics such as data discovery, data lineage and an
> > > overall data catalogue.
> > > >
> > > > I'm reaching out to the Dev and User mailing lists to get some
> feedback.
> > > It would really help if you could spend a couple of minutes to let me
> know
> > > if you already use either one of the two mentioned metadata platforms
> or
> > > another one, or are you evaluating such tools? If so, is that for the
> > purpose of a catalogue, for lineage or anything else? Any type of
> feedback
> > > on these types of tools is appreciated.
> > >
> > > I hope you don't mind answers off-list.
> > >
> > > You didn't say what purpose you're evaluating these tools for, but if
> > > you're evaluating platforms for integration with Flink, I wouldn't
> > > approach it with a particular product in mind. Rather I'd create some
> > > sort of facility to propagate metadata and/or lineage information in a
> > > generic way and allow Flink users to plug in their favorite metadata
> > > tool. Using standards like OpenLineage, for example. I believe Egeria
> > > is also trying to create an open standard for metadata.
> > >
> > > If you're evaluating data catalogs for personal use or use in a
> > > particular project, Andrew's answer about the Wikimedia evaluation is
> > > a good start. It's missing OpenMetadata (https://open-metadata.org/).
> > > That one is showing a LOT of promise. Wikimedia's evaluation is also
> > > missing industry-leading commercial products (understandably, given
> > > their mission). Collibra and Alation are probably the ones that pop up
> > > most often.
> > >
> > > I have personally looked into both DataHub and Amundsen. My high-level
> > > feedback is that DataHub is overengineered and uses proprietary
> > > LinkedIn technology platform(s), which aren't widely used anywhere.
> > > Amundsen is much less flexible than DataHub and quite basic in its
> > > functionality. If you need anything beyond what it already offers, good luck.

Unsubscribe

2022-01-13 Thread Jerome Li
Unsubscribe



FlinkKafkaConsumer and FlinkKafkaProducer and Kafka Cluster Migration

2022-01-13 Thread Alexey Trenikhun
Hello,

Currently we are using FlinkKafkaConsumer and FlinkKafkaProducer and planning 
to migrate to a different Kafka cluster. Are the bootstrap servers, username and 
password part of the FlinkKafkaConsumer and FlinkKafkaProducer state? So if we 
take a savepoint, change the bootstrap servers and credentials, and start the job 
from the savepoint, will it use the new connection properties or the old ones 
from the savepoint?
Assuming that we connect to the new Kafka cluster, I think the 
FlinkKafkaConsumer offsets will be reset, because the new Kafka cluster will be 
empty and the FlinkKafkaConsumer will not be able to seek to the stored offsets. 
Am I right?
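
For reference, a minimal sketch of how such a consumer is constructed (the
addresses and names below are placeholders); my understanding is that these
Properties are read when the job starts, while the savepoint holds the
partition offsets:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaMigrationSketch {
    public static void main(String[] args) {
        // Placeholder addresses and names; connection settings live here,
        // not in the checkpointed/savepointed state.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "new-cluster:9092");
        props.setProperty("group.id", "my-group");

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
    }
}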

Thanks,
Alexey


Re: Flink native k8s integration vs. operator

2022-01-13 Thread Chenya Zhang
Thanks Thomas, Gyula, and Marton for driving this effort! It would greatly
ease the adoption of Apache Flink on Kubernetes and help to address the
current operational pain points as mentioned. Look forward to the proposal
and more discussions!

Best,
Chenya

On Thu, Jan 13, 2022 at 12:15 PM Márton Balassi 
wrote:

> Hi All,
>
> I am pleased to see the level of enthusiasm and technical consideration
> already emerging in this thread. I wholeheartedly support building an
> operator and endorsing it by placing it under the Apache Flink umbrella
> (as a separate repository) as the current lack of it is clearly becoming an
> adoption bottleneck for large scale Flink users. The next logical step is
> to write a FLIP to agree on the technical details, so that we can put
> forward the proposal to the Flink PMC for creating a new repository with a
> clear purpose in mind. I volunteer to work with Thomas and Gyula on the
> initial wording on the proposal which we will put up for public discussion
> in the coming weeks.
>
> Best,
> Marton
>
> On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf 
> wrote:
>
> > Hi Thomas,
> >
> > Yes, I was referring to a separate repository under Apache Flink.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise  wrote:
> >
> >> Hi everyone,
> >>
> >> Thanks for the feedback and discussion. A few additional thoughts:
> >>
> >> [Konstantin] > With respect to common lifecycle management operations:
> >> these features are
> >> > not available (within Apache Flink) for any of the other resource
> >> providers
> >> > (YARN, Standalone) either. From this perspective, I wouldn't consider
> >> this
> >> > a shortcoming of the Kubernetes integration.
> >>
> >> I think time and evolution of the ecosystem are factors to consider as
> >> well. The state and usage of Flink was much different when YARN
> >> integration was novel. Expectations are different today and the
> >> lifecycle functionality provided by an operator may as well be
> >> considered essential to support the concept of a Flink application on
> >> k8s. After a few years of learning from operator experience outside of
> >> Flink, it might be a good time to fill the gap.
> >>
> >> [Konstantin] > I still believe that we should keep this focus on low
> >> > level composable building blocks (like Jobs and Snapshots) in Apache
> >> Flink
> >> > to make it easy for everyone to build fitting higher level
> abstractions
> >> > like a FlinkApplication Custom Resource on top of it.
> >>
> >> I completely agree that it is important that the basic functions of
> >> Flink are solid and continued focus is necessary. Thanks for sharing
> >> the pointers, these are great improvements. At the same time,
> >> ecosystem, contributor base and user spectrum are growing. There have
> >> been significant additions in many areas of Flink including connectors
> >> and higher level abstractions like statefun, SQL and Python. It's also
> >> evident from additional repositories/subprojects that we have in Flink
> >> today.
> >>
> >> [Konstantin] > Having said this, if others in the community have the
> >> capacity to push and
> >> > *maintain* a somewhat minimal "reference" Kubernetes Operator for
> Apache
> >> > Flink, I don't see any blockers. If or when this happens, I'd see some
> >> > clear benefits of using a separate repository (easier independent
> >> > versioning and releases, different build system & tooling (go, I
> >> assume)).
> >>
> >> Naturally different contributors to the project have different focus.
> >> Let's find out if there is strong enough interest to take this on and
> >> strong enough commitment to maintain. As I see it, there is a
> >> tremendous amount of internal investment going into operationalizing
> >> Flink within many companies. Improvements to the operational side of
> >> Flink like the operator would complement Flink nicely. I assume that
> >> you are referring to a separate repository within Apache Flink, which
> >> would give it the chance to achieve better sustainability than the
> >> existing external operator efforts. There is also the fact that some
> >> organizations which are heavily invested in operationalizing Flink
> >> allow contributions to Apache Flink itself but less so to arbitrary
> >> GitHub projects. Regarding the tooling, it could well turn out that
> >> Java is a good alternative given the ecosystem focus and that there is
> >> an opportunity for reuse in certain aspects (metrics, logging etc.).
> >>
> >> [Yang] > I think Xintong has given a strong point why we introduced
> >> the native K8s integration, which is active resource management.
> >> > I have a concrete example for this in production. When a K8s node
> >> is down, the standalone K8s deployment will take a longer
> >> > recovery time based on the K8s eviction time (IIRC, the default is 5
> >> minutes). For the native K8s integration, Flink RM could be aware of the
> >> > lost TM heartbeat and allocate a new one promptly.

Re: Flink native k8s integration vs. operator

2022-01-13 Thread Márton Balassi
Hi All,

I am pleased to see the level of enthusiasm and technical consideration
already emerging in this thread. I wholeheartedly support building an
operator and endorsing it by placing it under the Apache Flink umbrella
(as a separate repository) as the current lack of it is clearly becoming an
adoption bottleneck for large scale Flink users. The next logical step is
to write a FLIP to agree on the technical details, so that we can put
forward the proposal to the Flink PMC for creating a new repository with a
clear purpose in mind. I volunteer to work with Thomas and Gyula on the
initial wording on the proposal which we will put up for public discussion
in the coming weeks.

Best,
Marton

On Thu, Jan 13, 2022 at 9:22 AM Konstantin Knauf  wrote:

> Hi Thomas,
>
> Yes, I was referring to a separate repository under Apache Flink.
>
> Cheers,
>
> Konstantin
>
> On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise  wrote:
>
>> Hi everyone,
>>
>> Thanks for the feedback and discussion. A few additional thoughts:
>>
>> [Konstantin] > With respect to common lifecycle management operations:
>> these features are
>> > not available (within Apache Flink) for any of the other resource
>> providers
>> > (YARN, Standalone) either. From this perspective, I wouldn't consider
>> this
>> > a shortcoming of the Kubernetes integration.
>>
>> I think time and evolution of the ecosystem are factors to consider as
>> well. The state and usage of Flink was much different when YARN
>> integration was novel. Expectations are different today and the
>> lifecycle functionality provided by an operator may as well be
>> considered essential to support the concept of a Flink application on
>> k8s. After a few years of learning from operator experience outside of
>> Flink, it might be a good time to fill the gap.
>>
>> [Konstantin] > I still believe that we should keep this focus on low
>> > level composable building blocks (like Jobs and Snapshots) in Apache
>> Flink
>> > to make it easy for everyone to build fitting higher level abstractions
>> > like a FlinkApplication Custom Resource on top of it.
>>
>> I completely agree that it is important that the basic functions of
>> Flink are solid and continued focus is necessary. Thanks for sharing
>> the pointers, these are great improvements. At the same time,
>> ecosystem, contributor base and user spectrum are growing. There have
>> been significant additions in many areas of Flink including connectors
>> and higher level abstractions like statefun, SQL and Python. It's also
>> evident from additional repositories/subprojects that we have in Flink
>> today.
>>
>> [Konstantin] > Having said this, if others in the community have the
>> capacity to push and
>> > *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache
>> > Flink, I don't see any blockers. If or when this happens, I'd see some
>> > clear benefits of using a separate repository (easier independent
>> > versioning and releases, different build system & tooling (go, I
>> assume)).
>>
>> Naturally different contributors to the project have different focus.
>> Let's find out if there is strong enough interest to take this on and
>> strong enough commitment to maintain. As I see it, there is a
>> tremendous amount of internal investment going into operationalizing
>> Flink within many companies. Improvements to the operational side of
>> Flink like the operator would complement Flink nicely. I assume that
>> you are referring to a separate repository within Apache Flink, which
>> would give it the chance to achieve better sustainability than the
>> existing external operator efforts. There is also the fact that some
>> organizations which are heavily invested in operationalizing Flink
>> allow contributions to Apache Flink itself but less so to arbitrary
>> GitHub projects. Regarding the tooling, it could well turn out that
>> Java is a good alternative given the ecosystem focus and that there is
>> an opportunity for reuse in certain aspects (metrics, logging etc.).
>>
>> [Yang] > I think Xintong has given a strong point why we introduced
>> the native K8s integration, which is active resource management.
>> > I have a concrete example for this in production. When a K8s node
>> is down, the standalone K8s deployment will take a longer
>> > recovery time based on the K8s eviction time (IIRC, the default is 5
>> minutes). For the native K8s integration, Flink RM could be aware of the
>> > lost TM heartbeat and allocate a new one promptly.
>>
>> Thanks for sharing this, we should evaluate it as part of a proposal.
>> If we can optimize recovery or scaling with active resource management
>> then perhaps it is worth supporting it through the operator.
>> Previously mentioned operators all rely on the standalone model.
>>
>> Cheers,
>> Thomas
>>
>> On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf 
>> wrote:
>> >
>> > cc dev@
>> >
>> > Hi Thomas, Hi everyone,
>> >
>> > Thank you for starting this discussion and sorry for chiming in late.

Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Maciej Obuchowski
Hello,

I'm an OpenLineage committer - and previously, a minor Flink contributor.
The OpenLineage community is very interested in a conversation about Flink
metadata, and we'll be happy to cooperate with the Flink community.

Best,
Maciej Obuchowski



On Thu, 13 Jan 2022 at 18:12, Martijn Visser  wrote:
>
> Hi all,
>
> @Andrew thanks for sharing that!
>
> @Tero good point, I should have clarified the purpose. I want to understand
> what "metadata platforms" tools are used or evaluated by the Flink
> community, what's their purpose for using such a tool (is it as a generic
> catalogue, as a data discovery tool, is lineage the important part, etc.) and
> what problems are people trying to solve with them. This space is
> developing rapidly and there are many open source and commercial tools
> popping up/growing, which is also why I'm trying to keep an open vision on
> how this space is evolving.
>
> If the Flink community wants to integrate with metadata tools, I fully
> agree that ideally we do that via standards. My perception is at this
> moment that no clear standard has yet been established. You mentioned
> open-metadata.org, but I believe https://openlineage.io/ is also an
> alternative standard.
>
> Best regards,
>
> Martijn
>
> On Thu, 13 Jan 2022 at 17:00, Tero Paananen  wrote:
>
> > > I'm currently checking out different metadata platforms, such as
> > Amundsen [1] and Datahub [2]. In short, these types of tools try to address
> > problems related to topics such as data discovery, data lineage and an
> > overall data catalogue.
> > >
> > > I'm reaching out to the Dev and User mailing lists to get some feedback.
> > It would really help if you could spend a couple of minutes to let me know
> > if you already use either one of the two mentioned metadata platforms or
> > another one, or are you evaluating such tools? If so, is that for the
> > purpose of a catalogue, for lineage or anything else? Any type of feedback
> > on these types of tools is appreciated.
> >
> > I hope you don't mind answers off-list.
> >
> > You didn't say what purpose you're evaluating these tools for, but if
> > you're evaluating platforms for integration with Flink, I wouldn't
> > approach it with a particular product in mind. Rather I'd create some
> > sort of facility to propagate metadata and/or lineage information in a
> > generic way and allow Flink users to plug in their favorite metadata
> > tool. Using standards like OpenLineage, for example. I believe Egeria
> > is also trying to create an open standard for metadata.
> >
> > If you're evaluating data catalogs for personal use or use in a
> > particular project, Andrew's answer about the Wikimedia evaluation is
> > a good start. It's missing OpenMetadata (https://open-metadata.org/).
> > That one is showing a LOT of promise. Wikimedia's evaluation is also
> > missing industry-leading commercial products (understandably, given
> > their mission). Collibra and Alation are probably the ones that pop up
> > most often.
> >
> > I have personally looked into both DataHub and Amundsen. My high-level
> > feedback is that DataHub is overengineered and uses proprietary
> > LinkedIn technology platform(s), which aren't widely used anywhere.
> > Amundsen is much less flexible than DataHub and quite basic in its
> > functionality. If you need anything beyond what it already offers,
> > good luck.
> >
> > We dumped Amundsen in favor of OpenMetadata a few months back. We
> > don't have enough data points to fully evaluate OpenMetadata yet.
> >
> > -TPP
> >


Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Martijn Visser
Hi all,

@Andrew thanks for sharing that!

@Tero good point, I should have clarified the purpose. I want to understand
what "metadata platforms" tools are used or evaluated by the Flink
community, what's their purpose for using such a tool (is it as a generic
catalogue, as a data discovery tool, is lineage the important part, etc.) and
what problems are people trying to solve with them. This space is
developing rapidly and there are many open source and commercial tools
popping up/growing, which is also why I'm trying to keep an open vision on
how this space is evolving.

If the Flink community wants to integrate with metadata tools, I fully
agree that ideally we do that via standards. My perception is at this
moment that no clear standard has yet been established. You mentioned
open-metadata.org, but I believe https://openlineage.io/ is also an
alternative standard.

Best regards,

Martijn

On Thu, 13 Jan 2022 at 17:00, Tero Paananen  wrote:

> > I'm currently checking out different metadata platforms, such as
> Amundsen [1] and Datahub [2]. In short, these types of tools try to address
> problems related to topics such as data discovery, data lineage and an
> overall data catalogue.
> >
> > I'm reaching out to the Dev and User mailing lists to get some feedback.
> It would really help if you could spend a couple of minutes to let me know
> if you already use either one of the two mentioned metadata platforms or
> another one, or are you evaluating such tools? If so, is that for the
> purpose of a catalogue, for lineage or anything else? Any type of feedback
> on these types of tools is appreciated.
>
> I hope you don't mind answers off-list.
>
> You didn't say what purpose you're evaluating these tools for, but if
> you're evaluating platforms for integration with Flink, I wouldn't
> approach it with a particular product in mind. Rather I'd create some
> sort of facility to propagate metadata and/or lineage information in a
> generic way and allow Flink users to plug in their favorite metadata
> tool. Using standards like OpenLineage, for example. I believe Egeria
> is also trying to create an open standard for metadata.
>
> If you're evaluating data catalogs for personal use or use in a
> particular project, Andrew's answer about the Wikimedia evaluation is
> a good start. It's missing OpenMetadata (https://open-metadata.org/).
> That one is showing a LOT of promise. Wikimedia's evaluation is also
> missing industry-leading commercial products (understandably, given
> their mission). Collibra and Alation are probably the ones that pop up
> most often.
>
> I have personally looked into both DataHub and Amundsen. My high-level
> feedback is that DataHub is overengineered and uses proprietary
> LinkedIn technology platform(s), which aren't widely used anywhere.
> Amundsen is much less flexible than DataHub and quite basic in its
> functionality. If you need anything beyond what it already offers,
> good luck.
>
> We dumped Amundsen in favor of OpenMetadata a few months back. We
> don't have enough data points to fully evaluate OpenMetadata yet.
>
> -TPP
>


Re: [DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Thomas Weise
Regarding session mode:

## Session Mode
* main() method executed in client

Session mode also supports execution of the main() method on the Jobmanager,
with submission through the REST API. That's how Flink k8s operators like
[1] work. It's actually an important capability, because it allows the
cluster resources to be allocated before the previous job is taken down
during an upgrade, when the goal is to optimize for availability.
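
A rough sketch of that submission path (the Jobmanager address and jar id are
hypothetical; it assumes the jar was already uploaded via POST /jars/upload):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestSubmitSketch {
    public static void main(String[] args) throws Exception {
        String jobManager = "http://jobmanager:8081"; // hypothetical address
        String jarId = "abc123_job.jar";              // id returned by POST /jars/upload

        // POST /jars/:jarid/run executes the entry class's main() on the cluster side.
        HttpRequest run = HttpRequest.newBuilder()
                .uri(URI.create(jobManager + "/jars/" + jarId
                        + "/run?entry-class=com.example.MyJob"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(run, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // contains the assigned job id on success
    }
}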

Thanks,
Thomas

[1] https://github.com/lyft/flinkk8soperator

On Thu, Jan 13, 2022 at 12:32 AM Konstantin Knauf  wrote:
>
> Hi everyone,
>
> I would like to discuss and understand if the benefits of having Per-Job
> Mode in Apache Flink outweigh its drawbacks.
>
>
> *# Background: Flink's Deployment Modes*
> Flink currently has three deployment modes. They differ in the following
> dimensions:
> * main() method executed on Jobmanager or Client
> * dependencies shipped by client or bundled with all nodes
> * number of jobs per cluster & relationship between job and cluster
> lifecycle
> * (supported resource providers)
>
> ## Application Mode
> * main() method executed on Jobmanager
> * dependencies already need to be available on all nodes
> * dedicated cluster for all jobs executed from the same main()-method
> (Note: applications with more than one job currently still have significant
> limitations, like missing high availability). Technically, a session cluster
> dedicated to all jobs submitted from the same main() method.
> * supported by standalone, native kubernetes, YARN
>
> ## Session Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * cluster is shared by multiple jobs submitted from different clients,
> independent lifecycle
> * supported by standalone, Native Kubernetes, YARN
>
> ## Per-Job Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * dedicated cluster for a single job
> * supported by YARN only
>
>
> *# Reasons to Keep*
> * There are use cases where you might need the
> combination of a single job per cluster, but main() method execution in the
> client. This combination is only supported by per-job mode.
> * It currently exists. Existing users will need to migrate to either
> session or application mode.
>
>
> *# Reasons to Drop*
> * With Per-Job Mode and Application Mode we have two
> modes that for most users probably do the same thing. Specifically, for
> those users that don't care where the main() method is executed and want to
> submit a single job per cluster. Having two ways to do the same thing is
> confusing.
> * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
> work towards support in Kubernetes and Standalone, too, to reduce special
> casing.
> * Dropping per-job mode would reduce complexity in the code and allow us to
> dedicate more resources to the other two deployment modes.
> * I believe with session mode and application mode we have two easily
> distinguishable and understandable deployment modes that cover Flink's use
> cases:
>* session mode: olap-style, interactive jobs/queries, short lived batch
> jobs, very small jobs, traditional cluster-centric deployment mode (fits
> the "Hadoop world")
>* application mode: long-running streaming jobs, large scale &
> heterogeneous jobs (resource isolation!), application-centric deployment
> mode (fits the "Kubernetes world")
>
>
> *# Call to Action*
> * Do you use per-job mode? If so, why & would you be able to migrate to one
> of the other methods?
> * Am I missing any pros/cons?
> * Are you in favor of dropping per-job mode midterm?
>
> Cheers and thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk


Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Pedro Silva
Hello,

I'm part of the DataHub community and working in collaboration with the
company behind it: http://acryldata.io
Happy to have a conversation or answer any questions you may have about
DataHub :)

Have a nice day!

On Thu, Jan 13, 2022 at 3:33 PM, Andrew Otto 
wrote:

> Hello!  The Wikimedia Foundation is currently doing a similar evaluation
> (although we are not currently including any Flink considerations).
>
>
> https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric
>
> More details will be published there as folks keep working on this.
> Hope that helps a little bit! :)
>
> -Andrew Otto
>
> On Thu, Jan 13, 2022 at 10:27 AM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> I'm currently checking out different metadata platforms, such as Amundsen
>> [1] and Datahub [2]. In short, these types of tools try to address problems
>> related to topics such as data discovery, data lineage and an overall data
>> catalogue.
>>
>> I'm reaching out to the Dev and User mailing lists to get some feedback.
>> It would really help if you could spend a couple of minutes to let me know
>> if you already use either one of the two mentioned metadata platforms or
>> another one, or are you evaluating such tools? If so, is that for
>> the purpose of a catalogue, for lineage or anything else? Any type of
>> feedback on these types of tools is appreciated.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://github.com/amundsen-io/amundsen/
>> [2] https://github.com/linkedin/datahub
>>
>>
>>


Re: [FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Andrew Otto
Hello!  The Wikimedia Foundation is currently doing a similar evaluation
(although we are not currently including any Flink considerations).

https://wikitech.wikimedia.org/wiki/Data_Catalog_Application_Evaluation_Rubric

More details will be published there as folks keep working on this.
Hope that helps a little bit! :)

-Andrew Otto

On Thu, Jan 13, 2022 at 10:27 AM Martijn Visser 
wrote:

> Hi everyone,
>
> I'm currently checking out different metadata platforms, such as Amundsen
> [1] and Datahub [2]. In short, these types of tools try to address problems
> related to topics such as data discovery, data lineage and an overall data
> catalogue.
>
> I'm reaching out to the Dev and User mailing lists to get some feedback.
> It would really help if you could spend a couple of minutes to let me know
> if you already use either one of the two mentioned metadata platforms or
> another one, or are you evaluating such tools? If so, is that for
> the purpose of a catalogue, for lineage or anything else? Any type of
> feedback on these types of tools is appreciated.
>
> Best regards,
>
> Martijn
>
> [1] https://github.com/amundsen-io/amundsen/
> [2] https://github.com/linkedin/datahub
>
>
>


[FEEDBACK] Metadata Platforms / Catalogs / Lineage integration

2022-01-13 Thread Martijn Visser
Hi everyone,

I'm currently checking out different metadata platforms, such as Amundsen
[1] and Datahub [2]. In short, these types of tools try to address problems
related to topics such as data discovery, data lineage and an overall data
catalogue.

I'm reaching out to the Dev and User mailing lists to get some feedback. It
would really help if you could spend a couple of minutes to let me know if
you already use either one of the two mentioned metadata platforms or
another one, or are you evaluating such tools? If so, is that for
the purpose of a catalogue, for lineage or anything else? Any type of
feedback on these types of tools is appreciated.

Best regards,

Martijn

[1] https://github.com/amundsen-io/amundsen/
[2] https://github.com/linkedin/datahub


Re: Upgrade to flink 1.14.2 and using new Data Source and Sink API

2022-01-13 Thread Mika Naylor

Hi Daniel,

These logs look pretty normal. As for the -1 epochs, depending on which version
you're using, I think that this might apply:

"For a producer which is being initialized for the first time, the producerId 
and
epoch will be set to -1. For a producer which is reinitializing, a positive 
valued
producerId and epoch must be provided."

(From: 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=89068820#:~:text=For%20a%20producer%20which%20is,and%20epoch%20must%20be%20provided)

I think these logs are created whenever a producer is being initialized, and
they are visible because the logging level is INFO, which is quite verbose.
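
If the job uses the new KafkaSink with exactly-once delivery, transactional
producers are, as far as I know, (re)initialized around checkpoints, which
would also explain recurring lines like these. A minimal sketch of such a
sink (bootstrap servers and topic are hypothetical):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceSinkSketch {
    public static void main(String[] args) {
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("broker:9092")               // hypothetical
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("output-topic")                 // hypothetical
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                // transactional producer ids are derived from this prefix
                .setTransactionalIdPrefix("my-app")
                .build();
        // attach via stream.sinkTo(sink) in the actual job
    }
}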

Kind regards,
Mika

On 13.01.2022 13:40, Daniel Peled wrote:

Hi everyone,

We have upgraded our Flink version from 1.13.5 to 1.14.2.
We are using the new KafkaSource and KafkaSink (instead of FlinkKafkaConsumer
and FlinkKafkaProducer).
After the upgrade, we *keep seeing* these log messages in the TM logs.
Is this OK?
Are we doing something wrong?

BR,
Danny

[image: image.png]




Mika Naylor
https://autophagy.io


Flink 1.14.2 - Log4j2 -Dlog4j.configurationFile is ignored and falls back to default /opt/flink/conf/log4j-console.properties

2022-01-13 Thread Tamir Sagi
Hey All

I'm running Flink 1.14.2, and it seems to ignore the system property
-Dlog4j.configurationFile, falling back to
/opt/flink/conf/log4j-console.properties.

I enabled debug logging for Log4j2 (-Dlog4j2.debug):

DEBUG StatusLogger Catching
 java.io.FileNotFoundException: file:/opt/flink/conf/log4j-console.properties 
(No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(Unknown Source)
at java.base/java.io.FileInputStream.<init>(Unknown Source)
at 
org.apache.logging.log4j.core.config.ConfigurationFactory.getInputFromString(ConfigurationFactory.java:370)
at 
org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:513)
at 
org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:499)
at 
org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:422)
at 
org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:322)
at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:695)
at 
org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:716)
at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:270)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:155)
at 
org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:47)
at org.apache.logging.log4j.LogManager.getContext(LogManager.java:196)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:137)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getContext(Log4jLoggerFactory.java:55)
at 
org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:47)
at 
org.apache.logging.slf4j.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:33)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:329)
at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:349)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils.<clinit>(AkkaRpcServiceUtils.java:55)
at 
org.apache.flink.runtime.rpc.akka.AkkaRpcSystem.remoteServiceBuilder(AkkaRpcSystem.java:42)
at 
org.apache.flink.runtime.rpc.akka.CleanupOnCloseRpcSystem.remoteServiceBuilder(CleanupOnCloseRpcSystem.java:77)
at 
org.apache.flink.runtime.rpc.RpcUtils.createRemoteRpcService(RpcUtils.java:184)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:300)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:243)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$1(ClusterEntrypoint.java:193)
at 
org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:190)
at 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:617)

whereas I can see the property being loaded while deploying the cluster:

source:{
class:org.apache.flink.configuration.GlobalConfiguration
method:loadYAMLResource
file:GlobalConfiguration.java
line:213
}
message:Loading configuration property: env.java.opts, 
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dumps
-Dlog4j.configurationFile=/opt/log4j2/log4j2.xml -Dlog4j2.debug=true

In addition, following the documentation [1], it seems like Flink comes with
default log4j properties files located in /opt/flink/conf.

Looking into that dir once the cluster is deployed, only flink-conf.yaml is
there.


Docker file content

FROM flink:1.14.2-scala_2.12-java11
ARG JAR_FILE
COPY target/${JAR_FILE} $FLINK_HOME/usrlib/flink-job.jar
ADD log4j2.xml /opt/log4j2/log4j2.xml


It works perfectly in 1.12.2 with the same log4j2.xml file and system property.

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/advanced/logging/#configuring-log4j-2


Best,
Tamir






Re: [DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Biao Geng
Hi Konstantin,

Thanks a lot for starting this discussion! I hope my thoughts and
experiences why users use Per-Job Mode, especially in YARN can help:
#1. Per-job mode makes managing dependencies easier: I have met some
customers who used Per-Job Mode to submit jobs with a lot of local
user-defined jars by using the '-C' option directly. They do not need to upload
these jars to a remote file system (e.g. HDFS) first, which makes their
lives easier.
#2. In YARN mode, there are currently some limitations in Application Mode:
in this JIRA (https://issues.apache.org/jira/browse/FLINK-24897) that I am
working on, we found that YARN Application Mode does not support `usrlib` very
well, which makes it hard to use the FlinkUserCodeClassLoader to load classes
in user-defined jars.

I believe the above 2 points, especially #2, can be addressed as we enhance
YARN Application Mode later, but I think it is worthwhile to consider
dependency management more carefully before we make decisions.

Best,
Biao Geng


Konstantin Knauf  于2022年1月13日周四 16:32写道:

> Hi everyone,
>
> I would like to discuss and understand if the benefits of having Per-Job
> Mode in Apache Flink outweigh its drawbacks.
>
>
> *# Background: Flink's Deployment Modes*
> Flink currently has three deployment modes. They differ in the following
> dimensions:
> * main() method executed on Jobmanager or Client
> * dependencies shipped by client or bundled with all nodes
> * number of jobs per cluster & relationship between job and cluster
> lifecycle
> * (supported resource providers)
>
> ## Application Mode
> * main() method executed on Jobmanager
> * dependencies already need to be available on all nodes
> * dedicated cluster for all jobs executed from the same main()-method
> (Note: applications with more than one job currently still have significant
> limitations, like missing high availability). Technically, a session cluster
> dedicated to all jobs submitted from the same main() method.
> * supported by standalone, native kubernetes, YARN
>
> ## Session Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * cluster is shared by multiple jobs submitted from different clients,
> independent lifecycle
> * supported by standalone, Native Kubernetes, YARN
>
> ## Per-Job Mode
> * main() method executed in client
> * dependencies are distributed from and by the client to all nodes
> * dedicated cluster for a single job
> * supported by YARN only
>
>
> *# Reasons to Keep*
> * There are use cases where you might need the
> combination of a single job per cluster, but main() method execution in the
> client. This combination is only supported by per-job mode.
> * It currently exists. Existing users will need to migrate to either
> session or application mode.
>
>
> *# Reasons to Drop*
> * With Per-Job Mode and Application Mode we have two
> modes that for most users probably do the same thing. Specifically, for
> those users that don't care where the main() method is executed and want to
> submit a single job per cluster. Having two ways to do the same thing is
> confusing.
> * Per-Job Mode is only supported by YARN anyway. If we keep it, we should
> work towards support in Kubernetes and Standalone, too, to reduce special
> casing.
> * Dropping per-job mode would reduce complexity in the code and allow us
> to dedicate more resources to the other two deployment modes.
> * I believe with session mode and application mode we have two easily
> distinguishable and understandable deployment modes that cover Flink's use
> cases:
>* session mode: olap-style, interactive jobs/queries, short lived batch
> jobs, very small jobs, traditional cluster-centric deployment mode (fits
> the "Hadoop world")
>* application mode: long-running streaming jobs, large scale &
> heterogeneous jobs (resource isolation!), application-centric deployment
> mode (fits the "Kubernetes world")
>
>
> *# Call to Action*
> * Do you use per-job mode? If so, why & would you be able to migrate to
> one of the other methods?
> * Am I missing any pros/cons?
> * Are you in favor of dropping per-job mode midterm?
>
> Cheers and thank you,
>
> Konstantin
>
> --
>
> Konstantin Knauf
>
> https://twitter.com/snntrable
>
> https://github.com/knaufk
>


[DISCUSS] Future of Per-Job Mode

2022-01-13 Thread Konstantin Knauf
Hi everyone,

I would like to discuss and understand if the benefits of having Per-Job
Mode in Apache Flink outweigh its drawbacks.


*# Background: Flink's Deployment Modes*
Flink currently has three deployment modes. They differ in the following
dimensions:
* main() method executed on Jobmanager or Client
* dependencies shipped by client or bundled with all nodes
* number of jobs per cluster & relationship between job and cluster
lifecycle
* (supported resource providers)

## Application Mode
* main() method executed on Jobmanager
* dependencies already need to be available on all nodes
* dedicated cluster for all jobs executed from the same main()-method
(Note: applications with more than one job currently still have significant
limitations, like missing high availability). Technically, a session cluster
dedicated to all jobs submitted from the same main() method.
* supported by standalone, native kubernetes, YARN

## Session Mode
* main() method executed in client
* dependencies are distributed from and by the client to all nodes
* cluster is shared by multiple jobs submitted from different clients,
independent lifecycle
* supported by standalone, Native Kubernetes, YARN

## Per-Job Mode
* main() method executed in client
* dependencies are distributed from and by the client to all nodes
* dedicated cluster for a single job
* supported by YARN only


*# Reasons to Keep*
* There are use cases where you might need the
combination of a single job per cluster, but main() method execution in the
client. This combination is only supported by per-job mode.
* It currently exists. Existing users will need to migrate to either
session or application mode.


*# Reasons to Drop*
* With Per-Job Mode and Application Mode we have two
modes that for most users probably do the same thing. Specifically, for
those users that don't care where the main() method is executed and want to
submit a single job per cluster. Having two ways to do the same thing is
confusing.
* Per-Job Mode is only supported by YARN anyway. If we keep it, we should
work towards support in Kubernetes and Standalone, too, to reduce special
casing.
* Dropping per-job mode would reduce complexity in the code and allow us to
dedicate more resources to the other two deployment modes.
* I believe with session mode and application mode we have two easily
distinguishable and understandable deployment modes that cover Flink's use
cases:
   * session mode: olap-style, interactive jobs/queries, short lived batch
jobs, very small jobs, traditional cluster-centric deployment mode (fits
the "Hadoop world")
   * application mode: long-running streaming jobs, large scale &
heterogeneous jobs (resource isolation!), application-centric deployment
mode (fits the "Kubernetes world")


*# Call to Action*
* Do you use per-job mode? If so, why & would you be able to migrate to one
of the other methods?
* Am I missing any pros/cons?
* Are you in favor of dropping per-job mode midterm?

Cheers and thank you,

Konstantin

-- 

Konstantin Knauf

https://twitter.com/snntrable

https://github.com/knaufk


Re: OutOfMemoryError: Java heap space while implmentating flink sql api

2022-01-13 Thread Martijn Visser
Hi Ronak,

As mentioned in the Flink Community & Project information [1], the primary
places for support are the mailing lists, and user support should go to the
User mailing list. Keep in mind that this is still done by the community and
set up for asynchronous handling. If you want quick acknowledgements or SLAs,
there are vendors that offer commercial support for Flink.

You can't compare the two statements, because in your first join you're
also applying a TUMBLE. That means you're not only maintaining state
for your join, but also for your window. You're also using the old Group
Window Aggregation syntax; it's recommended to use Window TVFs instead, due
to better performance optimizations [2].
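
For illustration, a minimal sketch of the TVF form; it assumes cdrTable is
registered with a processing-time attribute column, e.g. `proc_time AS
PROCTIME()` in its DDL (names follow your example):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class WindowTvfSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Window TVF replacement for GROUP BY TUMBLE(PROCTIME(), INTERVAL '1' MINUTE)
        tableEnv.executeSql(
                "SELECT window_start, window_end, COUNT(*) AS cnt "
                        + "FROM TABLE(TUMBLE(TABLE cdrTable, DESCRIPTOR(proc_time), INTERVAL '1' MINUTES)) "
                        + "GROUP BY window_start, window_end")
                .print();
    }
}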

Best regards,

Martijn

[1]
https://flink.apache.org/community.html#how-do-i-get-help-from-apache-flink
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-agg/#group-window-aggregation


On Thu, 13 Jan 2022 at 06:33, Ronak Beejawat (rbeejawa) 
wrote:

> HI Martijn,
>
>
>
> I posted the below query in both places (the Flink mailing list and Stack
> Overflow) to get a quick response on it.
>
> Please let me know the right POC / mailing list to post my queries to if this
> is causing trouble, so that we can at least get quick acknowledgement of the
> issues reported.
>
>
>
> OK, let me ask the below question in a simpler way:
>
>
>
> *Join 1*
>
>
>
> select * from cdrTable left join  ccmversionsumapTable cvsm ON
> (cdrTable.version = ccmversionsumapTable.ccmversion) group by
> TUMBLE(PROCTIME(), INTERVAL '1' MINUTE), …
>
> (2.5 million rows left joined with 23 records fails to compute and throws a
> heap error.)
>
> Note: This is a small join example compared to the Join 2 condition shown
> below. Here we are using different connectors for reading: cdrTable -> kafka
> connector and ccmversionsumapTable -> jdbc connector.
>
>
>
> *Join 2*
>
>
>
> select * from cdrTable left join cmrTable cmr on (cdr.org_id =
> cmr.org_id AND cdr.cluster_id = cmr.cluster_id AND
> cdr.globalcallid_callmanagerid = cmr.globalcallid_callmanagerid AND
> cdr.globalcallid_callid = cmr.globalcallid_callid AND
> (cdr.origlegcallidentifier = cmr.callidentifier OR
> cdr.destlegcallidentifier = cmr.callidentifier)), … (2.5 million rows left
> joined with 5 million rows computes properly without any heap error.)
>
> Note: This is a bigger join example compared to the Join 1 condition shown
> above. Here we are using the same connector for reading cdrTable and
> cmrTable -> kafka connector.
>
>
>
> So the question is: with the small join condition it throws a heap error,
> while with the bigger join it works properly. Please help us with this.
>
>
>
> Thanks
>
> Ronak Beejawat
>
>
>
> *From: *Martijn Visser 
> *Date: *Wednesday, 12 January 2022 at 7:43 PM
> *To: *dev 
> *Cc: *commun...@flink.apache.org ,
> user@flink.apache.org , Hang Ruan <
> ruanhang1...@gmail.com>, Shrinath Shenoy K (sshenoyk) ,
> Jayaprakash Kuravatti (jkuravat) , Krishna Singitam
> (ksingita) , Nabhonil Sinha (nasinha) <
> nasi...@cisco.com>, Vibhor Jain (vibhjain) ,
> Raghavendra Jsv (rjsv) , Arun Yadav (aruny) <
> ar...@cisco.com>, Avi Sanwal (asanwal) 
> *Subject: *Re: OutOfMemoryError: Java heap space while implmentating
> flink sql api
>
> Hi Ronak,
>
>
>
> I would like to ask you to stop cross-posting to all the Flink mailing
> lists and then also post the same question to Stackoverflow. Both the
> mailing lists and Stackoverflow are designed for asynchronous communication
> and you should allow the community some days to address your question.
>
>
>
> Joins are state heavy. As mentioned in the documentation [1] "Thus, the
> required state for computing the query result might grow infinitely
> depending on the number of distinct input rows of all input tables and
> intermediate join results."
>
>
>
> Best regards,
>
>
>
> Martijn
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/
>
>
>
>
>
> On Wed, 12 Jan 2022 at 11:06, Ronak Beejawat (rbeejawa) <
> rbeej...@cisco.com.invalid> wrote:
>
> Hi Team,
>
> I was trying to implement a Flink SQL API join with 2 tables and it is
> throwing OutOfMemoryError: Java heap space. PFB a screenshot of the Flink
> cluster memory details.
> [Flink Memory Model][1]
>
>
>   [1]: https://i.stack.imgur.com/AOnQI.png
>
> **PFB below code snippet which I was trying:**
> ```
> EnvironmentSettings settings =
> EnvironmentSettings.newInstance().inStreamingMode().build();
> TableEnvironment tableEnv = TableEnvironment.create(settings);
>
>
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.agg-phase-strategy",
> "TWO_PHASE");
> tableEnv.getConfig().getConfiguration().setString("table.optimizer.join-reorder-enabled",
> "true");
> tableEnv.getConfig().getConfiguration().setString("table.exec.resource.default-parallelism",
> "16");
>
> tableEnv.executeSql("CREATE TEMPORARY TABLE ccmversionsumapTable (\r\n"
>  + "  

Re: Flink native k8s integration vs. operator

2022-01-13 Thread Konstantin Knauf
Hi Thomas,

Yes, I was referring to a separate repository under Apache Flink.

Cheers,

Konstantin

On Thu, Jan 13, 2022 at 6:19 AM Thomas Weise  wrote:

> Hi everyone,
>
> Thanks for the feedback and discussion. A few additional thoughts:
>
> [Konstantin] > With respect to common lifecycle management operations:
> these features are
> > not available (within Apache Flink) for any of the other resource
> providers
> > (YARN, Standalone) either. From this perspective, I wouldn't consider
> this
> > a shortcoming of the Kubernetes integration.
>
> I think time and evolution of the ecosystem are factors to consider as
> well. The state and usage of Flink was much different when YARN
> integration was novel. Expectations are different today and the
> lifecycle functionality provided by an operator may as well be
> considered essential to support the concept of a Flink application on
> k8s. After a few years of learning from operator experience outside of
> Flink, it might be a good time to fill the gap.
>
> [Konstantin] > I still believe that we should keep this focus on low
> > level composable building blocks (like Jobs and Snapshots) in Apache
> Flink
> > to make it easy for everyone to build fitting higher level abstractions
> > like a FlinkApplication Custom Resource on top of it.
>
> I completely agree that it is important that the basic functions of
> Flink are solid and continued focus is necessary. Thanks for sharing
> the pointers, these are great improvements. At the same time,
> ecosystem, contributor base and user spectrum are growing. There have
> been significant additions in many areas of Flink including connectors
> and higher level abstractions like statefun, SQL and Python. It's also
> evident from additional repositories/subprojects that we have in Flink
> today.
>
> [Konstantin] > Having said this, if others in the community have the
> capacity to push and
> > *maintain* a somewhat minimal "reference" Kubernetes Operator for Apache
> > Flink, I don't see any blockers. If or when this happens, I'd see some
> > clear benefits of using a separate repository (easier independent
> > versioning and releases, different build system & tooling (go, I
> assume)).
>
> Naturally different contributors to the project have different focus.
> Let's find out if there is strong enough interest to take this on and
> strong enough commitment to maintain. As I see it, there is a
> tremendous amount of internal investment going into operationalizing
> Flink within many companies. Improvements to the operational side of
> Flink like the operator would complement Flink nicely. I assume that
> you are referring to a separate repository within Apache Flink, which
> would give it the chance to achieve better sustainability than the
> existing external operator efforts. There is also the fact that some
> organizations which are heavily invested in operationalizing Flink
> allow contributions to Apache Flink itself but less so to arbitrary
> GitHub projects. Regarding the tooling, it could well turn out that
> Java is a good alternative given the ecosystem focus and that there is
> an opportunity for reuse in certain aspects (metrics, logging etc.).
>
> [Yang] > I think Xintong has given a strong point why we introduced
> the native K8s integration, which is active resource management.
> > I have a concrete example for this in production. When a K8s node is
> down, the standalone K8s deployment will take a longer
> > recovery time based on the K8s eviction time (IIRC, the default is 5
> minutes). For the native K8s integration, Flink RM could be aware of the
> > lost TM heartbeat and allocate a new one promptly.
>
> Thanks for sharing this, we should evaluate it as part of a proposal.
> If we can optimize recovery or scaling with active resource management
> then perhaps it is worth supporting it through the operator.
> Previously mentioned operators all rely on the standalone model.
>
> Cheers,
> Thomas
>
> On Wed, Jan 12, 2022 at 3:21 AM Konstantin Knauf 
> wrote:
> >
> > cc dev@
> >
> > Hi Thomas, Hi everyone,
> >
> > Thank you for starting this discussion and sorry for chiming in late.
> >
> > I agree with Thomas' and David's assessment of Flink's "Native Kubernetes
> > Integration", in particular, it does actually not integrate well with the
> > Kubernetes ecosystem despite being called "native" (tooling, security
> > concerns).
> >
> > With respect to common lifecycle management operations: these features
> are
> > not available (within Apache Flink) for any of the other resource
> providers
> > (YARN, Standalone) either. From this perspective, I wouldn't consider
> this
> > a shortcoming of the Kubernetes integration. Instead, we have been
> focusing
> > our efforts in Apache Flink on the operations of a single Job, and left
> > orchestration and lifecycle management that spans multiple Jobs to
> > ecosystem projects. I still believe that we should keep this focus on low
> > level composable building blocks (like Jobs and Snapshots) in Apache
> > Flink to make it easy for everyone to build fitting higher level
> > abstractions like a FlinkApplication Custom Resource on top of it.