Re: [VOTE] Creating an Apache Flink slack workspace
+1 (non-binding) Best Regards Peter Huang On Wed, May 18, 2022 at 9:33 PM Leonard Xu wrote: > Thanks Xintong for driving this. > > +1 > > Best, > Leonard > > > 2022年5月19日 上午11:11,Zhou, Brian 写道: > > > > +1 (non-binding) Slack is a better place for code sharing and quick > discussion. > > > > Regards, > > Brian Zhou > > > > -Original Message- > > From: Yun Tang > > Sent: Thursday, May 19, 2022 10:32 AM > > To: dev > > Subject: Re: [VOTE] Creating an Apache Flink slack workspace > > > > > > [EXTERNAL EMAIL] > > > > Thanks Xintong for driving this. +1 from my side. > > > > > > Best > > Yun Tang > > > > From: Zhu Zhu > > Sent: Wednesday, May 18, 2022 17:08 > > To: dev > > Subject: Re: [VOTE] Creating an Apache Flink slack workspace > > > > +1 (binding) > > > > Thanks, > > Zhu > > > > Timo Walther 于2022年5月18日周三 16:52写道: > >> > >> +1 (binding) > >> > >> Thanks, > >> Timo > >> > >> > >> On 17.05.22 20:44, Gyula Fóra wrote: > >>> +1 (binding) > >>> > >>> On Tue, 17 May 2022 at 19:52, Yufei Zhang wrote: > >>> > +1 (nonbinding) > > On Tue, May 17, 2022 at 5:29 PM Márton Balassi < > balassi.mar...@gmail.com> > wrote: > > > +1 (binding) > > > > On Tue, May 17, 2022 at 11:00 AM Jingsong Li > > > wrote: > > > >> Thank Xintong for driving this work. > >> > >> +1 > >> > >> Best, > >> Jingsong > >> > >> On Tue, May 17, 2022 at 4:49 PM Martijn Visser < > martijnvis...@apache.org > >> > >> wrote: > >> > >>> +1 (binding) > >>> > >>> On Tue, 17 May 2022 at 10:38, Yu Li wrote: > >>> > +1 (binding) > > Thanks Xintong for driving this! > > Best Regards, > Yu > > > On Tue, 17 May 2022 at 16:32, Robert Metzger > > >>> wrote: > > > Thanks for starting the VOTE! > > > > +1 (binding) > > > > > > > > On Tue, May 17, 2022 at 10:29 AM Jark Wu > wrote: > > > >> Thank Xintong for driving this work. 
> >> > >> +1 from my side (binding) > >> > >> Best, > >> Jark > >> > >> On Tue, 17 May 2022 at 16:24, Xintong Song < > > tonysong...@gmail.com> > > wrote: > >> > >>> Hi everyone, > >>> > >>> As previously discussed in [1], I would like to open a vote > on > creating > >> an > >>> Apache Flink slack workspace channel. > >>> > >>> The proposed actions include: > >>> - Creating a dedicated slack workspace with the name Apache > > Flink > that > > is > >>> controlled and maintained by the Apache Flink PMC > >>> - Updating the Flink website about rules for using various > > communication > >>> channels > >>> - Setting up an Archive for the Apache Flink slack > >>> - Revisiting this initiative by the end of 2022 > >>> > >>> The vote will last for at least 72 hours, and will be > accepted > >> by a > >>> consensus of active PMC members. > >>> > >>> Best, > >>> > >>> Xintong > >>> > >> > > > > >>> > >> > > > > >>> > >> > >
Re: [VOTE] Creating an Apache Flink slack workspace
Thanks Xintong for driving this. +1

Best,
Leonard
Re: [DISCUSS] Next Flink Kubernetes Operator release timeline
Hi Thomas!

Thank you for raising your concerns. I agree that we should document the compatibility guarantees that we expect to provide going forward.

Since releasing 0.1 (v1alpha1) we have added a great deal of new core features. This obviously required some modification to the CR, but it only touched the status subresource, and the mainly user-facing spec itself had only backward-compatible changes. In the future we would also like to start moving fields out of the status into configmaps to make it easier to change logic later (this can be done in a backward-compatible way).

Based on this, I think it is fair to say that we expect to keep backward compatibility for the CR itself going forward, and I think releasing version 1.0.0 (with API version v1beta1) shows our confidence in the overall spec and design. With the core features covered, I would consider this production ready, and 1.0.0 marks it so. Based on the wider experience we gain from users, we can further improve the design towards the v1 API release (in a backward-compatible way :))

As for the upgrade docs you linked, they explain the process of upgrading from the currently experimental v1alpha1 to the new v1beta1 release. For this release this is the relevant process, but we certainly need to upgrade before the next release. You are also right that the automation is not there; that again is definitely a blocker for the next release to ensure backward compatibility. We already have tickets for these two tasks. [1][2]

Cheers,
Gyula

[1] https://issues.apache.org/jira/browse/FLINK-26955
[2] https://issues.apache.org/jira/browse/FLINK-27302

On Thu, May 19, 2022 at 2:26 AM Thomas Weise wrote: > I think before we release 1.0, we need to define and document the > compatibility guarantees. > > At the moment, the CR changes frequently and as was pointed out > earlier, there isn't any automation to ensure changes are backward > compatible. 
In addition, our documentation still refers to upgrade as > a process that involves removing the prior CRD, which IMO needs to > change for a 1.0 release. > > If we feel that we are not ready to put a compatibility guarantee in > place, then perhaps release the next version as 0.2? > > Thanks, > Thomas > > > [1] > https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/ > > On Mon, May 16, 2022 at 5:13 PM Aitozi wrote: > > > > Thanks Gyula. It looks good to me. I could do a favor during the release > > also. > > Please feel free to ping me to help the doc, release and test work :) > > > > Best, > > Aitozi > > > > Yang Wang 于2022年5月16日周一 21:57写道: > > > > > Thanks Gyula for sharing the progress. It is very likely we could have > the > > > first release candidate next Monday. > > > > > > Best, > > > Yang > > > > > > Gyula Fóra 于2022年5月16日周一 20:50写道: > > > > > > > Hi Devs! > > > > > > > > We are on track for our planned 1.0.0 release timeline. There are no > > > > outstanding blocker issues on JIRA for the release. > > > > > > > > There are 3 outstanding new feature PRs. They are all in pretty good > > > shape > > > > and should be merged within a day: > > > > https://github.com/apache/flink-kubernetes-operator/pull/213 > > > > https://github.com/apache/flink-kubernetes-operator/pull/216 > > > > https://github.com/apache/flink-kubernetes-operator/pull/217 > > > > > > > > As we agreed previously we should not merge any more new features for > > > > 1.0.0 and focus our efforts on testing, bug fixes and documentation > for > > > > this week. > > > > > > > > I will cut the release branch tomorrow once these PRs are merged. > And the > > > > target day for the first release candidate is next Monday. > > > > > > > > The release managers for this release will be Yang Wang and myself. 
> > > > > > Cheers, > > > > Gyula > > > > > > > > On Wed, Apr 27, 2022 at 11:28 AM Yang Wang > > > wrote: > > > > > > > >> Thanks @Chesnay Schepler for pointing out > this. > > > >> > > > >> The only public interface the flink-kubernetes-operator provides is > the > > > >> CRD[1]. We are trying to stabilize the CRD from v1beta1. > > > >> If more fields are introduced to support new features (e.g. > standalone > > > >> mode, > > > >> SQL jobs), they should have the default value to ensure > compatibility. > > > >> Currently, we do not have any tools to enforce the compatibility > > > >> guarantees. But we have created a ticket[1] to follow this and hope > it > > > >> could be resolved before releasing 1.0.0. > > > >> > > > >> Just as you said, now is also a good time to think more about the > > > approach > > > >> of releases. Since flink-kubernetes-operator is much simpler than > Flink, > > > >> we > > > >> could have a shorter release cycle. > > > >> Two months for a major release (1.0, 1.1, etc.) is reasonable to me. > And > > > >> this > > > >> could be shortened for the minor releases. Also we need to support at > > > least > > > >> the last two major versions. > > > >> > > > >>
[jira] [Created] (FLINK-27690) Add Pulsar Source connector document
LuNng Wang created FLINK-27690:
----------------------------------

             Summary: Add Pulsar Source connector document
                 Key: FLINK-27690
                 URL: https://issues.apache.org/jira/browse/FLINK-27690
             Project: Flink
          Issue Type: Improvement
          Components: API / Python
    Affects Versions: 1.15.0
            Reporter: LuNng Wang


--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Created] (FLINK-27689) Pulsar Connector support PulsarSchema
LuNng Wang created FLINK-27689:
----------------------------------

             Summary: Pulsar Connector support PulsarSchema
                 Key: FLINK-27689
                 URL: https://issues.apache.org/jira/browse/FLINK-27689
             Project: Flink
          Issue Type: Improvement
          Components: API / Python
    Affects Versions: 1.15.0
            Reporter: LuNng Wang

Currently, the Python Pulsar Connector only supports Flink Schema; we also need to support Pulsar Schema. The following link has the details:
https://github.com/apache/flink/pull/19682#discussion_r872131355

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
RE: [VOTE] Creating an Apache Flink slack workspace
+1 (non-binding) Slack is a better place for code sharing and quick discussion.

Regards,
Brian Zhou
Re: [VOTE] FLIP-229: Introduces Join Hint for Flink SQL Batch Job
Thanks for driving, +1 (binding) Best Yun Tang From: Jark Wu Sent: Wednesday, May 18, 2022 23:09 To: dev Subject: Re: [VOTE] FLIP-229: Introduces Join Hint for Flink SQL Batch Job +1(binding) Best, Jark On Wed, 18 May 2022 at 14:18, Jingsong Li wrote: > +1 Thanks for driving. > > Best, > Jingsong > > On Wed, May 18, 2022 at 1:33 PM godfrey he wrote: > > > Thanks Xuyang for driving this, +1(binding) > > > > Best, > > Godfrey > > > > Xuyang 于2022年5月17日周二 10:21写道: > > > > > > Hi, everyone. > > > Thanks for your feedback for FLIP-229: Introduces Join Hint for Flink > > SQL Batch Job[1] on the discussion thread[2]. > > > I'd like to start a vote for it. The vote will be open for at least 72 > > hours unless there is an objection or not enough votes. > > > > > > -- > > > > > > Best! > > > Xuyang > > > > > > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job > > > [2] https://lists.apache.org/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h > > >
Re: [VOTE] Creating an Apache Flink slack workspace
Thanks Xintong for driving this. +1 from my side.

Best
Yun Tang
Re: [ANNOUNCE] Call for Presentations for ApacheCon Asia 2022 streaming track
Unsubscribe

At 2022-05-18 19:47:47, "Yu Li" wrote: >Hi everyone, > >ApacheCon Asia [1] will feature the Streaming track for the second year. >Please don't hesitate to submit your proposal if there is an interesting >project or Flink experience you would like to share with us! > >The conference will be online (virtual) and the talks will be pre-recorded. >The deadline of proposal submission is at the end of this month (May 31st). > >See you all there :) > >Best Regards, >Yu > >[1] https://apachecon.com/acasia2022/cfp.html
Re: [DISCUSS] Next Flink Kubernetes Operator release timeline
I think before we release 1.0, we need to define and document the compatibility guarantees. At the moment, the CR changes frequently and as was pointed out earlier, there isn't any automation to ensure changes are backward compatible. In addition, our documentation still refers to upgrade as a process that involves removing the prior CRD, which IMO needs to change for a 1.0 release. If we feel that we are not ready to put a compatibility guarantee in place, then perhaps release the next version as 0.2? Thanks, Thomas [1] https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/upgrade/ On Mon, May 16, 2022 at 5:13 PM Aitozi wrote: > > Thanks Gyula. It looks good to me. I could do a favor during the release > also. > Please feel free to ping me to help the doc, release and test work :) > > Best, > Aitozi > > Yang Wang 于2022年5月16日周一 21:57写道: > > > Thanks Gyula for sharing the progress. It is very likely we could have the > > first release candidate next Monday. > > > > Best, > > Yang > > > > Gyula Fóra 于2022年5月16日周一 20:50写道: > > > > > Hi Devs! > > > > > > We are on track for our planned 1.0.0 release timeline. There are no > > > outstanding blocker issues on JIRA for the release. > > > > > > There are 3 outstanding new feature PRs. They are all in pretty good > > shape > > > and should be merged within a day: > > > https://github.com/apache/flink-kubernetes-operator/pull/213 > > > https://github.com/apache/flink-kubernetes-operator/pull/216 > > > https://github.com/apache/flink-kubernetes-operator/pull/217 > > > > > > As we agreed previously we should not merge any more new features for > > > 1.0.0 and focus our efforts on testing, bug fixes and documentation for > > > this week. > > > > > > I will cut the release branch tomorrow once these PRs are merged. And the > > > target day for the first release candidate is next Monday. > > > > > > The release managers for this release will be Yang Wang and myself. 
> > > > > > Cheers, > > > Gyula > > > > > > On Wed, Apr 27, 2022 at 11:28 AM Yang Wang > > wrote: > > > > > >> Thanks @Chesnay Schepler for pointing out this. > > >> > > >> The only public interface the flink-kubernetes-operator provides is the > > >> CRD[1]. We are trying to stabilize the CRD from v1beta1. > > >> If more fields are introduced to support new features (e.g. standalone > > >> mode, > > >> SQL jobs), they should have the default value to ensure compatibility. > > >> Currently, we do not have any tools to enforce the compatibility > > >> guarantees. But we have created a ticket[2] to follow this and hope it > > >> could be resolved before releasing 1.0.0. > > >> > > >> Just as you said, now is also a good time to think more about the > > approach > > >> of releases. Since flink-kubernetes-operator is much simpler than Flink, > > >> we > > >> could have a shorter release cycle. > > >> Two months for a major release (1.0, 1.1, etc.) is reasonable to me. And > > >> this > > >> could be shortened for the minor releases. Also we need to support at > > least > > >> the last two major versions. > > >> > > >> Maybe the standalone mode support is a big enough feature for version > > 2.0. > > >> > > >> > > >> [1]. > > >> > > >> > > https://github.com/apache/flink-kubernetes-operator/tree/main/helm/flink-kubernetes-operator/crds > > >> [2]. https://issues.apache.org/jira/browse/FLINK-26955 > > >> > > >> > > >> @Hao t Chang We do not have regular sync up > > meeting > > >> so > > >> far. But I think we could schedule some sync up for the 1.0.0 release if > > >> necessary. Anyone who is interested is welcome. > > >> > > >> > > >> Best, > > >> Yang > > >> > > >> > > >> > > >> Hao t Chang wrote on Wed, Apr 27, 2022 at 07:45: > > >> > > >> > Hi Gyula, > > >> > > > >> > Thanks for the release timeline information. I would like to learn the > > >> > gathered knowledge and volunteer as well. Will there be a sync-up > > >> > meeting/call for this collaboration? 
> > >> > > > >> > From: Gyula Fóra > > >> > Date: Monday, April 25, 2022 at 11:22 AM > > >> > To: dev > > >> > Subject: [DISCUSS] Next Flink Kubernetes Operator release timeline > > >> > Hi Devs! > > >> > > > >> > The community has been working hard on cleaning up the operator logic > > >> and > > >> > adding some core features that have been missing from the preview > > >> release > > >> > (session jobs for example). We have also added some significant > > >> > improvements around deployment/operations. > > >> > > > >> > With the current pace of the development I think in a few weeks we > > >> should > > >> > be in a good position to release next version of the operator. This > > >> would > > >> > also give us the opportunity to add support for the upcoming 1.15 > > >> release > > >> > :) > > >> > > > >> > We have to decide on 2 main things: > > >> > 1. Target release date > > >> > 2. Release version > > >> > > > >> > With the current state of the project I am confident that we could > > cut a > > >> > really good
[jira] [Created] (FLINK-27688) Pluggable backend for EventUtils
Márton Balassi created FLINK-27688:
----------------------------------

             Summary: Pluggable backend for EventUtils
                 Key: FLINK-27688
                 URL: https://issues.apache.org/jira/browse/FLINK-27688
             Project: Flink
          Issue Type: New Feature
          Components: Kubernetes Operator
            Reporter: Márton Balassi
            Assignee: Gyula Fora
             Fix For: kubernetes-operator-1.1.0

Currently the [EventUtils|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventUtils.java] utility that we use to publish events for the operator has an implementation that is tightly coupled with the [Kubernetes Events|https://www.containiq.com/post/kubernetes-events] mechanism. I suggest enhancing this with a pluggable event interface, which could be implemented by our users to support their event messaging system of choice.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
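A pluggable backend along the lines this ticket suggests might look like the sketch below. The interface name, method signature, and the in-memory backend are illustrative assumptions for discussion, not the actual FLINK-27688 design:

```java
import java.util.ArrayList;
import java.util.List;

public class EventBackendSketch {

    // Hypothetical plug-in point: users implement this to route operator
    // events to their messaging system of choice. Name and signature are
    // assumptions for illustration, not the real API.
    interface FlinkEventBackend {
        void publish(String reason, String message);
    }

    // Trivial backend that just collects events in memory.
    static class InMemoryBackend implements FlinkEventBackend {
        final List<String> published = new ArrayList<>();

        @Override
        public void publish(String reason, String message) {
            published.add(reason + ": " + message);
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        backend.publish("JobStatusChanged", "Job went to RUNNING");
        System.out.println(backend.published.get(0));
    }
}
```

A Kubernetes-backed implementation would then wrap the existing EventUtils logic behind the same interface.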
RE: Kafka Sink Key and Value Avro Schema Usage Issues
Also, I forgot to attach the information regarding how I converted the Avro objects to bytes in the approach I took with the deprecated Kafka producer:

protected byte[] getValueBytes(Value value) {
    DatumWriter<Value> valWriter = new SpecificDatumWriter<>(value.getSchema());
    ByteArrayOutputStream valOut = new ByteArrayOutputStream();
    BinaryEncoder valEncoder = EncoderFactory.get().binaryEncoder(valOut, null);
    try {
        valWriter.write(value, valEncoder);
        valEncoder.flush();
        valOut.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
    }
    return valOut.toByteArray();
}

protected byte[] getKeyBytes(Key key) {
    DatumWriter<Key> keyWriter = new SpecificDatumWriter<>(key.getSchema());
    ByteArrayOutputStream keyOut = new ByteArrayOutputStream();
    BinaryEncoder keyEncoder = EncoderFactory.get().binaryEncoder(keyOut, null);
    try {
        keyWriter.write(key, keyEncoder);
        keyEncoder.flush();
        keyOut.close();
    } catch (Exception e) {
        // TODO Auto-generated catch block
    }
    return keyOut.toByteArray();
}

From: Ghiya, Jay (GE Healthcare)
Sent: 18 May 2022 21:51
To: u...@flink.apache.org
Cc: dev@flink.apache.org; Pandiaraj, Satheesh kumar (GE Healthcare) ; Kumar, Vipin (GE Healthcare)
Subject: Kafka Sink Key and Value Avro Schema Usage Issues

Hi Team,

This is regarding the Flink Kafka sink. We have a use case where we have headers and both key and value as Avro schemas. 
Below is the expectation in terms of intuitiveness for the Avro Kafka key and value:

KafkaSink.>builder()
        .setBootstrapServers(cloudkafkaBrokerAPI)
        .setRecordSerializer(
                KafkaRecordSerializationSchema.builder()
                        .setKeySerializationSchema(
                                ConfluentRegistryAvroSerializationSchema.forSpecific(
                                        Key.class, "Key", cloudSchemaRegistryURL))
                        .setValueSerializationSchema(
                                ConfluentRegistryAvroSerializationSchema.forSpecific(
                                        Value.class, "val", cloudSchemaRegistryURL))
                        .setTopic(outputTopic)
                        .build())
        .build();

What I understood is that it currently does not accept both key and value as Avro schemas as part of the Kafka sink. It only accepts sink.

First I tried to use the deprecated FlinkKafkaProducer by implementing KafkaSerializationSchema and supplying the Avro serializer properties via:

cloudKafkaProperties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());
cloudKafkaProperties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());

The problem here is that I am able to run this example, but the schema that gets stored in the Confluent schema registry is:

{
  "subject": "ddp_out-key",
  "version": 1,
  "id": 1,
  "schema": "\"bytes\""
}

Instead of the full schema, it has just recognized the whole payload as bytes.

So I am looking for a solution without the Kafka sink to make it work for now. Also, is there a feature request on the roadmap for adding ProducerRecord support to KafkaSink itself? That would be ideal: the previous operator could send the ProducerRecord with key, value and headers, and it could then be forwarded on.

-Jay
GEHC
[jira] [Created] (FLINK-27687) SpanningWrapper shouldn't assume temp folder exists
Gaël Renoux created FLINK-27687:
----------------------------------

             Summary: SpanningWrapper shouldn't assume temp folder exists
                 Key: FLINK-27687
                 URL: https://issues.apache.org/jira/browse/FLINK-27687
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Network
    Affects Versions: 1.14.4
            Reporter: Gaël Renoux

SpanningWrapper.createSpillingChannel assumes that the folder in which we create the file exists. However, this is not the case in the following scenario (which actually happened to us today):
 * The temp folders were created a while ago (I assume on startup of the task manager) in the /tmp folder. They weren't used for a while, probably because we didn't have any record big enough to trigger it.
 * The cleanup cron for /tmp did its job and deleted those old folders in /tmp.
 * We deployed a new version of the job that actually needed the folders, and it crashed.

=> I'm not sure whether it should be SpanningWrapper's responsibility to create the folder if it doesn't exist anymore; I'm not familiar enough with Flink's internals to guess which class should do it. The problem occurred for us in SpanningWrapper, but it can probably happen in other places as well.

More generally, assuming that folders and files in /tmp won't get deleted at some point doesn't seem correct to me. The [documentation for io.tmp.dirs|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/] recommends that it shouldn't be purged, but we do need to clean up at some point. If that is not the case, then the documentation should be updated to indicate that this is not a recommendation but mandatory, and that purges will break the jobs (not just trigger a recovery).

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
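A defensive fix in the spirit of this report could recreate the directory before opening the spill file. The helper below is an illustrative sketch, not Flink's actual SpanningWrapper code, and it assumes recreating the directory is an acceptable remedy:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class SpillDirSketch {

    // Sketch of a defensive createSpillingChannel-style helper: instead of
    // assuming the temp directory still exists, recreate it if an external
    // /tmp cleanup removed it. All names here are illustrative.
    static File createSpillFile(File spillDir, String fileName) throws IOException {
        if (!spillDir.isDirectory() && !spillDir.mkdirs()) {
            throw new IOException("Cannot (re)create spill directory " + spillDir);
        }
        File spillFile = new File(spillDir, fileName);
        // Real spilling code would open a FileChannel here; creating the
        // file is enough to demonstrate the point.
        Files.createFile(spillFile.toPath());
        return spillFile;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("spill-demo").toFile();
        // Never created on disk: simulates a folder purged by a /tmp cron.
        File spillDir = new File(base, "flink-io");
        createSpillFile(spillDir, "spill-0.tmp");
        System.out.println(spillDir.isDirectory()); // the directory exists again
    }
}
```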
Kafka Sink Key and Value Avro Schema Usage Issues
Hi Team,

This is regarding the Flink Kafka sink. We have a use case where we have headers and both key and value as Avro schemas. Below is the expectation in terms of intuitiveness for the Avro Kafka key and value:

KafkaSink.>builder()
        .setBootstrapServers(cloudkafkaBrokerAPI)
        .setRecordSerializer(
                KafkaRecordSerializationSchema.builder()
                        .setKeySerializationSchema(
                                ConfluentRegistryAvroSerializationSchema.forSpecific(
                                        Key.class, "Key", cloudSchemaRegistryURL))
                        .setValueSerializationSchema(
                                ConfluentRegistryAvroSerializationSchema.forSpecific(
                                        Value.class, "val", cloudSchemaRegistryURL))
                        .setTopic(outputTopic)
                        .build())
        .build();

What I understood is that it currently does not accept both key and value as Avro schemas as part of the Kafka sink. It only accepts sink.

First I tried to use the deprecated FlinkKafkaProducer by implementing KafkaSerializationSchema and supplying the Avro serializer properties via:

cloudKafkaProperties.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());
cloudKafkaProperties.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, io.confluent.kafka.serializers.KafkaAvroSerializer.class.getName());

The problem here is that I am able to run this example, but the schema that gets stored in the Confluent schema registry is:

{
  "subject": "ddp_out-key",
  "version": 1,
  "id": 1,
  "schema": "\"bytes\""
}

Instead of the full schema, it has just recognized the whole payload as bytes.

So I am looking for a solution without the Kafka sink to make it work for now. Also, is there a feature request on the roadmap for adding ProducerRecord support to KafkaSink itself? That would be ideal: the previous operator could send the ProducerRecord with key, value and headers, and it could then be forwarded on.

-Jay
GEHC
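The composition being asked for here (independent key and value serializers combined into one produced record) can be sketched with plain JDK types. The interface below only mirrors the shape of a serialization schema; all names are illustrative assumptions, not Flink or Kafka client APIs:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyValueRecordSketch {

    // Shape-alike of a serialization schema, for illustration only.
    interface Serializer<T> {
        byte[] serialize(T element);
    }

    // A produced record carries independently serialized key and value,
    // which is what a ProducerRecord-accepting sink would forward to Kafka.
    static final class KafkaRecord {
        final byte[] key;
        final byte[] value;
        KafkaRecord(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    // Combine a key serializer and a value serializer into one record,
    // analogous to configuring key and value schemas on a record serializer.
    static <K, V> KafkaRecord toRecord(K key, V value,
                                       Serializer<K> keySer, Serializer<V> valSer) {
        return new KafkaRecord(keySer.serialize(key), valSer.serialize(value));
    }

    public static void main(String[] args) {
        Serializer<String> utf8 = s -> s.getBytes(StandardCharsets.UTF_8);
        KafkaRecord r = toRecord("k1", "v1", utf8, utf8);
        System.out.println(Arrays.equals(r.key, "k1".getBytes(StandardCharsets.UTF_8)));
    }
}
```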
[jira] [Created] (FLINK-27686) Only patch status when the status actually changed
Gyula Fora created FLINK-27686:
----------------------------------

             Summary: Only patch status when the status actually changed
                 Key: FLINK-27686
                 URL: https://issues.apache.org/jira/browse/FLINK-27686
             Project: Flink
          Issue Type: Improvement
          Components: Kubernetes Operator
            Reporter: Gyula Fora
             Fix For: kubernetes-operator-1.0.0

The StatusHelper class currently always patches the status, regardless of whether it changed or not. We should use an ObjectMapper, simply compare the ObjectNode representations, and only patch if there is any change.

(I think we cannot directly compare the status objects because some of the content comes from getters and is not part of the equals implementation.)

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
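The proposed check can be sketched with JDK types only. The real implementation would compare Jackson ObjectNode representations of the status rather than strings, and the class and method names here are illustrative:

```java
import java.util.Objects;

public class StatusPatchSketch {

    // Last status representation we successfully patched, plus a counter
    // standing in for actual Kubernetes PATCH calls.
    private String lastPatched = null;
    int patchCount = 0;

    // Patch only when the serialized status differs from the last one we
    // sent. FLINK-27686 proposes comparing ObjectNode representations,
    // since the status POJOs' equals() may not cover getter-derived fields.
    void patchStatusIfChanged(String statusJson) {
        if (Objects.equals(statusJson, lastPatched)) {
            return; // no change -> skip the API call
        }
        patchCount++;
        lastPatched = statusJson;
    }

    public static void main(String[] args) {
        StatusPatchSketch helper = new StatusPatchSketch();
        helper.patchStatusIfChanged("{\"state\":\"RUNNING\"}");
        helper.patchStatusIfChanged("{\"state\":\"RUNNING\"}"); // skipped
        helper.patchStatusIfChanged("{\"state\":\"FAILED\"}");
        System.out.println(helper.patchCount); // 2
    }
}
```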
Re: [VOTE] FLIP-229: Introduces Join Hint for Flink SQL Batch Job
+1(binding) Best, Jark On Wed, 18 May 2022 at 14:18, Jingsong Li wrote: > +1 Thanks for driving. > > Best, > Jingsong > > On Wed, May 18, 2022 at 1:33 PM godfrey he wrote: > > > Thanks Xuyang for driving this, +1(binding) > > > > Best, > > Godfrey > > > > Xuyang 于2022年5月17日周二 10:21写道: > > > > > > Hi, everyone. > > > Thanks for your feedback for FLIP-229: Introduces Join Hint for Flink > > SQL Batch Job[1] on the discussion thread[2]. > > > I'd like to start a vote for it. The vote will be open for at least 72 > > hours unless there is an objection or not enough votes. > > > > > > -- > > > > > > Best! > > > Xuyang > > > > > > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job > > > [2] https://lists.apache.org/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h > > >
Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
Hi Godfrey,

Regarding the Table API for CTAS, "Table#createTableAs(tablePath)" seems a little strange to me. Usually, the parameter after AS should be the query, but here the query is in front of AS. I slightly prefer a method on TableEnvironment besides "createTable" (i.e. a special createTable that also writes data). For example:

void createTableAs(String path, TableDescriptor descriptor, Table query);

Usage:

tableEnv.createTableAs(
    "T1",
    TableDescriptor.forConnector("hive")
        .option("format", "parquet")
        .build(),
    query);

Best,
Jark

On Wed, 18 May 2022 at 22:53, Jark Wu wrote: > Hi Mang, > > Thanks for proposing this, CTAS is a very important API for batch users. > > I think the key problem of this FLIP is the ACID semantics of the CTAS > operation. > We care most about two parts of the semantics: > 1) Atomicity: the created table should be rolled back if the write is > failed. > 2) Isolation: the created table shouldn't be visible before the write is > successful (read uncommitted). > > From your investigation, it seems that: > - Flink (your FLIP): none of them. ==> LEVEL-1 > - Spark DataSource v1: is atomic (can roll back), but is not isolated. ==> > LEVEL-2 > - Spark DataSource v2: guarantees both of them. ==> LEVEL-3 > - Hive MR: guarantees both of them. ==> LEVEL-3 > > In order to support higher ACID semantics, I agree with Godfrey that we > need some hooks in JM > which can be called when the job is finished or failed/canceled. It might > look like > `StreamExecutionEnvironment#registerJobListener(JobListener)`, > but JobListener is called on the > client side. What we need is an interface called on the JM side, because > the job can be submitted in > detached mode. > > With this interface, we can easily support LEVEL-2 semantics by calling > `Catalog#dropTable` in the > `JobListener#onJobFailed`. 
We can also support LEVEL-3 by introducing > `StagingTableCatalog` like Spark, > calling `StagedTable#commitStagedChanges()` in `JobListener#onJobFinished` > and > calling StagedTable#abortStagedChanges() in `JobListener#onJobFailed`. > > Best, > Jark > > > On Wed, 18 May 2022 at 12:29, godfrey he wrote: > >> Hi Mang, >> >> Thanks for driving this FLIP. >> >> Please follow the FLIP template[1] style, and the `Syntax ` is part of >> the `Public API Changes` section. >> ‘Program research’ and 'Implementation Plan' are part of the `Proposed >> Changes` section, >> or move ‘Program research’ to the appendix. >> >> > Providing methods that are used to execute CTAS for Table API users. >> We should introduce `createTable` in `Table` instead of >> `TableEnvironment`. >> Because all table operations are defined in `Table`, see: >> Table#executeInsert, >> Table#insertInto, etc. >> About the method name, I prefer to use `createTableAs`. >> >> > TableSink needs to provide the CleanUp API, developers implement as >> needed. >> I think it's hard for TableSink to implement a clean up operation. For >> file system sink, >> the data can be written to a temporary directory, but for key/value >> sinks, it's hard to >> remove the written keys, unless the sink records all written keys. >> >> > Do not do drop table operations in the framework, drop table is >> implemented in >> TableSink according to the needs of specific TableSink >> The TM process may crash at any time, and the drop operation will not >> be executed any more. >> >> How about we do the drop table operation and cleanup data action in the >> catalog? >> Where to execute the drop operation. one approach is in client, other is >> in JM. >> 1. in client: this requires the client to be alive until the job is >> finished and failed. >> 2. in JM: this requires the JM could provide some interfaces/hooks >> that the planner >> implements the logic and the code will be executed in JM. 
>> I prefer the approach two, but it requires more detail design with >> runtime @gaoyunhaii, @kevin.yingjie >> >> >> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template >> >> Best, >> Godfrey >> >> >> Mang Zhang 于2022年5月6日周五 11:24写道: >> >> > >> > Hi, Yuxia >> > Thanks for your reply! >> > About the question 1, we will not support, FLIP-218[1] is to simplify >> the complexity of user DDL and make it easier for users to use. I have >> never encountered this case in a big data. >> > About the question 2, we will provide a public API like below public >> void cleanUp(); >> > >> > Regarding the mechanism of cleanUp, people who are familiar with >> the runtime module need to provide professional advice, which is what we >> need to focus on. >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > -- >> > >> > Best regards, >> > Mang Zhang >> > >> > >> > >> > >> > >> > At 2022-04-29 17:00:03, "yuxia" wrote: >> > >Thanks for for driving this work, it's to be a useful feature. >> > >About the flip-218, I have some questions. >> > > >> > >1: Does our CTAS syntax support specify
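To make the staged-commit idea discussed above concrete, here is a minimal, runnable Java sketch. All names in it (`StagedTable`, `JobStatusHook`, `InMemoryStagingCatalog`) are illustrative assumptions modeled on the thread and on Spark's DataSource v2 — this is not Flink's actual API.

```java
// Illustrative sketch only: these types mirror the hooks discussed in the
// thread (a Spark-style StagedTable plus a JM-side job status hook). They
// are assumptions, NOT Flink's real API.
import java.util.ArrayList;
import java.util.List;

public class CtasAtomicitySketch {

    /** A table whose creation is staged until the write job finishes. */
    interface StagedTable {
        void commitStagedChanges(); // make table + data visible (isolation)
        void abortStagedChanges();  // roll back table + data (atomicity)
    }

    /** Hypothetical hook running on the JM side (JobListener runs on the client). */
    interface JobStatusHook {
        void onJobFinished();
        void onJobFailed();
    }

    /** Tiny in-memory stand-in for a StagingTableCatalog. */
    static class InMemoryStagingCatalog {
        final List<String> visibleTables = new ArrayList<>();

        StagedTable stageCreateTable(String name) {
            return new StagedTable() {
                @Override
                public void commitStagedChanges() {
                    visibleTables.add(name); // table becomes visible only now
                }

                @Override
                public void abortStagedChanges() {
                    // staged metadata/files would be deleted here
                }
            };
        }
    }

    /** Bind a staged table to the job lifecycle. */
    static JobStatusHook hookFor(StagedTable staged) {
        return new JobStatusHook() {
            @Override
            public void onJobFinished() {
                staged.commitStagedChanges();
            }

            @Override
            public void onJobFailed() {
                staged.abortStagedChanges();
            }
        };
    }

    static List<String> demo() {
        InMemoryStagingCatalog catalog = new InMemoryStagingCatalog();

        // Successful CTAS job: the table becomes visible on commit.
        hookFor(catalog.stageCreateTable("t_ok")).onJobFinished();

        // Failed CTAS job: the staged table is rolled back, never visible.
        hookFor(catalog.stageCreateTable("t_failed")).onJobFailed();

        return catalog.visibleTables;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [t_ok]
    }
}
```

With such hooks, the LEVEL-2 variant would simply call `Catalog#dropTable` in `onJobFailed`, while LEVEL-3 stages both metadata and data as sketched here.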
Re: Re: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS)
Hi Mang, Thanks for proposing this, CTAS is a very important API for batch users. I think the key problem of this FLIP is the ACID semantics of the CTAS operation. We care most about two parts of the semantics: 1) Atomicity: the created table should be rolled back if the write is failed. 2) Isolation: the created table shouldn't be visible before the write is successful (read uncommitted). From your investigation, it seems that: - Flink (your FLIP): none of them. ==> LEVEL-1 - Spark DataSource v1: is atomic (can roll back), but is not isolated. ==> LEVEL-2 - Spark DataSource v2: guarantees both of them. ==> LEVEL-3 - Hive MR: guarantees both of them. ==> LEVEL-3 In order to support higher ACID semantics, I agree with Godfrey that we need some hooks in JM which can be called when the job is finished or failed/canceled. It might look like `StreamExecutionEnvironment#registerJobListener(JobListener)`, but JobListener is called on the client side. What we need is an interface called on the JM side, because the job can be submitted in detached mode. With this interface, we can easily support LEVEL-2 semantics by calling `Catalog#dropTable` in the `JobListener#onJobFailed`. We can also support LEVEL-3 by introducing `StagingTableCatalog` like Spark, calling `StagedTable#commitStagedChanges()` in `JobListener#onJobFinished` and calling StagedTable#abortStagedChanges() in `JobListener#onJobFailed`. Best, Jark On Wed, 18 May 2022 at 12:29, godfrey he wrote: > Hi Mang, > > Thanks for driving this FLIP. > > Please follow the FLIP template[1] style, and the `Syntax ` is part of > the `Public API Changes` section. > ‘Program research’ and 'Implementation Plan' are part of the `Proposed > Changes` section, > or move ‘Program research’ to the appendix. > > > Providing methods that are used to execute CTAS for Table API users. > We should introduce `createTable` in `Table` instead of `TableEnvironment`.
> Because all table operations are defined in `Table`, see: > Table#executeInsert, > Table#insertInto, etc. > About the method name, I prefer to use `createTableAs`. > > > TableSink needs to provide the CleanUp API, developers implement as > needed. > I think it's hard for TableSink to implement a clean up operation. For > file system sink, > the data can be written to a temporary directory, but for key/value > sinks, it's hard to > remove the written keys, unless the sink records all written keys. > > > Do not do drop table operations in the framework, drop table is > implemented in > TableSink according to the needs of specific TableSink > The TM process may crash at any time, and the drop operation will not > be executed any more. > > How about we do the drop table operation and cleanup data action in the > catalog? > Where to execute the drop operation. one approach is in client, other is > in JM. > 1. in client: this requires the client to be alive until the job is > finished and failed. > 2. in JM: this requires the JM could provide some interfaces/hooks > that the planner > implements the logic and the code will be executed in JM. > I prefer the approach two, but it requires more detail design with > runtime @gaoyunhaii, @kevin.yingjie > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template > > Best, > Godfrey > > > Mang Zhang 于2022年5月6日周五 11:24写道: > > > > > Hi, Yuxia > > Thanks for your reply! > > About the question 1, we will not support, FLIP-218[1] is to simplify > the complexity of user DDL and make it easier for users to use. I have > never encountered this case in a big data. > > About the question 2, we will provide a public API like below public > void cleanUp(); > > > > Regarding the mechanism of cleanUp, people who are familiar with > the runtime module need to provide professional advice, which is what we > need to focus on. 
> > > > > > > > > > > > > > > > > > > > > > -- > > > > Best regards, > > Mang Zhang > > > > > > > > > > > > At 2022-04-29 17:00:03, "yuxia" wrote: > > >Thanks for driving this work, it's going to be a useful feature. > > >About the flip-218, I have some questions. > > > > > >1: Does our CTAS syntax support specifying the target table's schema, including column names and data types? I think it may be a useful feature in case we want to change the data types in the target table instead of always copying the source table's schema. It'll be more flexible with this feature. > > >Btw, MySQL's "CREATE TABLE ... SELECT Statement"[1] supports this feature. > > > > > >2: Seems it'll require the sink to implement a public interface to drop the table, so what will the interface look like? > > > > > >[1] https://dev.mysql.com/doc/refman/8.0/en/create-table-select.html > > > > > >Best regards, > > >Yuxia > > > > > >- Original Message - > > >From: "Mang Zhang" > > >To: "dev" > > >Sent: Thursday, April 28, 2022 4:57:24 PM > > >Subject: [DISCUSS] FLIP-218: Support SELECT clause in CREATE TABLE(CTAS) > > > > > >Hi, everyone > > > > > > > > >I would like to open a discussion for support
[jira] [Created] (FLINK-27685) Add scale subresource
Gyula Fora created FLINK-27685: -- Summary: Add scale subresource Key: FLINK-27685 URL: https://issues.apache.org/jira/browse/FLINK-27685 Project: Flink Issue Type: Improvement Components: Kubernetes Operator Reporter: Gyula Fora Assignee: Gyula Fora Fix For: kubernetes-operator-1.1.0 We should define a scale subresource for the deployment/sessionjob resources that allows us to use the `scale` command or even hook in the HPA. I suggest to use parallelism as the "replicas". -- This message was sent by Atlassian Jira (v8.20.7#820007)
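For reference, enabling a scale subresource on a CRD looks roughly like the fragment below (Kubernetes `apiextensions.k8s.io/v1` schema). The JSON paths are illustrative assumptions that treat job parallelism as the replica count, as proposed; the operator's actual field names may differ.

```yaml
# Sketch of a CustomResourceDefinition fragment enabling the scale subresource.
# The spec/status paths are assumptions for illustration only.
subresources:
  status: {}
  scale:
    # `kubectl scale --replicas=N` (or the HPA) writes here:
    specReplicasPath: .spec.job.parallelism
    # the currently observed value is read from here:
    statusReplicasPath: .status.jobStatus.parallelism
```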
[jira] [Created] (FLINK-27684) FlinkKafkaConsumerBase could record partitions offset when GROUP_OFFSETS
SilkyAlex created FLINK-27684: - Summary: FlinkKafkaConsumerBase could record partitions offset when GROUP_OFFSETS Key: FLINK-27684 URL: https://issues.apache.org/jira/browse/FLINK-27684 Project: Flink Issue Type: Improvement Components: Connectors / Kafka Affects Versions: 1.14.3 Reporter: SilkyAlex When FlinkKafkaConsumerBase's startupMode is set to EARLIEST/LATEST/TIMESTAMP/GROUP_OFFSETS, the startup logs do not record the current partitions' offsets, which makes it difficult to determine the starting offsets when investigating data problems. We could record them to make this easier.
{code:java}
2022-04-15 22:27:58.802 INFO [95] org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Consumer subtask 11 will start reading the following 1 partitions from the committed group offsets in Kafka: [KafkaTopicPartition{topic='kafka_topic', partition=4}]
2022-04-15 22:27:58.802 INFO [94] org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Consumer subtask 5 will start reading the following 1 partitions from the committed group offsets in Kafka: [KafkaTopicPartition{topic='kafka_topic', partition=14}]
2022-04-15 22:27:58.805 INFO [92] org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Consumer subtask 3 will start reading the following 1 partitions from the committed group offsets in Kafka: [KafkaTopicPartition{topic='kafka_topic', partition=12, wish here to log offsets}]
{code}
-- This message was sent by Atlassian Jira (v8.20.7#820007)
[ANNOUNCE] Call for Presentations for ApacheCon Asia 2022 streaming track
Hi everyone, ApacheCon Asia [1] will feature the Streaming track for the second year. Please don't hesitate to submit your proposal if there is an interesting project or Flink experience you would like to share with us! The conference will be online (virtual) and the talks will be pre-recorded. The deadline of proposal submission is at the end of this month (May 31st). See you all there :) Best Regards, Yu [1] https://apachecon.com/acasia2022/cfp.html
[jira] [Created] (FLINK-27683) Insert into (column1, column2) Values(.....) can't work with sql Hints
Xin Yang created FLINK-27683: Summary: Insert into (column1, column2) Values(.....) can't work with sql Hints Key: FLINK-27683 URL: https://issues.apache.org/jira/browse/FLINK-27683 Project: Flink Issue Type: New Feature Affects Versions: 1.14.0 Reporter: Xin Yang
{code:java}
INSERT INTO `tidb`.`%s`.`%s` /*+ OPTIONS('tidb.sink.update-columns'='c2, c13') */
(c2, c13) values(1, 12.12)
{code}
-- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27682) [JUnit5 Migration] Migrate ComparatorTestBase to Junit5
Sergey Nuyanzin created FLINK-27682: --- Summary: [JUnit5 Migration] Migrate ComparatorTestBase to Junit5 Key: FLINK-27682 URL: https://issues.apache.org/jira/browse/FLINK-27682 Project: Flink Issue Type: Sub-task Components: Tests Reporter: Sergey Nuyanzin Several modules depend on it -- This message was sent by Atlassian Jira (v8.20.7#820007)
Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client
Hi Paul, 1) SHOW QUERIES +1 to add finished time, but it would be better to call it "end_time" to keep aligned with names in Web UI. 2) DROP QUERY I think we shouldn't throw exceptions for batch jobs, otherwise, how to stop batch queries? At present, I don't think "DROP" is a suitable keyword for this statement. From the perspective of users, "DROP" sounds like the query should be removed from the list of "SHOW QUERIES". However, it doesn't. Maybe "STOP QUERY" is more suitable and compliant with commands of Flink CLI. 3) SHOW SAVEPOINTS I think this statement is needed, otherwise, savepoints are lost after the SAVEPOINT command is executed. Savepoints can be retrieved from REST API "/jobs/:jobid/checkpoints" with filtering "checkpoint_type"="savepoint". It's also worth considering providing "SHOW CHECKPOINTS" to list all checkpoints. 4) SAVEPOINT & RELEASE SAVEPOINT I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT statements now. In the vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT are both the same savepoint id. However, in our syntax, the first one is a query id, and the second one is a savepoint path, which is confusing and not consistent. When I came across SHOW SAVEPOINT, I thought maybe they should be in the same syntax set. For example, CREATE SAVEPOINT FOR [QUERY] & DROP SAVEPOINT . That means we don't follow the majority of vendors in SAVEPOINT commands. I would say the purpose is different in Flink. What are others' opinions on this? Best, Jark [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints On Wed, 18 May 2022 at 14:43, Paul Lam wrote: > Hi Godfrey, > > Thanks a lot for your inputs! > > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream > or SQL) or > clients (SQL client or CLI). Under the hook, it’s based on > ClusterClient#listJobs, the > same with Flink CLI.
I think it’s okay to have non-SQL jobs listed in SQL > client, because > these jobs can be managed via SQL client too. > > WRT finished time, I think you’re right. Adding it to the FLIP. But I’m a > bit afraid that the > rows would be too long. > > WRT ‘DROP QUERY’, > > What's the behavior for batch jobs and the non-running jobs? > > > In general, the behavior would be aligned with Flink CLI. Triggering a > savepoint for > a non-running job would cause errors, and the error message would be > printed to > the SQL client. Triggering a savepoint for batch(unbounded) jobs in > streaming > execution mode would be the same with streaming jobs. However, for batch > jobs in > batch execution mode, I think there would be an error, because batch > execution > doesn’t support checkpoints currently (please correct me if I’m wrong). > > WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but Flink clusterClient/ > jobClient doesn’t have such a functionality at the moment, neither do > Flink CLI. > Maybe we could make it a follow-up FLIP, which includes the modifications > to > clusterClient/jobClient and Flink CLI. WDYT? > > Best, > Paul Lam > > > 2022年5月17日 20:34,godfrey he 写道: > > > > Godfrey > >
Re: [VOTE] Creating an Apache Flink slack workspace
+1 (binding) Thanks, Zhu Timo Walther 于2022年5月18日周三 16:52写道: > > +1 (binding) > > Thanks, > Timo > > > On 17.05.22 20:44, Gyula Fóra wrote: > > +1 (binding) > > > > On Tue, 17 May 2022 at 19:52, Yufei Zhang wrote: > > > >> +1 (nonbinding) > >> > >> On Tue, May 17, 2022 at 5:29 PM Márton Balassi > >> wrote: > >> > >>> +1 (binding) > >>> > >>> On Tue, May 17, 2022 at 11:00 AM Jingsong Li > >>> wrote: > >>> > Thank Xintong for driving this work. > > +1 > > Best, > Jingsong > > On Tue, May 17, 2022 at 4:49 PM Martijn Visser < > >> martijnvis...@apache.org > > wrote: > > > +1 (binding) > > > > On Tue, 17 May 2022 at 10:38, Yu Li wrote: > > > >> +1 (binding) > >> > >> Thanks Xintong for driving this! > >> > >> Best Regards, > >> Yu > >> > >> > >> On Tue, 17 May 2022 at 16:32, Robert Metzger > > wrote: > >> > >>> Thanks for starting the VOTE! > >>> > >>> +1 (binding) > >>> > >>> > >>> > >>> On Tue, May 17, 2022 at 10:29 AM Jark Wu > >> wrote: > >>> > Thank Xintong for driving this work. > > +1 from my side (binding) > > Best, > Jark > > On Tue, 17 May 2022 at 16:24, Xintong Song < > >>> tonysong...@gmail.com> > >>> wrote: > > > Hi everyone, > > > > As previously discussed in [1], I would like to open a vote > >> on > >> creating > an > > Apache Flink slack workspace channel. > > > > The proposed actions include: > > - Creating a dedicated slack workspace with the name Apache > >>> Flink > >> that > >>> is > > controlled and maintained by the Apache Flink PMC > > - Updating the Flink website about rules for using various > >>> communication > > channels > > - Setting up an Archive for the Apache Flink slack > > - Revisiting this initiative by the end of 2022 > > > > The vote will last for at least 72 hours, and will be > >> accepted > by a > > consensus of active PMC members. > > > > Best, > > > > Xintong > > > > >>> > >> > > > > >>> > >> > > >
[jira] [Created] (FLINK-27681) Improve the availability of Flink when the RocksDB file is corrupted.
ming li created FLINK-27681: --- Summary: Improve the availability of Flink when the RocksDB file is corrupted. Key: FLINK-27681 URL: https://issues.apache.org/jira/browse/FLINK-27681 Project: Flink Issue Type: Improvement Components: Runtime / State Backends Reporter: ming li We have encountered several cases where the RocksDB checksum did not match or block verification failed when a job was restored. The cause is generally a problem with the machine where the task is located, which leads to corrupted files being uploaded to HDFS, but a long time (a dozen minutes to half an hour) had already passed by the time we found the problem. I'm not sure if anyone else has had a similar problem. Since such a file is referenced by incremental checkpoints for a long time, once the maximum number of retained checkpoints is exceeded, we can only keep using the file until it is no longer referenced. When the job fails, it cannot be recovered. Therefore we are considering: 1. Can RocksDB periodically check whether all files are correct and detect the problem in time? 2. Can Flink automatically roll back to the previous checkpoint when there is a problem with the checkpoint data? Even with manual intervention, one can only try to recover from an existing checkpoint or discard the entire state. 3. Can we cap the number of references to a file based on the maximum number of retained checkpoints? When the number of references exceeds (maximum number of checkpoints - 1), the Task side would be required to upload a new file for the next reference. We are not sure whether this would guarantee that the newly uploaded file is correct. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27680) Disable PulsarSinkITCase on JDK 11
Martijn Visser created FLINK-27680: -- Summary: Disable PulsarSinkITCase on JDK 11 Key: FLINK-27680 URL: https://issues.apache.org/jira/browse/FLINK-27680 Project: Flink Issue Type: Technical Debt Components: Connectors / Pulsar Affects Versions: 1.16.0, 1.14.5, 1.15.1 Reporter: Martijn Visser Assignee: Martijn Visser Since Pulsar doesn't yet support Java 11, we should make sure that the Pulsar tests don't run when testing JDK11. This is the case already for the e2e tests, but not yet for the connector tests. We should disable this too. -- This message was sent by Atlassian Jira (v8.20.7#820007)
Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric
Hi Jark and Alexander, Thanks for your comments! I’m also OK to introduce common table options. I prefer to introduce a new DefaultLookupCacheOptions class for holding these option definitions because putting all options into FactoryUtil would make it a bit ”crowded” and not well categorized. FLIP has been updated according to suggestions above: 1. Use static “of” method for constructing RescanRuntimeProvider considering both arguments are required. 2. Introduce new table options matching DefaultLookupCacheFactory Best, Qingsheng On Wed, May 18, 2022 at 2:57 PM Jark Wu wrote: > Hi Alex, > > 1) retry logic > I think we can extract some common retry logic into utilities, e.g. > RetryUtils#tryTimes(times, call). > This seems independent of this FLIP and can be reused by DataStream users. > Maybe we can open an issue to discuss this and where to put it. > > 2) cache ConfigOptions > I'm fine with defining cache config options in the framework. > A candidate place to put is FactoryUtil which also includes > "sink.parallelism", "format" options. > > Best, > Jark > > > On Wed, 18 May 2022 at 13:52, Александр Смирнов > wrote: > >> Hi Qingsheng, >> >> Thank you for considering my comments. >> >> > there might be custom logic before making retry, such as re-establish >> the connection >> >> Yes, I understand that. I meant that such logic can be placed in a >> separate function, that can be implemented by connectors. Just moving >> the retry logic would make connector's LookupFunction more concise + >> avoid duplicate code. However, it's a minor change. The decision is up >> to you. >> >> > We decide not to provide common DDL options and let developers to >> define their own options as we do now per connector. >> >> What is the reason for that? One of the main goals of this FLIP was to >> unify the configs, wasn't it? I understand that current cache design >> doesn't depend on ConfigOptions, like was before. 
But still we can put >> these options into the framework, so connectors can reuse them and >> avoid code duplication, and, what is more significant, avoid possible >> different options naming. This moment can be pointed out in >> documentation for connector developers. >> >> Best regards, >> Alexander >> >> вт, 17 мая 2022 г. в 17:11, Qingsheng Ren : >> > >> > Hi Alexander, >> > >> > Thanks for the review and glad to see we are on the same page! I think >> you forgot to cc the dev mailing list so I’m also quoting your reply under >> this email. >> > >> > > We can add 'maxRetryTimes' option into this class >> > >> > In my opinion the retry logic should be implemented in lookup() instead >> of in LookupFunction#eval(). Retrying is only meaningful under some >> specific retriable failures, and there might be custom logic before making >> retry, such as re-establish the connection (JdbcRowDataLookupFunction is an >> example), so it's more handy to leave it to the connector. >> > >> > > I don't see DDL options, that were in previous version of FLIP. Do >> you have any special plans for them? >> > >> > We decide not to provide common DDL options and let developers to >> define their own options as we do now per connector. >> > >> > The rest of comments sound great and I’ll update the FLIP. Hope we can >> finalize our proposal soon! >> > >> > Best, >> > >> > Qingsheng >> > >> > >> > > On May 17, 2022, at 13:46, Александр Смирнов >> wrote: >> > > >> > > Hi Qingsheng and devs! >> > > >> > > I like the overall design of updated FLIP, however I have several >> > > suggestions and questions. >> > > >> > > 1) Introducing LookupFunction as a subclass of TableFunction is a good >> > > idea. We can add 'maxRetryTimes' option into this class. 'eval' method >> > > of new LookupFunction is great for this purpose. The same is for >> > > 'async' case. 
>> > > >> > > 2) There might be other configs in future, such as 'cacheMissingKey' >> > > in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider. >> > > Maybe use Builder pattern in LookupFunctionProvider and >> > > RescanRuntimeProvider for more flexibility (use one 'build' method >> > > instead of many 'of' methods in future)? >> > > >> > > 3) What are the plans for existing TableFunctionProvider and >> > > AsyncTableFunctionProvider? I think they should be deprecated. >> > > >> > > 4) Am I right that the current design does not assume usage of >> > > user-provided LookupCache in re-scanning? In this case, it is not very >> > > clear why do we need methods such as 'invalidate' or 'putAll' in >> > > LookupCache. >> > > >> > > 5) I don't see DDL options, that were in previous version of FLIP. Do >> > > you have any special plans for them? >> > > >> > > If you don't mind, I would be glad to be able to make small >> > > adjustments to the FLIP document too. I think it's worth mentioning >> > > about what exactly optimizations are planning in the future. >> > > >> > > Best regards, >> > > Smirnov Alexander >> > > >> > > пт, 13 мая 2022 г. в 20:27, Qingsheng Ren : >>
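The `RetryUtils#tryTimes(times, call)` helper suggested in the quoted discussion does not exist yet; a minimal sketch of what it could look like (class and method names are assumptions taken from the thread):

```java
// Sketch of the proposed (not yet existing) retry helper; the class and
// method names are assumptions taken from the quoted discussion.
import java.util.concurrent.Callable;

public class RetryUtils {

    /**
     * Invokes {@code call} up to {@code times} attempts and rethrows the last
     * failure if all attempts fail. Connector-specific recovery logic (e.g.
     * re-establishing a connection) can still live inside the callable.
     */
    public static <T> T tryTimes(int times, Callable<T> call) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= times; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Demo: the callable fails twice, then succeeds on the third attempt.
        int[] attempts = {0};
        String result = tryTimes(3, () -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new RuntimeException("transient failure");
            }
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts"); // ok after 3 attempts
    }
}
```

A lookup function could then wrap its backend call as `RetryUtils.tryTimes(maxRetryTimes, () -> queryBackend(key))` instead of open-coding the retry loop.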
Could not copy native libraries - Permission denied
Hi, We are using Flink version 1.13 with a Kafka source and a Kinesis sink, with a parallelism of 3. On submitting the job I get the error "Could not copy native binaries to temp directory /tmp/amazon-kinesis-producer-native-binaries", followed by "permission denied", even though all the required permissions have been granted and the job is run as the root user. What could be causing this?
Re: [VOTE] Creating an Apache Flink slack workspace
+1 (binding) Thanks, Timo On 17.05.22 20:44, Gyula Fóra wrote: +1 (binding) On Tue, 17 May 2022 at 19:52, Yufei Zhang wrote: +1 (nonbinding) On Tue, May 17, 2022 at 5:29 PM Márton Balassi wrote: +1 (binding) On Tue, May 17, 2022 at 11:00 AM Jingsong Li wrote: Thank Xintong for driving this work. +1 Best, Jingsong On Tue, May 17, 2022 at 4:49 PM Martijn Visser < martijnvis...@apache.org wrote: +1 (binding) On Tue, 17 May 2022 at 10:38, Yu Li wrote: +1 (binding) Thanks Xintong for driving this! Best Regards, Yu On Tue, 17 May 2022 at 16:32, Robert Metzger wrote: Thanks for starting the VOTE! +1 (binding) On Tue, May 17, 2022 at 10:29 AM Jark Wu wrote: Thank Xintong for driving this work. +1 from my side (binding) Best, Jark On Tue, 17 May 2022 at 16:24, Xintong Song < tonysong...@gmail.com> wrote: Hi everyone, As previously discussed in [1], I would like to open a vote on creating an Apache Flink slack workspace channel. The proposed actions include: - Creating a dedicated slack workspace with the name Apache Flink that is controlled and maintained by the Apache Flink PMC - Updating the Flink website about rules for using various communication channels - Setting up an Archive for the Apache Flink slack - Revisiting this initiative by the end of 2022 The vote will last for at least 72 hours, and will be accepted by a consensus of active PMC members. Best, Xintong
[jira] [Created] (FLINK-27679) Support append-only table for log store.
Zheng Hu created FLINK-27679: Summary: Support append-only table for log store. Key: FLINK-27679 URL: https://issues.apache.org/jira/browse/FLINK-27679 Project: Flink Issue Type: Sub-task Reporter: Zheng Hu Will publish separate PR to support append-only table for log table. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27678) Support append-only table for file store.
Zheng Hu created FLINK-27678: Summary: Support append-only table for file store. Key: FLINK-27678 URL: https://issues.apache.org/jira/browse/FLINK-27678 Project: Flink Issue Type: Sub-task Reporter: Zheng Hu Let me publish a separate PR for supporting append-only table in flink table store's file store. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (FLINK-27677) Kubernetes reuse rest.bind-port, but do not support a range of ports
tartarus created FLINK-27677: Summary: Kubernetes reuse rest.bind-port, but do not support a range of ports Key: FLINK-27677 URL: https://issues.apache.org/jira/browse/FLINK-27677 Project: Flink Issue Type: Bug Components: Deployment / Kubernetes Affects Versions: 1.15.0 Reporter: tartarus The k8s module reuses the rest option {color:#DE350B}rest.bind-port{color}, but does not support a range of ports:
{code:java}
/**
 * Parse a valid port for the config option. A fixed port is expected, and do not support a
 * range of ports.
 *
 * @param flinkConfig flink config
 * @param port port config option
 * @return valid port
 */
public static Integer parsePort(Configuration flinkConfig, ConfigOption<String> port) {
    checkNotNull(flinkConfig.get(port), port.key() + " should not be null.");

    try {
        return Integer.parseInt(flinkConfig.get(port));
    } catch (NumberFormatException ex) {
        throw new FlinkRuntimeException(
                port.key()
                        + " should be specified to a fixed port. Do not support a range of ports.",
                ex);
    }
}
{code}
-- This message was sent by Atlassian Jira (v8.20.7#820007)
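For illustration, supporting a range along the lines the report suggests could look roughly like the sketch below. This is a simplified stand-in, not Flink code (a real fix would likely reuse Flink's existing range parsing, e.g. NetUtils#getPortRangeFromString, which also handles comma-separated lists):

```java
// Illustrative sketch only: parses either a fixed port ("8081") or an
// inclusive range ("8081-8084"). Not Flink's actual implementation.
import java.util.ArrayList;
import java.util.List;

public class PortRangeSketch {

    static List<Integer> parsePorts(String value) {
        List<Integer> ports = new ArrayList<>();
        int dash = value.indexOf('-');
        if (dash < 0) {
            // fixed port: the only form supported today
            ports.add(Integer.parseInt(value.trim()));
        } else {
            // inclusive range "start-end"
            int start = Integer.parseInt(value.substring(0, dash).trim());
            int end = Integer.parseInt(value.substring(dash + 1).trim());
            for (int p = start; p <= end; p++) {
                ports.add(p);
            }
        }
        return ports;
    }

    public static void main(String[] args) {
        System.out.println(parsePorts("8081"));      // [8081]
        System.out.println(parsePorts("8081-8084")); // [8081, 8082, 8083, 8084]
    }
}
```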
[jira] [Created] (FLINK-27676) Output records from on_timer are behind the triggering watermark in PyFlink
Juntao Hu created FLINK-27676: - Summary: Output records from on_timer are behind the triggering watermark in PyFlink Key: FLINK-27676 URL: https://issues.apache.org/jira/browse/FLINK-27676 Project: Flink Issue Type: Bug Components: API / Python Affects Versions: 1.15.0 Reporter: Juntao Hu Fix For: 1.16.0 Currently, when dealing with watermarks in AbstractPythonFunctionOperator, super.processWatermark(mark) is called, which advances the watermark in the timeServiceManager (thus triggering timers) and then emits the current watermark. However, timer triggering is not synchronous in PyFlink (processTimer only puts data into the Beam buffer), so when the remote bundle is closed and the output records produced by the on_timer function finally arrive at the Java side, they are already behind the triggering watermark. -- This message was sent by Atlassian Jira (v8.20.7#820007)
Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric
Hi Alex, 1) retry logic I think we can extract some common retry logic into utilities, e.g. RetryUtils#tryTimes(times, call). This seems independent of this FLIP and can be reused by DataStream users. Maybe we can open an issue to discuss this and where to put it. 2) cache ConfigOptions I'm fine with defining cache config options in the framework. A candidate place to put is FactoryUtil which also includes "sink.parallelism", "format" options. Best, Jark On Wed, 18 May 2022 at 13:52, Александр Смирнов wrote: > Hi Qingsheng, > > Thank you for considering my comments. > > > there might be custom logic before making retry, such as re-establish > the connection > > Yes, I understand that. I meant that such logic can be placed in a > separate function, that can be implemented by connectors. Just moving > the retry logic would make connector's LookupFunction more concise + > avoid duplicate code. However, it's a minor change. The decision is up > to you. > > > We decide not to provide common DDL options and let developers to define > their own options as we do now per connector. > > What is the reason for that? One of the main goals of this FLIP was to > unify the configs, wasn't it? I understand that current cache design > doesn't depend on ConfigOptions, like was before. But still we can put > these options into the framework, so connectors can reuse them and > avoid code duplication, and, what is more significant, avoid possible > different options naming. This moment can be pointed out in > documentation for connector developers. > > Best regards, > Alexander > > вт, 17 мая 2022 г. в 17:11, Qingsheng Ren : > > > > Hi Alexander, > > > > Thanks for the review and glad to see we are on the same page! I think > you forgot to cc the dev mailing list so I’m also quoting your reply under > this email. > > > > > We can add 'maxRetryTimes' option into this class > > > > In my opinion the retry logic should be implemented in lookup() instead > of in LookupFunction#eval(). 
Retrying is only meaningful under some > specific retriable failures, and there might be custom logic before making > retry, such as re-establish the connection (JdbcRowDataLookupFunction is an > example), so it's more handy to leave it to the connector. > > > > > I don't see DDL options, that were in previous version of FLIP. Do you > have any special plans for them? > > > > We decide not to provide common DDL options and let developers to define > their own options as we do now per connector. > > > > The rest of comments sound great and I’ll update the FLIP. Hope we can > finalize our proposal soon! > > > > Best, > > > > Qingsheng > > > > > > > On May 17, 2022, at 13:46, Александр Смирнов > wrote: > > > > > > Hi Qingsheng and devs! > > > > > > I like the overall design of updated FLIP, however I have several > > > suggestions and questions. > > > > > > 1) Introducing LookupFunction as a subclass of TableFunction is a good > > > idea. We can add 'maxRetryTimes' option into this class. 'eval' method > > > of new LookupFunction is great for this purpose. The same is for > > > 'async' case. > > > > > > 2) There might be other configs in future, such as 'cacheMissingKey' > > > in LookupFunctionProvider or 'rescanInterval' in ScanRuntimeProvider. > > > Maybe use Builder pattern in LookupFunctionProvider and > > > RescanRuntimeProvider for more flexibility (use one 'build' method > > > instead of many 'of' methods in future)? > > > > > > 3) What are the plans for existing TableFunctionProvider and > > > AsyncTableFunctionProvider? I think they should be deprecated. > > > > > > 4) Am I right that the current design does not assume usage of > > > user-provided LookupCache in re-scanning? In this case, it is not very > > > clear why do we need methods such as 'invalidate' or 'putAll' in > > > LookupCache. > > > > > > 5) I don't see DDL options, that were in previous version of FLIP. Do > > > you have any special plans for them? 
> > >
> > > If you don't mind, I would be glad to be able to make small
> > > adjustments to the FLIP document too. I think it's worth mentioning
> > > which optimizations exactly are planned for the future.
> > >
> > > Best regards,
> > > Smirnov Alexander
> > >
> > > On Fri, 13 May 2022 at 20:27, Qingsheng Ren wrote:
> > >>
> > >> Hi Alexander and devs,
> > >>
> > >> Thank you very much for the in-depth discussion! As Jark mentioned, we
> > >> were inspired by Alexander's idea and refactored our design.
> > >> FLIP-221 [1] has been updated to reflect our design now, and we are happy to
> > >> hear more suggestions from you!
> > >>
> > >> Compared to the previous design:
> > >> 1. The lookup cache serves at the table runtime level and is integrated
> > >> as a component of LookupJoinRunner, as discussed previously.
> > >> 2. Interfaces are renamed and re-designed to reflect the new design.
> > >> 3. We separate the all-caching case individually and introduce a new
> > >> RescanRuntimeProvider to reuse the ability of scanning. We are planning to
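The RetryUtils#tryTimes(times, call) helper Jark floats at the top of this thread could look roughly like the sketch below. The class, signature, and behavior are assumptions for illustration (no such utility existed in Flink at the time of this thread), not a committed API:

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of the generic retry helper discussed in this thread. */
public final class RetryUtils {

    /** Invokes the call, retrying up to {@code times} attempts; rethrows the last failure. */
    public static <T> T tryTimes(int times, Callable<T> call) throws Exception {
        if (times <= 0) {
            throw new IllegalArgumentException("times must be positive, got " + times);
        }
        Exception last = null;
        for (int attempt = 1; attempt <= times; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // A connector could hook custom logic here, e.g. re-establishing a connection,
                // before the next attempt -- the point raised by Qingsheng above.
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Fails twice, then succeeds on the third attempt.
        int[] attempts = {0};
        String result = tryTimes(3, () -> {
            if (++attempts[0] < 3) {
                throw new RuntimeException("transient failure");
            }
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

Keeping the retriable call a plain Callable is what would make such a utility reusable by DataStream users as well, independent of the lookup-join machinery.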
Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client
Hi Godfrey,

Thanks a lot for your inputs!

'SHOW QUERIES' lists all jobs in the cluster, with no limit on APIs (DataStream or SQL) or clients (SQL client or CLI). Under the hood, it's based on ClusterClient#listJobs, the same as the Flink CLI. I think it's okay to have non-SQL jobs listed in the SQL client, because these jobs can be managed via the SQL client too.

WRT finished time, I think you're right. Adding it to the FLIP. But I'm a bit afraid that the rows would be too long.

WRT 'DROP QUERY',

> What's the behavior for batch jobs and the non-running jobs?

In general, the behavior would be aligned with the Flink CLI. Triggering a savepoint for a non-running job would cause errors, and the error message would be printed to the SQL client. Triggering a savepoint for batch (unbounded) jobs in streaming execution mode would be the same as for streaming jobs. However, for batch jobs in batch execution mode, I think there would be an error, because batch execution doesn't support checkpoints currently (please correct me if I'm wrong).

WRT 'SHOW SAVEPOINTS', I've thought about it, but Flink's ClusterClient/JobClient doesn't have such functionality at the moment, and neither does the Flink CLI. Maybe we could make it a follow-up FLIP that includes the modifications to ClusterClient/JobClient and the Flink CLI. WDYT?

Best,
Paul Lam

> On May 17, 2022, at 20:34, godfrey he wrote:
>
> Godfrey
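For orientation, the lifecycle statements under discussion in this thread could be sketched as below. This is an illustrative fragment only; FLIP-222's syntax was still being debated at this point, so the exact keywords and semantics are not final:

```sql
-- List all jobs in the cluster (SQL and non-SQL alike, via ClusterClient#listJobs)
SHOW QUERIES;

-- Stop a job; per the discussion above, triggering a savepoint fails for
-- non-running jobs and for jobs in batch execution mode
DROP QUERY '<job id>';

-- A possible follow-up once ClusterClient/JobClient gain the capability
SHOW SAVEPOINTS;
```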
Re: [VOTE] FLIP-229: Introduces Join Hint for Flink SQL Batch Job
+1 Thanks for driving.

Best,
Jingsong

On Wed, May 18, 2022 at 1:33 PM godfrey he wrote:
> Thanks Xuyang for driving this, +1 (binding)
>
> Best,
> Godfrey
>
> On Tue, May 17, 2022 at 10:21, Xuyang wrote:
> >
> > Hi, everyone.
> > Thanks for your feedback on FLIP-229: Introduces Join Hint for Flink
> > SQL Batch Job [1] in the discussion thread [2].
> > I'd like to start a vote for it. The vote will be open for at least 72
> > hours unless there is an objection or not enough votes.
> >
> > --
> >
> > Best!
> > Xuyang
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job
> > [2] https://lists.apache.org/thread/y668bxyjz66ggtjypfz9t571m0tyvv9h