Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-29 Thread Tamir Sagi
Congratulations! Thanks to the release managers and everyone who has
contributed!

Best
Tamir


From: Jark Wu 
Sent: Friday, October 27, 2023 7:39 AM
To: d...@flink.apache.org 
Cc: Qingsheng Ren ; User ; 
user...@flink.apache.org 
Subject: Re: [ANNOUNCE] Apache Flink 1.18.0 released


Congratulations, and thanks to the release managers and everyone who has contributed!

Best,
Jark

On Fri, 27 Oct 2023 at 12:25, Hang Ruan <ruanhang1...@gmail.com> wrote:
Congratulations!

Best,
Hang

Samrat Deb <decordea...@gmail.com> wrote on Friday, Oct 27, 2023 at 11:50:

> Congratulations on the great release
>
> Bests,
> Samrat
>
> On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo <karma...@gmail.com> wrote:
>
> > Great work! Congratulations to everyone involved!
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren <re...@apache.org> wrote:
> > >
> > > Congratulations and big THANK YOU to everyone helping with this
> release!
> > >
> > > Best,
> > > Qingsheng
> > >
> > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li <libenc...@apache.org> wrote:
> > >>
> > >> Great work, thanks everyone involved!
> > >>
> > >> Rui Fan <1996fan...@gmail.com> wrote on Friday, Oct 27, 2023 at 10:16:
> > >> >
> > >> > Thanks for the great work!
> > >> >
> > >> > Best,
> > >> > Rui
> > >> >
> > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam <paullin3...@gmail.com> wrote:
> > >> >
> > >> > > Finally! Thanks to all!
> > >> > >
> > >> > > Best,
> > >> > > Paul Lam
> > >> > >
> > >> > > > On Oct 27, 2023, at 03:58, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
> > >> > > >
> > >> > > > Great work, thanks everyone!
> > >> > > >
> > >> > > > Best,
> > >> > > > Alexander
> > >> > > >
> > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <martijnvis...@apache.org> wrote:
> > >> > > >
> > >> > > >> Thank you all who have contributed!
> > >> > > >>
> > >> > > >> On Thu, Oct 26, 2023 at 18:41, Feng Jin <jinfeng1...@gmail.com> wrote:
> > >> > > >>
> > >> > > >>> Thanks for the great work! Congratulations
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> Best,
> > >> > > >>> Feng Jin
> > >> > > >>>
> > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <xbjt...@gmail.com> wrote:
> > >> > > >>>
> > >> > > >>>> Congratulations, Well done!
> > >> > > >>>>
> > >> > > >>>> Best,
> > >> > > >>>> Leonard
> > >> > > >>>>
> > >> > > >>>> On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <lincoln.8...@gmail.com> wrote:
> > >> > > >>>>
> > >> > > >>>>> Thanks for the great work! Congrats all!
> > >> > > >>>>>
> > >> > > >>>>> Best,
> > >> > > >>>>> Lincoln Lee
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> Jing Ge  wrote on Friday, Oct 27, 2023 at 00:16:
> > >> > > >>>>>
> > >> > > >>>>>> The Apache Flink community is very happy to announce the
> > release of
> > >> > > >>>>> Apache
> > >> > > >>>>>> Flink 1.18.0, which is the first release for the Apache
> > Flink 1.18
> > >> > > >>>>> series.

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-28 Thread Alexander Fedulov
> Or was it the querying of the checkpoints you were advising against?

Yes, I meant the approach, not file removal itself. Mainly because how
exactly FileSource stores its state is an implementation detail, and there
are no external guarantees for its consistency even between minor versions.
On top of that, the original author of the State Processor API has moved to
another project, so it has not been actively worked on recently. I am not
sure it is even possible to access the FileSource state directly with it,
since FLIP-27 sources do not use the OperatorState abstraction directly [1].

[1]
https://github.com/apache/flink/blob/release-1.18/flink-runtime/src/main/java/org/apache/flink/runtime/source/coordinator/SourceCoordinator.java#L510
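
To make the caveat concrete, below is roughly what querying source state
with the State Processor API would look like. This is a minimal sketch
only: the uid and state name are hypothetical placeholders, and because
FLIP-27 sources keep their "already processed paths" bookkeeping in
SourceCoordinator (operator coordinator) state, a plain readListState call
is not expected to reach it.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.SavepointReader;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReadSourceStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Point the reader at a retained checkpoint or savepoint.
        SavepointReader savepoint =
                SavepointReader.read(env, "file:///checkpoints/chk-42");

        // Hypothetical uid and state name: FileSource documents no such
        // operator-level list state, which is exactly the problem.
        DataStream<String> processed =
                savepoint.readListState(
                        OperatorIdentifier.forUid("file-source-uid"),
                        "processed-files",
                        Types.STRING);

        processed.print();
        env.execute("inspect-source-state");
    }
}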

Best,
Alexander

On Sat, 28 Oct 2023 at 16:13, Andrew Otto  wrote:

> > This is not a robust solution, I would advise against it.
> Oh no?  Am curious as to why not.  It seems not dissimilar to how Kafka
> topic retention works: the messages are removed after some time period
> (hopefully after they are processed), so why would it be bad to remove
> files that are already processed?
>
> Or was it the querying of the checkpoints you were advising against?
>
> To be sure, I was referring to moving the previously processed files away,
> not the checkpoints themselves.
>
> On Fri, Oct 27, 2023 at 12:45 PM Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
>> > I wonder if you could use this fact to query the committed checkpoints
>> and move them away after the job is done.
>>
>> This is not a robust solution, I would advise against it.
>>
>> Best,
>> Alexander
>>
>> On Fri, 27 Oct 2023 at 16:41, Andrew Otto  wrote:
>>
>>> For moving the files:
>>> > It will keep the files as is and remember the name of the file read
>>> in checkpointed state to ensure it doesn't read the same file twice.
>>>
>>> I wonder if you could use this fact to query the committed checkpoints
>>> and move them away after the job is done.  I think it should even be safe
>>> to do this outside of the Flink job periodically (cron, whatever), because
>>> on restart it won't reprocess the files that have been committed in the
>>> checkpoints.
>>>
>>>
>>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/libs/state_processor_api/#reading-state
>>>
>>>
>>>
>>>
>>> On Fri, Oct 27, 2023 at 1:13 AM arjun s  wrote:
>>>
 Hi team, Thanks for your quick response.
 I have an inquiry regarding file processing in the event of a job
 restart. When the job is restarted, we encounter challenges in tracking
 which files have been processed and which remain pending. Is there a method
 to seamlessly resume processing files from where they were left off,
 particularly in situations where we need to submit and restart the job
 manually due to any server restart or application restart? This becomes an
 issue when the job processes all the files in the directory from the
 beginning after a restart, and I'm seeking a solution to address this.

 Thanks and regards,
 Arjun

 On Fri, 27 Oct 2023 at 07:29, Chirag Dewan 
 wrote:

> Hi Arjun,
>
> Flink's FileSource doesn't move or delete the files as of now. It will
> keep the files as is and remember the name of each file read in
> checkpointed state to ensure it doesn't read the same file twice.
>
> Flink's source API works in a way that a single Enumerator operates on
> the JobManager. The enumerator is responsible for listing the files and
> splitting them into smaller units. These units could be the complete file
> (in case of row formats) or splits within a file (for bulk formats). The
> reading is done by SplitReaders in the Task Managers. This way it ensures
> that only the reading is done concurrently and it is able to track file
> completions.
>
> You can read more in the Flink documentation on Sources and the
> FileSystem connector.
>
>
>
> On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s <
> arjunjoice...@gmail.com> wrote:
>
>
> Hello team,
> I'm currently in the process of configuring a Flink job. This job
> entails reading files from a specified directory and then transmitting the
> data to a Kafka sink. I've already successfully designed a Flink job that
> reads the file contents in a streaming manner and effectively sends them 
> to
> Kafka. However, my specific requirement is a bit more intricate. I need 
> the
> job to not only read these files and push the data to Kafka but also
> relocate the processed file to a different directory once all of its
> contents have been processed.

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-28 Thread Andrew Otto
> This is not a robust solution, I would advise against it.
Oh no?  Am curious as to why not.  It seems not dissimilar to how Kafka
topic retention works: the messages are removed after some time period
(hopefully after they are processed), so why would it be bad to remove
files that are already processed?

Or was it the querying of the checkpoints you were advising against?

To be sure, I was referring to moving the previously processed files away,
not the checkpoints themselves.

On Fri, Oct 27, 2023 at 12:45 PM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

> > I wonder if you could use this fact to query the committed checkpoints
> and move them away after the job is done.
>
> This is not a robust solution, I would advise against it.
>
> Best,
> Alexander
>
> On Fri, 27 Oct 2023 at 16:41, Andrew Otto  wrote:
>
>> For moving the files:
>> > It will keep the files as is and remember the name of the file read in
>> checkpointed state to ensure it doesn't read the same file twice.
>>
>> I wonder if you could use this fact to query the committed checkpoints
>> and move them away after the job is done.  I think it should even be safe
>> to do this outside of the Flink job periodically (cron, whatever), because
>> on restart it won't reprocess the files that have been committed in the
>> checkpoints.
>>
>>
>> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/libs/state_processor_api/#reading-state
>>
>>
>>
>>
>> On Fri, Oct 27, 2023 at 1:13 AM arjun s  wrote:
>>
>>> Hi team, Thanks for your quick response.
>>> I have an inquiry regarding file processing in the event of a job
>>> restart. When the job is restarted, we encounter challenges in tracking
>>> which files have been processed and which remain pending. Is there a method
>>> to seamlessly resume processing files from where they were left off,
>>> particularly in situations where we need to submit and restart the job
>>> manually due to any server restart or application restart? This becomes an
>>> issue when the job processes all the files in the directory from the
>>> beginning after a restart, and I'm seeking a solution to address this.
>>>
>>> Thanks and regards,
>>> Arjun
>>>
>>> On Fri, 27 Oct 2023 at 07:29, Chirag Dewan 
>>> wrote:
>>>
 Hi Arjun,

 Flink's FileSource doesn't move or delete the files as of now. It will
 keep the files as is and remember the name of each file read in checkpointed
 state to ensure it doesn't read the same file twice.

 Flink's source API works in a way that a single Enumerator operates on
 the JobManager. The enumerator is responsible for listing the files and
 splitting them into smaller units. These units could be the complete file
 (in case of row formats) or splits within a file (for bulk formats). The
 reading is done by SplitReaders in the Task Managers. This way it ensures
 that only the reading is done concurrently and it is able to track file
 completions.

 You can read more in the Flink documentation on Sources and the
 FileSystem connector.



 On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s <
 arjunjoice...@gmail.com> wrote:


 Hello team,
 I'm currently in the process of configuring a Flink job. This job
 entails reading files from a specified directory and then transmitting the
 data to a Kafka sink. I've already successfully designed a Flink job that
 reads the file contents in a streaming manner and effectively sends them to
 Kafka. However, my specific requirement is a bit more intricate. I need the
 job to not only read these files and push the data to Kafka but also
 relocate the processed file to a different directory once all of its
 contents have been processed. Following this, the job should seamlessly
 transition to processing the next file in the source directory.
 Additionally, I have some concerns regarding how the job will behave if it
 encounters a restart. Could you please advise if this is achievable, and if
 so, provide guidance or code to implement it?

 I'm also quite interested in how the job will handle situations where
 the source has a parallelism greater than 2 or 3, and how it can accurately
 monitor the completion of reading all contents in each file.

 Thanks and Regards,
 Arjun

>>>


Re: Which Flink engine versions do Connectors support?

2023-10-28 Thread Xianxun Ye
Hi Gordon,

Thanks for your information. That is what I need.

And I have responded to the Kafka connector RC vote mail.


Best regards,
Xianxun

> On Oct 28, 2023, at 04:13, Tzu-Li (Gordon) Tai  wrote:
> 
> Hi Xianxun,
> 
> You can find the list of supported Flink versions for each connector here:
> https://flink.apache.org/downloads/#apache-flink-connectors
> 
> Specifically for the Kafka connector, we're in the process of releasing a new 
> version for the connector that works with Flink 1.18.
> The release candidate vote thread is here if you want to test that out: 
> https://lists.apache.org/thread/35gjflv4j2pp2h9oy5syj2vdfpotg486
> 
> Thanks,
> Gordon
> 
> 
> On Fri, Oct 27, 2023 at 12:57 PM Xianxun Ye  wrote:
>> 
>> Hello Team, 
>> 
>> After the release of Flink 1.18, I found that most connectors had been
>> externalized, e.g. Kafka, ES, HBase, JDBC, and Pulsar connectors. But I
>> didn't find any manual or code indicating which versions of Flink these
>> connectors work with.
>> 
>> 
>> Best regards,
>> Xianxun
>> 



Re: Which Flink engine versions do Connectors support?

2023-10-27 Thread Tzu-Li (Gordon) Tai
Hi Xianxun,

You can find the list of supported Flink versions for each connector here:
https://flink.apache.org/downloads/#apache-flink-connectors

Specifically for the Kafka connector, we're in the process of releasing a
new version for the connector that works with Flink 1.18.
The release candidate vote thread is here if you want to test that out:
https://lists.apache.org/thread/35gjflv4j2pp2h9oy5syj2vdfpotg486

Thanks,
Gordon


On Fri, Oct 27, 2023 at 12:57 PM Xianxun Ye  wrote:

> 
> Hello Team,
>
> After the release of Flink 1.18, I found that most connectors had been
> externalized, e.g. Kafka, ES, HBase, JDBC, and Pulsar connectors. But I
> didn't find any manual or code indicating which versions of Flink these
> connectors work with.
>
>
> Best regards,
> Xianxun
>
>


Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-27 Thread Alexander Fedulov
> I wonder if you could use this fact to query the committed checkpoints
and move them away after the job is done.

This is not a robust solution, I would advise against it.

Best,
Alexander

On Fri, 27 Oct 2023 at 16:41, Andrew Otto  wrote:

> For moving the files:
> > It will keep the files as is and remember the name of the file read in
> checkpointed state to ensure it doesn't read the same file twice.
>
> I wonder if you could use this fact to query the committed checkpoints and
> move them away after the job is done.  I think it should even be safe to do
> this outside of the Flink job periodically (cron, whatever), because on
> restart it won't reprocess the files that have been committed in the
> checkpoints.
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/libs/state_processor_api/#reading-state
>
>
>
>
> On Fri, Oct 27, 2023 at 1:13 AM arjun s  wrote:
>
>> Hi team, Thanks for your quick response.
>> I have an inquiry regarding file processing in the event of a job
>> restart. When the job is restarted, we encounter challenges in tracking
>> which files have been processed and which remain pending. Is there a method
>> to seamlessly resume processing files from where they were left off,
>> particularly in situations where we need to submit and restart the job
>> manually due to any server restart or application restart? This becomes an
>> issue when the job processes all the files in the directory from the
>> beginning after a restart, and I'm seeking a solution to address this.
>>
>> Thanks and regards,
>> Arjun
>>
>> On Fri, 27 Oct 2023 at 07:29, Chirag Dewan 
>> wrote:
>>
>>> Hi Arjun,
>>>
>>> Flink's FileSource doesn't move or delete the files as of now. It will
>>> keep the files as is and remember the name of each file read in checkpointed
>>> state to ensure it doesn't read the same file twice.
>>>
>>> Flink's source API works in a way that a single Enumerator operates on the
>>> JobManager. The enumerator is responsible for listing the files and
>>> splitting them into smaller units. These units could be the complete file
>>> (in case of row formats) or splits within a file (for bulk formats). The
>>> reading is done by SplitReaders in the Task Managers. This way it ensures
>>> that only the reading is done concurrently and it is able to track file
>>> completions.
>>>
>>> You can read more in the Flink documentation on Sources and the
>>> FileSystem connector.
>>>
>>>
>>>
>>> On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s <
>>> arjunjoice...@gmail.com> wrote:
>>>
>>>
>>> Hello team,
>>> I'm currently in the process of configuring a Flink job. This job
>>> entails reading files from a specified directory and then transmitting the
>>> data to a Kafka sink. I've already successfully designed a Flink job that
>>> reads the file contents in a streaming manner and effectively sends them to
>>> Kafka. However, my specific requirement is a bit more intricate. I need the
>>> job to not only read these files and push the data to Kafka but also
>>> relocate the processed file to a different directory once all of its
>>> contents have been processed. Following this, the job should seamlessly
>>> transition to processing the next file in the source directory.
>>> Additionally, I have some concerns regarding how the job will behave if it
>>> encounters a restart. Could you please advise if this is achievable, and if
>>> so, provide guidance or code to implement it?
>>>
>>> I'm also quite interested in how the job will handle situations where
>>> the source has a parallelism greater than 2 or 3, and how it can accurately
>>> monitor the completion of reading all contents in each file.
>>>
>>> Thanks and Regards,
>>> Arjun
>>>
>>


Re: Invalid Null Check in DefaultFileFilter

2023-10-27 Thread Alexander Fedulov
* with regard to the empty string: the null check is still a bit defensive,
and one could return false in test(), but it does not really matter since
String.substring in getName() can never return null.

On Fri, 27 Oct 2023 at 16:32, Alexander Fedulov 
wrote:

> Actually, this is not even "defensive programming", but is the required
> logic for processing directories.
> See here:
>
> https://github.com/apache/flink/blob/release-1.18/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/enumerate/NonSplittingRecursiveEnumerator.java#L90
>
>
> https://github.com/apache/flink/blob/release-1.18/flink-core/src/main/java/org/apache/flink/core/fs/Path.java#L295
>
> Returning false would prevent addSplitsForPath from adding all nested
> files recursively.
>
> Best,
> Alexander
>
>
>
> On Fri, 27 Oct 2023 at 04:04, Chirag Dewan 
> wrote:
>
>> Yeah agree, not a problem in general. But it just seems odd. Returning
>> true if a fileName can be null will blow up a lot more in the reader as far
>> as my understanding goes.
>>
>> I just want to understand whether this is an erroneous condition or an
>> actual use case. Let's say, is it possible to get a null file name for some
>> sub directories and hence important to return true so that the File Source
>> can monitor inside those sub directories?
>>
>> On Friday, 27 October, 2023 at 12:58:44 am IST, Alexander Fedulov <
>> alexander.fedu...@gmail.com> wrote:
>>
>>
>> Is there an actual issue behind this question?
>>
>> In general: this is a form of defensive programming for a public
>> interface and the decision here is to be more lenient when facing
>> potentially erroneous user input rather than blow up the whole application
>> with a NullPointerException.
>>
>> Best,
>> Alexander Fedulov
>>
>> On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user <
>> user@flink.apache.org> wrote:
>>
>> Hi,
>>
>> I was looking at this check in DefaultFileFilter:
>>
>> public boolean test(Path path) {
>> final String fileName = path.getName();
>> if (fileName == null || fileName.length() == 0) {
>> return true;
>> }
>>
>> Was wondering how can a file name be null?
>>
>> And if null, shouldn't this be *return false*?
>>
>> I created a JIRA for this - [FLINK-33367] Invalid Check in
>> DefaultFileFilter - ASF JIRA.
>> Any input is appreciated.
>>
>> Thanks
>>
>>
>>
>>


Re: Invalid Null Check in DefaultFileFilter

2023-10-27 Thread Alexander Fedulov
Actually, this is not even "defensive programming", but is the required
logic for processing directories.
See here:
https://github.com/apache/flink/blob/release-1.18/flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/enumerate/NonSplittingRecursiveEnumerator.java#L90

https://github.com/apache/flink/blob/release-1.18/flink-core/src/main/java/org/apache/flink/core/fs/Path.java#L295

Returning false would prevent addSplitsForPath from adding all nested files
recursively.
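
To illustrate, here is a sketch of a custom filter in the spirit of
DefaultFileFilter, wired into a FileSource; the class name and directory
are made up for the example. The leniency for null/empty names is what
keeps the recursive enumerator descending into directories instead of
silently dropping them.

import java.io.Serializable;
import java.util.function.Predicate;

import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.enumerate.NonSplittingRecursiveEnumerator;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;

public class FilteredFileSource {

    /** Mirrors DefaultFileFilter: skip hidden files, stay lenient otherwise. */
    public static class VisibleFileFilter implements Predicate<Path>, Serializable {
        @Override
        public boolean test(Path path) {
            final String fileName = path.getName();
            if (fileName == null || fileName.isEmpty()) {
                // Lenient on purpose: the enumerator applies this filter to
                // directories too, so "false" would veto the recursion.
                return true;
            }
            return fileName.charAt(0) != '.' && fileName.charAt(0) != '_';
        }
    }

    public static FileSource<String> build(String dir) {
        return FileSource.forRecordStreamFormat(
                        new TextLineInputFormat(), new Path(dir))
                .setFileEnumerator(
                        () -> new NonSplittingRecursiveEnumerator(new VisibleFileFilter()))
                .build();
    }
}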

Best,
Alexander



On Fri, 27 Oct 2023 at 04:04, Chirag Dewan  wrote:

> Yeah agree, not a problem in general. But it just seems odd. Returning
> true if a fileName can be null will blow up a lot more in the reader as far
> as my understanding goes.
>
> I just want to understand whether this is an erroneous condition or an
> actual use case. Let's say, is it possible to get a null file name for some
> sub directories and hence important to return true so that the File Source
> can monitor inside those sub directories?
>
> On Friday, 27 October, 2023 at 12:58:44 am IST, Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
>
> Is there an actual issue behind this question?
>
> In general: this is a form of defensive programming for a public interface
> and the decision here is to be more lenient when facing potentially
> erroneous user input rather than blow up the whole application with a
> NullPointerException.
>
> Best,
> Alexander Fedulov
>
> On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user 
> wrote:
>
> Hi,
>
> I was looking at this check in DefaultFileFilter:
>
> public boolean test(Path path) {
> final String fileName = path.getName();
> if (fileName == null || fileName.length() == 0) {
> return true;
> }
>
> Was wondering how can a file name be null?
>
> And if null, shouldn't this be *return false*?
>
> I created a JIRA for this - [FLINK-33367] Invalid Check in
> DefaultFileFilter - ASF JIRA.
> Any input is appreciated.
>
> Thanks
>
>
>
>


Re: Updating existing state with state processor API

2023-10-27 Thread Alexis Sarda-Espinosa
Hi Matthias,

Thanks for the response. I guess the specific question would be, if I work
with an existing savepoint and pass an empty DataStream to
OperatorTransformation#bootstrapWith, will the new savepoint end up with an
empty state for the modified operator, or will it maintain the existing
state because nothing was changed?

Regards,
Alexis.

On Fri, Oct 27, 2023 at 08:40, Schwalbe Matthias <matthias.schwa...@viseca.ch> wrote:

> Good morning Alexis,
>
>
>
> Something like this we do all the time.
>
> Read an existing savepoint, copy over the operator states (keyed/non-keyed)
> that are not to be changed, and process/patch the remaining ones by
> transforming and bootstrapping them into new state.
>
>
>
> I could share more details for more specific questions, if you like.
>
>
>
> Regards
>
>
>
> Thias
>
>
>
> PS: I’m currently working on this ticket in order to get some glitches
> removed: FLINK-26585 
>
>
>
>
>
> *From:* Alexis Sarda-Espinosa 
> *Sent:* Thursday, October 26, 2023 4:01 PM
> *To:* user 
> *Subject:* Updating existing state with state processor API
>
>
>
> Hello,
>
>
>
> The documentation of the state processor API has some examples to modify
> an existing savepoint by defining a StateBootstrapTransformation. In all
> cases, the entrypoint is OperatorTransformation#bootstrapWith, which
> expects a DataStream. If I pass an empty DataStream to bootstrapWith and
> then apply the resulting transformation to an existing savepoint, will the
> transformation still receive data from the existing state?
>
>
>
> If the aforementioned is incorrect, I imagine I could instantiate
> a SavepointReader and create a DataStream of the existing state with it,
> which I could then pass to the bootstrapWith method directly or after
> "unioning" it with additional state. Would this work?
>
>
>
> Regards,
>
> Alexis.
>


Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-27 Thread Andrew Otto
For moving the files:
> It will keep the files as is and remember the name of the file read in
checkpointed state to ensure it doesn't read the same file twice.

I wonder if you could use this fact to query the committed checkpoints and
move them away after the job is done.  I think it should even be safe to do
this outside of the Flink job periodically (cron, whatever), because on
restart it won't reprocess the files that have been committed in the
checkpoints.

https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/libs/state_processor_api/#reading-state
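
To picture the idea, here is a sketch of the kind of out-of-band mover I
mean, assuming some external step (the open question in this thread) has
produced a plain-text list of fully processed file paths; the paths below
are placeholders.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class ProcessedFileMover {
    public static void main(String[] args) throws IOException {
        // Hypothetical input: one absolute file path per line.
        Path processedList = Paths.get("/tmp/processed-files.txt");
        Path archiveDir = Paths.get("/data/archive");
        Files.createDirectories(archiveDir);

        List<String> processed = Files.readAllLines(processedList);
        for (String name : processed) {
            Path source = Paths.get(name);
            if (Files.exists(source)) {
                // An atomic move avoids a concurrently running FileSource
                // ever observing a half-moved file.
                Files.move(
                        source,
                        archiveDir.resolve(source.getFileName()),
                        StandardCopyOption.ATOMIC_MOVE);
            }
        }
    }
}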




On Fri, Oct 27, 2023 at 1:13 AM arjun s  wrote:

> Hi team, Thanks for your quick response.
> I have an inquiry regarding file processing in the event of a job restart.
> When the job is restarted, we encounter challenges in tracking which files
> have been processed and which remain pending. Is there a method to
> seamlessly resume processing files from where they were left off,
> particularly in situations where we need to submit and restart the job
> manually due to any server restart or application restart? This becomes an
> issue when the job processes all the files in the directory from the
> beginning after a restart, and I'm seeking a solution to address this.
>
> Thanks and regards,
> Arjun
>
> On Fri, 27 Oct 2023 at 07:29, Chirag Dewan 
> wrote:
>
>> Hi Arjun,
>>
>> Flink's FileSource doesn't move or delete the files as of now. It will
>> keep the files as is and remember the name of each file read in checkpointed
>> state to ensure it doesn't read the same file twice.
>>
>> Flink's source API works in a way that a single Enumerator operates on the
>> JobManager. The enumerator is responsible for listing the files and
>> splitting them into smaller units. These units could be the complete file
>> (in case of row formats) or splits within a file (for bulk formats). The
>> reading is done by SplitReaders in the Task Managers. This way it ensures
>> that only the reading is done concurrently and it is able to track file
>> completions.
>>
>> You can read more in the Flink documentation on Sources and the
>> FileSystem connector.
>>
>>
>>
>> On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s <
>> arjunjoice...@gmail.com> wrote:
>>
>>
>> Hello team,
>> I'm currently in the process of configuring a Flink job. This job entails
>> reading files from a specified directory and then transmitting the data to
>> a Kafka sink. I've already successfully designed a Flink job that reads the
>> file contents in a streaming manner and effectively sends them to Kafka.
>> However, my specific requirement is a bit more intricate. I need the job to
>> not only read these files and push the data to Kafka but also relocate the
>> processed file to a different directory once all of its contents have been
>> processed. Following this, the job should seamlessly transition to
>> processing the next file in the source directory. Additionally, I have some
>> concerns regarding how the job will behave if it encounters a restart.
>> Could you please advise if this is achievable, and if so, provide guidance
>> or code to implement it?
>>
>> I'm also quite interested in how the job will handle situations where the
>> source has a parallelism greater than 2 or 3, and how it can accurately
>> monitor the completion of reading all contents in each file.
>>
>> Thanks and Regards,
>> Arjun
>>
>


RE: Updating existing state with state processor API

2023-10-27 Thread Schwalbe Matthias
Good morning Alexis,

Something like this we do all the time.
Read an existing savepoint, copy over the operator states (keyed/non-keyed)
that are not to be changed, and process/patch the remaining ones by
transforming and bootstrapping them into new state.
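
For illustration, a minimal sketch of that read-copy-patch workflow,
assuming Flink 1.18's State Processor API; uids, paths, the key selector,
and the state descriptor are placeholders. Note that the patched operator
ends up with exactly the bootstrapped state; if the old state should
contribute, read it with SavepointReader first and union it into the
bootstrap stream.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.state.api.OperatorIdentifier;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.SavepointWriter;
import org.apache.flink.state.api.StateBootstrapTransformation;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PatchSavepoint {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // The data to bootstrap the patched operator's state from.
        DataStream<Long> newValues = env.fromElements(1L, 2L, 3L);

        StateBootstrapTransformation<Long> patched =
                OperatorTransformation.bootstrapWith(newValues)
                        .keyBy(v -> v % 10, Types.LONG)
                        .transform(new KeyedStateBootstrapFunction<Long, Long>() {
                            private transient ValueState<Long> state;

                            @Override
                            public void open(Configuration parameters) {
                                state = getRuntimeContext().getState(
                                        new ValueStateDescriptor<>("total", Long.class));
                            }

                            @Override
                            public void processElement(Long value, Context ctx)
                                    throws Exception {
                                state.update(value); // overwrites this key's state
                            }
                        });

        // Operators not mentioned below are copied over unchanged; the
        // patched uid is replaced with the bootstrapped state.
        SavepointWriter.fromExistingSavepoint(env, "file:///savepoints/old")
                .removeOperator(OperatorIdentifier.forUid("my-operator"))
                .withOperator(OperatorIdentifier.forUid("my-operator"), patched)
                .write("file:///savepoints/new");

        env.execute("patch-savepoint");
    }
}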

I could share more details for more specific questions, if you like.

Regards

Thias

PS: I’m currently working on this ticket in order to get some glitches removed: 
FLINK-26585


From: Alexis Sarda-Espinosa 
Sent: Thursday, October 26, 2023 4:01 PM
To: user 
Subject: Updating existing state with state processor API

Hello,

The documentation of the state processor API has some examples to modify an 
existing savepoint by defining a StateBootstrapTransformation. In all cases, 
the entrypoint is OperatorTransformation#bootstrapWith, which expects a 
DataStream. If I pass an empty DataStream to bootstrapWith and then apply the 
resulting transformation to an existing savepoint, will the transformation 
still receive data from the existing state?

If the aforementioned is incorrect, I imagine I could instantiate a 
SavepointReader and create a DataStream of the existing state with it, which I 
could then pass to the bootstrapWith method directly or after "unioning" it 
with additional state. Would this work?

Regards,
Alexis.



Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread arjun s
Hi team, Thanks for your quick response.
I have an inquiry regarding file processing in the event of a job restart.
When the job is restarted, we encounter challenges in tracking which files
have been processed and which remain pending. Is there a method to
seamlessly resume processing files from where they were left off,
particularly in situations where we need to submit and restart the job
manually due to any server restart or application restart? This becomes an
issue when the job processes all the files in the directory from the
beginning after a restart, and I'm seeking a solution to address this.

Thanks and regards,
Arjun
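
For reference, a minimal sketch of the source setup this thread is about;
the paths and the polling interval are placeholder values. The
processed-file bookkeeping only survives a restart if the job is resumed
from a checkpoint or savepoint rather than resubmitted cold.

import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.reader.TextLineInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DirectoryToKafkaSkeleton {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoints carry the read-file set

        FileSource<String> source =
                FileSource.forRecordStreamFormat(
                                new TextLineInputFormat(), new Path("/data/incoming"))
                        .monitorContinuously(Duration.ofSeconds(10))
                        .build();

        DataStream<String> lines =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");

        lines.print(); // stand-in for the Kafka sink discussed above
        env.execute("files-to-kafka-skeleton");
    }
}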

On Fri, 27 Oct 2023 at 07:29, Chirag Dewan  wrote:

> Hi Arjun,
>
> Flink's FileSource doesn't move or delete the files as of now. It will keep
> the files as is and remember the name of each file read in checkpointed
> state to ensure it doesn't read the same file twice.
>
> Flink's source API works in a way that a single Enumerator operates on the
> JobManager. The enumerator is responsible for listing the files and
> splitting them into smaller units. These units could be the complete file
> (in case of row formats) or splits within a file (for bulk formats). The
> reading is done by SplitReaders in the Task Managers. This way it ensures
> that only the reading is done concurrently and it is able to track file
> completions.
>
> You can read more in the Flink documentation on Sources and the
> FileSystem connector.
>
>
>
> On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s <
> arjunjoice...@gmail.com> wrote:
>
>
> Hello team,
> I'm currently in the process of configuring a Flink job. This job entails
> reading files from a specified directory and then transmitting the data to
> a Kafka sink. I've already successfully designed a Flink job that reads the
> file contents in a streaming manner and effectively sends them to Kafka.
> However, my specific requirement is a bit more intricate. I need the job to
> not only read these files and push the data to Kafka but also relocate the
> processed file to a different directory once all of its contents have been
> processed. Following this, the job should seamlessly transition to
> processing the next file in the source directory. Additionally, I have some
> concerns regarding how the job will behave if it encounters a restart.
> Could you please advise if this is achievable, and if so, provide guidance
> or code to implement it?
>
> I'm also quite interested in how the job will handle situations where the
> source has a parallelism greater than 2 or 3, and how it can accurately
> monitor the completion of reading all contents in each file.
>
> Thanks and Regards,
> Arjun
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Jark Wu
Congratulations, and thanks to the release managers and everyone who has
contributed!

Best,
Jark

On Fri, 27 Oct 2023 at 12:25, Hang Ruan  wrote:

> Congratulations!
>
> Best,
> Hang
>
> Samrat Deb  wrote on Friday, Oct 27, 2023 at 11:50:
>
> > Congratulations on the great release
> >
> > Bests,
> > Samrat
> >
> > On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo  wrote:
> >
> > > Great work! Congratulations to everyone involved!
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren 
> wrote:
> > > >
> > > > Congratulations and big THANK YOU to everyone helping with this
> > release!
> > > >
> > > > Best,
> > > > Qingsheng
> > > >
> > > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li 
> > > wrote:
> > > >>
> > > >> Great work, thanks everyone involved!
> > > >>
> > > >> Rui Fan <1996fan...@gmail.com> wrote on Friday, Oct 27, 2023 at 10:16:
> > > >> >
> > > >> > Thanks for the great work!
> > > >> >
> > > >> > Best,
> > > >> > Rui
> > > >> >
> > > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam 
> > > wrote:
> > > >> >
> > > >> > > Finally! Thanks to all!
> > > >> > >
> > > >> > > Best,
> > > >> > > Paul Lam
> > > >> > >
> > > >> > > > On Oct 27, 2023, at 03:58, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
> > > >> > > >
> > > >> > > > Great work, thanks everyone!
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Alexander
> > > >> > > >
> > > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> > > martijnvis...@apache.org>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Thank you all who have contributed!
> > > >> > > >>
> > > >> > > >> On Thu, Oct 26, 2023 at 18:41, Feng Jin <jinfeng1...@gmail.com> wrote:
> > > >> > > >>
> > > >> > > >>> Thanks for the great work! Congratulations
> > > >> > > >>>
> > > >> > > >>>
> > > >> > > >>> Best,
> > > >> > > >>> Feng Jin
> > > >> > > >>>
> > > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <
> > xbjt...@gmail.com>
> > > wrote:
> > > >> > > >>>
> > > >> > >  Congratulations, Well done!
> > > >> > > 
> > > >> > >  Best,
> > > >> > >  Leonard
> > > >> > > 
> > > >> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> > > lincoln.8...@gmail.com>
> > > >> > >  wrote:
> > > >> > > 
> > > >> > > > Thanks for the great work! Congrats all!
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Lincoln Lee
> > > >> > > >
> > > >> > > >
> > > >> > > > Jing Ge  wrote on Friday, Oct 27, 2023 at 00:16:
> > > >> > > >
> > > >> > > >> The Apache Flink community is very happy to announce the
> > > release of
> > > >> > > > Apache
> > > >> > > >> Flink 1.18.0, which is the first release for the Apache
> > > Flink 1.18
> > > >> > > > series.
> > > >> > > >>
> > > >> > > >> Apache Flink® is an open-source unified stream and batch
> > data
> > > >> > >  processing
> > > >> > > >> framework for distributed, high-performing,
> > > always-available, and
> > > >> > > > accurate
> > > >> > > >> data applications.
> > > >> > > >>
> > > >> > > >> The release is available for download at:
> > > >> > > >> https://flink.apache.org/downloads.html
> > > >> > > >>
> > > >> > > >> Please check out the release blog post for an overview of
> > the
> > > >> > > > improvements
> > > >> > > >> for this release:
> > > >> > > >>
> > > >> > > >>
> > > >> > > >
> > > >> > > 
> > > >> > > >>>
> > > >> > > >>
> > > >> > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > >> > > >>
> > > >> > > >> The full release notes are available in Jira:
> > > >> > > >>
> > > >> > > >>
> > > >> > > >
> > > >> > > 
> > > >> > > >>>
> > > >> > > >>
> > > >> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > >> > > >>
> > > >> > > >> We would like to thank all contributors of the Apache
> Flink
> > > >> > > >> community
> > > >> > >  who
> > > >> > > >> made this release possible!
> > > >> > > >>
> > > >> > > >> Best regards,
> > > >> > > >> Konstantin, Qingsheng, Sergey, and Jing
> > > >> > > >>
> > > >> > > >
> > > >> > > 
> > > >> > > >>>
> > > >> > > >>
> > > >> > >
> > > >> > >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >> Best,
> > > >> Benchao Li
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Hang Ruan
Congratulations!

Best,
Hang

Samrat Deb  wrote on Friday, Oct 27, 2023 at 11:50:

> Congratulations on the great release
>
> Bests,
> Samrat
>
> On Fri, 27 Oct 2023 at 7:59 AM, Yangze Guo  wrote:
>
> > Great work! Congratulations to everyone involved!
> >
> > Best,
> > Yangze Guo
> >
> > On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren  wrote:
> > >
> > > Congratulations and big THANK YOU to everyone helping with this
> release!
> > >
> > > Best,
> > > Qingsheng
> > >
> > > On Fri, Oct 27, 2023 at 10:18 AM Benchao Li 
> > wrote:
> > >>
> > >> Great work, thanks everyone involved!
> > >>
> > >> Rui Fan <1996fan...@gmail.com> wrote on Friday, Oct 27, 2023 at 10:16:
> > >> >
> > >> > Thanks for the great work!
> > >> >
> > >> > Best,
> > >> > Rui
> > >> >
> > >> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam 
> > wrote:
> > >> >
> > >> > > Finally! Thanks to all!
> > >> > >
> > >> > > Best,
> > >> > > Paul Lam
> > >> > >
> > >> > > > On Oct 27, 2023, at 03:58, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
> > >> > > >
> > >> > > > Great work, thanks everyone!
> > >> > > >
> > >> > > > Best,
> > >> > > > Alexander
> > >> > > >
> > >> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> > martijnvis...@apache.org>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> Thank you all who have contributed!
> > >> > > >>
> > >> > > >> On Thu, Oct 26, 2023 at 18:41, Feng Jin <jinfeng1...@gmail.com> wrote:
> > >> > > >>
> > >> > > >>> Thanks for the great work! Congratulations
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> Best,
> > >> > > >>> Feng Jin
> > >> > > >>>
> > >> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu <
> xbjt...@gmail.com>
> > wrote:
> > >> > > >>>
> > >> > >  Congratulations, Well done!
> > >> > > 
> > >> > >  Best,
> > >> > >  Leonard
> > >> > > 
> > >> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> > lincoln.8...@gmail.com>
> > >> > >  wrote:
> > >> > > 
> > >> > > > Thanks for the great work! Congrats all!
> > >> > > >
> > >> > > > Best,
> > >> > > > Lincoln Lee
> > >> > > >
> > >> > > >
> > >> > > > Jing Ge  wrote on Friday, Oct 27, 2023 at 00:16:
> > >> > > >
> > >> > > >> The Apache Flink community is very happy to announce the
> > release of
> > >> > > > Apache
> > >> > > >> Flink 1.18.0, which is the first release for the Apache
> > Flink 1.18
> > >> > > > series.
> > >> > > >>
> > >> > > >> Apache Flink® is an open-source unified stream and batch
> data
> > >> > >  processing
> > >> > > >> framework for distributed, high-performing,
> > always-available, and
> > >> > > > accurate
> > >> > > >> data applications.
> > >> > > >>
> > >> > > >> The release is available for download at:
> > >> > > >> https://flink.apache.org/downloads.html
> > >> > > >>
> > >> > > >> Please check out the release blog post for an overview of
> the
> > >> > > > improvements
> > >> > > >> for this release:
> > >> > > >>
> > >> > > >>
> > >> > > >
> > >> > > 
> > >> > > >>>
> > >> > > >>
> > >> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > >> > > >>
> > >> > > >> The full release notes are available in Jira:
> > >> > > >>
> > >> > > >>
> > >> > > >
> > >> > > 
> > >> > > >>>
> > >> > > >>
> > >> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > >> > > >>
> > >> > > >> We would like to thank all contributors of the Apache Flink
> > >> > > >> community
> > >> > >  who
> > >> > > >> made this release possible!
> > >> > > >>
> > >> > > >> Best regards,
> > >> > > >> Konstantin, Qingsheng, Sergey, and Jing
> > >> > > >>
> > >> > > >
> > >> > > 
> > >> > > >>>
> > >> > > >>
> > >> > >
> > >> > >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Best,
> > >> Benchao Li
> >
>


Re: Unsubscribe

2023-10-26 Thread Junrui Lee
Hi,

Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe from the mailing list.

Best,
Junrui

13430298988 <13430298...@163.com> wrote on Friday, Oct 27, 2023 at 11:00:

> Unsubscribe


Re: Unsubscribe

2023-10-26 Thread Junrui Lee
Hi,

Please send an email with any content to user-zh-unsubscr...@flink.apache.org to unsubscribe from the mailing list.

Best,
Junrui

chenyu_opensource  wrote on Friday, Oct 27, 2023 at 10:20:

> Unsubscribe


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Yangze Guo
Great work! Congratulations to everyone involved!

Best,
Yangze Guo

On Fri, Oct 27, 2023 at 10:23 AM Qingsheng Ren  wrote:
>
> Congratulations and big THANK YOU to everyone helping with this release!
>
> Best,
> Qingsheng
>
> On Fri, Oct 27, 2023 at 10:18 AM Benchao Li  wrote:
>>
>> Great work, thanks everyone involved!
>>
>> Rui Fan <1996fan...@gmail.com> wrote on Friday, Oct 27, 2023 at 10:16:
>> >
>> > Thanks for the great work!
>> >
>> > Best,
>> > Rui
>> >
>> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:
>> >
>> > > Finally! Thanks to all!
>> > >
>> > > Best,
>> > > Paul Lam
>> > >
>> > > > On Oct 27, 2023, at 03:58, Alexander Fedulov  wrote:
>> > > >
>> > > > Great work, thanks everyone!
>> > > >
>> > > > Best,
>> > > > Alexander
>> > > >
>> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
>> > > > wrote:
>> > > >
>> > > >> Thank you all who have contributed!
>> > > >>
>> > > >> On Thu, Oct 26, 2023 at 18:41, Feng Jin  wrote:
>> > > >>
>> > > >>> Thanks for the great work! Congratulations
>> > > >>>
>> > > >>>
>> > > >>> Best,
>> > > >>> Feng Jin
>> > > >>>
>> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  
>> > > >>> wrote:
>> > > >>>
>> > >  Congratulations, Well done!
>> > > 
>> > >  Best,
>> > >  Leonard
>> > > 
>> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
>> > >  
>> > >  wrote:
>> > > 
>> > > > Thanks for the great work! Congrats all!
>> > > >
>> > > > Best,
>> > > > Lincoln Lee
>> > > >
>> > > >
>> > > > Jing Ge  wrote on Friday, Oct 27, 2023 at 00:16:
>> > > >
>> > > >> The Apache Flink community is very happy to announce the release 
>> > > >> of
>> > > > Apache
>> > > >> Flink 1.18.0, which is the first release for the Apache Flink 1.18
>> > > > series.
>> > > >>
>> > > >> Apache Flink® is an open-source unified stream and batch data
>> > >  processing
>> > > >> framework for distributed, high-performing, always-available, and
>> > > > accurate
>> > > >> data applications.
>> > > >>
>> > > >> The release is available for download at:
>> > > >> https://flink.apache.org/downloads.html
>> > > >>
>> > > >> Please check out the release blog post for an overview of the
>> > > > improvements
>> > > >> for this release:
>> > > >>
>> > > >>
>> > > >
>> > > 
>> > > >>>
>> > > >>
>> > > https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
>> > > >>
>> > > >> The full release notes are available in Jira:
>> > > >>
>> > > >>
>> > > >
>> > > 
>> > > >>>
>> > > >>
>> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
>> > > >>
>> > > >> We would like to thank all contributors of the Apache Flink
>> > > >> community
>> > >  who
>> > > >> made this release possible!
>> > > >>
>> > > >> Best regards,
>> > > >> Konstantin, Qingsheng, Sergey, and Jing
>> > > >>
>> > > >
>> > > 
>> > > >>>
>> > > >>
>> > >
>> > >
>>
>>
>>
>> --
>>
>> Best,
>> Benchao Li


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Qingsheng Ren
Congratulations and big THANK YOU to everyone helping with this release!

Best,
Qingsheng

On Fri, Oct 27, 2023 at 10:18 AM Benchao Li  wrote:

> Great work, thanks everyone involved!
>
> Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
> >
> > Thanks for the great work!
> >
> > Best,
> > Rui
> >
> > On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:
> >
> > > Finally! Thanks to all!
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > 2023年10月27日 03:58,Alexander Fedulov 
> 写道:
> > > >
> > > > Great work, thanks everyone!
> > > >
> > > > Best,
> > > > Alexander
> > > >
> > > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser <
> martijnvis...@apache.org>
> > > > wrote:
> > > >
> > > >> Thank you all who have contributed!
> > > >>
> > > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> > > >>
> > > >>> Thanks for the great work! Congratulations
> > > >>>
> > > >>>
> > > >>> Best,
> > > >>> Feng Jin
> > > >>>
> > > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu 
> wrote:
> > > >>>
> > >  Congratulations, Well done!
> > > 
> > >  Best,
> > >  Leonard
> > > 
> > >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee <
> lincoln.8...@gmail.com>
> > >  wrote:
> > > 
> > > > Thanks for the great work! Congrats all!
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Jing Ge  于2023年10月27日周五 00:16写道:
> > > >
> > > >> The Apache Flink community is very happy to announce the
> release of
> > > > Apache
> > > >> Flink 1.18.0, which is the first release for the Apache Flink
> 1.18
> > > > series.
> > > >>
> > > >> Apache Flink® is an open-source unified stream and batch data
> > >  processing
> > > >> framework for distributed, high-performing, always-available,
> and
> > > > accurate
> > > >> data applications.
> > > >>
> > > >> The release is available for download at:
> > > >> https://flink.apache.org/downloads.html
> > > >>
> > > >> Please check out the release blog post for an overview of the
> > > > improvements
> > > >> for this release:
> > > >>
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > >>
> > > >> The full release notes are available in Jira:
> > > >>
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > >>
> > > >> We would like to thank all contributors of the Apache Flink
> > > >> community
> > >  who
> > > >> made this release possible!
> > > >>
> > > >> Best regards,
> > > >> Konstantin, Qingsheng, Sergey, and Jing
> > > >>
> > > >
> > > 
> > > >>>
> > > >>
> > >
> > >
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Benchao Li
Great work, thanks everyone involved!

Rui Fan <1996fan...@gmail.com> 于2023年10月27日周五 10:16写道:
>
> Thanks for the great work!
>
> Best,
> Rui
>
> On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:
>
> > Finally! Thanks to all!
> >
> > Best,
> > Paul Lam
> >
> > > 2023年10月27日 03:58,Alexander Fedulov  写道:
> > >
> > > Great work, thanks everyone!
> > >
> > > Best,
> > > Alexander
> > >
> > > On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> > > wrote:
> > >
> > >> Thank you all who have contributed!
> > >>
> > >> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> > >>
> > >>> Thanks for the great work! Congratulations
> > >>>
> > >>>
> > >>> Best,
> > >>> Feng Jin
> > >>>
> > >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> > >>>
> >  Congratulations, Well done!
> > 
> >  Best,
> >  Leonard
> > 
> >  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> >  wrote:
> > 
> > > Thanks for the great work! Congrats all!
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jing Ge  于2023年10月27日周五 00:16写道:
> > >
> > >> The Apache Flink community is very happy to announce the release of
> > > Apache
> > >> Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > > series.
> > >>
> > >> Apache Flink® is an open-source unified stream and batch data
> >  processing
> > >> framework for distributed, high-performing, always-available, and
> > > accurate
> > >> data applications.
> > >>
> > >> The release is available for download at:
> > >> https://flink.apache.org/downloads.html
> > >>
> > >> Please check out the release blog post for an overview of the
> > > improvements
> > >> for this release:
> > >>
> > >>
> > >
> > 
> > >>>
> > >>
> > https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > >>
> > >> The full release notes are available in Jira:
> > >>
> > >>
> > >
> > 
> > >>>
> > >>
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > >>
> > >> We would like to thank all contributors of the Apache Flink
> > >> community
> >  who
> > >> made this release possible!
> > >>
> > >> Best regards,
> > >> Konstantin, Qingsheng, Sergey, and Jing
> > >>
> > >
> > 
> > >>>
> > >>
> >
> >



-- 

Best,
Benchao Li


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Rui Fan
Thanks for the great work!

Best,
Rui

On Fri, Oct 27, 2023 at 10:03 AM Paul Lam  wrote:

> Finally! Thanks to all!
>
> Best,
> Paul Lam
>
> > 2023年10月27日 03:58,Alexander Fedulov  写道:
> >
> > Great work, thanks everyone!
> >
> > Best,
> > Alexander
> >
> > On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> > wrote:
> >
> >> Thank you all who have contributed!
> >>
> >> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> >>
> >>> Thanks for the great work! Congratulations
> >>>
> >>>
> >>> Best,
> >>> Feng Jin
> >>>
> >>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> >>>
>  Congratulations, Well done!
> 
>  Best,
>  Leonard
> 
>  On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
>  wrote:
> 
> > Thanks for the great work! Congrats all!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jing Ge  于2023年10月27日周五 00:16写道:
> >
> >> The Apache Flink community is very happy to announce the release of
> > Apache
> >> Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > series.
> >>
> >> Apache Flink® is an open-source unified stream and batch data
>  processing
> >> framework for distributed, high-performing, always-available, and
> > accurate
> >> data applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the
> > improvements
> >> for this release:
> >>
> >>
> >
> 
> >>>
> >>
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> >>
> >> The full release notes are available in Jira:
> >>
> >>
> >
> 
> >>>
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> >>
> >> We would like to thank all contributors of the Apache Flink
> >> community
>  who
> >> made this release possible!
> >>
> >> Best regards,
> >> Konstantin, Qingsheng, Sergey, and Jing
> >>
> >
> 
> >>>
> >>
>
>


Re: Invalid Null Check in DefaultFileFilter

2023-10-26 Thread Chirag Dewan via user
Yeah, agree, not a problem in general. But it just seems odd. As far as my
understanding goes, returning true when a file name can be null will blow up a
lot more in the reader.
I just want to understand whether this is an erroneous condition or an actual
use case. For instance, is it possible to get a null file name for some
subdirectories, so that returning true is important for the FileSource to keep
monitoring inside those subdirectories?
On Friday, 27 October, 2023 at 12:58:44 am IST, Alexander Fedulov wrote:

Is there an actual issue behind this question?
In general: this is a form of defensive programming for a public interface and
the decision here is to be more lenient when facing potentially erroneous user
input rather than blow up the whole application with a NullPointerException.

Best,
Alexander Fedulov
On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user  
wrote:

Hi,
I was looking at this check in DefaultFileFilter:
public boolean test(Path path) {
    final String fileName = path.getName();
    if (fileName == null || fileName.length() == 0) {
        return true;
    }

Was wondering how a file name can be null?
And if null, shouldn't this be return false?
I created a JIRA for this: [FLINK-33367] Invalid Check in DefaultFileFilter - ASF JIRA

Any input is appreciated.
Thanks



  

Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Paul Lam
Finally! Thanks to all!

Best,
Paul Lam

> 2023年10月27日 03:58,Alexander Fedulov  写道:
> 
> Great work, thanks everyone!
> 
> Best,
> Alexander
> 
> On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> wrote:
> 
>> Thank you all who have contributed!
>> 
>> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
>> 
>>> Thanks for the great work! Congratulations
>>> 
>>> 
>>> Best,
>>> Feng Jin
>>> 
>>> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
>>> 
 Congratulations, Well done!
 
 Best,
 Leonard
 
 On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
 wrote:
 
> Thanks for the great work! Congrats all!
> 
> Best,
> Lincoln Lee
> 
> 
> Jing Ge  于2023年10月27日周五 00:16写道:
> 
>> The Apache Flink community is very happy to announce the release of
> Apache
>> Flink 1.18.0, which is the first release for the Apache Flink 1.18
> series.
>> 
>> Apache Flink® is an open-source unified stream and batch data
 processing
>> framework for distributed, high-performing, always-available, and
> accurate
>> data applications.
>> 
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>> 
>> Please check out the release blog post for an overview of the
> improvements
>> for this release:
>> 
>> 
> 
 
>>> 
>> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
>> 
>> The full release notes are available in Jira:
>> 
>> 
> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
>> 
>> We would like to thank all contributors of the Apache Flink
>> community
 who
>> made this release possible!
>> 
>> Best regards,
>> Konstantin, Qingsheng, Sergey, and Jing
>> 
> 
 
>>> 
>> 



Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread liu ron
Great work, thanks everyone!

Best,
Ron

Alexander Fedulov  于2023年10月27日周五 04:00写道:

> Great work, thanks everyone!
>
> Best,
> Alexander
>
> On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
> wrote:
>
> > Thank you all who have contributed!
> >
> > Op do 26 okt 2023 om 18:41 schreef Feng Jin 
> >
> > > Thanks for the great work! Congratulations
> > >
> > >
> > > Best,
> > > Feng Jin
> > >
> > > On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> > >
> > > > Congratulations, Well done!
> > > >
> > > > Best,
> > > > Leonard
> > > >
> > > > On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee  >
> > > > wrote:
> > > >
> > > > > Thanks for the great work! Congrats all!
> > > > >
> > > > > Best,
> > > > > Lincoln Lee
> > > > >
> > > > >
> > > > > Jing Ge  于2023年10月27日周五 00:16写道:
> > > > >
> > > > > > The Apache Flink community is very happy to announce the release
> of
> > > > > Apache
> > > > > > Flink 1.18.0, which is the first release for the Apache Flink
> 1.18
> > > > > series.
> > > > > >
> > > > > > Apache Flink® is an open-source unified stream and batch data
> > > > processing
> > > > > > framework for distributed, high-performing, always-available, and
> > > > > accurate
> > > > > > data applications.
> > > > > >
> > > > > > The release is available for download at:
> > > > > > https://flink.apache.org/downloads.html
> > > > > >
> > > > > > Please check out the release blog post for an overview of the
> > > > > improvements
> > > > > > for this release:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > > > >
> > > > > > The full release notes are available in Jira:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > > > >
> > > > > > We would like to thank all contributors of the Apache Flink
> > community
> > > > who
> > > > > > made this release possible!
> > > > > >
> > > > > > Best regards,
> > > > > > Konstantin, Qingsheng, Sergey, and Jing
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread Chirag Dewan via user
Hi Arjun,
Flink's FileSource doesn't move or delete the files as of now. It will keep the
files as-is and remember the name of each file it has read in checkpointed state
to ensure it doesn't read the same file twice.
Flink's Source API works in a way that a single Enumerator operates on the
JobManager. The enumerator is responsible for listing the files and splitting
them into smaller units. These units can be a complete file (in case of
row formats) or splits within a file (for bulk formats). The reading is done by
SplitReaders in the TaskManagers. This way it ensures that only the reading is
done concurrently and the source is able to track file completions.
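
For illustration, a minimal sketch of such a source (the directory path is a
placeholder and TextLineInputFormat is just one possible row format):

// Sketch: continuously monitor a directory and read files line by line.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(30_000); // per-file progress is tracked in checkpointed state

FileSource<String> source =
    FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("/data/in"))
        .monitorContinuously(Duration.ofSeconds(30)) // keep watching for new files
        .build();

DataStream<String> lines =
    env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");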
You can read more about Flink Sources and the FileSystem connector in the Flink documentation.



On Thursday, 26 October, 2023 at 06:53:23 pm IST, arjun s wrote:

Hello team,
I'm currently in the process of configuring a Flink job. This job entails 
reading files from a specified directory and then transmitting the data to a 
Kafka sink. I've already successfully designed a Flink job that reads the file 
contents in a streaming manner and effectively sends them to Kafka. However, my 
specific requirement is a bit more intricate. I need the job to not only read 
these files and push the data to Kafka but also relocate the processed file to 
a different directory once all of its contents have been processed. Following 
this, the job should seamlessly transition to processing the next file in the 
source directory. Additionally, I have some concerns regarding how the job will 
behave if it encounters a restart. Could you please advise if this is 
achievable, and if so, provide guidance or code to implement it?

I'm also quite interested in how the job will handle situations where the 
source has a parallelism greater than 2 or 3, and how it can accurately monitor 
the completion of reading all contents in each file.

Thanks and Regards,
Arjun   

Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Alexander Fedulov
Great work, thanks everyone!

Best,
Alexander

On Thu, 26 Oct 2023 at 21:15, Martijn Visser 
wrote:

> Thank you all who have contributed!
>
> Op do 26 okt 2023 om 18:41 schreef Feng Jin 
>
> > Thanks for the great work! Congratulations
> >
> >
> > Best,
> > Feng Jin
> >
> > On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
> >
> > > Congratulations, Well done!
> > >
> > > Best,
> > > Leonard
> > >
> > > On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> > > wrote:
> > >
> > > > Thanks for the great work! Congrats all!
> > > >
> > > > Best,
> > > > Lincoln Lee
> > > >
> > > >
> > > > Jing Ge  于2023年10月27日周五 00:16写道:
> > > >
> > > > > The Apache Flink community is very happy to announce the release of
> > > > Apache
> > > > > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > > > series.
> > > > >
> > > > > Apache Flink® is an open-source unified stream and batch data
> > > processing
> > > > > framework for distributed, high-performing, always-available, and
> > > > accurate
> > > > > data applications.
> > > > >
> > > > > The release is available for download at:
> > > > > https://flink.apache.org/downloads.html
> > > > >
> > > > > Please check out the release blog post for an overview of the
> > > > improvements
> > > > > for this release:
> > > > >
> > > > >
> > > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > > >
> > > > > The full release notes are available in Jira:
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > > >
> > > > > We would like to thank all contributors of the Apache Flink
> community
> > > who
> > > > > made this release possible!
> > > > >
> > > > > Best regards,
> > > > > Konstantin, Qingsheng, Sergey, and Jing
> > > > >
> > > >
> > >
> >
>


Re: Offset lost with AT_LEAST_ONCE kafka delivery guarantees

2023-10-26 Thread Alexander Fedulov
* to clarify: by different output I mean that for the same input message
the output message could be slightly smaller due to the abovementioned
factors and fall into the allowed size range without causing any failures

On Thu, 26 Oct 2023 at 21:52, Alexander Fedulov 
wrote:

> Your expectations are correct. In case of AT_LEAST_ONCE  Flink will wait
> for all outstanding records in the Kafka buffers to be acknowledged before
> marking the checkpoint successful (=also recording the offsets of the
> sources). That said, there might be other factors involved that could lead
> to a different output even when reading the same data from the sources -
> such as using processing time (instead of event time) or doing some
> sort of lookup calls to external systems. If you absolutely cannot think of
> a scenario where this could be the case for your application, please try to
> reproduce the error reliably - this is something that needs to be
> further looked into.
>
> Best,
> Alexander Fedulov
>
> On Mon, 23 Oct 2023 at 19:11, Gabriele Modena 
> wrote:
>
>> Hey folks,
>>
>> We currently run (py) flink 1.17 on k8s (managed by flink k8s
>> operator), with HA and checkpointing (fixed retries policy). We
>> produce into Kafka with AT_LEAST_ONCE delivery guarantee.
>>
>> Our application failed when trying to produce a message larger than
>> Kafka's message.max.bytes limit. This offset was never
>> going to be committed, so Flink HA was not able to recover the
>> application.
>>
>> Upon a manual restart, it looks like the offending offset has been
>> lost: it was not picked after rewinding to the checkpointed offset,
>> and it was not committed to Kafka. I would have expected this offset
>> to not have made it past the KafkaProducer commit checkpoint barrier,
>> and that the app would fail again on it.
>>
>> I understand that there are failure scenarios that could end in data
>> loss when Kafka delivery guarantee is set to EXACTLY_ONCE and kafka
>> expires an uncommitted transaction.
>>
>> However, it's unclear to me if other corner cases would apply to
>> AT_LEAST_ONCE guarantees. Upon broker failure and app restarts, I
>> would expect duplicate messages but no data loss. What I can see as a
>> problem is that this commit was never going to happen.
>>
>> Is this expected behaviour? Am I missing something here?
>>
>> Cheers,
>> --
>> Gabriele Modena (he / him)
>> Staff Software Engineer
>> Wikimedia Foundation
>>
>


Re: Offset lost with AT_LEAST_ONCE kafka delivery guarantees

2023-10-26 Thread Alexander Fedulov
Your expectations are correct. In case of AT_LEAST_ONCE  Flink will wait
for all outstanding records in the Kafka buffers to be acknowledged before
marking the checkpoint successful (=also recording the offsets of the
sources). That said, there might be other factors involved that could lead
to a different output even when reading the same data from the sources -
such as using processing time (instead of event time) or doing some
sort of lookup calls to external systems. If you absolutely cannot think of
a scenario where this could be the case for your application, please try to
reproduce the error reliably - this is something that needs to be
further looked into.
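
For reference, a minimal sketch of a sink configured with this guarantee (the
broker address, topic, and the 'stream' variable are placeholder values; with
checkpointing enabled, the sink flushes its buffers and waits for broker acks
on every checkpoint):

// Sketch: Kafka sink with AT_LEAST_ONCE semantics.
KafkaSink<String> sink =
    KafkaSink.<String>builder()
        .setBootstrapServers("broker:9092")
        .setRecordSerializer(
            KafkaRecordSerializationSchema.builder()
                .setTopic("out-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        // checkpoints only complete after outstanding records are acknowledged
        .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
        .build();

stream.sinkTo(sink); // 'stream' is any DataStream<String> in your job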

Best,
Alexander Fedulov

On Mon, 23 Oct 2023 at 19:11, Gabriele Modena  wrote:

> Hey folks,
>
> We currently run (py) flink 1.17 on k8s (managed by flink k8s
> operator), with HA and checkpointing (fixed retries policy). We
> produce into Kafka with AT_LEAST_ONCE delivery guarantee.
>
> Our application failed when trying to produce a message larger than
> Kafka's message.max.bytes limit. This offset was never
> going to be committed, so Flink HA was not able to recover the
> application.
>
> Upon a manual restart, it looks like the offending offset has been
> lost: it was not picked after rewinding to the checkpointed offset,
> and it was not committed to Kafka. I would have expected this offset
> to not have made it past the KafkaProducer commit checkpoint barrier,
> and that the app would fail again on it.
>
> I understand that there are failure scenarios that could end in data
> loss when Kafka delivery guarantee is set to EXACTLY_ONCE and kafka
> expires an uncommitted transaction.
>
> However, it's unclear to me if other corner cases would apply to
> AT_LEAST_ONCE guarantees. Upon broker failure and app restarts, I
> would expect duplicate messages but no data loss. What I can see as a
> problem is that this commit was never going to happen.
>
> Is this expected behaviour? Am I missing something here?
>
> Cheers,
> --
> Gabriele Modena (he / him)
> Staff Software Engineer
> Wikimedia Foundation
>


Re: Invalid Null Check in DefaultFileFilter

2023-10-26 Thread Alexander Fedulov
Is there an actual issue behind this question?

In general: this is a form of defensive programming for a public interface
and the decision here is to be more lenient when facing potentially
erroneous user input rather than blow up the whole application with a
NullPointerException.
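
If stricter behavior is desired, a custom predicate can be plugged in instead
of the DefaultFileFilter. A rough sketch (rejecting null/empty names, and
mirroring the default filter's convention of skipping hidden files, is my
assumption here, not an established recommendation):

// Sketch: a stricter filter that rejects null/empty file names.
Predicate<Path> strictFilter =
    path -> {
        final String fileName = path.getName();
        return fileName != null
            && !fileName.isEmpty()
            && !fileName.startsWith(".")
            && !fileName.startsWith("_");
    };

FileSource.FileSourceBuilder<String> builder =
    FileSource.forRecordStreamFormat(new TextLineInputFormat(), new Path("/data/in"))
        .setFileEnumerator(() -> new NonSplittingRecursiveEnumerator(strictFilter));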

Best,
Alexander Fedulov

On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user 
wrote:

> Hi,
>
> I was looking at this check in DefaultFileFilter:
>
> public boolean test(Path path) {
> final String fileName = path.getName();
> if (fileName == null || fileName.length() == 0) {
> return true;
> }
>
> Was wondering how can a file name be null?
>
> And if null, shouldnt this be *return false*?
>
> I created a JIRA for this - [FLINK-33367] Invalid Check in
> DefaultFileFilter - ASF JIRA
> 
>
> [FLINK-33367] Invalid Check in DefaultFileFilter - ASF JIRA
>
> 
> Any input is appreciated.
>
> Thanks
>
>
>
>


Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread Alexander Fedulov
Flink's FileSource will enumerate the files and keep track of the progress
in parallel for the individual files. Depending on the format you use, the
progress is tracked at different levels of granularity (TextLine being
the simplest one, tracking progress based on the number of lines
processed in each file). In case of failures, the source will pick up where
it left off. File removal is trickier: the easiest way to achieve that
would be to have tombstones at the end of the files and process them in user
code.
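
A rough sketch of the tombstone idea (the "__EOF__" marker, the "done_" prefix
and the (filePath, line) record shape are illustrative assumptions, not an
established API; it also assumes all lines of one file pass through the same
subtask, which holds for non-splitting enumerators with operator chaining):

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

// Sketch: forward lines and relocate a file once its sentinel last line is seen.
public class TombstoneFilter extends RichFlatMapFunction<Tuple2<String, String>, String> {

    private static final String TOMBSTONE = "__EOF__";

    @Override
    public void flatMap(Tuple2<String, String> record, Collector<String> out) throws Exception {
        if (TOMBSTONE.equals(record.f1)) {
            // last line of the file reached: move it out of the watched directory
            java.nio.file.Path src = Paths.get(record.f0);
            Files.move(src, src.resolveSibling("done_" + src.getFileName()));
        } else {
            out.collect(record.f1); // regular line, forward downstream
        }
    }
}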

Best,
Alexander Fedulov

On Thu, 26 Oct 2023 at 18:17, arjun s  wrote:

> Hello team,
> I'm currently in the process of configuring a Flink job. This job entails
> reading files from a specified directory and then transmitting the data to
> a Kafka sink. I've already successfully designed a Flink job that reads the
> file contents in a streaming manner and effectively sends them to Kafka.
> However, my specific requirement is a bit more intricate. I need the job to
> not only read these files and push the data to Kafka but also relocate the
> processed file to a different directory once all of its contents have been
> processed. Following this, the job should seamlessly transition to
> processing the next file in the source directory. Additionally, I have some
> concerns regarding how the job will behave if it encounters a restart.
> Could you please advise if this is achievable, and if so, provide guidance
> or code to implement it?
>
> I'm also quite interested in how the job will handle situations where the
> source has a parallelism greater than 2 or 3, and how it can accurately
> monitor the completion of reading all contents in each file.
>
> Thanks and Regards,
> Arjun
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Martijn Visser
Thank you all who have contributed!

Op do 26 okt 2023 om 18:41 schreef Feng Jin 

> Thanks for the great work! Congratulations
>
>
> Best,
> Feng Jin
>
> On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:
>
> > Congratulations, Well done!
> >
> > Best,
> > Leonard
> >
> > On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> > wrote:
> >
> > > Thanks for the great work! Congrats all!
> > >
> > > Best,
> > > Lincoln Lee
> > >
> > >
> > > Jing Ge  于2023年10月27日周五 00:16写道:
> > >
> > > > The Apache Flink community is very happy to announce the release of
> > > Apache
> > > > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > > series.
> > > >
> > > > Apache Flink® is an open-source unified stream and batch data
> > processing
> > > > framework for distributed, high-performing, always-available, and
> > > accurate
> > > > data applications.
> > > >
> > > > The release is available for download at:
> > > > https://flink.apache.org/downloads.html
> > > >
> > > > Please check out the release blog post for an overview of the
> > > improvements
> > > > for this release:
> > > >
> > > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > > >
> > > > The full release notes are available in Jira:
> > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > > >
> > > > We would like to thank all contributors of the Apache Flink community
> > who
> > > > made this release possible!
> > > >
> > > > Best regards,
> > > > Konstantin, Qingsheng, Sergey, and Jing
> > > >
> > >
> >
>


Re: CSV Decoder with AVRO schema generated Object

2023-10-26 Thread Alexander Fedulov
If you use Avro schema you should also store the data in Avro format.
Everything else is going to be a hack.
If you really want to proceed with the hack, you'll either need to use
aliases in your Avro reader schema or change the headers of the CSV file to
comply with the field names in avro.
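
For example, the reader schema could declare the Java-friendly field name and
keep the CSV header as an alias. This is only a sketch; whether the CSV decoder
honors Avro aliases would still need to be verified:

{
  "name": "empId",
  "type": ["null", "long"],
  "aliases": ["emp_id"]
}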

Best,
Alexander

On Thu, 26 Oct 2023 at 17:56, Kirti Dhar Upadhyay K <
kirti.k.dhar.upadh...@ericsson.com> wrote:

> Hi Alexander,
>
>
>
> Thanks for reply.
>
> Actually I have a system where data travels in form of user defined, AVRO
> schema generated objects.
>
>
>
> *Sample code:*
>
>
>
> static void readCsvWithCustomSchemaDecoder(StreamExecutionEnvironment env,
>         Path dataDirectory) throws Exception {
>
>     // EmployeeTest is the AVRO-generated Java object with fields emp_id and Name
>     Class<EmployeeTest> recordClazz = EmployeeTest.class;
>     CsvSchema.Builder builder = CsvSchema.builder().setUseHeader(true)
>             .setReorderColumns(true).setColumnSeparator(',')
>             .setEscapeChar('"').setLineSeparator(System.lineSeparator())
>             .setQuoteChar('"').setArrayElementSeparator(";")
>             .setNullValue("");
>
>     CsvReaderFormat<EmployeeTest> csvFormat = CsvReaderFormat.forSchema(
>             builder.build(), TypeInformation.of(recordClazz));
>
>     FileSource.FileSourceBuilder<EmployeeTest> fileSourceBuilder = FileSource
>             .forRecordStreamFormat(csvFormat, dataDirectory)
>             .monitorContinuously(Duration.ofSeconds(30));
>     fileSourceBuilder.setFileEnumerator((FileEnumerator.Provider) () -> new
>             NonSplittingRecursiveEnumerator(new DefaultFileFilter()));
>     FileSource<EmployeeTest> source = fileSourceBuilder.build();
>
>     final DataStreamSource<EmployeeTest> file = env.fromSource(source,
>             WatermarkStrategy.<EmployeeTest>forMonotonousTimestamps()
>                     .withTimestampAssigner(new WatermarkAssigner((Object input)
>                             -> System.currentTimeMillis())), "FileSource");
>     file.print();
> }
>
>
>
>
>
> Regards,
>
> Kirti Dhar
>
>
>
> From: Alexander Fedulov
> Sent: 26 October 2023 20:59
> To: Kirti Dhar Upadhyay K
> Cc: user@flink.apache.org
> Subject: Re: CSV Decoder with AVRO schema generated Object
>
>
>
> Hi Kirti,
>
>
>
> What do you mean exactly by "Flink CSV Decoder"? Please provide a snippet
> of the code that you are trying to execute.
>
>
> To be honest, combining CSV with AVRO-generated classes sounds rather
> strange and you might want to reconsider your approach.
> As for a quick fix, using aliases in your reader schema might help [1]
>
> [1] https://avro.apache.org/docs/1.8.1/spec.html#Aliases
>
>
> Best,
>
> Alexander Fedulov
>
>
>
> On Thu, 26 Oct 2023 at 16:24, Kirti Dhar Upadhyay K via user <
> user@flink.apache.org> wrote:
>
> Hi Team,
>
>
>
> I am using Flink CSV Decoder with AVSC generated java Object and facing
> issue if the field name contains underscore(_) or fieldname starts with
> Capital case.
>
> *Sample Schema:*
>
> {
>   "namespace": "avro.employee",
>   "type": "record",
>   "name": "EmployeeTest",
>   "fields": [
> {
>   "name": "emp_id",
>   "type": ["null","long"]
> },
> {
>   "name": "Name",
>   "type": ["null","string"]
> }
> ]
> }
>
>
>
> Generated Java Object getters/setters:
>
>
>
> public void setEmpId(java.lang.Long value) {
>   this.emp_id = value;
> }
>
>
>
> .
>
> .
>
>
>
> public java.lang.CharSequence getName() {
>   return Name;
> }
>
>
>
> *Input record:*
>
> emp_id,Name
>
> 1,peter
>
>
>
> *Exception:*
>
> Caused by:
> org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
> *Unrecognized field "emp_id" (class avro.person.EmployeeTest), not marked
> as ignorable (2 known properties: "empId", "name"])*
>
>
>
> I have also found an old JIRA regarding this:
> https://issues.apache.org/jira/browse/FLINK-2874
>
>
>
> Any help would be appreciated!
>
>
>
> Regards,
>
> Kirti Dhar
>
>
>
>
>
>
>
>
>
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Feng Jin
Thanks for the great work! Congratulations


Best,
Feng Jin

On Fri, Oct 27, 2023 at 12:36 AM Leonard Xu  wrote:

> Congratulations, Well done!
>
> Best,
> Leonard
>
> On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee 
> wrote:
>
> > Thanks for the great work! Congrats all!
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Jing Ge  于2023年10月27日周五 00:16写道:
> >
> > > The Apache Flink community is very happy to announce the release of
> > Apache
> > > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> > series.
> > >
> > > Apache Flink® is an open-source unified stream and batch data
> processing
> > > framework for distributed, high-performing, always-available, and
> > accurate
> > > data applications.
> > >
> > > The release is available for download at:
> > > https://flink.apache.org/downloads.html
> > >
> > > Please check out the release blog post for an overview of the
> > improvements
> > > for this release:
> > >
> > >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> > >
> > > The full release notes are available in Jira:
> > >
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> > >
> > > We would like to thank all contributors of the Apache Flink community
> who
> > > made this release possible!
> > >
> > > Best regards,
> > > Konstantin, Qingsheng, Sergey, and Jing
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Leonard Xu
Congratulations, Well done!

Best,
Leonard

On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee  wrote:

> Thanks for the great work! Congrats all!
>
> Best,
> Lincoln Lee
>
>
> Jing Ge  于2023年10月27日周五 00:16写道:
>
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> series.
> >
> > Apache Flink® is an open-source unified stream and batch data processing
> > framework for distributed, high-performing, always-available, and
> accurate
> > data applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> improvements
> > for this release:
> >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> >
> > The full release notes are available in Jira:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> >
> > We would like to thank all contributors of the Apache Flink community who
> > made this release possible!
> >
> > Best regards,
> > Konstantin, Qingsheng, Sergey, and Jing
> >
>


Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 Thread Lincoln Lee
Thanks for the great work! Congrats all!

Best,
Lincoln Lee


Jing Ge  于2023年10月27日周五 00:16写道:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.18.0, which is the first release for the Apache Flink 1.18 series.
>
> Apache Flink® is an open-source unified stream and batch data processing
> framework for distributed, high-performing, always-available, and accurate
> data applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this release:
>
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Best regards,
> Konstantin, Qingsheng, Sergey, and Jing
>


RE: CSV Decoder with AVRO schema generated Object

2023-10-26 Thread Kirti Dhar Upadhyay K via user
Hi Alexander,

Thanks for the reply.
Actually, I have a system where data travels in the form of user-defined,
AVRO-schema-generated objects.

Sample code:

static void readCsvWithCustomSchemaDecoder(StreamExecutionEnvironment env,
        Path dataDirectory) throws Exception {

    // EmployeeTest is the AVRO-generated Java object with fields emp_id and Name
    Class<EmployeeTest> recordClazz = EmployeeTest.class;
    CsvSchema.Builder builder = CsvSchema.builder().setUseHeader(true)
            .setReorderColumns(true).setColumnSeparator(',')
            .setEscapeChar('"').setLineSeparator(System.lineSeparator())
            .setQuoteChar('"').setArrayElementSeparator(";")
            .setNullValue("");

    // note: the configured builder must actually be used here; passing a fresh
    // CsvSchema.builder().build() would silently discard the settings above
    CsvReaderFormat<EmployeeTest> csvFormat = CsvReaderFormat.forSchema(
            builder.build(), TypeInformation.of(recordClazz));

    FileSource.FileSourceBuilder<EmployeeTest> fileSourceBuilder = FileSource
            .forRecordStreamFormat(csvFormat, dataDirectory)
            .monitorContinuously(Duration.ofSeconds(30));
    fileSourceBuilder.setFileEnumerator((FileEnumerator.Provider) () -> new
            NonSplittingRecursiveEnumerator(new DefaultFileFilter()));
    FileSource<EmployeeTest> source = fileSourceBuilder.build();

    // WatermarkAssigner is a user-defined timestamp assigner
    final DataStreamSource<EmployeeTest> file = env.fromSource(source,
            WatermarkStrategy.<EmployeeTest>forMonotonousTimestamps()
                    .withTimestampAssigner(new WatermarkAssigner((Object input)
                            -> System.currentTimeMillis())), "FileSource");
    file.print();
}


Regards,
Kirti Dhar

From: Alexander Fedulov 
Sent: 26 October 2023 20:59
To: Kirti Dhar Upadhyay K 
Cc: user@flink.apache.org
Subject: Re: CSV Decoder with AVRO schema generated Object

Hi Kirti,

What do you mean exactly by "Flink CSV Decoder"? Please provide a snippet of 
the code that you are trying to execute.

To be honest, combining CSV with AVRO-generated classes sounds rather strange 
and you might want to reconsider your approach.
As for a quick fix, using aliases in your reader schema might help [1]

[1] https://avro.apache.org/docs/1.8.1/spec.html#Aliases

Best,
Alexander Fedulov

On Thu, 26 Oct 2023 at 16:24, Kirti Dhar Upadhyay K via user
<user@flink.apache.org> wrote:
Hi Team,

I am using Flink CSV Decoder with AVSC generated java Object and facing issue 
if the field name contains underscore(_) or fieldname starts with Capital case.
Sample Schema:

{
  "namespace": "avro.employee",
  "type": "record",
  "name": "EmployeeTest",
  "fields": [
{
  "name": "emp_id",
  "type": ["null","long"]
},
{
  "name": "Name",
  "type": ["null","string"]
}
]
}

Generated Java Object getters/setters:



public void setEmpId(java.lang.Long value) {
  this.emp_id = value;
}



.

.



public java.lang.CharSequence getName() {
  return Name;
}


Input record:
emp_id,Name
1,peter

Exception:
Caused by: 
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
 Unrecognized field "emp_id" (class avro.person.EmployeeTest), not marked as 
ignorable (2 known properties: "empId", "name"])

I have also found an old JIRA regarding this: 
https://issues.apache.org/jira/browse/FLINK-2874

Any help would be appreciated!

Regards,
Kirti Dhar






Re: CSV Decoder with AVRO schema generated Object

2023-10-26 Thread Alexander Fedulov
Hi Kirti,

What do you mean exactly by "Flink CSV Decoder"? Please provide a snippet
of the code that you are trying to execute.

To be honest, combining CSV with AVRO-generated classes sounds rather
strange and you might want to reconsider your approach.
As for a quick fix, using aliases in your reader schema might help [1]

[1] https://avro.apache.org/docs/1.8.1/spec.html#Aliases

Best,
Alexander Fedulov

On Thu, 26 Oct 2023 at 16:24, Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi Team,
>
>
>
> I am using Flink CSV Decoder with AVSC generated java Object and facing
> issue if the field name contains underscore(_) or fieldname starts with
> Capital case.
>
> *Sample Schema:*
>
> {
>   "namespace": "avro.employee",
>   "type": "record",
>   "name": "EmployeeTest",
>   "fields": [
> {
>   "name": "emp_id",
>   "type": ["null","long"]
> },
> {
>   "name": "Name",
>   "type": ["null","string"]
> }
> ]
> }
>
>
>
> Generated Java Object getters/setters:
>
>
>
> public void setEmpId(java.lang.Long value) {
>   this.emp_id = value;
> }
>
>
>
> .
>
> .
>
>
>
> public java.lang.CharSequence getName() {
>   return Name;
> }
>
>
>
> *Input record:*
>
> emp_id,Name
>
> 1,peter
>
>
>
> *Exception:*
>
> Caused by:
> org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException:
> *Unrecognized field "emp_id" (class avro.person.EmployeeTest), not marked
> as ignorable (2 known properties: "empId", "name"])*
>
>
>
> I have also found an old JIRA regarding this:
> https://issues.apache.org/jira/browse/FLINK-2874
>
>
>
> Any help would be appreciated!
>
>
>
> Regards,
>
> Kirti Dhar
>
>
>
>
>
>
>
>
>


Re: Flink 1.17.1 YARN token expiration issue

2023-10-26 Thread Paul Lam
Hello,

Has this issue been resolved? I am running into the same problem and have not
located the root cause yet.

Best,
Paul Lam

> On 20 Jul 2023, at 12:04, 王刚 wrote:
> 
> Exception stack trace:
> ```
> 
> 2023-07-20 11:43:01,627 ERROR 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner  [] - Terminating 
> TaskManagerRunner with exit code 1.
> org.apache.flink.util.FlinkException: Failed to start the TaskManagerRunner.
>at 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManager(TaskManagerRunner.java:488)
>  ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
>at 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.lambda$runTaskManagerProcessSecurely$5(TaskManagerRunner.java:530)
>  ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
>at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_92]
>at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_92]
>at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  ~[flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
>at 
> org.apache.flink.runtime.taskexecutor.TaskManagerRunner.runTaskManagerProcessSecurely(TaskManagerRunner.java:530)
>  [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
>at 
> org.apache.flink.yarn.YarnTaskExecutorRunner.runTaskManagerSecurely(YarnTaskExecutorRunner.java:94)
>  [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
>at 
> org.apache.flink.yarn.YarnTaskExecutorRunner.main(YarnTaskExecutorRunner.java:68)
>  [flink-dist-1.17-SNAPSHOT.jar:1.17-SNAPSHOT]
> Caused by: org.apache.hadoop.ipc.RemoteException: token (token for flink: 
> HDFS_DELEGATION_TOKEN 
> owner=flink/lf-client-flink-28-243-196.hadoop.local@HADOOP.LOCAL, renewer=, 
> realUser=, issueDate=1689734389821, maxDate=1690339189821, 
> sequenceNumber=266208479, masterKeyId=1131) can't be found in cache
>at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1557) 
> ~[hadoop-common-3.2.1.jar:?]
>at org.apache.hadoop.ipc.Client.call(Client.java:1494) 
> ~[hadoop-common-3.2.1.jar:?]
>at org.apache.hadoop.ipc.Client.call(Client.java:1391) 
> ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>  ~[hadoop-common-3.2.1.jar:?]
>at com.sun.proxy.$Proxy26.mkdirs(Unknown Source) ~[?:?]
>at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:660)
>  ~[hadoop-hdfs-client-3.2.1.jar:?]
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_92]
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_92]
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_92]
>at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_92]
>at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>  ~[hadoop-common-3.2.1.jar:?]
>at com.sun.proxy.$Proxy27.mkdirs(Unknown Source) ~[?:?]
>at 
> org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2425) 
> ~[hadoop-hdfs-client-3.2.1.jar:?]
>at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2401) 
> ~[hadoop-hdfs-client-3.2.1.jar:?]
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1318)
>  ~[hadoop-hdfs-client-3.2.1.jar:?]
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1315)
>  ~[hadoop-hdfs-client-3.2.1.jar:?]
>at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.2.1.jar:?]
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1332)
>  ~[hadoop-hdfs-client-3.2.1.jar:?]
>at 
> org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1307)
>  ~[hadoop-hdfs-client-3.2.1.jar:?]
>at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2275) 
> ~[hadoop-common-3.2.1.jar:?]
>at 
> 

Re: Barriers in Flink SQL

2023-10-25 Thread Giannis Polyzos
Hi Ralph,
can you explain a bit more? When you say "barriers" you are presumably
referring to checkpoint barriers, but from your description it sounds more
like watermarks. What functionality is supported in Flink but not in Flink
SQL? In terms of watermarks, there were a few gaps between the two APIs,
which are addressed in the upcoming 1.18 release:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-296%3A+Extend+watermark-related+features+for+SQL
What operations are you running that show incorrect results?

Best

On Wed, Oct 25, 2023 at 9:51 PM Ralph Matthias Debusmann <
matthias.debusm...@gmail.com> wrote:

> Hi,
>
> one question - it seems that "barriers" are perfectly supported by Flink,
> but not yet supported in Flink SQL.
>
> When I e.g. do a UNION of two views derived from one source table fed by
> Kafka, I get thousands of intermediate results which are incorrect (the
> example I am using is this one:
> https://www.scattered-thoughts.net/writing/internal-consistency-in-streaming-systems/),
> and I can only be sure to get the correct result if I stop producing new
> messages into the source table.
>
> 1) Is my understanding correct that barriers are not implemented for Flink
> SQL?
> 2) Why is it not implemented/is this on the roadmap?
>
> Best, Ralph
>
>


Re: Unsubscribe from user list.

2023-10-24 Thread Hang Ruan
Hi,
Please send an email to user-unsubscr...@flink.apache.org if you want to
unsubscribe from user@flink.apache.org, and you can refer to [1][2]
for more details.

Best,
Hang

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists


On Sat, Oct 21, 2023 at 19:03, bharghavi vajrala wrote:

> Team,
>
> Please unsubscribe my email id.
>
> On Thu, Oct 19, 2023 at 6:25 AM jihe18717838093 <18717838...@126.com>
> wrote:
>
>> Hi team,
>>
>>
>>
>> Could you please remove this email from the subscription list?
>>
>>
>>
>> Thank you!
>>
>>
>>
>> Best,
>>
>> Minglei
>>
>


Re:Re: Re: Flink SQL: non-deterministic join behavior, is it expected?

2023-10-24 Thread Xuyang
Hi, the plan seems correct. You can debug StreamingJoinOperator#processLeft
and #processRight by writing an IT case to see what happens. Do these records
arrive at the join operator out of order? If they do, maybe you can check the
order in which the data is sent from the source.







--

Best!
Xuyang




On 2023-10-23 22:35:45, "Yaroslav Tkachenko" wrote:

Hi, sure, sharing it again:


SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a JOIN 
table_b AS r ON a.funder = r.funder



and the Optimized Execution Plan:


Calc(select=[funder, vid AS a_vid, vid0 AS r_vid, amounts_added, 
amounts_removed])
+- Join(joinType=[InnerJoin], where=[(funder = funder0)], select=[vid, funder, 
amounts_added, vid0, funder0, amounts_removed], leftInputSpec=[NoUniqueKey], 
rightInputSpec=[NoUniqueKey])
   :- Exchange(distribution=[hash[funder]])
   :  +- Calc(select=[vid, funder, amounts_added])
   : +- TableSourceScan(table=[[*anonymous_datastream_source$1*]], 
fields=[vid, block_range, id, timestamp, fpmm, funder, amounts_added, 
amounts_refunded, shares_minted, _gs_chain, _gs_gid])
   +- Exchange(distribution=[hash[funder]])
  +- Calc(select=[vid, funder, amounts_removed])
 +- TableSourceScan(table=[[*anonymous_datastream_source$2*]], 
fields=[vid, block_range, id, timestamp, fpmm, funder, amounts_removed, 
collateral_removed, shares_burnt, _gs_chain, _gs_gid])



So I see an Exchange, which makes sense. I'm still confused about how it can be 
non-deterministic... 


On Mon, Oct 23, 2023 at 5:51 AM Xuyang  wrote:

Hi, could you share the original SQL? If not, could you share the plan after
executing 'EXPLAIN ...'? There should be one 'Exchange' node for both inputs
of the 'Join' in both the "== Optimized Physical Plan ==" and the "== Optimized
Execution Plan ==" sections.




--

Best!
Xuyang




On 2023-10-20 15:57:28, "Yaroslav Tkachenko" wrote:

Hi Xuyang,


A shuffle by join key is what I'd expect, but I don't see it. The issue only 
happens with parallelism > 1. 


> do you mean the one +I record and two +U records arrive the sink with random 
> order?


Yes. 


On Fri, Oct 20, 2023 at 4:48 AM Xuyang  wrote:

Hi. Actually, the records that arrive at the join are shuffled by the join keys by design.


In your test, do you mean that the one +I record and two +U records arrive at the sink
in random order? What is the parallelism of these operators? It would be
better if you could post an example that can be reproduced.










--

Best!
Xuyang




At 2023-10-20 04:31:09, "Yaroslav Tkachenko"  wrote:

Hi everyone,


I noticed that a simple INNER JOIN in Flink SQL behaves non-deterministically.
I'd like to understand if it's expected and whether an issue has been created
to address it.




In my example, I have the following query: 


SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a JOIN 
table_b AS r ON a.funder = r.funder



Let's say I have three records with funder 12345 in the table_a and a single 
record with funder 12345 in the table_b. When I run this Flink job, I can see 
an INSERT with two UPDATEs as my results (corresponding to the records from 
table_a), but their order is not deterministic. If I re-run the application 
several times, I can see different results. 


It looks like Flink uses a GlobalPartitioner in this case, which tells me that 
it doesn't perform a shuffle on the column used in the join condition.




I use Flink 1.17.1. Appreciate any insights here! 

Re: How to clean up resources when a Flink Connector Source exits

2023-10-23 Thread Xuyang
Hi,
Take a look at the class implementing your DynamicTableSource. If you are using the legacy InputFormat-based source (i.e., something like InputFormatProvider.of), you can use the close method in InputFormat;
if you are using the FLIP-27 source (i.e., something like SourceProvider.of), the SplitReader also has a close method.
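To illustrate the legacy InputFormat path, here is a minimal sketch of where such cleanup would live. The record type, the three-record cut-off, and the external resource are made up for illustration:

```
import java.io.IOException;

import org.apache.flink.api.common.io.GenericInputFormat;

public class MyInputFormat extends GenericInputFormat<String> {

    private transient AutoCloseable externalResource;  // hypothetical resource
    private int emitted;

    @Override
    public boolean reachedEnd() {
        return emitted >= 3;
    }

    @Override
    public String nextRecord(String reuse) {
        emitted++;
        return "record-" + emitted;
    }

    @Override
    public void close() throws IOException {
        // Called when the format finishes a split or the task shuts down:
        // release connections, file handles, temp files here.
        try {
            if (externalResource != null) {
                externalResource.close();
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
        super.close();
    }
}
```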










--

Best!
Xuyang





On 2023-10-24 11:54:36, "jinzhuguang" wrote:
>Version: Flink 1.16.0
>
>Requirement: clean up the related resources when a certain source finishes and exits.
>
>Problem: so far I haven't found any hook function invoked when a source exits, so I don't know where to put the cleanup code.
>
>Any guidance would be much appreciated.


Re: Flink 1.17.2 planned?

2023-10-23 Thread Deepyaman Datta
Hi Jing,

My team and I have been blocked by the need for a PyFlink release including
https://github.com/apache/flink/pull/23141, and I saw that you mentioned
that anybody can be the release manager of a bug fix release. Could we
explore what it would take for me to do this (assuming nobody is already
handling the release)?

Best regards,
Deepyaman


Re: Re: Flink SQL: non-deterministic join behavior, is it expected?

2023-10-23 Thread Yaroslav Tkachenko
Hi, sure, sharing it again:

SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a JOIN
table_b AS r ON a.funder = r.funder

and the Optimized Execution Plan:

Calc(select=[funder, vid AS a_vid, vid0 AS r_vid, amounts_added,
amounts_removed])
+- Join(joinType=[InnerJoin], where=[(funder = funder0)], select=[vid,
funder, amounts_added, vid0, funder0, amounts_removed],
leftInputSpec=[NoUniqueKey], rightInputSpec=[NoUniqueKey])
   :- Exchange(distribution=[hash[funder]])
   :  +- Calc(select=[vid, funder, amounts_added])
   : +- TableSourceScan(table=[[*anonymous_datastream_source$1*]],
fields=[vid, block_range, id, timestamp, fpmm, funder, amounts_added,
amounts_refunded, shares_minted, _gs_chain, _gs_gid])
   +- Exchange(distribution=[hash[funder]])
  +- Calc(select=[vid, funder, amounts_removed])
 +- TableSourceScan(table=[[*anonymous_datastream_source$2*]],
fields=[vid, block_range, id, timestamp, fpmm, funder, amounts_removed,
collateral_removed, shares_burnt, _gs_chain, _gs_gid])

So I see an Exchange, which makes sense. I'm still confused about how it
can be non-deterministic...

On Mon, Oct 23, 2023 at 5:51 AM Xuyang  wrote:

> Hi, could you share the original SQL? If not, could you share the plan
> after executing 'EXPLAIN ...'? There should be one 'Exchange' node for
> both inputs of the 'Join' in both the "== Optimized Physical Plan ==" and
> the "== Optimized Execution Plan ==" sections.
>
>
> --
> Best!
> Xuyang
>
>
> On 2023-10-20 15:57:28, "Yaroslav Tkachenko" wrote:
>
> Hi Xuyang,
>
> A shuffle by join key is what I'd expect, but I don't see it. The issue
> only happens with parallelism > 1.
>
> > do you mean the one +I record and two +U records arrive the sink with
> random order?
>
> Yes.
>
> On Fri, Oct 20, 2023 at 4:48 AM Xuyang  wrote:
>
>> Hi. Actually, the records that arrive at the join are shuffled by the
>> join keys by design.
>>
>> In your test, do you mean that the one +I record and two +U records arrive
>> at the sink in random order? What is the parallelism of these operators? It
>> would be better if you could post an example that can be reproduced.
>>
>>
>>
>>
>> --
>> Best!
>> Xuyang
>>
>>
>> At 2023-10-20 04:31:09, "Yaroslav Tkachenko" 
>> wrote:
>>
>> Hi everyone,
>>
>> I noticed that a simple INNER JOIN in Flink SQL behaves
>> non-deterministically. I'd like to understand if it's expected and whether
>> an issue has been created to address it.
>>
>>
>> In my example, I have the following query:
>>
>> SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a
>> JOIN table_b AS r ON a.funder = r.funder
>>
>> Let's say I have three records with funder 12345 in the table_a and a
>> single record with funder 12345 in the table_b. When I run this Flink job,
>> I can see an INSERT with two UPDATEs as my results (corresponding to the
>> records from table_a), but their order is not deterministic. If I re-run
>> the application several times, I can see different results.
>>
>> It looks like Flink uses a GlobalPartitioner in this case, which tells me
>> that it doesn't perform a shuffle on the column used in the join condition.
>>
>>
>> I use Flink 1.17.1. Appreciate any insights here!
>>
>>


Re: Problems with the state.backend.fs.memory-threshold parameter

2023-10-23 Thread rui chen
Hi Zakelly,
Thank you for your answer.

Best,
rui


On Fri, Oct 13, 2023 at 19:12, Zakelly Lan wrote:

> Hi rui,
>
> The 'state.backend.fs.memory-threshold' configures the threshold below
> which state is stored as part of the metadata, rather than in separate
> files. So as a result the JM will use its memory to merge small
> checkpoint files and write them into one file. Currently the
> FLIP-306[1][2] is proposed to merge small checkpoint files without
> consuming JM memory. This feature is currently being worked on and is
> targeted for the next minor release (1.19).
>
>
> Best,
> Zakelly
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-306%3A+Unified+File+Merging+Mechanism+for+Checkpoints
> [2] https://issues.apache.org/jira/browse/FLINK-32070
>
> On Fri, Oct 13, 2023 at 6:28 PM rui chen  wrote:
> >
> > We found that for some tasks, the JM memory continued to increase. I set
> > the parameter of state.backend.fs.memory-threshold to 0, and the JM
> memory
> > would no longer increase, but many small files might be written in this
> > way. Does the community have any optimization plan for this area?
>


Re: Needs help debugging an issue

2023-10-23 Thread Ashish Khatkar via user
Here are the additional exceptions, with the same error but on different files.

PyFlink lib error:

java.lang.RuntimeException: An error occurred while copying the file.
at org.apache.flink.api.common.cache.DistributedCache.getFile(
DistributedCache.java:158)
at org.apache.flink.python.env.PythonDependencyInfo.create(
PythonDependencyInfo.java:151)
at org.apache.flink.streaming.api.operators.python.process.
AbstractExternalPythonFunctionOperator.createPythonEnvironmentManager(
AbstractExternalPythonFunctionOperator.java:124)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamAggregateOperator.createPythonFunctionRunner(
AbstractPythonStreamAggregateOperator.java:176)
at org.apache.flink.streaming.api.operators.python.process.
AbstractExternalPythonFunctionOperator.open(
AbstractExternalPythonFunctionOperator.java:56)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamAggregateOperator.open(
AbstractPythonStreamAggregateOperator.java:160)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamGroupAggregateOperator.open(
AbstractPythonStreamGroupAggregateOperator.java:116)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain
.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(
StreamTask.java:734)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(
StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(
StreamTask.java:709)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask
.java:675)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(
Task.java:952)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:921)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.nio.file.FileAlreadyExistsException: File already exists:
/tmp/flink-dist-cache-6c6899b4-694e-43b7-a081-f58ade99f212/9572e
aa67fa026c8cfa1ebd5435a5c29/plugin_directory/venv/lib/python3.8
/site-packages/pyflink/lib/flink-json-1.17.0.jar
at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem
.java:257)
at org.apache.flink.util.FileUtils.expandDirectory(FileUtils.java:536)
at org.apache.flink.runtime.filecache.FileCache$CopyFromBlobProcess.call(
FileCache.java:289)
at org.apache.flink.runtime.filecache.FileCache$CopyFromBlobProcess.call(
FileCache.java:261)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors
.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.
ScheduledThreadPoolExecutor$ScheduledFutureTask.run(
ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:628)
... 1 more



CPython file error:

java.lang.RuntimeException: An error occurred while copying the file.
at org.apache.flink.api.common.cache.DistributedCache.getFile(
DistributedCache.java:158)
at org.apache.flink.python.env.PythonDependencyInfo.create(
PythonDependencyInfo.java:151)
at org.apache.flink.streaming.api.operators.python.process.
AbstractExternalPythonFunctionOperator.createPythonEnvironmentManager(
AbstractExternalPythonFunctionOperator.java:124)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamAggregateOperator.createPythonFunctionRunner(
AbstractPythonStreamAggregateOperator.java:176)
at org.apache.flink.streaming.api.operators.python.process.
AbstractExternalPythonFunctionOperator.open(
AbstractExternalPythonFunctionOperator.java:56)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamAggregateOperator.open(
AbstractPythonStreamAggregateOperator.java:160)
at org.apache.flink.table.runtime.operators.python.aggregate.
AbstractPythonStreamGroupAggregateOperator.open(
AbstractPythonStreamGroupAggregateOperator.java:116)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain
.initializeStateAndOpenOperators(RegularOperatorChain.java:107)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(
StreamTask.java:734)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(
StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(
StreamTask.java:709)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask
.java:675)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(
Task.java:952)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:921)
at 

Re: [Question] How to scale application based on 'reactive' mode

2023-10-23 Thread Dennis Jung
Hello,
Thanks for the feedback. I'll start with these.

Regards

On Thu, Sep 7, 2023 at 7:08 PM, Gyula Fóra wrote:

> Jung,
> I don't want to sound unhelpful, but I think the best thing for you to do
> is simply to try these different models in your local env.
> It should be very easy to get started with the Kubernetes Operator on
> Kind/Minikube (
> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/
> )
>
> It's very difficult to answer these questions fully here. Try the
> different modes, observe what happens, read the docs and you will get all
> the answers.
>
> Gyula
>
> On Thu, Sep 7, 2023 at 10:11 AM Dennis Jung  wrote:
>
>> Hello Chen,
>> Thanks for your reply! I have further questions as following...
>>
>> 1. In non-reactive mode in Flink 1.18, if the autoscaler adjusts
>> parallelism, what difference does using 'reactive' mode make?
>> 2. If I use Flink 1.15~1.17 without the autoscaler, is the difference of
>> using 'reactive' mode that parallelism changes dynamically with the number
>> of TMs (adjusted manually, or by a custom scaler)?
>>
>> Regards,
>> Jung
>>
>>
>>> On Tue, Sep 5, 2023 at 3:59 PM, Chen Zhanghao wrote:
>>
>>> Hi Dennis,
>>>
>>>
>>>1. In Flink 1.18 + non-reactive mode, autoscaler adjusts the job's
>>>parallelism and the job will request for extra TMs if the current ones
>>>cannot satisfy its need and redundant TMs will be released automatically
>>>later for being idle. In other words, parallelism changes cause TM number
>>>change.
>>>2. The core metrics used is busy time (the amount of time spent on
>>>task processing per 1 second = 1 s - backpressured time - idle time), it 
>>> is
>>>considered to be superior as it counts I/O cost etc into account as well.
>>>Also, the metrics is on a per-task granularity and allows us to identify
>>>bottleneck tasks.
>>>3. Autoscaler feature currently only works for K8s opeartor + native
>>>K8s mode.
>>>
>>>
>>> Best,
>>> Zhanghao Chen
>>> --
>>> *From:* Dennis Jung 
>>> *Sent:* September 2, 2023 12:58
>>> *To:* Gyula Fóra 
>>> *Cc:* user@flink.apache.org 
>>> *Subject:* Re: [Question] How to scale application based on 'reactive' mode
>>>
>>> Hello,
>>> Thanks for your notice.
>>>
>>> 1. In "Flink 1.18 + non-reactive", is parallelism being changed by the
>>> number of TM?
>>> 2. In the document(
>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/),
>>> it said "we are not using any container memory / CPU utilization metrics
>>> directly here". Which metrics are these using internally?
>>> 3. I'm using standalone k8s(
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/)
>>> for deployment. Is autoscaler features only available by using the "flink
>>> k8s operator"(sorry I don't understand this clearly yet...)?
>>>
>>> Regards
>>>
>>>
>>> On Fri, Sep 1, 2023 at 10:20 PM, Gyula Fóra wrote:
>>>
>>> Pretty much, except that with Flink 1.18 autoscaler can scale the job in
>>> place without restarting the JM (even without reactive mode )
>>>
>>> So actually best option is autoscaler with Flink 1.18 native mode (no
>>> reactive)
>>>
>>> Gyula
>>>
>>> On Fri, 1 Sep 2023 at 13:54, Dennis Jung  wrote:
>>>
>>> Thanks for feedback.
>>> Could you check whether I understand correctly?
>>>
>>> *Only using 'reactive' mode:*
>>> By manually adding TaskManager(TM) (such as using './bin/taskmanager.sh
>>> start'), parallelism will be increased. For example, when job parallelism
>>> is 1 and TM is 1, and if adding 1 new TM, JobManager will be restarted and
>>> parallelism will be 2.
>>> But the number of TM is not being controlled automatically.
>>>
>>> *Autoscaler + non-reactive:*
>>> It can flexibilly control the number of TM by several metrics(CPU usage,
>>> throughput, ...), and JobManager will be restarted when scaling. But job
>>> parallelism is the same after the number of TM has been changed.
>>>
>>> *Autoscaler + 'reactive' mode*:
>>> It can control numbers of TM by metric, and increase/decrease job
>>>

Re: Flink SQL: MySQL to Elasticsearch soft delete

2023-10-22 Thread Feng Jin
Hi Hemi

You cannot simply filter out the deleted records.

You must use something like the following syntax to generate a delete record.

```
CREATE TABLE test_source (f1 xxx, f2 xxx, f3 xxx, deleted boolean) with
(...);

INSERT INTO es_sink
SELECT f1, f2, f3
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY f1, f2 ORDER BY proctime() DESC) as rnk
from test_source
) where rnk = 1 AND deleted = false;
```


Best,
Feng


On Sat, Oct 21, 2023 at 11:32 PM Hemi Grs  wrote:

> Hi Feng,
>
> thanks for the reply. I actually already tried that because we have a
> deletedAt column (if the record is deleted then it will update with the
> timestamp to that column).
>
> What I've tried is as follows:
>
> insert into es_sink_table
> select * from table_source where deletedAt is null
>
> What it actually does is just stop propagating further changes to Elastic,
> because the rows are no longer selected, but it's not actually deleting the
> documents in Elastic.
> What I want to achieve is to actually delete the record/document in
> Elastic ...
>
> What I'm thinking right now is actually just doing a job every few minutes
> to delete all records where the deletedAt is not null in elastic. But this
> approach is not elegant at all :) That's why I was hoping there'll be a
> better method to do this ...
>
>
>
> On Sat, Oct 21, 2023 at 5:21 PM Feng Jin  wrote:
>
>> Hi Hemi,
>>
>> Here is one possible way, but it may generate a lot of useless state.
>>
>> As shown below:
>> ```
>> CREATE TABLE test_source (f1 xxx, f2 xxx, f3 xxx, deleted boolean) with
>> (...);
>>
>> INSERT INTO es_sink
>> SELECT f1, f2, f3
>> FROM (
>> SELECT *,
>> ROW_NUMBER() OVER (PARTITION BY f1, f2 ORDER BY proctime() DESC) as rnk
>> from test_source
>> ) where rnk = 1 AND deleted = false;
>> ```
>>
>> Best,
>> Feng
>>
>> On Fri, Oct 20, 2023 at 1:38 PM Hemi Grs  wrote:
>>
>>> hello everyone,
>>>
>>> right now I'm using flink to sync from mysql to elasticsearch and so far
>>> so good. If we insert, update, or delete it will sync from mysql to elastic
>>> without any problem.
>>>
>>> The problem I have right now is the application is not actually doing
>>> hard delete to the records in mysql, but doing soft delete (updating a
>>> deletedAt column).
>>>
>>> Because it's not actually doing a deletion, flink is not deleting the
>>> data in elastic. How do I make it so it will delete the data in elastic?
>>>
>>> Thanks
>>>
>>


Re: Issue with flink-kubernetes-operator not updating execution.savepoint.path after savepoint deletion

2023-10-21 Thread Gyula Fóra
Hi Tony,

It doesn't seem like the operator had too much to do with this error; I
wonder if this would still happen in newer Flink versions with the
JobResultStore already available.

It would be great to try. In any case, I highly recommend upgrading to newer
Flink versions for better operator integration and general stability.

The next operator release (1.7.0) will drop support for Flink 1.13 and 1.14,
as agreed by the community, to only support the last 4 stable Flink minor
versions.

Cheers
Gyula

On Sat, 21 Oct 2023 at 20:49, Tony Chen  wrote:

> Hi Gyula,
>
> After upgrading our operator version to the HEAD commit of the release-1.6
> branch (
> https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e),
> we are still seeing this same issue.
>
> Here's the log message on the last savepoint (log timestamp is in UTC):
>
> 2023-10-21 10:21:14,023 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
>> Completed checkpoint 87794 for job ee4f7c678794ee16506f9b41425c244e
>> (698450687 bytes, checkpointDuration=5601 ms, finalizationTime=296 ms).
>
>
> 4 minutes later, ConnectException occurred, and the jobmanager attempts to
> restart from the last savepoint first:
>
> 2023-10-21 10:25:30,725 WARN  akka.remote.transport.netty.NettyTransport
>> [] - Remote connection to [null] failed with
>> java.net.ConnectException: Connection refused: /10.11.181.62:6122
>> 2023-10-21 10:25:30,726 WARN  akka.remote.ReliableDeliverySupervisor
>>   [] - Association with remote system [akka.tcp://
>> flink@10.11.181.62:6122] has failed, address is now gated for [50] ms.
>> Reason: [Association failed with [akka.tcp://flink@10.11.181.62:6122]]
>> Caused by: [java.net.ConnectException: Connection refused: /
>> 10.11.181.62:6122]
>> 2023-10-21 10:25:37,935 WARN
>>  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No
>> hostname could be resolved for the IP address 10.11.202.152, using IP
>> address as host name. Local input split assignment (such as for HDFS files)
>> may be impacted.
>> 2023-10-21 10:25:37,936 INFO
>>  org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job
>>  (ee4f7c678794ee16506f9b41425c244e) switched from state
>> RESTARTING to RUNNING.
>> 2023-10-21 10:25:37,936 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
>> Restoring job ee4f7c678794ee16506f9b41425c244e from Savepoint 87794 @
>> 1697883668126 for ee4f7c678794ee16506f9b41425c244e located at
>> s3:///savepoint-ee4f7c-9c6499126fd0.
>> 2023-10-21 10:25:37,937 INFO
>>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - No
>> master state to restore
>
>
> However, a RecipientUnreachableException occurs, and the HA data gets
> cleaned up. Eventually, the Flink cluster shuts down and restarts:
>
>
>> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException:
>> Could not send message
>> [RemoteRpcInvocation(null.submitTask(TaskDeploymentDescriptor, JobMasterId,
>> Time))] from sender [Actor[akka://flink/temp/taskmanager_0$ENE]] to
>> recipient [Actor[akka.tcp://
>> flink@10.11.181.62:6122/user/rpc/taskmanager_0#-43671188]], because the
>> recipient is unreachable. This can either mean that the recipient has been
>> terminated or that the remote RpcService is currently not reachable.
>> at
>> org.apache.flink.runtime.rpc.akka.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
>> ~[flink-rpc-akka_61fdae14-7548-48be-b7c8-11190d636910.jar:1.14.5]
>> ...
>> 2023-10-21 10:25:37,946 INFO
>>  org.apache.flink.runtime.executiongraph.ExecutionGraph   [] -
>> Discarding the results produced by task execution
>> 86d39b748d3655b6488fb9eaafb34f73.
>> ...
>> 2023-10-21 10:25:40,063 INFO
>>  org.apache.flink.kubernetes.highavailability.KubernetesHaServices [] -
>> Finished cleaning up the high availability data.
>> ...
>> 2023-10-21 10:25:40,170 INFO
>>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] -
>> Terminating cluster entrypoint process
>> KubernetesApplicationClusterEntrypoint with exit code 1443.
>> ...
>> 2023-10-21 10:25:44,631 INFO
>>  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] -
>> Recovered 2 pods from previous attempts, current attempt id is 2.
>> ...
>> 2023-10-21 10:25:44,631 INFO
>>  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
>> Recovered 2 workers from previous attempt.
>> ...
>> 2023-10-21 10:25:45,015 ERROR
>> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] -
>> Unhandled exception.
>> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException:
>> Could not send message
>> [RemoteFencedMessage(b55fb309bb698aa75925f70bce254756,
>> RemoteRpcInvocation(null.requestMultipleJobDetails(Time)))] from sender
>> [Actor[akka.tcp://flink@10.11.76.167:6123/temp/dispatcher_0$Tb]] to
>> recipient [Actor[akka://flink/user/rpc/dispatcher_0#1755511719]], 

Re: Issue with flink-kubernetes-operator not updating execution.savepoint.path after savepoint deletion

2023-10-21 Thread Tony Chen
Hi Gyula,

After upgrading our operator version to the HEAD commit of the release-1.6
branch (
https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e),
we are still seeing this same issue.

Here's the log message on the last savepoint (log timestamp is in UTC):

2023-10-21 10:21:14,023 INFO
>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> Completed checkpoint 87794 for job ee4f7c678794ee16506f9b41425c244e
> (698450687 bytes, checkpointDuration=5601 ms, finalizationTime=296 ms).


4 minutes later, ConnectException occurred, and the jobmanager attempts to
restart from the last savepoint first:

2023-10-21 10:25:30,725 WARN  akka.remote.transport.netty.NettyTransport
> [] - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: /10.11.181.62:6122
> 2023-10-21 10:25:30,726 WARN  akka.remote.ReliableDeliverySupervisor
> [] - Association with remote system [akka.tcp://
> flink@10.11.181.62:6122] has failed, address is now gated for [50] ms.
> Reason: [Association failed with [akka.tcp://flink@10.11.181.62:6122]]
> Caused by: [java.net.ConnectException: Connection refused: /
> 10.11.181.62:6122]
> 2023-10-21 10:25:37,935 WARN
>  org.apache.flink.runtime.taskmanager.TaskManagerLocation [] - No
> hostname could be resolved for the IP address 10.11.202.152, using IP
> address as host name. Local input split assignment (such as for HDFS files)
> may be impacted.
> 2023-10-21 10:25:37,936 INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph   [] - Job
>  (ee4f7c678794ee16506f9b41425c244e) switched from state
> RESTARTING to RUNNING.
> 2023-10-21 10:25:37,936 INFO
>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] -
> Restoring job ee4f7c678794ee16506f9b41425c244e from Savepoint 87794 @
> 1697883668126 for ee4f7c678794ee16506f9b41425c244e located at
> s3:///savepoint-ee4f7c-9c6499126fd0.
> 2023-10-21 10:25:37,937 INFO
>  org.apache.flink.runtime.checkpoint.CheckpointCoordinator[] - No
> master state to restore


However, a RecipientUnreachableException occurs, and the HA data gets
cleaned up. Eventually, the Flink cluster shuts down and restarts:


> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException:
> Could not send message
> [RemoteRpcInvocation(null.submitTask(TaskDeploymentDescriptor, JobMasterId,
> Time))] from sender [Actor[akka://flink/temp/taskmanager_0$ENE]] to
> recipient [Actor[akka.tcp://
> flink@10.11.181.62:6122/user/rpc/taskmanager_0#-43671188]], because the
> recipient is unreachable. This can either mean that the recipient has been
> terminated or that the remote RpcService is currently not reachable.
> at
> org.apache.flink.runtime.rpc.akka.DeadLettersActor.handleDeadLetter(DeadLettersActor.java:61)
> ~[flink-rpc-akka_61fdae14-7548-48be-b7c8-11190d636910.jar:1.14.5]
> ...
> 2023-10-21 10:25:37,946 INFO
>  org.apache.flink.runtime.executiongraph.ExecutionGraph   [] -
> Discarding the results produced by task execution
> 86d39b748d3655b6488fb9eaafb34f73.
> ...
> 2023-10-21 10:25:40,063 INFO
>  org.apache.flink.kubernetes.highavailability.KubernetesHaServices [] -
> Finished cleaning up the high availability data.
> ...
> 2023-10-21 10:25:40,170 INFO
>  org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] -
> Terminating cluster entrypoint process
> KubernetesApplicationClusterEntrypoint with exit code 1443.
> ...
> 2023-10-21 10:25:44,631 INFO
>  org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] -
> Recovered 2 pods from previous attempts, current attempt id is 2.
> ...
> 2023-10-21 10:25:44,631 INFO
>  org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] -
> Recovered 2 workers from previous attempt.
> ...
> 2023-10-21 10:25:45,015 ERROR
> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] -
> Unhandled exception.
> org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException:
> Could not send message
> [RemoteFencedMessage(b55fb309bb698aa75925f70bce254756,
> RemoteRpcInvocation(null.requestMultipleJobDetails(Time)))] from sender
> [Actor[akka.tcp://flink@10.11.76.167:6123/temp/dispatcher_0$Tb]] to
> recipient [Actor[akka://flink/user/rpc/dispatcher_0#1755511719]], because
> the recipient is unreachable. This can either mean that the recipient has
> been terminated or that the remote RpcService is currently not reachable.
> ...
> 2023-10-21 10:25:45,798 INFO
>  org.apache.flink.runtime.blob.FileSystemBlobStore[] - Creating
> highly available BLOB storage directory at
> s3:blob


When the Flink cluster restarts, it doesn't try to restore from the latest
savepoint anymore. Instead, it tries to restore from a savepoint in
`execution.savepoint.path` in the flink-config. Since this savepoint was
from a while ago, it has been disposed already, and so the Flink cluster
cannot restart again:

2023-10-21 

Re: Flink SQL: MySQL to Elasticsearch soft delete

2023-10-21 Thread Feng Jin
Hi Hemi,

Here is one possible way, but it may generate a lot of useless state.

As shown below:
```
CREATE TABLE test_source (f1 xxx, f2 xxx, f3 xxx, deleted boolean) with
(...);

INSERT INTO es_sink
SELECT f1, f2, f3
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY f1, f2 ORDER BY proctime() DESC) as rnk
from test_source
) where rnk = 1 AND deleted = false;
```

Best,
Feng

On Fri, Oct 20, 2023 at 1:38 PM Hemi Grs  wrote:

> hello everyone,
>
> right now I'm using flink to sync from mysql to elasticsearch and so far
> so good. If we insert, update, or delete it will sync from mysql to elastic
> without any problem.
>
> The problem I have right now is the application is not actually doing hard
> delete to the records in mysql, but doing soft delete (updating a deletedAt
> column).
>
> Because it's not actually doing a deletion, flink is not deleting the data
> in elastic. How do I make it so it will delete the data in elastic?
>
> Thanks
>


Re: Unsubscribe from user list.

2023-10-21 Thread bharghavi vajrala
Team,

Please unsubscribe my email id.

On Thu, Oct 19, 2023 at 6:25 AM jihe18717838093 <18717838...@126.com> wrote:

> Hi team,
>
>
>
> Could you please remove this email from the subscription list?
>
>
>
> Thank you!
>
>
>
> Best,
>
> Minglei
>


Re: changing the 'flink-main-container' name

2023-10-20 Thread Mate Czagany
Hi,

By naming the container flink-main-container, Flink knows which container
spec it should use for the Flink container.
If you change the name, Flink won't know which container spec to use, will
probably treat it as just a sidecar container, and there will still be a
flink-main-container with the default spec instead.

There is unfortunately no way to rename this container as of now.

Regards,
Mate

Nuno wrote (on Fri, Oct 20, 2023, 14:39):

> Hello,
>
> We just adopted the flink operator. According to this link
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/
> it prescribes a pod template containing among other things the following:
>
> containers:
>   # Do not change the main container name
>   - name: flink-main-container
>
> I would actually like to change this name for each of my different flink 
> microservices, because having a bunch of microservices with the same 
> container name messes terribly with the container-based metrics and 
> dashboards of our monitoring system.
>
> I'd like to try to understand if possible why this comment is there, and how 
> seriously should I take it ? What will break, concretely, if I change it, 
> please ?
>
> I tried going through the operator code itself but couldn't find anything 
> concrete.
>
> Any help to understand the underlying constraints will be very welcome.
>
> Thank you very much!
>
>


Re: Bloom Filter for Rocksdb

2023-10-20 Thread Mate Czagany
Hi,

There have been no reports about setting this configuration causing any
issues. I would guess it's off by default because it can increase the
memory usage by an unpredictable amount.

I would say feel free to enable it; from what you've said, I also think that
this would improve the performance of your jobs. But make sure to configure
your jobs so that they will be able to accommodate the potential memory
footprint growth. Also, please read the following resources to learn more
about RocksDB's bloom filter:
https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter
https://rocksdb.org/blog/2014/09/12/new-bloom-filter-format.html
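To try it per job rather than in flink-conf.yaml, a minimal sketch (assuming per-job configuration of this option is honored in your deployment mode):

```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BloomFilterConfigExample {
    public static void main(String[] args) {
        // Equivalent to putting these lines into flink-conf.yaml:
        //   state.backend.rocksdb.use-bloom-filter: true
        //   state.backend.rocksdb.bloom-filter.bits-per-key: 10   (the default)
        Configuration conf = new Configuration();
        conf.setString("state.backend.rocksdb.use-bloom-filter", "true");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // Build the job as usual; point lookups (rocksdb.get()) on keys absent
        // from an SST file can now skip reading that file.
    }
}
```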

Regards,
Mate


Kenan Kılıçtepe wrote (on Fri, Oct 20, 2023, 15:50):

> Can someone tell me the exact performance effect of enabling the bloom
> filter? Might enabling it cause unpredictable performance problems?
>
> I read what it is and how it works, and it makes sense, but I also asked
> myself why the default value of state.backend.rocksdb.use-bloom-filter is
> false.
>
> We have a 5-server Flink cluster processing real-time IoT data coming
> from 5 million devices, and for a lot of jobs we keep separate state for
> each device.
>
> Sometimes we have performance issues, and when I check the flame graph on
> the test server I always see that rocksdb.get() is the blocker. I just want
> to increase RocksDB performance.
>
> Thanks
>
>


Re: Dealing with stale Watermark

2023-10-20 Thread Giannis Polyzos
Hi Irakli,
If you look at the watermarks tab on the operator, do you see the watermark
being propagated? If, for example, your source has multiple splits (like
Kafka partitions) and one is idle or lags behind, the watermark won't be
propagated, as it is the minimum across all inputs (partitions/splits).
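If an idle split turns out to be the cause, a common remedy is to declare splits idle after a timeout. A minimal sketch; the event type and both durations are placeholders:

```
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class IdlenessExample {

    // Placeholder event type carrying an event-time timestamp.
    public static class MyEvent {
        public long timestamp;
    }

    public static WatermarkStrategy<MyEvent> strategy() {
        // Allow 5s of out-of-orderness, and mark a split idle after 30s of
        // silence so it stops holding back the minimum watermark.
        return WatermarkStrategy
                .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, ts) -> event.timestamp)
                .withIdleness(Duration.ofSeconds(30));
    }
}
```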

Best

On Fri, 20 Oct 2023 at 6:11 PM, irakli.keshel...@sony.com <
irakli.keshel...@sony.com> wrote:

> Hello,
>
> I have a Flink application that is consuming events from the Kafka topic
> and builds sessions from them. I'm using the Keyed stream. The application
> runs fine initially, but after some time it is getting "stuck". I can see
> that the "processElement" function is processing the incoming events, but
> the onTimer function is never called. I can also see that the Watermark is
> getting stale (I guess that's why the onTimer is not called). My window
> duration is 1 minute, and the logic lives in onTimer function. Every minute
> I delete the old timer and create a new one (again with one minute
> duration). If certain logic is fulfilled, then I don't create a new timer
> anymore and clear the state.
>
> The same application is running fine in another environment (this one just
> gets more data), because of this I believe the windowing logic has no
> flows. I can't find any anomalies in CPU/Memory consumption and the
> checkpoints are completed successfully as well.
>
> Does anyone have similar issues?
>
> Best,
> Irakli
>


Re: Bloom Filter for Rocksdb

2023-10-20 Thread Kartoglu, Emre
I don’t know much about the performance improvements that may come from using 
bloom filters, but I believe you can also improve RocksDB performance by 
increasing managed memory 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-managed-fraction
 which RocksDB uses.



From: Kenan Kılıçtepe 
Date: Friday, 20 October 2023 at 14:51
To: user 
Subject: [EXTERNAL] Bloom Filter for Rocksdb


Can someone tell me the exact performance effect of enabling the bloom filter?
Might enabling it cause unpredictable performance problems?

I read what it is and how it works, and it makes sense, but I also asked myself
why the default value of state.backend.rocksdb.use-bloom-filter is false.

We have a 5-server Flink cluster processing real-time IoT data coming from 5
million devices, and for a lot of jobs we keep separate state for each device.

Sometimes we have performance issues, and when I check the flame graph on the
test server I always see that rocksdb.get() is the blocker. I just want to
increase RocksDB performance.

Thanks



Re: Flink SQL exception on using cte

2023-10-20 Thread elakiya udhayanan
Thanks, Robin and Aniket, for the suggestions you have given.

I will try them and report back.

Thanks,
Elakiya

On Fri, Oct 20, 2023 at 2:34 AM Robin Moffatt  wrote:

> CTEs are supported, you can see an example in the docs [1] [2]. In the
> latter doc, it also says
>
> > CTEs are supported in Views, CTAS and INSERT statement
>
> So I'm just guessing here, but your SQL doesn't look right.
> The CTE needs to return a column called `pod`, and the `FROM` clause for
> the `SELECT` should be after it, not before the `INSERT`.
>
> i.e. something like this instead:
>
> WITH p1 AS ( SELECT empId AS pod FROM employee )
> INSERT INTO correlate
> SELECT pod FROM p1;
>
>
> Hope that helps,
>
> Robin
>
> [1]:
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/with/
> [2]:
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/hive-compatibility/hive-dialect/queries/cte/
>
> On Thu, 19 Oct 2023 at 08:05, elakiya udhayanan 
> wrote:
>
>> Hi Team,
>>
>> I have a Flink job which uses the upsert-kafka connector to consume the
>> events from two different Kafka topics (confluent avro serialized) and
>> write them to two different tables (in Flink's memory using the Flink's SQL
>> DDL statements).
>>
>> I want to correlate them using the SQL join statements and for this I am
>> trying to use the cte expressions like below (sample): But getting
>> exception as below
>>
>>  *org.apache.flink.table.api.SqlParserException: SQL parse failed.
>> Incorrect syntax near the keyword 'INSERT'*
>>
>> WITH p1 AS ( SELECT empId FROM employee )
>>  FROM p1
>> INSERT INTO correlate
>> SELECT pod;
>>
>> Please let me know if queries with cte are supported in Apache Flink.
>>
>> Thanks,
>> Elakiya
>>
>


Re: Flink SQL: non-deterministic join behavior, is it expected?

2023-10-20 Thread Yaroslav Tkachenko
Hi Xuyang,

A shuffle by join key is what I'd expect, but I don't see it. The issue
only happens with parallelism > 1.

> do you mean the one +I record and two +U records arrive the sink with
random order?

Yes.

On Fri, Oct 20, 2023 at 4:48 AM Xuyang  wrote:

> Hi. Actually, the records that arrive at the join are shuffled by the join
> keys by design.
>
> In your test, do you mean that the one +I record and two +U records arrive
> at the sink in random order? What is the parallelism of these operators? It
> would be better if you could post an example that can be reproduced.
>
>
>
>
> --
> Best!
> Xuyang
>
>
> At 2023-10-20 04:31:09, "Yaroslav Tkachenko"  wrote:
>
> Hi everyone,
>
> I noticed that a simple INNER JOIN in Flink SQL behaves
> non-deterministically. I'd like to understand if it's expected and whether
> an issue has been created to address it.
>
>
> In my example, I have the following query:
>
> SELECT a.funder, a.amounts_added, r.amounts_removed FROM table_a AS a JOIN
> table_b AS r ON a.funder = r.funder
>
> Let's say I have three records with funder 12345 in the table_a and a
> single record with funder 12345 in the table_b. When I run this Flink job,
> I can see an INSERT with two UPDATEs as my results (corresponding to the
> records from table_a), but their order is not deterministic. If I re-run
> the application several times, I can see different results.
>
> It looks like Flink uses a GlobalPartitioner in this case, which tells me
> that it doesn't perform a shuffle on the column used in the join condition.
>
>
> I use Flink 1.17.1. Appreciate any insights here!
>
>


Re: Too many Kafka connections when collecting Flink job logs with kafka_appender

2023-10-19 Thread Feng Jin
You could consider deploying a log-collection service on each YARN machine
(to ship that machine's logs to Kafka): YARN container -> per-host log
service -> Kafka.



On Mon, Oct 16, 2023 at 3:58 PM 阿华田  wrote:

>
> Our Flink cluster uses kafka_appender to collect the logs produced by Flink, but we now have more than 3,000 jobs running in real time, with 200,000+ YARN containers. This results in too many connections to the Kafka cluster that stores the logs, putting it under considerable pressure. Could anyone suggest better approaches for collecting Flink logs?
>
>
> 阿华田
> a15733178...@163.com
>
>


Re: Doesn't Flink SQL support SHOW CREATE CATALOG?

2023-10-19 Thread Feng Jin
hi casel


Starting with 1.18, the CatalogStore has been introduced, which persists catalog configurations, so SHOW CREATE CATALOG can indeed be supported now.
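For reference, a minimal sketch of enabling the file-based catalog store; the two option keys follow the 1.18 documentation (verify against your version), and the path is a placeholder:

```
import org.apache.flink.configuration.Configuration;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CatalogStoreExample {
    public static void main(String[] args) {
        // Persist catalog definitions to files so they survive sessions.
        Configuration conf = new Configuration();
        conf.setString("table.catalog-store.kind", "file");
        conf.setString("table.catalog-store.file.path", "file:///tmp/flink-catalogs");

        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().withConfiguration(conf).build());
        tEnv.executeSql(
                "CREATE CATALOG my_catalog WITH ('type' = 'generic_in_memory')");
    }
}
```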


Best,
Feng

On Fri, Oct 20, 2023 at 11:55 AM casel.chen  wrote:

> I previously created a catalog in Flink SQL, and now I want to look at the statement that created it, copy it, modify it, and save it as a new catalog, but I found that Flink
> SQL does not support SHOW CREATE CATALOG.
> As far as I know, Doris supports the SHOW CREATE CATALOG statement. Could Flink SQL support it as well?


Re: How to handle BatchUpdateException when using JdbcSink

2023-10-19 Thread Feng Jin
Hi Sai,

If you directly utilize JdbcSink, you may not be able to catch this
exception.

But you can create your own SinkFunction that calls the `invoke` method of
the JdbcSink, catches the exception, and routes the record to the DLQ sink.

As shown below,
```
// Sketch of the idea: wrap the JDBC sink and fall back to a DLQ sink on failure.
// Assumes both delegates are RichSinkFunction instances.
public class SinkWrapper<T> extends RichSinkFunction<T> {

    private final RichSinkFunction<T> jdbcSink;
    private final RichSinkFunction<T> dlqSink;

    public SinkWrapper(RichSinkFunction<T> jdbcSink, RichSinkFunction<T> dlqSink) {
        this.jdbcSink = jdbcSink;
        this.dlqSink = dlqSink;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        jdbcSink.open(parameters);
        dlqSink.open(parameters);
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        try {
            jdbcSink.invoke(value, context);
        } catch (Exception e) {
            // Route the failed record to the DLQ sink instead of failing the job.
            dlqSink.invoke(value, context);
        }
    }

    @Override
    public void close() throws Exception {
        jdbcSink.close();
        dlqSink.close();
    }
}

```
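Wiring it up could look like the sketch below. Note that `JdbcSink.sink(...)` is declared to return a `SinkFunction`; passing it here assumes the concrete object is a `RichSinkFunction` (true for the current `GenericJdbcSinkFunction`, but an implementation detail worth verifying):

```
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class SinkWrapperUsage {
    // Hedged sketch: both sinks are assumed to be RichSinkFunction instances.
    public static <T> void attach(DataStream<T> stream,
                                  RichSinkFunction<T> jdbcSink,
                                  RichSinkFunction<T> dlqSink) {
        stream.addSink(new SinkWrapper<>(jdbcSink, dlqSink));
    }
}
```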


Best,
Feng


On Thu, Oct 19, 2023 at 4:12 PM Sai Vishnu 
wrote:

> Hi team,
>
>
>
> We are using the JdbcSink from flink-connector-jdbc artifact, version
> 3.1.0-1.17.
>
> I want to know if it's possible to catch the BatchUpdateException thrown and
> put that message into a DLQ.
>
>
>
> Below is the use case:
>
> Flink job reads a packet from Kafka and writes it to Postgres using the
> JdbcSink.
>
> For any missing field, we catch it in the data transform layer
> and write it to a side output that writes the exception along with the
> original message to a DLQ sink.
>
> In scenarios where a field has more characters than the column definition
> in Postgres allows, we currently receive a BatchUpdateException
> during the update-to-Postgres stage.
>
>
>
> Is it possible to catch this exception and write the message to a dlq sink?
>
>
>
> Thanks,
>
> Sai Vishnu Soudri
>


Re: Flink SQL exception on using cte

2023-10-19 Thread Robin Moffatt
CTEs are supported, you can see an example in the docs [1] [2]. In the
latter doc, it also says

> CTEs are supported in Views, CTAS and INSERT statement

So I'm just guessing here, but your SQL doesn't look right.
The CTE needs to return a column called `pod`, and the `FROM` clause for
the `SELECT` should be after it, not before the `INSERT`.

i.e. something like this instead:

WITH p1 AS ( SELECT empId AS pod FROM employee )
INSERT INTO correlate
SELECT pod FROM p1;


Hope that helps,

Robin

[1]:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/queries/with/
[2]:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/hive-compatibility/hive-dialect/queries/cte/

On Thu, 19 Oct 2023 at 08:05, elakiya udhayanan  wrote:

> Hi Team,
>
> I have a Flink job which uses the upsert-kafka connector to consume the
> events from two different Kafka topics (confluent avro serialized) and
> write them to two different tables (in Flink's memory using the Flink's SQL
> DDL statements).
>
> I want to correlate them using the SQL join statements and for this I am
> trying to use the cte expressions like below (sample): But getting
> exception as below
>
>  *org.apache.flink.table.api.SqlParserException: SQL parse failed.
> Incorrect syntax near the keyword 'INSERT'*
>
> WITH p1 AS ( SELECT empId FROM employee )
>  FROM p1
> INSERT INTO correlate
> SELECT pod;
>
> Please let me know if queries with cte are supported in Apache Flink.
>
> Thanks,
> Elakiya
>


RE: Flink SQL exception on using cte

2023-10-19 Thread Aniket Sule
Hello,
I have been able to use queries with CTEs in this syntax:

INSERT INTO t1
  WITH cte1 AS (SELECT ...),
       cte2 AS (SELECT ...)
  (SELECT
     *
   FROM
     cte1 AS a, cte2 AS b ...
  );
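For reference, a runnable sketch of that shape; the datagen/blackhole connectors and the table names are made up for illustration:

```
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CteInsertDemo {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        tEnv.executeSql(
                "CREATE TABLE employee (empId STRING) WITH ('connector' = 'datagen')");
        tEnv.executeSql(
                "CREATE TABLE correlate (pod STRING) WITH ('connector' = 'blackhole')");
        // INSERT first, then the CTE, as in the template above.
        tEnv.executeSql(
                "INSERT INTO correlate "
                        + "WITH p1 AS (SELECT empId AS pod FROM employee) "
                        + "SELECT pod FROM p1");
    }
}
```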

Hope this helps you.

Regards,
Aniket Sule

From: elakiya udhayanan 
Sent: Thursday, October 19, 2023 3:04 AM
To: user@flink.apache.org
Subject: Flink SQL exception on using cte

Hi Team,

I have a Flink job which uses the upsert-kafka connector to consume the events 
from two different Kafka topics (confluent avro serialized) and write them to 
two different tables (in Flink's memory using the Flink's SQL DDL statements).

I want to correlate them using the SQL join statements and for this I am trying 
to use the cte expressions like below (sample): But getting  exception as below

 org.apache.flink.table.api.SqlParserException: SQL parse failed. Incorrect 
syntax near the keyword 'INSERT'


WITH p1 AS ( SELECT empId FROM employee )

 FROM p1

INSERT INTO correlate

SELECT pod;
Please let me know if queries with cte are supported in Apache Flink.

Thanks,
Elakiya


Re: Flink HDFS with Flink Kubernetes Operator

2023-10-19 Thread Raihan Sunny via user
That solved it!! Thank you so much!

On Thu, Oct 19, 2023 at 4:32 PM Mate Czagany  wrote:

> Hello,
>
> Please look into using 'kubernetes.decorator.hadoop-conf-mount.enabled'
> [1] that was added for use cases where the user wishes to skip adding these
> Hadoop mount decorators. It's true by default, but by setting it to false
> Flink won't add this mount.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#kubernetes-decorator-hadoop-conf-mount-enabled
>
> Regards,
> Mate
>
> Raihan Sunny via user wrote (on Thu, Oct 19, 2023, 11:48):
>
>> Hi,
>>
>> I've been using HDFS with Flink for checkpoint and savepoint storage
>> which works perfectly fine. Now I have another use case where I want to
>> read and write to HDFS from the application code as well. For this, I'm
>> using the "pyarrow" library which is already installed with PyFlink as a
>> dependency.
>>
>> According to the pyarrow documentation [1], HADOOP_HOME and CLASSPATH
>> environment variables are mandatory. As per the Flink documentation [2],
>> HADOOP_CLASSPATH must be set.
>>
>> I'm using Flink Kubernetes Operator to deploy my application and the
>> issue arises only when I'm using the native mode. When I deploy the
>> application with all the variables above, the JobManager starts up but the
>> TaskManager fails to start with the following error from Kubernetes:
>>
>> MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap
>> "hadoop-config-flink-job" not found
>>
>> It also seems to set a HADOOP_CONF_DIR environment variable on the
>> TaskManager with the value "/opt/hadoop/conf" which doesn't exist as my
>> hadoop installation is elsewhere. If I run the job on standalone mode,
>> everything seems to work fine as the TaskManager doesn't look for a
>> "hadoop-config-volume" to mount. Here's the YAML file for reference:
>>
>> apiVersion: flink.apache.org/v1beta1
>> kind: FlinkDeployment
>> metadata:
>>   name: flink-job
>>   namespace: flink
>> spec:
>>   image: 
>>   imagePullPolicy: Always
>>   flinkVersion: v1_17
>>   flinkConfiguration:
>> taskmanager.numberOfTaskSlots: "1"
>> state.savepoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/savepoints
>> state.checkpoints.dir:
>> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/checkpoints
>> high-availability:
>> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>> high-availability.storageDir:
>> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/ha
>> execution.checkpointing.interval: "3s"
>> execution.checkpointing.unaligned: "true"
>> execution.checkpointing.timeout: "30m"
>>   serviceAccount: flink
>>   podTemplate:
>> spec:
>>   imagePullSecrets:
>> - name: 
>>   containers:
>> - name: flink-main-container
>>   env:
>> - name: HADOOP_HOME
>>   value: /hadoop-3.2.1
>> - name: CLASSPATH
>>   value: 
>> - name: HADOOP_CLASSPATH
>>   value: 
>>   jobManager:
>> resource:
>>   memory: "1024m"
>>   cpu: 0.5
>>   taskManager:
>> resource:
>>   memory: "1024m"
>>   cpu: 0.5
>>   job:
>> jarURI: local:///opt/flink/opt/flink-python_2.12-1.17.0.jar
>> entryClass: "org.apache.flink.client.python.PythonDriver"
>> args: ["python", "-pym", "flink_job"]
>> parallelism: 1
>> upgradeMode: savepoint
>> state: running
>> savepointTriggerNonce: 0
>>
>> Please note that I've also tried installing hadoop to "/opt/" and
>> symlinking the "conf" directory as expected by HADOOP_CONF_DIR. This also
>> didn't work.
>>
>> As mentioned before, if I add "mode: standalone", this job runs without
>> any problem. But since the autoscaling feature only works on the native
>> mode, I need to get it working there. Any help is appreciated.
>>
>> Versions:
>> Flink - 1.17.1
>> PyFlink - 1.17.1
>> Flink Kubernetes Operator - 1.5.0
>>
>>
>> - [1]
>> https://arrow.apache.org/docs/python/filesystems.html#hadoop-distributed-file-system-hdfs
>> - [2]
>> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/configuration/advanced/#hadoop-dependencies
>>
>>
>> Thanks,
>> Sunny
>>

Re: Flink HDFS with Flink Kubernetes Operator

2023-10-19 Thread Mate Czagany
Hello,

Please look into using 'kubernetes.decorator.hadoop-conf-mount.enabled' [1]
that was added for use cases where the user wishes to skip adding these
Hadoop mount decorators. It's true by default, but by setting it to false
Flink won't add this mount.
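
For example, with the FlinkDeployment spec quoted below in this thread, a
minimal sketch (untested) of disabling the decorator would be:

spec:
  flinkConfiguration:
    kubernetes.decorator.hadoop-conf-mount.enabled: "false"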

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/#kubernetes-decorator-hadoop-conf-mount-enabled

Regards,
Mate

Raihan Sunny via user  ezt írta (időpont: 2023. okt.
19., Cs, 11:48):

> Hi,
>
> I've been using HDFS with Flink for checkpoint and savepoint storage which
> works perfectly fine. Now I have another use case where I want to read and
> write to HDFS from the application code as well. For this, I'm using the
> "pyarrow" library which is already installed with PyFlink as a dependency.
>
> According to the pyarrow documentation [1], HADOOP_HOME and CLASSPATH
> environment variables are mandatory. As per the Flink documentation [2],
> HADOOP_CLASSPATH must be set.
>
> I'm using Flink Kubernetes Operator to deploy my application and the issue
> arises only when I'm using the native mode. When I deploy the application
> with all the variables above, the JobManager starts up but the TaskManager
> fails to start with the following error from Kubernetes:
>
> MountVolume.SetUp failed for volume "hadoop-config-volume" : configmap
> "hadoop-config-flink-job" not found
>
> It also seems to set a HADOOP_CONF_DIR environment variable on the
> TaskManager with the value "/opt/hadoop/conf" which doesn't exist as my
> hadoop installation is elsewhere. If I run the job on standalone mode,
> everything seems to work fine as the TaskManager doesn't look for a
> "hadoop-config-volume" to mount. Here's the YAML file for reference:
>
> apiVersion: flink.apache.org/v1beta1
> kind: FlinkDeployment
> metadata:
>   name: flink-job
>   namespace: flink
> spec:
>   image: 
>   imagePullPolicy: Always
>   flinkVersion: v1_17
>   flinkConfiguration:
> taskmanager.numberOfTaskSlots: "1"
> state.savepoints.dir:
> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/savepoints
> state.checkpoints.dir:
> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/checkpoints
> high-availability:
> org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> high-availability.storageDir:
> hdfs://hdfs-namenode-0.hdfs-namenodes.hdfs:8020/flink-data/ha
> execution.checkpointing.interval: "3s"
> execution.checkpointing.unaligned: "true"
> execution.checkpointing.timeout: "30m"
>   serviceAccount: flink
>   podTemplate:
> spec:
>   imagePullSecrets:
> - name: 
>   containers:
> - name: flink-main-container
>   env:
> - name: HADOOP_HOME
>   value: /hadoop-3.2.1
> - name: CLASSPATH
>   value: 
> - name: HADOOP_CLASSPATH
>   value: 
>   jobManager:
> resource:
>   memory: "1024m"
>   cpu: 0.5
>   taskManager:
> resource:
>   memory: "1024m"
>   cpu: 0.5
>   job:
> jarURI: local:///opt/flink/opt/flink-python_2.12-1.17.0.jar
> entryClass: "org.apache.flink.client.python.PythonDriver"
> args: ["python", "-pym", "flink_job"]
> parallelism: 1
> upgradeMode: savepoint
> state: running
> savepointTriggerNonce: 0
>
> Please note that I've also tried by installing hadoop to "/opt/" and
> symlinking the "conf" directory as expected by HADOOP_CONF_DIR. This also
> didn't work.
>
> As mentioned before, if I add "mode: standalone", this job runs without
> any problem. But since the autoscaling feature only works on the native
> mode, I need to get it working there. Any help is appreciated.
>
> Versions:
> Flink - 1.17.1
> PyFlink - 1.17.1
> Flink Kubernetes Operator - 1.5.0
>
>
> - [1]
> https://arrow.apache.org/docs/python/filesystems.html#hadoop-distributed-file-system-hdfs
> - [2]
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/configuration/advanced/#hadoop-dependencies
>
>
> Thanks,
> Sunny
>

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-10-19 Thread Gyula Fóra
Thanks for the feedback

We discussed it with some devs and we are going to release 1.6.1 based on
these patches in the next week or so.

Gyula

On Thu, Oct 19, 2023 at 9:44 AM Evgeniy Lyutikov 
wrote:

> Hi.
> I patched my copy of the 1.6.0 operator with edits from
> https://github.com/apache/flink-kubernetes-operator/pull/673
> This solved the problem
>
>
> --
> *От:* Tony Chen 
> *Отправлено:* 19 октября 2023 г. 4:18:36
> *Кому:* Evgeniy Lyutikov
> *Копия:* user@flink.apache.org; Gyula Fóra
> *Тема:* Re: Flink kubernets operator delete HA metadata after resuming
> from suspend
>
> HI Evgeniy,
>
> Did you rollback your operator version? If yes, did you run into any
> issues?
>
> I ran into the following exception in my flink-kubernetes-operator pod
> while rolling back, and I was wondering if you encountered this.
>
> 2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector  [ERROR] Exception
> occurred while releasing lock 'LeaseLock: flink-kubernetes-operator -
> flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
> Unable to update LeaseLock
> at
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
> Source)
> at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: PUT at:
> https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
> Message: Operation cannot be fulfilled on leases.coordination.k8s.io
> "flink-operator-lease": the object has been modified; please apply your
> changes to the latest version and try again. Received status:
> Status(apiVersion=v1, code=409, details=StatusDetails(causes=[],
> group=coordination.k8s.io, kind=leases, name=flink-operator-lease,
> retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
> message=Operation cannot be fulfilled on leases.coordination.k8s.io
> "flink-operator-lease": the object has been modified; please apply your
> changes to the latest version and try again, metadata=ListMeta(_continue=null,
> remainingItemCount=null, resourceVersion=null, selfLink=null,
> additionalProperties={}), reason=Conflict, status=Failure,
> additionalProperties={}).

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-10-19 Thread Evgeniy Lyutikov
Hi.

I patched my copy of the 1.6.0 operator with edits from 
https://github.com/apache/flink-kubernetes-operator/pull/673
This solved the problem




От: Tony Chen 
Отправлено: 19 октября 2023 г. 4:18:36
Кому: Evgeniy Lyutikov
Копия: user@flink.apache.org; Gyula Fóra
Тема: Re: Flink kubernets operator delete HA metadata after resuming from 
suspend

HI Evgeniy,

Did you rollback your operator version? If yes, did you run into any issues?

I ran into the following exception in my flink-kubernetes-operator pod while 
rolling back, and I was wondering if you encountered this.

2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector  [ERROR] Exception 
occurred while releasing lock 'LeaseLock: flink-kubernetes-operator - 
flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
 Unable to update LeaseLock
at 
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
at 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown 
Source)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
 Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown 
Source)
at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown 
Source)
at 
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
 Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
executing: PUT at:
https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
Message: Operation cannot be fulfilled on leases.coordination.k8s.io
"flink-operator-lease": the object has been modified; please apply your
changes to the latest version and try again. Received status:
Status(apiVersion=v1, code=409, details=StatusDetails(causes=[],
group=coordination.k8s.io, kind=leases, name=flink-operator-lease,
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
message=Operation cannot be fulfilled on leases.coordination.k8s.io
"flink-operator-lease": the object has been modified; please apply your
changes to the latest version and try again, metadata=ListMeta(_continue=null,
remainingItemCount=null, resourceVersion=null, selfLink=null,
additionalProperties={}), reason=Conflict, status=Failure,
additionalProperties={}).

Re: Side outputs from sinks

2023-10-18 Thread Péter Váry
Hi Aian,

Which sink API are you using?
Have you tried the Sink v2 API [1]?

If you implement the WithPostCommitTopology interface [2], then you can
provide a follow-up step after the commits are finished. I have not tried it
yet, but I expect that the failed Committables are emitted as well, and
available for further processing.
Be aware that the Committables arrive at the PostCommitTopology in the
checkpoint cycle following the one in which they are created. So if the data
arrives in checkpoint 1 (C1), then we checkpoint it in C1, commit in
C1.notifyCheckpointCompleted, and the PostCommitTopology will be able to
retry committing it in the following checkpoint (C2).
Also, there is a known open issue at the end of streams: the last
checkpoint's Committables are not processed by the PostCommitTopology at the
moment [3].

Alternatively, you can implement retries in the Sink v2 Committer [4] as
well, and then you don't have to rely on the PostCommitTopology.

I hope this helps,
Péter

- [1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
- [2]
https://nightlies.apache.org/flink/flink-docs-master/api/java//org/apache/flink/streaming/api/connector/sink2/WithPostCommitTopology.html
- [3] https://issues.apache.org/jira/browse/FLINK-30238
- [4] https://issues.apache.org/jira/browse/FLINK-30238
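
For illustration, a rough sketch of the shape of such a sink (not compiled
against a specific Flink version; MyCommittable and the handler body are
hypothetical placeholders):

import org.apache.flink.streaming.api.connector.sink2.CommittableMessage;
import org.apache.flink.streaming.api.connector.sink2.WithPostCommitTopology;
import org.apache.flink.streaming.api.datastream.DataStream;

public abstract class MySink
        implements WithPostCommitTopology<String, MySink.MyCommittable> {

    // Hypothetical committable type, e.g. a file path plus an offset.
    public static class MyCommittable { }

    @Override
    public void addPostCommitTopology(
            DataStream<CommittableMessage<MyCommittable>> committables) {
        // Runs in the checkpoint cycle after the commits; inspect the
        // messages here and retry or route failed committables elsewhere.
        committables
                .map(message -> message) // placeholder for real handling
                .name("post-commit-handler");
    }

    // createWriter(...) and createCommitter(...) are inherited as abstract
    // and omitted here for brevity; a real sink must implement them.
}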

On Wed, Oct 18, 2023, 18:08 Aian Cantabrana  wrote:

> Hi,
>
> We have an use case where we need to ensure that data reaches all
> endpoints/sinks one by one or to split the flow if any of them fails. Here
> is an schema of the use case:
>
> Current job:  Source -> filters/maps/process  -> sink1
>
> \-> sink2
>
> \-> sink3
>
> Desired job: Source -> filters/maps/process -> sink1 -> if OK
> sink2 -> if OK sink3
>
> \-> if NOK sink4 \-> if NOK sink4
>
> In order to achieve it, we would like to have some kind of side output
> coming out from our sinks but side outputs are not available in
> SinkFunctions. The only way we have come up with is to implement the sink
> functionality inside a ProcessFunction or just call the sink.invoke()
> method from inside a process function but we hope there is a better/cleaner
> way to do it.
>
> We are working with the DataStream API in java.
>
> Thanks in advance,
>
> Aian
>
> --
> -
> Aian Cantabrana
>
> ZYLK.net :: consultoría.openSource
> Ribera de Axpe, 11
> Edificio A, modulo 201-203
> 48950 Erandio (Bizkaia)
>
> telf.: +34 747421343
> ofic.: +34 944272119
> -
>


Re: Unsubscribe from user list.

2023-10-18 Thread Zakelly Lan
Hi Lijuan,

Please send an email to user-unsubscr...@flink.apache.org if you want to
unsubscribe from user@flink.apache.org.


Best,
Zakelly

On Thu, Oct 19, 2023 at 6:23 AM Hou, Lijuan via user
 wrote:
>
> Hi team,
>
>
>
> Could you please remove this email from the subscription list? I have another 
> email (juliehou...@gmail.com) subscribed as well. I can use that email to 
> receive flink emails.
>
>
>
> Thank you!
>
>
>
> Best,
>
> Lijuan


Re: What to do about local disks with RocksDB with Kubernetes Operator

2023-10-18 Thread Yanfei Lei
Hi Alex,

AFAIK, the emptyDir[1] can be used directly as local disks, and
emptyDir can be defined by referring to this pod template[2].

If you want to use local disks through a PV, you can first create a
StatefulSet and mount the PV through volume claim templates[3]; the
example “Local Recovery Enabled TaskManager StatefulSet”[4] provided
in the docs may be useful.

[1] https://kubernetes.io/docs/concepts/storage/volumes/#emptydir
[2] 
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/pod-template/#pod-template
[3] 
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#volume-claim-templates
[4] 
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/#local-recovery-enabled-taskmanager-statefulset
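
For illustration, a rough sketch of the emptyDir route in a FlinkDeployment
pod template (the mount path and the localdir option value are assumptions,
untested):

spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            - name: rocksdb-local
              mountPath: /rocksdb
      volumes:
        - name: rocksdb-local
          emptyDir: {}
  flinkConfiguration:
    state.backend.rocksdb.localdir: /rocksdb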

Best,
Yanfei

Alex Craig  于2023年10月19日周四 02:57写道:
>
> The recommended practice for RocksDB usage is to have local disks accessible 
> to it. The Kubernetes Operator doesn’t have fields related to creating disks 
> for RocksDB to use.
>
>
>
> For instance, say I have maxParallelism=10 but parallelism=1. I have a 
> statically created PVC named “flink-rocksdb”. The first TaskManager spins up 
> and mounts this PVC. But successive ones fail to start because there is no 
> PVC for them to mount.
>
>
>
> Has anybody solved this? Seems like a big issue with using Flink in 
> Kubernetes…


Re: Flink Operator 1.6 causes JobManagerDeploymentStatus: MISSING

2023-10-18 Thread Tony Chen
I did see another email thread that mentions instructions on getting the
image from this link:
https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e

On Wed, Oct 18, 2023 at 6:25 PM Tony Chen  wrote:

> We're using the Helm chart to deploy the operator right now, and the image
> that I'm using was downloaded from Docker Hub:
> https://hub.docker.com/r/apache/flink-kubernetes-operator/tags. I
> wouldn't be able to use the release-1.6 branch (
> https://github.com/apache/flink-kubernetes-operator/commits/release-1.6)
> to pick up the fix, unless I'm missing something.
>
> I was attempting to rollback the operator version to 1.4 today, and I ran
> into the following issues on some operator pods. I was wondering if you
> have seen these Lease issues before.
>
> 2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector  [ERROR] Exception
> occurred while releasing lock 'LeaseLock: flink-kubernetes-operator -
> flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
> Unable to update LeaseLock
> at
> io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
> at
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
> Source)
> at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
> at
> java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
> Source)
> at
> java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
> Source)
> at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
> at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
> executing: PUT at:
> https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
> Message: Operation cannot be fulfilled on leases.coordination.k8s.io 
> "flink-operator-lease":
> the object has been modified; please apply your changes to the latest
> version and try again. Received status: Status(apiVersion=v1, code=409,
> details=StatusDetails(causes=[], group=coordination.k8s.io, kind=leases,
> name=flink-operator-lease, retryAfterSeconds=null, uid=null,
> additionalProperties={}), kind=Status, message=Operation cannot be
> fulfilled on leases.coordination.k8s.io "flink-operator-lease": the
> object has been modified; please apply your changes to the latest version
> and try again, metadata=ListMeta(_continue=null, remainingItemCount=null,
> resourceVersion=null, selfLink=null, additionalProperties={}),
> reason=Conflict, status=Failure, additionalProperties={}).
>
> On Wed, Oct 18, 2023 at 2:55 PM Gyula Fóra  wrote:
>
>> Hi!
>> Not sure if it’s the same but could you try picking up the fix from the
>> release branch and confirming that it solves the problem?
>>
>> If it does we may consider a quick bug fix release.
>>
>> Cheers
>> Gyula
>>
>> On Wed, 18 Oct 2023 at 18:09, Tony Chen  wrote:
>>
>>> Hi Flink Community,
>>>
>>> Most of the Flink applications run on 1.14 at my company. After
>>> upgrading the Flink Operator to 1.6, we've seen many jobmanager pods show
>>> "JobManagerDeploymentStatus: MISSING".
>>>
>>> Here are some logs from the operator pod on one of our Flink
>>> applications:
>>>
>>> 2023-10-18 02:02:40,823 o.a.f.k.o.l.AuditUtils
>>> [INFO ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>>> SAVEPOINTERROR | Savepoint failed for savepointTriggerNonce: null
>>> ...
>>> 2023-10-18 02:02:40,883 o.a.f.k.o.l.AuditUtils
>>> [INFO ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>>> CLUSTERDEPLOYMENTEXCEPTION | Status have been modified externally in
>>> version 17447422864 Previous: 
>>> ...
>>> 2023-10-18 02:02:40,919 i.j.o.p.e.ReconciliationDispatcher
>>> [ERROR][nemo/nemo-streaming-users-identi-updates] Error during
>>> event processing ExecutionScope{ resource id:
>>> 

Re: Flink Operator 1.6 causes JobManagerDeploymentStatus: MISSING

2023-10-18 Thread Tony Chen
We're using the Helm chart to deploy the operator right now, and the image
that I'm using was downloaded from Docker Hub:
https://hub.docker.com/r/apache/flink-kubernetes-operator/tags. I wouldn't
be able to use the release-1.6 branch (
https://github.com/apache/flink-kubernetes-operator/commits/release-1.6) to
pick up the fix, unless I'm missing something.

I was attempting to rollback the operator version to 1.4 today, and I ran
into the following issues on some operator pods. I was wondering if you
have seen these Lease issues before.

2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector  [ERROR] Exception
occurred while releasing lock 'LeaseLock: flink-kubernetes-operator -
flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
Unable to update LeaseLock
at
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)
at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
executing: PUT at:
https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
Message: Operation cannot be fulfilled on leases.coordination.k8s.io
"flink-operator-lease":
the object has been modified; please apply your changes to the latest
version and try again. Received status: Status(apiVersion=v1, code=409,
details=StatusDetails(causes=[], group=coordination.k8s.io, kind=leases,
name=flink-operator-lease, retryAfterSeconds=null, uid=null,
additionalProperties={}), kind=Status, message=Operation cannot be
fulfilled on leases.coordination.k8s.io "flink-operator-lease": the object
has been modified; please apply your changes to the latest version and try
again, metadata=ListMeta(_continue=null, remainingItemCount=null,
resourceVersion=null, selfLink=null, additionalProperties={}),
reason=Conflict, status=Failure, additionalProperties={}).

On Wed, Oct 18, 2023 at 2:55 PM Gyula Fóra  wrote:

> Hi!
> Not sure if it’s the same but could you try picking up the fix from the
> release branch and confirming that it solves the problem?
>
> If it does we may consider a quick bug fix release.
>
> Cheers
> Gyula
>
> On Wed, 18 Oct 2023 at 18:09, Tony Chen  wrote:
>
>> Hi Flink Community,
>>
>> Most of the Flink applications run on 1.14 at my company. After upgrading
>> the Flink Operator to 1.6, we've seen many jobmanager pods show
>> "JobManagerDeploymentStatus: MISSING".
>>
>> Here are some logs from the operator pod on one of our Flink applications:
>>
>> 2023-10-18 02:02:40,823 o.a.f.k.o.l.AuditUtils [INFO
>> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>> SAVEPOINTERROR | Savepoint failed for savepointTriggerNonce: null
>> ...
>> 2023-10-18 02:02:40,883 o.a.f.k.o.l.AuditUtils [INFO
>> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
>> CLUSTERDEPLOYMENTEXCEPTION | Status have been modified externally in
>> version 17447422864 Previous: 
>> ...
>> 2023-10-18 02:02:40,919 i.j.o.p.e.ReconciliationDispatcher
>> [ERROR][nemo/nemo-streaming-users-identi-updates] Error during
>> event processing ExecutionScope{ resource id:
>> ResourceID{name='nemo-streaming-users-identi-updates', namespace='nemo'},
>> version: 17447420285} failed.
>> ...
>> org.apache.flink.kubernetes.operator.exception.ReconciliationException:
>> org.apache.flink.kubernetes.operator.exception.StatusConflictException:
>> Status have been modified externally in version 17447422864 Previous:
>> 
>> ...
>> 2023-10-18 02:03:03,273 o.a.f.k.o.o.d.ApplicationObserver

Re: Flink kubernets operator delete HA metadata after resuming from suspend

2023-10-18 Thread Tony Chen
HI Evgeniy,

Did you rollback your operator version? If yes, did you run into any issues?

I ran into the following exception in my flink-kubernetes-operator pod
while rolling back, and I was wondering if you encountered this.

2023-10-18 21:01:15,251 i.f.k.c.e.l.LeaderElector  [ERROR] Exception
occurred while releasing lock 'LeaseLock: flink-kubernetes-operator -
flink-operator-lease (flink-kubernetes-operator-74f9688dd-bcqr2)'
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LockException:
Unable to update LeaseLock
at
io.fabric8.kubernetes.client.extended.leaderelection.resourcelock.LeaseLock.update(LeaseLock.java:102)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.release(LeaderElector.java:139)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.stopLeading(LeaderElector.java:120)
at
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector.lambda$start$2(LeaderElector.java:104)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown
Source)
at io.fabric8.kubernetes.client.utils.Utils.lambda$null$12(Utils.java:523)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown
Source)
at
java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown
Source)
at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure
executing: PUT at:
https://10.241.0.1/apis/coordination.k8s.io/v1/namespaces/flink-kubernetes-operator/leases/flink-operator-lease.
Message: Operation cannot be fulfilled on leases.coordination.k8s.io
"flink-operator-lease": the object has been modified; please apply your
changes to the latest version and try again. Received status:
Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=
coordination.k8s.io, kind=leases, name=flink-operator-lease,
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
message=Operation cannot be fulfilled on leases.coordination.k8s.io
"flink-operator-lease": the object has been modified; please apply your
changes to the latest version and try again,
metadata=ListMeta(_continue=null, remainingItemCount=null,
resourceVersion=null, selfLink=null, additionalProperties={}),
reason=Conflict, status=Failure, additionalProperties={}).

On Tue, Sep 12, 2023 at 5:51 AM Gyula Fóra  wrote:

> Hi!
>
> I think this issue is the same as
> https://issues.apache.org/jira/browse/FLINK-33011
> Not sure what exactly is the underlying cause as I could not repro it, but
> the fix should be simple.
>
> Also I believe it's not 1.6.0 related unless a JOSDK/Fabric8 upgrade
> caused it.
>
> Cheers,
> Gyula
>
>
> On Mon, Sep 11, 2023 at 7:47 PM Gyula Fóra  wrote:
>
>> You don’t need it but you can really mess up clusters by rolling back CRD
>> changes…
>>
>> On Mon, 11 Sep 2023 at 19:42, Evgeniy Lyutikov 
>> wrote:
>>
>>> Why we need to use latest CRD version with older operator version?
>>> --
>>> *От:* Gyula Fóra 
>>> *Отправлено:* 12 сентября 2023 г. 0:36:26
>>>
>>> *Кому:* Evgeniy Lyutikov
>>> *Копия:* user@flink.apache.org
>>> *Тема:* Re: Flink kubernets operator delete HA metadata after resuming
>>> from suspend
>>>
>>> Do not change the CRD but you can roll back the operator itself I
>>> believe
>>>
>>> Gyula
>>>
>>> On Mon, 11 Sep 2023 at 18:52, Evgeniy Lyutikov 
>>> wrote:
>>>
>>>> Is it safe to rollback the operator version with replace to old CRDs?
>>>> --
>>>> *От:* Evgeniy Lyutikov 
>>>> *Отправлено:* 11 сентября 2023 г. 23:50:26
>>>> *Кому:* Gyula Fóra
>>>>
>>>> *Копия:* user@flink.apache.org
>>>> *Тема:* Re: Flink kubernets operator delete HA metadata after resuming
>>>> from suspend
>>>>
>>>>
>>>> Hi!
>>>> No, no one could restart jobmanager,
>>>> I monitored the pods in real time, they all deleted when suspended as
>>>> expected.
>>>>

Re: Flink Operator 1.6 causes JobManagerDeploymentStatus: MISSING

2023-10-18 Thread Gyula Fóra
Hi!
Not sure if it’s the same but could you try picking up the fix from the
release branch and confirming that it solves the problem?

If it does we may consider a quick bug fix release.

Cheers
Gyula

On Wed, 18 Oct 2023 at 18:09, Tony Chen  wrote:

> Hi Flink Community,
>
> Most of the Flink applications run on 1.14 at my company. After upgrading
> the Flink Operator to 1.6, we've seen many jobmanager pods show
> "JobManagerDeploymentStatus: MISSING".
>
> Here are some logs from the operator pod on one of our Flink applications:
>
> 2023-10-18 02:02:40,823 o.a.f.k.o.l.AuditUtils [INFO
> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
> SAVEPOINTERROR | Savepoint failed for savepointTriggerNonce: null
> ...
> 2023-10-18 02:02:40,883 o.a.f.k.o.l.AuditUtils [INFO
> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning |
> CLUSTERDEPLOYMENTEXCEPTION | Status have been modified externally in
> version 17447422864 Previous: 
> ...
> 2023-10-18 02:02:40,919 i.j.o.p.e.ReconciliationDispatcher
> [ERROR][nemo/nemo-streaming-users-identi-updates] Error during
> event processing ExecutionScope{ resource id:
> ResourceID{name='nemo-streaming-users-identi-updates', namespace='nemo'},
> version: 17447420285} failed.
> ...
> org.apache.flink.kubernetes.operator.exception.ReconciliationException:
> org.apache.flink.kubernetes.operator.exception.StatusConflictException:
> Status have been modified externally in version 17447422864 Previous:
> 
> ...
> 2023-10-18 02:03:03,273 o.a.f.k.o.o.d.ApplicationObserver
> [ERROR][nemo/nemo-streaming-users-identi-updates] Missing JobManager
> deployment
> ...
> 2023-10-18 02:03:03,295 o.a.f.k.o.l.AuditUtils [INFO
> ][nemo/nemo-streaming-users-identi-updates] >>> Event | Warning | MISSING |
> Missing JobManager deployment
> 2023-10-18 02:03:03,295 o.a.f.c.Configuration [WARN
> ][nemo/nemo-streaming-users-identi-updates] Config uses deprecated
> configuration key 'high-availability' instead of proper key
> 'high-availability.type'
>
>
> This seems related to this email thread:
> https://www.mail-archive.com/user@flink.apache.org/msg51439.html.
> However, I believe that we're not seeing the HA metadata getting deleted.
>
> What could cause the JobManagerDeploymentStatus to be MISSING?
>
> Thanks,
> Tony
>
> --
>
> 
>
> Tony Chen
>
> Software Engineer
>
> Menlo Park, CA
>
> Don't copy, share, or use this email without permission. If you received
> it by accident, please let us know and then delete it right away.
>


RE: RocksDB: Memory is slowly increasing over time

2023-10-18 Thread Guozhen Yang
Hi Patrick,

We have encountered the same issue: the TaskManager's memory consumption
increases almost monotonically.

I'll try to describe what we have observed and our solution. You can check
if it would solve the problem.

We have observed that
1. Jobs with the RocksDB state backend would fail after a random period of
time after deployment. All failures were TaskManager pods being OOM-killed by
k8s; no JVM exceptions like "java.lang.OutOfMemoryError: Java heap space" ever
happened.
2. TaskManager pods were still OOM killed by k8s after setting
kubernetes.taskmanager.memory.limit-factor to
some number larger than 1.0, like 1.5 or 2.0. The config
kubernetes.taskmanager.memory.limit-factor controls
the ratio between memory limit and request submitted to k8s.
3. There would be a longer period of time before the first OOM-kill event if
we requested more memory from k8s. container_memory_working_set_bytes
increased almost monotonically, and pods got OOM-killed when
container_memory_working_set_bytes hit the configured memory limit.
4. We used jeprof to profile RocksDB's native memory allocation. We found
no memory leak. RocksDB's overall native memory consumption was less than
managed memory configured.
5. container_memory_rss was way larger than memory requested. We gathered
some statistics by using jcmd and jeprof. It turns out that rss was way
larger than the memory size consumed by JVM and Jemalloc.

So we draw conclusions from what we have observed
1. The memory issue is not caused by the JVM, since we would see JVM
exceptions in the logs if it were.
2. We googled and found that Flink cannot control RocksDB's memory
consumption precisely, so there will be some memory over-use. But we believe
that is not our case: we request 8GB of memory and limit it to 16GB, and any
over-use should not be of that size.
3. We suspected there might be a memory leak; there are many online
discussions about memory leaks caused by RocksDB.
4. But since the jeprof result showed no memory leak, we ruled out the
suspicion that RocksDB leaked memory and caused the memory issue.
5. We suspected Jemalloc caused the memory issue, since the jeprof result
showed no problem while container_memory_rss indicated one; i.e. there was a
gap between the memory statistics I mentioned above and the
container_memory_rss metric.

We did some searching and found that Jemalloc does not cope well with
Transparent Huge Pages (THP), which were set to 'always' by default on our
hosts. Briefly speaking, Jemalloc requests huge pages (2MB pages instead of
4KB pages) from the kernel and tells the kernel that part of a 2MB page is
unused so the kernel can free that part. But since the kernel does not split
the huge page into normal 4KB pages, it will never free the huge page until
the whole page is marked as freeable.

Many database systems like redis, mongo, oracle recommend to disable
Transparent Huge Page. So we disabled this kernel function. After disabling
Transparent Huge Page, we observed no memory issue anymore.
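
For reference, a common way to disable THP at runtime (the exact sysfs path
can vary by distro, and the change does not survive reboots unless made
permanent, e.g. via a kernel boot parameter) is:

echo never > /sys/kernel/mm/transparent_hugepage/enabled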

Hope our experience will help you.


On 2023/10/17 13:41:02 "Eifler, Patrick" wrote:
> Hello,
>
> We are running Flink jobs on K8s and using RocksDB as state backend. It
is connected to S3 for checkpointing. We have multiple states in the job
(mapstate and value states). We are seeing a slow but steady increase in
memory consumption over time. We only see this in our jobs connected to
RocksDB.
>
>
> We are currently using the default memory setting
(state-backend-rocksdb-memory-managed=true). Now we are wondering what a
good alternative setting would be. We want to try to enable
the state.backend.rocksdb.memory.partitioned-index-filters option, but it only takes
effect if the managed memory is turned off, so we need to figure out what
would be a good amount for memory.fixed-per-slot.
>
> Any hint what a good indicator for that calculation would be?
> Any other experience if someone has seen similar behavior before would
also be much appreciated.
> Thanks!
>
> Best Regards
>
> --
> Patrick Eifler
>
>


Re: Metrics with labels

2023-10-17 Thread Chesnay Schepler

> I think this is a general issue with the Flink metrics.

Not quite. There are a few instances in Flink where code wasn't updated to
encode metadata as additional labels, and the RocksDB metrics may be one
of them.
Also for RocksDB, you could try setting 
"state.backend.rocksdb.metrics.column-family-as-variable: true" to 
resolve this particular problem.


> If I define a custom metric, it is not supported to use labels

You can do so via MetricGroup#addGroup(String key, String value).
See 
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/ops/metrics/#user-variables
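
For example, a minimal sketch (metric and group names are illustrative) of a
user metric that carries a dynamic label:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

public class LabeledMetricMap extends RichMapFunction<String, String> {
    private transient Counter counter;

    @Override
    public void open(Configuration parameters) {
        // "state_descriptor" becomes a user variable (label) on the metric
        // instead of being baked into the metric name.
        counter = getRuntimeContext()
                .getMetricGroup()
                .addGroup("state_descriptor", "my_state_descriptor")
                .counter("events_processed");
    }

    @Override
    public String map(String value) {
        counter.inc();
        return value;
    }
}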


On 17/10/2023 14:31, Lars Skjærven wrote:

Hello,

We're experiencing difficulties in using Flink metrics in a generic 
way since various properties are included in the name of the metric 
itself. This makes it difficult to generate sensible (and general) 
dashboards (with aggregations).


One example is the metric for the RocksDB estimated live data size
(state.backend.rocksdb.metrics.estimate-live-data-size). The name
appears as:
flink_taskmanager_job_task_operator__state_rocksdb_estimate_live_data_size.


If, on the other hand, the state name were included as a label, this
would facilitate aggregation across states, i.e.:

flink_taskmanager_job_task_operator_state_rocksdb_estimate_live_data_size{state_descriptor="my_state_descriptor"}

I think this is a general issue with the Flink metrics. If I define a 
custom metric, it is not supported to use labels 
(https://prometheus.io/docs/practices/naming/#labels) in a dynamic way.


Thanks !

Lars



Re: [EXTERNAL]Re: Next Release of the Flink Kubernetes Operator

2023-10-17 Thread Gyula Fóra
Building your own custom operator image is fairly straightforward and is
also a good exercise to make sure you can easily backport critical fixes
into your prod envs in the future :)

Cheers,
Gyula

On Tue, Oct 17, 2023 at 10:48 AM Niklas Wilcke 
wrote:

> Hi Gyula,
>
> thank you very much for the update about the release schedule and for
> pointing me to the snapshot images. This is indeed very helpful and we will
> consider our options now.
>
> Regards,
> Niklas
>
> On 16. Oct 2023, at 17:56, Gyula Fóra  wrote:
>
> Hi Niklas!
>
> We weren't planning a 1.6.1 release and instead we were focusing on
> wrapping up changes for the 1.7.0 release coming in a month or so.
>
> However if there is enough interest and we have some committers/PMC
> willing to help with the release we can always do 1.6.1 but I personally
> don't have the bandwidth for it right now.
>
> In case you want to pick up the latest 1.6 backported fixes, you could
> either create a custom operator build or simply pick up the image built
> from the branch automatically [1]
> This would probably be an easy way to get the fixes into your environment.
> But of course this is not an official release, just a snapshot version so
> you have to do your own verification to make sure it works for you.
>
> Cheers,
> Gyula
>
> [1]
> https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e
>
> On Mon, Oct 16, 2023 at 5:41 PM Niklas Wilcke 
> wrote:
>
>> Hi Flink Community,
>>
>> we are waiting for the next release of the Flink Kubernetes Operator,
>> because we are experiencing problems with losing the HA metadata similar
>> to FLINK-33011 [0].
>> Since the problem is already fixed and also backported to the 1.6 branch
>> [1], my question would be whether we can expect a release soon?
>> Any input is highly appreciated. Thank you!
>>
>> Regards,
>> Niklas
>>
>>
>> [0] https://issues.apache.org/jira/browse/FLINK-33011
>> [1]
>> https://github.com/apache/flink-kubernetes-operator/commits/release-1.6
>>
>
>


Re: [EXTERNAL]Re: Next Release of the Flink Kubernetes Operator

2023-10-17 Thread Niklas Wilcke
Hi Gyula,

thank you very much for the update about the release schedule and for pointing 
me to the snapshot images. This is indeed very helpful and we will consider our 
options now.

Regards,
Niklas

> On 16. Oct 2023, at 17:56, Gyula Fóra  wrote:
> 
> Hi Niklas!
> 
> We weren't planning a 1.6.1 release and instead we were focusing on wrapping 
> up changes for the 1.7.0 release coming in a month or so.
> 
> However if there is enough interest and we have some committers/PMC willing 
> to help with the release we can always do 1.6.1 but I personally don't have 
> the bandwidth for it right now. 
> 
> In case you want to pick up the latest 1.6 backported fixes, you could either 
> create a custom operator build or simply pick up the image built from the 
> branch automatically [1]
> This would probably be an easy way to get the fixes into your environment. 
> But of course this is not an official release, just a snapshot version so you 
> have to do your own verification to make sure it works for you.
> 
> Cheers,
> Gyula
> 
> [1] 
> https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e
> 
> On Mon, Oct 16, 2023 at 5:41 PM Niklas Wilcke  > wrote:
>> Hi Flink Community,
>> 
>> we are waiting for the next release of the Flink Kubernetes Operator, 
>> because we are experiencing problems with losing the HA metadata similar to
>> FLINK-33011 [0].
>> Since the problem is already fixed and also backported to the 1.6 branch 
>> [1], my question would be whether we can expect a release soon?
>> Any input is highly appreciated. Thank you!
>> 
>> Regards,
>> Niklas
>> 
>> 
>> [0] https://issues.apache.org/jira/browse/FLINK-33011
>> [1] https://github.com/apache/flink-kubernetes-operator/commits/release-1.6





Re: Flink SQL state cleanup

2023-10-17 Thread Jane Chan
Hi,

If you are using a standalone session cluster and want the option to show up
in the JM/TM logs, you need to configure table.exec.state.ttl: '${TTL}' in
flink-conf.yaml before the cluster is started, and then start the cluster.
If the option is changed after the cluster has started, it will not be
printed in the logs; you can check the options currently configured in the
SQL CLI with the SET; command.
Also, SET 'table.exec.state.ttl' = '${TTL}' needs to be executed first, and
the job submitted afterwards. Please double-check that the order of these
steps is correct.
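
A minimal SQL CLI session illustrating the order (the TTL value and table
names are just examples):

SET 'table.exec.state.ttl' = '1 h';
-- optionally verify the current session options:
SET;
-- then submit the job:
INSERT INTO sink_table SELECT id, COUNT(*) FROM source_table GROUP BY id;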

Best,
Jane

On Mon, Oct 9, 2023 at 6:01 PM 小昌同学  wrote:

> Hi, I set it up the same way. I'm using the Flink SQL client, but I don't
> see this configuration taking effect in the Flink web UI.
>
>
> 小昌同学
> ccc0606fight...@163.com
>  Original message 
> From: Jane Chan
> Date: 2023-09-25 11:24
> To:
> Subject: Re: Flink SQL state cleanup
> Hi,
>
> You can control the state TTL of stateful operators by setting
> table.exec.state.ttl. See [1] for more information.
>
> [1]
>
> https://nightlies.apache.org/flink/flink-docs-master/zh/docs/dev/table/concepts/overview/#%e7%8a%b6%e6%80%81%e7%ae%a1%e7%90%86
>
> Best,
> Jane
>
> On Thu, Sep 21, 2023 at 5:17 PM faronzz  wrote:
>
> Try this: t_env.get_config().set("table.exec.state.ttl", "86400 s")
>
>
>
>
> faronzz
> faro...@163.com
>
>
>  Original message 
> From: 小昌同学
> Date: 2023-09-21 17:06
> To: user-zh
> Subject: Flink SQL state cleanup
>
>
> Hi everyone, I'd like to ask about state cleanup in Flink SQL. When I
> search online I only find the related minibatch settings. Does Flink SQL
> not have a state TTL configuration?
> 小昌同学
> ccc0606fight...@163.com
>


Re: When will 1.18 be released? Would like to try JDK 17

2023-10-16 Thread Jing Ge
Soon, the vote has already started :-))

On Sun, Oct 15, 2023 at 5:55 AM kcz <573693...@qq.com.invalid> wrote:

>


Re: Next Release of the Flink Kubernetes Operator

2023-10-16 Thread Gyula Fóra
Hi Niklas!

We weren't planning a 1.6.1 release and instead we were focusing on
wrapping up changes for the 1.7.0 release coming in a month or so.

However if there is enough interest and we have some committers/PMC willing
to help with the release we can always do 1.6.1 but I personally don't have
the bandwidth for it right now.

In case you want to pick up the latest 1.6 backported fixes, you could
either create a custom operator build or simply pick up the image built
from the branch automatically [1]
This would probably be an easy way to get the fixes into your environment.
But of course this is not an official release, just a snapshot version so
you have to do your own verification to make sure it works for you.

Cheers,
Gyula

[1]
https://github.com/apache/flink-kubernetes-operator/pkgs/container/flink-kubernetes-operator/127962962?tag=3f0dc2e

On Mon, Oct 16, 2023 at 5:41 PM Niklas Wilcke 
wrote:

> Hi Flink Community,
>
> we are waiting for the next release of the Flink Kubernetes Operator,
> because we are experiencing problems with losing the HA metadata similar
> to FLINK-33011 [0].
> Since the problem is already fixed and also backported to the 1.6 branch
> [1], my question would be whether we can expect a release soon?
> Any input is highly appreciated. Thank you!
>
> Regards,
> Niklas
>
>
> [0] https://issues.apache.org/jira/browse/FLINK-33011
> [1]
> https://github.com/apache/flink-kubernetes-operator/commits/release-1.6
>


Re: Plans for upgrading Curl with latest 8.4.0

2023-10-16 Thread Martijn Visser
Hi Ankur,

This is used during CI runs, but it's not bundled/distributed with
Flink itself.

Best regards,

Martijn

On Thu, Oct 12, 2023 at 12:09 PM Singhal, Ankur  wrote:
>
> Hi Matijn,
>
> This is just a reference but we are using it at multiple places.
> https://github.com/apache/flink/blob/master/tools/ci/maven-utils.sh#L59
>
> Although the hostname we are referring to here is hardcoded, so it can be
> mitigated.
>
> Thanks and Regards,
> Ankur Singhal
>
> -Original Message-
> From: Martijn Visser 
> Sent: Thursday, October 12, 2023 3:24 PM
> To: Singhal, Ankur 
> Cc: user@flink.apache.org
> Subject: Re: Plans for upgrading Curl with latest 8.4.0
>
> Hi Ankur,
>
> Where do you see Flink using/bundling Curl?
>
> Best regards,
>
> Martijn
>
> On Wed, Oct 11, 2023 at 9:08 AM Singhal, Ankur  wrote:
> >
> > Hi Team,
> >
> >
> >
> > Do we have any plans to update Flink to support Curl 8.4.0, given that
> > earlier versions have severe vulnerabilities?
> >
> >
> >
> > Thanks & Regards,
> >
> > Ankur Singhal
> >
> >


Re: File Source Watermark Issue

2023-10-16 Thread Martijn Visser
Hi Kirti Dhar,

There isn't really enough information to answer it: are you using
Flink in bounded mode, how have you created your job, what is
appearing in the logs etc.

Best regards,

Martijn

On Mon, Oct 16, 2023 at 7:01 AM Kirti Dhar Upadhyay K via user
 wrote:
>
> Hi Community,
>
>
>
> Can someone help me here?
>
>
>
> Regards,
>
> Kirti Dhar
>
>
>
> From: Kirti Dhar Upadhyay K
> Sent: 10 October 2023 15:52
> To: user@flink.apache.org
> Subject: File Source Watermark Issue
>
>
>
> Hi Team,
>
>
>
> I am using the Flink File Source with a window aggregator as the process
> function, and I am stuck with a weird issue.
>
> The file source doesn't seem to emit/progress watermarks, whereas if I put a
> delay (say 100 ms) while extracting the timestamp from an event, it works
> fine.
>
>
>
> I found something similar in the comments here:
> https://stackoverflow.com/questions/68736330/flink-watermark-not-advancing-at-all-stuck-at-9223372036854775808/68743019#68743019
>
>
>
> Can someone help me here?
>
>
>
> Regards,
>
> Kirti Dhar


Re: Flink 1.17.1 with 1.8 projects

2023-10-16 Thread Martijn Visser
Hi Patricia,

There's no guarantee of compatibility between different Flink minor
versions and it's not supported. If it works, that can be specific to
this use case and could break at any time. It's up to you to determine
if that is sufficient for you or not.

Best regards,

Martijn

On Mon, Oct 16, 2023 at 9:48 AM patricia lee  wrote:
>
> Hi,
>
> Some of my colleagues are using Flink 1.17.1 server but with projects with 
> Flink 1.8 libraries, so far the projects are working fine without issue for a 
> month now.
>
> Will there be any issue that we are not just aware of, if we continue with 
> this kind of set up env? Appreciate any response.
>
>
> Regards,
> Patricia


Re: Table API in process function

2023-10-15 Thread Feng Jin
Hi Yashoda,

I think this is not a reasonable approach, and it is not supported at the
moment.

I suggest converting the DataStream generated by windowAll into a Table, and
then using the Table API:

AllWindowProcess -> ConvertDataStreamToTable ->  ProcessUsingTableAPI
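
A rough sketch of that pipeline (class and column names are hypothetical, and
the exact conversion details may differ by Flink version):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// DataStream side: the all-window process function runs first.
DataStream<Row> enriched = events
        .windowAll(TumblingEventTimeWindows.of(Time.minutes(5)))
        .process(new MyAllWindowProcess());

// Then switch to the Table API on the result and continue with SQL.
tEnv.createTemporaryView("enriched", tEnv.fromDataStream(enriched));
Table result = tEnv.sqlQuery("SELECT user_id, COUNT(*) FROM enriched GROUP BY user_id");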


Best,
Feng

On Fri, Oct 13, 2023 at 9:31 PM Yashoda Krishna T 
wrote:

> Is it possible to use the Table API inside a processAll window function?
> Let's say the use case is that the process function should enrich each
> element by running SQL queries over all the elements in the window using
> the Table API. Is this case supported in Flink? If not, what is the
> suggested way?
>
> Thanks
>


RE: File Source Watermark Issue

2023-10-15 Thread Kirti Dhar Upadhyay K via user
Hi Community,

Can someone help me here?

Regards,
Kirti Dhar

From: Kirti Dhar Upadhyay K
Sent: 10 October 2023 15:52
To: user@flink.apache.org
Subject: File Source Watermark Issue

Hi Team,

I am using the Flink File Source with a window aggregator as the process
function, and I am stuck with a weird issue.
The file source doesn't seem to emit/progress watermarks, whereas if I put a
delay (say 100 ms) while extracting the timestamp from an event, it works fine.

I found something similar in the comments here:
https://stackoverflow.com/questions/68736330/flink-watermark-not-advancing-at-all-stuck-at-9223372036854775808/68743019#68743019

Can someone help me here?

Regards,
Kirti Dhar


Re: NPE in Calcite RelMetadataQueryBase

2023-10-15 Thread Jad Naous
I've now tracked this down to the fact that my application is trying to run
a query in a separate thread from the one that set up the table environment.
Is StreamTableEnvironment supposed to be thread-safe? Can it be a singleton
that is available to multiple threads for serving queries, or should each
thread build its own?

Thanks!

Jad Naous 
Grepr, CEO/Founder



On Fri, Oct 13, 2023 at 5:42 PM Jad Naous  wrote:

> Hi all,
>
> We're using the Table API to read in batch mode from an Iceberg table on
> s3 using DynamoDB as the catalog implementation. We're hitting a NPE in the
> Calcite planner. The same query works fine when using the local FS and
> in-memory catalog. Below is a snipped stacktrace. Any thoughts on how to
> troubleshoot this?Any help would be much appreciated!
> Thanks!
> Jad
>
> java.lang.NullPointerException: metadataHandlerProvider
> at java.base/java.util.Objects.requireNonNull(Objects.java:246)
> at
> org.apache.calcite.rel.metadata.RelMetadataQueryBase.getMetadataHandlerProvider(RelMetadataQueryBase.java:122)
> at
> org.apache.calcite.rel.metadata.RelMetadataQueryBase.revise(RelMetadataQueryBase.java:118)
> at
> org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:604)
> at
> org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:291)
> at
> org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:125)
> at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:244)
> at
> org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:124)
> at
> org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:114)
> at
> org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:178)
> at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:2191)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1970)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:165)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter$SingleRelVisitor.visit(QueryOperationConverter.java:158)
> at
> org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:155)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:135)
> at
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:47)
> at
> org.apache.flink.table.operations.ProjectQueryOperation.accept(ProjectQueryOperation.java:76)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:154)
> at
> java.base/java.util.Collections$SingletonList.forEach(Collections.java:4856)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:154)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:135)
> at
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:72)
> at
> org.apache.flink.table.operations.FilterQueryOperation.accept(FilterQueryOperation.java:67)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.lambda$defaultMethod$0(QueryOperationConverter.java:154)
> at
> java.base/java.util.Collections$SingletonList.forEach(Collections.java:4856)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:154)
> at
> org.apache.flink.table.planner.plan.QueryOperationConverter.defaultMethod(QueryOperationConverter.java:135)
> at
> org.apache.flink.table.operations.utils.QueryOperationDefaultVisitor.visit(QueryOperationDefaultVisitor.java:82)
> at
> org.apache.flink.table.operations.SortQueryOperation.accept(SortQueryOperation.java:93)
> at
> org.apache.flink.table.planner.calcite.FlinkRelBuilder.queryOperation(FlinkRelBuilder.java:261)
> at
> org.apache.flink.table.planner.delegation.PlannerBase.translateToRel(PlannerBase.scala:224)
> at
> org.apache.flink.table.planner.delegation.PlannerBase.$anonfun$translate$1(PlannerBase.scala:194)
> at
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
> at scala.collection.Iterator.foreach(Iterator.scala:937)
> at scala.collection.Iterator.foreach$(Iterator.scala:937)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
> at scala.collection.IterableLike.foreach(IterableLike.scala:70)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at scala.collection.TraversableLike.map(TraversableLike.scala:233)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
> at 

RE: [EXTERNAL] Re: Kinesis Producer - Support separate Cloudwatch credentials

2023-10-13 Thread Diem, Chase via user
Thanks for the feedback.

It looks like that is available in 1.15 and above? Is that correct? We can 
look into upgrading.

We are on Flink 1.14. At the time, a year or so ago, it was the latest version 
that AWS EMR offered (emr-6.7.0) out of the box, and we just keep up to date 
with the latest patch versions.



Machine Learning eXperience
Chase Diem
Senior Software Engineer
Cell: 484.522.9101 | Desk: 610.350.3395
Comcast, CXE
1800 Arch St | Philadelphia, PA

From: Danny Cranmer 
Sent: Friday, October 13, 2023 5:18 AM
To: Diem, Chase 
Cc: user 
Subject: [EXTERNAL] Re: Kinesis Producer - Support separate Cloudwatch 
credentials

Hey,

The FlinkKinesisProducer is deprecated in favour of the KinesisSink. The new 
sink does not rely on KPL, so this would not be a problem here. Is there a 
reason you are using the FlinkKinesisProducer instead of KinesisSink?
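
For reference, a minimal sketch of the newer sink (the region, stream name,
and partition key generator below are placeholders, and 'stream' stands for
any existing DataStream<String>):

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.aws.config.AWSConfigConstants;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;

Properties sinkProps = new Properties();
sinkProps.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1"); // placeholder region

KinesisStreamsSink<String> sink =
        KinesisStreamsSink.<String>builder()
                .setStreamName("output-stream") // placeholder stream name
                .setKinesisClientProperties(sinkProps)
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                .build();

stream.sinkTo(sink);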

Thanks for the deep dive. Generally speaking, I agree it would be 
possible/useful to add a separate config for metrics. However, since this 
connector is deprecated, we will not be adding new features unless there is a 
strong reason to do so.

Thanks,
Danny

On Thu, Oct 12, 2023 at 5:45 PM Diem, Chase via user 
mailto:user@flink.apache.org>> wrote:
Hey Team,

Looking for some thoughts here:

  *   We have a Kinesis Producer that produces to a stream in another AWS account
  *   The producer allows for configuration to set credentials for that 
account: 
https://github.com/apache/flink-connector-aws/blob/main/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/util/KinesisConfigUtil.java#L494
  *   We DO NOT have access to produce to their CloudWatch. We would prefer to 
publish the metrics to our own account instead.
  *   The AWS KinesisProducerConfiguration supports setting separate 
credentials for both the producer and CloudWatch, but the KinesisConfigUtil 
does not support it: 
https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L139

It would be nice to support properties such as:
 * metrics.aws.credentials.provider for a custom CloudWatch metrics 
credentials provider,
 * metrics.aws.credentials.provider.role.provider for a custom CloudWatch 
credentials provider for assuming a role.
A sketch of what this could look like follows the code excerpts below.


AWS KinesisProducerConfiguration.java:
/**
 * {@link AWSCredentialsProvider} that supplies credentials used to upload
 * metrics to CloudWatch.
 *
 * If not given, the credentials used to put records
 * to Kinesis are also used for CloudWatch.
 *
 * @see #setCredentialsProvider(AWSCredentialsProvider)
 */
public KinesisProducerConfiguration setMetricsCredentialsProvider(
        AWSCredentialsProvider metricsCredentialsProvider) {
    this.metricsCredentialsProvider = metricsCredentialsProvider;
    return this;
}

Flink KinesisConfigUtil.java:
KinesisProducerConfiguration kpc =
        KinesisProducerConfiguration.fromProperties(config);
kpc.setRegion(config.getProperty(AWSConfigConstants.AWS_REGION));
kpc.setCredentialsProvider(AWSUtil.getCredentialsProvider(config));

// we explicitly lower the credential refresh delay (default is 5 seconds)
// to avoid an ignorable interruption warning that occurs when shutting
// down the KPL client. See
// https://github.com/awslabs/amazon-kinesis-producer/issues/10.
kpc.setCredentialsRefreshDelay(100);
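
A hypothetical sketch of the extension being proposed (not existing Flink
code; the metrics.aws. key prefix is an assumed convention): collect the
prefixed properties, strip the prefix so AWSUtil sees the usual aws.* keys,
and hand the resulting provider to the KPL setter quoted above.

Properties metricsConfig = new Properties();
for (String key : config.stringPropertyNames()) {
    if (key.startsWith("metrics.aws.")) {
        // strip "metrics." so AWSUtil sees the usual "aws.*" keys
        metricsConfig.setProperty(
                key.substring("metrics.".length()), config.getProperty(key));
    }
}

if (!metricsConfig.isEmpty()) {
    // when no metrics.* keys are set, the KPL keeps its documented default
    // of reusing the record-producing credentials for CloudWatch
    kpc.setMetricsCredentialsProvider(AWSUtil.getCredentialsProvider(metricsConfig));
}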





