Re: latency related to the checkpointing mode EXACTLY ONCE

2021-02-24 Thread Arvid Heise
When Flink fails and restarts, it goes back in time and reprocesses the data
from the latest checkpoint. That's why it also deletes all uncommitted data
on restart; otherwise you would receive duplicates in your output.
Hence, to get exactly-once, you cannot read uncommitted data. That is true
for all streaming systems and sinks that depend on transactions.
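
To make the point above concrete, here is a toy model in Python. It is not Flink or Kafka code: the class `ToyTransactionalLog` and the function `run_with_failure` are hypothetical names invented for this sketch, and "checkpoint" here just means "commit every N records". It compares a reader that only sees committed output with one that consumes every write immediately.

```python
# Toy model (assumption: not real Flink/Kafka behaviour, only the idea).
# A sink writes records into an open transaction; the transaction is
# committed when a checkpoint completes and aborted on restart.

class ToyTransactionalLog:
    def __init__(self):
        self.committed = []   # visible to a read-committed reader
        self.pending = []     # written, but the transaction is still open

    def write(self, record):
        self.pending.append(record)

    def commit(self):
        # Called when a checkpoint completes.
        self.committed.extend(self.pending)
        self.pending = []

    def abort(self):
        # Called on restart: uncommitted writes are discarded.
        self.pending = []


def run_with_failure(source, checkpoint_every, fail_after):
    """Process `source`, fail once after `fail_after` records, then replay
    from the last completed checkpoint. Returns (committed view, view of a
    reader that consumed every write immediately)."""
    log = ToyTransactionalLog()
    seen_uncommitted = []     # what a read-uncommitted reader would have seen
    last_checkpoint = 0       # source position of the last completed checkpoint
    position = 0
    failed = False
    while position < len(source):
        record = source[position]
        log.write(record)
        seen_uncommitted.append(record)
        position += 1
        if position % checkpoint_every == 0:
            log.commit()
            last_checkpoint = position
        if not failed and position == fail_after:
            failed = True
            log.abort()                 # drop uncommitted output
            position = last_checkpoint  # go back in time and reprocess
    log.commit()                        # final checkpoint at end of input
    return list(log.committed), seen_uncommitted


committed_view, uncommitted_view = run_with_failure(
    source=["a", "b", "c", "d", "e"], checkpoint_every=2, fail_after=3)
print(committed_view)    # each record exactly once
print(uncommitted_view)  # "c" appears twice: once before and once after the restart
```

In this sketch the reader of uncommitted data sees the record written between the last checkpoint and the failure twice, which is the duplicate described above.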

In general, low latency and exactly-once are somewhat at odds with each
other. In Flink, you can only get both in a meaningful way if your
checkpointing interval is very short, which is currently only possible if
your state is very small (no big join windows, for example). We are working
on improving that limitation though.
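
As a rough illustration of how the checkpoint interval feeds into latency, here is a simplified Python model. It assumes checkpoints are triggered at exact multiples of the interval and that output is committed a fixed time after the trigger; the function name `visibility_latency_ms` and the parameter `checkpoint_duration_ms` are made up for this sketch and are not Flink settings.

```python
def visibility_latency_ms(arrival_ms, checkpoint_interval_ms, checkpoint_duration_ms=0):
    """Time until a record written at `arrival_ms` becomes visible to a
    read-committed consumer, under the simplifying assumptions that
    checkpoints are triggered at multiples of `checkpoint_interval_ms` and
    the transaction is committed `checkpoint_duration_ms` after the trigger."""
    # Integer ceiling division: index of the first checkpoint triggered at or
    # after the moment the record was written.
    checkpoint_index = (arrival_ms + checkpoint_interval_ms - 1) // checkpoint_interval_ms
    commit_ms = checkpoint_index * checkpoint_interval_ms + checkpoint_duration_ms
    return commit_ms - arrival_ms


# A record written just after a checkpoint waits for almost the full interval.
slow = visibility_latency_ms(arrival_ms=100, checkpoint_interval_ms=10_000,
                             checkpoint_duration_ms=500)
# The same record with a 1 s interval becomes visible much sooner.
fast = visibility_latency_ms(arrival_ms=100, checkpoint_interval_ms=1_000,
                             checkpoint_duration_ms=500)
print(slow, fast)
```

In this model, shortening the interval is the only way to reduce the wait, and the time a checkpoint takes (which grows with state size) puts a floor under it.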

One solution if you need low latency is to drop exactly once and
deduplicate events in your downstream application.
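
A minimal sketch of such downstream deduplication in Python, assuming every event carries a unique identifier (here a hypothetical `"id"` field) and that duplicates arrive within the last `max_ids` distinct events. The names `Deduplicator` and `deduplicate` are invented for this example.

```python
from collections import OrderedDict


class Deduplicator:
    """Remembers the most recently seen event ids, up to `max_ids`, so that
    memory stays bounded on an unbounded stream."""

    def __init__(self, max_ids=10_000):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id):
        if event_id in self._seen:
            self._seen.move_to_end(event_id)   # refresh recency of this id
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)     # evict the oldest id
        return True


def deduplicate(events, key=lambda e: e["id"], max_ids=10_000):
    """Return the events with repeated ids removed, keeping first occurrences."""
    dedup = Deduplicator(max_ids)
    return [e for e in events if dedup.is_new(key(e))]


# Event 2 is delivered twice, as could happen after a restart with
# at-least-once delivery.
events = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"},
          {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
unique_ids = [e["id"] for e in deduplicate(events)]
print(unique_ids)
```

The bounded window is a design trade-off: a duplicate that arrives after its id has been evicted would pass through, so `max_ids` has to cover the amount of data that can be replayed after a failure.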


RE: latency related to the checkpointing mode EXACTLY ONCE

2021-02-19 Thread Tan, Min
Many thanks for your quick response.

Is the read_commit config for the Kafka consumers required for exactly-once
(EOS)?
Is there no exactly-once guarantee if we read uncommitted messages?

Regards,
Min



E-mails can involve SUBSTANTIAL RISKS, e.g. lack of confidentiality, potential 
manipulation of contents and/or sender's address, incorrect recipient 
(misdirection), viruses etc. Based on previous e-mail correspondence with you 
and/or an agreement reached with you, UBS considers itself authorized to 
contact you via e-mail. UBS assumes no responsibility for any loss or damage 
resulting from the use of e-mails. 
The recipient is aware of and accepts the inherent risks of using e-mails, in 
particular the risk that the banking relationship and confidential information 
relating thereto are disclosed to third parties.
UBS reserves the right to retain and monitor all messages. Messages are 
protected and accessed only in legally justified cases.
For information on how UBS uses and discloses personal data, how long we retain 
it, how we keep it secure and your data protection rights, please see our 
Privacy Notice http://www.ubs.com/privacy-statement

Re: latency related to the checkpointing mode EXACTLY ONCE

2021-02-18 Thread Chesnay Schepler
Yes, if you are only reading committed data, then it will take at least the
checkpoint interval for the data to be available to downstream consumers.







latency related to the checkpointing mode EXACTLY ONCE

2021-02-18 Thread Tan, Min
Hi,

We use the checkpointing mode EXACTLY ONCE for some of our Flink jobs.

I wonder how the checkpoint configuration, especially the checkpoint interval,
is related to the end-to-end latency.

We need to set read_commit to true for the Kafka consumers.

Does this mean the latency of a single Flink job is greater than the checkpoint
interval?

Thank you very much for your help in advance.

Min
