Re: Kafka Consumer commit error

2022-06-15 Thread Qingsheng Ren
Hi,

Thanks for reporting the issue and the demo provided by Christian!

I traced the code and think it's a bug in KafkaConsumer (see KAFKA-13563 [1]). 
We probably need to bump the Kafka client to 3.1 to fix it, but we should check 
the compatibility implications first, because this crosses a major Kafka 
version (2.x -> 3.x). 

[1] https://issues.apache.org/jira/browse/KAFKA-13563
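
As a side note, an easy way to double-check which client version a deployment
actually runs is to print the version string from the (shaded) consumer
classes. A small illustrative sketch, assuming the shaded package prefix seen
in the stack traces below:

// Sketch: print the kafka-clients version bundled in flink-sql-connector-kafka.
// The shaded package prefix is an assumption taken from the stack traces in
// this thread.
import org.apache.flink.kafka.shaded.org.apache.kafka.common.utils.AppInfoParser;

public class PrintBundledKafkaClientVersion {
    public static void main(String[] args) {
        // AppInfoParser.getVersion() returns the version baked into kafka-clients.
        System.out.println("Bundled kafka-clients: " + AppInfoParser.getVersion());
    }
}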

Best, 

Qingsheng

Re: Kafka Consumer commit error

2022-06-14 Thread Martijn Visser
Hi Christian,

There's another similar error reported by someone else. I've linked the
tickets together and asked one of the Kafka maintainers to have a look at
this.

Best regards,

Martijn

Re: Kafka Consumer commit error

2022-06-14 Thread Christian Lorenz
Hi Alexander,

I’ve created a Jira ticket here: 
https://issues.apache.org/jira/browse/FLINK-28060.
Unfortunately this is causing some issues for us.
I hope the attached demo project also helps to determine the root cause, as 
the problem is reproducible in Flink 1.15.0 but not in Flink 1.14.4.

Kind regards,
Christian

Re: Kafka Consumer commit error

2022-06-13 Thread Alexander Fedulov
Hi Christian,

> thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this
> application. Do you think this might still be related?

No, in that case Kafka transactions are not used, so it should not be
relevant.
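
For context, the delivery guarantee is what decides whether the Kafka sink
opens transactions at all. A minimal sketch of an AT_LEAST_ONCE sink, assuming
the Flink 1.15 KafkaSink builder API (broker and topic names are placeholders):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class AtLeastOnceSinkSketch {
    public static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker-1:9092") // placeholder
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.builder()
                                .setTopic("output-topic") // placeholder
                                .setValueSerializationSchema(new SimpleStringSchema())
                                .build())
                // AT_LEAST_ONCE may duplicate records on recovery but never
                // opens Kafka transactions, so transaction purging cannot apply.
                .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
                .build();
    }
}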

Best,
Alexander Fedulov

Re: Kafka Consumer commit error

2022-06-13 Thread Christian Lorenz
Hi Alexander,

thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this 
application. Do you think this might still be related?

Best regards,
Christian


Re: Kafka Consumer commit error

2022-06-13 Thread Christian Lorenz
Hi Martijn,

thanks for replying. I would also expect the behavior you describe below, and 
AFAICT that is how it behaved with Flink 1.14. I am aware that Flink uses 
checkpointing for fault tolerance, but the Kafka offsets are, for example, 
part of our monitoring, and this will lead to alerts. Other applications which 
use the Kafka client directly do not show repeated commit failures once all 
Kafka brokers are online again.
I think this occurs both in Flink jobs using Flink's Kafka connector directly 
(KafkaSource) and in applications based on the Kafka SQL connector.

I will try to write a small job to verify this behavior, as we also use 
flink-avro-confluent-registry, which makes it harder to pin down the root of 
the issue.
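
A stripped-down job along these lines should be enough to exercise the offset
commit path on every checkpoint (a sketch against the Flink 1.15 KafkaSource
API, without the Avro/schema-registry parts; broker, topic, and group names
are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaCommitErrorRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Every completed checkpoint triggers an offset commit back to Kafka.
        env.enableCheckpointing(10_000L);

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("broker-1:9092,broker-2:9092") // placeholders
                .setTopics("input-topic")                           // placeholder
                .setGroupId("commit-error-repro")                   // placeholder
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
                .print();
        env.execute("kafka-commit-error-repro");
    }
}

Restarting one broker while this runs should show whether the commit errors
persist after it is back online.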

Best regards,
Christian

Re: Kafka Consumer commit error

2022-06-13 Thread Alexander Fedulov
Hi Christian,

you should check if the exceptions that you see after the broker is back
from maintenance are the same as the ones you posted here. If you are using
EXACTLY_ONCE, it could be that the later errors are caused by Kafka purging
transactions that Flink attempts to commit [1].
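
For illustration, the purging scenario only arises when the sink is configured
with EXACTLY_ONCE, roughly like this (a sketch with the same assumed builder
API as in the earlier AT_LEAST_ONCE example; the id prefix and timeout are
placeholders):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

public class ExactlyOnceSinkSketch {
    public static KafkaSink<String> build() {
        return KafkaSink.<String>builder()
                .setBootstrapServers("broker-1:9092") // placeholder
                .setRecordSerializer(
                        KafkaRecordSerializationSchema.builder()
                                .setTopic("output-topic") // placeholder
                                .setValueSerializationSchema(new SimpleStringSchema())
                                .build())
                // EXACTLY_ONCE opens a Kafka transaction per checkpoint; if the
                // broker aborts it before Flink can commit (transaction.timeout.ms
                // elapsed), the commit attempted on recovery fails as in [1].
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("my-app")              // placeholder
                .setProperty("transaction.timeout.ms", "900000") // keep above the checkpoint interval
                .build();
    }
}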

Best,
Alexander Fedulov

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#fault-tolerance

Re: Kafka Consumer commit error

2022-06-13 Thread Martijn Visser
Hi Christian,

I would expect that after the broker comes back up and recovers completely,
these error messages would disappear automagically. It should not require a
restart (only time). Flink doesn't rely on Kafka's checkpointing mechanism
for fault tolerance.
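
To make that concrete: the offsets committed back to Kafka are only a progress
indicator. A sketch showing that committing can even be disabled on the source
without losing fault tolerance, assuming the KafkaSource property pass-through
(broker, topic, and group names are placeholders):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;

public class NoOffsetCommitSourceSketch {
    public static KafkaSource<String> build() {
        return KafkaSource.<String>builder()
                .setBootstrapServers("broker-1:9092") // placeholder
                .setTopics("input-topic")             // placeholder
                .setGroupId("my-group")               // placeholder
                .setValueOnlyDeserializer(new SimpleStringSchema())
                // Offsets are restored from Flink checkpoints, not from Kafka,
                // so disabling the commit does not affect recovery.
                .setProperty("commit.offsets.on.checkpoint", "false")
                .build();
    }
}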

Best regards,

Martijn

Kafka Consumer commit error

2022-06-08 Thread Christian Lorenz
Hi,

we have some issues with a job using the flink-sql-connector-kafka (Flink 
1.15.0, standalone cluster). If one broker is restarted, e.g. for maintenance 
(replication-factor=2), the taskmanagers executing the job constantly log 
errors on each checkpoint creation:

Failed to commit consumer offsets for checkpoint 50659
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.RetriableCommitFailedException:
 Offset commit failed with a retriable exception. You should retry committing 
the latest consumed offsets.
Caused by: 
org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.CoordinatorNotAvailableException:
 The coordinator is not available.

AFAICT the error itself is produced by the underlying Kafka consumer. 
Unfortunately this error cannot be reproduced on our test system.
From my understanding this error might occur once, but follow-up checkpoints / 
Kafka commits should then succeed again.
Currently my only way of “fixing” the issue is to restart the taskmanagers.

Is there maybe some Kafka consumer setting which would help to circumvent this?

Kind regards,
Christian