Re: Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-23 Thread Yanfei Lei
Hi JM, > why having "transactional.id.expiration.ms" < "transaction.timeout.ms" helps When recovering a job from a checkpoint/savepoint that contains Kafka transactions, Flink tries to re-commit those transactions by transaction ID upon recovery. If those transactions time out or

RE: Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-23 Thread Jean-Marc Paulin
Thanks for your insight. I am still trying to understand exactly what happens here. We currently use the default settings in Kafka, and we set "transaction.timeout.ms" to 15 minutes (which also happens to be the default "transaction.max.timeout.ms"). My expectation would be that if our
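
To make the timing in this thread concrete, here is a minimal sketch of a pre-restore sanity check. It assumes only what the thread states: "transaction.timeout.ms" is 15 minutes, and the restore happens well after that window. The helper name `transaction_may_have_expired` is hypothetical and is not part of any Flink or Kafka API; it only compares the age of a savepoint with the configured timeout.

```python
from datetime import timedelta

# Value taken from the thread: the producer-side "transaction.timeout.ms"
# is set to 15 minutes.
TRANSACTION_TIMEOUT = timedelta(minutes=15)


def transaction_may_have_expired(
    savepoint_age: timedelta,
    transaction_timeout: timedelta = TRANSACTION_TIMEOUT,
) -> bool:
    """Hypothetical helper (not a Flink or Kafka API).

    Returns True when the savepoint is older than the transaction timeout,
    i.e. when the transactions recorded in it may have timed out before
    Flink tries to re-commit them on recovery.
    """
    return savepoint_age > transaction_timeout


# A savepoint taken two hours ago is well past the 15-minute window.
print(transaction_may_have_expired(timedelta(hours=2)))
# A savepoint taken five minutes ago is still inside the window.
print(transaction_may_have_expired(timedelta(minutes=5)))
```

A check like this does not prevent the error; it only flags restores where the re-commit described above is at risk.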

Re: Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-21 Thread Yanfei Lei
Hi JM, Yes, in most cases `InvalidPidMappingException` occurs because the transaction has been lost. In the short term, setting "transaction.timeout.ms" > "transactional.id.expiration.ms" allows the `InvalidPidMappingException` to be ignored [1]. In the long term, FLIP-319 [2] provides a solution. [1]

Flink 1.18: Unable to resume from a savepoint with error InvalidPidMappingException

2024-04-19 Thread Jean-Marc Paulin
Hi, we use Flink 1.18 with the Kafka sink, and we enabled `EXACTLY_ONCE` on one of our Kafka sinks. We set the transaction timeout to 15 minutes. When we try to restore from a savepoint well after that 15-minute window, Flink enters a RESTARTING loop. We see the error: ``` { "exception": {