handle SUSPENDED in ZooKeeperLeaderRetrievalService

2021-04-12 Thread chenqin
Hi there, We observed several jobs running on Flink 1.11 restart because the job leader was lost. Digging deeper, the issue seems related to the SUSPENDED state handler in ZooKeeperLeaderRetrievalService. AFAIK, the SUSPENDED state is expected when ZK is not certain whether the leader is still alive. It can follow up with
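
To illustrate the distinction raised above, here is a minimal, self-contained sketch of a listener that treats SUSPENDED as "uncertain" and drops the leader only on LOST. It is not Flink's ZooKeeperLeaderRetrievalService and it does not use Curator; `LeaderTracker`, `ConnState`, and every method name are hypothetical, with the state names chosen only to mirror the ones mentioned in the message.

```java
import java.util.Optional;

/** Hypothetical leader tracker; NOT Flink's ZooKeeperLeaderRetrievalService. */
final class LeaderTracker {

    /** Assumed connection states, named after the ones discussed in the thread. */
    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    private String leader;      // last confirmed leader address, or null if unknown
    private boolean suspended;  // true while the connection is in doubt

    /** Called when a new leader address has been read successfully. */
    void onLeaderChanged(String newLeader) {
        leader = newLeader;
    }

    /** Reacts to a connection state change. */
    void onStateChanged(ConnState state) {
        switch (state) {
            case SUSPENDED:
                // Connection is uncertain: remember that, but keep the last known leader.
                suspended = true;
                break;
            case CONNECTED:
            case RECONNECTED:
                // Connection is healthy again: the last known leader stays valid
                // until a fresh read says otherwise.
                suspended = false;
                break;
            case LOST:
                // Connection is definitively gone: the leader can no longer be trusted.
                suspended = false;
                leader = null;
                break;
        }
    }

    Optional<String> currentLeader() {
        return Optional.ofNullable(leader);
    }

    boolean isSuspended() {
        return suspended;
    }
}
```

Whether keeping the last known leader during SUSPENDED is safe depends on the session timeout and on how stale a leader address the job can tolerate, so this is a design trade-off rather than a recommendation.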

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-04-06 Thread chenqin
Friendly ping, the fix for entropy marker is ready. -- Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-30 Thread chenqin
Linking the fix PR here: https://github.com/apache/flink/pull/15442. We might need someone to help review and merge it in the meantime.

Re: flink 1.11 class loading question

2021-03-30 Thread chenqin
Hi Till, We did some investigation and found that this memory usage points to RocksDBStateBackend running on managed memory. So far we have only seen this bug with RocksDBStateBackend on managed memory. We followed suggestion [1] and disabled managed memory for RocksDB, and so far we have not seen the issue again. I felt this

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
We also noticed that the actual states stored in _metadata still contain the entropy marker after we fixed the metadata directory issue. This issue seems related to a code refactoring and is not covered by tests.
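
As a rough illustration of the behavior under discussion, the sketch below resolves an entropy marker in a path in two ways: replacing it with random characters (as one would for state files) or removing it (as one would for the metadata path). It is a standalone model, not Flink's code; the class `EntropyPaths`, the method `resolve`, and the marker value `_entropy_` are assumptions made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper modelling entropy-marker resolution; NOT Flink's implementation. */
final class EntropyPaths {

    /** Assumed marker string; in a real setup this comes from configuration. */
    static final String MARKER = "_entropy_";

    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

    /**
     * Returns the path with the marker either replaced by `length` random
     * characters (injectEntropy = true) or removed (injectEntropy = false).
     * Paths without the marker are returned unchanged.
     */
    static String resolve(String path, boolean injectEntropy, int length) {
        int idx = path.indexOf(MARKER);
        if (idx < 0) {
            return path;
        }
        String prefix = path.substring(0, idx);
        String suffix = path.substring(idx + MARKER.length());

        if (injectEntropy) {
            StringBuilder sb = new StringBuilder(prefix);
            for (int i = 0; i < length; i++) {
                int pos = ThreadLocalRandom.current().nextInt(ALPHABET.length());
                sb.append(ALPHABET.charAt(pos));
            }
            return sb.append(suffix).toString();
        }

        // Removing the marker: avoid leaving a doubled slash behind.
        if (prefix.endsWith("/") && suffix.startsWith("/")) {
            suffix = suffix.substring(1);
        }
        return prefix + suffix;
    }
}
```

In this model, any path written into the metadata file would need to pass through `resolve` first; a path stored with the raw marker still in it is the symptom reported above.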

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
To make it easier to read:

    @Nullable
    private static EntropyInjectingFileSystem getEntropyFs(FileSystem fs) {
        LOG.warn(fs.getClass().toGenericString());
        if (fs instanceof EntropyInjectingFileSystem) {
            return

Re: Flink job cannot find recover path after using entropy injection for s3 file systems

2021-03-23 Thread chenqin
Hi Till, Thanks for sharing pointers related to the entropy injection feature in 1.11. We did some investigation, and so far it looks like a bug in edge-case handling. Testing environment: Flink 1.11.2 release with plugins plugins/s3-fs-hadoop/flink-s3-fs-hadoop