Hello,

Our cluster occasionally fails with "partition map exchange failure"
errors. I have searched around, and it seems that a lot of people have had a
similar issue in the past. My high-level understanding is that when one of
the nodes fails (out of memory, exception, GC, etc.), the other nodes fail to
exchange partition maps. However, I have a few questions:
1) When does partition map exchange happen? Periodically, when a node
joins, etc.?
2) Is it done in the same thread as the communication SPI, or in a separate
worker?
3) How does the exchange happen? Via a coordinator, peer to peer, etc?
4) What does the exchange block?
5) When is the exchange retried?
6) How can the error be resolved? The only suggestion I have seen online is
to decrease failureDetectionTimeout.

Our settings are:
- Zookeeper SPI
- Persistence enabled

Cheers,
Eugene
