[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Release Note: Fixed an issue that could lead to unexpected partition map exchange on client nodes.

                Key: IGNITE-17507
                URL: https://issues.apache.org/jira/browse/IGNITE-17507
            Project: Ignite
         Issue Type: Bug
           Reporter: Vyacheslav Koptilin
           Assignee: Vyacheslav Koptilin
           Priority: Major
            Fix For: 2.14
         Time Spent: 0.5h
 Remaining Estimate: 0h

We have a scenario with several client and server nodes that can get stuck on PME after start:
 * Start some server nodes
 * Trigger rebalance
 * Start some client and server nodes
 * Some of the client nodes get stuck with _Failed to wait for partition map exchange [topVer=AffinityTopologyVersion…_

Deep investigation of the logs showed that the root cause of the stuck PME on the client is a race between a new client node joining and that client receiving a stale _CacheAffinityChangeMessage_. The stale message triggers a PME on the client, but when the other, older nodes receive the same _CacheAffinityChangeMessage_, they skip it because of an optimization.

The optimization lives in _CacheAffinitySharedManager#onDiscoveryEvent_, where we save _lastAffVer = topVer_. On the old nodes this value is set, but because of the race, _lastAffVer_ on the problem client node is still null when we reach _CacheAffinitySharedManager#onCustomEvent_, so the client schedules an invalid PME via _msg.exchangeNeeded(exchangeNeeded)_ while the other nodes skip it.

A possible fix is to make _CacheAffinityChangeMessage_ mutable (a mutable discovery custom message), which allows the message to be modified before it is sent across the ring. With this approach, client nodes no longer need to decide locally whether to apply or skip the message; the required flag is transferred from a server node. When ZooKeeper Discovery is used, there is no way to mutate discovery messages on the ring. However, it is possible to mutate the message on the coordinator node (this requires re-adding the _stopProcess_ flag to _DiscoveryCustomMessage_, which was removed by IGNITE-12400), and that is sufficient for our case.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
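The mutable-message idea described above can be illustrated with a small self-contained sketch. The types below (AffinityChangeMessage, Node) are hypothetical stand-ins, not the actual Ignite API: the point is that the coordinator decides once whether an exchange is needed and records the decision in the message before it travels the ring, so a freshly joined client whose _lastAffVer_ is still null cannot reach a different conclusion than the server nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class MutableDiscoveryMessageSketch {

    /** Stand-in for CacheAffinityChangeMessage with a mutable exchange flag. */
    static class AffinityChangeMessage {
        private Boolean exchangeNeeded; // null until the coordinator decides

        void exchangeNeeded(boolean needed) {
            this.exchangeNeeded = needed;
        }

        boolean isExchangeNeeded() {
            // Receivers trust the flag set by the coordinator instead of
            // consulting their own (possibly uninitialized) lastAffVer.
            return Boolean.TRUE.equals(exchangeNeeded);
        }
    }

    /** Stand-in node: servers track lastAffVer, a freshly joined client may not. */
    static class Node {
        final String name;
        final Long lastAffVer; // null models the racy, not-yet-initialized client
        boolean scheduledPme;

        Node(String name, Long lastAffVer) {
            this.name = name;
            this.lastAffVer = lastAffVer;
        }

        void onCustomEvent(AffinityChangeMessage msg) {
            // With a mutable message, every node applies the same decision.
            scheduledPme = msg.isExchangeNeeded();
        }
    }

    public static void main(String[] args) {
        long lastAffVerOnCoordinator = 5;
        long msgTopVer = 4; // the message is stale: older than lastAffVer

        // The coordinator decides once, before the message crosses the ring.
        AffinityChangeMessage msg = new AffinityChangeMessage();
        msg.exchangeNeeded(msgTopVer > lastAffVerOnCoordinator);

        List<Node> ring = new ArrayList<>();
        ring.add(new Node("server-1", lastAffVerOnCoordinator));
        ring.add(new Node("client-1", null)); // lastAffVer not yet set: the racy node

        for (Node n : ring)
            n.onCustomEvent(msg);

        // All nodes agree: nobody schedules the invalid PME.
        for (Node n : ring)
            System.out.println(n.name + " scheduledPme=" + n.scheduledPme);
    }
}
```

In the buggy scenario the client would have evaluated the staleness check itself against a null _lastAffVer_ and scheduled the exchange; here both nodes print `scheduledPme=false` because the decision was made once on the coordinator.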
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Ignite Flags: Release Notes Required
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Reviewer: Ivan Daschinsky
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Description: (updated: added the note that mutating the message on the coordinator requires re-adding the _stopProcess_ flag to _DiscoveryCustomMessage_, which was removed by IGNITE-12400)
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Description: (updated: fixed the spacing of the italic markup around code identifiers)
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Description: (updated: minor formatting changes)
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Description: (updated: added italic markup for code identifiers and dropped the sentence "TeamCity does not demonstrates any issue with this approach.")
[jira] [Updated] (IGNITE-17507) Failed to wait for partition map exchange on some clients
[ https://issues.apache.org/jira/browse/IGNITE-17507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vyacheslav Koptilin updated IGNITE-17507:
-----------------------------------------
    Ignite Flags: (was: Docs Required,Release Notes Required)