Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started
looking at the per-cache metric for primary keys on each node:
org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount
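
For anyone who wants to check the same numbers, here is a minimal sketch of
how that attribute can be read with plain JMX.  The class and method names
(PrimaryCountProbe, readNumericAttribute) are made up for illustration.  It
assumes the code runs inside the server node's JVM and that the cache MBeans
are registered on the platform MBean server.  It matches on the attribute
name only, not on a specific ObjectName pattern, so it does not depend on
how the beans happen to be named.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class PrimaryCountProbe {
    /**
     * Returns "bean name -> value" for every registered MBean that exposes
     * a readable numeric attribute with the given name.
     */
    public static Map<String, Long> readNumericAttribute(
            MBeanServerConnection conn, String attribute) throws Exception {
        Map<String, Long> result = new TreeMap<>();
        // A null name pattern and a null query select every registered MBean.
        for (ObjectName name : conn.queryNames(null, null)) {
            for (MBeanAttributeInfo info : conn.getMBeanInfo(name).getAttributes()) {
                if (!info.getName().equals(attribute) || !info.isReadable()) {
                    continue;
                }
                Object value = conn.getAttribute(name, attribute);
                if (value instanceof Number) {
                    result.put(name.getCanonicalName(), ((Number) value).longValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: defaults to the attribute discussed in this thread.
        String attribute = args.length > 0 ? args[0] : "OffHeapPrimaryEntriesCount";
        readNumericAttribute(ManagementFactory.getPlatformMBeanServer(), attribute)
            .forEach((bean, value) -> System.out.println(bean + " = " + value));
    }
}
```

In a JVM with no beans exposing that attribute the map is simply empty.  To
read from outside the node, the same method should work with a connection
obtained from a remote JMX connector, assuming remote JMX is enabled.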

In our configuration we have a replicated cache with 2 backups.  Our cluster
has 3 nodes in it so the primaries should be spread equally on the 3 nodes
and each node has backups from the other two nodes.  All these server nodes
are in the baseline.  Additionally we have some thick clients connected but
I don't think they are relevant to the discussion.

Whenever we do a rolling restart, one node at a time, the last node to be
restarted always ends up owning zero primaries and only backups.  The two
nodes restarted earlier in the rolling restart own all the primaries.

When our cluster is in this state, if we start and then stop visor, its
departure from the cluster triggers a PME in which all keys get balanced
across all server nodes.  Looking at the visor cache stats between the start
and stop, we can see a min of 0 keys on the nodes for our cache, so visor and
the JMX metrics line up on that front.  After stopping visor, the JMX metrics
show evenly distributed primaries, and starting visor a second time confirms
that the min, average, and max node keys are all evenly distributed.

Every join and leave during the rolling restart and during visor start/stop
is reflected in the logs as a topology increment and the corresponding node
leave or join event.

According to
https://cwiki.apache.org/confluence/display/IGNITE/%2528Partition+Map%2529+Exchange+-+under+the+hood
each leave and each join should trigger a PME, but we only see the keys
change on the leaves.

Additionally, we tried waiting longer between the stop and start part of the
rolling restart to see if that had any effect.  We ensured we waited long
enough for a PME to do any moving but waiting longer didn't have any effect. 
The stop always has the PME move the keys off that node and the start never
sees the PME move any primaries back.

Why are we only seeing the PME change keys when nodes (server or visor) stop
and never when they join?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/