Hi,

 

As you correctly noted by pointing to the PME implementation details page, PME is the process of exchanging information about partition holders. It happens on every topology change, cluster deactivation, and so on. The process itself is not about data rebalancing; it is about which node should store a particular partition.

If you want to check whether data rebalancing actually happened, look for a log message like:

[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] Starting rebalance routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], supplier=9e88a103-4465-4e5b-865f-4edaa909fee1, fullPartitions=[0-99], histPartitions=[]]
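
If you prefer to check this from code rather than grep the logs, something along these lines should work (just a rough sketch, assuming the public CacheMetrics API):

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheMetrics;

public class RebalanceCheck {
    // True while this node still expects partitions for the given cache,
    // i.e. local rebalancing has not finished yet.
    public static boolean rebalanceInProgress(Ignite ignite, String cacheName) {
        CacheMetrics m = ignite.cache(cacheName).localMetrics();
        return m.getRebalancingPartitionsCount() > 0;
    }
}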

 

It also depends on whether your cluster is under load during the rolling upgrade: if there are no updates happening, then no data rebalancing should happen either.

I'm not quite sure about the metric and Visor. In any case, you can perform the checks explicitly from code:

// Number of entries for which this node is a backup:
ignite.cache("myCache").localSize(CachePeekMode.BACKUP);

// Number of entries for which this node is the primary:
ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);
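
To compare all server nodes at once you could, for example, broadcast that check (again just a sketch; the client configuration path is illustrative):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class DistributionCheck {
    public static void main(String[] args) {
        // Illustrative client configuration; replace with your own.
        try (Ignite client = Ignition.start("client-config.xml")) {
            // Runs on every server node; each node prints its own local counts
            // (the output appears in that node's console/log).
            client.compute(client.cluster().forServers()).broadcast(new IgniteRunnable() {
                @IgniteInstanceResource
                private transient Ignite ignite;

                @Override public void run() {
                    int primary = ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);
                    int backup = ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
                    System.out.println(ignite.cluster().localNode().id()
                        + ": primary=" + primary + ", backup=" + backup);
                }
            });
        }
    }
}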

 

 

 

From: tschauenberg
Sent: Friday, January 8, 2021 3:59 AM
To: [email protected]
Subject: incorrect partition map exchange behaviour

 

Hi,

 

We have a cluster of Ignite 2.8.1 server nodes and have recently started looking at the individual cache metrics for primary keys: org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount
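
(For what it's worth, the same counter should also be reachable from code through the public metrics API; a rough sketch, with the cache name "myCache" being illustrative:)

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.CacheMetrics;

public class PrimaryCountCheck {
    // Same value as the OffHeapPrimaryEntriesCount JMX attribute, read via the public API.
    public static long offHeapPrimaryEntries(Ignite ignite) {
        CacheMetrics m = ignite.cache("myCache").localMetrics();
        return m.getOffHeapPrimaryEntriesCount();
    }
}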

 

In our configuration we have a replicated cache with 2 backups. Our cluster has 3 nodes in it, so the primaries should be spread equally on the 3 nodes and each node has backups from the other two nodes. All these server nodes are in the baseline. Additionally, we have some thick clients connected, but I don't think they are relevant to the discussion.
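
(For reference, a minimal sketch of the cache setup described above; the Java-based config and cache name are illustrative, and our real configuration may differ:)

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class CacheConfigSketch {
    public static CacheConfiguration<Object, Object> cacheConfig() {
        CacheConfiguration<Object, Object> ccfg = new CacheConfiguration<>("myCache");
        // "Replicated with 2 backups" is modelled here as a PARTITIONED cache with
        // backups = 2, which on a 3-node baseline keeps a copy of every key on
        // every node (a REPLICATED cache ignores the backups setting).
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        ccfg.setBackups(2);
        return ccfg;
    }
}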

 

Whenever we do a rolling restart, one node at a time, the last node to be restarted always ends up owning zero primaries and only backups. The two nodes restarted earlier during the rolling restart own all the primaries.

 

When our cluster is in this state and we start and stop visor, visor leaving the cluster triggers a PME where all keys get balanced across all server nodes. Looking at the visor cache stats between the start and the stop, we can see a minimum of 0 keys on a node for our cache, so visor and the JMX metrics line up on that front. After stopping visor, the JMX metrics show the evenly distributed primaries, and starting visor a second time confirms that the min, average, and max node key counts are all evenly distributed.

 

Every join and leave during the rolling restart and during the visor start/stop is reflected as a topology version increment and node join and leave events in the logs.

 

According to https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood each leave and join should trigger the PME, but we only see the keys changing on the leaves.

 

Additionally, we tried waiting longer between the stop and start part of the rolling restart to see if that had any effect. We ensured we waited long enough for a PME to do any moving, but waiting longer didn't have any effect. The stop always has the PME move the keys off that node, and the start never sees the PME move any primaries back.

 

Why are we only seeing the PME change keys when nodes (server or visor) stop, and never when they join?

 

 

 

--

Sent from: http://apache-ignite-users.70518.x6.nabble.com/

 
