Hi,

As you correctly noted, the page you linked describes the partition map exchange (PME) implementation: PME is the process of exchanging information about which nodes hold which partitions, and it happens on every topology change, cluster deactivation, and so on. The process itself is not about data rebalancing; it only decides which node should store a particular partition.

If you want to check whether data rebalancing actually happened, look for log messages like:

[2020-01-15 15:46:57,042][INFO ][sys-#50][GridDhtPartitionDemander] Starting rebalance routine [ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], supplier=9e88a103-4465-4e5b-865f-4edaa909fee1, fullPartitions=[0-99], histPartitions=[]]

It also depends on whether your cluster is under load during the rolling upgrade: if no updates are happening, then no data rebalance should happen either.

I'm not quite sure about the metric and Visor. In any case, you can perform the checks explicitly from code (a fuller sketch follows at the end of this message):

ignite.cache("myCache").localSize(CachePeekMode.BACKUP);
ignite.cache("myCache").localSize(CachePeekMode.PRIMARY);

From: tschauenberg

Hi,

We have a cluster of Ignite 2.8.1 server nodes and have recently started looking at the individual cache metrics for primary keys:

org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl.OffHeapPrimaryEntriesCount

In our configuration we have a replicated cache with 2 backups. Our cluster has 3 nodes in it, so the primaries should be spread equally across the 3 nodes and each node should hold backups from the other two nodes. All these server nodes are in the baseline. Additionally, we have some thick clients connected, but I don't think they are relevant to the discussion.

Whenever we do a rolling restart, one node at a time, at the end, after the last node is restarted, that node always owns zero primaries and holds only backups. The two nodes restarted earlier during the rolling restart own all the primaries.

When our cluster is in this scenario, if we start and stop Visor, then when Visor leaves the cluster it triggers a PME in which all keys get balanced across all server nodes. Looking at the Visor cache stats between the start and stop, we can see a minimum of 0 keys on the nodes for our cache, so Visor and the JMX metrics line up on that front. After stopping Visor, the JMX metrics show the evenly distributed primaries, and after starting Visor a second time we can confirm that the min, average, and max node key counts are all evenly distributed.

Every join and leave during the rolling restart and during the Visor start/stop is reflected as a topology version increment and node leave/join events in the logs. According to https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood each leave and join should trigger a PME, but we only see the keys changing on the leaves.

Additionally, we tried waiting longer between the stop and the start during the rolling restart to see if that had any effect. We ensured we waited long enough for a PME to do any moving, but waiting longer didn't have any effect. The stop always triggers a PME that moves the keys off that node, and the start never triggers one that moves any primaries back.

Why are we only seeing the PME change keys when nodes (server or Visor) stop and never when they join?
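For completeness, below is a minimal sketch of how the localSize() checks above could be run on every server node, plus an affinity-based check of how many primary partitions each node currently owns. The cache name "myCache" is the placeholder used above; the broadcast wrapper and the affinity check are my own additions rather than something from this thread, so treat this as a sketch under those assumptions, not a verified diagnostic:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class PrimaryDistributionCheck {
    public static void main(String[] args) {
        // Join the cluster as a thick client; assumes default discovery configuration.
        Ignition.setClientMode(true);

        try (Ignite ignite = Ignition.start()) {
            // localSize() reports the counts of the node it runs on, so broadcast
            // the check to every server node instead of calling it on the client.
            ignite.compute(ignite.cluster().forServers()).broadcast(() -> {
                Ignite local = Ignition.localIgnite();
                int primary = local.cache("myCache").localSize(CachePeekMode.PRIMARY);
                int backup  = local.cache("myCache").localSize(CachePeekMode.BACKUP);
                System.out.println(local.cluster().localNode().consistentId()
                        + " primaryEntries=" + primary + " backupEntries=" + backup);
            });

            // Independently of entry counts, the affinity API shows how many
            // primary partitions each server node is currently assigned.
            Affinity<Object> aff = ignite.affinity("myCache");
            for (ClusterNode node : ignite.cluster().forServers().nodes()) {
                System.out.println(node.consistentId()
                        + " primaryPartitions=" + aff.primaryPartitions(node).length);
            }
        }
    }
}

If the affinity check shows primary partitions spread evenly while localSize(PRIMARY) is zero on the restarted node, that would point to the entries not having been rebalanced back yet rather than to the affinity assignment itself.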
