Sorry, but what exactly is PME?

On Mon, Apr 1, 2019 at 1:55 AM Павлухин Иван <[email protected]> wrote:
> Hi,
>
> Sorry for the late answer. The observed result seems expected to me. I
> suppose the following:
> 1. EVT_CACHE_REBALANCE_STOPPED is fired when a particular node has loaded
> all partitions which it will be responsible for.
> 2. All nodes in the cluster must become aware that the partition
> assignment has changed. So, PME will happen to make all nodes aware of
> the new assignment.
> 3. Once PME completes, all nodes will consistently treat the node that
> just entered as primary for the corresponding set of partitions.
>
> Do not hesitate to write back if you feel that something is going wrong.
>
> Tue, Mar 19, 2019 at 19:30, Koitoer <[email protected]>:
> >
> > Hello Igniters
> >
> > The version of Ignite that we are using is 2.7.0. I'm adding the events
> > that I want to hear via the IgniteConfiguration using
> > `setIncludeEventTypes`.
> > Then using ignite.events().localListen(listenerPredicate, eventTypes);
> >
> > EVT_CACHE_REBALANCE_STARTED,
> > EVT_CACHE_REBALANCE_STOPPED,
> > EVT_CACHE_REBALANCE_PART_LOADED,
> > EVT_CACHE_REBALANCE_PART_UNLOADED,
> > EVT_CACHE_REBALANCE_PART_DATA_LOST
> >
> > Once I hear any of the events above, I use
> > `ignite.affinity(cacheName.name())` to retrieve the Affinity function, on
> > which I call the `primaryPartitions` or `allPartitions` method using
> > the ClusterNode instance that represents `this` node.
> >
> > Once I hear the rebalance process stop event, I create a thread in
> > charge of checking the partition assignment as follows.
> >
> > new Thread(() -> {
> >     for (int attempt = 0; attempt <= attempts; attempt++) {
> >         log.info("event=partitionAssignmentRetryLogic attempt={}, before={}, now={}",
> >             attempt, assignedPartitions, affinity.primaryPartitions(clusterNode));
> >
> >         try {
> >             if (affinity.primaryPartitions(clusterNode).length != 0) {
> >                 log.info("event=partitionAssignmentRetryLogicSuccess");
> >             }
> >             TimeUnit.SECONDS.sleep(delay);
> >         } catch (Exception e) {
> >             log.error("event=ErrorOnTimerWait message={}", e.getMessage(), e);
> >         }
> >     }
> > }).start();
> >
> >
> > After a couple of attempts (a few seconds), `primaryPartitions`
> > returns the correct set of partitions assigned to the node. I will check
> > the AffinityAssignment to try to do this in a cleaner way, as you
> > suggest.
> >
> >
> > On Fri, Mar 15, 2019 at 12:11 PM Павлухин Иван <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> What Ignite version do you use?
> >> How do you register your listener?
> >> On what object do you call primaryPartitions/allPartitions?
> >>
> >> It is true that Ignite uses late affinity assignment. It means
> >> that for each topology change (node enter or node leave) the partition
> >> assignment changes twice. The first time, temporary backups are created,
> >> which should be rebalanced from other nodes (EVT_CACHE_REBALANCE_STARTED
> >> takes place here). The second time, redundant partition replicas are
> >> marked as unusable (and unloaded after that)
> >> (EVT_CACHE_REBALANCE_STOPPED). It is also useful to understand that the
> >> Affinity interface calculates the partition distribution using the
> >> affinity function, and such a distribution might differ from the real
> >> partition assignment. It does differ while rebalance is in progress. See
> >> the AffinityAssignment interface.
> >>
> >> Wed, Mar 13, 2019 at 21:59, Koitoer <[email protected]>:
> >> >
> >> > Hi All.
> >> >
> >> > I'm trying to follow the rebalance events of my Ignite cluster so I'm
> >> > able to track which partitions are assigned to each node at any point in
> >> > time. I am listening to the `EVT_CACHE_REBALANCE_STARTED` and
> >> > `EVT_CACHE_REBALANCE_STOPPED`
> >> > events from Ignite and that is working well, except in the case where
> >> > one node crashes and another takes its place.
> >> >
> >> > My cluster has 5 nodes.
> >> > Ex. Node 1 has, let's say, 100 partitions. After I kill this node, the
> >> > partitions that were assigned to it get rebalanced across the entire
> >> > cluster. I'm able to track that with the STOPPED event, and checking
> >> > the affinity function on each of the nodes using the `primaryPartitions`
> >> > method gives me that; if I add all those numbers I get 1024 partitions,
> >> > which is what I expected.
> >> >
> >> > However, when a new node replaces the previous one, I see a rebalance
> >> > process occur, and some of the partitions `disappear`
> >> > from the already existing nodes (which is expected as well, as the new
> >> > node will take some partitions from them). But when the new node hears
> >> > the STOPPED event, a call to `primaryPartitions` returns an empty
> >> > list, while the `allPartitions` method gives me a list (I
> >> > think at this point it is primary + backups).
> >> >
> >> > If I let some time pass and execute the `primaryPartitions` method
> >> > again, I am able to retrieve the partitions that I was expecting to see
> >> > after the STOPPED event comes. I read here
> >> > https://cwiki.apache.org/confluence/display/IGNITE/%28Partition+Map%29+Exchange+-+under+the+hood#id-(PartitionMap)Exchange-under the hood-LateAffinityAssignment
> >> > that it could be a late assignment: that
> >> > after the cache rebalance the new node needs to bring in all the entries
> >> > to fill out the cache, and after that `primaryPartitions` will return
> >> > something.
> >> > It would be great to know if this is actually what is happening.
> >> >
> >> > My question is whether there is any kind of event that I should listen
> >> > to so I can be aware that this process (if this is what is happening)
> >> > has already finished. I would like to say, "After you bring this node
> >> > into the cluster, the partitions assigned to that node are the
> >> > following: XXX, XXX".
> >> >
> >> > Also, I'm aware of the event `EVT_CACHE_REBALANCE_PART_LOADED`, but
> >> > I'm seeing a ton of them, and at this point I would like to be able to
> >> > know when the last one arrives and say that those are now my primary
> >> > partitions.
> >> >
> >> > Thanks in advance.
> >>
> >>
> >>
> >> --
> >> Best regards,
> >> Ivan Pavlukhin
> >
> >
> >
> > --
> > koitoer ....
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

--
koitoer ....
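
For anyone reading this thread later: the retry thread quoted above can be written as a small bounded-polling helper. What follows is only a sketch under stated assumptions, not an Ignite API and not an event-driven answer to the original question. The class name `PrimaryPartitionWaiter` and the method `awaitPrimaryPartitions` are hypothetical, and the `Supplier<int[]>` is a stand-in for a call such as `affinity.primaryPartitions(clusterNode)` from the thread. It stops as soon as a non-empty result is seen and gives up after a fixed number of attempts.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Ignite: polls a supplier until it
// returns a non-empty partition array or the attempts are exhausted.
final class PrimaryPartitionWaiter {

    /**
     * @param primaryPartitions stand-in for affinity.primaryPartitions(clusterNode)
     * @param maxAttempts       how many times to poll before giving up
     * @param delayMillis       pause between polls
     * @return the first non-empty result, or an empty array on timeout/interrupt
     */
    static int[] awaitPrimaryPartitions(Supplier<int[]> primaryPartitions,
                                        int maxAttempts,
                                        long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int[] parts = primaryPartitions.get();
            if (parts != null && parts.length > 0) {
                return parts; // assignment is visible, stop polling
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop waiting.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return new int[0];
    }

    public static void main(String[] args) {
        // Fake supplier: empty for the first two polls, then three partitions.
        int[] calls = {0};
        Supplier<int[]> fake =
            () -> ++calls[0] < 3 ? new int[0] : new int[] {4, 17, 42};

        int[] result = awaitPrimaryPartitions(fake, 5, 10L);
        System.out.println(result.length + " partitions after " + calls[0] + " polls");
    }
}
```

In the setup described in the thread, the supplier would presumably be `() -> affinity.primaryPartitions(clusterNode)`, invoked from a separate thread after the STOPPED event rather than inside the listener itself. Taking a supplier keeps the helper testable without a running cluster.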
