Thank you for your answer, Stephen.

We use the metrics to measure the system, but for our current development we need to know exactly when a rebalancing starts and finishes. We use the local cache on each server node to feed an external process that is local to that node (a divide-and-conquer approach 😉). To coordinate those local processes, we implemented an Ignite Service that listens to the events I described (a simplified sketch of the registration is included below). We are trying to understand the sequence of events involved so that we cover all cases. We are especially worried about the failure events (PART_DATA_LOST, PART_MISSED, etc.) and about covering them properly. Do you know about them, and why we don't receive the REBALANCE_STARTED event (that's weird ☹)?

About the other question, you are right, it is better to let the cluster decide when to move the service.
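For reference, the registration inside our coordination service looks roughly like this (a simplified sketch, not our production code; class and variable names are illustrative):

import java.util.UUID;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.CacheRebalancingEvent;
import org.apache.ignite.events.Event;
import org.apache.ignite.lang.IgniteBiPredicate;
import org.apache.ignite.lang.IgnitePredicate;

import static org.apache.ignite.events.EventType.*;

public class RebalanceEventSubscription {

    // Event types the coordination service subscribes to. They must also be
    // enabled on every server node via IgniteConfiguration.setIncludeEventTypes(...).
    private static final int[] EVTS = {
        EVT_NODE_JOINED, EVT_NODE_LEFT, EVT_NODE_FAILED, EVT_NODE_SEGMENTED,
        EVT_CACHE_REBALANCE_STARTED, EVT_CACHE_REBALANCE_STOPPED,
        EVT_CACHE_REBALANCE_PART_SUPPLIED, EVT_CACHE_REBALANCE_PART_LOADED,
        EVT_CACHE_REBALANCE_PART_UNLOADED, EVT_CACHE_REBALANCE_PART_DATA_LOST,
        EVT_CACHE_REBALANCE_PART_MISSED
    };

    public static UUID register(Ignite ignite) {
        // Local callback: runs on the node where the service instance is deployed.
        IgniteBiPredicate<UUID, Event> locLsnr = (nodeId, evt) -> {
            if (evt instanceof CacheRebalancingEvent) {
                CacheRebalancingEvent rebEvt = (CacheRebalancingEvent)evt;

                System.out.printf("event=%s node=%s cache=%s partition=%d%n",
                    rebEvt.name(), nodeId, rebEvt.cacheName(), rebEvt.partition());
            }

            return true; // Keep listening.
        };

        // Remote filter: runs on every server node and decides which events are forwarded.
        IgnitePredicate<Event> rmtFilter = evt -> true;

        return ignite.events(ignite.cluster().forServers()).remoteListen(locLsnr, rmtFilter, EVTS);
    }
}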
From: Stephen Darlington <sdarling...@apache.org>
Sent: Wednesday, December 4, 2024 18:32
To: user@ignite.apache.org
Subject: Re: Rebalancing of an Ignite cluster

There are a bunch of "rebalance" cache metrics (https://ignite.apache.org/docs/latest/monitoring-metrics/new-metrics#caches). Do they not do what you need?

I don't know enough about the internals to guide you on the ordering, but I would say that it's not a good idea to subscribe to CACHE_ENTRY_* events. There could be millions of them.

AFAIK, you can't configure a cluster singleton service as you suggest. The idea is that the cluster manages it for you. You can limit which nodes it runs on using cluster groups. It's more common to use node singletons so that your service scales.
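For example, a rough sketch of reading those rebalance metrics programmatically through CacheMetrics (assuming cache statistics are enabled; the exact set of available metrics depends on the Ignite version):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CacheMetrics;

public class RebalanceMetricsCheck {

    // Requires CacheConfiguration.setStatisticsEnabled(true) on the cache,
    // otherwise most metrics are reported as zero.
    public static void print(Ignite ignite, String cacheName) {
        IgniteCache<?, ?> cache = ignite.cache(cacheName);

        // Cluster-wide (aggregated) view; use localMetrics() for this node only.
        CacheMetrics m = cache.metrics();

        System.out.println("Partitions left to rebalance: " + m.getRebalancingPartitionsCount());
        System.out.println("Rebalancing started at:       " + m.getRebalancingStartTime());
        System.out.println("Estimated finish time:        " + m.getEstimatedRebalancingFinishTime());
    }
}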
On Wed, 4 Dec 2024 at 11:18, <jrov...@identy.io> wrote:

Hi. We are currently working on a feature to track the rebalancing of our Ignite cluster by listening to all the related events: when it starts and ends, how many partitions are sent, etc. We implemented a singleton Ignite Service and deployed it on the cluster. When the service starts, it listens for the following remote events:

EVT_NODE_JOINED
EVT_NODE_LEFT
EVT_NODE_FAILED
EVT_NODE_SEGMENTED
EVT_CACHE_REBALANCE_STARTED
EVT_CACHE_REBALANCE_STOPPED
EVT_CACHE_REBALANCE_PART_SUPPLIED
EVT_CACHE_REBALANCE_PART_LOADED
EVT_CACHE_REBALANCE_PART_UNLOADED
EVT_CACHE_REBALANCE_PART_DATA_LOST
EVT_CACHE_REBALANCE_PART_MISSED

Some questions about our approach:

1. Is the following sequence of events correct?
   * Baseline: A, B
   * setBaselineTopology: A, B, C, D
   * C and D send the EVT_CACHE_REBALANCE_STARTED event
   * A and B send multiple EVT_CACHE_REBALANCE_PART_SUPPLIED events with the partitions to rebalance
   * C and D receive data from A and B
   * C and D send multiple CACHE_ENTRY_CREATED, CACHE_ENTRY_DESTROYED and CACHE_REBALANCE_OBJECT_LOADED events while creating the new entries
   * C and D send multiple CACHE_REBALANCE_PART_LOADED events when the data is loaded into the new nodes
   * C and D send the EVT_CACHE_REBALANCE_STOPPED event
   * A and B send multiple CACHE_ENTRY_CREATED, CACHE_ENTRY_DESTROYED and CACHE_REBALANCE_OBJECT_UNLOADED events while removing the old entries
   * A and B send multiple EVT_CACHE_REBALANCE_PART_UNLOADED events when the partitions are removed

2. As you can see, I receive some CACHE_ENTRY_DESTROYED events while C and D are receiving the data. Why are they destroying entries? Something similar happens when A and B are removing partitions: I receive some CACHE_ENTRY_CREATED events.

3. Sometimes I do not receive an EVT_CACHE_REBALANCE_STARTED event from a node that is supposed to start rebalancing. How is this possible?

4. About the use of services: when the node that contains a service is removed from the cluster, the service is transferred to another node. Is there a way to do this manually? I mean, for example, if I know that a node is processing a lot of data and I want to transfer the service to a different node, is that possible?

5. We want to know if a rebalancing is in progress. Is there a way to ask the cluster about this? We currently listen to the events and maintain a variable with this state (a simplified sketch of this is included below), but if the service dies… the variable is lost.

Thank you.
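For context on question 5, the state tracking described there is roughly the following (a simplified sketch with illustrative names; a real version would track state per node and per cache rather than a single flag):

import java.util.concurrent.atomic.AtomicBoolean;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.Event;
import org.apache.ignite.lang.IgnitePredicate;

import static org.apache.ignite.events.EventType.EVT_CACHE_REBALANCE_STARTED;
import static org.apache.ignite.events.EventType.EVT_CACHE_REBALANCE_STOPPED;

public class RebalanceStateTracker {

    // In-memory flag: it is lost if the service (or its node) dies, which is the problem described above.
    private final AtomicBoolean rebalancing = new AtomicBoolean(false);

    public void start(Ignite ignite) {
        IgnitePredicate<Event> lsnr = evt -> {
            rebalancing.set(evt.type() == EVT_CACHE_REBALANCE_STARTED);

            return true; // Keep listening.
        };

        // The rebalance events must be enabled in IgniteConfiguration.setIncludeEventTypes(...).
        ignite.events().localListen(lsnr, EVT_CACHE_REBALANCE_STARTED, EVT_CACHE_REBALANCE_STOPPED);
    }

    public boolean isRebalancing() {
        return rebalancing.get();
    }
}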