Dear Ivan,

Thank you for your reply; this clarifies the matter for me.

Kind regards
Dr. Evangelos Morakis
Software Architect

> On 8 May 2019, at 09:00, Ivan Pavlukhina <[email protected]> wrote:
>
> Evangelos,
>
> Thank you for your feedback!
>
> Regarding your questions:
> 1. That is phrased quite broadly. To avoid misunderstanding, I will give
> only a general hint. The decision whether to rebalance data is made
> after PME, so it is true that not every PME leads to a rebalance (one
> more example where it is not needed is a change of ownership of an empty
> partition). You can also observe rebalance decisions in the logs (at
> INFO level, I guess). If something still needs clarification, feel free
> to ask.
> 2. Yes, that is true, at least for the latest versions.
>
> Sent from my iPhone
>
>> On 4 May 2019, at 00:49, Evangelos Morakis <[email protected]> wrote:
>>
>>
>> Dear Ivan,
>> Thank you very much for your comprehensive answer. May I just say that
>> Ignite is definitely my favorite “beast” amongst the existing solutions
>> due to its versatility and the power it delivers when it comes to
>> designing complex distributed solutions, as in my specific use case. In
>> any case, based on your answer, things are clearer now with regard to
>> Ignite’s operation, but could you just confirm 2 points in order to
>> validate my understanding?
>> 1) PME does NOT always result in data rebalancing among nodes since, as
>> you mention, Ignite is clever enough to keep primary partitions on the
>> same nodes they were on before a PME caused by a new server node joining
>> the cluster. In addition, if backup partitions have not been defined in
>> the config, then there should not be any data rebalancing happening. Is
>> that correct?
>> 2) The behavior regarding data rebalancing during PME is as you describe
>> in the case where a new server node joins. What happens if the node is
>> not a server but a client node (meaning no data are ever stored locally
>> in that node)?
>> Am I correct to assume that in such a case there will NOT be any data
>> rebalancing triggered?
>>
>> Thank you in advance for your time and effort.
>> Kind regards
>>
>> Dr. Evangelos Morakis
>> Software Architect
>>
>>> On 3 May 2019, at 17:35, Павлухин Иван <[email protected]> wrote:
>>>
>>> Hi Evangelos and Matt,
>>>
>>> As far as I know, there were issues with a client node joining in
>>> previous Ignite versions. In new versions a joining client should not
>>> cause any spikes.
>>>
>>> In fact PME is (unfortunately) a widely known beast in the Ignite
>>> world. Fundamentally, PME can (and should) perform smoothly when new
>>> server nodes join the cluster not very frequently. I will give some
>>> details on what happens when a new server node joins the cluster. I
>>> hope it will help to answer question 3 from the first message in this
>>> thread.
>>>
>>> As its name hints, PME is a process in which all nodes agree on a data
>>> distribution in the cluster after an event which leads to a
>>> redistribution. A node joining is one such event. The data distribution
>>> is the knowledge that a partition i is located on a node j. For
>>> correct cluster operation, every node should agree on the same
>>> distribution (consensus). So, it is all about a consistent data
>>> distribution.
>>>
>>> Consequently, some data should be rebalanced after nodes come to an
>>> agreement on a distribution. Ignite uses a clever trick to allow
>>> operations while data is being rebalanced. When a new node joins:
>>> 1. PME occurs and nodes agree on the same data distribution among nodes.
>>> In that distribution all primary partitions belong to the same nodes
>>> they belonged to before PME. Also, temporary backup partitions are
>>> assigned to the new node, which will become a primary node for those
>>> partitions (keep reading).
>>> 2. Rebalance starts and delivers data to the temporary backup
>>> partitions* mentioned before. The cluster is fully operational
>>> meanwhile.
>>> 3. Once rebalance completes, another PME happens. Now the temporary
>>> backups become primary (and other redundant partitions are marked for
>>> unload).
>>> * It is worth noting here that a partition that was empty and is being
>>> loaded during rebalance is marked as MOVING. It is not readable because
>>> it does not contain all data yet, but all writes come to this partition
>>> as well in order to make it up to date when rebalance completes.
>>> (In Ignite the described trick is sometimes called "late affinity
>>> assignment")
>>>
>>> So, PME should not be very heavy because it is mainly about
>>> establishing an agreement on data distribution. The heavier data
>>> rebalance process happens while the cluster is fully operational. But
>>> PME still requires a silence period while the agreement is established.
>>> As you might know, PME and write operations use a mechanism similar to
>>> a read-write lock. Write operations are guarded by that lock in a
>>> shared mode. PME acquires that lock in an exclusive mode. So, at any
>>> moment we can have either several running write operations or only one
>>> running PME. It means that PME has to wait for all write operations to
>>> complete before it can start. It also blocks all new write operations
>>> from starting. Therefore long-running transactions blocking PME can
>>> lead to a prolonged "silence" period.
>>>
>>> Thu, 25 Apr 2019 at 00:58, Evangelos Morakis <[email protected]> wrote:
>>>>
>>>> Matt, thank you for your reply.
>>>> Indeed I saw your question too yesterday. In regards to points 3-4 of
>>>> my question, I suppose that, as you mention, if one shuts down the
>>>> client node gracefully and if the number of threads responsible for
>>>> rebalancing the data gets tweaked, then I guess the amount of time the
>>>> cluster blocks could be managed.
>>>> For point 2 I think it’s necessary for someone from the
>>>> dev team to provide a bit more insight into Ignite’s behavior with
>>>> regard to client nodes joining/leaving the cluster, as I fail to
>>>> understand why PME is triggered for such nodes given their natural
>>>> exclusion from computations and the lack of storage of cache data in
>>>> them. Indeed, if PME is triggered for client nodes when they
>>>> join/leave, scenarios where remote clients come and go on demand
>>>> become difficult to accommodate at best, and this sounds very
>>>> restrictive. I simply need to know more on this, otherwise it would
>>>> not be possible to develop a working strategy for accommodating
>>>> clients that come, do a bit of work, and then leave until next time.
>>>>
>>>> Kind regards
>>>>
>>>> Dr. Evangelos Morakis
>>>> Software Architect
>>>>
>>>>> On 24 Apr 2019, at 21:21, MattNohelty <[email protected]> wrote:
>>>>>
>>>>> I have these same questions and posted about this yesterday
>>>>> (http://apache-ignite-users.70518.x6.nabble.com/What-happens-when-a-client-gets-disconnected-td27959.html).
>>>>> Based on my understanding:
>>>>>
>>>>> 1) Yes, PME will always happen when a server node joins
>>>>>
>>>>> 2) This is my biggest question. I'm currently using 2.4 and it
>>>>> appears PME is happening when a client connects or disconnects, but I
>>>>> received one response that seemed to indicate that PME should not
>>>>> happen in this case in the newest versions of Ignite. I agree with
>>>>> your reasoning that these rebalancing processes do not seem necessary
>>>>> as all the data is on the server nodes, which is what prompted my
>>>>> initial question.
>>>>>
>>>>> 3) The responses I received do say that the cluster blocks while this
>>>>> happens and I've seen evidence of this as well. I've only seen
>>>>> substantial blocking though when a client node is disconnected
>>>>> ungracefully.
>>>>> When they start or stop properly, we do not observe substantial
>>>>> blocking on the other clients.
>>>>>
>>>>> This behavior has caused some issues for us recently and it seems
>>>>> very problematic that one client node crashing can cause issues on
>>>>> all other client nodes. Granted, we are still on Ignite 2.4 so maybe
>>>>> this has been corrected in 2.7, but I would really like to understand
>>>>> what the expected behavior should be.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Ivan Pavlukhin
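
The read-write-lock analogy in Ivan's 3 May message (writes hold the lock in shared mode, PME takes it exclusively) can be illustrated with a small single-threaded toy model. This is only a sketch in Python, not Ignite code: the class ExchangeGate and all of its method names are hypothetical, and the real mechanism is only said in the thread to be "similar to" a read-write lock.

```python
class ExchangeGate:
    """Toy model of the gate between write operations and PME.

    Hypothetical names; this does not mirror any Ignite class.
    """

    def __init__(self):
        self.active_writes = 0    # writes currently holding the gate (shared mode)
        self.pme_pending = False  # a PME is waiting for running writes to finish
        self.pme_running = False  # a PME holds the gate (exclusive mode)

    def try_begin_write(self):
        # New writes may not start while a PME is waiting or running.
        if self.pme_pending or self.pme_running:
            return False
        self.active_writes += 1
        return True

    def end_write(self):
        if self.active_writes == 0:
            raise RuntimeError("no write in progress")
        self.active_writes -= 1
        self._maybe_start_pme()

    def request_pme(self):
        # Only one PME at a time.
        if self.pme_pending or self.pme_running:
            return False
        self.pme_pending = True
        self._maybe_start_pme()
        return True

    def _maybe_start_pme(self):
        # PME starts only once every running write has completed.
        if self.pme_pending and self.active_writes == 0:
            self.pme_pending = False
            self.pme_running = True

    def finish_pme(self):
        if not self.pme_running:
            raise RuntimeError("no PME in progress")
        self.pme_running = False


gate = ExchangeGate()
gate.try_begin_write()         # a long-running transaction starts
gate.request_pme()             # a server node joins; PME must wait
print(gate.try_begin_write())  # False: new writes are held back
gate.end_write()               # the transaction finishes, PME can run
print(gate.pme_running)        # True
```

In this model the "silence" period lasts from request_pme() until finish_pme(), so a single slow write that delays end_write() stretches it for everyone else, which is the effect of long-running transactions described above.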
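
The three steps of "late affinity assignment" listed in Ivan's 3 May message can likewise be walked through with plain dictionaries. Again this is a hedged Python sketch and not Ignite code: the function names are made up, the target ("ideal") distribution is simply given as input rather than computed by an affinity function, both maps are assumed to cover the same partitions, and configured backups are ignored.

```python
def first_exchange(current_primary, ideal_primary):
    """Step 1: primaries stay where they are; each partition that should
    change owner gets a temporary copy on its future owner, marked MOVING."""
    primary = dict(current_primary)
    moving = {part: node for part, node in ideal_primary.items()
              if current_primary[part] != node}
    return primary, moving


def rebalance(moving):
    """Step 2: data is copied to the MOVING copies while the cluster keeps
    serving requests. Returns the partitions whose copy is now complete."""
    return set(moving)


def second_exchange(primary, moving, loaded):
    """Step 3: fully loaded temporary copies become primary; the previous
    owners' copies are marked for unload."""
    new_primary = dict(primary)
    to_unload = {}
    for part, node in moving.items():
        if part in loaded:
            to_unload[part] = primary[part]
            new_primary[part] = node
    return new_primary, to_unload


# Nodes A and B own four partitions; node C joins and should take over 1 and 3.
current = {0: "A", 1: "A", 2: "B", 3: "B"}
ideal = {0: "A", 1: "C", 2: "B", 3: "C"}

primary, moving = first_exchange(current, ideal)
loaded = rebalance(moving)
new_primary, to_unload = second_exchange(primary, moving, loaded)
print(new_primary)  # partitions 1 and 3 are now primary on C
print(to_unload)    # old copies on A and B are marked for unload
```

Primary ownership does not change between the first exchange and the second one, which is why, in this model, reads and writes can continue against the old owners while the data is being copied.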
