This is an automated email from the ASF dual-hosted git repository.

agoncharuk pushed a commit to branch ignite-14393
in repository https://gitbox.apache.org/repos/asf/ignite-3.git

The following commit(s) were added to refs/heads/ignite-14393 by this push:
     new 2687266  IGNITE-14393 Components interaction
2687266 is described below

commit 26872663e3bcf574bd05507b08f68f6f5db0b14b
Author: Alexey Goncharuk <alexey.goncha...@gmail.com>
AuthorDate: Mon Apr 5 13:04:56 2021 +0300

    IGNITE-14393 Components interaction
---
 modules/runner/README.md | 55 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 16 deletions(-)

diff --git a/modules/runner/README.md b/modules/runner/README.md
index 6776168..47a1927 100644
--- a/modules/runner/README.md
+++ b/modules/runner/README.md
@@ -46,10 +46,9 @@ enforces serveral rules related to the components interaction:

 <!-- Change /svg/... to /uml/... here to view the image UML. -->
-![Example components dependency graph](http://www.plantuml.com/plantuml/svg/VP7F2i8m3CRlVOeUzR22Vxm8Ze5ud1V5UxO6dNQjT9c-lcMwactWfMto_KBIBrb3uAcjZN357KkI5sWDYUwKM-cyHYztSY5w_OZqTluXVZTZbf79aQ0Xv-6AWhqPkuhigRnX3JdO07WnY1Y_ZlmP2eCzpW9ERyIWRPAaSMnUqGaCd8XVCoRflpF7ikXGecEAezFs8rbfbSf-4V2izc3CR3xucMe3CZB1L0sXHQ-SFRpGKYZ9qEgmuWy0)
+![Components dependency graph](http://www.plantuml.com/plantuml/svg/TP7DJiCm48Jl-nHv0KjG_f68oWytuD1Mt9t4AGOdbfoD46-FwiJRYQnUrflPp-jePZsm3ZnsZdprRMekFlNec68jxWiV6XEAX-ASqlpDrzez-xwrUu8Us9Mm7uP_VVYX-GJcGfYDRfaE1QQNCdqth0VsGUyDGG_ibR0lTk1Wgv5DC_zVfi2zQxatmnbn8yIJ7eoplQ7K07Khr6FRsjxo7wK6g3kXjlMNwJHD1pfy9iXELyvGh0WSCzYyRdTqA3XUqI8FrQXo2bFiZn8ma-rAbH8KMgp2gO4gOsfeBpwAcQ6EBwEUvoO-vmkNayIBuncF4-6NegIRGicMW2vFndYY9C63bc861HQAd9oSmbIo_lWTILgRlXaxzmy0)

-The diagram above shows an example component dependency diagram and provides an order in which compomnents may be
-initialized.
+The diagram above shows the component dependency diagram and provides an order in which compomnents may be initialized.

 ## Cluster lifecycle
 For a cluster to become operational, the metastorage instance must be initialized first. The initialization command
@@ -58,7 +57,27 @@ initialization command, it either creates a bootstrapped Raft instance with the
 group node), or writes the metastorage group member IDs to the vault as a private system-level property.

 After the metastorage is initialized, components start to receive and process watch events, updating the local state
-according to the changes received from the watch.
+according to the changes received from the watch.
+
+An entry point to user-initiated cluster state changes is [cluster configuration](../configuration/README.md).
+Configuration module provides convenient ways for managing configuration both as Java API, as well as from ``ignite``
+command line utility.
+
+## Reliable configuration changes
+Any configuration change is translated to a metastorage multi-update and has a single configuration change ID. This ID
+is used to enforce CAS-style configuration changes and to ensure no duplicate configuration changes are executed during
+the cluster runtime. To reliably process cluster configuration changes, we introduce an additional metastorage key
+
+```
+internal.configuration.applied=<change ID>
+```
+
+that indicates the configuration change ID that was already processed and corresponding changes are written to the
+metastorage. Whenever a node processes a configuration change, it must also conditionally update the
+``internal.configuration.applied`` value checking that the previous value is smaller than the change ID being applied.
+This prevents configuration changes being processed more than once. Any metastorage update that processes configuration
+change must update this key to indicate that this configuraion change has been already processed. It is safe to process
+the same configuration change more than once since only one update will be applied.

 ## Reliable watch processing
 All cluster state is written and maintained in the metastorage. Nodes may update some state in the metastorage, which
@@ -92,30 +111,34 @@ of the watch, and not processed events are replayed.
 We require that each Ignite table is assigned a globally unique ID (the ID must not repeat even after the table is
 dropped, so we use the metastorage key revision to assign table IDs).

-When a table is created, Ignite first checks that a table with the given name does not exist, then attempts to create
-the following key-value pair in the metastorage via the conditional update ensuring atomic `putIfAbsent` semantics:
+To create a table, a user makes a change in the configuration tree by introducing the corresponding configuration
+object. This can be done either via public [configuration API](TODO link to configuration API) or via the ``ignite``
+[configuration command](TODO link to CLI readme). Configuration validator checks that a table with the same name does
+not exist (and performs other necessary checks) and writes the change to the metastorage. If the update succeeds, Ignite
+considers the table created and completes user call.
+
+After the configuration change is applied, the table manager receives configuration change notification (essentially,
+a transformed watch) on metastorage group nodes. Table manager uses configuration keys update counters (not revision)
+as table IDs and attempts to create the following keys (updating the ``internal.configuration.applied`` key as was
+described above):

 ```
-internal.tables.names.<name>=<name>
+internal.tables.<ID>=<name>
 ```

-If the update succeeds, Ignite considers the table created and uses the returned key-value pair revision as the table
-ID. If the update fails, then the table with the same name was concurrently created (the operation fails in this case).
-
 In order to process affinity calculations and assignments, the affinity manager creates a reliable watch for the
 following keys on metastorage group members:

 ```
-internal.tables.names.*
+internal.tables.*
 internal.baseline
 ```

 Whenever a watch is fired, the affinity manager checks which key was updated. If the watch is triggered for
-``internal.tables.names.<name>`` key, it calculates a new affinity for the table with the given name (using key revision
-as the table ID). If the watch is triggered for ``internal.baseline`` key, the manager recalculates affinity for all
-tables exsiting at the watch revision (this can be done using the metastorage ``range(keys, upperBound)`` method
-providing the watch event revision as the upper bound). The calculated affinity is written to the
-``internal.tables.affinity.<ID>`` key.
+``internal.tables.<ID>`` key, it calculates a new affinity for the table with the given ID. If the watch is triggered
+for ``internal.baseline`` key, the manager recalculates affinity for all tables exsiting at the watch revision
+(this can be done using the metastorage ``range(keys, upperBound)`` method providing the watch event revision as the
+upper bound). The calculated affinity is written to the ``internal.tables.affinity.<ID>`` key.

 > Note that ideally the watch should only be processed on metastorage group leader, thus eliminating unnecessary network
 > trips. Theoretically, we could have embedded this logic to the state machine, but this would enormously complicate