belliottsmith commented on code in PR #3947: URL: https://github.com/apache/cassandra/pull/3947#discussion_r1979483632
########## doc/modules/cassandra/pages/developing/accord/index.adoc: ########## @@ -0,0 +1,355 @@ +== Accord Intro + +This document is intended to facilitate quick dive into Accord and +Cassandra Integration code for anyone interested in the project. Readers +should be closely familiar at very least with Single-Decree Paxos and +fluent in Consensus terminology. Familiarity with Accord protocol +itself, or similar protocols such as EPaxos, TAPIR, Janus, or Tempo, can +be useful. + +Accord code is logically split into local and coordinator part. +Coordination code contains code intended for coordination/invocation of +the client query, driving it through the Accord state machine, and all +commands and utilities for tracking/retrying their state. Node-local +code contains utility for keeping record of replica state and facilitate +local execution (i.e. responding to coordinator queries). + +There are _many_ enums in Accord. They’re extremely useful for +understanding the state machine of each of the components. + +Cassandra Integration implements interfaces provided by Accord, and +plugs in messaging, serialization, CQL, concurrency/execution, on-disk +state management, and stable storage (i.e. Cassandra tables). + +When the request comes from the client, broadly speaking, it gets parsed +and turns into `TransactionStatement`. `TransactionStatement` contains +updates, selects, assignments, and conditions intended for +atomic/transactional execution. These statements are translated into +Accord commands (i.e. `Read`, `Write`, or `Update`), and form Accord +Transaction (`Txn`). Transaction is executed yielding `TxnResult` that +can be returned to the client. + +== Coordinator Side + +=== Accord Protocol Basics + +Coordinator allocates a globally unique transaction ID `TxnId` for the +transaction, and begins coordination (see `CoordinateTransaction`). +Here, coordinator perform initial rounds of `PreAccept` and `Accept` +until the agreement about when transaction should execute is reached. +Coordinated query execution starts with a `PreAccept` message, which +contains transaction definition and routing information. + +On the replica locally, each Accord message first lands in +`AccordVerbHandler`, which handles _all_ Accord messages. Replica +determines whether it is aware of the _epoch_ specified by the +transaction coordinator. Messages for the future epochs are parked until +epoch becomes active on the node; messages for known epochs are +submitted to their corresponding command stores (think: local shards). +Replica applies the message locally, changing its local state, and +producing coordinator response. Coordinator collects replica responses +and continues driving transaction through the execution state machine. + +Every transaction has a home key - a global value that defines the home +shard, the one tasked with ensuring the transaction is finished. Home +key is chosen arbitrarily: it is either a first key the coordinator +owns, or it is picked completely at random. + +== Replica Side + +=== CommandStore + +`Command` is a unit of Accord _metadata_ that relates to a specific +operation, as opposed to `Message`, which is an _instruction_ sent by +coordinator to the replica for execution that _changes_ this command +state. `Command` does _not_ hold the state of an entire transaction, but +rather a _part_ of transaction executed on a particular shard. +_Coordinator_ is responsible for executing the entirety of the +transaction, `Command`s are just local execution states. + +Commands are held by a Command _Store_, a single threaded internal shard +of accord transaction metadata. It holds state required for command +execution, and executes commands sequentially. For command execution, +`CommandStore` creates a `SafeCommandStore`, a version of `CommandStore` +created for command execution, during which it has exclusive access to +it. + +Roughly speaking, you can think of relation between CommandStore and +SafeCommandStore as: + +.... +SafeCommandStore safeStore = commandStore.beginOperation(context) +try { + message.apply(safeStore); +} +finally { + commandStore.completeOperation(safeStore); +} +.... + +In other words, `CommandStore` collects the `PreLoadContext`, state +required to be in memory for command execution (possible dependencies, +such as `TxnId`s, and `Key`s of commands, but also `CommandsForKeys` +that will be needed during execution). Once the context is collected and +command’s turn to execute on command store comes, _safe_ command store +is created and passed to the command. + +Any executing operation may require changes to command store state. For +this, `SafeCommandStore` creates a special version of command state, +`SafeCommand` and `SafeCommandsForKey` that can be updated during +execution. Naturally, either _all_ of the states changed during +operation execution will become visible, or none of them will. In order +to ensure transactional integrity, changes to commands are tracked and +are recorded into `Journal` for crash-recovery. `ProgressLog` and +`CommandsForKey` are up + +On Cassandra side, concurrent execution is controlled by `AccordTask`, +which contains cache loading logic and persistence callbacks. Since +Accord may potentially hold a large number of command states in memory, +their states may be _shrunk_ to their binary representation to save some +memory, or they can get fully evicted. This also means that `AccordTask` +will have to reload relevant dependencies from preload context before +command execution can begin. + +=== AsyncChain, AccordTask, AccordExecutor + +Accord is designed for high concurrency, and most things are constructed +as asynchronous chains. `AsyncChain` API is very similar to the one of +Java futures, but has several convenient methods that make execution on +multiple executors (think: command stores, loaders) simpler. + +Each `CommandStore` has its own `AccordExecutor`. For the purpose of +this document you may consider it as a single-threaded executor. +`AccordExecutor` keeps track of tasks in different states, primarily: + +* `WAITING_TO_LOAD` - executor has a maximum number of concurrent load Review Comment: incomplete sentence? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org For additional commands, e-mail: pr-h...@cassandra.apache.org