Hi openstackers,

In order to implement 
https://blueprints.launchpad.net/magnetodb/+spec/support-tuneable-consistency 
we need tunable consistency support in MagnetoDB, which is described here: 
https://blueprints.launchpad.net/magnetodb/+spec/configurable-consistency

So, here is a draft specification of the concept.

1. First of all, here is the list of suggested consistency levels for MagnetoDB:

 *   STRONG - Provides the highest consistency and the lowest availability of 
any level. (A write must be written to the commit log and memory table on 
all replica nodes in the cluster for that row. Read returns the record with the 
most recent timestamp after all replicas have responded.)
 *   WEAK - Provides low latency. Delivers the lowest consistency and the 
highest availability of any level. (A write must be written to the commit 
log and memory table of at least one replica node. Read returns a response from 
at least one replica node.)
 *   QUORUM - Provides strong consistency if you can tolerate some level of 
failure. (A write must be written to the commit log and memory table on a 
quorum of replica nodes. Read returns the record with the most recent timestamp 
after a quorum of replicas has responded, regardless of data center.)

    And special Multi Data Center consistency levels:

 *   MDC_EACH_QUORUM - Used in multiple data center clusters to strictly 
maintain consistency at the same level in each data center. (A write must be 
written to the commit log and memory table on a quorum of replica nodes in all 
data centers. Read returns the record with the most recent timestamp once a 
quorum of replicas in each data center of the cluster has responded.)
 *   MDC_LOCAL_QUORUM - Used in multiple data center clusters to maintain 
consistency in the local (current) data center. (A write must be written to the 
commit log and memory table on a quorum of replica nodes in the same data 
center as the coordinator node. Read returns the record with the most recent 
timestamp once a quorum of replicas in the same data center as the 
coordinator node has responded. Avoids the latency of inter-data center 
communication.)
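The levels above can be translated to backend-native consistency levels. As an illustration only (this mapping dict and helper are hypothetical, not existing MagnetoDB code), here is one possible translation to the consistency level names exposed by the DataStax cassandra-driver:

```python
# Hypothetical mapping from the proposed MagnetoDB levels to the Cassandra
# consistency level names used by the cassandra-driver (ALL, ONE, QUORUM,
# EACH_QUORUM, LOCAL_QUORUM).
MAGNETODB_TO_CASSANDRA = {
    "STRONG": "ALL",
    "WEAK": "ONE",
    "QUORUM": "QUORUM",
    "MDC_EACH_QUORUM": "EACH_QUORUM",
    "MDC_LOCAL_QUORUM": "LOCAL_QUORUM",
}


def to_backend_level(level):
    """Translate a MagnetoDB level to the backend name, or raise."""
    try:
        return MAGNETODB_TO_CASSANDRA[level]
    except KeyError:
        raise ValueError("Unsupported consistency level: %s" % level)
```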

BUT: we can't use inconsistent writes if we use an indexed table, or conditional 
operations that those indexes are based on, because these operations depend on 
the current state of the data. So it seems that we can:
1) tune consistent read/write operations in the following read/write 
combinations: QUORUM/QUORUM, MDC_LOCAL_QUORUM/MDC_EACH_QUORUM, 
MDC_EACH_QUORUM/MDC_LOCAL_QUORUM, STRONG/WEAK.
We also still have the inconsistent read operation with CL=WEAK.
2) if we really need inconsistent writes, we can allow them for tables without 
indexes. In this case we provide more flexibility and room for optimization, 
but on the other hand we make MagnetoDB more complicated.
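The rules above could be expressed roughly as follows (a sketch with hypothetical names, not existing code): indexed tables accept only the combinations from option 1, while option 2 lifts all restrictions for tables without indexes.

```python
# Read/write combinations from option 1 that keep indexed tables and
# conditional operations correct.
ALLOWED_INDEXED_COMBINATIONS = {
    ("QUORUM", "QUORUM"),
    ("MDC_LOCAL_QUORUM", "MDC_EACH_QUORUM"),
    ("MDC_EACH_QUORUM", "MDC_LOCAL_QUORUM"),
    ("STRONG", "WEAK"),
}

# Write levels considered consistent enough for index maintenance.
CONSISTENT_WRITE_LEVELS = {"STRONG", "QUORUM",
                           "MDC_EACH_QUORUM", "MDC_LOCAL_QUORUM"}


def combination_allowed(read_cl, write_cl, table_has_indexes):
    if not table_has_indexes:
        return True  # option 2: unindexed tables are unrestricted
    if (read_cl, write_cl) in ALLOWED_INDEXED_COMBINATIONS:
        return True
    # The inconsistent read with CL=WEAK is still available, as long as
    # the write side stays consistent.
    return read_cl == "WEAK" and write_cl in CONSISTENT_WRITE_LEVELS
```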



2. JSON request examples.

I suggest adding a new 'consistency_level' attribute. We should check the 
corresponding naming in the backend API, since it can be slightly different there.



For a read operation we will use, for example, a get item request:

            {
                "key": {
                    "ForumName": {
                        "S": "MagnetoDB"
                    },
                    "Subject": {
                        "S": "What about configurable consistency support?"
                    }
                },
                "attributes_to_get": ["LastPostDateTime","Message","Tags"],
                "consistency_level": "STRONG"
            }

Here we use consistency level STRONG, which means that the response returns the 
record with the most recent timestamp after all replicas have responded. In 
this case we get the highest consistency but the lowest availability of any 
level.

For a write operation we will use, for example, a put item request:

            {
                "item": {
                    "LastPostDateTime": {
                        "S": "201303190422"
                    },
                    "Tags": {
                        "SS": ["Update","Multiple items","HelpMe"]
                    },
                    "ForumName": {
                        "S": "Amazon DynamoDB"
                    },
                    "Message": {
                        "S": "I want to update multiple items."
                    },
                    "Subject": {
                        "S": "How do I update multiple items?"
                    },
                    "LastPostedBy": {
                        "S": "f...@example.com<mailto:f...@example.com>"
                    }
                },
                "expected": {
                    "ForumName": {
                        "exists": false
                    },
                    "Subject": {
                        "exists": false
                    }
                },
                "consistency_level": "WEAK"
            }

Here we use consistency level WEAK, which means that the write must be written 
to the commit log and memory table of at least one replica node. In this case 
we get the lowest consistency but the highest availability of any level.



And one more example for table creation:



            {
                "attribute_definitions": [
                    {
                        "attribute_name": "ForumName",
                        "attribute_type": "S"
                    },
                    {
                        "attribute_name": "Subject",
                        "attribute_type": "S"
                    },
                    {
                        "attribute_name": "LastPostDateTime",
                        "attribute_type": "S"
                    }
                ],
                "table_name": "Thread",
                "key_schema": [
                    {
                        "attribute_name": "ForumName",
                        "key_type": "HASH"
                    },
                    {
                        "attribute_name": "Subject",
                        "key_type": "RANGE"
                    }
                ],
                "consistency_level": "QUORUM"
            }



Here we use consistency level QUORUM and set it as the default consistency 
level for all read and write operations on this table. But we can still send 
requests for this table with another CL, thereby changing the CL for the 
operations that need it.



3. Default behaviour

If the 'consistency_level' field of a read/write request has the value 
'DEFAULT', or if we omit this field altogether, we should fall back to a 
system-wide default consistency level in MagnetoDB. I suggest QUORUM for it, 
because it provides quite strong consistency while tolerating some level of 
failure. So, with QUORUM on both read and write operations we will have 
consistent data.
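The resulting fallback chain could look like this (a hypothetical helper, shown only to make the precedence explicit): an explicit level in the request wins, then the table-level default set at creation time, then the system-wide QUORUM default.

```python
# System-wide default suggested above.
SYSTEM_DEFAULT = "QUORUM"


def effective_consistency_level(request_cl=None, table_cl=None):
    """Resolve the level actually used for an operation."""
    if request_cl in (None, "DEFAULT"):
        # No explicit request level: use the table default if one was set
        # at table creation time, otherwise the system-wide default.
        return table_cl or SYSTEM_DEFAULT
    return request_cl
```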



4. Changes in the database API.

With this approach, the changes in the database API will be minimal. For 
example, let's look at the select_item method of the storage API:

https://github.com/stackforge/magnetodb/blob/master/magnetodb/storage/__init__.py#L162-L190

It takes a 'consistent' argument and passes it to the lower level. Currently it 
is a boolean value, so we suggest changing it to the specific values suggested 
above and passing it through to the backend. There it will be mapped to 
backend-dependent consistency levels and used by the specific backend directly. 
For example, in Cassandra it will be used here:

https://github.com/stackforge/magnetodb/blob/master/magnetodb/storage/impl/cassandra_impl.py#L265-L284

Storage API methods that do not have such an argument yet will get it added.
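The change could look roughly like this (an illustrative sketch, not the real magnetodb.storage code; the signature and mapping dict are simplified): the boolean 'consistent' flag becomes a string level that the backend maps to its own consistency levels.

```python
# Simplified backend mapping for the sketch (Cassandra-style names).
_BACKEND_LEVELS = {
    "STRONG": "ALL",
    "WEAK": "ONE",
    "QUORUM": "QUORUM",
    "MDC_EACH_QUORUM": "EACH_QUORUM",
    "MDC_LOCAL_QUORUM": "LOCAL_QUORUM",
}


def select_item(context, table_name, indexed_condition_map,
                consistency_level="QUORUM", limit=None):
    """Sketch: the old boolean 'consistent' flag becomes a string level."""
    backend_level = _BACKEND_LEVELS[consistency_level]
    # ... here the real implementation would call the backend impl with
    # backend_level instead of the old boolean ...
    return backend_level  # returned only so the sketch is checkable
```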



5. Error handling approach.

We suggest adding validations. The first one should be done at the REST API 
level: if a request contains an unsupported level (or a misspelled one, etc.), 
we should return an error. Another validation should be at the backend level, 
checking whether the backend supports the given consistency level. Here we have 
2 variants: 1) return an error with a message about the unsupported CL, or 
2) use some default behaviour for the specific backend. And the last validation 
will happen directly in the backend DB, so if it raises an error, we will 
propagate it to the higher level.
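A possible shape of the first two validations (all names here are hypothetical): the REST layer rejects unknown levels outright, and the backend declares which levels it supports, with variant 1 (return an error) shown for unsupported ones.

```python
# Levels the REST API accepts at all.
KNOWN_LEVELS = {"STRONG", "WEAK", "QUORUM",
                "MDC_EACH_QUORUM", "MDC_LOCAL_QUORUM", "DEFAULT"}


class ValidationError(Exception):
    """Raised for levels the API or the backend cannot accept."""


def validate_consistency_level(level, backend_supported_levels):
    if level not in KNOWN_LEVELS:
        # First validation: REST API rejects unknown/misspelled levels.
        raise ValidationError("Unknown consistency level: %r" % level)
    if level != "DEFAULT" and level not in backend_supported_levels:
        # Second validation, variant 1: error for a level the backend does
        # not support (variant 2 would fall back to the backend default).
        raise ValidationError("Backend does not support level: %r" % level)
    return level
```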
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev