The DynamoDB model has several key benefits over Cassandra's. The most
notable one is the tablet concept: data is partitioned into 10GB chunks,
so scaling happens when such a tablet reaches maximum capacity and it is
automatically split in two. This can happen in parallel across the entire
dataset.
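The split rule described above can be sketched as a toy loop (illustrative only; the 10GB cap and the halving behavior come from the description in this thread, not from DynamoDB internals):

```python
# Toy sketch of tablet splitting: any tablet at/over the cap is divided
# in two. Each split is independent of the others, which is why splits
# can proceed in parallel across the whole dataset.
TABLET_CAP_GB = 10  # the 10GB chunk size mentioned above

def split_when_full(tablet_sizes_gb):
    """Return the tablet list after splitting every full tablet in half."""
    out = []
    for size in tablet_sizes_gb:
        if size >= TABLET_CAP_GB:
            out.extend([size / 2, size / 2])  # one tablet becomes two
        else:
            out.append(size)
    return out

print(split_when_full([4, 10, 12]))  # [4, 5.0, 5.0, 6.0, 6.0]
```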
Expansion is probably much faster in 4.0 with complete sstable streaming
(it skips ser/deser), though that may have diminishing returns with vnodes
unless you're using LCS.
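For reference, 4.0's complete-sstable (zero-copy) streaming is controlled from cassandra.yaml; a minimal sketch, assuming the 4.0 default:

```yaml
# cassandra.yaml (Cassandra 4.0): stream whole sstables without
# serialization/deserialization when an entire sstable is owned by the
# receiving node (default: true). More likely to apply with LCS, where
# sstables cover narrower token ranges.
stream_entire_sstables: true
```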
Dynamo on demand / autoscaling isn't magic - they're overprovisioning to
give you the burst, then expanding on demand. That
Out of curiosity, does DynamoDB autoscaling allow you to exceed the
partition limits (e.g. push more data than is allowed for some outlier
heavy partitions)? If yes, that could be interesting (I guess DynamoDB is
doing some kind of rebalancing behind the scenes). If no, it's just an
artificial
Oh right, frozen vs unfrozen.
On Mon, Dec 9, 2019 at 2:23 PM DuyHai Doan wrote:
> It depends. The latest version of Cassandra allows unfrozen UDTs. The
> individual fields of a UDT are updated atomically and are effectively
> stored in distinct physical columns inside the partition, thus
>
Dynamo salespeople have been pushing autoscaling abilities that have been
one of the key temptations for our management to switch off of Cassandra.
Has anyone done any numbers on how well Dynamo autoscales under demand
spikes, and how we could architect Cassandra to compete with such abilities?
We
It depends. The latest version of Cassandra allows unfrozen UDTs. The
individual fields of a UDT are updated atomically and are effectively
stored in distinct physical columns inside the partition, thus applying
ttl() on them makes sense. I'm not sure, however, whether the CQL parser
allows this syntax.
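As a sketch of the syntax in question (hypothetical schema; as the message above notes, it is unclear whether the parser actually accepts ttl() on a UDT field):

```cql
CREATE TYPE address (street text, city text);

-- non-frozen UDT column (allowed since Cassandra 3.6)
CREATE TABLE users (id uuid PRIMARY KEY, addr address);

-- individual field updates work on non-frozen UDTs:
UPDATE users USING TTL 86400
  SET addr.city = 'Paris'
  WHERE id = 123e4567-e89b-12d3-a456-426614174000;

-- the uncertain part: per-field ttl() may be rejected by the CQL parser
SELECT ttl(addr.city) FROM users
  WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```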
My speculation on rapidly churning/fast reads of recently written data:
- data written at QUORUM (for RF=3): the write is confirmed after two nodes reply
- data read very soon after (possibly a code antipattern), and let's assume
the third node's update hasn't completed yet (e.g. AWS network "variance").
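The scenario above comes down to the classic quorum-overlap condition R + W > N: if it holds, every read quorum intersects every write quorum, so a read immediately after a confirmed write must see it. A minimal check:

```python
# Quorum-overlap condition for Dynamo-style replication: a read of R
# replicas is guaranteed to include at least one replica from a write
# acknowledged by W replicas (out of N) exactly when R + W > N.
def reads_see_latest_write(n, w, r):
    """True if any R-replica read must overlap any W-replica write."""
    return r + w > n

# RF=3, QUORUM write (W=2) + QUORUM read (R=2): quorums overlap.
print(reads_see_latest_write(3, 2, 2))  # True
# RF=3, QUORUM write (W=2) + ONE read (R=1): the read may land on the
# third replica that hasn't applied the write yet -- the case speculated
# about above.
print(reads_see_latest_write(3, 2, 1))  # False
```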
I could be wrong, but I think UDTs are written (and overwritten) as one
unit, so the notion of a TTL on a UDT field doesn't exist; the TTL is
applied to the overall structure.
Think of it like a serialized JSON object with multiple fields. To update a
field they deserialize the JSON, then
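The frozen-UDT model described above can be mimicked in a few lines (a toy analogy, not Cassandra's actual storage code): the whole value is one serialized blob, so changing any field means rewriting the entire cell.

```python
import json

# Toy model of a frozen UDT: the value is stored as one serialized blob,
# so a single-field update is read-modify-write of the whole structure.
def update_frozen(blob, field, value):
    obj = json.loads(blob)   # deserialize the whole structure
    obj[field] = value       # change one field
    return json.dumps(obj)   # re-serialize and overwrite as one unit

addr = json.dumps({"street": "1 Main St", "city": "Oslo"})
addr = update_frozen(addr, "city", "Paris")
print(json.loads(addr)["city"])  # Paris
```

This is also why a per-field TTL makes no sense for a frozen UDT: there is only one cell, so there is only one TTL.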
Jeff: the gp2 drives are expensive, especially if you have to make them
unnecessarily large to get the IOPS, and I want to get as cheap per node as
possible to get as many nodes as possible.
i3 + a cheap rust backup beats an m5 or similar + EBS gp2 on cost when
I did the numbers.
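The "unnecessarily large to get the IOPS" point follows from gp2's sizing rule: baseline IOPS scale at 3 IOPS per provisioned GB (with a 100 IOPS floor and a 16,000 IOPS cap, per the AWS EBS documentation), so an IOPS target dictates a minimum volume size regardless of how much space you need:

```python
# Smallest gp2 volume (GB) whose baseline IOPS meets a target, using the
# documented gp2 rule: 3 IOPS per GB, minimum 100 IOPS, maximum 16,000.
def min_gp2_size_gb(target_iops):
    if target_iops > 16000:
        raise ValueError("gp2 baseline IOPS tops out at 16,000")
    if target_iops <= 100:
        return 1  # every gp2 volume gets at least 100 IOPS
    return -(-target_iops // 3)  # ceiling division

print(min_gp2_size_gb(9000))  # 3000 GB just to reach 9,000 IOPS
```

So a node that only needs, say, 1 TB of space but 9,000 IOPS must still pay for a 3 TB gp2 volume, which is where locally attached i3 storage wins on cost.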
Ben: Going to