Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread Dor Laor
The DynamoDB model has several key benefits over Cassandra's. The most notable one is the tablet concept: data is partitioned into 10GB chunks. Scaling happens when such a tablet reaches maximum capacity and is automatically split in two. This can happen in parallel across the entire data
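A minimal sketch of the split-on-full behavior described in this post, assuming a tablet is simply a token range with a byte counter and that a full tablet is cut at the midpoint of its range. The `Tablet` class, the midpoint policy, and the even-data assumption are hypothetical illustrations, not DynamoDB's actual implementation.

```python
from dataclasses import dataclass

GB = 1024 ** 3
MAX_TABLET_BYTES = 10 * GB  # the 10GB chunk size mentioned in the post


@dataclass
class Tablet:
    """Hypothetical tablet: a half-open token range [lo, hi) plus its size."""
    lo: int
    hi: int
    size_bytes: int = 0


def split_if_full(tablet, max_bytes=MAX_TABLET_BYTES):
    """Return [tablet] if under the limit, else two halves of its range.

    Assumes data is spread evenly over the range, so each half receives
    about half of the bytes (an illustrative simplification).
    """
    if tablet.size_bytes < max_bytes or tablet.hi - tablet.lo < 2:
        return [tablet]
    mid = (tablet.lo + tablet.hi) // 2
    left_bytes = tablet.size_bytes // 2
    return [
        Tablet(tablet.lo, mid, left_bytes),
        Tablet(mid, tablet.hi, tablet.size_bytes - left_bytes),
    ]


def rebalance(tablets, max_bytes=MAX_TABLET_BYTES):
    """Split every full tablet; each split is independent of the others."""
    result = []
    for t in tablets:
        result.extend(split_if_full(t, max_bytes))
    return result
```

Because each split in this sketch depends only on the one tablet being split, the loop in `rebalance` could run in parallel, which is the property the post highlights.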

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread Jeff Jirsa
Expansion is probably much faster in 4.0 with complete sstable streaming (skips ser/deser), though that may have diminishing returns with vnodes unless you're using LCS. Dynamo on demand / autoscaling isn't magic - they're overprovisioning to give you the burst, then expanding on demand. That

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread DuyHai Doan
Out of curiosity, does DynamoDB autoscaling allow you to exceed the partition limits (e.g. push more data than is allowed for some outlier heavy partitions)? If yes, it could be interesting (I guess DynamoDB is doing some kind of rebalancing behind the scenes). If no, it's just an artificial

Re: TTL on UDT

2019-12-09 Thread Carl Mueller
Oh right, frozen vs unfrozen.
On Mon, Dec 9, 2019 at 2:23 PM DuyHai Doan wrote:
> It depends on.. Latest version of Cassandra allows unfrozen UDT. The
> individual fields of UDT are updated atomically and they are stored
> effectively in distinct physical columns inside the partition, thus
>

Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread Carl Mueller
Dynamo salespeople have been pushing autoscaling abilities, which have been one of the key temptations for our management to switch away from Cassandra. Has anyone run any numbers on how well Dynamo autoscales for demand spikes, and how we could architect Cassandra to compete with such abilities? We

Re: TTL on UDT

2019-12-09 Thread DuyHai Doan
It depends. The latest version of Cassandra allows unfrozen UDTs. The individual fields of a UDT are updated atomically and are effectively stored in distinct physical columns inside the partition, so applying ttl() on them makes sense. I'm not sure, however, whether the CQL parser allows this syntax
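A toy model of the storage difference discussed in this thread, assuming an unfrozen UDT keeps one cell per field while a frozen UDT is a single cell holding the whole value. The `Cell` class and both read helpers are hypothetical and are not Cassandra's storage engine or API; whether CQL accepts `ttl()` on a UDT field is left open, as in the post.

```python
import time


class Cell:
    """Hypothetical storage cell: a value plus an optional expiry time."""

    def __init__(self, value, ttl=None, now=None):
        now = time.time() if now is None else now
        self.value = value
        # No TTL means the cell never expires.
        self.expires_at = None if ttl is None else now + ttl

    def live(self, now):
        return self.expires_at is None or now < self.expires_at


def read_unfrozen(cells, now):
    """Unfrozen UDT: one cell per field, so fields expire independently."""
    return {name: c.value for name, c in cells.items() if c.live(now)}


def read_frozen(cell, now):
    """Frozen UDT: the whole value is one cell, so it expires as a unit."""
    return dict(cell.value) if cell.live(now) else None
```

In this model a TTL on a single field is only meaningful for the unfrozen layout, since the frozen layout has just one expiry time for the entire value.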

Re: Seeing tons of DigestMismatchException exceptions after upgrading from 2.2.13 to 3.11.4

2019-12-09 Thread Carl Mueller
My speculation on rapidly churning/fast reads of recently written data:
- data written at quorum (for RF3): write confirm is after two nodes reply
- data read very soon after (possibly code antipattern), and let's assume the third node update hasn't completed yet (e.g. AWS network "variance").
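The scenario above can be sketched as a small enumeration, assuming RF=3, a QUORUM write acknowledged by two replicas, and a QUORUM read that may contact any two replicas. The `digest` helper and the replica values are hypothetical stand-ins; real replica selection is not uniform, so this only shows which pairings would disagree, not how often they occur.

```python
import hashlib
from itertools import combinations

RF = 3
QUORUM = RF // 2 + 1  # 2 of 3


def digest(value):
    """Stand-in for a replica's digest response (MD5 of the value here)."""
    return hashlib.md5(value.encode()).hexdigest()


def digests_match(replicas, contacted):
    """True if all contacted replicas return the same digest."""
    return len({digest(replicas[i]) for i in contacted}) == 1


# Write at QUORUM: replicas 0 and 1 have acked "v2"; replica 2 still holds "v1".
replicas = ["v2", "v2", "v1"]

# Every way a QUORUM read could pick 2 of the 3 replicas.
read_sets = list(combinations(range(RF), QUORUM))
mismatches = [s for s in read_sets if not digests_match(replicas, s)]
```

Under these assumptions, two of the three possible replica pairs include the lagging replica, so a read that lands on either of them would see a digest mismatch until the third write completes.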

Re: TTL on UDT

2019-12-09 Thread Carl Mueller
I could be wrong, but I think UDTs are written (and overwritten) as one unit, so the notion of a TTL on a UDT field doesn't exist; the TTL is applied to the overall structure. Think of it like a serialized JSON object with multiple fields. To update a field they deserialize the JSON, then

Re: AWS ephemeral instances + backup

2019-12-09 Thread Carl Mueller
Jeff: the gp2 drives are expensive, especially if you have to make them unnecessarily large to get the IOPS, and I want to get as cheap per node as possible so I can run as many nodes as possible. i3 + a cheap rust backup beats an m5 or similar + EBS gp2 on cost when I did the numbers.
Ben: Going to