And whatever you do, do not treat Cassandra as a relational database. Beware
of the limitations of CQL compared to SQL.
I don't want to argue against Cassandra, because I like it for what it was
primarily designed for: horizontal scalability for HUGE amounts of data.
It is good for accessing your data by key, but not for searching. High
availability is a nice bonus here.
If you end up having only one table in C*, maybe something like Redis would
work for your needs, too.
Some hints from my own experience with it, if you choose to use Cassandra:
Have at least as many racks as your replication factor - a replication factor
of 3 means you want to have at least 3 racks.
Choose your partitioning wisely - it starts becoming relevant from 10 million.
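To illustrate the rack hint above, here is a rough back-of-the-envelope sketch in Python. It assumes replicas are spread as evenly as possible across racks (which is what NetworkTopologyStrategy aims for) and that you care about QUORUM. The function names are made up for this example and are not part of Cassandra.

```python
import math


def quorum(rf: int) -> int:
    """Replicas that must respond for QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


def worst_case_replicas_per_rack(rf: int, racks: int) -> int:
    """Most replicas of one partition that land in a single rack,
    assuming an even spread across racks (an assumption of this sketch)."""
    return math.ceil(rf / racks)


def quorum_survives_rack_loss(rf: int, racks: int) -> bool:
    """True if QUORUM is still reachable after losing the fullest rack."""
    remaining = rf - worst_case_replicas_per_rack(rf, racks)
    return remaining >= quorum(rf)


for racks in (1, 2, 3):
    print(f"RF=3, racks={racks}: quorum survives rack loss ->",
          quorum_survives_rack_loss(3, racks))
```

With RF=3, only the 3-rack layout keeps QUORUM reachable after a rack failure. At consistency level ONE a single surviving replica is enough, so the constraint is weaker there.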
On Tue, Jul 10, 2018 at 18:18, Jeff Jirsa <jji...@gmail.com> wrote:
> On Tue, Jul 10, 2018 at 8:29 AM, Code Wiget <codewige...@gmail.com> wrote:
>> I have been tasked with picking and setting up a database with the
>> following characteristics:
>> - Ultra-high availability - The real requirement is uptime - our
>> whole platform becomes inaccessible without a “read” from the database. We
>> need the read to authenticate users. Databases will never be spread across
>> multiple networks.
> Sooner or later life will happen and you're going to have some
> unavailability - may be worth taking the time to make it fail gracefully
> (cache auth responses, etc).
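A minimal sketch of what "cache auth responses" could look like on the application side, in Python. Everything here is hypothetical: `lookup` stands in for whatever function actually queries the database, and the TTL is an arbitrary example value. On a backend error it falls back to the last known answer instead of failing the request.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AuthCache:
    """Caches lookup results and serves stale entries if the backend fails."""

    def __init__(self, lookup: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup    # hypothetical function that hits the database
        self._ttl = ttl_seconds  # example value, tune for your own needs
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, user: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh enough, skip the database
        try:
            value = self._lookup(user)
        except Exception:
            # Database unreachable: degrade to the stale value, if there is one.
            return cached[1] if cached is not None else None
        self._entries[user] = (now, value)
        return value
```

The trade-off is that a revoked credential keeps working until the database is reachable again, so whether this is acceptable depends on your requirements.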
>> - Reasonably quick access speeds
>> - Very low data storage - The data storage is very low - for 10
>> million users, we would have around 8GB of storage total.
>> Having done a bit of research on Cassandra, I think the optimal approach
>> for my use-case would be to replicate the data on *ALL* nodes possible,
>> but require reads to only have a consistency level of one. So, in the case
>> that a node goes down, we can still read/write to other nodes. It is not
>> very important that a read be unanimously agreed upon, as long as Cassandra
>> is eventually consistent, within around 1s, then there shouldn’t be an
> Seems like a reasonably good fit, but there's no 1s guarantee - it'll
> USUALLY happen within milliseconds, but the edge cases don't have a strict
> guarantee at all (imagine two hosts in adjacent racks, the link between the
> two racks goes down, but both are otherwise functional - a query at ONE in
> either rack would be able to read and write data, but it would diverge
> between the two racks for some period of time).
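The usual rule of thumb behind this: a read is only guaranteed to overlap with the latest acknowledged write when read replicas + write replicas > RF. A tiny sketch in Python (the function name is made up for illustration):

```python
def read_overlaps_write(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > rf


print(read_overlaps_write(3, 1, 1))  # ONE / ONE with RF=3
print(read_overlaps_write(3, 2, 2))  # QUORUM / QUORUM with RF=3
```

With RF=3, ONE/ONE gives False (a read may hit a replica the write has not reached yet), while QUORUM/QUORUM gives True.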
>> When I go to set up the database though, I am required to set a
>> replication factor to a number - 1,2,3,etc. So I can’t just say “ALL” and
>> have it replicate to all nodes.
> That option doesn't exist. It's been proposed (and exists in Datastax
> Enterprise, which is a proprietary fork), but reportedly causes quite a bit
> of pain when misused, so people have successfully lobbied against its
> inclusion in OSS Apache Cassandra. You could (assuming some basic Java
> knowledge) extend NetworkTopologyStrategy to have it accomplish this, but I
> imagine you don't REALLY want this unless you're frequently auto-scaling
> nodes in/out of the cluster. You should probably just pick a high RF and
> you'll be OK with it.
>> Right now, I have a 2 node cluster with replication factor 3. Will this
>> cause any issues, having a RF > #nodes? Or is there a way to just have it
>> copy to *all* nodes?
> It's obviously not the intended config, but I don't think it'll cause many
>> Is there any way that I can tune Cassandra to be more read-optimized?
> Yes - definitely use leveled compaction instead of STCS (the default), and
> definitely take the time to tune the JVM args - read path generates a lot
> of short-lived Java objects, so a larger eden will help you (maybe up to
> 40-50% of max heap size).
>> Finally, I have some misgivings about how well Cassandra fits my
>> use-case. Please, if anyone has a suggestion as to why or why not it is a
>> good fit, I would really appreciate your input! If this could be done with
>> a simple SQL database and this is overkill, please let me know.
>> Thanks for your input!