And by all means, do not treat Cassandra as a relational database. Beware of the limitations of CQL compared to SQL. I don't want to argue against Cassandra, because I like it for what it was primarily designed for: horizontal scalability for HUGE amounts of data. It is good for accessing your data by key, but not for searching. High availability is a nice bonus here. If you end up with only one table in C*, maybe something like Redis would work for your needs, too.

Some hints from my own experience with it, if you choose to use Cassandra:

- Have at least as many racks as your replication factor: a replication factor of 3 means you want at least 3 racks.
- Choose your partitioning wisely. It starts becoming relevant from 10 million records onwards.

regards,
Jürgen

On Tue, 10 July 2018 at 18:18, Jeff Jirsa <jji...@gmail.com> wrote:

> On Tue, Jul 10, 2018 at 8:29 AM, Code Wiget <codewige...@gmail.com> wrote:
>
>> Hi,
>>
>> I have been tasked with picking and setting up a database with the
>> following characteristics:
>>
>>    - Ultra-high availability - The real requirement is uptime - our
>>    whole platform becomes inaccessible without a “read” from the database. We
>>    need the read to authenticate users. Databases will never be spread across
>>    multiple networks.
>>
> Sooner or later life will happen and you're going to have some
> unavailability - may be worth taking the time to make it fail gracefully
> (cache auth responses, etc).
>
>>    - Reasonably quick access speeds
>>    - Very low data storage - The data storage is very low - for 10
>>    million users, we would have around 8GB of storage total.
>>
>> Having done a bit of research on Cassandra, I think the optimal approach
>> for my use-case would be to replicate the data on *ALL* nodes possible,
>> but require reads to only have a consistency level of one. So, in the case
>> that a node goes down, we can still read/write to other nodes. It is not
>> very important that a read be unanimously agreed upon, as long as Cassandra
>> is eventually consistent, within around 1s, then there shouldn’t be an
>> issue.
>>
> Seems like a reasonably good fit, but there's no 1s guarantee - it'll
> USUALLY happen within milliseconds, but the edge cases don't have a strict
> guarantee at all (imagine two hosts in adjacent racks, the link between the
> two racks goes down, but both are otherwise functional - a query at ONE in
> either rack would be able to read and write data, but it would diverge
> between the two racks for some period of time).
>
>> When I go to set up the database though, I am required to set a
>> replication factor to a number - 1,2,3,etc. So I can’t just say “ALL” and
>> have it replicate to all nodes.
>>
> That option doesn't exist. It's been proposed (and exists in Datastax
> Enterprise, which is a proprietary fork), but reportedly causes quite a bit
> of pain when misused, so people have successfully lobbied against its
> inclusion in OSS Apache Cassandra. You could (assuming some basic java
> knowledge) extend NetworkTopologyStrategy to have it accomplish this, but I
> imagine you don't REALLY want this unless you're frequently auto-scaling
> nodes in/out of the cluster. You should probably just pick a high RF and
> you'll be OK with it.
>
>> Right now, I have a 2 node cluster with replication factor 3. Will this
>> cause any issues, having a RF > #nodes? Or is there a way to just have it
>> copy to *all* nodes?
>>
> It's obviously not the intended config, but I don't think it'll cause many
> problems.
>
>> Is there any way that I can tune Cassandra to be more read-optimized?
>>
> Yes - definitely use leveled compaction instead of STCS (the default), and
> definitely take the time to tune the JVM args - read path generates a lot
> of short lived java objects, so a larger eden will help you (maybe up to
> 40-50% of max heap size).
>
>> Finally, I have some misgivings about how well Cassandra fits my
>> use-case. Please, if anyone has a suggestion as to why or why not it is a
>> good fit, I would really appreciate your input! If this could be done with
>> a simple SQL database and this is overkill, please let me know.
>>
>> Thanks for your input!
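
On Jeff's point about failing gracefully by caching auth responses, here is a minimal sketch in Python of what that could look like. Everything in it is an assumption for illustration and not taken from this thread: the `AuthCache` class, the `lookup` callable standing in for the database read, the use of `ConnectionError` as the "database unreachable" signal (a real driver raises its own exception types), and the 300 second default lifetime.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AuthCache:
    """Remember recent auth lookups and serve them while the database is down."""

    def __init__(
        self,
        lookup: Callable[[str], Optional[str]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # hypothetical function that reads from the database
        self._ttl = ttl_seconds
        self._clock = clock
        # user id -> (time the value was fetched, value returned by the database)
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, user_id: str) -> Optional[str]:
        now = self._clock()
        try:
            value = self._lookup(user_id)
        except ConnectionError:
            # Assumed failure signal: the database could not be reached.
            # Fall back to the cached answer, but only if it is fresh enough.
            cached = self._entries.get(user_id)
            if cached is not None and now - cached[0] <= self._ttl:
                return cached[1]
            raise
        # Normal path: the database answered, so refresh the cache.
        self._entries[user_id] = (now, value)
        return value
```

The database stays the source of truth here: every call tries the real read first, and the cache only answers when that read fails. The price is that a user who was disabled during an outage may still authenticate until the cached entry expires, so the lifetime should be picked with that in mind.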