This is off-topic. But if your goal is to maximise storage density while
also ensuring data durability and availability, this is what you should
be looking at:
* hardware:
https://www.backblaze.com/blog/open-source-data-storage-server/
* architecture and software:
https://www.backblaze.com/blog/vault-cloud-storage-architecture/
On 08/04/2021 17:50, Joe Obernberger wrote:
I am also curious about this question. Say your use case is to store
10 PBytes of data in a new server room / data-center with new
equipment, what makes the most sense? If your database is primarily
write with little read, I think you'd want to maximize disk space per
unit of rack space. So you may opt for a 2U server with 24 3.5" disks at
16 TBytes each, for a node with 384 TBytes of disk - so ~27 servers for
10 PBytes.
Cassandra doesn't seem to be a good choice for that configuration;
the rule of thumb that I'm hearing is ~2 TBytes per node, in which case
we'd need over 5000 servers. That seems really unreasonable.
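To make the arithmetic above explicit, here is a quick sketch (decimal units throughout; the ~2 TBytes/node figure is just the rule of thumb cited above, not a hard limit):

```python
import math

TOTAL_TB = 10 * 1000          # 10 PBytes expressed in TBytes

# Dense-storage option: 2U server, 24 x 3.5" disks at 16 TBytes each
node_tb = 24 * 16             # 384 TBytes per node
dense_servers = math.ceil(TOTAL_TB / node_tb)
print(dense_servers)          # 27 servers

# Cassandra rule of thumb: ~2 TBytes of data per node
cassandra_nodes = math.ceil(TOTAL_TB / 2)
print(cassandra_nodes)        # 5000 nodes
```

The gap between 27 and 5000 machines is exactly the tension the thread is about.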
-Joe
On 4/8/2021 9:56 AM, Lapo Luchini wrote:
Hi, one project I wrote uses Cassandra to back the huge amount of
data it needs (data is written only once and read very rarely, but
needs to remain accessible for years, so the storage needs grow huge
over time; I chose Cassandra mainly for its horizontal scalability
with respect to disk size), and a client of mine needs to install it on
his hosts.
The problem is that, while I usually use a cluster of 6 "smallish"
nodes (which can grow over time), he only has big ESX servers with huge
disk space (which is already RAID-6 redundant) and wouldn't have the
possibility of running 3+ nodes per DC.
This is outside my usual experience with Cassandra and, as far as I
can tell from reading around, outside most use cases found on the
website or this mailing list, so the question is:
does it make sense to use Cassandra with a big (let's say 6 TB today,
up to 20 TB in a few years) single-node DataCenter, plus another
single-node DataCenter (to act as disaster recovery)?
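For what it's worth, the two-DC layout described here would boil down to a keyspace replication setting along these lines (a sketch only; the keyspace and DC names are placeholders, and RF 1 per DC means each DC relies entirely on its underlying RAID for disk redundancy, with no Cassandra-level replica inside the DC):

```
-- hypothetical keyspace for this setup; 'dc_primary' and 'dc_dr'
-- must match the DC names configured in the snitch
CREATE KEYSPACE archive_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_primary': 1,   -- the single big production node
    'dc_dr': 1         -- the single disaster-recovery node
  };
```

With one node per DC there is no quorum within a DC, so consistency levels effectively reduce to LOCAL_ONE / ONE, which is worth keeping in mind for this design.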
Thanks in advance for any suggestion or comment!