Re: Huge single-node DCs (?)

2021-04-15 Thread Jeff Jirsa
> On Apr 9, 2021, at 6:15 AM, Joe Obernberger wrote: > > We run a ~1PByte HBase cluster on top of Hadoop/HDFS that works pretty well. > I would love to be able to use Cassandra instead on a system like that. > 1PB is definitely in the range of viable Cassandra clusters today > Even

Re: Huge single-node DCs (?)

2021-04-15 Thread Kane Wilson
4.0 has gone a ways to enable better densification of nodes, but it wasn't a main focus. We're probably still only thinking that 4TB - 8TB nodes will be feasible (and then maybe only for expert users). The main problems tend to be streaming, compaction, and repairs when it comes to dense nodes.

RE: Huge single-node DCs (?)

2021-04-09 Thread Durity, Sean R
available in the open source version, too. Sean Durity – Staff Systems Engineer, Cassandra From: Elliott Sims Sent: Thursday, April 8, 2021 6:36 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Huge single-node DCs (?) I'm not sure I'd suggest building a single DIY Backblaze pod

Re: Huge single-node DCs (?)

2021-04-09 Thread Joe Obernberger
We run a ~1PByte HBase cluster on top of Hadoop/HDFS that works pretty well.  I would love to be able to use Cassandra instead on a system like that.  HBase queries / scans are not the easiest to deal with, but, as with Cassandra, if you know the primary key, you can get to your data fast,

Re: Huge single-node DCs (?)

2021-04-08 Thread Elliott Sims
I'm not sure I'd suggest building a single DIY Backblaze pod. The SATA port multipliers are a pain both from a supply chain and systems management perspective. Can be worth it when you're amortizing that across a lot of servers and can exert some leverage over wholesale suppliers, but less so

Re: Huge single-node DCs (?)

2021-04-08 Thread Bowen Song
This is off-topic. But if your goal is to maximise storage density while also ensuring data durability and availability, this is what you should be looking at: * hardware: https://www.backblaze.com/blog/open-source-data-storage-server/ * architecture and software:

Re: Huge single-node DCs (?)

2021-04-08 Thread Joe Obernberger
I am also curious about this question.  Say your use case is to store 10PBytes of data in a new server room / data-center with new equipment, what makes the most sense?  If your database is primarily write with little read, I think you'd want to maximize disk space per unit of rack space.  So you may opt

Re: Huge single-node DCs (?)

2021-04-08 Thread Bowen Song
I'm sure there are lots of pitfalls. A few of them in my mind right now: * With a single node, you will completely lose the benefit of high availability from Cassandra. Not only will hardware failure result in downtime, but routine maintenance (such as a software upgrade) can also result in