If you are lazy like me, Wolfram Alpha can help:

http://www.wolframalpha.com/input/?i=transfer+42TB+at+10GbE&a=UnitClash_*TB.*Tebibytes--

10 hours 15 minutes 43.59 seconds
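
For anyone who wants to check it themselves, here is the same back-of-envelope
math in a few lines of Python (a sketch assuming a dedicated, fully utilised
10Gb/s link with no protocol overhead, so treat it as a best case):

    bits = 42 * 2**40 * 8      # 42 TiB (Wolfram's tebibyte reading) in bits
    rate = 10e9                # 10GbE in bits per second
    print(bits / rate / 3600)  # ~10.26 hours, i.e. 10h 15m 44s

With decimal terabytes instead, 42e12 * 8 / 10e9 = 33,600 seconds, or about
9.3 hours, which lines up with the "less than 12 hours" figure below.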

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/02/2013, at 11:31 AM, Wojciech Meler <wojciech.me...@gmail.com> wrote:

> You have 86,400 seconds in a day, so 42T could take less than 12 hours on a 10Gb link.
> 
> On 19 Feb 2013 02:01, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
> I thought about this more, and even with a 10Gbit network it would take 40 
> days to bring up a replacement node if mongodb truly did have 42T / node 
> like I had heard.  I wrote the email below to the person I heard this from, 
> going back to basics, which really puts some perspective on it… (and a lot of 
> people don't even have a 10Gbit network like we do).
> 
> Nodes are hooked up by at most a 10G network right now, where that is 
> 10 gigabit.  We have recently been talking about 10 terabytes on disk per node.
> 
> Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I could 
> have divided by 8 in my head but eh…course when I saw the number, I went duh)
> 
> So trying to transfer 10 terabytes, or 10,000 gigabytes, to a node that we are 
> bringing online to replace a dead node would take approximately 5 days???
> 
> This means no one else is using the bandwidth too ;).  10,000 gigabytes * 
> (1 second / 1.25 GB) * (1 hr / 60 secs) * (1 day / 24 hrs) = 5.555555 days.  
> This is more likely 11 days if we only use 50% of the network.
> 
> So bringing a new node up to speed is more like 11 days once it has crashed.  
> I think this is the main reason the 1 terabyte limit exists to begin with, right?
> 
> From an ops perspective, this could sound like a nightmare scenario of 
> waiting 10 days… maybe it is livable though.  Either way, I thought it would 
> be good to share the numbers.  ALSO, that is assuming the bus with its 10 
> disks can keep up with 10G????  Can it?  What is the limit of throughput per 
> second on the bus on the computers we have, as Wikipedia shows a huge 
> variance?
> 
> What is the rate of the disks, too (multiplied by 10, of course)?  Will they 
> keep up with a 10G rate when bringing a new node online?
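> 
> As a rough sanity check in Python (the per-disk rate here is an assumed 
> ballpark figure for a spinning disk, not a measurement of our hardware):
> 
>     link = 10e9 / 8            # 10GbE in bytes/second = 1.25 GB/s
>     disk = 120e6               # assume ~120 MB/s sequential per spinning disk
>     print(disk * 10 >= link)   # False: 1.2 GB/s vs the 1.25 GB/s needed
> 
> So ten spinning disks are only just shy of line rate sequentially, and any 
> random I/O mixed in would drop them further below it.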
> 
> This all comes into play even more so when you want to double the size of 
> your cluster, of course, as all nodes have to transfer half of what they have 
> to the new nodes that come online (Cassandra actually has a very data 
> center/rack-aware topology for transferring data correctly so it does not use 
> up bandwidth unnecessarily… I am not sure mongodb has that).  Anyway, just 
> food for thought.
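> 
> Sketching that the same way (this assumes each new node receives half of one 
> existing node's data and streams at full line rate; real streaming is 
> throttled and shares the link, so expect it to take longer):
> 
>     node_bits = 10e12 * 8         # 10 TB per existing node, in bits
>     move = node_bits / 2          # half of it streams to a new node
>     print(move / 10e9 / 3600)     # ~1.1 hours per node at full line rate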
> 
> From: aaron morton <aa...@thelastpickle.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, February 18, 2013 1:39 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Vegard Berget 
> <p...@fantasista.no>
> Subject: Re: cassandra vs. mongodb quick question
> 
> My experience is that repair of 300GB of compressed data takes longer than 
> 300GB of uncompressed, but I cannot point to an exact number.  Calculating 
> the differences is mostly CPU bound and works on the uncompressed data.
> 
> Streaming uses compression (after uncompressing the on-disk data).
> 
> So if you have 300GB of compressed data, take a look at how long repair takes 
> and see if you are comfortable with that. You may also want to test replacing 
> a node so you can get the procedure documented and understand how long it 
> takes.
> 
> The idea of the soft 300GB to 500GB limit came about because of a number of 
> cases where people had 1 TB on a single node and were surprised it took 
> days to repair or replace.  If you know how long things may take, and that 
> fits in your operations, then go with it.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/02/2013, at 10:08 PM, Vegard Berget <p...@fantasista.no> wrote:
> 
> 
> Just out of curiosity :
> 
> When using compression, does this affect things one way or another?  Is 300G 
> the (compressed) SSTable size, or the total size of the data?
> 
> .vegard,
> 
> ----- Original Message -----
> From: user@cassandra.apache.org
> To: <user@cassandra.apache.org>
> Sent: Mon, 18 Feb 2013 08:41:25 +1300
> Subject: Re: cassandra vs. mongodb quick question
> 
> 
> If you have spinning disks and 1G networking and no virtual nodes, I would 
> still say 300G to 500G is a soft limit.
> 
> If you are using virtual nodes, SSDs, a JBOD disk configuration, or faster 
> networking, you may go higher.
> 
> The limiting factors are the time it takes to repair, the time it takes to 
> replace a node, and the memory considerations for hundreds of millions of 
> rows.  If the performance of those operations is acceptable to you, then go crazy.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 16/02/2013, at 9:05 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
> 
> So I found out mongodb varies its node size from 1T to 42T per node 
> depending on the profile.  So if I were going to be writing a lot but rarely 
> changing rows, could I also use Cassandra with a per-node size of 20T+, or is 
> that not advisable?
> 
> Thanks,
> Dean
> 
> 
