The theoretical maximum of 10G is not even close to what you actually get. http://download.intel.com/support/network/sb/fedexcasestudyfinal.pdf
On Thu, Feb 21, 2013 at 12:45 PM, aaron morton <aa...@thelastpickle.com> wrote:
> If you are lazy like me, Wolfram Alpha can help:
>
> http://www.wolframalpha.com/input/?i=transfer+42TB+at+10GbE&a=UnitClash_*TB.*Tebibytes--
>
> 10 hours 15 minutes 43.59 seconds
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21/02/2013, at 11:31 AM, Wojciech Meler <wojciech.me...@gmail.com> wrote:
>
> You have 86,400 seconds in a day, so 42T could take less than 12 hours on a
> 10Gb link.
>
> On 19 Feb 2013 02:01, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>>
>> I thought about this more, and even with a 10Gbit network, it would take
>> 40 days to bring up a replacement node if MongoDB truly did have 42T per
>> node like I had heard. I wrote the email below to the person I heard this
>> from, going back to basics, which really puts some perspective on it… (and
>> a lot of people don't even have a 10Gbit network like we do).
>>
>> Nodes are hooked up by a 10G network at most right now, where that is
>> 10 gigabit. We have been talking about 10 terabytes on disk per node
>> recently.
>>
>> Googling "10 gigabit in gigabytes" gives me 1.25 gigabytes/second (yes, I
>> could have divided by 8 in my head, but eh… of course when I saw the
>> number, I went "duh").
>>
>> So trying to transfer 10 terabytes, or 10,000 gigabytes, to a node that we
>> are bringing online to replace a dead node would take approximately 5
>> days???
>>
>> That assumes no one else is using the bandwidth, too ;). 10,000 gigabytes
>> * 1 second/1.25 GB * 1 hr/60 secs * 1 day/24 hrs = 5.555555 days. This is
>> more likely 11 days if we only use 50% of the network.
>>
>> So bringing a new node up to speed takes more like 11 days once it has
>> crashed. I think this is the main reason the 1 Terabyte limit exists to
>> begin with, right?
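The arithmetic in the quoted mail can be sanity-checked with a short script. This is a sketch assuming the link sustains its full nominal rate and decimal terabytes (Wolfram's 10 h 15 m figure comes from reading 42 TB as tebibytes); it also shows where the "5.5 days" figure went wrong, since the original conversion divided seconds by 60 once and called the result hours.

```python
# Transfer-time sanity check for the figures discussed in this thread.
# Assumptions: the link sustains its full nominal rate (no protocol or
# disk overhead), and 1 TB = 1000 GB (decimal units).

def transfer_hours(terabytes: float, gigabits_per_sec: float) -> float:
    """Hours needed to move `terabytes` over a `gigabits_per_sec` link."""
    gigabytes_per_sec = gigabits_per_sec / 8       # 10 Gbit/s -> 1.25 GB/s
    seconds = terabytes * 1000 / gigabytes_per_sec
    return seconds / 3600

print(f"42 TB @ 10 Gbit/s: {transfer_hours(42, 10):.1f} h")  # ~9.3 h
print(f"10 TB @ 10 Gbit/s: {transfer_hours(10, 10):.1f} h")  # ~2.2 h, not 5.5 days
print(f"10 TB @  5 Gbit/s: {transfer_hours(10, 5):.1f} h")   # at 50% utilisation
```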
>>
>> From an ops perspective, this could sound like a nightmare scenario of
>> waiting 10 days… maybe it is livable, though. Either way, I thought it
>> would be good to share the numbers. ALSO, that is assuming the bus with
>> its 10 disks can keep up with 10G???? Can it? What is the limit of
>> throughput per second on the bus on the computers we have, as Wikipedia
>> shows a huge variance?
>>
>> What is the rate of the disks, too (multiplied by 10, of course)? Will
>> they keep up with a 10G rate for bringing a new node online?
>>
>> This all comes into play even more when you want to double the size of
>> your cluster, of course, as all nodes have to transfer half of what they
>> have to the new nodes that come online. (Cassandra actually has a very
>> data center/rack-aware topology for transferring data correctly, so it
>> does not use up bandwidth unnecessarily… I am not sure MongoDB has that.)
>> Anyway, just food for thought.
>>
>> From: aaron morton <aa...@thelastpickle.com>
>> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Date: Monday, February 18, 2013 1:39 PM
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Vegard Berget
>> <p...@fantasista.no>
>> Subject: Re: cassandra vs. mongodb quick question
>>
>> My experience is that repair of 300GB of compressed data takes longer than
>> 300GB of uncompressed, but I cannot point to an exact number. Calculating
>> the differences is mostly CPU bound and works on the uncompressed data.
>>
>> Streaming uses compression (after uncompressing the on-disk data).
>>
>> So if you have 300GB of compressed data, take a look at how long repair
>> takes and see if you are comfortable with that.
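The disk-throughput question above can be roughed out the same way. The per-spindle rate here is an assumption (~150 MB/s sequential is a typical figure for a 7200 rpm drive, not a number from this thread), and real aggregate throughput depends on the controller, the RAID/JBOD layout, and how sequential the streaming workload actually is:

```python
# Rough check of "can the bus with its 10 disks keep up with 10G?".
# The per-disk rate below is an assumed typical value, not measured.

DISKS = 10
MB_PER_SEC_PER_DISK = 150      # assumed sequential rate per spindle
LINK_GB_PER_SEC = 10 / 8       # 10 Gbit/s = 1.25 GB/s

aggregate_gb_per_sec = DISKS * MB_PER_SEC_PER_DISK / 1000
print(f"disks: {aggregate_gb_per_sec:.2f} GB/s, link: {LINK_GB_PER_SEC:.2f} GB/s")
# Under these assumptions the spindles (~1.5 GB/s) just edge out the link,
# so the network, not the storage, would be the bottleneck.
```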
>> You may also want to test replacing a node, so you can get the procedure
>> documented and understand how long it takes.
>>
>> The idea of the soft 300GB-to-500GB limit came about because of a number
>> of cases where people had 1 TB on a single node and were surprised it took
>> days to repair or replace. If you know how long things may take, and that
>> fits your operations, then go with it.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 18/02/2013, at 10:08 PM, Vegard Berget <p...@fantasista.no> wrote:
>>
>> Just out of curiosity:
>>
>> When using compression, does this affect things one way or another? Is
>> 300G the (compressed) SSTable size, or the total size of the data?
>>
>> .vegard,
>>
>> ----- Original Message -----
>> From: user@cassandra.apache.org
>> To: <user@cassandra.apache.org>
>> Sent: Mon, 18 Feb 2013 08:41:25 +1300
>> Subject: Re: cassandra vs. mongodb quick question
>>
>> If you have spinning disks and 1G networking and no virtual nodes, I would
>> still say 300G to 500G is a soft limit.
>>
>> If you are using virtual nodes, SSDs, a JBOD disk configuration, or faster
>> networking, you may go higher.
>>
>> The limiting factors are the time it takes to repair, the time it takes to
>> replace a node, and the memory considerations for hundreds of millions of
>> rows. If the performance of those operations is acceptable to you, then go
>> crazy.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 16/02/2013, at 9:05 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:
>>
>> So I found out MongoDB varies their node size from 1T to 42T per node
>> depending on the profile. So if I was going to be writing a lot but rarely
>> changing rows, could I also use Cassandra with a per-node size of 20T+, or
>> is that not advisable?
>>
>> Thanks,
>> Dean