RE: Current data density limits with Open Source Cassandra

2017-02-15 Thread SEAN_R_DURITY
I request 1-2 TB of disk per node, depending on how large the data is estimated 
to be (for larger data, 2 TB). I have some dense nodes (4+ TB of disk 
available). They are harder to manage for repairs, bootstrapping, compaction, 
etc., because it takes so long to stream the data. For the actual application, 
I have not seen a great impact from the size of disk available.
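
For a rough sense of the streaming times involved, here is a back-of-envelope sketch in Python. It assumes the stock stream_throughput_outbound_megabits_per_sec of 200 and a single uninterrupted stream, and it ignores validation and compaction overhead, so treat the numbers as a lower bound:

# Back-of-envelope: time just to stream a node's data at Cassandra's default
# throttle (stream_throughput_outbound_megabits_per_sec = 200). Real repair /
# bootstrap times add validation and compaction on top of this.
def stream_hours(data_tb, stream_mbit_per_sec=200):
    data_bits = data_tb * 1e12 * 8                    # decimal TB -> bits
    return data_bits / (stream_mbit_per_sec * 1e6) / 3600

for tb in (1, 2, 4):
    print(f"{tb} TB: ~{stream_hours(tb):.1f} hours at 200 Mbit/s")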


Sean Durity

From: daemeon reiydelle [mailto:daeme...@gmail.com]
Sent: Wednesday, February 08, 2017 10:56 PM
To: user@cassandra.apache.org
Subject: Re: Current data density limits with Open Source Cassandra

Your mileage may vary. Think of that storage limit as fairly reasonable for active data 
that is likely to tombstone. Add more for older/historic data. Then think about the time to 
recover a node.
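
To make that concrete, a hypothetical per-node sizing check in Python (the 50% compaction headroom, 200 Mbit/s stream rate and 24 hour recovery window are illustrative assumptions, not recommendations):

# Hypothetical sizing check: split the disk budget between "active" data that
# still churns/tombstones and colder historic data, keep compaction headroom,
# and see whether a full node rebuild fits a recovery window.
def node_plan(active_tb, historic_tb, headroom_frac=0.5,
              stream_mbit_per_sec=200, recovery_window_h=24):
    data_tb = active_tb + historic_tb
    disk_needed_tb = data_tb * (1 + headroom_frac)
    rebuild_h = data_tb * 1e12 * 8 / (stream_mbit_per_sec * 1e6) / 3600
    return disk_needed_tb, rebuild_h, rebuild_h <= recovery_window_h

disk_tb, rebuild_h, fits = node_plan(active_tb=1.0, historic_tb=2.0)
print(f"need ~{disk_tb:.1f} TB disk, rebuild ~{rebuild_h:.0f} h, "
      f"fits recovery window: {fits}")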


...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 8, 2017 at 2:14 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
The major issue we’ve seen with very high density (we generally say <2TB per node 
is best) is manageability - if you need to replace a node or add a node then 
restreaming the data takes a *long* time, and there is a fairly high chance of a 
glitch in the universe meaning you have to start again before it’s done.
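
As a toy model of that restart risk (Poisson failures at a made-up 1% per hour, each forcing a restart from scratch), the expected wall-clock time grows exponentially with stream length:

import math

# Failures arrive at rate lam per hour and any failure forces the stream to
# start over. Expected time to finish a t-hour stream is (e^(lam*t) - 1)/lam.
def expected_hours(stream_h, failures_per_hour=0.01):
    lam = failures_per_hour
    return (math.exp(lam * stream_h) - 1) / lam

for t in (11, 22, 44):   # roughly 1, 2 and 4 TB at 200 Mbit/s
    print(f"{t} h stream -> ~{expected_hours(t):.0f} h expected, with restarts")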

Also, if you’re using STCS you can end up with gigantic compactions which also 
take a long time and can cause issues.
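
For a rough illustration of the free-space headroom those compactions can demand (the SSTable sizes are made-up numbers for a dense node; min_threshold of 4 is the STCS default):

# Compacting the biggest size tier must write the merged output before the
# inputs can be deleted, so the temporary extra space can be up to the
# combined size of the inputs.
sstables_gb = [900, 850, 820, 780, 120, 110, 95, 30, 28, 5]

def worst_case_temp_gb(sizes, min_threshold=4):
    return sum(sorted(sizes, reverse=True)[:min_threshold])

print(f"largest-tier compaction may need ~{worst_case_temp_gb(sstables_gb)} GB "
      f"of temporary space (total data {sum(sstables_gb)} GB)")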

Heap limitations are mainly related to partition size rather than node density 
in my experience.
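
A crude, made-up-numbers estimate of how quickly a wide partition blows past the usual ~100 MB comfort zone:

# It's wide partitions, more than raw node density, that tend to pressure the
# heap. Row count and average row size here are illustrative assumptions.
rows_per_partition = 2_000_000
avg_row_bytes = 200
partition_mb = rows_per_partition * avg_row_bytes / 1e6
print(f"~{partition_mb:.0f} MB partition")   # 400 MB: already uncomfortably wide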

Cheers
Ben

On Thu, 9 Feb 2017 at 08:20 Hannu Kröger <hkro...@gmail.com> wrote:
Hello,

Back in the day it was recommended that the max disk density per node for Cassandra 
1.2 was around 3-5TB of uncompressed data.

IIRC that was mostly because of heap memory limitations? Now that off-heap 
support is there for certain data and 3.x has a different data storage format, is 
that 3-5TB still a valid limit?

Does anyone have experience running Cassandra with 3-5TB of compressed data?

Cheers,
Hannu
--

Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798





