Thanks for the response Alain. I am using STCS and would like to take some 
action, as we will be hitting 50% disk space pretty soon. Would adding nodes 
be the right way to start if I want to get the data per node down? Otherwise, 
can you or someone on the list please suggest the right way to go about it?
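
If it helps, my tentative plan is roughly the sketch below (assuming vnodes, 
and that seeds/snitch are already set in cassandra.yaml on the new machine; 
the keyspace name is just a placeholder). Please correct me if I am missing a 
step:

    # on the new node: start Cassandra, let it bootstrap, then verify
    nodetool status

    # once the new node is UN (Up/Normal), reclaim space on each
    # pre-existing node, one node at a time, since cleanup rewrites sstables
    nodetool cleanup my_keyspace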

Thanks

Sent from my iPhone

> On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
> 
> Hi,
> 
>> I seek advice on data size per node. Each of my nodes has close to 1 TB of 
>> data. I am not seeing any issues as of now but wanted to run it by you guys 
>> to see if this data size is pushing the limits in any manner and if I 
>> should be working on reducing the data size per node.
> 
> There is no real limit to the data size other than 50% of the machine disk 
> space using STCS and 80% if you are using LCS. Those are 'soft' limits, as 
> they depend mainly on the size of your biggest sstables and the number of 
> concurrent compactions, but to stay out of trouble it is better to keep 
> things under control, below the limits mentioned above.
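> 
> A quick way to sanity-check that headroom is to compare free disk space 
> with the biggest sstables on the node. A rough sketch (the data directory 
> below is an assumption, adjust it to your install):
> 
>     # free space on the data volume
>     df -h /var/lib/cassandra/data
> 
>     # five largest sstables on the node
>     find /var/lib/cassandra/data -name '*-Data.db' -printf '%s %p\n' \
>         | sort -rn | head -5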
> 
>> I will be migrating to incremental repairs shortly, and a full repair as 
>> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
> 
> As you noticed, you need to keep in mind that the larger the dataset is, 
> the longer operations will take: not only repairs, but also bootstrapping, 
> replacing, or removing a node, and any other operation that requires 
> streaming or reading data. Repair time can indeed be mitigated by using 
> incremental repairs.
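> 
> On 2.1 that would look something like the sketch below (the keyspace name 
> is a placeholder; -par is there because on 2.1 incremental repairs cannot 
> be combined with sequential mode):
> 
>     # incremental, parallel repair of one keyspace
>     nodetool repair -par -inc my_keyspace
> 
>     # add -pr to repair only this node's primary ranges
>     nodetool repair -par -inc -pr my_keyspace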
> 
>> I am running a 9 node C* 2.1.12 cluster.
> 
> It should be quite safe to give incremental repair a try, as many bugs have 
> been fixed in this version:
> 
> FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction - 
> incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10422
> 
> FIX 2.1.12 - repair hang when replica is down - incremental only
> 
> https://issues.apache.org/jira/browse/CASSANDRA-10288
> 
> If you are using DTCS be aware of 
> https://issues.apache.org/jira/browse/CASSANDRA-11113
> 
> If using LCS, watch the sstable counts and pending compactions closely.
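> 
> For example (keyspace/table names are placeholders):
> 
>     # pending compaction tasks
>     nodetool compactionstats
> 
>     # per-table sstable count
>     nodetool cfstats my_keyspace.my_table | grep -i 'SSTable count'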
> 
> As a general comment, I would say that Cassandra has evolved to be able to 
> handle huge datasets (off-heap memory structures, increased heap sizes 
> using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big 
> datasets. I have seen clusters with 4+ TB per node and others using a few 
> GB per node. It all depends on your requirements and your machine specs. If 
> fast operations are absolutely necessary, keep it small. If you want to use 
> the entire disk space (50/80% of total disk space max), go ahead, as long 
> as the other resources are fine (CPU, memory, disk throughput, ...).
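> 
> To keep an eye on those resources and on the data size per node, something 
> as simple as this is often enough (sketch; iostat comes from the sysstat 
> package):
> 
>     # per-node load (data size) and ownership
>     nodetool status
> 
>     # disk utilization and throughput, refreshed every 5 seconds
>     iostat -x 5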
> 
> C*heers,
> 
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France
> 
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> 
> 2016-04-14 10:57 GMT+02:00 Aiman Parvaiz <ai...@flipagram.com>:
>> Hi all,
>> I am running a 9 node C* 2.1.12 cluster. I seek advice on data size per 
>> node. Each of my nodes has close to 1 TB of data. I am not seeing any 
>> issues as of now but wanted to run it by you guys to see if this data size 
>> is pushing the limits in any manner and if I should be working on reducing 
>> the data size per node. I will be migrating to incremental repairs 
>> shortly, and a full repair as of now takes 20 hr/node. I am not seeing any 
>> issues with the nodes for now.
>> 
>> Thanks
> 
