system.size_estimates - safe to remove sstables?

2018-03-04 Thread Kunal Gangakhedkar
Hi all,

I have a 2-node cluster running cassandra 2.1.18.
One of the nodes has run out of disk space and died - almost all of it
shows up as occupied by size_estimates CF.
Out of 296GiB, 288GiB shows up as consumed by size_estimates in 'du -sh'
output.

This is while the other node is chugging along - shows only 25MiB consumed
by size_estimates (du -sh output).

Any idea why this discrepancy exists?
Is it safe to remove the size_estimates sstables from the affected node and
restart the service?

Thanks,
Kunal
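
For what it's worth, here is a rough sketch of how one might inspect and clear size_estimates. The data directory path is an assumption (adjust to your data_file_directories setting), and while size_estimates is periodically recomputed so truncating it is generally considered safe, please verify for your version before running anything:

```shell
# Inspect per-table disk usage under the default data directory
# (path is an assumption; adjust to your configuration)
du -sh /var/lib/cassandra/data/system/size_estimates-*/

# size_estimates is recomputed periodically, so it can usually be
# cleared from cqlsh rather than deleting SSTables by hand:
cqlsh -e "TRUNCATE system.size_estimates;"
```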


Please give input or approval of JIRA 14128 so we can continue documentation cleanup

2018-03-04 Thread Kenneth Brotman
Two months ago Kurt Greaves cleaned up the home page of the website, which
currently has broken links and other issues.  We need to get that JIRA
wrapped up.  Further improvements, scores of them, are coming.  Get ready!
Please take time soon to review the patch he submitted.

https://issues.apache.org/jira/browse/CASSANDRA-14128

Kenneth Brotman

Re: Cassandra 2.1.18 - Concurrent nodetool repair resulting in > 30K SSTables for a single small (GBytes) CF

2018-03-04 Thread kurt greaves
Repairs with vnodes are likely to cause a lot of small SSTables if you have
inconsistencies (at least 1 per vnode). Did you have any issues when adding
nodes, or did you add multiple nodes at a time? Anything that could have
led to a bit of inconsistency could have been the cause.

I'd probably avoid running the repairs across all the nodes simultaneously
and instead spread them out over a week; running them all at once likely
made it worse. Also worth noting that in versions 3.0+ you won't be able to
run nodetool repair in this way, because anti-compaction will be triggered,
and it will fail if multiple anti-compactions are attempted at the same
time (as happens when you run multiple repairs simultaneously).

Have a look at orchestrating your repairs with TLP's fork of
cassandra-reaper.
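
As a rough operational sketch of spreading repairs out, something like the following loop (host names and keyspace are placeholders for illustration; a real scheduler or Reaper is preferable):

```shell
#!/bin/sh
# Repair one node per day instead of all nodes at once.
# HOSTS and the keyspace name are hypothetical; adjust for your cluster.
HOSTS="node1 node2 node3"
for h in $HOSTS; do
  # -pr repairs only the node's primary ranges, avoiding redundant work
  ssh "$h" nodetool repair -pr my_keyspace
  sleep 86400   # wait a day between nodes; a proper scheduler is better
done
```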


Row cache functionality - Some confusion

2018-03-04 Thread Hannu Kröger
Hello,

I am trying to verify and understand fully the functionality of row cache in 
Cassandra.

I have been using mainly two different sources for information:
https://github.com/apache/cassandra/blob/0db88242c66d3a7193a9ad836f9a515b3ac7f9fa/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java#L476

AND

http://cassandra.apache.org/doc/latest/cql/ddl.html#caching-options

and based on what I read, the documentation is not correct.

Documentation says like this:
“rows_per_partition: The amount of rows to cache per partition (“row cache”). 
If an integer n is specified, the first n queried rows of a partition will be 
cached. Other possible options are ALL, to cache all rows of a queried 
partition, or NONE to disable row caching.”

The problematic part is "the first n queried rows of a partition will be
cached". Shouldn't it be that the first N rows in a partition will be
cached, not the first N that are queried?
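
For context, the option in question is set per table. A hypothetical example (table and keyspace names invented for illustration):

```shell
# Cache all primary keys and up to 100 rows per partition for this table.
# The exact caching semantics of rows_per_partition are what the
# documentation question above is about.
cqlsh -e "ALTER TABLE ks.events
          WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};"
```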

If this is the case, I’m more than happy to create a ticket (and maybe even 
create a patch) for the doc update.

BR,
Hannu