Would you share some more context with us?
- What Cassandra version do you use?
- What is the data size per node?
- How much RAM does the hardware have?
- Does your client use paging?
A few ideas to explore:
- Try tracing the query to see what is taking time (and resources).
- From the tracing, logs, sstablemetadata tool or monitoring dashboard, do
you see any tombstones?
- What is the percentage of time spent in GC pauses per second? A 128 GB
heap seems huge to me, even with G1GC. Do you still have memory left for the
page cache? Check this in the general logs, the GC logs or your dashboards.
Reallocating 70 GB every minute does not seem right. Maybe a smaller heap
(which is more common) would give more frequent but shorter pauses?
- Are there any pending or blocked threads? Check the thread pool monitoring
charts or 'nodetool tpstats'. Also, 'watch -d nodetool tpstats' will make the
evolution and any newly pending or blocked threads obvious to you (a
Cassandra restart also resets the stats).
- How many SSTables are touched per read operation on the main table?
- Are the bloom filters efficient?
- Is the key cache efficient (hit ratio of 0.8, 0.9 or more)?
- The logs should report something during the 10 minutes the node was
unresponsive; try grep -e "WARN" -e "ERROR" on the Cassandra logs (system.log).
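Several of the checks above come down to simple ratios. Here is a small Python sketch, only as an illustration, assuming you copy the numbers by hand: pause durations from the GC log over a window you choose, the key cache "hits" and "requests" counters (as shown by 'nodetool info'), and the thread pool table printed by 'nodetool tpstats'. The function names are hypothetical, and the tpstats column layout (Pool Name, Active, Pending, Completed, Blocked, All time blocked) is an assumption based on the classic output; it may differ between versions.

```python
from __future__ import annotations


def gc_pause_fraction(pauses_ms: list[float], window_s: float) -> float:
    """Share of a time window spent in stop-the-world GC pauses (0.0 to 1.0).

    pauses_ms: pause durations (ms) observed during the window, e.g. copied
    from the GC log by hand (hypothetical input).
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(pauses_ms) / (window_s * 1000.0)


def key_cache_hit_ratio(hits: int, requests: int) -> float:
    """Hit ratio from the key cache 'hits' and 'requests' counters."""
    return hits / requests if requests else 0.0


def stalled_pools(tpstats_text: str) -> dict[str, tuple[int, int]]:
    """Map pool name -> (pending, blocked) for pools with queued or blocked tasks.

    Assumed row layout (an assumption, check your version's output):
    Pool Name  Active  Pending  Completed  Blocked  All time blocked
    Lines that are not "name + 5 integers" (headers, dropped-message
    section, blank lines) are skipped.
    """
    stalled: dict[str, tuple[int, int]] = {}
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        try:
            _active, pending, _completed, blocked, _all_time = map(int, fields[1:])
        except ValueError:
            continue  # header or non-numeric row
        if pending or blocked:
            stalled[fields[0]] = (pending, blocked)
    return stalled


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(f"GC pause share: {gc_pause_fraction([180.0, 420.0], 10.0):.1%}")
    print(f"Key cache hit ratio: {key_cache_hit_ratio(8_500, 10_000):.2f}")
```

Running stalled_pools on the output captured during an incident would list the pools (ReadStage, for example) that have tasks queued or blocked.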
More than 200 MB per partition is quite big. Explore what can be improved
operationally, but you might ultimately have to reduce the partition size.
On the other hand, Cassandra has been evolving to allow bigger partitions,
handling them more efficiently over time. If you can work on the operational
side, you might be able to keep this model.
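To put the partition size in perspective, here is some back-of-the-envelope arithmetic in Python. It assumes each of the 90 partitions is read whole and is exactly 200 MB, which is a lower bound since they are reported as larger than that. The bucketing part illustrates one common way of reducing partition size (adding a bucket number to the partition key so one logical partition is split into several); the 100 MB target, the row counter `seq` and `rows_per_bucket` are hypothetical parameters, not something taken from your schema.

```python
from __future__ import annotations

import math


def data_in_flight_mb(partitions: int, partition_size_mb: float) -> float:
    """Lower bound on the data touched when reading whole partitions at once."""
    return partitions * partition_size_mb


def buckets_needed(partition_size_mb: float, target_mb: float) -> int:
    """Number of buckets that keeps each sub-partition at or below target_mb."""
    return math.ceil(partition_size_mb / target_mb)


def bucketed_key(key: str, seq: int, rows_per_bucket: int) -> tuple[str, int]:
    """Hypothetical composite partition key: (original key, bucket number).

    Rows with consecutive sequence numbers land in the same bucket, so a read
    over a contiguous range only touches a few, smaller partitions.
    """
    return key, seq // rows_per_bucket


if __name__ == "__main__":
    total = data_in_flight_mb(90, 200)
    print(f"At least {total:,.0f} MB requested concurrently")
    print("Buckets for a 100 MB target:", buckets_needed(200, 100))
    print(bucketed_key("partition-42", 2_500, 1_000))
```

The trade-off is on the read side: fetching a whole logical partition then means querying every bucket it spans.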
If it is possible to experiment on a canary node and observe, I would
probably go down this path after identifying a possible cause and solution
for this issue.
Other tips that might help here:
- Disabling 'dynamic snitching' has proved to improve performance (often
clearly visible when looking at p99) thanks to better use of the page cache.
- Making sure that most of your partitions fit within the read block size
(buffer) you are using can also make reads more efficient (when data is
compressed, the chunk size determines the buffer size).
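To check the last point, you could compare a sample of partition sizes with the compression chunk length. This Python sketch assumes you already have partition sizes in bytes (for instance read off the partition size percentiles in 'nodetool tablehistograms'; the sample list below is hypothetical) and that the table is compressed with a given chunk_length_in_kb.

```python
from __future__ import annotations


def share_fitting_in_chunk(partition_sizes_bytes: list[int], chunk_length_kb: int) -> float:
    """Share of the given partitions whose size is at most one compression chunk."""
    if not partition_sizes_bytes:
        return 0.0
    chunk_bytes = chunk_length_kb * 1024
    fitting = sum(1 for size in partition_sizes_bytes if size <= chunk_bytes)
    return fitting / len(partition_sizes_bytes)


if __name__ == "__main__":
    # Hypothetical sample of partition sizes, in bytes.
    sample = [12_000, 48_000, 70_000, 210_000_000]
    for chunk_kb in (16, 64):
        print(f"{chunk_kb} KB chunks: {share_fitting_in_chunk(sample, chunk_kb):.0%} fit")
```

With chunk sizes in the tens of KB, a 200 MB partition will never fit in a single chunk, so this check is mostly useful on tables with smaller partitions.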
I hope this helps. I am curious about this one, so please let us know what
you find out :).
Alain Rodriguez - @arodream - al...@thelastpickle.com
France / Spain
The Last Pickle - Apache Cassandra Consulting
2018-05-26 14:21 GMT+01:00 onmstester onmstester <onmstes...@zoho.com>:
> By reading 90 partitions concurrently (each larger than 200 MB), my
> single-node Apache Cassandra became unresponsive;
> no reads or writes worked for almost 10 minutes.
> I'm using these configs:
> memtable_allocation_type: offheap_buffers
> gc: G1GC
> heap: 128GB
> concurrent_reads: 128 (with more than 12 disks)
> There is not much pressure on my resources except for memory: the 70 GB
> eden is filled and reallocated in less than a minute.
> CPU is at about 20% while reads are down, and iostat shows no significant
> disk load.