I am trying to approach this in a somewhat more structured way, to make it more helpful for others.

AFAIU, my problem is the combination of heavy write load and very limited RAM (2GB). C*'s design seems to cause nodes to run out of heap space instead of reducing the processing of incoming writes. What I am looking for is a configuration that keeps C* stable and puts the burden on the client to slow down if it wants all writes to be handled. (Sure, we can scale out, but the point here is really that I want to configure C* to protect itself better from being overwhelmed by writes.)

What I think I understand so far is that there are four major areas to look at:

- Cache sizes
- Compaction
- GC
- Request handling

*Cache sizes*

Cache sizes are solely relevant to reads, correct? As I do not do reads for now, I set everything to 0. I found the key and row caches. Anything else?

*Compaction*

I do not fully understand how compaction thresholds interact with GC. For example, there is a comment[1] in cassandra.yaml that tells me that other config switches are far more important than the threshold defined by flush_largest_memtables_at.

I did set

- flush_largest_memtables_at: 0.50
- CMSInitiatingOccupancyFraction: 50

Does that make sense, or should I go lower, e.g. 0.1 and 10? In addition, I have disabled compaction throttling (set to 0). Does that make sense? And I did set

- in_memory_compaction_limit_in_mb: 1

Is that actually good or bad?

*GC*

Besides CMSInitiatingOccupancyFraction, I do not really understand what else I should do regarding GC to prevent the OutOfMemory error I see in my logs.

*Request handling*

The goal here would be to find a configuration that prevents request processing while C* is still busy writing to disk (which is what *I* want in this case, right?). I have set

- rpc_server_type: hsha (though I only have one client, so sync would not make a difference)
- rpc_min_threads: 1
- rpc_max_threads: 1 (which also renders hsha vs sync irrelevant, right?)
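
To make the settings listed above easier to compare, here they are in one place, roughly as they would appear in the config. This is only a sketch: compaction_throughput_mb_per_sec is my assumption for the name of the throttling switch, the location of the CMS flag is assumed as well, and I left out the cache settings because their key names depend on the C* version.

```yaml
# cassandra.yaml (sketch of the values mentioned above)
flush_largest_memtables_at: 0.50
in_memory_compaction_limit_in_mb: 1
compaction_throughput_mb_per_sec: 0   # assumed key name; 0 disables throttling
rpc_server_type: hsha
rpc_min_threads: 1
rpc_max_threads: 1

# Not a cassandra.yaml key: CMSInitiatingOccupancyFraction is a JVM option
# (assumed here to be set in conf/cassandra-env.sh):
#   -XX:CMSInitiatingOccupancyFraction=50
```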

Do you have any suggestions as to what I could specifically monitor to see how the causes of the out-of-memory error develop, and which other switches I should try out?

Jan

[1]
# emergency pressure valve: each time heap usage after a full (CMS)
# garbage collection is above this fraction of the max, Cassandra will
# flush the largest memtables.
#
# Set to 1.0 to disable. Setting this lower than
# CMSInitiatingOccupancyFraction is not likely to be useful.
#
# RELYING ON THIS AS YOUR PRIMARY TUNING MECHANISM WILL WORK POORLY:
# it is most effective under light to moderate load, or read-heavy
# workloads; under truly massive write load, it will often be too
# little, too late.
flush_largest_memtables_at: 0.50