On Tue, Dec 2, 2014 at 5:01 PM, Gianluca Borello <gianl...@draios.com>
wrote:

> We mainly store time series-like data, where each data point is a binary
> blob of 5-20KB. We use wide rows, and try to put in the same row all the
> data that we usually need in a single query (but not more than that). As a
> result, our application logic is very simple (since we have to do just one
> query to read the data on average) and read/write response times are very
> satisfactory. This is a cfhistograms and a cfstats of our heaviest CF:
>

100MB is not HYOOOGE, but it is around the size where large rows can start
to cause heap pressure.

You seem to be unclear on the implications of pending compactions, however.

Briefly, pending compactions indicate that you have more SSTables than you
"should". Compaction both merges row versions and reduces the number of
SSTables, so a high number of pending compactions causes problems on both
fronts: too many row versions ("fragmentation") and too many SSTables, each
of which carries per-SSTable heap or off-heap overhead (depending on
version) for structures like bloom filters and index samples. In your case,
the problem is probably just that the compaction throughput throttle is set
too low.
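
If you want to check, something along these lines should show the backlog
and let you raise the throttle (exact syntax depends on your version; the
numbers are illustrative, not a recommendation):

    # show pending compactions and what is compacting right now
    nodetool compactionstats

    # raise the throttle above the 16 MB/s default; 0 disables throttling
    nodetool setcompactionthroughput 64

To make the change permanent, set compaction_throughput_mb_per_sec in
cassandra.yaml to match.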

My conjecture is that, given your normal data size and read/write workload,
you are already fairly close to "GC pre-fail" even when compaction is
keeping up. When it stops keeping up, you relatively quickly get into a
state where you exhaust the heap because you have too many SSTables.
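
To sanity-check that conjecture, you could watch the SSTable count and the
GC pauses together while compaction is behind; something like (keyspace/CF
names and log path are placeholders for your install):

    # per-CF SSTable count (Keyspace1.Standard1 is a placeholder)
    nodetool cfstats Keyspace1.Standard1 | grep "SSTable count"

    # GC pauses logged by the GCInspector
    grep GCInspector /var/log/cassandra/system.log | tail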

=Rob
http://twitter.com/rcolidba
PS - Given 30GB of RAM on the machine, you could consider investigating
"large-heap" configurations; rbranson from Instagram has some slides out
there on the topic. What you pay is longer stop-the-world GCs, IOW latency
if you happen to be talking to a replica node when it pauses.
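
If you do go down that road, the knobs are in conf/cassandra-env.sh; a
sketch of the sort of override people use (the values are purely
illustrative -- test against your own workload):

    # conf/cassandra-env.sh -- override the auto-calculated heap sizes
    MAX_HEAP_SIZE="16G"
    HEAP_NEWSIZE="800M"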
