I was using the pre-2.1.0 configuration scheme of setting caching to 
‘rows_only’ on the column family.  I’ve tried runs with row_cache_size_in_mb 
set to both 16384 and 32768.

I don’t think the new settings would have helped in my case.  My understanding 
of the rows_per_partition setting is that it lets you restrict the number of 
rows cached per partition, compared with the pre-2.1.0 behaviour of caching 
whole partitions, whereas we want to cache as much as possible.
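To make the comparison concrete, here is a small hypothetical helper that builds the CQL statement for each scheme. The table name `ks.t` is only an example. The mapping of the legacy 'rows_only' value to keys NONE plus rows_per_partition ALL is our understanding of how 2.1 converts the old option, so please double-check it against the 2.1 docs. row_cache_size_in_mb itself is a node-wide limit set in cassandra.yaml, not a table option.

```python
# Hypothetical helper (not part of any driver): builds the ALTER TABLE statement
# that turns on the row cache, in either the pre-2.1 or the 2.1 syntax.
def caching_statement(table: str, legacy: bool, rows_per_partition: str = "ALL") -> str:
    if legacy:
        # Pre-2.1.0: caching is a plain string option on the table.
        option = "'rows_only'"
    else:
        # 2.1: caching is a map of text values; rows_per_partition is
        # 'ALL', 'NONE', or a row count. We assume 'ALL' matches the old
        # rows_only behaviour of caching whole partitions.
        if rows_per_partition not in ("ALL", "NONE") and not rows_per_partition.isdigit():
            raise ValueError("rows_per_partition must be 'ALL', 'NONE' or an integer")
        option = "{'keys': 'NONE', 'rows_per_partition': '%s'}" % rows_per_partition
    return f"ALTER TABLE {table} WITH caching = {option};"


print(caching_statement("ks.t", legacy=True))
print(caching_statement("ks.t", legacy=False))
```

With rows_per_partition left at 'ALL', the 2.1 form should cache as much as the old one did; only a numeric value restricts it.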

From: DuyHai Doan [mailto:[email protected]]
Sent: 22 October 2014 16:59
To: [email protected]
Cc: James Lee
Subject: Re: Performance Issue: Keeping rows in memory

If you're using 2.1.0, the row cache has been redesigned. How did you configure 
it? There are some new parameters to specify how many "CQL rows" you want to 
keep in the cache: http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1

On Wed, Oct 22, 2014 at 1:34 PM, Thomas Whiteway 
<[email protected]> wrote:
Hi,

I’m working on an application using a Cassandra (2.1.0) cluster where

- our entire dataset is around 22GB
- each node has 48GB of memory but only a single (mechanical) hard disk
- in normal operation we have a low level of writes and no reads
- very occasionally we need to read rows very fast (>1.5K rows/second), and 
  only read each row once.

When we try to read the rows, it takes up to five minutes before Cassandra is 
able to keep up.  The problem seems to be that it takes a while to get the data 
into the page cache, and until then Cassandra can’t retrieve the data from disk 
fast enough (e.g. if I drop the page cache mid-test, Cassandra slows down 
for the next five minutes).

Given that the total amount of data should fit comfortably in memory, I’ve been 
trying to find a way to keep the rows cached in memory, but there doesn’t seem 
to be a particularly good way to achieve this.

I’ve tried enabling the row cache and pre-populating it by querying every 
row before starting the load, which gives good performance, but the row cache 
isn’t really intended to be used this way and we’d be fighting its eviction to 
keep the rows in (e.g. by cyclically reading through all the rows during normal 
operation).
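A throttled sweep like the one described could look roughly like this. It is a sketch under assumptions: `execute` is expected to behave like the DataStax Python driver's `Session.execute(query, parameters)`, and both the per-key query and the source of keys are hypothetical placeholders for whatever the schema provides.

```python
import time
from typing import Any, Callable, Iterable, Sequence


def warm_rows(
    execute: Callable[[str, Sequence[Any]], Any],
    keys: Iterable[Any],
    query: str,
    max_per_second: float = 0.0,
) -> int:
    """Read each partition once so that it lands in (or stays in) the row cache.

    `execute` is assumed to act like cassandra-driver's Session.execute;
    `query` is a hypothetical per-key SELECT, e.g.
    "SELECT * FROM ks.t WHERE id = %s". A positive max_per_second throttles
    the sweep so it competes less with normal writes. Returns keys read.
    """
    interval = 1.0 / max_per_second if max_per_second > 0 else 0.0
    count = 0
    for key in keys:
        started = time.monotonic()
        execute(query, (key,))  # result is discarded; only the cache side effect matters
        count += 1
        if interval:
            # Sleep off whatever remains of this key's time slot.
            remaining = interval - (time.monotonic() - started)
            if remaining > 0:
                time.sleep(remaining)
    return count
```

Running it once before the load corresponds to the pre-population step; running it on a schedule during normal operation is the "cyclic read" that keeps rows from being evicted, at the cost of constant background reads.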

Keeping the page cache warm by running a background task that repeatedly reads 
the SSTable files would be simpler, and this is currently the solution we’re 
leaning towards.  However, we have less control over the page cache: it would 
be vulnerable to other processes evicting Cassandra’s files, and it 
generally feels like a bit of a hack.
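For reference, the page-cache warmer could be as small as the following sketch. It assumes the SSTable row data lives in files matching `*-Data.db` under the node's data directory (the default location is /var/lib/cassandra/data, but check data_file_directories in cassandra.yaml). It simply reads each file end to end and, where the OS supports it, also issues a `POSIX_FADV_WILLNEED` readahead hint.

```python
import os
from pathlib import Path

CHUNK = 1 << 20  # read in 1 MiB chunks to keep memory use flat


def warm_page_cache(data_dir: str, pattern: str = "*-Data.db") -> int:
    """Read every matching SSTable file under data_dir so its pages enter the OS page cache.

    The default pattern assumes row data lives in *-Data.db components.
    Returns the total number of bytes read.
    """
    total = 0
    for path in sorted(Path(data_dir).rglob(pattern)):
        try:
            with open(path, "rb", buffering=0) as f:
                if hasattr(os, "posix_fadvise"):
                    # Ask the kernel to start readahead for the whole file (POSIX only).
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_WILLNEED)
                while True:
                    chunk = f.read(CHUNK)
                    if not chunk:
                        break
                    total += len(chunk)
        except FileNotFoundError:
            # Compaction may delete an SSTable between listing and opening it.
            continue
    return total
```

Run periodically, this still shares the page cache with every other process on the box, so it does not address the eviction concern above; it only narrows the window after a cold start.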

Has anyone had success doing something similar, or does anyone have 
suggestions for possible solutions?

Thanks,
Thomas

