RE: Performance Issue: Keeping rows in memory

Thomas Whiteway Wed, 22 Oct 2014 09:53:30 -0700

I haven't tried running a query trace.  I'm pretty confident that the 
difference in performance during the test comes down to whether the files 
are in the page cache or not, because:
- if I explicitly empty the page cache before the test I get a 5 minute slow 
period 
- if I leave a few hours between tests but don't do anything to the page cache 
explicitly I get a 2-3 minute slow period 
- if I warm the page cache by reading all the files in the Cassandra data 
directory before the test I don't get any slow period.
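
For reference, the warm-up in the last case amounts to reading every file 
under the data directory once. Here is a minimal sketch of that idea in 
Python. It is an illustration, not the exact script used in the tests: the 
function name warm_page_cache, the 1 MB chunk size and the default path 
/var/lib/cassandra/data are all assumptions, so adjust them to your install.

```python
import os
import sys


def warm_page_cache(data_dir, chunk_size=1024 * 1024):
    """Read every regular file under data_dir once.

    Reading the bytes is enough for the kernel to pull the pages into the
    page cache; the data itself is discarded. Returns a tuple of
    (files_read, bytes_read).
    """
    files_read = 0
    bytes_read = 0
    for root, _dirs, names in os.walk(data_dir):
        for name in names:
            path = os.path.join(root, name)
            if not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                # An sstable can disappear mid-walk (e.g. after compaction);
                # skip it rather than abort the whole warm-up.
                continue
            files_read += 1
    return files_read, bytes_read


if __name__ == "__main__":
    # Hypothetical default; pass the real data directory as the first argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/cassandra/data"
    n_files, n_bytes = warm_page_cache(data_dir)
    print("read %d files, %d bytes" % (n_files, n_bytes))
```

Nothing here pins the pages in memory, so the kernel can still evict them 
under memory pressure, and the script would need re-running periodically to 
keep the cache warm.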


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of 
Jonathan Haddad
Sent: 22 October 2014 17:20
To: [email protected]
Cc: James Lee
Subject: Re: Performance Issue: Keeping rows in memory

First, did you run a query trace?

I recommend Al Tobey's pcstat util to determine if your files are in the buffer 
cache: https://github.com/tobert/pcstat



On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway 
<[email protected]> wrote:
> Hi,
>
>
>
> I’m working on an application using a Cassandra (2.1.0) cluster where
>
> - our entire dataset is around 22GB
>
> - each node has 48GB of memory but only a single (mechanical) hard
>   disk
>
> - in normal operation we have a low level of writes and no reads
>
> - very occasionally we need to read rows very fast (>1.5K
>   rows/second), and only read each row once.
>
>
>
> When we try and read the rows it takes up to five minutes before 
> Cassandra is able to keep up.  The problem seems to be that it takes a 
> while to get the data into the page cache and until then Cassandra 
> can’t retrieve the data from disk fast enough (e.g. if I drop the page 
> cache mid-test then Cassandra slows down for the next 5 minutes).
>
>
>
> Given that the total amount of data should fit comfortably in memory I’ve 
> been trying to find a way to keep the rows cached in memory but there 
> doesn’t seem to be a particularly great way to achieve this.
>
>
>
> I’ve tried enabling the row cache and pre-populating the test by 
> querying every row before starting the load which gives good 
> performance, but the row cache isn’t really intended to be used this 
> way and we’d be fighting the row cache to keep the rows in (e.g. by 
> cyclically reading through all the rows during normal operation).
>
>
>
> Keeping the page cache warm by running a background task to keep 
> accessing the files for the sstables would be simpler and currently 
> this is the solution we’re leaning towards, but we have less control 
> over the page cache, it would be vulnerable to other processes 
> knocking Cassandra’s files out, and it generally feels like a bit of a hack.
>
>
>
> Has anyone had any success with trying to do something similar to this 
> or have any suggestions for possible solutions?
>
>
>
> Thanks,
>
> Thomas
>
>



--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
