I haven't tried running a query trace. I'm pretty confident that the difference in performance during the test is due to whether the files are cached or not, as:

- if I explicitly empty the page cache before the test I get a 5 minute slow period
- if I leave a few hours between tests but don't do anything to the page cache explicitly I get a 2-3 minute slow period
- if I warm the page cache by reading all the files in the Cassandra data directory before the test I don't get any slow period.
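
For reference, the warm-up step in the last case can be sketched roughly as below. This is only an illustration of reading every file under a directory once so the kernel pulls it into the page cache; the function name `warm_page_cache`, the chunk size and the example path are assumptions for the sketch, not the exact script used in the test.

```python
import os


def warm_page_cache(data_dir, chunk_size=1 << 20):
    """Read every file under data_dir once and return the number of bytes read.

    The data is discarded; the only goal is to make the kernel load the
    files into the page cache. Whether they stay there is up to the kernel.
    """
    total = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                # Files can disappear mid-walk (e.g. after a compaction); skip them.
                continue
    return total
```

Calling something like `warm_page_cache("/var/lib/cassandra/data")` before the test would do the warm-up, where that path is an assumed data directory and should be replaced with whatever the node is actually configured to use.
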
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Jonathan Haddad
Sent: 22 October 2014 17:20
To: [email protected]
Cc: James Lee
Subject: Re: Performance Issue: Keeping rows in memory

First, did you run a query trace?

I recommend Al Tobey's pcstat util to determine if your files are in the buffer cache: https://github.com/tobert/pcstat

On Wed, Oct 22, 2014 at 4:34 AM, Thomas Whiteway <[email protected]> wrote:
> Hi,
>
> I’m working on an application using a Cassandra (2.1.0) cluster where
>
> - our entire dataset is around 22GB
> - each node has 48GB of memory but only a single (mechanical) hard disk
> - in normal operation we have a low level of writes and no reads
> - very occasionally we need to read rows very fast (>1.5K rows/second), and only read each row once.
>
> When we try and read the rows it takes up to five minutes before Cassandra is able to keep up. The problem seems to be that it takes a while to get the data into the page cache and until then Cassandra can’t retrieve the data from disk fast enough (e.g. if I drop the page cache mid-test then Cassandra slows down for the next 5 minutes).
>
> Given that the total amount of data should fit comfortably in memory I’ve been trying to find a way to keep the rows cached in memory but there doesn’t seem to be a particularly great way to achieve this.
>
> I’ve tried enabling the row cache and pre-populating the test by querying every row before starting the load which gives good performance, but the row cache isn’t really intended to be used this way and we’d be fighting the row cache to keep the rows in (e.g. by cyclically reading through all the rows during normal operation).
>
> Keeping the page cache warm by running a background task to keep accessing the files for the sstables would be simpler and currently this is the solution we’re leaning towards, but we have less control over the page cache, it would be vulnerable to other processes knocking Cassandra’s files out, and it generally feels like a bit of a hack.
>
> Has anyone had any success with trying to do something similar to this or have any suggestions for possible solutions?
>
> Thanks,
> Thomas

--
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade
