On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:

> > I actually have the opposite 'problem'. I have a pair of servers that have
> > been static since mid last week, but have seen performance vary
> > significantly (x10) for exactly the same query. I hypothesised it was
> > various caches so I shut down Cassandra, flushed the O/S buffer cache and
> > then brought it back up. The performance wasn't significantly different to
> > the pre-flush performance
>
> I don't get this thread at all :)
>
> Why would restarting with clean caches be expected to *improve*
> performance?

I was expecting it to reduce performance due to clearing of the key cache and O/S buffer cache - performance stayed roughly the same

> And why is key cache loading involved other than to delay
> start-up and hopefully pre-populating caches for better (not worse)
> performance?
>
> If you want to figure out why queries seem to be slow relative to
> normal, you'll need to monitor the behavior of the nodes. Look at disk
> I/O statistics primarily (everyone reading this running Cassandra who
> aren't intimately familiar with "iostat -x -k 1" should go and read up
> on it right away; make sure you understand the utilization and avg
> queue size columns), CPU usage, whether compaction is happening, etc.

Yep - I've been looking at these - I don't see anything in iostat/dstat etc. that points strongly to a problem. There is quite a bit of I/O load, but it looks roughly uniform on slow and fast instances of the queries. The last compaction ran 4 days ago - which was before I started seeing variable performance

> One easy way to see sudden bursts of poor behavior is to be heavily
> reliant on cache, and then have sudden decreases in performance due to
> compaction evicting data from page cache while also generating more
> I/O.

Unlikely to be a cache issue - in one case an immediate second run of exactly the same query performed significantly worse.

> But that's total speculation. It is also the case that you cannot
> expect consistent performance on EC2 and that might be it.

Variable performance from EC2 is my lead theory at the moment.

> But my #1 advice: Log into the node while it is being slow, and
> observe. Figure out what the bottleneck is. iostat, top, nodetool
> tpstats, nodetool netstats, nodetool compactionstats.

I know why it is slow - it's clearly I/O bound. I am trying to hunt down why it is sometimes much faster even though I have tried to replicate the same conditions

> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

--
*Franc Carter* | Systems architect | Sirca Ltd
<marc.zianideferra...@sirca.org.au>
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 9236 9118
Level 9, 80 Clarence St, Sydney NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
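
The advice above to watch the utilization and average queue size columns of "iostat -x -k 1" could be turned into something comparable across slow and fast runs of the same query. What follows is only a rough sketch, not anything used in this thread: it assumes the extended device report layout of sysstat's iostat (a header row starting with "Device" or "Device:", a "%util" column, and a queue column named "avgqu-sz" in older releases or "aqu-sz" in newer ones). The function names, the thresholds, and the sample numbers are all hypothetical.

```python
# Sketch: pull %util and average queue size out of one `iostat -x -k` report.
# The sample text below is made-up illustrative data, not output from the
# servers discussed in this thread.

SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.00    0.00    1.00   40.00    0.50   56.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     1.00    2.00    3.00    16.00    24.00    16.00     0.05    4.00   1.00   0.50
xvdb              5.00     0.00  400.00    1.00 25600.00     8.00   127.72     6.50   16.20   2.45  98.00
"""


def parse_iostat_block(text):
    """Return {device: {column: value}} for one extended device report."""
    stats = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]  # column names after the device label
            continue
        if columns is None or len(fields) != len(columns) + 1:
            continue  # not a device row (e.g. the avg-cpu section)
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue
        stats[fields[0]] = dict(zip(columns, values))
    return stats


def busy_devices(stats, util_threshold=80.0, queue_threshold=1.0):
    """List devices at or above either threshold (thresholds are arbitrary)."""
    busy = []
    for device, row in stats.items():
        # Older sysstat calls the queue column avgqu-sz, newer ones aqu-sz.
        queue = row.get("avgqu-sz", row.get("aqu-sz", 0.0))
        if row.get("%util", 0.0) >= util_threshold or queue >= queue_threshold:
            busy.append(device)
    return sorted(busy)


if __name__ == "__main__":
    report = parse_iostat_block(SAMPLE)
    for name in busy_devices(report):
        print(name, report[name]["%util"], report[name]["avgqu-sz"])
```

Capturing one such report per query run, tagged with the query latency, would make it possible to check whether the slow runs really see the same utilization and queue depth as the fast ones, or whether the load only looks uniform when eyeballed.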