Re: Cassandra read throughput with little/no caching.

2013-01-03 Thread Tyler Hobbs
Your description above was much better :-) I'm more interested in docs for the raw metrics provided in JMX. I don't think there are any good docs for what is exposed directly through JMX. Most of the OpsCenter metrics map closely to one exposed JMX item, so that's a start. Other than that,

Re: Cassandra read throughput with little/no caching.

2013-01-02 Thread James Masson
On 31/12/12 18:45, Tyler Hobbs wrote: On Mon, Dec 31, 2012 at 11:24 AM, James Masson james.mas...@opigram.com wrote: Well, it turns out the Read-Request Latency graph in Ops-Center is highly misleading. Using jconsole, the read-latency for the

Re: Cassandra read throughput with little/no caching.

2013-01-02 Thread Tyler Hobbs
On Wed, Jan 2, 2013 at 5:28 AM, James Masson james.mas...@opigram.com wrote: thanks for clarifying this. So you're saying the difference between the global Read Request latency in opscenter, and the column family specific one is in the effort coordinating a validated read across multiple

Re: Cassandra read throughput with little/no caching.

2013-01-02 Thread James Masson
On 02/01/13 16:18, Tyler Hobbs wrote: On Wed, Jan 2, 2013 at 5:28 AM, James Masson james.mas...@opigram.com wrote: 1) Hector sends a request to some node in the cluster, which will act as the coordinator. 2) The coordinator then sends the actual read requests

Re: Cassandra read throughput with little/no caching.

2012-12-31 Thread James Masson
Hi Yiming, I've had the chance to observe what happens to cassandra read response time over time. It starts out with fast 1ms reads, until the first compaction starts, then the CPUs are maxed out for a period, and read latency rises to 4ms. After compaction finishes, the system returns to

Re: Cassandra read throughput with little/no caching.

2012-12-31 Thread James Masson
Well, it turns out the Read-Request Latency graph in Ops-Center is highly misleading. Using jconsole, the read-latency for the column family in question is actually normally around 800 microseconds, punctuated by occasional big spikes that drive up the averages. Towards the end of the
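The point about spikes driving up the averages can be illustrated with a small sketch. The sample values below are hypothetical (they are not measurements from this thread); they only show how a few large outliers pull a mean far above the typical latency while the median stays put.

```python
import statistics

# Hypothetical latency samples in microseconds (not data from the thread):
# 95 reads at roughly the "normal" 800 us, plus 5 large spikes.
samples_us = [800] * 95 + [50_000] * 5

# The mean is dominated by the handful of spikes.
mean_us = statistics.mean(samples_us)

# The median reflects what a typical read looks like.
median_us = statistics.median(samples_us)

print(f"mean:   {mean_us} us")
print(f"median: {median_us} us")
```

With these made-up numbers the mean lands several times above the median, which is one way an averaged graph can look much worse than the per-read latency seen in jconsole.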

Re: Cassandra read throughput with little/no caching.

2012-12-31 Thread Keith Wright
Following up on this, I was hoping to get everyone's take on my use case for Cassandra and see if everyone agrees it can meet the requirements: I have a very tight SLA around get times. These are almost always single row fetches for 20-50 columns on a row that is likely under 200 columns. The

Re: Cassandra read throughput with little/no caching.

2012-12-31 Thread Tyler Hobbs
On Mon, Dec 31, 2012 at 11:24 AM, James Masson james.mas...@opigram.com wrote: Well, it turns out the Read-Request Latency graph in Ops-Center is highly misleading. Using jconsole, the read-latency for the column family in question is actually normally around 800 microseconds, punctuated by

Re: Cassandra read throughput with little/no caching.

2012-12-24 Thread James Masson
On 21/12/12 17:56, Yiming Sun wrote: James, you could experiment with Row cache, with off-heap JNA cache, and see if it helps. My own experience with row cache was not good, and the OS cache seemed to be most useful, but in my case, our data space was big, over 10TB. Your sequential access

Re: Cassandra read throughput with little/no caching.

2012-12-23 Thread aaron morton
First, the non helpful advice, I strongly suggest changing the data model so you do not have 100MB+ rows. They will make life harder. Write request latency is about 900 microsecs, read request latency is about 4000 microsecs. 4 milliseconds to

Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson
Hi list-users, We have an application that has a relatively unusual access pattern in Cassandra 1.1.6. Essentially we read an entire multi hundred megabyte column family sequentially (little chance of a cassandra cache hit), perform some operations on the data, and write the data back to

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
I have a few questions for you, James, 1. how many nodes are in your Cassandra ring? 2. what is the replication factor? 3. when you say sequentially, what do you mean? what Partitioner do you use? 4. how many columns per row? how much data per row? per column? 5. what client library do you use

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, using RandomPartitioner, the order of the rows is random, so when you request these rows in sequential order (sort by the date?), Cassandra is not reading them sequentially. The sizes of the data, 200Mb, 300Mb, and 40Mb, are these the size for each column? Or are these the total size of
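A rough sketch of why key order and storage order diverge under RandomPartitioner, which places rows by an MD5-derived token of the row key. The date-style row keys are an assumption made for illustration (the thread only asks whether the rows are sorted by date), and `md5_token` is a hypothetical helper that approximates the token calculation rather than reproducing Cassandra's code.

```python
import hashlib

def md5_token(row_key: str) -> int:
    """Approximate a RandomPartitioner-style token: the absolute value of
    the MD5 digest read as a signed big-endian integer (illustrative only)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, byteorder="big", signed=True))

# Hypothetical row keys, named by date purely for the example.
row_keys = [f"2012-12-{day:02d}" for day in range(1, 11)]

# Order in which the application asks for rows (sorted by key).
print("request order:", row_keys)

# Order in which the rows sit on the token ring (sorted by token).
by_token = sorted(row_keys, key=md5_token)
print("token order:  ", by_token)
```

Because adjacent keys hash to unrelated tokens, reading keys in date order generally turns into scattered reads across the ring and across SSTables.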

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread James Masson
On 21/12/12 16:27, Yiming Sun wrote: James, using RandomPartitioner, the order of the rows is random, so when you request these rows in Sequential order (sort by the date?), Cassandra is not reading them sequentially. Yes, I understand the next row to be retrieved in sequence is likely to

Re: Cassandra read throughput with little/no caching.

2012-12-21 Thread Yiming Sun
James, you could experiment with Row cache, with off-heap JNA cache, and see if it helps. My own experience with row cache was not good, and the OS cache seemed to be most useful, but in my case, our data space was big, over 10TB. Your sequential access pattern certainly doesn't play well with