So, you're doing about 20 ops/s where each op consists of "read 2 metadata columns, then read ~250 columns of ~2K each." Is that right?
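For a client that sends one request at a time and waits for each reply, throughput is capped at roughly 1/latency. More generally, Little's law gives throughput = requests in flight / time per request. The sketch below does that arithmetic. The function name `littles_law_throughput` is hypothetical, and the inputs in the usage lines are illustrative assumptions rather than measurements from this thread. It also assumes latency stays flat as concurrency rises, which only holds until something (server CPU, disk, network) saturates.

```python
def littles_law_throughput(latency_s: float, in_flight: int = 1,
                           bytes_per_op: int = 0) -> tuple[float, float]:
    """Return (ops/s, bytes/s) for a closed-loop client (hypothetical helper).

    Little's law: throughput = requests in flight / time per request.
    Assumes per-request latency does not change as in_flight grows,
    which is only true until the server or network saturates.
    """
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    ops_per_s = in_flight / latency_s
    return ops_per_s, ops_per_s * bytes_per_op


# Illustrative inputs (assumptions): 50 ms per op, 250 columns of 2048 bytes each.
for in_flight in (1, 4):
    ops, bps = littles_law_throughput(0.050, in_flight=in_flight,
                                      bytes_per_op=250 * 2048)
    print(f"{in_flight} in flight: {ops:.1f} ops/s, {bps / 2**20:.2f} MiB/s")
```

With a single-threaded client the only way to raise this ceiling is to lower latency; adding concurrency raises it directly, as long as latency holds.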
Is your test client multithreaded? Is it on a separate machine from the Cassandra server? What is your bottleneck? http://spyced.blogspot.com/2010/01/linux-performance-basics.html

On Thu, May 17, 2012 at 1:08 PM, Yiming Sun <[email protected]> wrote:
> Hi Aaron,
>
> Thank you for guiding us by breaking down the issue. Please see my answers
> embedded.
>
>> Is this a single client?
>
> Yes.
>
>> How many columns is it asking for?
>
> The client knows a list of all row keys; it randomly picks 100 and
> loops 100 times. It first reads a metadata column to figure out how many
> columns to read, and then it reads those columns.
>
>> What sort of query are you sending, slice or named columns?
>
> Currently all queries are slice queries. The first slice query reads the
> metadata column (actually 2 metadata columns: one holds the number of columns
> to read, the other holds information that is not needed for the performance
> test, but I kept it there to mirror the real situation). It then generates
> the column name array and sends the second slice query.
>
> The timing for the queries is completely isolated and excludes the time
> spent generating the column name array, etc.
>
>> From the client side how long is a single read taking?
>
> I am not 100% sure what you are asking... are you asking how long
> SliceQuery.execute() takes? The averages we are getting are between
> 50-70 ms, and nodetool reports similar latency, differing by 5-10 ms at most.
>
>> What is the write workload like? It sounds like it's write once read
>> many.
>
> Indeed, it is like a WORM environment. For the performance test, we don't
> have any writes.
>
>> memory speed > network speed
>
> Yes. Right now our data is only a sample of about 250K rows, so the default
> 200,000-key cache has a hit rate above 90%. But we will soon be hosting the
> real deal with about 3M rows, so I am not sure our memory size will be able
> to keep up with it.
>
> In any case, Aaron, please let us know if you have any
> suggestions/comments/insights. Thanks!
>
> -- Y.
>
>
> On Thu, May 17, 2012 at 1:04 AM, aaron morton <[email protected]>
> wrote:
>>
>> The read rate that I have been seeing is about 3MB/sec, and that is
>> reading the raw bytes... using the string serializer the rate is even
>> lower, about 2.2MB/sec.
>>
>> Can we break this down a bit:
>>
>> Is this a single client?
>> How many columns is it asking for?
>> What sort of query are you sending, slice or named columns?
>> From the client side how long is a single read taking?
>> What is the write workload like? It sounds like it's write once read
>> many.
>>
>> Use nodetool cfstats to see what the read latency is on a single node.
>> (see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/) Is there
>> much difference between this and the latency from the client perspective?
>>
>>
>>
>> Using JNA may help, but a blog article seems to say it only gives a 13%
>> increase, which is not very significant when the base performance is in
>> single-digit MBs.
>>
>> There are other reasons to have JNA installed: more efficient snapshots
>> and advising the OS when file operations should not be cached.
>>
>> Our environment is virtualized, and the disks are actually SAN through
>> fiber channels, so I don't know if that has an impact on performance as well.
>>
>> memory speed > network speed
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
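The test loop Yiming describes (pick 100 random row keys; for each, read the 2 metadata columns, build the column-name list, then read those columns, timing only the two queries) can be sketched as below. This is a minimal sketch under stated assumptions: `FakeStore`, `get_columns`, and the `meta:*` / `colNNNNN` column names are hypothetical, and a real client would issue its slice queries against Cassandra where `get_columns` is called. It reports both mean per-op query time and wall-clock throughput, so client-side figures can be compared with what nodetool cfstats shows. Because the fake store does no I/O, Python threads will not show a speedup here (the GIL serializes the pure-Python work); against a real server, threads can overlap time spent waiting on the network, which is the case a multithreaded client would exercise.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

COLUMN_SIZE = 2048                   # ~2K per column, as described in the thread
META = ["meta:ncols", "meta:info"]   # hypothetical names for the 2 metadata columns


class FakeStore:
    """Hypothetical in-memory stand-in for the column family (no network, no disk)."""

    def __init__(self, n_rows: int, cols_per_row: int):
        payload = b"x" * COLUMN_SIZE  # one shared bytes object keeps memory small
        self.rows = {}
        for i in range(n_rows):
            row = {"meta:ncols": str(cols_per_row), "meta:info": "unused"}
            for c in range(cols_per_row):
                row[f"col{c:05d}"] = payload
            self.rows[f"row{i}"] = row

    def get_columns(self, key: str, names: list[str]) -> dict:
        """Return the named columns of one row (a real client makes a network call here)."""
        row = self.rows[key]
        return {n: row[n] for n in names if n in row}


def one_op(store: FakeStore, key: str) -> tuple[float, int]:
    """Time the two queries of one op; building the name list is not timed."""
    t0 = time.perf_counter()
    meta = store.get_columns(key, META)
    elapsed = time.perf_counter() - t0

    # Excluded from timing, matching the description in the thread.
    names = [f"col{c:05d}" for c in range(int(meta["meta:ncols"]))]

    t1 = time.perf_counter()
    data = store.get_columns(key, names)
    elapsed += time.perf_counter() - t1
    return elapsed, sum(len(v) for v in data.values())


def run_benchmark(store: FakeStore, n_ops: int = 100, threads: int = 1,
                  seed: int = 0) -> dict:
    """Pick n_ops distinct random row keys and read each once, on `threads` workers."""
    keys = random.Random(seed).sample(sorted(store.rows), n_ops)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(lambda k: one_op(store, k), keys))
    wall = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    total_bytes = sum(b for _, b in results)
    return {
        "ops": len(results),
        "bytes": total_bytes,
        "mean_query_time_s": sum(t for t, _ in results) / len(results),
        "ops_per_s": len(results) / wall,
        "bytes_per_s": total_bytes / wall,
    }


if __name__ == "__main__":
    store = FakeStore(n_rows=200, cols_per_row=250)
    for threads in (1, 8):
        print(threads, run_benchmark(store, n_ops=100, threads=threads))
```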
