On 21/12/12 16:27, Yiming Sun wrote:
James, using RandomPartitioner, the order of the rows is random, so when you request these rows in "Sequential" order (sort by the date?), Cassandra is not reading them sequentially.
Yes, I understand the "next" row to be retrieved in sequence is likely to be on a different node, and that the ordering is random. I'm using the word "sequential" to explain that the data is requested in a fixed order, with nothing repeated until the next cycle. The data is not guaranteed to be of a size that is cacheable as a whole.
The sizes of the data - 200MB, 300MB, and 40MB - are these the sizes of each column, or the total sizes of the entire column families? It wasn't too clear to me. But if these are the totals for the column families, you will be able to fit them mostly in memory, so you should enable the row cache.
Size of the column family, on a single node. Row caching is off at the moment.
Are you saying that I should increase the JVM heap to fit some data in the row cache, at the expense of linux disk caching?
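For what it's worth, in the 1.1 line the row cache is sized globally in cassandra.yaml and enabled per column family. A sketch of what enabling it would look like - the 256MB figure is purely illustrative, and since 1.1 defaults to the off-heap SerializingCacheProvider it does not necessarily come out of the JVM heap:

```
# cassandra.yaml (per node): global row cache size (illustrative figure)
row_cache_size_in_mb: 256

-- cassandra-cli: switch a column family from caching = 'NONE'
update column family entities with caching = 'ROWS_ONLY';
```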
Bear in mind that the data is only going to be re-requested in sequence again - I'm not sure what the value is in the cassandra native caching if rows are not re-requested before being evicted.
My current key-cache hit-rates are near zero on this workload, hence I'm interested in cassandra's zero-cache performance. Unless I can guarantee to fit the entire data-set in memory, it's difficult to justify using memory on a cassandra cache if LRU and workload means it's not actually a benefit.
I happen to have done some performance tests of my own on cassandra, mostly on reads, and was also only able to get less than 6MB/sec out of a cluster of 6 nodes at RF=2 using a single-threaded client. But it made a huge difference when I changed the client to an asynchronous, multi-threaded structure.
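The single-threaded vs. multi-threaded difference can be sketched with a plain java.util.concurrent pool. `fetchRow` below is a hypothetical stand-in for whatever Hector query the client issues (here it just simulates ~4ms of request latency), so the names and figures are illustrative, not from this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReader {
    // Hypothetical stand-in for a Hector read; simulates ~4 ms of latency.
    static byte[] fetchRow(String key) throws InterruptedException {
        Thread.sleep(4);
        return key.getBytes();
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add("row-" + i);

        // Issue reads from a pool instead of one blocking loop;
        // per-client throughput scales with pool size until the cluster saturates.
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<byte[]>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(pool.submit(() -> fetchRow(key)));
        }

        int fetched = 0;
        for (Future<byte[]> f : futures) {
            f.get();          // collect results in request order
            fetched++;
        }
        pool.shutdown();
        System.out.println("fetched=" + fetched);
    }
}
```

With 16 in-flight requests, the same 4ms latency supports roughly 16x the request rate of a single blocking thread.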
Yes, I've been talking to the developers about having a separate thread or two to keep cassandra busy, feeding a Disruptor (http://lmax-exchange.github.com/disruptor/) to do the processing work.
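The shape of that design - one thread keeping Cassandra busy while another does the processing - can be sketched with a bounded BlockingQueue standing in for the Disruptor ring buffer; the row keys and counts here are made up for illustration:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FetchPipeline {
    private static final String POISON = "__END__"; // end-of-stream marker

    public static void main(String[] args) throws Exception {
        // Bounded queue standing in for the Disruptor ring buffer: the fetcher
        // stays ahead of the consumer without unbounded memory growth.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        Thread fetcher = new Thread(() -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    queue.put("row-" + i);   // in reality: a Hector read
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        int processed = 0;
        for (String row = queue.take(); !row.equals(POISON); row = queue.take()) {
            processed++;                     // in reality: transform + write back
        }
        fetcher.join();
        System.out.println("processed=" + processed);
    }
}
```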
But this all doesn't change the fact that under this zero-cache workload, cassandra seems to be very CPU expensive for throughput.
thanks James M
On Fri, Dec 21, 2012 at 10:36 AM, James Masson <james.mas...@opigram.com> wrote:

Hi, thanks for the reply.

On 21/12/12 14:36, Yiming Sun wrote:

I have a few questions for you, James:

1. How many nodes are in your Cassandra ring?

2 or 3, depending on environment - it doesn't seem to make much difference to throughput. What is a 30-minute task on a 2-node environment is a 30-minute task on a 3-node environment.

2. What is the replication factor?

1

3. When you say sequentially, what do you mean? What Partitioner do you use?

The data is organised by date - the keys are read sequentially in order, only once. Random partitioner - the data is spread equally across the nodes to avoid hotspots.

4. How many columns per row? How much data per row? Per column?

Varies - described in the schema:

create keyspace mykeyspace
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 1}
  and durable_writes = true;

create column family entities
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
    {column_name : '64656c65746564', validation_class : BytesType, index_name : 'deleted_idx', index_type : 0},
    {column_name : '6576656e744964', validation_class : TimeUUIDType, index_name : 'eventId_idx', index_type : 0},
    {column_name : '7061796c6f6164', validation_class : UTF8Type}];

2 columns per row here - about 200MB of data in total.

create column family events
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'TimeUUIDType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE';

1 column per row - about 300MB of data.

create column family intervals
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE';

Variable columns per row - about 40MB of data.

5. What client library do you use to access Cassandra? (Hector?) Is your client code single threaded?

Hector - yes, the processing side of the client is single threaded, but it is largely waiting for cassandra responses and has plenty of CPU headroom.

I guess what I'm most interested in is the discrepancy between read and write latency - although I understand the data volume is much larger for reads, even though the request rate is lower.

Network usage on a cassandra box barely gets above 20Mbit, including inter-cluster comms. Averages 5Mbit client<>cassandra.

There is near zero disk I/O, and what little there is is served sub-1ms. Storage is backed by a very fast SAN, but like I said earlier, the dataset just about fits in the Linux disk cache.

2GB VM, 512MB cassandra heap - GCs are nice and quick, no JVM memory problems; used heap oscillates between 280-350MB.

Basically, I'm just puzzled, as cassandra doesn't behave as I would expect. Huge CPU use in cassandra for very little throughput. I'm struggling to find anything that's wrong with the environment; there's no bottleneck that I can see.
thanks

James M

On Fri, Dec 21, 2012 at 7:27 AM, James Masson <james.mas...@opigram.com> wrote:

Hi list-users,

We have an application that has a relatively unusual access pattern in cassandra 1.1.6.

Essentially we read an entire multi-hundred-megabyte column family sequentially (little chance of a cassandra cache hit), perform some operations on the data, and write the data back to another column family in the same keyspace.

We do about 250 writes/sec and 100 reads/sec during this process. Write request latency is about 900 microseconds; read request latency is about 4000 microseconds.

* First Question: Do these numbers make sense? Read-request latency seems a little high to me; cassandra hasn't had a chance to cache this data, but it's likely in the Linux disk cache, given the sizing of the node/data/JVM.

thanks

James M
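As a sanity check on those numbers: a synchronous single-threaded client issues at most one request per round trip, so the latencies quoted above put a hard ceiling on per-thread throughput. A rough back-of-the-envelope calculation, assuming the client blocks for the full request latency:

```java
import java.util.Locale;

public class ThroughputCeiling {
    public static void main(String[] args) {
        double readLatency = 4000e-6;   // 4000 microsecond read latency, from the thread
        double writeLatency = 900e-6;   // 900 microsecond write latency

        // One blocking thread: max requests/sec = 1 / latency.
        System.out.println("max reads/sec per thread: " + Math.round(1 / readLatency));
        System.out.println("max writes/sec per thread: " + Math.round(1 / writeLatency));

        // At the observed 100 reads/sec + 250 writes/sec, one thread spends
        // roughly this fraction of each second just waiting on Cassandra.
        double waiting = 100 * readLatency + 250 * writeLatency;
        System.out.printf(Locale.ROOT, "fraction of each second spent waiting: %.3f%n", waiting);
    }
}
```

So a single blocking thread tops out around 250 reads/sec at these latencies, and the observed workload already keeps it waiting on the network for well over half of every second - consistent with the later observation that a multi-threaded client changes throughput dramatically.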