On 21/12/12 16:27, Yiming Sun wrote:
James, using RandomPartitioner, the order of the rows is random, so when you request these rows in "Sequential" order (sort by the date?), Cassandra is not reading them sequentially.
Yes, I understand the "next" row to be retrieved in sequence is likely to be on a different node, and that the ordering is random. I'm using the word "sequential" to explain that the data is requested in a fixed order, with nothing repeated until the next cycle. The data is not guaranteed to be of a size that is cacheable as a whole.
The sizes of the data - 200MB, 300MB, and 40MB - are these the sizes of each column, or the total sizes of the entire column families? It wasn't too clear to me. But if these are the totals for the column families, you will be able to fit them mostly in memory, so you should enable the row cache.
Size of the column family, on a single node. Row caching is off at the moment.
Are you saying that I should increase the JVM heap to fit some data in the row cache, at the expense of linux disk caching?
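For what it's worth, in the 1.1 line the row cache is sized globally in cassandra.yaml and enabled per column family. A sketch of what enabling it would look like - the 256MB figure is purely illustrative, and since 1.1 defaults to the off-heap SerializingCacheProvider it does not necessarily come out of the JVM heap:

```
# cassandra.yaml (per node): global row cache size (illustrative figure)
row_cache_size_in_mb: 256

-- cassandra-cli: switch a column family from caching = 'NONE'
update column family entities with caching = 'ROWS_ONLY';
```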
Bear in mind that the data is only going to be re-requested in sequence again - I'm not sure what the value is in the cassandra native caching if rows are not re-requested before being evicted.
My current key-cache hit-rates are near zero on this workload, hence I'm interested in cassandra's zero-cache performance. Unless I can guarantee to fit the entire data-set in memory, it's difficult to justify using memory on a cassandra cache if LRU and workload means it's not actually a benefit.
I happen to have done some performance tests of my own on cassandra, mostly on reads, and was also only able to get less than 6MB/sec out of a cluster of 6 nodes at RF=2 using a single-threaded client. But it made a huge difference when I changed the client to an asynchronous, multi-threaded structure.
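The single-threaded vs. multi-threaded difference can be sketched with a plain java.util.concurrent pool. `fetchRow` below is a hypothetical stand-in for whatever Hector query the client issues (here it just simulates ~4ms of request latency), so the names and figures are illustrative, not from this thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReader {
    // Hypothetical stand-in for a Hector read; simulates ~4 ms of latency.
    static byte[] fetchRow(String key) throws InterruptedException {
        Thread.sleep(4);
        return key.getBytes();
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add("row-" + i);

        // Issue reads from a pool instead of one blocking loop;
        // per-client throughput scales with pool size until the cluster saturates.
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<byte[]>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(pool.submit(() -> fetchRow(key)));
        }

        int fetched = 0;
        for (Future<byte[]> f : futures) {
            f.get();          // collect results in request order
            fetched++;
        }
        pool.shutdown();
        System.out.println("fetched=" + fetched);
    }
}
```

With 16 in-flight requests, the same 4ms latency supports roughly 16x the request rate of a single blocking thread.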
Yes, I've been talking to the developers about having a separate thread or two to keep cassandra busy, feeding a Disruptor (http://lmax-exchange.github.com/disruptor/) to do the processing work.
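The shape of that design - one thread keeping Cassandra busy while another does the processing - can be sketched with a bounded BlockingQueue standing in for the Disruptor ring buffer; the row keys and counts here are made up for illustration:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class FetchPipeline {
    private static final String POISON = "__END__"; // end-of-stream marker

    public static void main(String[] args) throws Exception {
        // Bounded queue standing in for the Disruptor ring buffer: the fetcher
        // stays ahead of the consumer without unbounded memory growth.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        Thread fetcher = new Thread(() -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    queue.put("row-" + i);   // in reality: a Hector read
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher.start();

        int processed = 0;
        for (String row = queue.take(); !row.equals(POISON); row = queue.take()) {
            processed++;                     // in reality: transform + write back
        }
        fetcher.join();
        System.out.println("processed=" + processed);
    }
}
```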
But this all doesn't change the fact that under this zero-cache workload, cassandra seems to be very CPU expensive for throughput.
thanks James M
On Fri, Dec 21, 2012 at 10:36 AM, James Masson <james.mas...@opigram.com> wrote:

Hi, thanks for the reply.

On 21/12/12 14:36, Yiming Sun wrote:

I have a few questions for you, James:

1. How many nodes are in your Cassandra ring?

2 or 3, depending on environment - it doesn't seem to make much difference to throughput. What is a 30-minute task on a 2-node environment is a 30-minute task on a 3-node environment.

2. What is the replication factor?

1

3. When you say sequentially, what do you mean? What Partitioner do you use?

The data is organised by date - the keys are read sequentially in order, only once. Random partitioner - the data is spread equally across the nodes to avoid hotspots.

4. How many columns per row? How much data per row? Per column?

Varies - described in the schema:

create keyspace mykeyspace
  with placement_strategy = 'SimpleStrategy'
  and strategy_options = {replication_factor : 1}
  and durable_writes = true;

create column family entities
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE'
  and column_metadata = [
    {column_name : '64656c65746564', validation_class : BytesType, index_name : 'deleted_idx', index_type : 0},
    {column_name : '6576656e744964', validation_class : TimeUUIDType, index_name : 'eventId_idx', index_type : 0},
    {column_name : '7061796c6f6164', validation_class : UTF8Type}];

2 columns per row here - about 200MB of data in total.

create column family events
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'TimeUUIDType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE';

1 column per row - about 300MB of data.

create column family intervals
  with column_type = 'Standard'
  and comparator = 'BytesType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'AsciiType'
  and read_repair_chance = 0.0
  and dclocal_read_repair_chance = 0.0
  and gc_grace = 0
  and min_compaction_threshold = 4
  and max_compaction_threshold = 32
  and replicate_on_write = false
  and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
  and caching = 'NONE';

Variable columns per row - about 40MB of data.

5. What client library do you use to access Cassandra? (Hector?) Is your client code single threaded?

Hector - yes, the processing side of the client is single threaded, but it is largely waiting for cassandra responses and has plenty of CPU headroom.

I guess what I'm most interested in is the discrepancy between read and write latency - although I understand the data volume is much larger for reads, even though the request rate is lower.

Network usage on a cassandra box barely gets above 20Mbit, including inter-cluster comms. Averages 5Mbit client<>cassandra.

There is near zero disk I/O, and what little there is is served sub-1ms. Storage is backed by a very fast SAN, but like I said earlier, the dataset just about fits in the Linux disk cache.

2GB VM, 512MB cassandra heap - GCs are nice and quick, no JVM memory problems; used heap oscillates between 280-350MB.

Basically, I'm just puzzled, as cassandra doesn't behave as I would expect. Huge CPU use in cassandra for very little throughput. I'm struggling to find anything that's wrong with the environment; there's no bottleneck that I can see.
thanks

James M

On Fri, Dec 21, 2012 at 7:27 AM, James Masson <james.mas...@opigram.com> wrote:

Hi list-users,

We have an application that has a relatively unusual access pattern in cassandra 1.1.6.

Essentially we read an entire multi-hundred-megabyte column family sequentially (little chance of a cassandra cache hit), perform some operations on the data, and write the data back to another column family in the same keyspace.

We do about 250 writes/sec and 100 reads/sec during this process. Write request latency is about 900 microseconds; read request latency is about 4000 microseconds.

* First Question: Do these numbers make sense? Read-request latency seems a little high to me; cassandra hasn't had a chance to cache this data, but it's likely in the Linux disk cache, given the sizing of the node/data/JVM.

thanks

James M
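As a sanity check on those numbers: a synchronous single-threaded client issues at most one request per round trip, so the latencies quoted above put a hard ceiling on per-thread throughput. A rough back-of-the-envelope calculation, assuming the client blocks for the full request latency:

```java
import java.util.Locale;

public class ThroughputCeiling {
    public static void main(String[] args) {
        double readLatency = 4000e-6;   // 4000 microsecond read latency, from the thread
        double writeLatency = 900e-6;   // 900 microsecond write latency

        // One blocking thread: max requests/sec = 1 / latency.
        System.out.println("max reads/sec per thread: " + Math.round(1 / readLatency));
        System.out.println("max writes/sec per thread: " + Math.round(1 / writeLatency));

        // At the observed 100 reads/sec + 250 writes/sec, one thread spends
        // roughly this fraction of each second just waiting on Cassandra.
        double waiting = 100 * readLatency + 250 * writeLatency;
        System.out.printf(Locale.ROOT, "fraction of each second spent waiting: %.3f%n", waiting);
    }
}
```

So a single blocking thread tops out around 250 reads/sec at these latencies, and the observed workload already keeps it waiting on the network for well over half of every second - consistent with the later observation that a multi-threaded client changes throughput dramatically.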