Enable tracing in cqlsh and see how many sstables are being read to
satisfy the query (are you repeatedly writing to the same partition
[row_time] over time?).

Also watch for whether you're hitting a lot of tombstones (are you deleting
lots of values in the same partition over time?).
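For example, something like this in cqlsh should do it (reusing the table and
query from the original post below); the trace output shows how many sstables
were merged and how many live and tombstone cells were read:

    TRACING ON;
    SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
    TRACING OFF;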

On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith <david92galbra...@gmail.com>
wrote:

> Duncan: I'm thinking it might be something like that. I'm also seeing just
> a ton of garbage collection on the box. Could it be pulling rows for all
> 100k attrs for a given row_time into memory, since only row_time is the
> partition key?
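> (A minimal sketch of the alternative that question implies, assuming "attrs"
> can be promoted into the partition key so each series gets its own
> partition; this is not the schema from this thread:
>
>     CREATE TABLE "default".metrics (row_time int, attrs varchar, offset int,
>         value double, PRIMARY KEY ((row_time, attrs), offset));
>
> With that key, the SELECT from the original post reads one bounded
> partition per series instead of slicing one huge row_time partition.)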
>
> Jens: I'm not using EBS (although I used to, until I read up on how useless
> it is). I'm not sure what constitutes proper paging, but my client has a
> pretty small amount of available memory, so I'm doing pages of 5,000 rows
> using the C++ DataStax driver.
>
> Thanks for the replies!
>
> -Dave
>
> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <jens.ran...@tink.se> wrote:
>
>> Also, two control questions:
>>
>>    - Are you using EBS for data storage? It might introduce additional
>>    latencies.
>>    - Are you doing proper paging when querying the keyspace?
>>
>> Cheers,
>> Jens
>>
>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith <
>> david92galbra...@gmail.com> wrote:
>>
>>> Hi! So I've got a table like this:
>>>
>>> CREATE TABLE "default".metrics (
>>>     row_time int,
>>>     attrs varchar,
>>>     offset int,
>>>     value double,
>>>     PRIMARY KEY (row_time, attrs, offset)
>>> ) WITH COMPACT STORAGE
>>>   AND bloom_filter_fp_chance = 0.01
>>>   AND caching = 'KEYS_ONLY'
>>>   AND comment = ''
>>>   AND dclocal_read_repair_chance = 0
>>>   AND gc_grace_seconds = 864000
>>>   AND index_interval = 128
>>>   AND read_repair_chance = 1
>>>   AND replicate_on_write = 'true'
>>>   AND populate_io_cache_on_flush = 'false'
>>>   AND default_time_to_live = 0
>>>   AND speculative_retry = 'NONE'
>>>   AND memtable_flush_period_in_ms = 0
>>>   AND compaction = {'class': 'DateTieredCompactionStrategy', 'timestamp_resolution': 'MILLISECONDS'}
>>>   AND compression = {'sstable_compression': 'LZ4Compressor'};
>>>
>>> and I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4
>>> GB of heap space. It's time series data: I increment "row_time" each day,
>>> "attrs" is additional identifying information about each series, and
>>> "offset" is the number of milliseconds into the day for each data point.
>>> For the past 5 days I've been inserting 3k points/second, distributed
>>> across 100k distinct "attrs"es. Now when I try to run queries on this
>>> data that look like
>>>
>>> "SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs =
>>> 'potatoes_and_jam'"
>>>
>>> it takes an absurdly long time and sometimes just times out. I ran
>>> "nodetool cfstats default" and here's what I get:
>>>
>>> Keyspace: default
>>>     Read Count: 59
>>>     Read Latency: 397.12523728813557 ms.
>>>     Write Count: 155128
>>>     Write Latency: 0.3675690719921613 ms.
>>>     Pending Flushes: 0
>>>         Table: metrics
>>>         SSTable count: 26
>>>         Space used (live): 35146349027
>>>         Space used (total): 35146349027
>>>         Space used by snapshots (total): 0
>>>         SSTable Compression Ratio: 0.10386468749216264
>>>         Memtable cell count: 141800
>>>         Memtable data size: 31071290
>>>         Memtable switch count: 41
>>>         Local read count: 59
>>>         Local read latency: 397.126 ms
>>>         Local write count: 155128
>>>         Local write latency: 0.368 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2856
>>>         Compacted partition minimum bytes: 104
>>>         Compacted partition maximum bytes: 36904729268
>>>         Compacted partition mean bytes: 986530969
>>>         Average live cells per slice (last five minutes):
>>> 501.66101694915255
>>>         Maximum live cells per slice (last five minutes): 502.0
>>>         Average tombstones per slice (last five minutes): 0.0
>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>
>>> Ouch! 400ms of read latency, orders of magnitude higher than it has any
>>> right to be. How could this have happened? Is there something fundamentally
>>> broken about my data model? Thanks!
>>>
>>>
>>
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.ran...@tink.se
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
>>
>>
>
>
