Aaron,
Thanks for your email. The test roughly resembles how the actual application
will behave.
It is going to be a simple key-value store with 500 million keys per node.
The traffic will be read-heavy in steady state, and some keys will get a lot
more traffic than others. The expected hot rows are estimated to be anywhere
between 500,000 and 1 million keys.

I have already populated this test system with 500 million keys and compacted
it all into 1 file to check the size of the bloom filter and the index.

This is how I am estimating memory for 500 million keys. Please correct me
if I am wrong or if I am missing a step.

bloom filter: 1 gig
index samples: The index file is 8.5 gig, and I believe it covers all keys.
The index interval is 128, so in RAM this would be (8.5 gig / 128) * 10
(factor for data structure overhead) = 664 MB (let's say 1 gig)

key cache size (3 million): 3 gigs
memtable_total_space_mb : 2 gigs

This totals 7 gigs.
My heap size is 8 gigs.
Is there anything else that I am missing here?
When I run top right now, it shows java at 96% memory. That's a concern
because there is no write load. Should I be looking at any other number
here?
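
The heap estimate above can be sanity-checked with a few lines of arithmetic. This is only a sketch of the numbers quoted in this message: it assumes decimal units (1 gig = 1000 MB), reuses the guessed 10x overhead factor, and takes the rounded per-component figures as listed rather than as measurements. The function and variable names are made up for illustration.

```python
# Rough heap budget from the figures quoted in this message.
# Assumptions: decimal units (1 gig = 1000 MB) and a guessed 10x
# overhead factor for the in-memory index sample structures.

TOTAL_KEYS = 500_000_000
INDEX_INTERVAL = 128
INDEX_FILE_GB = 8.5
OVERHEAD_FACTOR = 10
HEAP_GB = 8


def index_sample_count(total_keys: int, interval: int) -> int:
    """Number of index entries sampled into memory (one per interval)."""
    return total_keys // interval


def index_sample_mb(index_file_gb: float, interval: int, overhead: int) -> float:
    """Estimated RAM for the index samples, in MB."""
    return index_file_gb * 1000 / interval * overhead


# Per-component estimates in gigs, rounded up as in the message.
budget_gb = {
    "bloom filter": 1,
    "index samples": 1,  # about 664 MB, rounded up
    "key cache": 3,
    "memtables": 2,
}

samples = index_sample_count(TOTAL_KEYS, INDEX_INTERVAL)
sample_mb = index_sample_mb(INDEX_FILE_GB, INDEX_INTERVAL, OVERHEAD_FACTOR)
total_gb = sum(budget_gb.values())
headroom_gb = HEAP_GB - total_gb

print(f"index samples: {samples:,} entries, about {sample_mb:.0f} MB")
print(f"estimated heap use: {total_gb} of {HEAP_GB} gigs, headroom {headroom_gb} gig")
```

On these figures the listed components leave roughly 1 gig of the 8 gig heap unaccounted for, so anything missing from the list has very little room.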

Off-heap row cache: 500,000 - 750,000 rows, roughly between 3 and 5 gigs
(avg row size = 250-500 bytes)
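
As a cross-check on the row cache figure, the raw payload can be computed from the row count and average row size. This is a sketch under stated assumptions: it counts row bytes only, uses decimal MB, and does not model any per-entry overhead of the off-heap cache, which is unknown here. The helper name is hypothetical.

```python
def row_payload_mb(rows: int, avg_row_bytes: int) -> float:
    """Raw row bytes held in the cache, in MB (no per-entry overhead)."""
    return rows * avg_row_bytes / 1_000_000


# Low and high ends of the ranges quoted above.
low_mb = row_payload_mb(500_000, 250)
high_mb = row_payload_mb(750_000, 500)

print(f"raw row payload: {low_mb:.0f} MB to {high_mb:.0f} MB")
```

The raw payload comes out at 125 MB to 375 MB, so a 3 to 5 gig figure would have to come mostly from overhead or from a larger row count than the one used here.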

My test system has 16 gigs of RAM. The production system will most likely
have 32 gigs of RAM and 12 spindles instead of the 6 that I am testing with.

I changed the underlying filesystem from xfs to ext2, and I am seeing
better results, though not the best.
The cfstats latency is down to 20 ms for a 35 qps read load. Row cache hit
rate is 0.21, key cache hit rate is 0.75.
Measuring from the client side, I am seeing roughly 10-15 ms per key. I
would like it even lower, so any tips would greatly help.
In production, I am hoping the row cache hit rate will be higher.
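
To put the latency figures in context, the point made earlier in this thread about concurrent read threads gives a rough throughput ceiling: read threads divided by per-request latency. A minimal sketch, assuming the default of 32 read threads mentioned in the quoted reply and that every request takes the cfstats latency; the function name is made up for illustration.

```python
def max_reads_per_second(read_threads: int, latency_ms: float) -> float:
    """Rough ceiling on reads/s before requests start to queue."""
    return read_threads * 1000 / latency_ms


READ_THREADS = 32   # default quoted earlier in the thread
OFFERED_QPS = 35    # read load used in this test

current_ceiling = max_reads_per_second(READ_THREADS, 20)  # cfstats now
earlier_ceiling = max_reads_per_second(READ_THREADS, 80)  # cfstats before

print(f"ceiling at 20 ms: {current_ceiling:.0f} reads/s")
print(f"ceiling at 80 ms: {earlier_ceiling:.0f} reads/s")
print(f"offered load uses {OFFERED_QPS / current_ceiling:.1%} of the ceiling")
```

If these assumptions hold, 35 qps is far below either ceiling, which suggests the measured latency is mostly service time rather than time spent waiting for a read thread.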


The biggest thing affecting my system right now is the "Invalid
frame size of 0" error that the Cassandra server keeps printing. It's
causing read timeouts every minute or two. I haven't been able to
figure out a way to fix this one. I see someone else also reported seeing
this, but I am not sure whether the problem is in Hector, Cassandra or Thrift.

Thanks
Gurpreet






On Wed, May 30, 2012 at 4:38 PM, aaron morton <aa...@thelastpickle.com> wrote:

> 80 ms per request
>
> sounds high.
>
> I'm guessing here, but my guess is that memory usage is the problem.
>
> * I assume you are no longer seeing excessive GC activity.
> * The key cache will not get used when you hit the row cache. I would
> disable the row cache if you have a random workload, which it looks like
> you do.
> * 500 million is a lot of keys to have on a single node. At the default
> index sample of every 128 keys it will have about 4 million samples, which
> is probably taking up a lot of memory.
>
> Is this testing a real world scenario or an abstract benchmark? IMHO you
> will get more insight from testing something that resembles your
> application.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/05/2012, at 8:48 PM, Gurpreet Singh wrote:
>
> Hi Aaron,
> Here is the latest on this.
> I switched to a node with 6 disks and am running some read tests, and I am
> seeing something weird.
>
> setup:
> 1 node, cassandra 1.0.9, 8 cpu, 16 gig RAM, six 7200 rpm SATA data disks
> striped at 512 KB, commitlog mirrored.
> 1 keyspace with just 1 column family
> random partitioner
> total number of keys: 500 million (the keys are just longs from 1 to 500
> million)
> avg key size: 8 bytes
> bloom filter size: 1 gig
> total disk usage: 70 gigs, compacted to 1 sstable
> mean compacted row size: 149 bytes
> heap size: 8 gigs
> keycache size: 2 million (takes around 2 gigs in RAM)
> rowcache size: 1 million (off-heap)
> memtable_total_space_mb : 2 gigs
>
> test:
> Trying to do 5 reads per second. Each read is a multigetslice query for
> just 1 key, 2 columns.
>
> observations:
> row cache hit rate: 0.4
> key cache hit rate: 0.0 (this will increase later on as system moves to
> steady state)
> cfstats read latency: 80 ms
>
> iostat (every 5 seconds):
>
> r/s : 400
> %util: 20%  (all disks are at equal utilization)
> await: 65-70 ms (for each disk)
> svctm : 2.11 ms (for each disk)
> rkB/s: 35000
>
> This is weird because:
> 5 reads per second is causing a latency of 80 ms per request (according to
> cfstats). Isn't this too high?
> 35 MB/s is being read from the disk, which is again very weird. This number
> is way too high when the avg row size is just 149 bytes. Even index reads
> should not cause this much data to be read from the disk.
>
> My understanding is that each read request translates to 2 disk accesses
> (because there is only 1 sstable): 1 for the index, 1 for the data. At such
> a low read rate, why is the latency so high?
>
> would appreciate help debugging this issue.
> Thanks
> Gurpreet
>
>
> On Tue, May 22, 2012 at 2:46 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> With
>>
>> heap size = 4 gigs
>>
>> I would check for GC activity in the logs and consider setting it to 8
>> given you have 16 GB. You can also check if the IO system is saturated
>> (http://spyced.blogspot.co.nz/2010/01/linux-performance-basics.html). Also
>> take a look at nodetool cfhistograms to see how many sstables are
>> involved.
>>
>>
>> I would start by looking at the latency reported on the server, then work
>> back to the client.
>>
>> I may have missed it in the email, but what is the recent latency for the CF
>> reported by nodetool cfstats? That's the latency for a single request on a
>> single read thread. The default settings give you 32 read threads.
>>
>> If you know the latency for a single request, and you know you have 32
>> concurrent read threads, you can get an idea of the max throughput for a
>> single node. Once you get above that throughput the latency for a request
>> will start to include wait time.
>>
>> It's a bit more complicated, because when you request 40 rows that turns
>> into 40 read tasks. So if two clients send a request for 40 rows at the
>> same time there will be 80 read tasks to be processed by 32 threads.
>>
>> Hope that helps.
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 20/05/2012, at 4:10 PM, Radim Kolar wrote:
>>
>> On 19.5.2012 0:09, Gurpreet Singh wrote:
>>
>> Thanks Radim.
>>
>> Radim, actually 100 reads per second is achievable even with 2 disks.
>>
>> it will become worse as rows will get fragmented.
>>
>> But achieving them with a really low avg latency per key is the issue.
>>
>>
>> I am wondering if anyone has played with index_interval, and how much of
>> a difference reducing the index_interval would make to reads.
>>
>> Close to zero, but try it yourself too and post your findings.
>>
>>
>>
>
>
