RE: What is the effect of reducing the thrift message sizes on GC

2013-06-18 Thread Viktor Jevdokimov
Our experience shows that write load (memtables) impacts ParNew GC the most: more 
writes mean more frequent ParNew GCs. The duration of a ParNew GC depends on how 
many writes were made during the cycle between ParNew GCs and on the size of the 
young generation (HEAP_NEWSIZE in cassandra-env.sh).

Basically, a ParNew GC takes longer when more objects have to be copied from 
young to old space. Reads and compactions create short-lived objects that are 
not promoted to old space, so increased reads and compactions under the same 
write load will increase GC frequency but decrease GC pause time.
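
To make the reasoning above concrete, here is a toy model in Python. It is only a
sketch under loud assumptions: the function name, the allocation rates, and the
50% survival fraction for write-path objects are all made up for illustration,
and a real JVM is far less linear than this.

```python
def parnew_cycle(young_gen_mb, write_mb_per_s, read_mb_per_s, write_survival=0.5):
    """Toy model: return (seconds between ParNew GCs, MB copied per GC)."""
    # The young generation fills at the combined allocation rate.
    seconds_between_gcs = young_gen_mb / (write_mb_per_s + read_mb_per_s)
    # Assumption: only write-path (memtable) objects survive a collection;
    # read and compaction garbage is assumed to be dead by then.
    copied_mb = write_mb_per_s * seconds_between_gcs * write_survival
    return seconds_between_gcs, copied_mb


# Same write load, with and without additional read allocation (hypothetical rates).
writes_only = parnew_cycle(800, write_mb_per_s=40, read_mb_per_s=0)
with_reads = parnew_cycle(800, write_mb_per_s=40, read_mb_per_s=40)
print("writes only:", writes_only)
print("with reads: ", with_reads)
```

In this model, adding read allocation shortens the interval between collections
and also shrinks the amount copied per collection, which matches the
"more frequent but shorter" behaviour described above.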

Best regards / Pagarbiai
Viktor Jevdokimov

From: Ananth Gundabattula [mailto:agundabatt...@threatmetrix.com]
Sent: Tuesday, June 18, 2013 10:31 AM
To: user@cassandra.apache.org
Subject: What is the effect of reducing the thrift message sizes on GC

We are currently running on 1.1.10 and planning to migrate to a higher
version, 1.2.4.

The question pertains to tweaking all the knobs to reduce GC-related issues
(we have been fighting a lot of really bad GC issues on 1.1.10 and have met
with little success).

Given that GC tuning is a black art, I was wondering if we
can have some good effect on the GC by tweaking the following settings:
thrift_framed_transport_size_in_mb & thrift_max_message_length_in_mb

Our tables are very narrow (both in number of columns and in data sizes)
but have millions/billions of rows in each column family. The typical
number of columns in each column family is 4. The typical lookup involves
specifying the row key and fetching one column most of the time. The
writes are similar, except for one keyspace where the number of columns
is 50, but with very small data sizes per column.

Assuming we can tweak the config values

  thrift_framed_transport_size_in_mb
  thrift_max_message_length_in_mb

to lower values in the above context, I was wondering whether the GC would
be invoked less often if the thrift settings reflected our data model's
reads and writes.

For example: what is the impact on the GC of reducing the above config
values to, say, 1 MB rather than 15 or 16?

Thanks a lot for your inputs and thoughts.


Regards,
Ananth

Re: What is the effect of reducing the thrift message sizes on GC

2013-06-18 Thread Ananth Gundabattula
Thanks Aaron for the insight.

One quick question:

>The buffers are not pre-allocated, but once they are allocated they are
>not returned. So it's only an issue if you have lots of clients connecting
>and reading a lot of data.
So, to understand you correctly: the buffer is allocated per client
connection, stays allocated for the lifetime of the JVM, and is reused for
each request?
If that is the case, then I presume there is not much to gain by playing
around with this config with respect to optimizing for GC.

>reduce bloom filters, index intervals ...
Well, we have tried all the configs as advised below (and others like key
cache sizes etc.) and hit a dead end, which is the reason for the 1.2.4
move. Thanks for all your thoughts and advice on this.


Regards,
Ananth 






Re: What is the effect of reducing the thrift message sizes on GC

2013-06-18 Thread aaron morton
> *thrift_framed_transport_size_in_mb & thrift_max_message_length_in_mb*
These control the max size of a buffer allocated by thrift when processing 
requests / responses. The buffers are not pre-allocated, but once they are 
allocated they are not returned. So it's only an issue if you have lots of 
clients connecting and reading a lot of data. 
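
As a rough illustration of that behaviour, the Python sketch below estimates the
memory held by such buffers. It assumes one buffer per client connection that
grows to the largest message seen on that connection, capped by the configured
limit; the function name and the example numbers are hypothetical, not taken
from the Cassandra or Thrift code.

```python
def retained_buffer_mb(largest_message_mb_per_conn, frame_limit_mb):
    """Estimate MB held by thrift buffers that grow on demand and are never shrunk."""
    # Assumption: each connection keeps a single buffer sized to its largest
    # message so far, and no message may exceed the configured limit.
    return sum(min(m, frame_limit_mb) for m in largest_message_mb_per_conn)


# 1000 hypothetical connections whose largest message was about 10 KB each.
small_messages = [0.01] * 1000
at_15_mb = retained_buffer_mb(small_messages, frame_limit_mb=15)
at_1_mb = retained_buffer_mb(small_messages, frame_limit_mb=1)
print(at_15_mb, at_1_mb)

# Worst case: every connection has at some point handled a message at the limit.
worst_case = retained_buffer_mb([15] * 1000, frame_limit_mb=15)
print(worst_case)
```

Under these assumptions, lowering the limit changes nothing when messages are
already far below it; it only matters when many connections carry large
messages.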

> Our system is a very short column (both in number of columns and data sizes
> ) tables but having millions/billions of rows in each column family.
If you have over 500 million rows per node you may be running into issues with 
the bloom filters and index samples. 

This typically shows up as heap usage that does not drop after a CMS 
collection has completed. 

Ensure bloom_filter_fp_chance on the CFs is set to 0.01 for size-tiered 
compaction and 0.1 for levelled compaction. If you need to change it, run 
nodetool upgradesstables
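
To get a feel for why the false positive chance matters at this scale, the
Python sketch below uses the textbook formula for an optimally sized Bloom
filter. This is an assumption for illustration: Cassandra's own sizing may
differ in detail, and the function name is made up.

```python
import math


def bloom_filter_bytes(n_keys, fp_chance):
    """Textbook size in bytes of an optimal Bloom filter for n_keys at fp_chance."""
    # Optimal bits per key: -ln(p) / (ln 2)^2
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return n_keys * bits_per_key / 8


# 500 million rows per node, the figure mentioned above.
for fp in (0.01, 0.1):
    mib = bloom_filter_bytes(500_000_000, fp) / 1024 ** 2
    print(f"fp_chance={fp}: roughly {mib:.0f} MiB")
```

With this formula, going from 0.01 to 0.1 halves the filter size, and at 500
million rows either setting is in the hundreds of MiB, which on 1.1 sits on the
heap.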

Then consider increasing the index_interval in cassandra.yaml; see the comments there. 
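
A quick way to see the effect of index_interval is to count the index samples
it implies. The Python sketch below assumes one sample is kept in memory for
every index_interval rows and uses 128 as the default; sampling really happens
per SSTable, so treat the result as an approximation, and the function name as
hypothetical.

```python
def index_sample_count(rows_per_node, index_interval=128):
    """Approximate number of in-memory index samples for a node."""
    # Assumption: one sample per index_interval rows, ignoring per-SSTable rounding.
    return rows_per_node // index_interval


for interval in (128, 256, 512):
    print(interval, index_sample_count(500_000_000, interval))
```

Each doubling of the interval halves the number of samples held on the heap, at
the cost of scanning more index entries per lookup.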

Note that v 1.2 moves the bloom filters off heap, so if you upgrade to 1.2 it 
will probably resolve your issues. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com
