Re: High CPU usage during repair
>> What machine size?
> m1.large
If you are seeing high CPU, move to an m1.xlarge; that's the sweet spot.

>> That's normally ok. How many are waiting?
> I have seen 4 this morning
That's not really abnormal. The pending task count goes up when a file *may* be eligible for compaction, not when there is a compaction task waiting. If you suddenly create a number of new SSTables for a CF the pending count will rise; however, one of the tasks may compact all the SSTables waiting for compaction, so the count can suddenly drop as well.

> Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether a repair is ongoing or not? I will do it using nodetool, and also change the yaml file in case a restart occurs in the future?
Yes. If you are seeing performance degrade during compaction or repair, try reducing the throughput. I would attribute most of the problems you have described to using m1.large.

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
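Concretely, the two-part change discussed above (apply it live with nodetool, persist it in the config so a restart does not revert it) might look like the sketch below. The host and the config path are assumptions; adjust for your install:

    # Lower the cap on the running node; takes effect immediately, no restart needed.
    nodetool -h 127.0.0.1 setcompactionthroughput 12

    # Persist the change so a restart does not revert it to the default of 16.
    # In /etc/cassandra/cassandra.yaml (path is an assumption), set:
    #   compaction_throughput_mb_per_sec: 12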
Re: High CPU usage during repair
Thank you very much! Due to monetary limitations I will keep the m1.large for now, but I will try the throughput modification.

Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
Re: High CPU usage during repair
> During repair I see high CPU consumption,
Repair reads the data and computes a hash of it; this is a CPU intensive operation. Is the CPU overloaded, or is it just under load?

> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances.
What machine size?

> there are compactions waiting.
That's normally ok. How many are waiting?

> I thought of adding a call to my repair script, before repair starts, to do: nodetool setcompactionthroughput 0 and then when repair finishes call nodetool setcompactionthroughput 16
That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this. Try reducing the compaction throughput to, say, 12 normally and see the effect.

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 1:01 AM, Tamar Fraenkel ta...@tok-media.com wrote:

> Hi!
> I run repair weekly, using a scheduled cron job. During repair I see high CPU consumption, and messages in the log file:
>
>   INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264
>
> From time to time, there are also messages of the form:
>
>   INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 607) 1 READ messages dropped in last 5000ms
>
> Using OpsCenter, JMX and nodetool compactionstats I can see that while the CPU consumption is high, there are compactions waiting.
> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances. I have the default settings:
>
>   compaction_throughput_mb_per_sec: 16
>   in_memory_compaction_limit_in_mb: 64
>   multithreaded_compaction: false
>   compaction_preheat_key_cache: true
>
> I am thinking of the following solution, and wanted to ask if I am on the right track. I thought of adding a call to my repair script, before repair starts, to do:
>
>   nodetool setcompactionthroughput 0
>
> and then, when repair finishes, call:
>
>   nodetool setcompactionthroughput 16
>
> Is this the right solution?
>
> Thanks,
> Tamar
>
> Tamar Fraenkel
> Senior Software Engineer, TOK Media
>
> ta...@tok-media.com
> Tel: +972 2 6409736
> Mob: +972 54 8356490
> Fax: +972 2 5612956
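As a sketch only, the script Tamar describes would look roughly like this; the host and the weekly scheduling are assumptions, and per the reply above, the unthrottled "0" step is precisely the part to avoid:

    #!/bin/sh
    # Hypothetical weekly repair wrapper, run from cron.
    # Setting throughput to 0 disables throttling entirely, which adds
    # IO, CPU and GC pressure during repair; shown only to make the proposal concrete.
    nodetool -h 127.0.0.1 setcompactionthroughput 0    # unthrottle (not recommended)
    nodetool -h 127.0.0.1 repair
    nodetool -h 127.0.0.1 setcompactionthroughput 16   # restore the default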
Re: High CPU usage during repair
Hi!
Thanks for the response. See my answers and questions below.

Thanks!
Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

On Sun, Feb 10, 2013 at 10:04 PM, aaron morton aa...@thelastpickle.com wrote:

>> During repair I see high CPU consumption,
> Repair reads the data and computes a hash of it; this is a CPU intensive operation. Is the CPU overloaded, or is it just under load?
Usually just load, but in the past two weeks I have seen CPU of over 90%!

>> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances.
> What machine size?
m1.large

>> there are compactions waiting.
> That's normally ok. How many are waiting?
I have seen 4 this morning.

>> I thought of adding a call to my repair script, before repair starts, to do: nodetool setcompactionthroughput 0 and then when repair finishes call nodetool setcompactionthroughput 16
> That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this. Try reducing the compaction throughput to, say, 12 normally and see the effect.
Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether a repair is ongoing or not? I will do it using nodetool, and also change the yaml file in case a restart occurs in the future?
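For putting a number on "compactions waiting" without OpsCenter, the same counter is visible from the command line. A rough sketch, assuming a node on localhost and that the 1.0.x compactionstats output includes a "pending tasks" line (both assumptions):

    # Sample the compaction backlog every 30 seconds during a repair.
    while true; do
        nodetool -h 127.0.0.1 compactionstats | grep -i "pending"
        sleep 30
    done

A count that briefly spikes and drains matches the behaviour Aaron describes; one that climbs and stays up suggests compaction is genuinely falling behind.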