Re: High CPU usage during repair
>> What machine size?
> m1.large
If you are seeing high CPU, move to an m1.xlarge; that's the sweet spot.

>> That's normally ok. How many are waiting?
> I have seen 4 this morning
That's not really abnormal. The pending task count goes up when a file *may* be eligible for compaction, not when there is a compaction task waiting. If you suddenly create a number of new SSTables for a CF the pending count will rise; however, one of the tasks may compact all the SSTables waiting for compaction, so the count can suddenly drop as well.

> Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether a repair is ongoing or not? I will do it using nodetool, and also change the yaml file in case a restart occurs in the future?
Yes. If you are seeing performance degrade during compaction or repair, try reducing the throughput. I would attribute most of the problems you have described to using m1.large.

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
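Concretely, the two-part change discussed above (apply it live with nodetool, persist it in the config so a restart does not revert it) might look like the sketch below. The host and the config path are assumptions; adjust for your install:

    # Lower the cap on the running node; takes effect immediately, no restart needed.
    nodetool -h 127.0.0.1 setcompactionthroughput 12

    # Persist the change so a restart does not revert it to the default of 16.
    # In /etc/cassandra/cassandra.yaml (path is an assumption), set:
    #   compaction_throughput_mb_per_sec: 12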
Re: High CPU usage during repair
Thank you very much! Due to monetary limitations I will keep the m1.large for now, but I will try the throughput modification.

Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
Re: High CPU usage during repair
> During repair I see high CPU consumption,
Repair reads the data and computes a hash of it; this is a CPU intensive operation. Is the CPU overloaded, or is it just under load?

> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances.
What machine size?

> there are compactions waiting.
That's normally ok. How many are waiting?

> I thought of adding a call to my repair script, before repair starts, to do: nodetool setcompactionthroughput 0 and then when repair finishes call nodetool setcompactionthroughput 16
That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this. Try reducing the compaction throughput to, say, 12 normally and see the effect.

Cheers

-----
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/02/2013, at 1:01 AM, Tamar Fraenkel ta...@tok-media.com wrote:

> Hi!
> I run repair weekly, using a scheduled cron job. During repair I see high CPU consumption, and messages in the log file:
>
>   INFO [ScheduledTasks:1] 2013-02-10 11:48:06,396 GCInspector.java (line 122) GC for ParNew: 208 ms for 1 collections, 1704786200 used; max is 3894411264
>
> From time to time, there are also messages of the form:
>
>   INFO [ScheduledTasks:1] 2012-12-04 13:34:52,406 MessagingService.java (line 607) 1 READ messages dropped in last 5000ms
>
> Using OpsCenter, JMX and nodetool compactionstats I can see that while the CPU consumption is high, there are compactions waiting.
> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances. I have the default settings:
>
>   compaction_throughput_mb_per_sec: 16
>   in_memory_compaction_limit_in_mb: 64
>   multithreaded_compaction: false
>   compaction_preheat_key_cache: true
>
> I am thinking of the following solution, and wanted to ask if I am on the right track. I thought of adding a call to my repair script, before repair starts, to do:
>
>   nodetool setcompactionthroughput 0
>
> and then, when repair finishes, call:
>
>   nodetool setcompactionthroughput 16
>
> Is this the right solution?
>
> Thanks,
> Tamar
>
> Tamar Fraenkel
> Senior Software Engineer, TOK Media
>
> ta...@tok-media.com
> Tel: +972 2 6409736
> Mob: +972 54 8356490
> Fax: +972 2 5612956
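As a sketch only, the script Tamar describes would look roughly like this; the host and the weekly scheduling are assumptions, and per the reply above, the unthrottled "0" step is precisely the part to avoid:

    #!/bin/sh
    # Hypothetical weekly repair wrapper, run from cron.
    # Setting throughput to 0 disables throttling entirely, which adds
    # IO, CPU and GC pressure during repair; shown only to make the proposal concrete.
    nodetool -h 127.0.0.1 setcompactionthroughput 0    # unthrottle (not recommended)
    nodetool -h 127.0.0.1 repair
    nodetool -h 127.0.0.1 setcompactionthroughput 16   # restore the default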
Re: High CPU usage during repair
Hi!
Thanks for the response. See my answers and questions below.

Thanks!
Tamar

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956

On Sun, Feb 10, 2013 at 10:04 PM, aaron morton aa...@thelastpickle.com wrote:

>> During repair I see high CPU consumption,
> Repair reads the data and computes a hash of it; this is a CPU intensive operation. Is the CPU overloaded, or is it just under load?
Usually just load, but in the past two weeks I have seen CPU of over 90%!

>> I run Cassandra version 1.0.11, on a 3 node setup on EC2 instances.
> What machine size?
m1.large

>> there are compactions waiting.
> That's normally ok. How many are waiting?
I have seen 4 this morning.

>> I thought of adding a call to my repair script, before repair starts, to do: nodetool setcompactionthroughput 0 and then when repair finishes call nodetool setcompactionthroughput 16
> That will remove throttling on compaction and on the validation compaction used for the repair, which may in turn add additional IO load, CPU load and GC pressure. You probably do not want to do this. Try reducing the compaction throughput to, say, 12 normally and see the effect.
Just to make sure I understand you correctly: you suggest that I change the throughput to 12 regardless of whether a repair is ongoing or not? I will do it using nodetool, and also change the yaml file in case a restart occurs in the future?
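For putting a number on "compactions waiting" without OpsCenter, the same counter is visible from the command line. A rough sketch, assuming a node on localhost and that the 1.0.x compactionstats output includes a "pending tasks" line (both assumptions):

    # Sample the compaction backlog every 30 seconds during a repair.
    while true; do
        nodetool -h 127.0.0.1 compactionstats | grep -i "pending"
        sleep 30
    done

A count that briefly spikes and drains matches the behaviour Aaron describes; one that climbs and stays up suggests compaction is genuinely falling behind.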