Re: IO scheduler for SSDs on EC2?
Hi Ali,

The best practice is to use the noop scheduler when an array of SSDs sits behind your block device (hardware RAID controller). If you are using a single SSD, the deadline scheduler is the best choice to reduce IO latency. Setting cfq on SSDs is not recommended.

Regards,

Roni Balthazar

On 15 March 2015 at 09:03, Ali Akhtar ali.rac...@gmail.com wrote:
I was watching a talk recently on Elasticsearch performance in EC2, and they recommended setting the IO scheduler to noop for SSDs. Is that the case for Cassandra as well, or is it recommended to keep the default 'deadline' scheduler for Cassandra? Thanks.
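For reference, a minimal sketch of checking and switching the scheduler at runtime (sda is a placeholder for your device; the change does not survive a reboot unless you also set it via the elevator= kernel parameter or a udev rule):

    # show the available schedulers; the active one is in brackets
    cat /sys/block/sda/queue/scheduler
    # switch to noop (deadline works the same way)
    echo noop > /sys/block/sda/queue/scheduler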
Downgrade Cassandra from 2.1.x to 2.0.x
Hi there,

What is the best way to downgrade a C* 2.1.3 cluster to the stable 2.0.12? I know it's not supported, but we are running into too many issues with the 2.1.x series... They are leading us to think that the best solution is to fall back to the stable version. Is there a safe way to do that?

Cheers,

Roni
OOM and high SSTables count
$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_31]
    at org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) ~[apache-cassandra-2.1.3.jar:2.1.3]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.1.3.jar:2.1.3]

So I am asking how to debug this issue, and what are the best practices in this situation?

Regards,

Roni
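One starting point, in line with the advice elsewhere in this archive, is to check whether the per-table SSTable count is exploding; a small sketch (keyspace and table names are placeholders):

    # a count in the thousands under size-tiered compaction usually
    # means compaction is not keeping up
    nodetool cfstats mykeyspace.mytable | grep "SSTable count"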
Re: Possible problem with disk latency
Hi Ja,

How are the pending compactions distributed between the nodes? Run nodetool compactionstats on all of your nodes and check whether the pending tasks are balanced or concentrated in only a few nodes. You can also check whether the SSTable count is balanced by running nodetool cfstats on your nodes.

Cheers,

Roni Balthazar

On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
I do NOT have SSDs. I have normal HDDs grouped by JBOD. My CF uses SizeTieredCompactionStrategy. I am using local quorum for reads and writes. To be precise, I have a lot of writes and almost 0 reads. I changed cold_reads_to_omit to 0.0 as someone suggested. I used setcompactionthroughput 999. So if my disks are idle, my CPU is less than 40%, and I have some free RAM - why is the SSTable count growing? How can I speed up compactions?

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:
> If you could be so kind as to validate the above and give me an answer: are my disks a real problem or not? And give me a tip on what I should do with the above cluster? Maybe I have a misconfiguration?
Your disks are effectively idle. What consistency level are you using for reads and writes? Actually, 'await' is sort of weirdly high for idle SSDs. Check your interrupt mappings (cat /proc/interrupts) and make sure the interrupts are not being stacked on a single CPU.
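To spell out the interrupt check Nate mentions (eth0 is a placeholder; substitute whichever device's IRQs you care about):

    # one column per CPU, one row per IRQ source; look for a single
    # CPU column absorbing nearly all the counts
    cat /proc/interrupts
    # watch one device's counters over time
    watch -n 1 'grep eth0 /proc/interrupts'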
Re: Possible problem with disk latency
Hi Piotr,

Are your repairs finishing without errors?

Regards,

Roni Balthazar

On 25 February 2015 at 15:43, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, They aren't exactly balanced, but as I wrote before, they are in the range 2500-6000. If you need exact data, I will check them tomorrow morning. But all nodes in AGRAF have had a small increase of pending compactions during the last week, which is the wrong direction. I will check the compaction throughput in the morning, but my feeling about this parameter is that it doesn't change anything.
Regards
Piotr

On Wed, Feb 25, 2015 at 7:34 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi Piotr, What about the nodes on AGRAF? Are the pending tasks balanced between this DC's nodes as well? You can check the pending compactions on each node. Also try to run nodetool getcompactionthroughput on all nodes and check if the compaction throughput is set to 999.
Cheers, Roni Balthazar

On 25 February 2015 at 14:47, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, It is not balanced. As I wrote last week, I have problems only in the DC we write to (on the screenshot it is named AGRAF: https://drive.google.com/file/d/0B4N_AbBPGGwLR21CZk9OV1kxVDA/view). The problem is on ALL nodes in this DC. In the second DC (ZETO) only one node has more than 30 SSTables, and pending compactions are decreasing to zero. In AGRAF the minimum pending compactions is 2500 and the maximum is 6000 (the average on the screenshot from OpsCenter is less than 5000).
Regards
Piotrek
p.s. I don't know why my mail client displays my name as Ja Sam instead of Piotr Stapp, but this doesn't change anything :)

On Wed, Feb 25, 2015 at 5:45 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi Ja, How are the pending compactions distributed between the nodes? Run nodetool compactionstats on all of your nodes and check whether the pending tasks are balanced or concentrated in only a few nodes. You can also check whether the SSTable count is balanced by running nodetool cfstats on your nodes.
Cheers, Roni Balthazar

On 25 February 2015 at 13:29, Ja Sam ptrstp...@gmail.com wrote:
I do NOT have SSDs. I have normal HDDs grouped by JBOD. My CF uses SizeTieredCompactionStrategy. I am using local quorum for reads and writes. To be precise, I have a lot of writes and almost 0 reads. I changed cold_reads_to_omit to 0.0 as someone suggested. I used setcompactionthroughput 999. So if my disks are idle, my CPU is less than 40%, and I have some free RAM - why is the SSTable count growing? How can I speed up compactions?

On Wed, Feb 25, 2015 at 5:16 PM, Nate McCall n...@thelastpickle.com wrote:
> If you could be so kind as to validate the above and give me an answer: are my disks a real problem or not? And give me a tip on what I should do with the above cluster? Maybe I have a misconfiguration?
Your disks are effectively idle. What consistency level are you using for reads and writes? Actually, 'await' is sort of weirdly high for idle SSDs. Check your interrupt mappings (cat /proc/interrupts) and make sure the interrupts are not being stacked on a single CPU.
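Since "run nodetool getcompactionthroughput on all nodes" comes up several times in this thread, a sketch of doing it in one pass (the hostnames are placeholders; this assumes passwordless SSH to each node):

    for h in node1 node2 node3; do
        echo -n "$h: "; ssh "$h" nodetool getcompactionthroughput
    done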
Re: Possible problem with disk latency
Hi,

Check how many active CompactionExecutors nodetool tpstats is showing. Maybe your concurrent_compactors is too low. Enforce 1 per CPU core, even though that is the default value on 2.1. Some of our nodes were running with 2 compactors, but we have an 8-core CPU... After that, monitor your nodes to be sure that the value is not too high. You may get too much IO if you increase concurrent compactors when using spinning disks.

Regards,

Roni Balthazar

On 25 February 2015 at 16:37, Ja Sam ptrstp...@gmail.com wrote:
Hi, One more thing. Hinted Handoff for the last week for all nodes was less than 5. For me every READ is a problem, because it must open too many files (3 SSTables), which shows up as errors in reads, repairs, etc.
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:32 PM, Ja Sam ptrstp...@gmail.com wrote:
Hi, It is not obvious, because data is replicated to the second data center. We checked it manually for random records we put into Cassandra, and we found all of them in the secondary DC. We know about every single GC failure, but this doesn't change anything. The problem with a GC failure is only one: restart the node. For the past few days we have not had GC errors anymore. It looks to me like memory leaks. We use Chef. By MANUAL compaction do you mean running nodetool compact? What does it change compared to the permanently running compactions?
Regards
Piotrek

On Wed, Feb 25, 2015 at 8:13 PM, daemeon reiydelle daeme...@gmail.com wrote:
I think you may have a vicious circle of errors: because your data is not properly replicated to the neighbour, it is not replicating to the secondary data center (yeah, obvious). I would suspect the GC errors are (also obviously) the result of a backlog of compactions that take out the neighbour (assuming replication of 3, that means each neighbour is participating in compaction from at least one other node besides the primary you are looking at, and it can of course be much more, depending on e.g. vnode count if used). What happens is that when a node fails due to a GC error (can't reclaim space), it causes a cascade of other errors, as you see. Might I suggest you have someone in devops with monitoring experience install a monitoring tool that will notify you of EVERY SINGLE Java GC failure event? Your DevOps team may have a favorite log shipping/monitoring tool; they could use e.g. Puppet. I think you may have to go through a MANUAL, table by table compaction.
...
"Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming 'Wow! What a Ride!'" - Hunter Thompson
Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Wed, Feb 25, 2015 at 11:01 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi Roni, The repair result is the following (we ran it Friday): "Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed". But to be honest, the neighbor did not die. It seemed to trigger a series of full GC events on the initiating node. The results from the logs are:
[2015-02-20 16:47:54,884] Starting repair command #2, repairing 7 ranges for keyspace prem_maelstrom_2 (parallelism=PARALLEL, full=false)
[2015-02-21 02:21:55,640] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:22:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:23:55,642] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 02:24:55,644] Lost notification. You should check server log for repair status of keyspace prem_maelstrom_2
[2015-02-21 04:41:08,607] Repair session d5d01dd0-b917-11e4-bc97-e9a66e5b2124 for range (85070591730234615865843651857942052874,102084710076281535261119195933814292480] failed with error org.apache.cassandra.exceptions.RepairException: [repair #d5d01dd0-b917-11e4-bc97-e9a66e5b2124 on prem_maelstrom_2/customer_events, (85070591730234615865843651857942052874,102084710076281535261119195933814292480]] Sync failed between /192.168.71.196 and /192.168.61.199
[2015-02-21 04:41:08,608] Repair session eb8d8d10-b967-11e4-bc97-e9a66e5b2124 for range (68056473384187696470568107782069813248,85070591730234615865843651857942052874] failed with error java.io.IOException: Endpoint /192.168.61.199 died
[2015-02-21 04:41:08,608] Repair session c48aef00-b971-11e4-bc97-e9a66e5b2124 for range (0,10] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/192.168.61.201) is dead: session failed
[2015-02-21 04:41:08,609] Repair session c48d38f0-b971-11e4-bc97-e9a66e5b2124 for range (42535295865117307932921825928971026442,68056473384187696470568107782069813248] failed with error java.io.IOException: Cannot proceed on repair because a neighbor
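A sketch of the check and the setting discussed above (the value 8 is just an example for an 8-core box, not a recommendation):

    # active/pending compaction threads on this node
    nodetool tpstats | grep CompactionExecutor

    # cassandra.yaml - set explicitly instead of relying on the default;
    # requires a node restart to take effect
    concurrent_compactors: 8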
Re: Many pending compactions
Try repair -pr on all nodes. If after that you still have issues, you can try to rebuild the SSTables using nodetool upgradesstables or scrub.

Regards,

Roni Balthazar

On 18/02/2015, at 14:13, Ja Sam ptrstp...@gmail.com wrote:
ad 3) I did this already yesterday (setcompactionthroughput also). But SSTables are still increasing.
ad 1) What do you think I should use: -pr, or should I try incremental?

On Wed, Feb 18, 2015 at 4:54 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
You are right... Repair makes the data consistent between nodes. I understand that you have 2 issues going on. You need to run repair periodically without errors, and you need to decrease the number of pending compactions. So I suggest:
1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes.
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0 and increase setcompactionthroughput for some time, and see if the number of SSTables is going down.
Let us know what errors you are getting when running repairs.
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:
Can you explain to me what the correlation is between growing SSTables and repair? I was sure, until your mail, that repair is only to make data consistent between nodes.
Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats.
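For reference, a sketch of the cold_reads_to_omit change being suggested (keyspace and table names are placeholders; on 2.1 the option is part of the STCS compaction map, so any other compaction options you had set must be restated in the same statement):

    cqlsh -e "ALTER TABLE mykeyspace.mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};"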
Re: Many pending compactions
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly, with idle IO. We have a lot of Data files in Cassandra. In DC_A it is about ~12 (counting only xxx-Data.db files); DC_B has only ~4000. I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.
Do you think the above flag will help us?

On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam ptrstp...@gmail.com wrote:
I set setcompactionthroughput 999 permanently and it doesn't change anything. IO is still the same. CPU is idle.

On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
Cheers, Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote
Re: Many pending compactions
You are right... Repair makes the data consistent between nodes. I understand that you have 2 issues going on. You need to run repair periodically without errors, and you need to decrease the number of pending compactions. So I suggest:
1) Run repair -pr on all nodes. If you upgrade to the new 2.1.3, you can use incremental repairs. There were some bugs on 2.1.2.
2) Run cleanup on all nodes.
3) Since you have too many cold SSTables, set cold_reads_to_omit to 0.0 and increase setcompactionthroughput for some time, and see if the number of SSTables is going down.
Let us know what errors you are getting when running repairs.

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 1:31 PM, Ja Sam ptrstp...@gmail.com wrote:
Can you explain to me what the correlation is between growing SSTables and repair? I was sure, until your mail, that repair is only to make data consistent between nodes.
Regards

On Wed, Feb 18, 2015 at 4:20 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar
Re: Many pending compactions
Hi,

You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).

Regards,

Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly, with idle IO. We have a lot of Data files in Cassandra. In DC_A it is about ~12 (counting only xxx-Data.db files); DC_B has only ~4000. I don't know if this changes anything, but:
1) in DC_A the avg size of a Data.db file is ~13 MB. I have a few really big ones, but most are really small (almost 1 files are less than 100 MB).
2) in DC_B the avg size of a Data.db file is much bigger, ~260 MB.
Do you think the above flag will help us?

On Tue, Feb 17, 2015 at 9:04 AM, Ja Sam ptrstp...@gmail.com wrote:
I set setcompactionthroughput 999 permanently and it doesn't change anything. IO is still the same. CPU is idle.

On Tue, Feb 17, 2015 at 1:15 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16
Cheers, Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote:
Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is available from http://cassci.datastax.com/job/cassandra-2.1/
I read about cold_reads_to_omit. It looks promising. Should I also set the compaction throughput?
p.s. I am really sad that I didn't read this before: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

On Monday, February 16, 2015, Carlos Rolo r...@pythian.com wrote:
Hi, 100% in agreement with Roland; the 2.1.x series is a pain! I would never recommend the current 2.1.x series for production. Clocks are a pain, and check your connectivity! Also check tpstats to see if your threadpools are being overrun.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer r.etzenham...@t-online.de wrote:
Hi,
> 1) Current Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al Tobey from DataStax)
> 7) minimal reads (usually none, sometimes few)
Those two points keep me repeating an answer I got. First, where did you get 2.1.3 from? Maybe I missed it; I will have a look. But if it is 2.1.2, which is the latest released version, that version has many bugs - I got kicked by most of them while testing 2.1.2. I got many problems with compactions not being triggered on column families not being read, and compactions and repairs not being completed. See https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A
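Carlos's tpstats suggestion, spelled out (run it on each node; no arguments needed):

    # Pending should trend toward zero; anything nonzero in the
    # Dropped section means the node is shedding work
    nodetool tpstats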
Re: Many pending compactions
Which error are you getting when running repairs? You need to run repair on your nodes within gc_grace_seconds (eg: weekly). They have data that is not read frequently. You can run repair -pr on all nodes. Since you do not have deletes, you will not have trouble with that. If you have deletes, it's better to increase gc_grace_seconds before the repair. http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html

After repair, try to run a nodetool cleanup. Check if the number of SSTables goes down after that... Pending compactions must decrease as well...

Cheers,

Roni Balthazar

On Wed, Feb 18, 2015 at 12:39 PM, Ja Sam ptrstp...@gmail.com wrote:
1) We tried to run repairs, but they usually do not succeed. We had Leveled compaction before; last week we ALTERed the tables to STCS, because the guys from DataStax suggested we should not use Leveled and should alter the tables to STCS, since we don't have SSDs. After this change we did not run any repair. Anyway, I don't think it will change anything in the SSTable count - if I am wrong, please let me know.
2) I did this. My tables are 99% write only. It is an audit system.
3) Yes, I am using default values.
4) In both operations I am using LOCAL_QUORUM. I am almost sure that the READ timeouts happen because of too many SSTables. Anyway, first I would like to fix the many pending compactions. I still don't know how to speed them up.

On Wed, Feb 18, 2015 at 2:49 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Are you running repairs within gc_grace_seconds? (default is 10 days) http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html
Double check if you set cold_reads_to_omit to 0.0 on tables with STCS that you do not read often. Are you using the default values for the properties min_compaction_threshold (4) and max_compaction_threshold (32)? Which Consistency Level are you using for read operations? Check that you are not reading from DC_B due to your Replication Factor and CL. http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
Cheers, Roni Balthazar

On Wed, Feb 18, 2015 at 11:07 AM, Ja Sam ptrstp...@gmail.com wrote:
I don't have problems with DC_B (the replica); only in DC_A (my system writes only to it) do I have read timeouts. I checked the SSTable count in OpsCenter and I have:
1) in DC_A about the same +-10% for the last week, with a small increase over the last 24h (it is more than 15000-2 SSTables depending on the node)
2) in DC_B the last 24h shows up to a 50% decrease, which gives a nice prognosis. Now I have less than 1000 SSTables.
What did you measure during system optimizations? Or do you have an idea what more I should check?
1) I looked at CPU idle (one node is 50% idle, the rest 70% idle)
2) Disk queue - mostly it is near zero: avg 0.09. Sometimes there are spikes.
3) System RAM usage is almost full.
4) In Total Bytes Compacted most lines are below 3MB/s. For DC_A in total it is less than 10MB/s; in DC_B it looks much better (avg is like 17MB/s).
Something else?

On Wed, Feb 18, 2015 at 1:32 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, You can check if the number of SSTables is decreasing. Look for the SSTable count information of your tables using nodetool cfstats. The compaction history can be viewed using nodetool compactionhistory. About the timeouts, check this out: http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure
Also try to run nodetool tpstats to see the thread statistics. It can lead you to know if you are having performance problems. If you are having too many pending tasks or dropped messages, maybe you will need to tune your system (eg: driver's timeout, concurrent reads and so on).
Regards, Roni Balthazar

On Wed, Feb 18, 2015 at 9:51 AM, Ja Sam ptrstp...@gmail.com wrote:
Hi, Thanks for your tip. It looks like something changed - I still don't know if it is OK. My nodes started to do more compaction, but it looks like some compactions are really slow. In IO we have idle, and CPU is quite OK (30%-40%). We set compactionthroughput to 999, but I do not see a difference. Can we check something more? Or do you have any method to monitor progress with small files?
Regards

On Tue, Feb 17, 2015 at 2:43 PM, Roni Balthazar ronibaltha...@gmail.com wrote:
Hi, Yes... I had the same issue, and setting cold_reads_to_omit to 0.0 was the solution... The number of SSTables decreased from many thousands to a number below a hundred, and the SSTables are now much bigger, at several gigabytes (most of them).
Cheers, Roni Balthazar

On Tue, Feb 17, 2015 at 11:32 AM, Ja Sam ptrstp...@gmail.com wrote:
After some diagnostics (we didn't set cold_reads_to_omit yet): compactions are running, but VERY slowly
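A sketch of scheduling the weekly repair -pr suggested above, in the same cron style as the compaction-throughput example elsewhere in this thread (the day and hour are arbitrary; stagger the start time across nodes so they don't all repair at once):

    # every Sunday at 01:00, repair this node's primary ranges only
    0 1 * * 0 root nodetool -h `hostname` repair -pr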
Re: Many pending compactions
Hi,

You can run nodetool compactionstats to view statistics on compactions. Setting cold_reads_to_omit to 0.0 can help to reduce the number of SSTables when you use Size-Tiered compaction. You can also create a cron job to increase the value of setcompactionthroughput during the night or when your IO is not busy. From http://wiki.apache.org/cassandra/NodeTool:
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16

Cheers,

Roni Balthazar

On Mon, Feb 16, 2015 at 7:47 PM, Ja Sam ptrstp...@gmail.com wrote:
One thing I do not understand: in my case compaction is running permanently. Is there a way to check which compaction is pending? The only information is about the total count.

On Monday, February 16, 2015, Ja Sam ptrstp...@gmail.com wrote:
Of course I made a mistake. I am using 2.1.2. Anyway, a nightly build is available from http://cassci.datastax.com/job/cassandra-2.1/
I read about cold_reads_to_omit. It looks promising. Should I also set the compaction throughput?
p.s. I am really sad that I didn't read this before: https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

On Monday, February 16, 2015, Carlos Rolo r...@pythian.com wrote:
Hi, 100% in agreement with Roland; the 2.1.x series is a pain! I would never recommend the current 2.1.x series for production. Clocks are a pain, and check your connectivity! Also check tpstats to see if your threadpools are being overrun.
Regards,
Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com

On Mon, Feb 16, 2015 at 8:12 PM, Roland Etzenhammer r.etzenham...@t-online.de wrote:
Hi,
> 1) Current Cassandra 2.1.3, it was upgraded from 2.1.0 (suggested by Al Tobey from DataStax)
> 7) minimal reads (usually none, sometimes few)
Those two points keep me repeating an answer I got. First, where did you get 2.1.3 from? Maybe I missed it; I will have a look. But if it is 2.1.2, which is the latest released version, that version has many bugs - I got kicked by most of them while testing 2.1.2. I got many problems with compactions not being triggered on column families not being read, and compactions and repairs not being completed. See:
https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1
https://www.mail-archive.com/user%40cassandra.apache.org/msg40768.html
Apart from that, how are those two datacenters connected? Maybe there is a bottleneck. Also, do you have ntp up and running on all nodes to keep all clocks in tight sync?
Note: I'm no expert (yet) - just sharing my 2 cents.
Cheers,
Roland
--
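Roland's clock question is easy to check from the shell (ntpq ships with the ntp daemon he refers to; run it on every node):

    # the * marks the selected time source; offset is in milliseconds
    # and should stay in the low single digits
    ntpq -p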
Re: High read latency after data volume increased
Hi there,

Compaction keeps running under our workload. We are using SATA HDD RAIDs. When trying to run cfhistograms on our user_data table, we are getting this message: "nodetool: Unable to compute when histogram overflowed". Please see what happens when running some queries on this CF: http://pastebin.com/jbAgDzVK

Thanks,

Roni Balthazar

On Fri, Jan 9, 2015 at 12:03 PM, datastax jlacefi...@datastax.com wrote:
Hello, you may not be experiencing versioning issues. Do you know if compaction is keeping up with your workload? The behavior described in the subject is typically associated with compaction falling behind or with a suboptimal compaction strategy being configured. What does the output of nodetool cfhistograms keyspace table look like for a table that is experiencing this issue? Also, what type of disks are you using on the nodes?
Sent from my iPad

On Jan 9, 2015, at 8:55 AM, Brian Tarbox briantar...@gmail.com wrote:
C* seems to have more than its share of "version x doesn't work, use version y" type issues.

On Thu, Jan 8, 2015 at 2:23 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
We are using C* 2.1.2 with 2 DCs. 30 nodes in DC1 and 10 nodes in DC2.
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...
=Rob
--
http://about.me/BrianTarbox
Re: High read latency after data volume increased
Hi Robert,

We downgraded to 2.1.1 but got the very same result. The read latency is still high, but we figured out that it happens only when using a specific keyspace. Please see the graphs below... Trying another keyspace with 600+ reads/sec, we are getting an acceptable ~30ms read latency. Let me know if I need to provide more information.

Thanks,

Roni Balthazar

On Thu, Jan 8, 2015 at 5:23 PM, Robert Coli rc...@eventbrite.com wrote:
On Thu, Jan 8, 2015 at 11:14 AM, Roni Balthazar ronibaltha...@gmail.com wrote:
We are using C* 2.1.2 with 2 DCs. 30 nodes in DC1 and 10 nodes in DC2.
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
2.1.2 in particular is known to have significant issues. You'd be better off running 2.1.1 ...
=Rob
High read latency after data volume increased
Hi there,

We are using C* 2.1.2 with 2 DCs: 30 nodes in DC1 and 10 nodes in DC2. While our data volume is increasing (34 TB now), we are running into some problems:

1) Read latency is around 1000 ms when running 600 reads/sec (DC1, CL.LOCAL_ONE). At the same time the load average is about 20-30 on all DC1 nodes (8-core CPU, 32 GB RAM). C* starts timing out connections. In this scenario OpsCenter has some issues as well: it resets the graph layout and goes back to the default layout on every refresh, and it doesn't return to normal after the load decreases. I only managed to restore OpsCenter's normal behavior by reinstalling it. Just for reference, we are using SATA HDDs on all nodes. Running hdparm to check disk performance under this load, some nodes report very low read rates (under 10 MB/sec) while others are above 100 MB/sec. Under a low load average this rate is above 250 MB/sec.

2) Repair takes at least 4-5 days to complete. The last repair was 20 days ago. Running repair under high load is bringing some nodes down with the exception:
JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space

Any hints?

Regards,

Roni Balthazar
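For anyone reproducing the disk check described above, a sketch of the hdparm timing test (the device name is a placeholder; hdparm needs root, and single runs are noisy, so repeat a few times):

    # sequential read throughput, bypassing the page cache
    hdparm -t /dev/sda
    # add -T to also report cached reads as a baseline
    hdparm -tT /dev/sda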
Re: Operating on large cluster
Hi,

We use Puppet to manage our Cassandra configuration (http://puppetlabs.com). You can use Cluster SSH to send commands to the servers as well. Another good choice is Saltstack.

Regards,

Roni

On Thu, Oct 23, 2014 at 5:18 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:
Hi, I was wondering how you guys handle a large cluster (50+ machines). I mean, sometimes you need to change configuration (cassandra.yaml) or send a command to one, some, or all nodes (cleanup, upgradesstables, setstreamthroughput or whatever). So far we have been using things like custom scripts for repairs or any routine maintenance, and cssh for specific, one-shot actions on the cluster. But I guess this doesn't really scale; we could use pssh instead. For configuration changes we use Capistrano, which might scale properly. So I would like to know: what methods do operators out there use on large clusters? Have some of you built open-sourced cluster management interfaces or scripts that could make things easier while operating large Cassandra clusters?
Alain
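A sketch of the pssh approach mentioned above (the hosts file is a placeholder containing one hostname per line):

    # run a one-shot command on every node in parallel; -t 0 disables
    # the per-host timeout (cleanup can run for a long time), -i prints
    # each host's output inline
    pssh -h cluster_hosts.txt -t 0 -i "nodetool cleanup"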
What will be the steps for adding new nodes
I have a 0.6.4 Cassandra cluster of two nodes in full replica (replication factor 2). I want to add two more nodes and rebalance the cluster (keeping replication factor 2). I want all of them to be seeds. What should the simple steps be:
1. Add <AutoBootstrap>true</AutoBootstrap> to all the nodes, or only the new ones?
2. Add <Seed>[new_node]</Seed> to the config file of the old nodes before adding the new ones?
3. Do the old nodes need to be restarted (if no change is needed in their config file)?
TX,
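For context, this is roughly what the relevant block of storage-conf.xml looked like in the 0.6 line (the hostnames are placeholders; this sketch shows the syntax only, not an answer to the seed question itself):

    <AutoBootstrap>true</AutoBootstrap>
    <Seeds>
        <Seed>10.0.0.1</Seed>
        <Seed>10.0.0.2</Seed>
    </Seeds>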