Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm
Hi Peter, are you using the hsha RPC server type on this node? If you are, then it looks like rpc_max_threads threads will be allocated on startup in 2.0.11, while this wasn't the case before. This can exhaust your heap if the value of rpc_max_threads is too large (e.g. if you use the default).

Ciao, Duncan.

On 29/10/14 01:08, Peter Haggerty wrote:
On a 3 node test cluster we recently upgraded one node from 2.0.10 to 2.0.11. This is a cluster that had been happily running 2.0.10 for weeks, with very little load and very capable hardware. The upgrade was just your typical package upgrade:

$ dpkg -s cassandra | egrep '^Ver|^Main'
Maintainer: Eric Evans eev...@apache.org
Version: 2.0.11

Immediately after starting, the node ran a couple of ParNews and then began executing CMS runs. Within 10 minutes it had become unreachable and was marked as down by the two other nodes in the ring, which are still on 2.0.10. We have jstack output and the server logs, but nothing seems to be jumping out. Has anyone else run into this? What should we be looking for?

Peter
Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm
That definitely appears to be the issue. Thanks for pointing that out!
https://issues.apache.org/jira/browse/CASSANDRA-8116

It looks like 2.0.12 will check for the default and throw an exception (thanks Mike Adamson), and it also includes a bit more text in the config file. But I'm thinking that 2.0.12 should be pushed out sooner rather than later, as anyone using hsha with the default settings will simply have their cluster stop working a few minutes after the upgrade, without any indication of the actual problem.

Peter
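For anyone hitting this before 2.0.12 lands, the workaround is to give rpc_max_threads a finite value before upgrading. A minimal sketch, assuming the Debian package's config location and an illustrative cap of 256 (size it to your actual client concurrency):

$ # after editing cassandra.yaml to uncomment and cap the setting:
$ grep -E '^(rpc_server_type|rpc_max_threads)' /etc/cassandra/cassandra.yaml
rpc_server_type: hsha
rpc_max_threads: 256
$ sudo service cassandra restart

With hsha in 2.0.11 the whole worker pool is allocated up front, so this cap directly bounds what gets allocated at startup.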
RE: OldGen saturation
Thank you Bryan and Mark. I have redesigned my schema in such a way that I only have 50 CFs, I've given 2GB to the heap, and now it's working fine.

From: Mark Reddy [mailto:mark.l.re...@gmail.com]
Sent: Tuesday, 28 October 2014 18:31
To: user@cassandra.apache.org
Subject: Re: OldGen saturation

Hi Adrià,

> We have about 50,000 CFs of varying size

Before I read any further: having 50,000 CFs is something that I would highly discourage. Each column family is allocated 1MB of available memory (CASSANDRA-2252: https://issues.apache.org/jira/browse/CASSANDRA-2252), so having anything over a few hundred on a 1GB heap would be the first thing I would reconsider. Also, 1GB isn't something I'd run a production or load-test Cassandra on. If your test machine has only 4GB, give it half the total memory (2GB); for a production system you would want something more than a 4GB machine.

Here are some JIRAs and mailing list topics on the subject of large quantities of CFs:
https://issues.apache.org/jira/browse/CASSANDRA-7643
https://issues.apache.org/jira/browse/CASSANDRA-6794
https://issues.apache.org/jira/browse/CASSANDRA-7444
http://mail-archives.apache.org/mod_mbox/cassandra-user/201407.mbox/%3C10D771CCF4F243149C928D0CB32BCD78@JackKrupansky14%3E
http://mail-archives.apache.org/mod_mbox/cassandra-user/201408.mbox/%3ccaazu44m87c1yuffz08nzvtkqnww95yaw9bosy_ugu0fswl7...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201408.mbox/%3CCALRai9Ao=mdkrklowrbyajjp+fc4h5tpx-ejgdqxtayqj5u...@mail.gmail.com%3E

Regards,
Mark

On 28 October 2014 17:19, Bryan Talbot bryan.tal...@playnext.com wrote:
On Tue, Oct 28, 2014 at 9:02 AM, Adria Arcarons adria.arcar...@greenpowermonitor.com wrote:
> Hi, we have about 50,000 CFs of varying size. The writing test consists of a continuous flow of inserts. The inserts are done inside BATCH statements in groups of 1,000 to a single CF at a time to make them faster. The problem I'm experiencing is that, eventually, when the script has been running for almost 40 mins, the heap gets saturated. OldGen gets full, and then there is intensive GC activity trying to free OldGen objects, but it can only free very little space in each pass. Then GC saturates the CPU. Here are the graphs obtained with VisualVM that show this behavior. My total heap size is 1GB and the NewGen region is 256MB. The C* node has 4GB RAM and an Intel Xeon E5520 CPU.

Without looking at your VM graphs, I'm going to go out on a limb here and say that your host is woefully underpowered to host fifty thousand column families and batch writes of one thousand statements. A 1GB Java heap size is sometimes acceptable for a unit test or playing around with, but you can't actually expect it to be adequate for a load test, can you? Every CF consumes some permanent heap space for its metadata. Too many CFs are a bad thing. You probably have ten times more CFs than would be recommended as an upper limit.

-Bryan
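For readers making the same change: with the Debian package, heap and new-gen sizes live in cassandra-env.sh. A hedged sketch of the settings Adrià describes (paths and values are illustrative; the file's own comments suggest roughly 100MB of HEAP_NEWSIZE per core):

$ grep -E '^(MAX_HEAP_SIZE|HEAP_NEWSIZE)' /etc/cassandra/cassandra-env.sh
MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"
$ sudo service cassandra restart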
Upgrade to 2.1.1 causes error
Have a Cassandra cluster that has been running fine under 2.1.0. Rebuilt the cluster using the same settings on 2.1.1 and get this error, even with only one node present:

Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

I only have one non-system keyspace. To be sure, I even set system_auth and system_traces to use the same replication factor as my main keyspace, but the error still persists. Tried again with a 2.1.0 node and upgraded it to 2.1.1, and the cluster errors out again. Any ideas or hints? Here is what my keyspaces are set to for RF:

CREATE KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true;
CREATE KEYSPACE testspace WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true;
CREATE KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': '3'} AND durable_writes = true;
CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;
Re: OOM at Bootstrap Time
Some ideas:
1) Turn on DEBUG logging on the joining node to see in detail what is going on with the stream of 1500 files
2) Check the stream ID to see whether it's a new stream or an old one pending

On Wed, Oct 29, 2014 at 2:21 AM, Maxime maxim...@gmail.com wrote:
Doan, thanks for the tip. I just read about it this morning; just waiting for the new version to pop up on the Debian DataStax repo.

Michael, I do believe you are correct about the general running of the cluster, and I've reset everything, so it took me a while to reply. I finally got the SSTables down, as seen in the OpsCenter graphs. I'm stumped, however, because when I bootstrap the new node I still see a very large number of files being streamed (~1500 for some nodes), and the bootstrap process is failing exactly as it did before, in a flurry of "Enqueuing flush of ...". Any ideas? I'm reaching the end of what I know I can do. OpsCenter says around 32 SSTables per CF, but it's still streaming tons of files. :-/

On Mon, Oct 27, 2014 at 1:12 PM, DuyHai Doan doanduy...@gmail.com wrote:
> Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily.

You can try the new DateTiered compaction strategy (https://issues.apache.org/jira/browse/CASSANDRA-6602), released in 2.1.1, if you have a time-series data model, to eliminate tombstones.

On Mon, Oct 27, 2014 at 5:47 PM, Laing, Michael michael.la...@nytimes.com wrote:
Again, from our experience with 2.0.x: revert to the defaults - you are manually setting heap way too high IMHO. On our small nodes we tried LCS - way too much compaction - and switched all CFs to STCS. We do a major rolling compaction on our small nodes weekly during less busy hours - works great. Be sure you have enough disk. We never explicitly delete and only use TTLs or truncation. You can set gc_grace_seconds to 0 in that case, so tombstones are more readily expunged. There are a couple of threads in the list that discuss this... also, normal rolling repair becomes optional, reducing load (still repair if something unusual happens, though...). In your current situation, you need to kickstart compaction - are there any CFs you can truncate, at least temporarily? Then try compacting a small CF, then another, etc. Hopefully you can get enough headroom to add a node. ml

On Sun, Oct 26, 2014 at 6:24 PM, Maxime maxim...@gmail.com wrote:
Hmm, thanks for the reading. I initially followed some (perhaps too old) maintenance scripts, which included weekly 'nodetool compact'. Is there a way for me to undo the damage? Tombstones will be a very important issue for me since the dataset is very much a rolling dataset using TTLs heavily.

On Sun, Oct 26, 2014 at 6:04 PM, DuyHai Doan doanduy...@gmail.com wrote:
> Should doing a major compaction on those nodes lead to a restructuring of the SSTables?

Beware of major compaction on SizeTiered: it will create 2 giant SSTables, and the expired/outdated/tombstoned columns in those big files will never be cleaned, since the SSTables will never get a chance to be compacted again.

Essentially, to reduce the fragmentation of small SSTables, you can stay with SizeTiered compaction and play around with the compaction properties (the thresholds) to make C* group a bigger bunch of files each time it compacts, so that the file count shrinks to a reasonable number.

Since you're using C* 2.1, where anti-compaction has been introduced, I hesitate to advise you to use Leveled compaction as a workaround to reduce the SSTable count. Things are a little bit more complicated because of the incremental repair process (I don't know whether you're using incremental repair or not in production). The dev blog says that Leveled compaction is performed only on repaired SSTables; the un-repaired ones still use SizeTiered. More details here: http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1

Regards

On Sun, Oct 26, 2014 at 9:44 PM, Jonathan Haddad j...@jonhaddad.com wrote:
If the issue is related to I/O, you're going to want to determine if you're saturated. Take a look at `iostat -dmx 1`; you'll see avgqu-sz (queue size) and svctm (service time). The higher those numbers are, the more overwhelmed your disk is.

On Sun, Oct 26, 2014 at 12:01 PM, DuyHai Doan doanduy...@gmail.com wrote:
Hello Maxime. Increasing the flush writers won't help if your disk I/O is not keeping up. I've had a look at the log file; below are some remarks:
1) There are a lot of SSTables on disk for some tables (events, for example, but not only). I've seen that some compactions are taking up to 32 SSTables (which corresponds to the default max value for SizeTiered compaction).
2) There is a secondary index that I found suspicious: loc.loc_id_idx. As its name implies, I have the impression that it's an index on the id of the loc, which would lead to almost a 1-1 relationship between the indexed value and
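To make the threshold suggestion above concrete: both the SizeTiered thresholds and a switch to DateTiered are per-table changes applied with ALTER TABLE. A sketch via cqlsh, with the keyspace name (mykeyspace) hypothetical and the events table taken from the discussion; values are illustrative:

$ # raise min_threshold so each STCS compaction merges a bigger bunch of files:
$ cqlsh -e "ALTER TABLE mykeyspace.events WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 6, 'max_threshold': 32};"
$ # or, for TTL-heavy time-series data on 2.1.1+:
$ cqlsh -e "ALTER TABLE mykeyspace.events WITH compaction = {'class': 'DateTieredCompactionStrategy'};"

Raising min_threshold makes each SizeTiered compaction wait for more similarly-sized SSTables and merge them at once, which is one way to shrink the file count.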
Re: Upgrade to 2.1.1 causes error
On 10/29/2014 02:05 PM, James Derieg wrote:
> Have a Cassandra cluster that has been running fine under 2.1.0. Rebuilt the cluster using the same settings on 2.1.1 and get this error, even with only one node present: "Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless"

That's not an error. It is an informative message when using 'nodetool status' without specifying a keyspace. https://issues.apache.org/jira/browse/CASSANDRA-7173

-- Michael
Re: Upgrade to 2.1.1 causes error
Ah, thanks Michael! Good to know that's not an error.
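A quick illustration of Michael's point: pass a keyspace to nodetool and effective ownership is computed against that keyspace's replication settings, so the message goes away (keyspace name taken from the earlier message):

$ nodetool status testspace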
Commissioning failure
I have a 5 node Cassandra cluster and I commissioned 1 new node into the cluster. When I added the node, it received streams from 3 nodes, of which 2 completed successfully and one stream failed. How can I resume the stream that failed?
Re: Commissioning failure
You can't; you have to wipe the node and start over.

=Rob
http://twitter.com/rcolidba
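A sketch of the wipe-and-restart Rob describes, assuming the package-default data directories (check data_file_directories, commitlog_directory and saved_caches_directory in your cassandra.yaml before deleting anything):

$ sudo service cassandra stop
$ sudo rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
$ sudo service cassandra start
$ # the node re-bootstraps and streams its ranges from scratch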
tuning concurrent_reads param
Hi, looking at the docs, the default value for concurrent_reads is 32, which seems a bit small to me (compared to, say, an HTTP server). If my node is receiving even slight traffic, any more than 32 concurrent read queries will have to wait(?). The recommended rule is 16 * number of drives. Would that be different if I have SSDs?

I am attempting to increase it because I have a few tables with wide rows that the app will fetch; the sheer size of that data may already be eating up the thread time, which can cause other read threads to wait and essentially slow down.

thanks
Re: tuning concurrent_reads param
There's a bit to it; sometimes it can use tweaking, though. It's a good default for most systems, so I wouldn't increase it right off the bat. When using SSDs or something with a lot of horsepower it could be higher, though (e.g. i2.xlarge+ on EC2). If you monitor the number of active threads in the read thread pool (nodetool tpstats) you can see whether they are actually all busy or not. If it's near 32 (or whatever you set it to) all the time, it may be a bottleneck.

---
Chris Lohfink
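A sketch of that check (the output line is illustrative, not real numbers): watch the ReadStage pool, where an Active column pinned at concurrent_reads with a growing Pending column is the signal that the pool is the bottleneck:

$ nodetool tpstats | egrep 'Pool Name|ReadStage'
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32       118        9013521         0                 0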
Re: Commissioning failure
What could be the reasons for the stream error, other than SSTable corruption?

Aravind
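One place to start: the joining node's system.log usually names the failed stream session and the peer it was talking to. A hedged sketch, assuming the package-default log location:

$ grep -i -E 'stream.*(fail|error|exception)' /var/log/cassandra/system.log

Besides SSTable corruption, common culprits are streaming socket timeouts (streaming_socket_timeout_in_ms in cassandra.yaml) and plain network interruptions between the two nodes.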