[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665726#comment-16665726 ]

Jeff Jirsa commented on CASSANDRA-14840:

No, it's an open issue. There are some proposed patches flying around that may work for you (check the linked JIRA), but they are not committed, tested, or guaranteed to work.

Also, you don't have to put iptables rules on every host; you can put the rules only on the bootstrapping host. You just need to be able to block most of the cluster quickly (and undo it quickly).

> Bootstrap of new node fails with OOM in a large cluster
> ---
>
> Key: CASSANDRA-14840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Critical
>
> We are seeing new node addition fails with OOM during bootstrap in a cluster
> of more than 80 nodes and 3000 CF without any data in those CFs.
>
> Steps to reproduce:
> # Launch a 3 node cluster
> # Create 3000 CF in the cluster
> # Start adding nodes to the cluster one by one
> # After adding 75-80 nodes, the new node bootstrap fails with OOM.
> {code:java}
> ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870
> JVMStabilityInspector.java:78 - Exiting due to error while processing commit
> log during initialization.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
> at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
> at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
> at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
> at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
> at
> org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105)
> ~[apache-cassandra-2.1.16.jar:2.1.16]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
> Cassandra Version: 2.1.16
> OS: CentOS7
> num_tokens: 256 on each node.
>
> This behavior is blocking us from adding extra capacity when needed.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663076#comment-16663076 ]

Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14840:

[~jjirsa] This is a production cluster and all the CFs are being used, so I can't delete any of them.

# I am already using offheap memtables and am still getting OOM. Current heap settings are `-Xms8192M -Xmx8192M -Xmn1200M` with CMS. I tried increasing the heap size to 16G, and after adding 120 nodes I still see OOM issues on the new node while it is bootstrapping. Any other suggestions here?
# This sounds like a very hands-on approach; I'm not sure I can time it well.
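
The heap figures in this comment can be put next to the node counts with some rough arithmetic. The sketch below is only a back-of-the-envelope upper bound, under the assumption that every peer's schema response is held on the heap at the same moment and that nothing else competes for heap; the function name is hypothetical and the real limit will be lower.

```python
def per_copy_heap_budget_mb(heap_mb: float, peers: int) -> float:
    """Upper bound on heap available to each in-flight schema copy, in MB.

    Assumes all peers answer at once and the whole heap is free for schema
    data, which overstates the real budget.
    """
    if peers <= 0:
        raise ValueError("peers must be positive")
    return heap_mb / peers


# Figures from the thread: 8192M heap failing at about 80 nodes,
# and a 16G heap failing at about 120 nodes.
for heap_mb, peers in [(8192, 80), (16384, 120)]:
    budget = per_copy_heap_budget_mb(heap_mb, peers)
    print(f"{heap_mb} MB heap / {peers} peers = {budget:.1f} MB per copy")
```

Under these assumptions the per-copy budget grows far more slowly than the heap once the cluster keeps growing, which is consistent with a larger heap only postponing the failure.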
[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661748#comment-16661748 ]

Jeff Jirsa commented on CASSANDRA-14840:

This is a duplicate of CASSANDRA-11748 and/or CASSANDRA-13569. What's happening is that when the new instance comes online, it pulls schema from all of the other instances in the cluster at once, getting 80+ copies of what's probably a very large schema.

If you really have no data in any of those tables, the easiest solution may be to start removing them to decrease schema size and make the thundering herd of schema mutations less painful (this may be a viable option if the tables are old and unused; if you expect to use them again, keep reading).

Beyond that, you have two options:

1) Try to make it so you can better handle all of the incoming mutations. This may mean a bigger heap, tuning the memtable, or similar. It is hard to give concrete suggestions without a heap dump and knowing your current settings. Offheap memtables may be a starting point given you're on 2.1.

2) Try to limit the number of concurrent migrations. This is going to sound awful, for obvious reasons, but one of the things that may work is to artificially restrict your instance's view of the ring using firewall rules so it can only communicate with a handful of hosts (maybe just the seeds) for the first 5-15 seconds after it starts. Once it has the schema, remove the rules so that it can talk to the rest of the cluster and bootstrap properly.

One of the other two JIRAs will eventually get addressed; I'm going to dupe this to CASSANDRA-11748 since it's a lower number (earlier reporting).
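
As an illustration of option 2, here is a minimal sketch that only builds the iptables commands to run on the bootstrapping host; it does not execute them. It assumes inter-node traffic uses Cassandra's default storage_port of 7000 (check cassandra.yaml, and the SSL storage port if encryption is enabled). The function name and the peer and seed addresses are hypothetical placeholders.

```python
from typing import Iterable, List

STORAGE_PORT = 7000  # assumed: Cassandra's default storage_port


def iptables_rules(peers: Iterable[str], seeds: Iterable[str],
                   action: str, port: int = STORAGE_PORT) -> List[List[str]]:
    """Build iptables argv lists that drop storage traffic to and from non-seed peers.

    action is "-A" to append the rules (block) or "-D" to delete them (unblock).
    """
    if action not in ("-A", "-D"):
        raise ValueError("action must be '-A' or '-D'")
    blocked = sorted(set(peers) - set(seeds))
    cmds: List[List[str]] = []
    for ip in blocked:
        # Drop connections the peer opens to this host.
        cmds.append(["iptables", action, "INPUT", "-p", "tcp",
                     "-s", ip, "--dport", str(port), "-j", "DROP"])
        # Drop connections this host opens to the peer.
        cmds.append(["iptables", action, "OUTPUT", "-p", "tcp",
                     "-d", ip, "--dport", str(port), "-j", "DROP"])
    return cmds


# Hypothetical addresses from the documentation range 192.0.2.0/24.
peers = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
seeds = ["192.0.2.10"]
for cmd in iptables_rules(peers, seeds, "-A"):
    print(" ".join(cmd))
```

One possible sequence is to apply the "-A" commands before starting Cassandra, wait the 5-15 seconds mentioned above for the schema to arrive from the seeds, and then run the same list with "-D" to lift the block.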