[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-26 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665726#comment-16665726
 ] 

Jeff Jirsa commented on CASSANDRA-14840:


No, it's an open issue. Some proposed patches are floating around that may 
work for you (check the linked JIRA), but none of them is committed, tested, or 
guaranteed to work.

Also, you don't have to put iptables rules on every host; you can put them only 
on the bootstrapping host. You just need to be able to block most of the 
cluster quickly (and undo it quickly).



> Bootstrap of new node fails with OOM in a large cluster
> ---
>
> Key: CASSANDRA-14840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Reporter: Jai Bheemsen Rao Dhanwada
> Priority: Critical
>
> We are seeing new node addition fail with OOM during bootstrap in a cluster 
> of more than 80 nodes and 3000 CFs, with no data in those CFs.
>  
> Steps to reproduce:
>  # Launch a 3 node cluster
>  # Create 3000 CF in the cluster
>  # Start adding nodes to the cluster one by one
>  # After adding 75-80 nodes, the new node bootstrap fails with OOM.
> {code:java}
> ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
> java.lang.OutOfMemoryError: Java heap space
>  at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
>  at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
>  at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
>  at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
>  at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) ~[apache-cassandra-2.1.16.jar:2.1.16]
>  at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]{code}
> Cassandra Version: 2.1.16
> OS: CentOS7
> num_tokens: 256 on each node.
>  
> This behavior is blocking us from adding extra capacity when needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-24 Thread Jai Bheemsen Rao Dhanwada (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16663076#comment-16663076
 ] 

Jai Bheemsen Rao Dhanwada commented on CASSANDRA-14840:
---

[~jjirsa] This is a production cluster and all of the CFs are in use, so I 
can't delete any of them.

 
 # I am already using offheap memtables and am still getting OOM. Current heap 
settings are `-Xms8192M -Xmx8192M -Xmn1200M` with the CMS collector. I tried 
increasing the heap size to 16G, and after adding 120 nodes I still get OOM on 
the new node while it bootstraps. Any other suggestions here?
 # That sounds like a very hands-on approach; I'm not sure I can time it well.




[jira] [Commented] (CASSANDRA-14840) Bootstrap of new node fails with OOM in a large cluster

2018-10-23 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661748#comment-16661748
 ] 

Jeff Jirsa commented on CASSANDRA-14840:


This is a duplicate of CASSANDRA-11748 and/or CASSANDRA-13569. What's 
happening is that when the new instance comes online, it pulls schema from all 
of the other instances in the cluster at once, getting 80+ copies of what's 
probably a very large schema at the same time.
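
As a rough illustration of why this hurts, the sketch below multiplies a 
per-copy schema footprint by the number of peers answering at once. The 
function names and the 50 MiB per-copy figure are hypothetical placeholders, 
not measurements from this cluster; the real in-heap cost of one schema copy 
would have to come from a heap dump.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def herd_bytes(peers, schema_bytes_per_copy):
    """Heap needed if every peer's schema copy is held in memory at once."""
    return peers * schema_bytes_per_copy


def max_concurrent_peers(heap_budget_bytes, schema_bytes_per_copy):
    """How many simultaneous schema copies fit inside a given heap budget."""
    return heap_budget_bytes // schema_bytes_per_copy


if __name__ == "__main__":
    # HYPOTHETICAL value: assumed in-heap size of one schema copy.
    per_copy = 50 * MIB
    # 80 peers matches the cluster size in the report.
    needed = herd_bytes(80, per_copy)
    print("heap needed for 80 copies: %.1f GiB" % (needed / GIB))
    # HYPOTHETICAL budget: half of an 8 GiB heap left for schema mutations.
    print("copies that fit in 4 GiB:", max_concurrent_peers(4 * GIB, per_copy))
```

The point is only that the demand grows linearly with cluster size, so a heap 
that survives 80 peers can still fail at 120.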

If you really have no data in any of those tables, the easiest solution may be 
to start removing them to decrease schema size and make the thundering herd of 
schema mutations less painful (this may be a viable option if the tables are 
old and unused - if you expect to use them again, keep reading).

Beyond that, you have two options:
1) Try to make it so you can better handle all of the incoming mutations - this 
may mean a bigger heap, tuning the memtable, or similar. Hard to give concrete 
suggestions without a heap dump and knowing your current settings. Offheap 
memtable may be a starting point given you're on 2.1.
2) Try to limit the number of concurrent migrations. This is going to sound 
awful, for obvious reasons, but one thing that may work is to artificially 
restrict your instance's view of the ring using firewall rules so it can only 
communicate with a handful of hosts (maybe just the seeds) for the first 5-15 
seconds after it starts. Once it has the schema, remove the rules so it can 
talk to the rest of the cluster and bootstrap properly.
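
A minimal sketch of the firewall approach in option 2, assuming iptables on 
the bootstrapping host and Cassandra's default inter-node storage port 7000 
(7001 when SSL is enabled). The helper name `build_rules` and the addresses 
are hypothetical. The script only prints the commands, and whether dropping 
both inbound and outbound storage traffic is sufficient in a given deployment 
is an assumption to verify before relying on it.

```python
import ipaddress

# Default Cassandra storage_port; adjust if cassandra.yaml overrides it.
STORAGE_PORT = 7000


def build_rules(peers, seeds, action="-I", port=STORAGE_PORT):
    """Build iptables commands that drop storage traffic to and from every
    peer that is not a seed. Use action="-I" to block, "-D" to undo."""
    allowed = {ipaddress.ip_address(s) for s in seeds}
    commands = []
    for peer in peers:
        addr = ipaddress.ip_address(peer)  # raises ValueError on a bad address
        if addr in allowed:
            continue  # seeds stay reachable so the node can fetch the schema
        base = ["-p", "tcp", "--dport", str(port), "-j", "DROP"]
        # Inbound: messages the peer sends to this node's storage port.
        commands.append(["iptables", action, "INPUT", "-s", str(addr)] + base)
        # Outbound: connections this node opens to the peer's storage port.
        commands.append(["iptables", action, "OUTPUT", "-d", str(addr)] + base)
    return commands


if __name__ == "__main__":
    # HYPOTHETICAL addresses, for illustration only.
    peers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    seeds = ["10.0.0.1"]
    print("# block before starting Cassandra")
    for cmd in build_rules(peers, seeds, action="-I"):
        print(" ".join(cmd))
    print("# undo once the schema has arrived")
    for cmd in build_rules(peers, seeds, action="-D"):
        print(" ".join(cmd))
```

Generating the block and undo lists from the same function keeps the two in 
step, so every inserted rule has a matching delete that can be run quickly.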

One of the other two JIRAs will eventually get addressed; I'm going to dupe 
this to CASSANDRA-11748 since it's a lower number (earlier reporting). 
