Re: Add a new node of 3.11.5 in a 3.11.0 Cassandra Cluster

2020-05-09 Thread manish khandelwal
You should not bootstrap into a mixed-version cluster. First upgrade, then
bootstrap the new node. If you are not able to upgrade due to disk space
constraints while running upgradesstables, then:

1. Reduce the number of threads for upgradesstables (the --jobs parameter).
By default it is 2; you can set it to 1.
2. Run upgradesstables one keyspace/table at a time (see the example
commands below).
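
For example, a rough sketch (my_keyspace and my_table are placeholder names):

nodetool upgradesstables --jobs 1                        # single thread across all keyspaces
nodetool upgradesstables --jobs 1 my_keyspace my_table   # upgrade one table at a time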



On Sat, May 9, 2020 at 9:51 PM Surbhi Gupta 
wrote:

> Hi,
>
> We are facing an issue bootstrapping a new node in 3.11.0: the bootstrap
> is failing.
> We have two tasks here:
> 1. Expand the cluster (due to disk space concerns and dropped mutations)
> 2. Upgrade the cluster from 3.11.0 to 3.11.5 because of various bugs we
> are hitting in 3.11.0.
>
> So my questions here are:
> 1. Can we add a new node with the 3.11.5 rpm to a 3.11.0 cluster?
> 2. Because there is an sstable format change from mc to md, is there any
> issue in bootstrapping a node with the 3.11.5 version?
>
> We are not able to upgrade the cluster first and then add nodes because
> of the disk concerns. upgradesstables will need space because of the
> sstable format change.
>
> Please advise.
>
> Thanks
> Surbhi
>
>


Add a new node of 3.11.5 in a 3.11.0 Cassandra Cluster

2020-05-09 Thread Surbhi Gupta
Hi,

We are facing an issue bootstrapping a new node in 3.11.0: the bootstrap is
failing.
We have two tasks here:
1. Expand the cluster (due to disk space concerns and dropped mutations)
2. Upgrade the cluster from 3.11.0 to 3.11.5 because of various bugs we are
hitting in 3.11.0.

So my questions here are:
1. Can we add a new node with the 3.11.5 rpm to a 3.11.0 cluster?
2. Because there is an sstable format change from mc to md, is there any
issue in bootstrapping a node with the 3.11.5 version?

We are not able to upgrade the cluster first and then add nodes because of
the disk concerns. upgradesstables will need space because of the sstable
format change.
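
(As a rough way to estimate how much data is still in the old mc format, and
therefore roughly how much extra space upgradesstables may need, something
like the command below can be used. The path assumes the default data
directory layout, and it only counts the Data.db components:)

du -ch /var/lib/cassandra/data/*/*/mc-*-big-Data.db | tail -1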

Please advise.

Thanks
Surbhi


Re: Bootstrapping is failing

2020-05-09 Thread Surbhi Gupta
I tried changing the heap size from 31GB to 62GB on the bootstrapping node
because I noticed that, when it reached the midway point of bootstrapping,
heap usage reached around 90% or more and the node just froze.
But the behavior is still the same: it again reached the midway point, the
heap again reached 90% or more, the node froze, and none of the nodetool
commands returned output; the other nodes also removed this node from
joining because they were not able to gossip with it.
We are on 3.11.0.

I took a heap dump when the node had 90%+ utilization of the 62GB heap,
opened the leak report, and found 3 leak suspects; two of the three were as
below:

1. The thread io.netty.util.concurrent.FastThreadLocalThread @
0x7fbe9533bf98 StreamReceiveTask:26 keeps local variables with total size
16,898,023,552 (31.10%) bytes.
The memory is accumulated in one instance of
"io.netty.util.Recycler$DefaultHandle[]" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x7fb917c76dc8".

2. The thread io.netty.util.concurrent.FastThreadLocalThread @
0x7fbb846fb800 StreamReceiveTask:29 keeps local variables with total size
11,696,214,424 (21.53%) bytes.
The memory is accumulated in one instance of
"io.netty.util.Recycler$DefaultHandle[]" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x7fb917c76dc8".

Am I getting hit by https://issues.apache.org/jira/browse/CASSANDRA-13929 ?
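
(If this does turn out to be CASSANDRA-13929, the fix appears to be included
in later 3.11.x releases, so the planned upgrade to 3.11.5 should cover it; a
mitigation sometimes discussed for affected versions is to disable the Netty
recycler with a JVM flag in conf/jvm.options. This is only a sketch and
should be verified against the ticket before use:)

-Dio.netty.recycler.maxCapacityPerThread=0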

I haven't changed the TCP settings. My TCP settings are higher than
recommended; what I wanted to understand is how TCP settings can affect the
bootstrapping process.

Thanks
Surbhi

On Thu, 7 May 2020 at 17:01, Surbhi Gupta  wrote:

> When we start the node, it starts the bootstrap automatically and
> re-streams all of the data again. It is not resuming.
>
> On Thu, May 7, 2020 at 4:47 PM Adam Scott  wrote:
>
>> I think you want to run `nodetool bootstrap resume` (
>> https://cassandra.apache.org/doc/latest/tools/nodetool/bootstrap.html)
>> to pick up where it last left off. Sorry for the late reply.
>>
>>
>> On Thu, May 7, 2020 at 2:22 PM Surbhi Gupta 
>> wrote:
>>
>>> So after a failed bootstrap, if we start Cassandra again on the new
>>> node, will it resume the bootstrap or will it start over?
>>>
>>> On Thu, 7 May 2020 at 13:32, Adam Scott  wrote:
>>>
 I recommend it on all nodes.  This will eliminate that as a source of
 trouble further on down the road.


 On Thu, May 7, 2020 at 1:30 PM Surbhi Gupta 
 wrote:

> streaming_socket_timeout_in_ms is 24 hours.
> So should the TCP settings be changed on the new bootstrapping node or on
> all nodes?
>
>
> On Thu, 7 May 2020 at 13:23, Adam Scott 
> wrote:
>
>>
>> Edit /etc/sysctl.conf and set:
>>
>> net.ipv4.tcp_keepalive_time=60
>> net.ipv4.tcp_keepalive_probes=3
>> net.ipv4.tcp_keepalive_intvl=10
>>
>> then run sysctl -p to have the kernel reload the settings.
>>
>> 5 minutes (300 seconds) is probably too long.
>>
>> On Thu, May 7, 2020 at 1:09 PM Surbhi Gupta 
>> wrote:
>>
>>> [root@abc cassandra]# cat /proc/sys/net/ipv4/tcp_keepalive_time
>>>
>>> 300
>>>
>>> [root@abc cassandra]# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
>>>
>>> 30
>>>
>>> [root@abc cassandra]# cat /proc/sys/net/ipv4/tcp_keepalive_probes
>>>
>>> 9
>>>
>>> On Thu, 7 May 2020 at 12:32, Adam Scott 
>>> wrote:
>>>
 Maybe a firewall killing a connection?

 What does the following show?
 cat /proc/sys/net/ipv4/tcp_keepalive_time
 cat /proc/sys/net/ipv4/tcp_keepalive_intvl
 cat /proc/sys/net/ipv4/tcp_keepalive_probes

 On Thu, May 7, 2020 at 10:31 AM Surbhi Gupta <
 surbhi.gupt...@gmail.com> wrote:

> Hi,
>
> We are trying to expand a datacenter and add nodes, but when a node is
> bootstrapping it goes halfway through and then fails with the error
> below. We increased streamthroughput from 200 to 400 when we tried for
> the 2nd time, but it still failed. We are on 3.11.0, using G1GC with a
> 31GB heap.
>
> ERROR [MessagingService-Incoming-/10.X.X.X] 2020-05-07 09:42:38,933 CassandraDaemon.java:228 - Exception in thread Thread[MessagingService-Incoming-/10.X.X.X,main]
>
> java.io.IOError: java.io.EOFException: Stream ended prematurely
>
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:227) ~[apache-cassandra-3.11.0.jar:3.11.0]
>
> at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:215) ~[apache-cassandra-3.11.0.jar:3.11.0]
>
> at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)