Re: Problem adding a new node to a cluster

2017-12-18 Thread Jonathan Haddad
Definitely upgrade to 3.11.1.


Re: Problem adding a new node to a cluster

2017-12-17 Thread Pradeep Chhetri
Hello Kurt,

I realized that a RAM shortage was causing the issue. I bumped up the
memory of the machine and the node bootstrap started, but this time I hit
this bug in Cassandra 3.9:

https://issues.apache.org/jira/browse/CASSANDRA-12905

I tried running nodetool bootstrap resume multiple times, but every time it
fails with an exception after completing around 963%:

https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4

Is there a workaround for this, or do you suggest upgrading to v3.11,
which has the fix?

Also, can we just upgrade Cassandra from 3.9 -> 3.11 in a rolling fashion,
or is there anything we need to take care of when upgrading?
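
For reference, a rolling upgrade is usually done one node at a time, roughly along the lines below. This is only a sketch that prints the steps rather than running them, not a tested procedure for this cluster. The function name rolling_upgrade_plan, the systemd unit name "cassandra", and the install placeholder are assumptions, and the NEWS.txt of the target version should be read before starting.

```shell
#!/bin/sh
# Print (do not execute) the usual per-node steps of a rolling upgrade.
# Assumptions: the service is a systemd unit called "cassandra", and the
# install step is a placeholder because it depends on how Cassandra was
# installed (package, tarball, container image, ...).
rolling_upgrade_plan() {
  for node in "$@"; do
    printf '== %s ==\n' "$node"
    printf '%s\n' \
      'nodetool drain' \
      'sudo systemctl stop cassandra' \
      '<install the new Cassandra version here>' \
      'sudo systemctl start cassandra' \
      'nodetool status'
  done
  # SSTables are rewritten only after every node runs the new version.
  printf '== after all nodes ==\n'
  printf '%s\n' 'nodetool upgradesstables'
}

# Example: the three existing nodes from the nodetool status output.
rolling_upgrade_plan 10.42.106.184 10.42.169.195 10.42.209.245
```

Each node should show as UN in nodetool status before moving on to the next one.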

Thanks.


Re: Problem adding a new node to a cluster

2017-12-17 Thread kurt greaves
You haven't provided enough logs for us to really tell what's wrong. I
suggest running *nodetool netstats | grep -v 100%* to see if any streams
are still ongoing, and also running *nodetool compactionstats -H* to see if
there are any index builds the node might be waiting for prior to joining
the ring.

If neither of those provides any useful information, send us the full
system.log and debug.log.
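
The two checks above can be wrapped in a small script. This is only a sketch: the helper name unfinished_streams is made up for illustration, and the script assumes nodetool is on the PATH of the joining node.

```shell
#!/bin/sh
# Keep only the lines that do not report 100% progress, i.e. streams
# that are still in flight (or header lines without a percentage).
unfinished_streams() {
  grep -v '100%'
}

# Run the checks only where nodetool is actually installed.
if command -v nodetool >/dev/null 2>&1; then
  # Streams that have not finished yet.
  nodetool netstats | unfinished_streams
  # Pending compactions and index builds, with human-readable sizes.
  nodetool compactionstats -H
fi
```

If the first command prints no file lines and the second shows no active tasks, the node is not waiting on streaming or index builds, and the logs are the next place to look.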

On 17 December 2017 at 11:19, Pradeep Chhetri  wrote:

> Hello all,
>
> I am trying to add a 4th node to a 3-node cluster which is using
> SimpleSnitch, but the new node has been stuck in the Joining state for
> the last 20 hours. We have around 10GB of data per node with RF 3.
>
> It's mostly stuck in the "redistributing index summaries" phase.
>
> Here are the logs:
>
> https://gist.github.com/chhetripradeep/37e4f232ddf0dd3b830091ca9829416d
>
> # nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
> UJ  10.42.187.43   9.73 GiB   256     ?                 36384dc5-a183-4a5b-ae2d-ee67c897df3d  rack1
> UN  10.42.106.184  9.95 GiB   256     100.0%            42cd09e9-8efb-472f-ace6-c7bb98634887  rack1
> UN  10.42.169.195  10.35 GiB  256     100.0%            9fcc99a1-6334-4df8-818d-b097b1920bb9  rack1
> UN  10.42.209.245  8.54 GiB   256     100.0%            9b99d5d8-818e-4741-9533-259d0fc0e16d  rack1
>
> I'm not sure what is going on here. It would be very helpful if someone
> could help identify the issue.
>
> Thank you.