Re: Problem adding a new node to a cluster
Definitely upgrade to 3.11.1.

On Sun, Dec 17, 2017 at 8:54 PM Pradeep Chhetri wrote:

> Hello Kurt,
>
> I realized the issue was caused by a RAM shortage. I bumped up the
> memory of the machine and the node bootstrap started, but this time I hit
> this Cassandra 3.9 bug:
>
> https://issues.apache.org/jira/browse/CASSANDRA-12905
>
> I tried running nodetool bootstrap resume multiple times, but every time it
> fails with an exception after completing around 963%:
>
> https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4
>
> Do you think there is a workaround for this, or do you suggest
> upgrading to v3.11, which has the fix?
>
> Also, can we just upgrade Cassandra from 3.9 -> 3.11 in a rolling
> fashion, or is there anything we need to take care of when upgrading?
>
> Thanks.
Re: Problem adding a new node to a cluster
Hello Kurt,

I realized the issue was caused by a RAM shortage. I bumped up the memory of the machine and the node bootstrap started, but this time I hit this Cassandra 3.9 bug:

https://issues.apache.org/jira/browse/CASSANDRA-12905

I tried running nodetool bootstrap resume multiple times, but every time it fails with an exception after completing around 963%:

https://gist.github.com/chhetripradeep/93567ad24c44ba72d0753d4088a10ce4

Do you think there is a workaround for this, or do you suggest upgrading to v3.11, which has the fix?

Also, can we just upgrade Cassandra from 3.9 -> 3.11 in a rolling fashion, or is there anything we need to take care of when upgrading?

Thanks.

On Mon, Dec 18, 2017 at 5:45 AM, kurt greaves wrote:

> You haven't provided enough logs for us to really tell what's wrong. I
> suggest running *nodetool netstats | grep -v 100%* to see if any
> streams are still ongoing, and also running *nodetool compactionstats -H* to
> see if there are any index builds the node might be waiting for prior to
> joining the ring.
>
> If neither of those provides any useful information, send us the full
> system.log and debug.log.
Re: Problem adding a new node to a cluster
You haven't provided enough logs for us to really tell what's wrong. I suggest running *nodetool netstats | grep -v 100%* to see if any streams are still ongoing, and also running *nodetool compactionstats -H* to see if there are any index builds the node might be waiting for prior to joining the ring.

If neither of those provides any useful information, send us the full system.log and debug.log.

On 17 December 2017 at 11:19, Pradeep Chhetri wrote:

> Hello all,
>
> I am trying to add a 4th node to a 3-node cluster which is using
> SimpleSnitch, but the new node has been stuck in the Joining state for the
> last 20 hours. We have around 10GB of data per node with an RF of 3.
>
> It's mostly stuck in the "redistributing index summaries" phase.
>
> Here are the logs:
>
> https://gist.github.com/chhetripradeep/37e4f232ddf0dd3b830091ca9829416d
>
> # nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
> UJ  10.42.187.43   9.73 GiB   256     ?                 36384dc5-a183-4a5b-ae2d-ee67c897df3d  rack1
> UN  10.42.106.184  9.95 GiB   256     100.0%            42cd09e9-8efb-472f-ace6-c7bb98634887  rack1
> UN  10.42.169.195  10.35 GiB  256     100.0%            9fcc99a1-6334-4df8-818d-b097b1920bb9  rack1
> UN  10.42.209.245  8.54 GiB   256     100.0%            9b99d5d8-818e-4741-9533-259d0fc0e16d  rack1
>
> I'm not sure what is going on here; it would be very helpful if someone
> could help identify the issue.
>
> Thank you.
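
For anyone following along, the two checks suggested in this message could be run on the joining node roughly as sketched below. This is only an illustration: the helper name pending_streams is made up for this example, and it assumes nodetool is on the PATH and can reach the local node. The filter simply drops every line containing "100%", so anything left over is a stream that has not finished.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): read text on stdin
# and drop every line that contains "100%", i.e. streams already complete.
pending_streams() {
    grep -v '100%'
}

# Only call nodetool if it is actually installed on this machine.
if command -v nodetool >/dev/null 2>&1; then
    # Streams still in progress on the joining node
    nodetool netstats | pending_streams
    # Pending compactions and index builds; -H prints human-readable sizes
    nodetool compactionstats -H
fi
```

If the first command prints no file lines and the second shows no pending tasks, the full system.log and debug.log are probably the next thing to look at, as the message says.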