Re: Re: How to gracefully decommission a highly loaded node?

Riccardo Ferrari Mon, 17 Dec 2018 02:44:23 -0800

I am having "the same" issue.
One of my 6 nodes (all the same instance size) seems to be struggling with
its hardware: it is likely to be marked down, it is constantly compacting,
and it has a high system load. It's just a big pain.


My idea was to add nodes and decommission all the ones running on old
hardware (m1.xlarge). However, this specific "bad" node is causing trouble
for the whole cluster, so I decided to decommission it first.

The node is simply stuck in "LEAVING" and is not sending any streams. I have
already disabled binary and autocompaction and tried to restart the
decommission process a couple of times, with no luck.
Any suggestions?
assassinate vs removenode?
Any tuning that could help?
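
For reference, here is a rough sketch of the quiescing steps suggested
further down in this thread, followed by the decommission itself (so far I
have only applied the binary and autocompaction ones). The nodetool
subcommands are the ones named in the thread; the Python helper names
(decommission_plan, parse_mode, run_plan) are made up for illustration, and
I am assuming that nodetool netstats reports the node state on a line
starting with "Mode:". It only prints the commands unless dry_run is turned
off.

```python
import subprocess

# Quiescing steps mentioned in this thread, in the order suggested.
# Assumption: Cassandra 3.11.x; disablethrift is, as far as I know,
# not available from 4.0 onwards.
QUIESCE_STEPS = [
    ["nodetool", "disableautocompaction"],
    ["nodetool", "disablebinary"],
    ["nodetool", "disablethrift"],
    ["nodetool", "disablehandoff"],
]


def decommission_plan(include_thrift=True):
    """Return the nodetool commands to run, in order, on the leaving node."""
    steps = [
        step for step in QUIESCE_STEPS
        if include_thrift or step[1] != "disablethrift"
    ]
    steps.append(["nodetool", "decommission"])
    return steps


def parse_mode(netstats_output):
    """Return the value of the 'Mode:' line from netstats output, or None."""
    for line in netstats_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(decommission_plan())
```

Keeping dry_run on by default is deliberate: the plan can be reviewed before
anything touches the node.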

Best,

On Thu, Dec 6, 2018 at 10:59 AM onmstester onmstester
<onmstes...@zoho.com.invalid> wrote:

> After a few hours, I just removed the node. Then I decommissioned another
> node, which finished successfully (the writer app was down, so there was no
> pressure on the cluster).
> I started a third node decommission. Since I didn't have time to wait
> for it to finish, I started the writer application once most of the
> decommissioning node's streaming was done and only a few GBs
> to two other nodes remained to be streamed.
> After 12 hours I checked the decommissioning node, and netstats says:
> LEAVING, Restore Replica Count....!
> So I just ran removenode on this one too.
> Is there something wrong with decommissioning while someone is writing to
> the cluster?
> Using Apache Cassandra 3.11.2
>
> Sent using Zoho Mail <https://www.zoho.com/mail/>
>
>
> ============ Forwarded message ============
> From : onmstester onmstester <onmstes...@zoho.com.INVALID>
> To : "user"<user@cassandra.apache.org>
> Date : Wed, 05 Dec 2018 09:00:34 +0330
> Subject : Fwd: Re: How to gracefully decommission a highly loaded node?
> ============ Forwarded message ============
>
> After a long time stuck in LEAVING and "not doing any streams", I killed
> the Cassandra process and restarted it, then ran nodetool decommission
> again (the Datastax recipe for a stuck decommission).
> Now it says LEAVING, "unbootstrap $(the node id)".
>
> What's going on? Should I forget about decommission and just remove the
> node?
>
> There is an issue to make decommission resumable:
> https://issues.apache.org/jira/browse/CASSANDRA-12008
>
> but I couldn't figure out how this is supposed to work. I was expecting
> that, after restarting the Cassandra node with the stuck decommission, it
> would resume the decommissioning process, but the node became UN after the
> restart.
>
> ============ Forwarded message ============
> From : Simon Fontana Oscarsson <simon.fontana.oscars...@ericsson.com>
> To : "user@cassandra.apache.org"<user@cassandra.apache.org>
> Date : Tue, 04 Dec 2018 15:20:15 +0330
> Subject : Re: How to gracefully decommission a highly loaded node?
> ============ Forwarded message ============
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
> Hi,
>
> If it already uses 100% CPU, I have a hard time seeing it being able to do
> a decommission while serving requests. If you have a lot of free space, I
> would first try nodetool disableautocompaction. If you don't see any
> progress in nodetool netstats, you can also run disablebinary,
> disablethrift and disablehandoff to stop serving client requests.
>
> --
>
> SIMON FONTANA OSCARSSON
> Software Developer
>
> Ericsson
> Ölandsgatan 1
> 37133 Karlskrona, Sweden
> simon.fontana.oscars...@ericsson.com
> www.ericsson.com
>
>
> On Tue, 2018-12-04 at 14:21 +0330, onmstester onmstester wrote:
>
> One node suddenly uses 100% CPU. I suspect hardware problems and do not
> have time to trace them, so I decided to just remove the node from the
> cluster. Although the node state changed to UL, there is no sign of it
> leaving: the node is still compacting and flushing memtables, writing
> mutations, and CPU has been at 100% for hours.
> Is there any means to force a Cassandra node to just decommission and stop
> doing normal things?
> Due to W.CL=ONE, I cannot use removenode and shut down the node.
>
> Best Regards
