Just curious - aside from the "sleep", is this all not part of the shutdown?
Is this an "opportunity" to improve C*?
In my experience with RDBMSes, Hadoop and HBase, stopping communication,
flushing memcache (HBase), and relinquishing ownership of data (HBase) are all
part of the shutdown process.
From: Alain RODRIGUEZ <arodr...@gmail.com>
Date: Wednesday, January 10, 2018 at 6:19 AM
To: "user cassandra.apache.org" <firstname.lastname@example.org>
Subject: Re: Question upon gracefully restarting c* node(s)
I agree with the comments above. Cassandra is robust, and we are just talking
about optimising the process. Nothing is mandatory. Going to an extreme, I would
say you can pull the node's power cable, plug it back in and call it a restart.
It should do no harm if your cluster is properly tuned. Yet optimisations are
welcome, as they improve entropy and startup time. Plus we are civilized
operators, not barbarians, aren't we ;-)? It's just more 'clean' and efficient.
Also, historically, it was mandatory to drain when using counters to prevent
over-counting, as counters are not idempotent. (Not sure about this nowadays.)
Last time I asked this very question I ended up building this command that I
have been using since then:
`date && nodetool disablebinary && nodetool disablegossip && sleep 10 &&
nodetool flush && nodetool drain && sleep 10 && sudo service cassandra restart`
It does the following:
- Print the date for the record
- Stop all client transports. I never heard of a benefit of shutting down
the gossip protocol, and so never did so; it might be better but I can't really
say. This way we stop listening for clients.
- After a short while no clients are using the node. Calling drain then flushes
memtables and recycles commitlogs, as Kurt detailed above. Here I add a 'flush'
because I haven't been that lucky in the past with drain: sometimes it did not
work at all, sometimes it did not clean the commitlogs. I believe flushing first
makes this restart command more robust.
- Finally restart the service.
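As a rough sketch, the same sequence can be wrapped in a small script. It mirrors the one-liner exactly as quoted above; the `DRY_RUN` and `PAUSE` variables and the `run` and `graceful_restart` names are illustrative assumptions, not Cassandra or nodetool settings. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the restart sequence quoted above.
# DRY_RUN and PAUSE are hypothetical knobs for this example only.
set -eu  # stop at the first failing step, like the '&&' chain

DRY_RUN="${DRY_RUN:-1}"  # 1 = print commands only, 0 = execute them
PAUSE="${PAUSE:-10}"     # seconds to wait between phases

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

graceful_restart() {
    date                                # timestamp for the record
    run nodetool disablebinary          # stop accepting native-protocol clients
    run nodetool disablegossip          # announce to peers that the node is leaving
    run sleep "$PAUSE"                  # let in-flight requests finish
    run nodetool flush                  # flush memtables explicitly before draining
    run nodetool drain                  # stop accepting writes, flush again
    run sleep "$PAUSE"
    run sudo service cassandra restart  # assumes a 'cassandra' service exists
}

graceful_restart
```

Running it with `DRY_RUN=0` executes the steps for real; on clusters that still serve Thrift clients, a `run nodetool disablethrift` line could be added next to `disablebinary`.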
I don't think there is only one good way to do this. Also, doing it wrong is
often not such a big deal.
Alain Rodriguez - @arodream -
France / Spain
The Last Pickle - Apache Cassandra Consulting
2018-01-08 3:33 GMT+00:00 Jeff Jirsa
The sequence does have some objective benefits - especially stopping transports
and then gossip, it tells everything you’re going offline before you do, so
requests won’t get dropped or have to speculate to other replicas.
On Jan 7, 2018, at 7:22 PM, kurt greaves
None are essential. Cassandra will shut down gracefully in any scenario as long
as it's not killed with a SIGKILL. However, drain does have a few benefits over
just a normal shutdown. It will stop a few extra services (batchlog,
compactions) and, importantly, it will also force recycling of dirty commitlog
segments, meaning there will be fewer commitlog files to replay on startup,
which reduces startup time.
A comment in the code for drain also indicates that it will wait for
in-progress streaming to complete, but I haven't managed to find 1. where this
occurs, or 2. if it actually differs from a normal shutdown. Note that this is
all w.r.t. 2.1. In 3.0.10 and 3.10, drain and shutdown do more or less the exact
same thing; however, drain will log some extra messages.
On 2 January 2018 at 07:07, Jing Meng
Recently we made a change to our production env c* cluster (2.1.18) - placing
the commit log on the same SSD where data is stored, which requires restarting
all nodes.
Before restarting a cassandra node, we ran the following nodetool utils:
$ nodetool disablethrift && sleep 5
$ nodetool disablebinary && sleep 5
$ nodetool disablegossip && sleep 5
$ nodetool drain && sleep 5
It was "graceful" as expected (no significant errors found), but the process is
still a mystery to us: are the commands used above "sufficient", and/or why? The
official doc (docs.datastax.com) did not help with this operational detail,
though "nodetool drain" is apparently essential.