Shutdown (drain, rather) does all of those things, but it’s not very patient: it doesn’t sleep (and there’s no setup time, such as nodetool reconnecting for every invocation), so things shut down in rapid succession, which may have client-visible impact.
--
Jeff Jirsa

> On Jan 10, 2018, at 6:20 AM, Thakrar, Jayesh <jthak...@conversantmedia.com>
> wrote:
>
> Just curious - aside from the "sleep", is this all not part of the shutdown
> command?
> Is this an "opportunity" to improve C*?
> Having worked with RDBMSes, Hadoop and HBase, stopping communication,
> flushing memcache (HBase), and relinquishing ownership of data (HBase) is all
> part of the shutdown process.
>
>
> From: Alain RODRIGUEZ <arodr...@gmail.com>
> Date: Wednesday, January 10, 2018 at 6:19 AM
> To: "user cassandra.apache.org" <email@example.com>
> Subject: Re: Question upon gracefully restarting c* node(s)
>
> I agree with comments above. Cassandra is robust, and we are just talking
> about optimising the process. Nothing mandatory. Going to an extreme I would
> say you can pull and plug back the node power cable and call it a restart. It
> should not harm if your cluster is properly tuned. Yet optimisations are
> welcome as they improve entropy, starting time. Plus we are civilized
> operators, not barbarians, aren't we ;-)? It's just more 'clean' and
> efficient.
> Also, historically, it was mandatory to drain when using counters to prevent
> over-counting, as counters are not idempotent. (Not sure about this nowadays.)
>
> Last time I asked this very question I ended up building this command that I
> have been using since then:
>
> `date && nodetool disablebinary && nodetool disablegossip && sleep 10 &&
> nodetool flush && nodetool drain && sleep 10 && sudo service cassandra
> restart`
>
> It does the following:
>
> - Print the date for the record
> - Stop all client transports. I never heard about a benefit of shutting
>   down the gossip protocol, and so never did so, it might be better but I
>   can't really say. This way we stop listening for clients.
> - After a small while no clients are using the node, calling the drain
>   flushes memtables and recycles the commitlog as Kurt detailed above.
>   Here I add a 'flush' because I haven't been that lucky in the past with
>   drain, sometimes not working at all, sometimes not cleaning commitlogs. I
>   believe flushing first makes this restart command more robust.
> - Finally restart the service.
>
> I think there is not only one good way to do this. Also, doing it wrong is
> often not such a big deal.
>
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2018-01-08 3:33 GMT+00:00 Jeff Jirsa <jji...@gmail.com>:
> The sequence does have some objective benefits - especially stopping
> transports and then gossip, it tells everything you’re going offline before
> you do, so requests won’t get dropped or have to speculate to other replicas.
>
> --
> Jeff Jirsa
>
> On Jan 7, 2018, at 7:22 PM, kurt greaves <k...@instaclustr.com> wrote:
>
> None are essential. Cassandra will gracefully shut down in any scenario as
> long as it's not killed with a SIGKILL. However, drain does have a few
> benefits over just a normal shutdown. It will stop a few extra services
> (batchlog, compactions) and importantly it will also force recycling of dirty
> commitlog segments, meaning there will be fewer commitlog files to replay on
> startup, which reduces startup time.
>
> A comment in the code for drain also indicates that it will wait for
> in-progress streaming to complete, but I haven't managed to find 1. where
> this occurs, or 2. if it actually differs from a normal shutdown. Note that
> this is all w.r.t. 2.1. In 3.0.10 and 3.10 drain and shutdown more or less do
> the exact same thing, however drain will log some extra messages.
>
> On 2 January 2018 at 07:07, Jing Meng <self.rel...@gmail.com> wrote:
> Hi all.
>
> Recently we made a change to our production env c* cluster (2.1.18) - placing
> the commit log on the same SSD where data is stored, which needs restarting
> all nodes.
>
> Before restarting a cassandra node, we ran the following nodetool utils:
> $ nodetool disablethrift && sleep 5
> $ nodetool disablebinary && sleep 5
> $ nodetool disablegossip && sleep 5
> $ nodetool drain && sleep 5
>
> It was "graceful" as expected (no significant errors found), but the process
> is still a mystery to us: are those commands used above "sufficient", and/or
> why? The official doc (docs.datastax.com) did not help with this operation
> detail, though "nodetool drain" is apparently essential.
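
For reference, the sequence discussed in this thread (stop client transports, leave gossip, pause, flush, drain, pause) can be sketched as a small shell function. This is only an illustration, not an official procedure: the function names `run` and `graceful_stop`, the `DRY_RUN` switch, and the default pause of 10 seconds are assumptions made for the example. Only the nodetool subcommands themselves come from the messages above.

```shell
#!/bin/sh
# Sketch of the pre-restart sequence described in the thread.
# Hypothetical helpers: run, graceful_stop, and the DRY_RUN variable
# are illustrative names, not part of Cassandra or nodetool.

# Run a command, or only print it when DRY_RUN=1 (useful for review).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: seconds to pause between phases (assumed default: 10).
graceful_stop() {
    pause="${1:-10}"
    # Stop accepting native-protocol client connections.
    run nodetool disablebinary || return 1
    # Leave gossip so the other nodes see this one as down before it stops.
    run nodetool disablegossip || return 1
    # Give in-flight requests a moment to finish.
    run sleep "$pause"
    # Explicit flush first, then drain, as in Alain's command.
    run nodetool flush || return 1
    run nodetool drain || return 1
    run sleep "$pause"
}
```

With `DRY_RUN=1` the function only prints the commands it would run, which makes it easy to check the order before touching a node. A real restart would then look something like `graceful_stop 10 && sudo service cassandra restart`, mirroring the one-liner quoted above. On 2.1-era clusters that still serve Thrift clients, a `nodetool disablethrift` step could be added before `disablebinary`, as in the original question.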