Re: Question upon gracefully restarting c* node(s)

Jeff Jirsa Wed, 10 Jan 2018 08:35:16 -0800

Shutdown (drain, rather) does all of those things, but it’s not very patient: 
it doesn’t sleep between steps (and there’s no setup time, such as the 
reconnect that every invocation of nodetool incurs), so things shut down in 
rapid succession, which may have client-visible impact.




-- 
Jeff Jirsa


> On Jan 10, 2018, at 6:20 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> 
> wrote:
> 
> Just curious - aside from the "sleep", is this all not part of the shutdown 
> command?
> Is this an "opportunity" to improve C*?
> Having worked with RDBMSes, Hadoop and HBase, I have seen that stopping 
> communication, flushing memcache (HBase), and relinquishing ownership of 
> data (HBase) are all part of the shutdown process.
>  
>  
> From: Alain RODRIGUEZ <arodr...@gmail.com>
> Date: Wednesday, January 10, 2018 at 6:19 AM
> To: "user cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Question upon gracefully restarting c* node(s)
>  
> I agree with the comments above. Cassandra is robust, and we are just 
> talking about optimising the process. Nothing is mandatory. Going to an 
> extreme, I would say you can pull the node's power cable, plug it back in, 
> and call it a restart. It should not do harm if your cluster is properly 
> tuned. Yet optimisations are welcome, as they reduce entropy and startup 
> time. Plus we are civilized operators, not barbarians, aren't we ;-)? It's 
> just more 'clean' and efficient. 
> Also, historically, it was mandatory to drain when using counters to prevent 
> over-counting, as counters are not idempotent. (Not sure about this nowadays.)
>  
> Last time I asked this very question I ended up building this command that I 
> have been using since then:
>  
> `date && nodetool disablebinary && nodetool disablegossip && sleep 10 && 
> nodetool flush && nodetool drain && sleep 10 && sudo service cassandra 
> restart`
>  
> It does the following:
>  
> - Print the date for the record
> - Stop all client transports. I never heard of a benefit of shutting 
> down the gossip protocol, and so never did so; it might be better, but I 
> can't really say. This way we stop listening for clients.
> - After a short while no clients are using the node; calling drain then 
> flushes memtables and recycles commitlogs, as Kurt detailed above. Here I add 
> a 'flush' because I haven't been that lucky in the past with drain: sometimes 
> it did not work at all, sometimes it did not clean the commitlogs. I believe 
> flushing first makes this restart command more robust.
> - Finally restart the service.
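
The steps listed above could be wrapped in a small shell function, roughly as 
sketched below. This is only an illustration of the quoted one-liner, not a 
recommended procedure: the names `run`, `graceful_restart`, `DRY_RUN` and 
`PAUSE` are hypothetical and not part of nodetool or Cassandra, and the 
`service cassandra restart` step assumes the same service setup as in the 
quoted command.

```shell
# Sketch only: DRY_RUN and PAUSE are hypothetical knobs, not Cassandra settings.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same order as the quoted one-liner; the && chain stops at the first failure.
graceful_restart() {
    run date &&                          # timestamp for the record
    run nodetool disablebinary &&        # stop the native client transport
    run nodetool disablegossip &&        # announce that the node is going away
    run sleep "${PAUSE:-10}" &&          # let in-flight requests finish
    run nodetool flush &&                # flush memtables explicitly first
    run nodetool drain &&                # then drain
    run sleep "${PAUSE:-10}" &&
    run sudo service cassandra restart
}
```

Calling it with `DRY_RUN=1` set prints the commands instead of running them, 
which is a cheap way to check the order before touching a production node.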
>  
> I think there is more than one good way to do this. Also, doing it wrong is 
> often not such a big deal.
>  
> C*heers,
> -----------------------
> Alain Rodriguez - @arodream - al...@thelastpickle.com
> France / Spain
>  
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>  
>  
>  
>  
>  
> 2018-01-08 3:33 GMT+00:00 Jeff Jirsa <jji...@gmail.com>:
> The sequence does have some objective benefits, especially stopping 
> transports and then gossip: it tells everything you’re going offline before 
> you do, so requests won’t get dropped or have to speculate to other replicas. 
>  
>  
> 
> -- 
> Jeff Jirsa
>  
> 
> On Jan 7, 2018, at 7:22 PM, kurt greaves <k...@instaclustr.com> wrote:
> 
> None are essential. Cassandra will shut down gracefully in any scenario as 
> long as it's not killed with a SIGKILL. However, drain does have a few 
> benefits over just a normal shutdown. It will stop a few extra services 
> (batchlog, compactions) and, importantly, it will also force recycling of 
> dirty commitlog segments, meaning there will be fewer commitlog files to 
> replay on startup, which reduces startup time.
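
One rough way to observe the effect described above is to count commitlog 
segment files before and after running `nodetool drain`. The sketch below 
rests on assumptions: the default directory `/var/lib/cassandra/commitlog` 
and the `CommitLog-*.log` naming pattern may not match a given install, and 
`count_commitlog_segments` is a hypothetical helper, not a Cassandra tool.

```shell
# Sketch only: the default directory and the file name pattern are assumptions.
count_commitlog_segments() {
    dir="${1:-/var/lib/cassandra/commitlog}"
    count=0
    for f in "$dir"/CommitLog-*.log; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Running it once before and once after the drain gives two numbers to compare. 
Whether the count actually drops depends on the Cassandra version and on how 
segments are recycled there.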
>  
> A comment in the code for drain also indicates that it will wait for 
> in-progress streaming to complete, but I haven't managed to find 1. where 
> this occurs, or 2. if it actually differs from a normal shutdown. Note that 
> this is all w.r.t. 2.1. In 3.0.10 and 3.10, drain and shutdown more or less 
> do the exact same thing; however, drain will log some extra messages.
>  
> On 2 January 2018 at 07:07, Jing Meng <self.rel...@gmail.com> wrote:
> Hi all.
>  
> Recently we made a change to our production env c* cluster (2.1.18): placing 
> the commit log on the same SSD where data is stored, which requires 
> restarting all nodes. 
>  
> Before restarting a cassandra node, we ran the following nodetool commands:
> $ nodetool disablethrift && sleep 5
> $ nodetool disablebinary && sleep 5
> $ nodetool disablegossip && sleep 5
> $ nodetool drain && sleep 5
>  
> It was "graceful" as expected (no significant errors found), but the process 
> is still a mystery to us: are the commands used above "sufficient", and if 
> so, why? The official doc (docs.datastax.com) did not help with this 
> operational detail, though "nodetool drain" is apparently essential.
>  
>
