Re: Cassandra single unreachable node causing total cluster outage

2018-12-02 Thread Marc Selwan
Ben's question is a good one: what are the exact symptoms you're experiencing? Is it latency spikes? Nodes flapping? That'll help us figure out where to look. When you removed the down node, which command did you use? Best, Marc On Sun, Dec 2, 2018 at 1:36 PM Agrawal, Pratik wrote: > One

Re: Cassandra single unreachable node causing total cluster outage

2018-12-02 Thread Agrawal, Pratik
I looked into some of the logs and saw that, at the time of the event, native transport requests started getting blocked, e.g. [INFO] org.apache.cassandra.utils.StatusLogger: Native-Transport-Requests 128 133 5179582116 19114 The number of blocked requests
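A small sketch for reading StatusLogger thread-pool lines like the one quoted above. It assumes the numeric columns follow the StatusLogger header order (Pool Name, Active, Pending, Completed, Blocked, All Time Blocked); the helper name `parse_pool_line` is hypothetical. Under that assumption, the Active count of 128 equals the `native_transport_max_threads` value mentioned in this thread, which would mean every native transport thread was busy while 133 requests waited. Columns cut off by truncation come back as `None`.

```python
# Hypothetical helper: split a Cassandra StatusLogger thread-pool line into named columns.
# ASSUMPTION: numeric columns follow the StatusLogger header order
# "Active, Pending, Completed, Blocked, All Time Blocked" after the pool name;
# columns missing from a truncated line are reported as None.
STATUS_COLUMNS = ("active", "pending", "completed", "blocked", "all_time_blocked")


def parse_pool_line(line: str) -> dict:
    """Parse '... StatusLogger: <pool> <n> <n> ...' into a dict of ints."""
    # Keep only the text after the logger name; if absent, use the whole line.
    _, _, body = line.rpartition("StatusLogger:")
    fields = body.split()
    pool, numbers = fields[0], [int(f) for f in fields[1:]]
    result = {"pool": pool}
    for i, name in enumerate(STATUS_COLUMNS):
        result[name] = numbers[i] if i < len(numbers) else None
    return result


if __name__ == "__main__":
    line = ("[INFO] org.apache.cassandra.utils.StatusLogger: "
            "Native-Transport-Requests 128 133 5179582116 19114")
    stats = parse_pool_line(line)
    max_threads = 128  # native_transport_max_threads value quoted in this thread
    print(stats)
    print("all threads busy:", stats["active"] >= max_threads)
```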

Re: Cassandra single unreachable node causing total cluster outage

2018-12-02 Thread Agrawal, Pratik
One other thing I forgot to add: native_transport_max_threads: 128 We have this setting commented out; should we bound it? I am planning to experiment with bounding this setting. Thanks, Pratik From: "Agrawal, Pratik" Date: Sunday, December 2, 2018 at 4:33 PM To:
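When a setting is commented out in cassandra.yaml, Cassandra falls back to its built-in default. The sketch below reports the effective value, assuming the default for `native_transport_max_threads` is 128 (the value shown in the commented line) and that the key appears at the top level as a single `key: value` line. The function `effective_max_threads` is a hypothetical helper, not part of Cassandra. If the assumption holds, a commented-out setting is already bounded at 128 rather than unbounded.

```python
# Hypothetical helper: report the effective native_transport_max_threads value
# from the text of a cassandra.yaml file.
# ASSUMPTIONS: the key sits at the top level as "key: value" on one line, and
# when it is absent or commented out Cassandra uses a default of 128.
import re

DEFAULT_NATIVE_TRANSPORT_MAX_THREADS = 128
# Matches an uncommented top-level line, optionally followed by a trailing comment.
_KEY_RE = re.compile(r"^native_transport_max_threads\s*:\s*(\d+)\s*(?:#.*)?$")


def effective_max_threads(yaml_text: str) -> int:
    """Return the configured value, or the assumed default if unset or commented out."""
    for raw in yaml_text.splitlines():
        match = _KEY_RE.match(raw)  # lines starting with '#' never match
        if match:
            return int(match.group(1))
    return DEFAULT_NATIVE_TRANSPORT_MAX_THREADS


if __name__ == "__main__":
    commented = "# native_transport_max_threads: 128\n"
    bounded = "native_transport_max_threads: 256  # explicitly bounded\n"
    print(effective_max_threads(commented))
    print(effective_max_threads(bounded))
```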

Cassandra Upgrade Plan 2.2.4 to 3.11.3

2018-12-02 Thread Devaki, Srinivas
Hi everyone, I have planned out our org's Cassandra upgrade plan and want to make sure it looks fine. Details Existing Cluster: * Cassandra 2.2.4 * 8 nodes with 32G RAM and 12G max heap allocated to Cassandra * 4 nodes in each rack 1. Ensured all clients use LOCAL_* consistency levels and

Re: Problem with restoring a snapshot using sstableloader

2018-12-02 Thread Alex Ott
It's a bug in sstableloader introduced many years ago; before that, it worked as described in the documentation... Oliver Herrmann at "Fri, 30 Nov 2018 17:05:43 +0100" wrote: OH> Hi, OH> I'm having some problems restoring a snapshot using sstableloader. I'm using Cassandra 3.11.1 and

Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-02 Thread Jeff Jirsa
> On Dec 2, 2018, at 12:40 PM, Shravan R wrote: > > Marc/Dimitry/Jon - greatly appreciate your feedback. I will look into the > version part that you suggested. The reason to go direct to 3.x is to take a > big leap and reduce overall effort to upgrade a large cluster (development >