Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada
Hi, FYI: I created a bug ticket since I think the behavior is just not right. https://issues.apache.org/jira/browse/CASSANDRA-15138 Thanks, Hiro On Mon, May 13, 2019 at 10:58 AM Hiroyuki Yamada wrote: > Hi, > > Should I file a bug? > It doesn't seem to be expected behavior, > so I think

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-05-12 Thread Hiroyuki Yamada
Hi, Should I file a bug? It doesn't seem to be expected behavior, so I think it should at least be documented somewhere. Thanks, Hiro On Fri, Apr 26, 2019 at 3:17 PM Hiroyuki Yamada wrote: > Hello, > > Thank you for the feedback. > > >Ben > Thank you. > I've tested with lower

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-26 Thread Hiroyuki Yamada
Hello, Thank you for the feedback. >Ben Thank you. I've tested with lower concurrency on my side, and the issue still occurs. We are using 3 x T3.xlarge instances for C* and a small, separate instance for the client program. But when we tried 1 host running 3 C* nodes, the issue didn't occur. >

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-25 Thread Alok Dwivedi
Could it be related to hinted handoffs being stored on Node1 and then replayed to Node2 when it comes back, causing more load because new mutations are also being applied from cassandra-stress at the same time? Alok Dwivedi Senior Consultant https://www.instaclustr.com/ > On 26 Apr

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-25 Thread Ben Slater
In the absence of anyone else having any bright ideas - it still sounds to me like the kind of scenario that can occur in a heavily overloaded cluster. I would try again with a lower load. What size machines are you using for the stress client and the nodes? Are they all on separate machines? Cheers

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-25 Thread Hiroyuki Yamada
Hello, Sorry again. We found yet another weird thing in this. If we stop nodes with systemctl or just kill (TERM), it causes the problem, but if we kill -9, it doesn't cause the problem. Thanks, Hiro On Wed, Apr 24, 2019 at 11:31 PM Hiroyuki Yamada wrote: > Sorry, I didn't write the version

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Sorry, I didn't write the version and the configurations. I've tested with C* 3.11.4, and the configurations are mostly set to default except for the replication factor and listen_address for proper networking. Thanks, Hiro On Wed, Apr 24, 2019 at 5:12 PM Hiroyuki Yamada wrote: > Hello Ben, >

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Hello Ben, Thank you for the quick reply. I haven't tried that case, but it doesn't recover even if I stop the stress. Thanks, Hiro On Wed, Apr 24, 2019 at 3:36 PM Ben Slater wrote: > Is it possible that stress is overloading node 1 so it’s not recovering > state properly when node 2 comes

Re: A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Ben Slater
Is it possible that stress is overloading node 1 so it’s not recovering state properly when node 2 comes up? Have you tried running with a lower load (say 2 or 3 threads)? Cheers Ben --- *Ben Slater* *Chief Product Officer*

A cluster (RF=3) not recovering after two nodes are stopped

2019-04-24 Thread Hiroyuki Yamada
Hello, I faced a weird issue when recovering a cluster after two nodes are stopped. It is easily reproducible and looks like a bug or an issue to fix, so let me write down the steps to reproduce. === STEPS TO REPRODUCE === * Create a 3-node cluster with RF=3 - node1(seed), node2, node3 *