You could certainly log a JIRA for the “failure node rejoin” issue (https://issues.apache.org/jira/browse/cassandra). It sounds like unexpected behaviour to me. However, I’m not sure it will be viewed as a high priority to fix given there is a clear operational work-around.
Cheers
Ben

On Thu, 24 Nov 2016 at 15:14 Yuji Ito <y...@imagine-orb.com> wrote:

> Hi Ben,
>
> I continue to investigate the data loss issue. I'm investigating logs and
> source code and trying to reproduce the data loss issue with a simple test.
> I'm also trying my destructive test with DROP instead of TRUNCATE.
>
> BTW, I want to discuss the issue of the title "failure node rejoin" again.
>
> Will this issue be fixed? Other nodes should refuse this unexpected rejoin.
> Or should I be more careful when adding failure nodes to the existing cluster?
>
> Thanks,
> yuji
>
>
> On Fri, Nov 11, 2016 at 1:00 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> From a quick look I couldn’t find any defects other than the ones you’ve
> found that seem potentially relevant to your issue (if anyone else on the
> list knows of one please chime in). Maybe the next step, if you haven’t
> done so already, is to check your Cassandra logs for any signs of issues
> (ie WARNING or ERROR logs) in the failing case.
>
> Cheers
> Ben
>
> On Fri, 11 Nov 2016 at 13:07 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried 2.2.8 and could reproduce the problem. So I'm investigating some
> bug fixes of repair and commitlog between 2.2.8 and 3.0.9:
>
> - CASSANDRA-12508: "nodetool repair returns status code 0 for some errors"
> - CASSANDRA-12436: "Under some races commit log may incorrectly think it
>   has unflushed data"
>   - related to CASSANDRA-9669 and CASSANDRA-11828 (is the fix for 2.2
>     different from the fix for 3.0?)
>
> Do you know of other bug fixes related to the commitlog?
>
> Regards
> yuji
>
> On Wed, Nov 9, 2016 at 11:34 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> There have been a few commit log bugs around in the last couple of months,
> so perhaps you’ve hit something that was fixed recently. It would be
> interesting to know whether the problem is still occurring in 2.2.8.
>
> I suspect what is happening is that when you do your initial read (without
> flush) to check the number of rows, the data is in memtables and
> theoretically the commitlogs but not sstables. With the forced stop the
> memtables are lost and Cassandra should read the commitlog from disk at
> startup to reconstruct the memtables. However, it looks like that didn’t
> happen for some (bad) reason.
>
> Good news that 3.0.9 fixes the problem, so it's up to you if you want to
> investigate further and see if you can narrow it down to file a JIRA
> (although the first step of that would be trying 2.2.9 to make sure it’s
> not already fixed there).
>
> Cheers
> Ben
>
> On Wed, 9 Nov 2016 at 12:56 Yuji Ito <y...@imagine-orb.com> wrote:
>
> I tried C* 3.0.9 instead of 2.2. The data loss problem hasn't happened so
> far (without `nodetool flush`).
>
> Thanks
>
> On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> When I added `nodetool flush` on all nodes after step 2, the problem
> didn't happen. Did replay from old commit logs delete rows?
>
> Perhaps the flush operation just detected that some nodes were down in
> step 2 (just after truncating tables). (Insertion and the check in step 2
> would succeed if one node was down because the consistency level was
> serial. If the flush failed on more than one node, the test would retry
> step 2.) However, if so, the problem would happen without deleting
> Cassandra data.
>
> Regards,
> yuji
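To make the flush work-around discussed above concrete, here is a minimal sketch of forcing memtables to sstables on every node after the insert/check step, so that the forced stop no longer depends on commitlog replay. The node names, SSH access, and the keyspace name are assumptions about the test environment, not details taken from the thread.

#!/bin/bash
# Flush memtables to sstables on every node after the insert step, so the
# later forced stop does not rely on commitlog replay to recover the rows.
# NODES and the keyspace name are placeholders for the real test cluster.
NODES="node1 node2 node3"
for node in $NODES; do
    ssh "$node" "nodetool flush testkeyspace" \
        || echo "WARNING: flush failed on $node (node may be down)"
done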
> On Mon, Oct 24, 2016 at 8:37 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> Definitely sounds to me like something is not working as expected, but I
> don’t really have any idea what would cause that (other than the fairly
> extreme failure scenario). A couple of things I can think of to try to
> narrow it down:
> 1) Run nodetool flush on all nodes after step 2 - that will make sure all
> data is written to sstables rather than relying on commit logs
> 2) Run the test with consistency level quorum rather than serial
> (shouldn’t be any different, but quorum is more widely used so maybe there
> is a bug that’s specific to serial)
>
> Cheers
> Ben
>
> On Mon, 24 Oct 2016 at 10:29 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Hi Ben,
>
> The test without killing nodes has been working well without data loss.
> I've repeated my test about 200 times after removing data and
> rebuild/repair.
>
> Regards,
>
>
> On Fri, Oct 21, 2016 at 3:14 PM, Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Just to confirm, are you saying:
> > a) after operation 2, you select all and get 1000 rows
> > b) after operation 3 (which only does updates and read) you select and
> > only get 953 rows?
>
> That's right!
>
> I've started the test without killing nodes. I'll report the result to
> you next Monday.
>
> Thanks
>
>
> On Fri, Oct 21, 2016 at 3:05 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> Just to confirm, are you saying:
> a) after operation 2, you select all and get 1000 rows
> b) after operation 3 (which only does updates and read) you select and
> only get 953 rows?
>
> If so, that would be very unexpected. If you run your tests without
> killing nodes do you get the expected (1,000) rows?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 17:00 Yuji Ito <y...@imagine-orb.com> wrote:
>
> > Are you certain your tests don’t generate any overlapping inserts (by
> > PK)?
>
> Yes. Operation 2) also checks the number of rows just after all
> insertions.
>
>
> On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> OK. Are you certain your tests don’t generate any overlapping inserts (by
> PK)? Cassandra basically treats any inserts with the same primary key as
> updates (so 1000 insert operations may not necessarily result in 1000 rows
> in the DB).
>
> On Fri, 21 Oct 2016 at 16:30 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> > 1) At what stage did you have (or expect to have) 1000 rows (and have
> > the mismatch between actual and expected) - at the end of operation (2)
> > or after operation (3)?
>
> After operation 3), at operation 4), which reads all rows by cqlsh with
> CL.SERIAL.
>
> > 2) What replication factor and replication strategy is used by the test
> > keyspace? What consistency level is used by your operations?
>
> - create keyspace testkeyspace WITH REPLICATION =
>   {'class':'SimpleStrategy','replication_factor':3};
> - consistency level is SERIAL
>
>
> On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> Cheers
> Ben
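As a point of reference, here is a minimal sketch of the keyspace definition quoted above together with a read check run at QUORUM instead of SERIAL, per suggestion 2). The contact point address and the table name 'testtable' are placeholders, not details from the thread.

#!/bin/bash
# Recreate the RF=3 SimpleStrategy test keyspace and count rows at QUORUM.
# 10.0.0.1 and 'testtable' are placeholders for the real contact point/table.
cqlsh 10.0.0.1 <<'EOF'
CREATE KEYSPACE IF NOT EXISTS testkeyspace
  WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':3};
-- session-level consistency settings in cqlsh
CONSISTENCY QUORUM;
SERIAL CONSISTENCY SERIAL;
SELECT count(*) FROM testkeyspace.testtable;
EOF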
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
> Thanks Ben,
>
> I tried to run a rebuild and repair after the failure node rejoined the
> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
> The failure node could rejoin and I could read all rows successfully.
> (Sometimes a repair failed because the node could not access another node;
> if it failed, I retried the repair.)
>
> But some rows were lost after my destructive test was repeated (after about
> 5-6 hours). After the test inserted 1000 rows, there were only 953 rows at
> the end of the test.
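For completeness, a rough sketch of the rejoin-as-a-"new"-node work-around described above, assuming a package install with default paths. The dead node's IP address, the data directories, and the keyspace name are placeholders; the repair retry loop mirrors the retry-on-failure behaviour mentioned in the thread.

#!/bin/bash
# Bring the failed node back as a replacement for its old identity, then repair.
# Paths, the node's IP (10.0.0.3), and the keyspace name are assumptions.

# 1. Wipe the failed node's old state so it bootstraps as a replacement.
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*

# 2. Replace the old address on first boot only.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=10.0.0.3"' \
    | sudo tee -a /etc/cassandra/cassandra-env.sh

# 3. Start the node, then repair once it is back in the ring, retrying on
#    failure as described above.
sudo service cassandra start
until nodetool repair testkeyspace; do
    echo "repair failed; retrying in 60s"
    sleep 60
done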