On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> No I set Auto bootstrap to true and the node was UN in nodetool status but
> when doing a select on the node with ONE I got incomplete data.
>

What I think is happening here is not related to the new node being added.

Increasing the Replication Factor does not automatically redistribute the
existing data.  It only makes additional nodes responsible for portions of
the data they might not actually have yet.  So I would expect all your
nodes to show some inconsistencies until you run a full repair of the ring.

I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
3.0.13.

$ ccm status
Cluster: 'v3013'
----------------
node1: UP
node3: UP
node2: UP

$ ccm node1 cqlsh
cqlsh> create keyspace test_rf WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};
cqlsh> create table test_rf.t1(id int, data text, primary key(id));
cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

At this point selecting from t1 works correctly on any of the nodes with
the default CL=ONE.

If we now increase the RF and try reading again, something surprising
happens:

cqlsh> alter keyspace test_rf WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 2};
cqlsh> select * from test_rf.t1;

 id | data
----+------

(0 rows)

In my test this happens on all nodes at the same time.  The explanation is
fairly simple: an additional node is now responsible for data that was
previously written to only one other node, and a read at CL=ONE can be
served by that new replica, which does not have the row yet.
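
The effect can be sketched with a toy model in Python.  This is only an
illustration under loud assumptions: the three node names match the ccm
cluster above, but the replicas() placement function, the ToyCluster class
and the explicit replica_index argument are all hypothetical.  It does not
model Cassandra's real token ring, snitch or read path; it only shows that
changing the RF changes ownership without moving any data.

```python
# Toy model only: NOT Cassandra's real placement or read path.
NODES = ["node1", "node2", "node3"]


def replicas(key, rf):
    """Return the rf nodes responsible for key, walking the ring in order."""
    start = key % len(NODES)  # hypothetical stand-in for token hashing
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]


class ToyCluster:
    def __init__(self, rf):
        self.rf = rf
        self.storage = {node: {} for node in NODES}

    def write(self, key, value):
        # A write only reaches the replicas that own the key right now.
        for node in replicas(key, self.rf):
            self.storage[node][key] = value

    def read_one(self, key, replica_index):
        """CL=ONE: ask a single replica; replica_index picks which one."""
        node = replicas(key, self.rf)[replica_index]
        return self.storage[node].get(key)

    def read_all(self, key):
        """CL=ALL: ask every replica and copy the value to those lacking it."""
        owners = replicas(key, self.rf)
        found = [self.storage[n][key] for n in owners if key in self.storage[n]]
        if not found:
            return None
        value = found[0]  # real Cassandra reconciles by write timestamp
        for node in owners:
            self.storage[node][key] = value  # simplified read repair
        return value


cluster = ToyCluster(rf=1)
cluster.write(1, "one")
cluster.rf = 2  # like ALTER KEYSPACE: ownership changes, no data moves

before = [cluster.read_one(1, i) for i in range(2)]
repaired = cluster.read_all(1)
after = [cluster.read_one(1, i) for i in range(2)]
print(before, repaired, after)
```

In this sketch the new replica returns nothing until a read that touches
every replica has copied the row over, which is the same pattern as in the
cqlsh session.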

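A repair in this tiny test is trivial.  Reading with CL=ALL queries both
replicas, and the resulting read repair copies the row to the one that is
missing it: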
cqlsh> CONSISTENCY ALL;
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

And now the data can be read from any node again, since for this single row
we effectively did a "full repair".  A real cluster needs an actual repair
(nodetool repair) instead.

--
Alex

[1] https://github.com/pcmanus/ccm
