Re: failure node rejoin

2016-10-16 Thread Ben Slater
To Cassandra, the node where you deleted the files looks like a brand new
machine. Cassandra doesn’t rebuild such a machine automatically, so that a
node is never replaced by accident. You need to tell it to build the “new”
machine as a replacement for the “old” machine with that IP by setting
-Dcassandra.replace_address_first_boot=<IP address of the old machine>.
See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
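
As a minimal sketch of the setting described above, the option can be passed to the JVM by appending it to JVM_OPTS, for example in conf/cassandra-env.sh, before the wiped node is started for the first time. The address 10.0.0.2 and the variable OLD_NODE_IP are hypothetical placeholders, not values from this thread; substitute the IP of the node being replaced.

```shell
# Hypothetical address of the node being replaced (an assumption for illustration).
OLD_NODE_IP="10.0.0.2"

# Append the replacement option to the JVM options, as one would in
# conf/cassandra-env.sh before the first start of the wiped node.
JVM_OPTS="${JVM_OPTS:-} -Dcassandra.replace_address_first_boot=${OLD_NODE_IP}"

echo "$JVM_OPTS"
```

Because the option name ends in first_boot, it is meant to take effect only on the node's first start, so it should not trigger another replacement on later restarts.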

Cheers
Ben

On Mon, 17 Oct 2016 at 16:41 Yuji Ito wrote:

> Hi all,
>
> A failed node can rejoin a cluster.
> On that node, all data in /var/lib/cassandra had been deleted.
> Is this normal?
>
> I can reproduce it as follows.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
> ($ sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node2 while C* on node1 and node3 is still running
> (that is, skipping steps 2) and 3)), a runtime exception is thrown:
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure whether this causes data loss. All data can be read properly
> just after this rejoin.
> But some rows are lost when I kill C* for destructive tests after
> this rejoin.
>
> Thanks.
>
> --

Ben Slater
Chief Product Officer


failure node rejoin

2016-10-16 Thread Yuji Ito
Hi all,

A failed node can rejoin a cluster.
On that node, all data in /var/lib/cassandra had been deleted.
Is this normal?

I can reproduce it as follows.

cluster:
- C* 2.2.7
- a cluster has node1, 2, 3
- node1 is a seed
- replication_factor: 3

how to:
1) stop C* process and delete all data in /var/lib/cassandra on node2
($ sudo rm -rf /var/lib/cassandra/*)
2) stop C* process on node1 and node3
3) restart C* on node1
4) restart C* on node2
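
The four steps above can be sketched as a dry-run script that only prints the commands it would issue. It assumes the hosts are reachable over ssh under the names node1, node2 and node3, and that Cassandra is installed as a service named "cassandra"; neither detail is stated in this report, and the helper names run and repro_steps are made up for illustration.

```shell
#!/bin/sh
# Dry run: print each command instead of executing it.
# Assumptions (not from the report): ssh host names node1..node3 and a
# service named "cassandra".
run() { echo "ssh $1 '$2'"; }

repro_steps() {
  # 1) stop C* and delete all data on node2
  run node2 "sudo service cassandra stop"
  run node2 "sudo rm -rf /var/lib/cassandra/*"
  # 2) stop C* on node1 and node3
  run node1 "sudo service cassandra stop"
  run node3 "sudo service cassandra stop"
  # 3) restart C* on node1 (the seed)
  run node1 "sudo service cassandra start"
  # 4) restart C* on node2
  run node2 "sudo service cassandra start"
}

repro_steps
```

Keeping the script as a dry run avoids deleting data by accident; the printed lines can be reviewed and then executed by hand on a test cluster.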

nodetool status after 4):
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
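
A status listing like the one above can also be checked mechanically. The following is a rough sketch that feeds the three node rows from this report to awk and prints the host ID of every node that is not in the UN (Up/Normal) state; the variable status_output and the function not_up_normal are hypothetical names, and on a live cluster the input would come from nodetool status instead of a fixed string.

```shell
# Sample input: the node rows from the status output in this report.
status_output='DN  [node3 IP]  ?          256  100.0%  325553c6-3e05-41f6-a1f7-47436743816f  rack1
UN  [node2 IP]  7.76 MB    256  100.0%  05bdb1d4-c39b-48f1-8248-911d61935925  rack1
UN  [node1 IP]  416.13 MB  256  100.0%  a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1'

# Print the host ID (second-to-last field) of every row whose first field
# is a status/state flag other than UN.
not_up_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $(NF-1) }'
}

printf '%s\n' "$status_output" | not_up_normal
```

For the rows in this report it prints only the host ID of node3, the one node shown as DN.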

If I restart C* on node2 while C* on node1 and node3 is still running
(that is, skipping steps 2) and 3)), a runtime exception is thrown:
RuntimeException: "A node with address [node2 IP] already exists,
cancelling join..."

I'm not sure whether this causes data loss. All data can be read properly
just after this rejoin.
But some rows are lost when I kill C* for destructive tests after
this rejoin.

Thanks.