nodetool status inconsistencies, repair performance and system keyspace compactions

Ondřej Černoš Tue, 26 Mar 2013 06:36:48 -0700

Hi all,

I have 2 DCs, 3 nodes each, RF:3. I use local quorum for both reads and writes.


I am currently testing various operational qualities of the setup.

During one of my tests - see this thread in this mailing list:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
- I ran into this situation:

- all nodes have all data and agree on it:

[user@host1-dc1:~] nodetool status

Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

[user@host1-dc2:~] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
Datacenter: na-prod
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10

I tried rebuilding the node from scratch and repairing it, with no effect:
the Owns stats are still the same.
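
As a quick sanity check on the two outputs above, the per-node Owns
values reported by the disagreeing node can be added up. This is only a
sketch in Python: the percentages are copied by hand from the listing,
and the variable names are illustrative.

```python
# Owns percentages copied by hand from the output of the disagreeing node.
owns_us_east = [17.6, 16.4, 15.7]
owns_na_prod = [16.9, 17.1, 16.3]

# Round to one decimal place to hide floating-point noise in the sum.
total = round(sum(owns_us_east) + sum(owns_na_prod), 1)
print(f"total ownership across both DCs: {total}%")
```

The six values add up to 100%, which suggests that this node reports
plain token ownership across the whole ring, while the other nodes
report effective ownership (assuming RF:3 means 3 replicas per DC, every
node in a 3-node DC holds a full copy, hence 100.0% each). This is a
reading of the numbers, not a confirmed diagnosis.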

The cluster is built from Cassandra 1.2.3 and uses vnodes.


On a related note: the data size, as you can see, is really small.
The cluster was created by setting up the us-east datacenter,
populating it with the dataset, then building the na-prod datacenter
and running nodetool rebuild us-east. When I tried to run nodetool
repair it took 25 minutes to finish, on this small dataset. Is this
ok?

One other thing I noticed is the number of compactions on the system keyspace:

/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db

This is just after running the repair. Is this ok, considering the
dataset is 7MB and during the repair no operations were running
against the database, neither read, nor write, nothing?

How will this perform in production with much bigger data if repair
takes 25 minutes on 7MB and 11k compactions were triggered by the
repair run?
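
The "11k" figure is read off the generation number embedded in the
SSTable file names listed above. A minimal Python sketch of extracting
it, assuming the 1.2-style naming
<keyspace>-<table>-<version>-<generation>-<component>; the helper name
sstable_generation is hypothetical, and the path prefix is left elided
as in the listing.

```python
import os

def sstable_generation(path):
    """Return the generation number from an SSTable component file name.

    Assumes names of the form
    <keyspace>-<table>-<version>-<generation>-<component>.<ext>.
    """
    name = os.path.basename(path)
    stem = name.split(".", 1)[0]      # drop the file extension
    return int(stem.split("-")[-2])   # generation is the second-to-last field

files = [
    "/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt",
    "/.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db",
]
generations = [sstable_generation(f) for f in files]
print(generations)
```

As far as I understand, the generation increases with every SSTable
written for the table, flushes included, so it is an upper bound on the
number of compactions rather than an exact count.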

regards,

Ondrej Cernos
