Hi all,

I have 2 DCs with 3 nodes each, RF:3, and I use local quorum for both reads and writes.
I am currently testing various operational qualities of the setup. During one of my tests (see this thread on this mailing list: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html) I ran into this situation:

- all nodes have all data and agree on it:

    [user@host1-dc1:~] nodetool status
    Datacenter: na-prod
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
    UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
    UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
    UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
    Datacenter: us-east
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d

- only one node disagrees:

    [user@host1-dc2:~] nodetool status
    Datacenter: us-east
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load     Tokens  Owns   Host ID                               Rack
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
    UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
    Datacenter: na-prod
    ===================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address          Load     Tokens  Owns   Host ID                               Rack
    UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
    UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
    UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10

I tried to rebuild the node
from scratch and to repair the node, with no result: the owns stats are still the same. The cluster runs Cassandra 1.2.3 and uses vnodes.

On a related note: the data size, as you can see, is really small. The cluster was created by setting up the us-east datacenter, populating it with the dataset, then building the na-prod datacenter and running nodetool rebuild us-east. When I tried to run nodetool repair, it took 25 minutes to finish on this small dataset. Is this OK?

One other thing I noticed is the number of compactions on the system keyspace:

    /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
    /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db

This is just after running the repair. Is this OK, considering that the dataset is 7 MB and that no operations, neither reads nor writes, were running against the database during the repair?

How will this perform in production with much bigger data if repair takes 25 minutes on 7 MB and 11k compactions were triggered by the repair run?

regards,
Ondrej Cernos
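
As a quick arithmetic check on the two nodetool status outputs above, here is a small Python sketch that sums each ownership column. The percentages are copied from the post. The reading suggested in the comments (that the plain "Owns" column is each node's share of the token ring before replication, while "Owns (effective)" includes replicas) is an assumption, not something confirmed in this thread.

```python
# Percentages copied from the two `nodetool status` outputs in this post.
# "Owns (effective)" as reported on host1-dc1: six nodes at 100.0% each.
effective = [100.0] * 6
# "Owns" as reported on host1-dc2 (us-east nodes first, then na-prod).
raw = [17.6, 16.4, 15.7, 16.9, 17.1, 16.3]


def total(percentages):
    """Sum a column of ownership percentages, rounded to one decimal place."""
    return round(sum(percentages), 1)


effective_total = total(effective)
raw_total = total(raw)

# Assumption: if the raw column sums to 100%, it may be plain token-ring
# ownership (no replication applied) rather than a different data layout.
print(f"Owns (effective) total: {effective_total}%")
print(f"Owns total:             {raw_total}%")
```

If the second column adds up to exactly one full ring, the two outputs might be describing the same token distribution in two different ways rather than an actual disagreement about the data.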
