Hi, most of this has been resolved - the FAILED_TO_UNCOMPRESS error was really a bug in Cassandra (see https://issues.apache.org/jira/browse/CASSANDRA-5391), and the difference in ownership ("Owns") reporting is a change between 1.2.1 (which reports 100% for the RF:3, 3-nodes-per-DC, 2-DC setup I have) and 1.2.3, which reports the token fraction. Is this correct?
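If I understand it correctly, the two outputs are consistent with effective
versus raw ownership rather than with a bug: with RF:3 and 3 nodes per DC,
every node holds a full replica, so replication-aware "effective" ownership
is 100%, while the raw token fraction across all 6 nodes is roughly 16-17%
per node. A minimal check, assuming nodetool status in 1.2.3 accepts an
optional keyspace argument ("ks" is the keyspace from the repair run below):

    # Without a keyspace, nodetool can only report the raw token fraction;
    # naming the keyspace lets it apply the replication strategy and should
    # report effective ownership - 100.0% per node in this setup.
    nodetool status ks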
Anyway, nodetool repair still takes ages to finish, considering only a few
megabytes of unchanging data are involved in my test:

[root@host:/etc/puppet] nodetool repair ks
[2013-04-04 13:26:46,618] Starting repair command #1, repairing 1536 ranges for keyspace ks
[2013-04-04 13:47:17,007] Repair session 88ebc700-9d1a-11e2-a0a1-05b94e1385c7 for range (-2270395505556181001,-2268004533044804266] finished
...
[2013-04-04 13:47:17,063] Repair session 65d31180-9d1d-11e2-a0a1-05b94e1385c7 for range (1069254279177813908,1070290707448386360] finished
[2013-04-04 13:47:17,063] Repair command #1 finished

This is the status before the repair (by the way, taken after this
datacenter had been bootstrapped from the remote one):

[root@host:/etc/puppet] nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.1%  06ff8328-32a3-4196-a31f-1e0f608d0638  1d
UN  xxx.xxx.xxx.xxx  5.73 MB  256     15.3%  7a96bf16-e268-433a-9912-a0cf1668184e  1d
UN  xxx.xxx.xxx.xxx  5.72 MB  256     17.5%  67a68a2a-12a8-459d-9d18-221426646e84  1d
Datacenter: na-dev
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load     Tokens  Owns   Host ID                               Rack
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.4%  eb86aaae-ef0d-40aa-9b74-2b9704c77c0a  cmp02
UN  xxx.xxx.xxx.xxx  5.74 MB  256     17.0%  cd24af74-7f6a-4eaa-814f-62474b4e4df1  cmp01
UN  xxx.xxx.xxx.xxx  5.74 MB  256     16.7%  1a55cfd4-bb30-4250-b868-a9ae13d81ae1  cmp05

Why does it take 20 minutes to finish? Fortunately the large number of
compactions I reported in the previous email was not triggered this time.

And is there documentation where I could find the exact semantics of repair
when vnodes are used (and what -pr means in such a setup) and when it is run
in a multi-datacenter setup? I still don't quite get it (see the sketch
after the quoted thread below).

regards,

Ondřej Černoš

On Thu, Mar 28, 2013 at 3:30 AM, aaron morton <aa...@thelastpickle.com> wrote:
> During one of my tests - see this thread in this mailing list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
>
> That thread has been updated, check the bug ondrej created.
>
> How will this perform in production with much bigger data if repair
> takes 25 minutes on 7MB and 11k compactions were triggered by the
> repair run?
>
> Seems a little odd.
> See what happens the next time you run repair.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/03/2013, at 2:36 AM, Ondřej Černoš <cern...@gmail.com> wrote:
>
> Hi all,
>
> I have 2 DCs, 3 nodes each, RF:3, and I use local quorum for both reads
> and writes.
>
> Currently I test various operational qualities of the setup.
>
> During one of my tests - see this thread in this mailing list:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/java-io-IOException-FAILED-TO-UNCOMPRESS-5-exception-when-running-nodetool-rebuild-td7586494.html
> - I ran into this situation:
>
> - all nodes have all data and agree on it:
>
> [user@host1-dc1:~] nodetool status
>
> Datacenter: na-prod
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     100.0%            039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
> UN  XXX.XXX.XXX.XXX  7.72 MB  256     100.0%            007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
> Datacenter: us-east
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     100.0%            f53fd294-16cc-497e-9613-347f07ac3850  1d
>
> - only one node disagrees:
>
> [user@host1-dc2:~] nodetool status
> Datacenter: us-east
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     17.6%  a336efae-8d9c-4562-8e2a-b766b479ecb4  1d
> UN  XXX.XXX.XXX.XXX  7.75 MB  256     16.4%  ab1bbf0a-8ddc-4a12-a925-b119bd2de98e  1d
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     15.7%  f53fd294-16cc-497e-9613-347f07ac3850  1d
> Datacenter: na-prod
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load     Tokens  Owns   Host ID                               Rack
> UN  XXX.XXX.XXX.XXX  7.74 MB  256     16.9%  0b1f1d79-52af-4d1b-a86d-bf4b65a05c49  cmp17
> UN  XXX.XXX.XXX.XXX  7.72 MB  256     17.1%  007097e9-17e6-43f7-8dfc-37b082a784c4  cmp11
> UN  XXX.XXX.XXX.XXX  7.73 MB  256     16.3%  039f206e-da22-44b5-83bd-2513f96ddeac  cmp10
>
> I tried to rebuild the node from scratch and to repair it, with no
> results: still the same Owns stats.
>
> The cluster is built from Cassandra 1.2.3 and uses vnodes.
>
> On a related note: the data size, as you can see, is really small. The
> cluster was created by setting up the us-east datacenter, populating it
> with the dataset, then building the na-prod datacenter and running
> nodetool rebuild us-east. When I then ran nodetool repair it took 25
> minutes to finish, on this small dataset. Is this ok?
>
> One other thing I noticed is the number of compactions on the system
> keyspace:
>
> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11694-TOC.txt
> /.../system/schema_columnfamilies/system-schema_columnfamilies-ib-11693-Statistics.db
>
> This is just after running the repair. Is this ok, considering the
> dataset is 7 MB and during the repair no operations were running against
> the database - no reads, no writes, nothing?
>
> How will this perform in production with much bigger data if repair
> takes 25 minutes on 7MB and 11k compactions were triggered by the
> repair run?
>
> regards,
>
> Ondrej Cernos
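Back to the -pr question above, here is a rough sketch of the range
arithmetic as I currently understand it, assuming num_tokens: 256 on all
six nodes (the host names below are placeholders): with RF:3 in each 3-node
DC every node replicates every range, so a plain repair on any single node
walks all 6 x 256 = 1536 vnode ranges, one repair session (Merkle tree
build and exchange) per range - which would mean per-session overhead, not
the 7 MB of data, dominates the 20 minutes.

    # Full repair: repairs every range this node replicates, which in this
    # topology is all 1536 vnode ranges in the cluster.
    nodetool repair ks

    # -pr limits the run to the ranges this node owns as primary (256 with
    # num_tokens: 256). Running it on every node in both DCs covers each of
    # the 1536 ranges exactly once cluster-wide; node1..node6 are
    # placeholder host names.
    for host in node1 node2 node3 node4 node5 node6; do
        ssh "$host" nodetool repair -pr ks
    done

If that reading is right, -pr is the option to prefer for scheduled
repairs: without it, each range in this layout gets repaired once per
replica that runs the command.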