[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Component/s: (was: Legacy/Streaming and Messaging) > Repair grows data on nodes, causes load to become unbalanced > > > Key: CASSANDRA-8366 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair > Environment: 4 node cluster > 2.1.2 Cassandra > Inserts and reads are done with CQL driver >Reporter: Jan Karlsson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 2.1.5 > > Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, > results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, > results-500_2_inc_repairs.txt, > results-500_full_repair_then_inc_repairs.txt, > results-500_inc_repairs_not_parallel.txt, > run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, > run3_no_compact_before_repair.log, test.sh, testv2.sh > > > There seems to be something weird going on when repairing data. > I have a program that runs 2 hours which inserts 250 random numbers and reads > 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. > I use size-tiered compaction for my cluster. > After those 2 hours I run a repair and the load of all nodes goes up. If I > run incremental repair the load goes up alot more. I saw the load shoot up 8 > times the original size multiple times with incremental repair. (from 2G to > 16G) > with node 9 8 7 and 6 the repro procedure looked like this: > (Note that running full repair first is not a requirement to reproduce.) > {noformat} > After 2 hours of 250 reads + 250 writes per second: > UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > Repair -pr -par on all nodes sequentially > UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > after rolling restart > UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > compact all nodes sequentially > UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > restart once more > UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > {noformat} > Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Component/s: Consistency/Repair > Repair grows data on nodes, causes load to become unbalanced > > > Key: CASSANDRA-8366 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair, Legacy/Streaming and Messaging > Environment: 4 node cluster > 2.1.2 Cassandra > Inserts and reads are done with CQL driver >Reporter: Jan Karlsson >Assignee: Marcus Eriksson >Priority: Major > Fix For: 2.1.5 > > Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, > results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, > results-500_2_inc_repairs.txt, > results-500_full_repair_then_inc_repairs.txt, > results-500_inc_repairs_not_parallel.txt, > run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, > run3_no_compact_before_repair.log, test.sh, testv2.sh > > > There seems to be something weird going on when repairing data. > I have a program that runs 2 hours which inserts 250 random numbers and reads > 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. > I use size-tiered compaction for my cluster. > After those 2 hours I run a repair and the load of all nodes goes up. If I > run incremental repair the load goes up alot more. I saw the load shoot up 8 > times the original size multiple times with incremental repair. (from 2G to > 16G) > with node 9 8 7 and 6 the repro procedure looked like this: > (Note that running full repair first is not a requirement to reproduce.) > {noformat} > After 2 hours of 250 reads + 250 writes per second: > UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > Repair -pr -par on all nodes sequentially > UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > after rolling restart > UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > compact all nodes sequentially > UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > restart once more > UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > {noformat} > Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8366: --- Component/s: Streaming and Messaging > Repair grows data on nodes, causes load to become unbalanced > > > Key: CASSANDRA-8366 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: 4 node cluster > 2.1.2 Cassandra > Inserts and reads are done with CQL driver >Reporter: Jan Karlsson >Assignee: Marcus Eriksson > Fix For: 2.1.5 > > Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, > results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, > results-500_2_inc_repairs.txt, > results-500_full_repair_then_inc_repairs.txt, > results-500_inc_repairs_not_parallel.txt, > run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, > run3_no_compact_before_repair.log, test.sh, testv2.sh > > > There seems to be something weird going on when repairing data. > I have a program that runs 2 hours which inserts 250 random numbers and reads > 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. > I use size-tiered compaction for my cluster. > After those 2 hours I run a repair and the load of all nodes goes up. If I > run incremental repair the load goes up alot more. I saw the load shoot up 8 > times the original size multiple times with incremental repair. (from 2G to > 16G) > with node 9 8 7 and 6 the repro procedure looked like this: > (Note that running full repair first is not a requirement to reproduce.) > {noformat} > After 2 hours of 250 reads + 250 writes per second: > UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > Repair -pr -par on all nodes sequentially > UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > after rolling restart > UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > compact all nodes sequentially > UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > repair -inc -par on all nodes sequentially > UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > restart once more > UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 > UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 > UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 > UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 > {noformat} > Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8366: --- Reviewer: Yuki Morishita Reproduced In: 2.1.2, 2.1.1 (was: 2.1.1, 2.1.2) Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Marcus Eriksson Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-8366: --- Attachment: 0001-8366.patch attaching patch that picks the sstables to compact as late as possible. Actually a semi-backport of CASSANDRA-7586 We will still have a slightly bigger live size on the nodes after one of these repairs as some sstables will not get anticompacted due to being compacted away (we could probably improve this as well, but in another ticket), but it is much better: {code} $ du -sch /home/marcuse/.ccm/8366/node?/data/r1/ 1,8G/home/marcuse/.ccm/8366/node1/data/r1/ 1,8G/home/marcuse/.ccm/8366/node2/data/r1/ 1,8G/home/marcuse/.ccm/8366/node3/data/r1/ 5,2Gtotal {code} Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Marcus Eriksson Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8366: --- Attachment: results-1000-inc-repairs.txt Adding a log file run with latest cassandra-2.1 branch: !results-1000-inc-repairs.txt! Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Marcus Eriksson Attachments: results-1000-inc-repairs.txt, results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8366: --- Assignee: Marcus Eriksson (was: Alan Boudreault) Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Marcus Eriksson Attachments: results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8366: --- Reproduced In: 2.1.2, 2.1.1 (was: 2.1.1, 2.1.2) Tester: Alan Boudreault Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Marcus Eriksson Attachments: results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8366: --- Attachment: run2_no_compact_before_repair.log run1_with_compact_before_repair.log run3_no_compact_before_repair.log testv2.sh [~krummas] I'm attaching a new version of the test script. (testv2.sh). This one has some improvements and gives more details after each operations (it shows sstable size, wait properly that all compaction tasks finish, display streaming status, it flushes nodes, it cleans nodes etc.). I've run 3 times the script to see the differences. * run1 is the only real successful result. The reason is that I compact all nodes right after the cassandra-stress operation. Apparently, this removed the need to repair, so everything is fine and at the end of the script all nodes are at the proper size (1.43G). * run2 doesn't compact after the stress. The repair is then ran and we only see the Did not get a positive answer until the end of the node2 repair. So we can see that the keyspace r1 has been successfully repaired for node1 and node2. The repair for node3 failed but it seems that the 2 other repairs have taken care to repair things so everything is OK at the end of the script. (node size ~1.43G) * run3 doesn't compact after the stress. This time, the repair fails at the beginning (node1 repair call). This makes the node2 and node2 repairs fails too. After flushing + cleaning + compacting, all nodes have an extra 1G of data, which I don't know what they are. There is no streaming, all compaction is done and looks like I cannot get rid of them. This is not in the log, but I restarted my cluster again, then retried to full repair sequentially all nodes then re-cleaning, re-compacting and nothing changed. I let the cluster ran all night long to be sure. I have not deleted this cluster so if you need more information, I just have to restart it. Do you see anything wrong in my tests? Ping me on IRC if you want to discuss more about this ticket. Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Alan Boudreault Attachments: results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, run3_no_compact_before_repair.log, test.sh, testv2.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ?
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Description: There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) {noformat} After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 {noformat} Is there something im missing or is this strange behavior? was: There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ?
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Boudreault updated CASSANDRA-8366: --- Attachment: results-1750_inc_repair.txt results-500_inc_repairs_not_parallel.txt results-500_full_repair_then_inc_repairs.txt results-500_2_inc_repairs.txt results-500_1_inc_repairs.txt test.sh I have been able to reproduce the issue with 2.1.2 and branch cassandra-2.1. From my tests, the issue seems to be related the parallel incremental repairs. I don't see the issue with full repairs. With full repairs, the storage size increases but everything is fine after a compaction. With incremental repairs, I've seen nodes going from 1.5G to 15G of storage size. It looks like something is broken with inc repairs. Most of the time, I get one of the following errors during the repairs: * Repair session 6f6c4ae0-78d6-11e4-9b48-b56034537865 for range (3074457345618258602,-9223372036854775808] failed with error org.apache.cassandra.exceptions.RepairException: [repair #6f6c4ae0-78d6-11e4-9b48-b56034537865 on r1/Standard1, (3074457345618258602,-9223372036854775808]] Sync failed between /127.0.0.1 and /127.0.0.3 * Repair failed with error Did not get positive replies from all endpoints. List of failed endpoint(s): [127.0.0.1] So this issue might be related to CASSANDRA-8613. I've attached the script I used to reproduce the issue and also 3 result files. Repair grows data on nodes, causes load to become unbalanced Key: CASSANDRA-8366 URL: https://issues.apache.org/jira/browse/CASSANDRA-8366 Project: Cassandra Issue Type: Bug Environment: 4 node cluster 2.1.2 Cassandra Inserts and reads are done with CQL driver Reporter: Jan Karlsson Assignee: Alan Boudreault Attachments: results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, results-500_2_inc_repairs.txt, results-500_full_repair_then_inc_repairs.txt, results-500_inc_repairs_not_parallel.txt, test.sh There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b
[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced
[ https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Karlsson updated CASSANDRA-8366: Description: There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 6 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 2.17 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 after rolling restart UN 9 1.47 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 1.5 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.19 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 compact all nodes sequentially UN 9 989.99 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 994.75 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 1.46 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.82 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 1.98 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.3 GB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 3.71 GB256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 restart once more UN 9 2 GB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.05 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 4.1 GB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 1.68 GB256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Is there something im missing or is this strange behavior? was: There seems to be something weird going on when repairing data. I have a program that runs 2 hours which inserts 250 random numbers and reads 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. I use size-tiered compaction for my cluster. After those 2 hours I run a repair and the load of all nodes goes up. If I run incremental repair the load goes up alot more. I saw the load shoot up 8 times the original size multiple times with incremental repair. (from 2G to 16G) with node 9 8 7 and 7 the repro procedure looked like this: (Note that running full repair first is not a requirement to reproduce.) After 2 hours of 250 reads + 250 writes per second: UN 9 583.39 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 584.01 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 583.72 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 583.84 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 Repair -pr -par on all nodes sequentially UN 9 746.29 MB 256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 751.02 MB 256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 748.89 MB 256 ? 2b6b5d66-13c8-43d8-855c-290c0f3c3a0b rack1 UN 6 758.34 MB 256 ? b8bd67f1-a816-46ff-b4a4-136ad5af6d4b rack1 repair -inc -par on all nodes sequentially UN 9 2.41 GB256 ? 28220962-26ae-4eeb-8027-99f96e377406 rack1 UN 8 2.53 GB256 ? f2de6ea1-de88-4056-8fde-42f9c476a090 rack1 UN 7 2.6 GB 256 ?