[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Component/s: (was: Legacy/Streaming and Messaging)

> Repair grows data on nodes, causes load to become unbalanced
> 
>
> Key: CASSANDRA-8366
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
> Environment: 4 node cluster
> 2.1.2 Cassandra
> Inserts and reads are done with CQL driver
>Reporter: Jan Karlsson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 2.1.5
>
> Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, 
> results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, 
> results-500_2_inc_repairs.txt, 
> results-500_full_repair_then_inc_repairs.txt, 
> results-500_inc_repairs_not_parallel.txt, 
> run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, 
> run3_no_compact_before_repair.log, test.sh, testv2.sh
>
>
> There seems to be something weird going on when repairing data.
> I have a program that runs for 2 hours, inserting 250 random numbers and reading 
> 250 times per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. 
> I use size-tiered compaction for my cluster. 
> After those 2 hours I run a repair and the load of all nodes goes up. If I 
> run incremental repair the load goes up a lot more. I saw the load shoot up to 8 
> times the original size multiple times with incremental repair (from 2G to 
> 16G).
> With nodes 9, 8, 7 and 6 the repro procedure looked like this:
> (Note that running full repair first is not a requirement to reproduce.)
> {noformat}
> After 2 hours of 250 reads + 250 writes per second:
> UN  9  583.39 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  584.01 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  583.72 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  583.84 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> Repair -pr -par on all nodes sequentially
> UN  9  746.29 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  751.02 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  748.89 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.34 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  2.41 GB    256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.53 GB    256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.6 GB     256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  2.17 GB    256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> after rolling restart
> UN  9  1.47 GB    256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  1.5 GB     256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.46 GB    256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.19 GB    256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> compact all nodes sequentially
> UN  9  989.99 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  994.75 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  1.46 GB    256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.82 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  1.98 GB    256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.3 GB     256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  3.71 GB    256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> restart once more
> UN  9  2 GB       256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.05 GB    256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  4.1 GB     256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB    256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> {noformat}
> Is there something I'm missing, or is this strange behavior?
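
For reference, the repro sequence in the description corresponds roughly to the following commands (a sketch only; the node addresses are placeholders, and the keyspace name r1 is taken from later comments in this thread):

{code}
# Sketch of the repro steps from the description (addresses are placeholders).
NODES="10.0.0.9 10.0.0.8 10.0.0.7 10.0.0.6"

nodetool -h 10.0.0.9 status                   # baseline load after the 2h stress run

for host in $NODES; do                        # full, parallel, primary-range repair
    nodetool -h "$host" repair -pr -par r1
done

for host in $NODES; do                        # incremental, parallel repair
    nodetool -h "$host" repair -inc -par r1
done

for host in $NODES; do                        # major compaction on every node
    nodetool -h "$host" compact r1
done

nodetool -h 10.0.0.9 status                   # compare load against the baseline
{code}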






[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Component/s: Consistency/Repair




[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-12-02 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8366:
---
Component/s: Streaming and Messaging



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-02-24 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8366:
---
 Reviewer: Yuki Morishita
Reproduced In: 2.1.2, 2.1.1  (was: 2.1.1, 2.1.2)



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-02-17 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-8366:
---
Attachment: 0001-8366.patch

Attaching a patch that picks the sstables to compact as late as possible.

It is essentially a semi-backport of CASSANDRA-7586.

We will still have a slightly bigger live size on the nodes after one of these 
repairs, as some sstables will not get anticompacted because they have already 
been compacted away (we could probably improve this as well, but in another 
ticket), but it is much better:
{code}
$ du -sch /home/marcuse/.ccm/8366/node?/data/r1/
1,8G    /home/marcuse/.ccm/8366/node1/data/r1/
1,8G    /home/marcuse/.ccm/8366/node2/data/r1/
1,8G    /home/marcuse/.ccm/8366/node3/data/r1/
5,2G    total
{code}
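
As a side note, one quick way to see which sstables ended up marked as repaired after an incremental repair is to check the "Repaired at" field with sstablemetadata (a sketch; the data path mirrors the du command above, and the glob just assumes the default ccm layout):

{code}
# Sketch: print the "Repaired at" value of every sstable on node1.
# A non-zero value means anticompaction moved the sstable into the repaired set.
for data in ~/.ccm/8366/node1/data/r1/*/*-Data.db; do
    echo "$data:"
    sstablemetadata "$data" | grep "Repaired at"
done
{code}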



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-02-13 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8366:
---
Attachment: results-1000-inc-repairs.txt

Adding a log file from a run with the latest cassandra-2.1 branch: 
!results-1000-inc-repairs.txt!



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-02-03 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8366:
---
Assignee: Marcus Eriksson  (was: Alan Boudreault)



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-02-03 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8366:
---
Reproduced In: 2.1.2, 2.1.1  (was: 2.1.1, 2.1.2)
   Tester: Alan Boudreault



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-01-23 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8366:
---
Attachment: run2_no_compact_before_repair.log
run1_with_compact_before_repair.log
run3_no_compact_before_repair.log
testv2.sh

[~krummas] I'm attaching a new version of the test script (testv2.sh). This 
one has some improvements and gives more details after each operation (it 
shows sstable sizes, properly waits for all compaction tasks to finish, displays 
streaming status, flushes nodes, cleans nodes, etc.).

I've run the script 3 times to see the differences.

* run1 is the only real successful result. The reason is that I compact all 
nodes right after the cassandra-stress operation. Apparently, this removed the 
need to repair, so everything is fine and at the end of the script all nodes 
are at the proper size (1.43G).

* run2 doesn't compact after the stress. The repair is then run, and we only see 
the "Did not get a positive answer" error at the end of the node2 repair. So we 
can see that the keyspace r1 has been successfully repaired for node1 and 
node2. The repair for node3 failed, but it seems that the 2 other repairs have 
taken care of repairing things, so everything is OK at the end of the script (node 
size ~1.43G).

* run3 doesn't compact after the stress. This time, the repair fails at the 
beginning (node1 repair call). This makes the node2 and node3 repairs fail 
too. After flushing + cleaning + compacting, all nodes have an extra 1G of 
data, and I don't know what it is. There is no streaming, all compaction 
is done, and it looks like I cannot get rid of it. This is not in the log, but I 
restarted my cluster again, then retried a sequential full repair on all nodes, 
then re-cleaned and re-compacted, and nothing changed. I let the cluster run all 
night to be sure. I have not deleted this cluster, so if you need more 
information, I just have to restart it.

Do you see anything wrong in my tests? Ping me on IRC if you want to discuss 
this ticket further.
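
For readers without the attachment, the per-step bookkeeping described above looks roughly like this (a sketch, not the actual testv2.sh; the ccm cluster name and the node list are assumptions):

{code}
# Sketch only -- not the attached testv2.sh. Assumes a ccm cluster named "test"
# with nodes node1..node3 and keyspace r1.
CLUSTER=test

report_step() {
    echo "== $1 =="
    for node in node1 node2 node3; do
        ccm "$node" nodetool flush
        # Wait until no compaction tasks are pending before measuring.
        while ccm "$node" nodetool compactionstats | grep -q "pending tasks: [1-9]"; do
            sleep 5
        done
        ccm "$node" nodetool netstats | head -n 1        # streaming status (Mode line)
        echo -n "$node data size: "
        du -sh ~/.ccm/"$CLUSTER"/"$node"/data/r1/
    done
}

report_step "after cassandra-stress"
{code}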





[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-01-14 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Description: 

[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2014-11-30 Thread Alan Boudreault (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Boudreault updated CASSANDRA-8366:
---
Attachment: results-1750_inc_repair.txt
results-500_inc_repairs_not_parallel.txt
results-500_full_repair_then_inc_repairs.txt
results-500_2_inc_repairs.txt
results-500_1_inc_repairs.txt
test.sh

I have been able to reproduce the issue with 2.1.2 and branch cassandra-2.1. 
From my tests, the issue seems to be related to parallel incremental repairs. 
I don't see the issue with full repairs. With full repairs, the storage size 
increases but everything is fine after a compaction. With incremental repairs, 
I've seen nodes going from 1.5G to 15G of storage size. 

It looks like something is broken with inc repairs. Most of the time, I get one 
of the following errors during the repairs:

* Repair session 6f6c4ae0-78d6-11e4-9b48-b56034537865 for range 
(3074457345618258602,-9223372036854775808] failed with error 
org.apache.cassandra.exceptions.RepairException: [repair 
#6f6c4ae0-78d6-11e4-9b48-b56034537865 on r1/Standard1, 
(3074457345618258602,-9223372036854775808]] Sync failed between /127.0.0.1 and 
/127.0.0.3

* Repair failed with error Did not get positive replies from all endpoints. 
List of failed endpoint(s): [127.0.0.1]

So this issue might be related to CASSANDRA-8613. I've attached the script I 
used to reproduce the issue and also 3 result files.
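
If it helps anyone retracing these runs, scanning each node's log for the two failure messages above is a quick way to tell which repair sessions actually failed (a sketch; the cluster name and the ccm log path are assumptions):

{code}
# Sketch: find the repair failures quoted above in each node's system.log
# (cluster name "test" and the ccm log layout are assumptions).
for node in node1 node2 node3; do
    echo "== $node =="
    grep -E "Sync failed|Did not get positive replies" \
        ~/.ccm/test/"$node"/logs/system.log
done
{code}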


[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2014-11-24 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Description: 