[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-08 Thread Benjamin Roth (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900951#comment-15900951 ]

Benjamin Roth commented on CASSANDRA-12489:
---

Awesome. I see that a lot of work has been done by thelastpickle since I 
initially forked it from Spotify (which didn't support 3.x back then).
A simple changelog.md would be even more awesome, to make it easy to see 
whether there have been important changes.

> consecutive repairs of same range always finds 'out of sync' in sane cluster
> 
>
> Key: CASSANDRA-12489
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12489
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
>Reporter: Benjamin Roth
>Assignee: Benjamin Roth
>  Labels: lhf
> Attachments: trace_3_10.1.log.gz, trace_3_10.2.log.gz, 
> trace_3_10.3.log.gz, trace_3_10.4.log.gz, trace_3_9.1.log.gz, 
> trace_3_9.2.log.gz
>
>
> No matter how often or when I run the same subrange repair, it ALWAYS tells 
> me that some ranges are out of sync. Tested in 3.9 + 3.10 (git trunk of 
> 2016-08-17). The cluster is sane: all nodes are up, and the cluster is not overloaded.
> I guess this is not a desired behaviour. I'd expect that a repair does what 
> it says, and a consecutive repair shouldn't report "out of sync" ranges any 
> more if the cluster is sane.
> Especially for tables with MVs, this puts a lot of pressure on the cluster 
> during repair, as ranges are repaired over and over again.
> See traces of different runs attached.





[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-07 Thread Kurt Greaves (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900209#comment-15900209 ]

Kurt Greaves commented on CASSANDRA-12489:
--

Reaper supports incremental repairs as well, without splitting the ranges (or 
at least that's what I'm led to believe). In that case it should just schedule 
repairs so they don't run at the same time. As Marcus says, incremental 
repairs reduce the amount of data that needs to be repaired, while splitting 
the ranges still repairs all the data, just in a more manageable fashion.



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-06 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898060#comment-15898060 ]

Marcus Eriksson commented on CASSANDRA-12489:
-

The idea is that with incremental repairs we don't need to use the tools that 
split the ranges; instead, the amount of data to repair is small, since we 
only include unrepaired data.
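
To illustrate the idea (a minimal sketch, not Cassandra's actual code: the 
SSTable type and the selection helper are simplified stand-ins, and only the 
repairedAt == 0 convention for "never repaired" is taken from Cassandra):

{code:java}
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalRepairSketch {

    // Sentinel meaning "never repaired" (repairedAt == 0 in sstable metadata).
    static final long UNREPAIRED = 0L;

    // Simplified stand-in for an sstable with its repairedAt metadata field.
    record SSTable(String name, long repairedAt) {}

    // Full repair would validate all sstables; incremental repair only
    // considers the unrepaired subset, so the workload shrinks to new data.
    static List<SSTable> candidatesForIncrementalRepair(List<SSTable> all) {
        return all.stream()
                  .filter(t -> t.repairedAt() == UNREPAIRED)
                  .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SSTable> tables = List.of(
            new SSTable("old-and-repaired", 1473033600000L),
            new SSTable("flushed-since-last-repair", UNREPAIRED));
        // Prints only "flushed-since-last-repair".
        System.out.println(candidatesForIncrementalRepair(tables));
    }
}
{code}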



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-06 Thread Benjamin Roth (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898051#comment-15898051 ]

Benjamin Roth commented on CASSANDRA-12489:
---

Thanks for the answer. That's what I thought. But what right to exist do 
incremental repairs then have in the real world, if (most, many, whatever) 
people use a tool that makes repairs manageable and thereby eliminates this 
case? The use case + real benefit is quite limited then, isn't it?
Probably that's a philosophical question, but I'm curious what other guys 
think about it and whether I am maybe missing a valuable use case.



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-06 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898043#comment-15898043 ]

Marcus Eriksson commented on CASSANDRA-12489:
-

Yeah, typically people use tools like Spotify's Reaper, which splits the range 
of the node into n (1000?) parts. If we have an sstable that covers the full 
range of the node, we would rewrite it n times: we write each repaired range 
into a new sstable, and the unrepaired parts get written to another sstable 
(and that sstable gets rewritten on the next repair, etc.).
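
To make the cost concrete, here is a minimal sketch of that kind of range 
splitting (simplified stand-in types, no token wraparound handling; this is 
not Reaper's actual code): an sstable covering the whole node range 
intersects every one of the n subranges, so it can be rewritten up to n times.

{code:java}
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {

    // Simplified stand-in for a (start, end] token range.
    record TokenRange(BigInteger start, BigInteger end) {}

    // Split one range into n contiguous subranges of roughly equal width.
    static List<TokenRange> split(TokenRange range, int n) {
        BigInteger step = range.end().subtract(range.start())
                                     .divide(BigInteger.valueOf(n));
        List<TokenRange> parts = new ArrayList<>(n);
        BigInteger cur = range.start();
        for (int i = 0; i < n; i++) {
            // The last slice absorbs the integer-division remainder.
            BigInteger next = (i == n - 1) ? range.end() : cur.add(step);
            parts.add(new TokenRange(cur, next));
            cur = next;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Roughly the Murmur3 token space owned by a single-node cluster.
        TokenRange node = new TokenRange(
            BigInteger.valueOf(Long.MIN_VALUE),
            BigInteger.valueOf(Long.MAX_VALUE));
        System.out.println(split(node, 1000).size()); // 1000 subranges
    }
}
{code}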



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2017-03-06 Thread Benjamin Roth (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898028#comment-15898028 ]

Benjamin Roth commented on CASSANDRA-12489:
---

May I ask what's the reason that incremental + subrange repair doesn't do 
anticompaction? Is it because anticompaction is too expensive in this case? 
Or, to put it differently: is a subrange full repair cheaper than a subrange 
incremental repair with anticompaction?



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2016-09-01 Thread Marcus Eriksson (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455756#comment-15455756 ]

Marcus Eriksson commented on CASSANDRA-12489:
-

hmm, we should probably not mark streamed sstables as repaired in this case



[jira] [Commented] (CASSANDRA-12489) consecutive repairs of same range always finds 'out of sync' in sane cluster

2016-08-31 Thread Paulo Motta (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453736#comment-15453736 ]

Paulo Motta commented on CASSANDRA-12489:
-

It seems this problem is due to the combination of incremental and subrange 
repair. On the one hand, subrange incremental repair does not mark the 
original sstables as repaired (CASSANDRA-10422), while on the other hand 
incremental repair will mark streamed sstables as repaired. So, on the next 
execution of subrange incremental repair, the mismatch will persist. We 
should therefore either:
A) Disable the combination of incremental + subrange repair, or
B) Mark sstables originating from incremental subrange repair as unrepaired 
(a minimal sketch of this option follows below)

WDYT [~krummas]?
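
A minimal sketch of option B (simplified stand-in types; this is not a patch 
against Cassandra's actual streaming code, just the decision it would encode):

{code:java}
public class StreamedRepairedAtSketch {

    // Sentinel meaning "never repaired" (repairedAt == 0 in sstable metadata).
    static final long UNREPAIRED = 0L;

    // Simplified stand-in for the properties of the repair session that
    // triggered the stream.
    record RepairSession(boolean incremental, boolean subrange, long repairedAt) {}

    static long repairedAtForStreamedSSTable(RepairSession session) {
        // Option B: a subrange incremental repair leaves the original
        // sstables unrepaired (CASSANDRA-10422), so streamed sstables must
        // stay unrepaired too, or the next run sees the same mismatch again.
        if (session.incremental() && session.subrange())
            return UNREPAIRED;
        return session.repairedAt(); // all other cases unchanged in this sketch
    }
}
{code}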
