[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2019-02-13 Thread Blake Eggleston (JIRA)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767466#comment-16767466
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

[~laxmikant99] because it's a new feature. 3.11.x is for bugfixes only.

> Run repair with down replicas
> -
>
> Key: CASSANDRA-10446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 4.0
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2019-02-11 Thread Laxmikant Upadhyay (JIRA)



[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765765#comment-16765765
 ] 

Laxmikant Upadhyay commented on CASSANDRA-10446:


Is there any specific reason for not introducing this change in 3.11.x?


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2017-06-02 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16034729#comment-16034729
 ] 

Marcus Eriksson commented on CASSANDRA-10446:
-

LGTM, just missing a few brace-on-newline in the RepairSession changes

feel free to fix that on commit


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-12-05 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723734#comment-15723734
 ] 

Paulo Motta commented on CASSANDRA-10446:
-

Sounds good!


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-12-05 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15723526#comment-15723526
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

bq. skipping anti-compaction on the coordinator is not sufficient, since 
anti-compaction is what cleans repair state on the replicas

Good catch. Post CASSANDRA-9143, this will most likely no longer be the case. 
Why don't we wait until that gets committed before continuing with this one?


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-12-02 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15716721#comment-15716721
 ] 

Paulo Motta commented on CASSANDRA-10446:
-

Sorry for the delay Blake, I got pulled into other things. I will try to reply 
more promptly on the next round.

Skipping anti-compaction on the coordinator is not sufficient, since 
anti-compaction is what cleans repair state on the replicas. A simpler approach 
here is to set the parent repair session as {{!isGlobal}} when the force flag 
is set; this will already skip anti-compaction and set {{repairedAt}} to 
{{ActiveRepairService.UNREPAIRED_SSTABLE}}. By not needing to set 
{{repairedAt}} dynamically per repair session, we can probably simplify this a 
bit and move the force flag enforcement from {{RepairSession}}'s constructor to 
the alive check in {{RepairSession.start}}. What do you think?

If you agree with the suggestions, after you submit a new patch, could you 
please rebase, prepare for commit and resubmit tests? Thanks!
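
The suggestion above can be illustrated with a minimal sketch. It is not the actual patch: every class and method name below is hypothetical, and the constant only stands in for {{ActiveRepairService.UNREPAIRED_SSTABLE}}, assumed here to be 0 in line with the "repairedAt=0" wording used elsewhere in this thread.

```java
// Hypothetical model of the proposal; none of these types are Cassandra's real classes.
final class ForceRepairSketch
{
    // Stand-in for ActiveRepairService.UNREPAIRED_SSTABLE (assumed to be 0 in this sketch).
    static final long UNREPAIRED_SSTABLE = 0L;

    static final class ParentSession
    {
        final boolean isGlobal;
        final long repairedAt;

        ParentSession(boolean allReplicasRequested, boolean force, long startedAtMillis)
        {
            // A forced repair is treated as non-global, whatever else was requested.
            this.isGlobal = allReplicasRequested && !force;
            // Non-global sessions never stamp a repairedAt time on sstables.
            this.repairedAt = isGlobal ? startedAtMillis : UNREPAIRED_SSTABLE;
        }

        // Anti-compaction only makes sense when data can be marked repaired.
        boolean shouldAntiCompact()
        {
            return isGlobal;
        }
    }

    public static void main(String[] args)
    {
        ParentSession normal = new ParentSession(true, false, 1480636800000L);
        ParentSession forced = new ParentSession(true, true, 1480636800000L);
        System.out.println("normal: antiCompact=" + normal.shouldAntiCompact() + " repairedAt=" + normal.repairedAt);
        System.out.println("forced: antiCompact=" + forced.shouldAntiCompact() + " repairedAt=" + forced.repairedAt);
    }
}
```

With a single switch on the parent session, both the anti-compaction skip and the unrepaired marking follow from one place, which is the simplification being proposed.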


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-14 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663973#comment-15663973
 ] 

Sylvain Lebresne commented on CASSANDRA-10446:
--

bq. Likewise, when you trigger a repair {{--force}}, only a subset of the child 
repair sessions may have down nodes

Makes sense, that's the part I missed. Thanks.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-14 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663961#comment-15663961
 ] 

Paulo Motta commented on CASSANDRA-10446:
-

bq. Which means we obviously cannot mark anything "repaired" if some node was 
down. This seems to be what the last patch is doing, but some of the 
discussions above seem to suggest this could be done differently in the 
future, after CASSANDRA-9143 in particular. Did I misread those discussions or 
did I miss something more fundamental?

When you trigger a repair command (parent repair session), it will trigger many 
(child) repair sessions, typically one for each vnode subrange. At the end of 
the parent repair session, it will anti-compact only the ranges of successful 
child repair sessions, since a subset of the child repair sessions may have 
failed due to node failures or other causes, and so their ranges cannot be 
marked as repaired. Likewise, when you trigger a repair with {{--force}}, only a 
subset of the child repair sessions may have down nodes, so we can still mark 
the ranges of successful child repair sessions as repaired (the ones where all 
nodes were up). This is what the patch currently does, and it will be kept after 
CASSANDRA-9143.

What was brought up here, and might have confused things a bit, is that in both 
cases (with and without {{--force}}), streamed sstables are always marked as 
repaired, which may cause problems in some edge failure scenarios (if a repair 
session fails after part of the syncs are completed). This limitation in 
particular will be addressed in CASSANDRA-9143.

Does this clarify your concerns or is there something else we may be missing?
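
As a rough illustration of the rule described above (only ranges whose child session had every replica up may be marked repaired), here is a small sketch. The types and the method name are made up for the example and do not correspond to Cassandra's repair classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: one child session per token range, each with its replica set.
final class ChildSessionRangesSketch
{
    static final class ChildSession
    {
        final String range;
        final Set<String> replicas;

        ChildSession(String range, String... replicas)
        {
            this.range = range;
            this.replicas = new HashSet<>(Arrays.asList(replicas));
        }
    }

    // Returns the ranges that may be anti-compacted and marked repaired:
    // those whose every replica was alive and took part in the session.
    static List<String> rangesToMarkRepaired(List<ChildSession> sessions, Set<String> liveNodes)
    {
        List<String> ranges = new ArrayList<>();
        for (ChildSession session : sessions)
        {
            if (liveNodes.containsAll(session.replicas))
                ranges.add(session.range);
        }
        return ranges;
    }

    public static void main(String[] args)
    {
        List<ChildSession> sessions = Arrays.asList(
            new ChildSession("(0,100]", "A", "B", "C"),
            new ChildSession("(100,200]", "B", "C", "D"),
            new ChildSession("(200,300]", "C", "D", "A"));
        Set<String> live = new HashSet<>(Arrays.asList("A", "B", "C")); // D is down
        System.out.println(rangesToMarkRepaired(sessions, live));
    }
}
```

In this toy ring, a forced repair would still sync the two ranges that include D among the live replicas, but only the first range would be eligible to be marked repaired.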


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-14 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663766#comment-15663766
 ] 

Sylvain Lebresne commented on CASSANDRA-10446:
--

I wouldn't mind some clarification on this and incremental repair. My 
understanding of incremental repair, as it's currently implemented, is that 
having an sstable marked "repaired" is a global property (it means "all the 
replicas have the data in that sstable"). Which means we obviously cannot mark 
anything "repaired" if some node was down. This seems to be what the last patch 
is doing, but some of the discussions above seem to suggest this could be done 
differently in the future, after CASSANDRA-9143 in particular. Did I misread 
those discussions or did I miss something more fundamental?


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-08 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648625#comment-15648625
 ] 

Paulo Motta commented on CASSANDRA-10446:
-

bq. doesn't CASSANDRA-6503 handle this issue?

Good point! I didn't recall this, so this is not as bad as I thought initially, 
but there is still at least one hairy scenario where things could go wrong:
{noformat}
A: unrepaired={1} repaired={}
B: unrepaired={2} repaired={}
C: unrepaired={3} repaired={}
{noformat}

During incremental repair, A sends key 1 to B and dies. B and C stream 
successfully. At the end of the failed repair session, things will look like:
{noformat}
A: unrepaired={1} repaired={}
B: unrepaired={2} repaired={1, 2, 3}
C: unrepaired={3} repaired={2, 3}
{noformat}

If A dies permanently before the next repair, key 1 will never be incrementally 
repaired between B and C. Likewise, if C dies, A will never get key 3 from B 
via incremental repair. Maybe this is such an edge case that it wouldn't 
justify a change per se, but if we defer setting repairedAt of streamed 
sstables to the anti-compaction phase then we could make this slightly more 
correct while supporting session-based --force repair without adding a new 
repairedAt field to {{SyncRequest}}.
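
To make the deferral idea concrete, here is a toy simulation of the A/B/C scenario in which streamed keys are held as pending and only promoted to repaired when the session succeeds. This is a sketch under simplifying assumptions (integer keys instead of sstables, a single hypothetical {{Node}} type) and not how Cassandra stores repair state.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of "defer repairedAt until the session outcome is known".
final class DeferredRepairedAtSketch
{
    static final class Node
    {
        final Set<Integer> unrepaired = new TreeSet<>();
        final Set<Integer> repaired = new TreeSet<>();
        // Streamed-in keys wait here until the parent session's outcome is known.
        final Set<Integer> pending = new TreeSet<>();

        Node(int initialKey)
        {
            unrepaired.add(initialKey);
        }

        void receive(int key)
        {
            pending.add(key);
        }

        void finish(boolean sessionSucceeded)
        {
            if (sessionSucceeded)
            {
                // Everything involved in a successful session becomes repaired.
                repaired.addAll(unrepaired);
                repaired.addAll(pending);
                unrepaired.clear();
            }
            else
            {
                // On failure, streamed data stays eligible for the next repair.
                unrepaired.addAll(pending);
            }
            pending.clear();
        }
    }

    public static void main(String[] args)
    {
        Node b = new Node(2);
        Node c = new Node(3);
        b.receive(1);          // A streams key 1 to B, then A dies
        b.receive(3);          // B and C finish streaming with each other
        c.receive(2);
        b.finish(false);       // the session as a whole failed
        c.finish(false);
        System.out.println("B: unrepaired=" + b.unrepaired + " repaired=" + b.repaired);
        System.out.println("C: unrepaired=" + c.unrepaired + " repaired=" + c.repaired);
    }
}
```

In this model key 1 remains unrepaired on B after the failed session, so a later incremental repair between B and C would still pick it up.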


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-08 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648584#comment-15648584
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

In the {{--force}} case it doesn't because {{RepairMessageVerbHandler}} will 
apply the repairedAt value computed at the beginning of the parent session, 
even if some nodes are being left out of the repair.

In the normal case, CASSANDRA-6503 helps, but the inconsistency is still 
possible because {{OnCompletionRunnable}} is run once a node has received all 
the files _it's_ expecting, but not necessarily before other nodes involved in 
the repair have received all their data, and there could still be a failure in 
that time.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-08 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648431#comment-15648431
 ] 

Marcus Eriksson commented on CASSANDRA-10446:
-

doesn't CASSANDRA-6503 handle this issue?


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-07 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645856#comment-15645856
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

Oh wow, good catch, that's not good. So that issue, and several others, will be 
addressed in CASSANDRA-9143. I'm hoping to post a patch for it by the end of 
this week. Since that should address the fundamental issue of data being 
misclassified as repaired I've just pushed a commit up to my branch that 
doesn't set repairedAt, or run anti-compaction when the force flag is set.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-11-07 Thread Paulo Motta (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15645492#comment-15645492
 ] 

Paulo Motta commented on CASSANDRA-10446:
-

It seems the repairedAt field is not being set on remote sync tasks, which will 
cause streamed data to be marked as repaired (see SYNC_REQUEST handling in 
RepairMessageVerbHandler). We could add a repairedAt field to the SyncRequest 
message (which would break minor compatibility, so it could only go into 4.0), 
but this shows a more fundamental problem with repair failure handling: if a 
repair session fails in the middle of sync, streamed sstables will be marked as 
repaired even if not all nodes got the data. To solve this, we could stream 
sstables with repairedAt=0 and add them to the pool of sstables to be 
anti-compacted, so they will only be marked as repaired at the end of the parent 
repair session.

If we want to add support for -force without fixing the more fundamental problem 
with repair sync failure handling, we could mark a forced ParentRepairSession 
as !isGlobal, which would mark all streamed sstables as *not* repaired as well 
as skip anti-compaction for the whole parent repair session.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-10-12 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569398#comment-15569398
 ] 

Blake Eggleston commented on CASSANDRA-10446:
-

| [trunk|https://github.com/bdeggleston/cassandra/commits/10446-trunk] | [dtest|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-10446-trunk-dtest/] | [testall|http://cassci.datastax.com/view/Dev/view/bdeggleston/job/bdeggleston-10446-trunk-testall/] |

[associated dtest|https://github.com/bdeggleston/cassandra-dtest/commits/10446]

> Run repair with down replicas
> -
>
> Key: CASSANDRA-10446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: Blake Eggleston
>Priority: Minor
> Fix For: 3.x
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.




[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-01-20 Thread Anuj Wadehra (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109045#comment-15109045
 ] 

Anuj Wadehra commented on CASSANDRA-10446:
--

I think this option won't do the job. Referring to the scenario where a node 
fails in a 20-node cluster, which nodes would you set in -hosts, and how would 
you ensure that the entire ring is repaired?

Suppose host20 failed. You would run a full repair with the "-hosts 
host1,host2...host19" option on all 19 healthy nodes. This option is 
unrealistic. Clusters generally use the repair -pr option to repair the cluster. 
With RF=5, repair time would be 5 times longer for the 19 nodes. Moreover, it 
requires special planning and manual intervention for just one node failure, 
which should be undesirable in a distributed fault-tolerant system.

Another option would be to run repair -pr on the 19 nodes and run repair 
separately on the ranges for which the failed node was responsible. But that 
won't work, because the -pr and -hosts options don't work together.

Can you provide a better way to use the -hosts option to address the issue?


> Run repair with down replicas
> -
>
> Key: CASSANDRA-10446
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10446
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Priority: Minor
> Fix For: 3.x
>
>
> We should have an option of running repair when replicas are down. We can 
> call it -force.




[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-01-20 Thread Yuki Morishita (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108701#comment-15108701
 ] 

Yuki Morishita commented on CASSANDRA-10446:


You can still use the '-hosts' repair option to specify which hosts to repair.
You can just give live nodes like 'nodetool repair -hosts node1 -hosts node2 
-hosts node3', and Cassandra will repair among those nodes.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-01-19 Thread Anuj Wadehra (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108146#comment-15108146
 ] 

Anuj Wadehra commented on CASSANDRA-10446:
--

Whether it's a bug or an improvement is debatable. The intent of the suggestion 
to increase the priority and change the type was to ensure that it gets due 
attention. I think that, by giving a detailed scenario, I have explained the 
criticality of the issue. No, I was not interested in working on this.


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-01-19 Thread sankalp kohli (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108124#comment-15108124
 ] 

sankalp kohli commented on CASSANDRA-10446:
---

This is an improvement and not a bug. Seems like you are interested in working 
on it. Should I assign it to you?


[jira] [Commented] (CASSANDRA-10446) Run repair with down replicas

2016-01-19 Thread Anuj Wadehra (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107876#comment-15107876
 ] 

Anuj Wadehra commented on CASSANDRA-10446:
--

I think this is an issue with the way we handle the "downed replica" scenario in 
repairs. We should increase the priority and change the type from Improvement 
to Bug.

Consider the following scenario and flow of events, which demonstrate the 
importance of this issue:
Scenario: I have a 20-node cluster, RF=5, Read/Write Quorum, gc grace 
period=20 days. My cluster is fault tolerant and can afford 2 node failures.

Suddenly, one node goes down due to a hardware issue. The failed node 
prevents repair on many nodes in the cluster, as it holds approximately a 5/20 
share of the total data: 1/20 which it owns and 4/20 which it stores as a 
replica of data owned by other nodes. Now it's been 10 days since the node went 
down, most of the nodes are not being repaired, and it's decision time. I am not 
sure how soon the issue will be fixed, maybe in the next 2 days, i.e. 8 days 
before gc grace, so I shouldn't remove the node early and add it back, as that 
would cause significant and unnecessary streaming due to token rearrangement. At 
the same time, if I don't remove the failed node at this point, i.e. at 10 days 
(well before gc grace), my entire system's health would be in question and it 
would be a panic situation, as most of the data didn't get repaired in the last 
10 days and gc grace is approaching. I need sufficient time to repair all nodes.

What looked like a fault-tolerant Cassandra cluster that can easily afford 2 
node failures required urgent attention and manual decision making when a 
single node went down. If some replicas are down, we should allow repair to 
proceed with the remaining replicas. If the failed nodes come up before the gc 
grace period expires, we would run repair to fix inconsistencies; otherwise we 
would discard the data and bootstrap. I think that would be a really robust, 
fault-tolerant system.
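
The numbers in this scenario can be checked with a short sketch. It assumes a simplified ring with one token per node, where the range owned by node i is replicated to nodes i, i+1, ..., i+RF-1; real placement depends on the replication strategy and vnodes, so treat this only as a back-of-the-envelope model with hypothetical method names.

```java
// Back-of-the-envelope model of the 20-node, RF=5 scenario; not Cassandra code.
final class DownReplicaImpactSketch
{
    // Counts primary ranges (one per node) whose replica set contains the down node,
    // assuming range i is stored on nodes i, i+1, ..., i+rf-1 (mod nodes).
    static int rangesTouchingDownNode(int nodes, int rf, int downNode)
    {
        int count = 0;
        for (int owner = 0; owner < nodes; owner++)
        {
            for (int k = 0; k < rf; k++)
            {
                if ((owner + k) % nodes == downNode)
                {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    // Days remaining before gc grace expires once the node is back.
    static int daysLeftAfterFix(int gcGraceDays, int daysDown, int daysToFix)
    {
        return gcGraceDays - daysDown - daysToFix;
    }

    public static void main(String[] args)
    {
        int affected = rangesTouchingDownNode(20, 5, 19);
        System.out.println(affected + " of 20 primary ranges include the down node");
        System.out.println(daysLeftAfterFix(20, 10, 2) + " days left to repair after the fix");
    }
}
```

Under these assumptions one down node blocks repair for a quarter of the ring, which matches the 5/20 share described above.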



