[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15980821#comment-15980821 ]

Stefan Podkowinski commented on CASSANDRA-13257:

+1

Thanks for taking the opportunity to add some content to the repair page, Blake! I'll add some comments and additional content in a separate PR on top of it.

> Add repair streaming preview
>
> Key: CASSANDRA-13257
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13257
> Project: Cassandra
> Issue Type: New Feature
> Components: Streaming and Messaging
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Fix For: 4.0
>
> It would be useful to be able to estimate the amount of repair streaming that
> needs to be done, without actually doing any streaming. Our main motivation
> for having something like this is validating CASSANDRA-9143 in production,
> but I’d imagine it could also be a useful tool in troubleshooting.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15979131#comment-15979131 ]

Blake Eggleston commented on CASSANDRA-13257:

[~spo...@gmail.com], I added a commit with docs and NEWS.txt changes [here|https://github.com/bdeggleston/cassandra/commit/dd3efd19179dae6297c95444c623c128976cb658], can you take a look? I made a small change to the nodetool doc generator so we can link to the generated docs from other pages. I also added a line to the upgrading/incremental repair section recommending that users run a full repair after upgrading if they were using incremental repair in 3.x.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974353#comment-15974353 ]

Stefan Podkowinski commented on CASSANDRA-13257:

This is a new feature that should be covered in the docs and NEWS.txt.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974337#comment-15974337 ]

Marcus Eriksson commented on CASSANDRA-13257:

+1
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969672#comment-15969672 ]

Blake Eggleston commented on CASSANDRA-13257:

ok, comments addressed here: https://github.com/bdeggleston/cassandra/tree/13257-squashed-trunk

dtest branch here: https://github.com/bdeggleston/cassandra-dtest/tree/13257
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967447#comment-15967447 ]

Marcus Eriksson commented on CASSANDRA-13257:

code LGTM, just a few small comments:
* {{-p}} (short for {{--preview}}) clashes with {{-p}} for port
* logging - we should make it clear that we are doing a preview repair, perhaps replace the prefix {{[repair #{}] ...}} with {{[preview repair #{}]}}?
* Use {{FBUtilities.prettyPrintMemory}} when displaying the result?
* Seems we still insert into {{system_distributed.repair_history}} in a few places during a preview, we should probably avoid that
* Log the result as well as outputting it to stdout - if we ctrl+c the command we could still read the result
* A few dtests running these new commands so we don't break them in the future
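The logging and display suggestions in the list above can be sketched roughly as follows. This is only an illustration: {{prettyBytes}} and {{logPrefix}} are hypothetical helpers written for this note, they are not Cassandra's {{FBUtilities.prettyPrintMemory}} or its repair logging code, and the unit names and rounding are assumptions.

```java
import java.util.Locale;

// Illustrative helpers only; these are not Cassandra's FBUtilities or repair logging code.
final class PreviewOutputSketch {
    private static final String[] UNITS = {"B", "KiB", "MiB", "GiB", "TiB"};

    // Scale a byte count down by 1024 until it fits the largest suitable unit.
    static String prettyBytes(long bytes) {
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    // Give preview sessions a distinct prefix so they are not mistaken for real repairs.
    static String logPrefix(boolean preview, String sessionId) {
        return (preview ? "[preview repair #" : "[repair #") + sessionId + "]";
    }

    public static void main(String[] args) {
        // Prints the result to stdout; a real tool would also send the same line to its logger.
        System.out.println(logPrefix(true, "1234") + " estimated streaming: " + prettyBytes(1536));
    }
}
```

Writing the same line to both stdout and the log is what would let an operator recover the result after interrupting the command.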
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954197#comment-15954197 ]

Blake Eggleston commented on CASSANDRA-13257:

/cc [~yukim] ^^
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949858#comment-15949858 ]

Blake Eggleston commented on CASSANDRA-13257:

bq. I think even so streaming preview covers both full and incremental repair case, and other streaming usage.

No, I’m afraid it doesn’t. Part of the confusion here is that my linked patch doesn’t include the fix from CASSANDRA-13328, which fixes how sstables are selected for streaming post #9143. Sorry about that. The other part is that, post CASSANDRA-9143, incremental repair does an anti-compaction before doing anything else, including validation or streaming. Rewriting a bunch of sstables just so we can estimate the streaming that would happen if we ran one for real is sort of a non-starter.

So, I still don’t see a way we can prevent StreamSession from having some notion of what is being previewed. Previewing incremental repair streaming means that we need StreamSession to know it should only include unrepaired sstables, instead of all sstables, as it would with a full repair, since we won’t be including a pending repair id. After #13328, the isIncremental flag in StreamSession is not doing anything, and I have a note to remove it before 4.0. We could make the argument that we should leave it to support preview, but then why not just have the preview enum, which has a much clearer purpose?

Also, while knowing that there was a merkle tree mismatch is technically enough to validate whether repaired data is in sync across nodes, having information about the related streaming we expect does have value which shouldn’t be dismissed just because it’s a bit abstract. From the development side, it will provide clues about the cause of the mismatch (i.e., a one-way transfer indicates that one node failed to promote an sstable). From the operational side, knowing how much data needs to be streamed to fix the out-of-sync data is useful; it also indicates the severity of the problem and the worst-case data loss risk in the case of corruption. But we can't do this without StreamSession having some notion of what's being previewed.

Rebased against trunk (and CASSANDRA-13325) here: https://github.com/bdeggleston/cassandra/tree/13257-squashed-trunk
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946527#comment-15946527 ]

Yuki Morishita commented on CASSANDRA-13257:

Sorry for the delay. I thought about {{PreviewKind}} again, and I'd like to keep it in repair. The reason is that there is, and will be, filtering logic that only gets executed when "previewing", which sounds weird to me since the logic never gets called when not previewing. Right now the only case is validating repair, and for that there is no need to preview streaming. I think even so streaming preview covers both full and incremental repair case, and other streaming usage.

My patch to move {{PreviewKind}} is here: https://github.com/yukim/cassandra/commit/55c3db065867c402bc4b1fc38ac0460854db6af6

For repair validation, {{SyncTask}} returns without invoking streaming preview. I think we should add more info to {{SyncStat}}, as you noted in the TODO comment, to give more information for repair validation.

I also renamed the {{PreviewKind}} values to match the repair options, since they get displayed in the repair command output. Maybe implementing {{toString()}} is better though.

Other than that, I think it is well implemented overall. Thanks!
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929079#comment-15929079 ]

Blake Eggleston commented on CASSANDRA-13257:

bq. If we have subcommand, then nodetool repair validate would be the right command.

Good idea, I've changed the nodetool command to {{nodetool repair --validate}}.

bq. Does this need to "preview" streaming as well? Seems validating repaired SSTables is enough.

Strictly speaking, no. However, since the other previews do, I think it makes more sense to reuse the streaming preview than to add another code path that does basically the same thing.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927830#comment-15927830 ]

Yuki Morishita commented on CASSANDRA-13257:

bq. validate that repaired data is in sync

I'm not sure this should go into the {{nodetool repair}} command. If we have subcommand, then {{nodetool repair validate}} would be the right command. Does this need to "preview" streaming as well? Seems validating repaired SSTables is enough.

About {{PreviewKind}} in streaming: it determines how SSTables are selected, and I'm fine with it for now, until we have a cleaner way to decouple it from streaming itself.

bq. We’d only ever be able to preview the full repair case.

Would it be so? I will look up CASSANDRA-13328 as well.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926759#comment-15926759 ]

Blake Eggleston commented on CASSANDRA-13257:

The use cases I have in mind for the preview types {{ALL}}, {{UNREPAIRED}}, {{REPAIRED}} are: estimate streaming required for a full repair, estimate streaming for an incremental repair, and validate that repaired data is in sync, respectively. The immediate need this ticket addresses is validating CASSANDRA-9143 in a large and active cluster, where the main use of preview will be validating that repaired data is in sync.

Using just a boolean {{isPreview}} on the streaming side doesn’t provide enough information to perform an accurate preview. We’d only ever be able to preview the full repair case. The existing stream session sstable selection logic either selects all sstables for a token range, or (post CASSANDRA-13328) only the sstables in a token range which are part of an in-progress repair. Selecting only the repaired or unrepaired sstables is not supported. Starting an actual incremental repair won’t work because it will perform anti-compaction before it does anything else. Making {{StreamSession}} aware of {{PreviewKind}} is the most straightforward way to do this.

Between that and the potential for previewing streaming for things like decommission, etc., I think the best place for {{PreviewKind}} is in the streaming package. Supporting some repair-related operations isn’t a stretch, given that repair and streaming are already fairly closely coupled.
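To make the three preview types concrete, here is a minimal sketch of how a preview kind could drive sstable selection. It is an illustration under stated assumptions, not the patch itself: {{SSTableInfo}}, {{PreviewKindSketch}}, and the predicates are hypothetical names invented for this note, and a plain boolean repaired flag stands in for whatever repair metadata the real sstables carry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for an sstable: a name, an on-disk size, and a repaired flag.
final class SSTableInfo {
    final String name;
    final long sizeBytes;
    final boolean repaired;

    SSTableInfo(String name, long sizeBytes, boolean repaired) {
        this.name = name;
        this.sizeBytes = sizeBytes;
        this.repaired = repaired;
    }
}

// Mirrors the three preview types named in the comment; the predicates are assumptions.
enum PreviewKindSketch {
    ALL(s -> true),                 // full repair estimate: every sstable in the range
    UNREPAIRED(s -> !s.repaired),   // incremental repair estimate
    REPAIRED(s -> s.repaired);      // check that repaired data is in sync

    private final Predicate<SSTableInfo> filter;

    PreviewKindSketch(Predicate<SSTableInfo> filter) {
        this.filter = filter;
    }

    // Keep only the candidates this preview kind would consider for streaming.
    List<SSTableInfo> select(List<SSTableInfo> candidates) {
        return candidates.stream().filter(filter).collect(Collectors.toList());
    }

    // Sum the on-disk sizes of the selected sstables.
    long estimateBytes(List<SSTableInfo> candidates) {
        return select(candidates).stream().mapToLong(s -> s.sizeBytes).sum();
    }
}

final class PreviewKindDemo {
    static List<SSTableInfo> sample() {
        return Arrays.asList(
            new SSTableInfo("a", 100, true),
            new SSTableInfo("b", 250, false),
            new SSTableInfo("c", 50, false));
    }

    public static void main(String[] args) {
        for (PreviewKindSketch kind : PreviewKindSketch.values()) {
            System.out.println(kind + " -> " + kind.estimateBytes(sample()) + " bytes");
        }
    }
}
```

A single boolean could only express the {{ALL}} case here, which is the limitation the comment describes.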
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925864#comment-15925864 ]

Yuki Morishita commented on CASSANDRA-13257:

I wonder what the use of "Perform preview on repaired data"/{{PreviewKind.REPAIRED}} is.

I kind of want to separate the contexts of "preview streaming" and "preview repair". So I'd like to keep {{PreviewKind}} in the repair package, and have a boolean {{isPreview}} in streaming. It seems to me that the reason we have {{PreviewKind}} in streaming right now is to add the above functionality.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899902#comment-15899902 ]

Blake Eggleston commented on CASSANDRA-13257:

[~yukim] I've pushed commits converting the preview to use streaming instead of merkle tree diffs. A squashed version is [here|https://github.com/bdeggleston/cassandra/tree/13257-squashed].

New ci runs are here:
|[dtest|http://cassci.datastax.com/job/bdeggleston-13257-squashed-dtest/]|[testall|http://cassci.datastax.com/job/bdeggleston-13257-squashed-testall/]|
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886692#comment-15886692 ]

Blake Eggleston commented on CASSANDRA-13257:

bq. Is it possible to add "dry-run" feature to streaming instead of TreeDifference?

That’s a really good idea. It will also be more accurate than TreeDifference, since it will measure actual sstable sizes, not the size of the partitions that come out of the validation compaction.
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885005#comment-15885005 ]

Yuki Morishita commented on CASSANDRA-13257:

Is it possible to add "dry-run" feature to streaming instead of TreeDifference? That way, we can use the feature to preview other streaming operations. For example:

{code}
StreamPlan newPlan = new StreamPlan(/* dryRun = */ true);
newPlan.addTransferRanges(range);
StreamResultFuture future = newPlan.execute();
// if dryRun, only run streaming until the PREPARE phase, so that the exact send and *receive* sizes are known,
// and return a StreamState calculated from those.
StreamState dryRunResult = future.get();
{code}

Or, when streaming is outgoing only (like decommission), we may be able to return without exchanging messages.

WDYT?
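The dry-run idea can be modelled in a few self-contained lines. This is a toy sketch, not Cassandra code: {{DryRunPlanSketch}}, {{transfer}}, {{request}}, and {{Result}} are hypothetical names that do not correspond to the real {{StreamPlan}} or {{StreamState}} classes, and file sizes are passed in directly rather than discovered by exchanging prepare messages.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposal; it does not use Cassandra's StreamPlan or StreamState classes.
final class DryRunPlanSketch {
    // Outcome of a plan: the sizes known after the prepare step, plus what actually moved.
    static final class Result {
        final long bytesToSend;
        final long bytesToReceive;
        final long bytesTransferred;

        Result(long bytesToSend, long bytesToReceive, long bytesTransferred) {
            this.bytesToSend = bytesToSend;
            this.bytesToReceive = bytesToReceive;
            this.bytesTransferred = bytesTransferred;
        }
    }

    private final boolean dryRun;
    private final List<Long> outgoing = new ArrayList<>();
    private final List<Long> incoming = new ArrayList<>();

    DryRunPlanSketch(boolean dryRun) {
        this.dryRun = dryRun;
    }

    // Register a file this node would send.
    DryRunPlanSketch transfer(long fileBytes) {
        outgoing.add(fileBytes);
        return this;
    }

    // Register a file this node would receive.
    DryRunPlanSketch request(long fileBytes) {
        incoming.add(fileBytes);
        return this;
    }

    Result execute() {
        // "Prepare" step: total up what would be sent and received.
        long toSend = outgoing.stream().mapToLong(Long::longValue).sum();
        long toReceive = incoming.stream().mapToLong(Long::longValue).sum();
        if (dryRun) {
            return new Result(toSend, toReceive, 0L); // stop here: nothing is moved
        }
        // A real session would stream file contents here; the toy just counts them as moved.
        return new Result(toSend, toReceive, toSend + toReceive);
    }

    public static void main(String[] args) {
        Result preview = new DryRunPlanSketch(true).transfer(100).transfer(50).request(30).execute();
        System.out.println("send=" + preview.bytesToSend
            + " receive=" + preview.bytesToReceive
            + " transferred=" + preview.bytesTransferred);
    }
}
```

The point of the design is that the same plan-building code serves both modes, so a preview reports the sizes a real run would have used.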
[jira] [Commented] (CASSANDRA-13257) Add repair streaming preview
[ https://issues.apache.org/jira/browse/CASSANDRA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880947#comment-15880947 ]

Blake Eggleston commented on CASSANDRA-13257:

[~pauloricardomg], [~yukim] - would one of you be interested in reviewing?