[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328698#comment-15328698 ]

Paulo Motta commented on CASSANDRA-11933:

Thanks for the update [~mahdix]. The patch looks good. I fixed one minor nit in the 2.1 test, added CHANGES.txt entries, updated the commit message (and the author information, which was screwed up on 2.2 and 3.0) and resubmitted the tests (still running).

||2.1||2.2||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:11933-2.1]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...pauloricardomg:11933-2.2]|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:11933-3.0]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:11933-trunk]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.1-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.2-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-3.0-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-trunk-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.1-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.2-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-3.0-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-trunk-dtest/lastCompletedBuild/testReport/]|

> Improve Repair performance
> --------------------------
>
>                 Key: CASSANDRA-11933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11933
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cyril Scetbon
>            Assignee: Mahdi Mohammadi
>
> During a full repair on a ~60-node cluster, I've been able to see that this stage can be significant (up to 60 percent of the whole time):
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageService.java#L2983-L2997
> It's merely caused by the fact that
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/ActiveRepairService.java#L189
> calls {code}ss.getLocalRanges(keyspaceName){code} every time and that it takes more than 99% of the time. This call takes 600ms when there is no load on the cluster and more if there is. So for 10k ranges, you can imagine that it takes at least 1.5 hours just to compute ranges.
> Underneath it calls
> [ReplicationStrategy.getAddressRanges|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L170]
> which can get pretty inefficient ([~jbellis]'s [words|https://github.com/apache/cassandra/blob/3dcbe90e02440e6ee534f643c7603d50ca08482b/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L165]).
> *ss.getLocalRanges(keyspaceName)* should be cached to avoid having to spend hours on it.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
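To illustrate the issue description quoted above: at 600 ms per call, 10k calls cost 10,000 × 600 ms = 6,000 s, or 100 minutes, which is consistent with the "at least 1.5 hours" estimate. The following is only a minimal sketch of the general idea of computing the local ranges once and reusing them, not the committed patch. `LocalRangesSketch`, `Range`, `FakeStorageService`, `sampleRanges`, `perRangeLookup` and `hoistedLookup` are hypothetical stand-ins invented for this example; only the `getLocalRanges(keyspaceName)` call name and the `localRanges` variable name come from the thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalRangesSketch {
    /** Hypothetical stand-in for a token range; not Cassandra's Range<Token>. */
    static final class Range {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        boolean contains(Range other) { return left <= other.left && other.right <= right; }
    }

    /** Hypothetical stand-in for StorageService; it counts calls instead of costing ~600 ms each. */
    static final class FakeStorageService {
        final AtomicInteger calls = new AtomicInteger();
        Collection<Range> getLocalRanges(String keyspaceName) {
            calls.incrementAndGet();
            List<Range> ranges = new ArrayList<>();
            ranges.add(new Range(0, Long.MAX_VALUE));
            return ranges;
        }
    }

    /** Builds n small ranges to "repair" (illustrative data only). */
    static List<Range> sampleRanges(int n) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) ranges.add(new Range(i, i + 1));
        return ranges;
    }

    /** Pattern described in the ticket: the expensive lookup runs once per range. */
    static int perRangeLookup(FakeStorageService ss, String keyspaceName, List<Range> toRepair) {
        int matched = 0;
        for (Range r : toRepair)
            for (Range local : ss.getLocalRanges(keyspaceName))
                if (local.contains(r)) matched++;
        return matched;
    }

    /** Sketch of the suggested fix: look the local ranges up once and reuse them. */
    static int hoistedLookup(FakeStorageService ss, String keyspaceName, List<Range> toRepair) {
        Collection<Range> localRanges = ss.getLocalRanges(keyspaceName);
        int matched = 0;
        for (Range r : toRepair)
            for (Range local : localRanges)
                if (local.contains(r)) matched++;
        return matched;
    }

    public static void main(String[] args) {
        List<Range> toRepair = sampleRanges(10_000);

        FakeStorageService slow = new FakeStorageService();
        perRangeLookup(slow, "ks", toRepair);
        FakeStorageService fast = new FakeStorageService();
        hoistedLookup(fast, "ks", toRepair);

        System.out.println("per-range calls: " + slow.calls.get());
        System.out.println("hoisted calls:   " + fast.calls.get());
        // Cost estimate using the 600 ms figure from the ticket.
        System.out.println("minutes at 600 ms per call: " + (slow.calls.get() * 600L) / 60_000L);
    }
}
```

Both methods return the same match count; the only difference is how often the expensive lookup runs. Whether a cached result stays valid for the duration of a repair depends on ring changes, which this sketch does not model.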
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328324#comment-15328324 ]

Paulo Motta commented on CASSANDRA-11933:

Sorry for the delay, I was away for a few days. I will set this up shortly and post back here.
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328318#comment-15328318 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

Can someone set up CI for this ticket?
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321471#comment-15321471 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

[~pauloricardomg] Would you please set up CI for my branches?
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315660#comment-15315660 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

[~pauloricardomg] Can you please set up CI for my branches again?
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314971#comment-15314971 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

||2.1||2.2||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...mm-binary:11933-2.1?expand=1]|[branch|https://github.com/apache/cassandra/compare/cassandra-2.2...mm-binary:11933-2.2?expand=0]|
|testall|testall|
|dtest|dtest|

I will add the remaining branches if they can't be auto-merged.
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313449#comment-15313449 ]

Paulo Motta commented on CASSANDRA-11933:

Don't worry, they seem to be unrelated (flakey tests addressed elsewhere).
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313430#comment-15313430 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

CI reports 5 test failures across dtest and testall combined. Does that mean something is wrong with my change?
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313426#comment-15313426 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

You are right. Will rename that and check for other versions too.
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313220#comment-15313220 ]

Paulo Motta commented on CASSANDRA-11933:

This looks good, thanks! Just a minor nitpick: can you rename the variable from {{keyspaceLocalRange}} to {{keyspaceLocalRange*s*}}, or maybe just {{localRanges}} (it's implicit that it is for a given keyspace)? Also, could you check whether this patch merges to cassandra-2.2 all the way up to trunk (via cassandra-3.7), and if not, provide patches for the conflicting versions?

Submitted CI unit tests and dtests for 2.1:

||2.1||
|[branch|https://github.com/apache/cassandra/compare/cassandra-2.1...pauloricardomg:11933-2.1]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.1-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-11933-2.1-dtest/lastCompletedBuild/testReport/]|
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313183#comment-15313183 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

How can I run my branch in CI (for testall and dtests)?
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15313181#comment-15313181 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

[Branch for 2.1|https://github.com/mm-binary/cassandra/tree/11933-2.1]
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310628#comment-15310628 ]

Joshua McKenzie commented on CASSANDRA-11933:

Go for it - assigned it to you.
[jira] [Commented] (CASSANDRA-11933) Improve Repair performance
[ https://issues.apache.org/jira/browse/CASSANDRA-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15310024#comment-15310024 ]

Mahdi Mohammadi commented on CASSANDRA-11933:

Can I work on this ticket?