[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268362#comment-16268362 ]

Corentin Chary commented on CASSANDRA-13215:

[~krummas] I tried it on our test cluster and it seems to work great: startup time was divided by 3 to 4. I expect it to have an even greater impact on prod (more nodes, more sstables).

> Cassandra nodes startup time 20x more after upgarding to 3.x
>
> Key: CASSANDRA-13215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13215
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Cluster setup: two datacenters (dc-main, dc-backup).
> dc-main - 9 servers, no vnodes
> dc-backup - 6 servers, vnodes
> Reporter: Viktor Kuzmin
> Assignee: Marcus Eriksson
> Fix For: 3.11.2, 4.0
> Attachments: simple-cache.patch
>
> CompactionStrategyManager.getCompactionStrategyIndex is called on each sstable
> at startup. This function calls StorageService.getDiskBoundaries, and
> getDiskBoundaries calls AbstractReplicationStrategy.getAddressRanges.
> It appears that the last function can be really slow. In our environment we have
> 1545 tokens, and with NetworkTopologyStrategy it can make 1545*1545
> computations in the worst case (maybe I'm wrong, but it really takes lots of
> CPU).
> This function can also affect runtime later, because it is called not only
> during startup.
> I've tried to implement a simple cache for getDiskBoundaries results and now
> startup time is about one minute instead of 25m, but I'm not sure if it's a
> good solution.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264445#comment-16264445 ]

Paulo Motta commented on CASSANDRA-13215:

bq. both look pretty bad, but I don't think it is because of this patch

Yeah, they seem to be a problem with the jolokia agent not working correctly, probably some configuration error on the CI server. In any case, I ran some of the failing compaction tests locally and they passed, which indicates it is indeed a problem with CI.

Patch LGTM, though can you just add a {{toString}} to {{DiskBoundaries}} so the boundary changes are logged correctly? Marking this as ready to commit, so feel free to fix this on commit.
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263983#comment-16263983 ]

Marcus Eriksson commented on CASSANDRA-13215:

https://github.com/krummas/cassandra/commits/marcuse/13215-version2
https://github.com/krummas/cassandra/commits/marcuse/13215-trunk

test results:
trunk: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/437/
3.11: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/434/

both look pretty bad, but I don't think it is because of this patch
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245164#comment-16245164 ]

Paulo Motta commented on CASSANDRA-13215:

Good job, this is much nicer than having the StorageService manage the disk boundaries. Patch and tests LGTM. Two minor nits:
* could you make {{getDiskBoundaryValue}} and {{getDiskBoundaries}} static?
* can you log the actual boundary changes to facilitate debugging?

Probably not a big deal, but we will unnecessarily invalidate the disk boundaries whenever there is a keyspace change (table creation, drop, add view, etc.) rather than only when the replication settings or local ranges change. Do you think we should invalidate the boundaries only when the replication settings or local ranges change, or not really bother about this?
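A minimal sketch of the narrower invalidation raised in the review comment above: drop the cached boundaries only when the inputs that affect them change, and ignore other keyspace changes. All names here ({{BoundaryInputs}}, {{BoundaryInvalidationGuard}}, {{ringVersion}}) are hypothetical and are not taken from the patch or from the Cassandra code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical snapshot of the inputs that determine disk boundaries.
// None of these names come from the Cassandra code base.
final class BoundaryInputs {
    final Map<String, String> replicationOptions; // e.g. per-DC replication factors
    final long ringVersion;                       // assumed to be bumped when local ranges may change

    BoundaryInputs(Map<String, String> replicationOptions, long ringVersion) {
        this.replicationOptions = new HashMap<>(replicationOptions);
        this.ringVersion = ringVersion;
    }

    boolean sameAs(BoundaryInputs other) {
        return other != null
            && ringVersion == other.ringVersion
            && replicationOptions.equals(other.replicationOptions);
    }
}

final class BoundaryInvalidationGuard {
    private BoundaryInputs last;

    // Called on every keyspace change; returns true only when the cached
    // boundaries are actually stale and should be dropped.
    synchronized boolean shouldInvalidate(BoundaryInputs current) {
        if (current.sameAs(last)) {
            return false; // e.g. a table was added; the boundaries are unaffected
        }
        last = current;
        return true;
    }
}
```

Whether the extra bookkeeping is worth it depends on how often keyspace changes happen relative to the cost of recomputing the boundaries, which is the trade-off the question above leaves open.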
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233674#comment-16233674 ]

Paulo Motta commented on CASSANDRA-13215:

bq. I might change it slightly during the day, but if you want to start reviewing it should be ok

Thanks for the heads up! I should only be able to review it early next week anyway, so please update if/when you have another version.
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201558#comment-16201558 ]

Corentin Chary commented on CASSANDRA-13215:

I'll try to test that in our test env in the next few days :)
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200205#comment-16200205 ]

Marcus Eriksson commented on CASSANDRA-13215:

So I finally got around to finishing this up. I still need to write a bunch of tests, but if anyone has a safe place to try it out, please do:

https://github.com/krummas/cassandra/commits/marcuse/13215
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173029#comment-16173029 ]

Corentin Chary commented on CASSANDRA-13215:

Cool, will be happy to test it and report performance improvements (mostly during startup)
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145464#comment-16145464 ]

Marcus Eriksson commented on CASSANDRA-13215:

[~iksaif] yeah I'm working on it, should have a patch ready soon
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145453#comment-16145453 ]

Corentin Chary commented on CASSANDRA-13215:

I can confirm that this is affecting us too (startup and repairs). [~krummas] did you end up doing something for this issue? Otherwise I might give it a shot.
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057322#comment-16057322 ]

Viktor Kuzmin commented on CASSANDRA-13215:

Marcus Eriksson, it would be great if you could pick this up.
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057183#comment-16057183 ]

Marcus Eriksson commented on CASSANDRA-13215:

[~kvaster] let me know if you want me to pick this up
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044231#comment-16044231 ]

Viktor Kuzmin commented on CASSANDRA-13215:

It seems that my knowledge of Cassandra internals is not enough to complete this task quickly...
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891771#comment-15891771 ]

Marcus Eriksson commented on CASSANDRA-13215:

We also need to invalidate the cache on ring changes: if a node joins or leaves the ring, we need to recalculate the boundaries. That is, you probably need to implement {{IEndpointStateChangeSubscriber}} and register in {{Gossiper}}.

Let me know if you have time to work on this, otherwise I can pick it up.
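A minimal sketch of the cache-plus-invalidation idea in the comment above, under stated assumptions: {{RingVersion}} and {{CachedBoundaries}} are hypothetical names, and {{onRingChange()}} merely stands in for whatever callback a subscriber registered with {{Gossiper}} would invoke. It does not use Cassandra's actual interfaces, and boundaries are modelled as a plain list of longs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical stand-in for "the ring changed" notifications.
// This is not Cassandra's Gossiper or IEndpointStateChangeSubscriber API.
final class RingVersion {
    private final AtomicLong version = new AtomicLong();

    long current() { return version.get(); }

    // Would be called when a node joins or leaves the ring.
    void onRingChange() { version.incrementAndGet(); }
}

// Caches the result of an expensive boundary computation and recomputes it
// only when the ring version has moved since the value was cached.
final class CachedBoundaries {
    private final RingVersion ring;
    private final Supplier<List<Long>> expensiveCompute;
    private List<Long> cached;
    private long cachedAtVersion = -1;

    CachedBoundaries(RingVersion ring, Supplier<List<Long>> expensiveCompute) {
        this.ring = ring;
        this.expensiveCompute = expensiveCompute;
    }

    synchronized List<Long> get() {
        // Read the version before computing: if the ring changes during the
        // computation, the next call sees a newer version and recomputes.
        long now = ring.current();
        if (cached == null || cachedAtVersion != now) {
            cached = expensiveCompute.get();
            cachedAtVersion = now;
        }
        return cached;
    }
}
```

With this shape, per-sstable lookups at startup share one computation, and a join or leave costs a single recomputation on the next lookup.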
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868206#comment-15868206 ]

Viktor Kuzmin commented on CASSANDRA-13215:

It really is used on a critical path. I think this affects not only startup time but repairs as well, and maybe some other parts.
[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgarding to 3.x
[ https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865479#comment-15865479 ]

Romain Hardouin commented on CASSANDRA-13215:

It's related to CASSANDRA-6696, i.e. since 3.2.
Regarding {{AbstractReplicationStrategy.getAddressRanges}}, it seems to be a known limitation. Maybe we can now consider that it's used on a critical path:

{code}
/*
 * NOTE: this is pretty inefficient. also the inverse (getRangeAddresses) below.
 * this is fine as long as we don't use this on any critical path.
 * (fixing this would probably require merging tokenmetadata into replicationstrategy,
 * so we could cache/invalidate cleanly.)
 */
public Multimap<InetAddress, Range<Token>> getAddressRanges(TokenMetadata metadata)
{code}
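To make the cost concern concrete, here is a toy model of why computing replicas for every token can approach the 1545*1545 figure from the issue description. It is only an illustration: {{ToyRing}} and its methods are hypothetical, and the walk is a simplification rather than Cassandra's replica selection logic. The assumption is that finding replicas for one token means walking the ring clockwise until enough distinct endpoints are seen, which degenerates to a full lap when the requested count cannot be satisfied.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: names and logic are illustrative, not Cassandra's implementation.
final class ToyRing {
    // owners.get(i) is the endpoint that owns the i-th token in ring order.
    private final List<String> owners;

    ToyRing(List<String> owners) {
        this.owners = new ArrayList<>(owners);
    }

    // Walk clockwise from position `start` until `replicas` distinct endpoints
    // are found or the whole ring has been visited. Returns the steps taken.
    long stepsToFindReplicas(int start, int replicas) {
        Set<String> found = new HashSet<>();
        long steps = 0;
        for (int i = 0; i < owners.size() && found.size() < replicas; i++) {
            found.add(owners.get((start + i) % owners.size()));
            steps++;
        }
        return steps;
    }

    // Total steps needed to compute replicas for every token in the ring.
    long totalSteps(int replicas) {
        long total = 0;
        for (int t = 0; t < owners.size(); t++) {
            total += stepsToFindReplicas(t, replicas);
        }
        return total;
    }
}
```

In this model, a ring of 1545 tokens spread round-robin over 15 endpoints needs 3 steps per token for 3 replicas, but a request that cannot be satisfied (say 16 distinct endpoints) walks all 1545 positions for each of the 1545 tokens, i.e. 2,387,025 steps per call. Repeating that once per sstable is what makes caching the result attractive.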