[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-14210:
----------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 4.x)
                   3.11.3
                   3.0.17
                   4.0
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Resolved  (was: Ready to Commit)

committed as {{f88ec9357de406daad0f795951f17e5f854ade10}} - thanks!

> Optimize SSTables upgrade task scheduling
> -----------------------------------------
>
>                 Key: CASSANDRA-14210
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Oleksandr Shulgin
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 4.0, 3.0.17, 3.11.3
>
>
> When starting the SSTable-rewrite process by running {{nodetool
> upgradesstables --jobs N}}, with N > 1, not all of the provided N slots are
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.
> What we observed both for version 2.2 and 3.0, is that initially all 4
> provided slots are used for "Upgrade sstables" compactions, but later when
> some of the 4 tasks are finished, no new tasks are scheduled immediately. It
> takes the last of the 4 tasks to finish before new 4 tasks would be
> scheduled. This happens on every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the
> --jobs N parameter. In the field, on a cluster of 12 nodes with 4-5 TiB data
> each, we've seen that the whole process was taking more than 7 days, instead
> of estimated 1.5-2 days (provided there would be close to full N slots
> utilization).
> Instead, new tasks should be scheduled as soon as there is a free compaction
> slot.
> Additionally, starting from the biggest SSTables could further reduce the
> total time required for the whole process to finish on any given node.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
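The scheduling gap described in the issue can be illustrated with a toy model. The sketch below is not Cassandra code: the function names and the task durations are hypothetical, and it only compares the total time of the reported behaviour (a new group of N tasks starts only after the slowest task of the previous group finishes) with the requested behaviour (a new task starts as soon as any slot is free).

```python
import heapq


def batch_makespan(durations, jobs):
    """Total time when tasks run in groups of `jobs` and the next group
    starts only after every task in the current group has finished."""
    total = 0
    for i in range(0, len(durations), jobs):
        # Each group takes as long as its slowest task.
        total += max(durations[i:i + jobs])
    return total


def free_slot_makespan(durations, jobs):
    """Total time when the next task starts as soon as any slot is free."""
    slots = [0] * jobs  # time at which each slot becomes free (valid min-heap)
    for d in durations:
        start = heapq.heappop(slots)      # earliest free slot
        heapq.heappush(slots, start + d)  # slot is busy until start + d
    return max(slots)


# Hypothetical rewrite times: two large SSTables among many small ones.
durations = [8, 1, 1, 1, 8, 1, 1, 1]

print(batch_makespan(durations, 4))      # 16
print(free_slot_makespan(durations, 4))  # 9
```

With these made-up numbers the batch model leaves three of the four slots idle for most of each group, which is the pattern the reporter describes; the real gain on a cluster depends on the actual SSTable size distribution.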
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Kurt Greaves updated CASSANDRA-14210:
-------------------------------------
    Status: Ready to Commit  (was: Patch Available)
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Kurt Greaves updated CASSANDRA-14210:
-------------------------------------
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Patch Available  (was: Awaiting Feedback)
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Marcus Eriksson updated CASSANDRA-14210:
----------------------------------------
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Open  (was: Patch Available)
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Marcus Eriksson updated CASSANDRA-14210:
----------------------------------------
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Awaiting Feedback  (was: Open)
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Marcus Eriksson updated CASSANDRA-14210:
----------------------------------------
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
         Reviewer: Marcus Eriksson
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Jeff Jirsa updated CASSANDRA-14210:
-----------------------------------
    Fix Version/s: 4.x
[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling
Kurt Greaves updated CASSANDRA-14210:
-------------------------------------
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Patch Available  (was: In Progress)

[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:14210-trunk]

I've implemented the above, but it's not clear to me why starting with the largest SSTables would be faster. I think that we should do them in size order, to maximise benefit of any storage improvements from new storage formats, but this is really quite minor. I suppose it also makes the process easier to reason about from a user perspective, but really I can't see how it will make much of a difference.
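One possible reading of the reporter's "start from the biggest SSTables" suggestion, which the comment above questions, is the classic longest-task-first heuristic: when one task is much longer than the rest, starting it last leaves the other slots idle while it finishes. The sketch below is a toy model with hypothetical names and rewrite times, not Cassandra code, and it assumes tasks are handed to whichever slot frees up first.

```python
import heapq


def makespan(durations, jobs):
    """Total time when each task, in the given order, is started on
    whichever of the `jobs` slots becomes free first."""
    slots = [0] * jobs  # time at which each slot becomes free (valid min-heap)
    for d in durations:
        start = heapq.heappop(slots)
        heapq.heappush(slots, start + d)
    return max(slots)


# Hypothetical rewrite times: one large SSTable and four small ones.
times = [1, 1, 1, 1, 6]

print(makespan(sorted(times), 2))                # smallest first: 8
print(makespan(sorted(times, reverse=True), 2))  # largest first: 6
```

With evenly sized SSTables the two orders give the same result, which is consistent with the view that the ordering makes only a minor difference in many cases.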