[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-03-05 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14210:

       Resolution: Fixed
    Fix Version/s:     (was: 4.x)
                   3.11.3
                   3.0.17
                   4.0
    Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
           Status: Resolved  (was: Ready to Commit)

Committed as {{f88ec9357de406daad0f795951f17e5f854ade10}} - thanks!

> Optimize SSTables upgrade task scheduling
> -----------------------------------------
>
>                 Key: CASSANDRA-14210
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14210
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Oleksandr Shulgin
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 4.0, 3.0.17, 3.11.3
>
>
> When starting the SSTable-rewrite process by running {{nodetool 
> upgradesstables --jobs N}} with N > 1, not all of the N provided slots are 
> used.
> For example, we were testing with {{concurrent_compactors=5}} and {{N=4}}.  
> On both 2.2 and 3.0 we observed that initially all 4 provided slots are 
> used for "Upgrade sstables" compactions, but later, when some of the 4 tasks 
> have finished, no new tasks are scheduled immediately.  Only after the last 
> of the 4 tasks has finished are the next 4 tasks scheduled.  This happens on 
> every node we've observed.
> This doesn't utilize available resources to the full extent allowed by the 
> {{--jobs N}} parameter.  In the field, on a cluster of 12 nodes with 4-5 TiB 
> of data each, we've seen that the whole process was taking more than 7 days, 
> instead of the estimated 1.5-2 days (assuming close to full utilization of 
> the N slots).
> Instead, new tasks should be scheduled as soon as there is a free compaction 
> slot.
> Additionally, starting from the biggest SSTables could further reduce the 
> total time required for the whole process to finish on any given node.
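
To make the difference between the two scheduling behaviours described above concrete, here is a minimal sketch. It is not Cassandra code: the class and method names are hypothetical, and the task durations are made-up units chosen only for illustration. It computes the total wall-clock time for the batch behaviour from the report (wait for all N tasks before starting the next N) and for the requested behaviour (start the next task as soon as any slot is free).

```java
import java.util.PriorityQueue;

// Hypothetical illustration only; names and durations are assumptions,
// not taken from the Cassandra code base.
final class UpgradeSchedulingSketch {

    // Reported behaviour: start up to `jobs` tasks, wait for ALL of them,
    // then start the next group. Each group costs as much as its longest task.
    static long batchMakespan(long[] durations, int jobs) {
        long total = 0;
        for (int start = 0; start < durations.length; start += jobs) {
            long longest = 0;
            int end = Math.min(start + jobs, durations.length);
            for (int i = start; i < end; i++) {
                longest = Math.max(longest, durations[i]);
            }
            total += longest;
        }
        return total;
    }

    // Requested behaviour: every task starts as soon as any slot is free.
    // The queue holds the time at which each slot next becomes free.
    static long slotMakespan(long[] durations, int jobs) {
        PriorityQueue<Long> slotFreeAt = new PriorityQueue<>();
        for (int i = 0; i < jobs; i++) {
            slotFreeAt.add(0L);
        }
        long makespan = 0;
        for (long d : durations) {
            long finish = slotFreeAt.poll() + d; // earliest free slot takes the task
            slotFreeAt.add(finish);
            makespan = Math.max(makespan, finish);
        }
        return makespan;
    }

    public static void main(String[] args) {
        long[] durations = {8, 1, 1, 1, 8, 1, 1, 1}; // arbitrary time units
        System.out.println("batch: " + batchMakespan(durations, 4)); // batch: 16
        System.out.println("slots: " + slotMakespan(durations, 4));  // slots: 9
    }
}
```

With one long task in each group of four, the batch variant leaves three slots idle for most of each round, which is the pattern the report describes.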



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14210:
-
Status: Ready to Commit  (was: Patch Available)







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-27 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14210:
-
Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
   Status: Patch Available  (was: Awaiting Feedback)







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-19 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14210:

Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
   Status: Open  (was: Patch Available)







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-19 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14210:

Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
   Status: Awaiting Feedback  (was: Open)







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-13 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-14210:

Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
 Reviewer: Marcus Eriksson







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-12 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-14210:
---
Fix Version/s: 4.x







[jira] [Updated] (CASSANDRA-14210) Optimize SSTables upgrade task scheduling

2018-02-12 Thread Kurt Greaves (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14210:
-
Reproduced In: 3.0.15, 2.2.11  (was: 2.2.11, 3.0.15)
   Status: Patch Available  (was: In Progress)

[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:14210-trunk]

I've implemented the above, but it's not clear to me why starting with the 
largest SSTables would be faster. I think that we should do them in size order, 
to maximise the benefit of any storage improvements from new storage formats, 
but this is really quite minor. I suppose it also makes the process easier to 
reason about from a user's perspective, but really I can't see how it will make 
much of a difference.
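
One way to look at the ordering question is a toy model, sketched below. This is not Cassandra code and not something stated in this thread: the names are hypothetical, the durations are made-up units, and it assumes rewrite time is proportional to SSTable size and that slots do not compete for I/O. Under those assumptions, with tasks started as soon as a slot frees up, running the largest task last can leave one slot working alone at the end, while running it first lets the small tasks fill the other slot.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical toy model; names, durations and the "time is proportional to
// size" assumption are illustrative only.
final class LargestFirstSketch {

    // Total time when each task, in the given order, starts on the slot
    // that becomes free first.
    static long makespan(long[] durations, int jobs) {
        PriorityQueue<Long> slotFreeAt = new PriorityQueue<>();
        for (int i = 0; i < jobs; i++) {
            slotFreeAt.add(0L);
        }
        long makespan = 0;
        for (long d : durations) {
            long finish = slotFreeAt.poll() + d;
            slotFreeAt.add(finish);
            makespan = Math.max(makespan, finish);
        }
        return makespan;
    }

    // Returns a copy of the input sorted from largest to smallest.
    static long[] descending(long[] durations) {
        long[] sorted = durations.clone();
        Arrays.sort(sorted);
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) {
            long tmp = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = tmp;
        }
        return sorted;
    }

    public static void main(String[] args) {
        long[] smallestFirst = {1, 1, 1, 1, 8}; // arbitrary time units
        System.out.println("smallest first: " + makespan(smallestFirst, 2));             // smallest first: 10
        System.out.println("largest first:  " + makespan(descending(smallestFirst), 2)); // largest first:  8
    }
}
```

Whether this effect matters on a real node depends on the size distribution of the SSTables and on how disk throughput is shared between slots, so the sketch only shows that ordering can change the total time in principle.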



