[ 
https://issues.apache.org/jira/browse/CASSANDRA-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-8868:
----------------------------------
    Description: 
I'd like to propose a new compaction strategy targeting JBOD configurations.  I 
believe this strategy would be most useful to machines with 4+ spinning disks 
but would also see some benefit on SSDs.

There are several goals with this strategy:

1. Minimize disk seeks during the compaction process
2. Maximize disk throughput when multiple disks are present
3. Better data distribution across disks.  Data should automatically be 
balanced (applies when adding a new, empty disk)

When a compaction is to occur, the algorithm first selects a disk to be the 
receiver.  This disk should be the one with the most free space.  The disks 
with the least free space should then be chosen as the origin disks.  SSTables 
are selected from the origin disks and compacted to the receiver.  This should 
both minimize seeks and automatically balance data across disks.
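
The selection step described above could be sketched roughly as follows. This 
is only an illustration, not Cassandra code: the Disk and SSTable classes, the 
plan_compaction function, and the origin_count parameter are all hypothetical 
names, and the size-tiered bucketing of candidate SSTables is left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SSTable:
    # Hypothetical stand-in for an on-disk SSTable.
    name: str
    size_bytes: int


@dataclass
class Disk:
    # Hypothetical stand-in for one JBOD data directory.
    path: str
    free_bytes: int
    sstables: List[SSTable] = field(default_factory=list)


def plan_compaction(
    disks: List[Disk], origin_count: int = 2
) -> Tuple[Disk, List[Disk], List[SSTable]]:
    """Pick the receiver (most free space) and origins (least free space).

    origin_count is an assumed tuning knob; the proposal does not say how
    many origin disks should take part in a single compaction.
    """
    if len(disks) < 2:
        raise ValueError("need at least two disks to move data between them")

    # Sort ascending by free space: fullest disks first, emptiest disk last.
    by_free = sorted(disks, key=lambda d: d.free_bytes)

    receiver = by_free[-1]
    # Origins are the fullest disks, never including the receiver itself.
    origins = by_free[:-1][:origin_count]

    # Candidate SSTables come only from the origin disks. A real strategy
    # would further filter these into size-tiered buckets.
    candidates = [table for disk in origins for table in disk.sstables]
    return receiver, origins, candidates
```

Because every candidate is read from an origin disk and written to a different 
receiver disk, reads and writes do not compete for the same spindle, and data 
moves from the fullest disks toward the emptiest one.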

I'm not sure if this would apply to leveled compaction, but it may apply to 
date tiered.

  was:
I'd like to propose a new compaction strategy targeting JBOD configurations.  I 
believe this strategy would be most useful to machines with 4+ spinning disks 
but would also see some benefit on SSDs.

There are several goals with this strategy:

1. Minimize disk seeks during the compaction process
2. Maximize disk throughput when multiple disks are present
3. Better data distribution across disks.  Data should automatically be 
balanced (applies when adding a new, empty disk)

When a compaction is to occur, the algorithm first selects a disk to be the 
receiver.  This disk should be the one with the most free space.  The fullest 
disks are chosen to be the origin.  SSTables are selected from the origin 
disks, and compacted onto the receiver.  This should both minimize seeks as 
well as auto balance.  

I'm not sure if this would apply to leveled compaction, but it may apply to 
date tiered.


> JBOD Aware Size Tiered Compaction Strategy
> ------------------------------------------
>
>                 Key: CASSANDRA-8868
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8868
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jon Haddad
>            Priority: Minor
>         Attachments: jbod_aware.png
>
>
> I'd like to propose a new compaction strategy targeting JBOD configurations.  
> I believe this strategy would be most useful to machines with 4+ spinning 
> disks but would also see some benefit on SSDs.
> There are several goals with this strategy:
> 1. Minimize disk seeks during the compaction process
> 2. Maximize disk throughput when multiple disks are present
> 3. Better data distribution across disks.  Data should automatically be 
> balanced (applies when adding a new, empty disk)
> When a compaction is to occur, the algorithm first selects a disk to be the 
> receiver.  This disk should be the one with the most free space.  The disks 
> with the least free space should then be chosen as the origin disks.  
> SSTables are selected from the origin disks and compacted to the receiver.  
> This should both minimize seeks and automatically balance data across disks.
> I'm not sure if this would apply to leveled compaction, but it may apply to 
> date tiered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
