[ 
https://issues.apache.org/jira/browse/CASSANDRA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-8571.
---------------------------------------
       Resolution: Duplicate
    Fix Version/s:     (was: 2.1.3)

> Free space management does not work very well
> ---------------------------------------------
>
>                 Key: CASSANDRA-8571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8571
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bartłomiej Romański
>
> Hi all,
> We've got a 2.1.2 cluster of 18 nodes, each equipped with 3x 480 GB SSDs 
> (JBOD). We mostly use LCS.
> Recently, our nodes started failing with 'no space left on device'. It all 
> started with our mistake - we let our LCS accumulate too much in L0.
> As a result, STCS woke up and we ended up with some big sstables on each 
> node (say 5-10 sstables, 20-50 GB each).
> During normal operation we keep our disks about 50% full, which gives about 
> 200 GB of free space on each of them. That was too little to compact all the 
> accumulated L0 sstables at once. Cassandra kept trying to do that and kept 
> failing...
> Eventually, we managed to stabilize the situation (with some crazy code 
> hacking, manually moving sstables, etc...). However, there are a few things 
> that would be more than helpful in recovering from such situations more 
> automatically...
> First, please look at DiskAwareRunnable.runMayThrow(). This method 
> initializes a (local) variable writeSize. I believe we should check 
> somewhere here whether we have enough space on the chosen disk. The problem 
> is that writeSize is never read... Am I missing something here?
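> To illustrate what I mean, something along these lines would be enough 
> (class and method names here are mine, not the actual Cassandra code):
> 
>     import java.io.File;
> 
>     // Hypothetical sketch: before picking a data directory, compare the
>     // estimated write size against the usable space left on that volume.
>     public class FreeSpaceCheck
>     {
>         public static boolean hasEnoughSpace(File dataDirectory, long writeSize)
>         {
>             // getUsableSpace() reports the bytes available to this JVM on the volume
>             return dataDirectory.getUsableSpace() > writeSize;
>         }
>     }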
> Btw, with STCS we first look for the least overloaded disk, and only then 
> (if there is more than one such disk) for the one with the most free space 
> (note the sort order in Directories.getWriteableLocation()). That's often 
> suboptimal (it's usually better to wait for the bigger disk than to compact 
> fewer sstables now), but probably not crucial.
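> Roughly what I mean, as a standalone sketch (DataDirectory and its fields 
> are made-up stand-ins, not the real Directories internals):
> 
>     import java.io.File;
>     import java.util.Comparator;
> 
>     public class DirectoryOrdering
>     {
>         static class DataDirectory
>         {
>             final File location;
>             int tasksQueued;   // writes already assigned to this disk
> 
>             DataDirectory(File location) { this.location = location; }
>         }
> 
>         // Roughly the current order: fewest queued tasks first,
>         // free space only breaks ties.
>         static final Comparator<DataDirectory> LEAST_LOADED_FIRST =
>             Comparator.comparingInt((DataDirectory d) -> d.tasksQueued)
>                       .thenComparing(Comparator.comparingLong(
>                           (DataDirectory d) -> d.location.getUsableSpace()).reversed());
> 
>         // What might work better: simply prefer the disk with the most free space.
>         static final Comparator<DataDirectory> MOST_FREE_SPACE_FIRST =
>             Comparator.comparingLong(
>                 (DataDirectory d) -> d.location.getUsableSpace()).reversed();
>     }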
> Second, the strategy (used by LCS) of first choosing a target disk and then 
> using it for the whole compaction is not the best one. Big compactions (e.g. 
> after massive operations like bootstrap or repair, or after issues with LCS 
> like in our case) on small drives (e.g. a JBOD of SSDs) will never succeed. 
> A much better strategy would be to choose the target drive for each output 
> sstable separately, or at least to round-robin them.
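> A rough sketch of the round-robin variant (all names hypothetical, nothing 
> from the actual codebase):
> 
>     import java.io.File;
>     import java.util.List;
>     import java.util.concurrent.atomic.AtomicInteger;
> 
>     // Pick a fresh target directory for every output sstable instead of
>     // one disk for the whole compaction.
>     public class RoundRobinDirectoryPicker
>     {
>         private final List<File> dataDirectories;
>         private final AtomicInteger next = new AtomicInteger();
> 
>         public RoundRobinDirectoryPicker(List<File> dataDirectories)
>         {
>             this.dataDirectories = dataDirectories;
>         }
> 
>         public File nextDirectoryFor(long estimatedSSTableSize)
>         {
>             // try each directory at most once, starting where the last call left off
>             for (int i = 0; i < dataDirectories.size(); i++)
>             {
>                 int index = Math.floorMod(next.getAndIncrement(), dataDirectories.size());
>                 File candidate = dataDirectories.get(index);
>                 if (candidate.getUsableSpace() > estimatedSSTableSize)
>                     return candidate;
>             }
>             throw new IllegalStateException("no data directory has enough free space");
>         }
>     }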
> Third, it would be helpful if the default check against MAX_COMPACTING_L0 in 
> LeveledManifest.getCandidatesFor() were expanded to also support a limit on 
> total space. After the STCS fallback in L0 you end up with very big sstables, 
> and 32 of them is just too much for one compaction on small drives.
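> Something like this standalone sketch (the names and the maxCompactingBytes 
> parameter are made up) is roughly what our hack boils down to:
> 
>     import java.util.ArrayList;
>     import java.util.List;
>     import java.util.function.ToLongFunction;
> 
>     // Cap the combined on-disk size of the chosen L0 sstables,
>     // not just their count.
>     public class L0CandidateLimit
>     {
>         static final int MAX_COMPACTING_L0 = 32;
> 
>         static <T> List<T> limitCandidates(List<T> l0SSTables,
>                                            ToLongFunction<T> sizeOnDisk,
>                                            long maxCompactingBytes)
>         {
>             List<T> candidates = new ArrayList<>();
>             long totalBytes = 0;
>             for (T sstable : l0SSTables)
>             {
>                 long size = sizeOnDisk.applyAsLong(sstable);
>                 if (candidates.size() >= MAX_COMPACTING_L0
>                     || totalBytes + size > maxCompactingBytes)
>                     break;
>                 candidates.add(sstable);
>                 totalBytes += size;
>             }
>             return candidates;
>         }
>     }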
> We finally used a hack similar to the last option (as it was the easiest one 
> to implement in a hurry), but any of the improvements described above would 
> have saved us from all this.
> Thanks,
> BR



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
