[jira] [Updated] (CASSANDRA-8571) Free space management does not work very well

2015-01-06 Thread Bartłomiej Romański (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bartłomiej Romański updated CASSANDRA-8571:
---
Description: 
Hi all,

We've got a Cassandra 2.1.2 cluster of 18 nodes, each equipped with 3x 480 GB 
SSDs (as JBOD). We mostly use LCS.

Recently, our nodes started failing with 'no space left on device'. It all 
started with our mistake: we let LCS accumulate too much data in L0.

As a result, the STCS fallback kicked in and we ended up with some big sstables 
on each node (roughly 5-10 sstables, 20-50 GB each).

During normal operation we keep our disks about 50% full, which leaves about 
200 GB of free space on each of them. That was too little to compact all the 
accumulated L0 sstables at once. Cassandra kept trying to do so and kept 
failing...

Eventually, we managed to stabilize the situation (with some crazy code 
hacking, manually moving sstables, etc.). However, there are a few things that 
would be more than helpful for recovering from such situations more 
automatically...

First, please look at DiskAwareRunnable.runMayThrow(). This method initializes 
a local variable, writeSize. I believe we should check somewhere here whether 
we have enough space on the chosen disk. The problem is that writeSize is never 
read... Am I missing something here?
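
For illustration, here is a minimal, standalone sketch of the kind of check I 
have in mind. This is not Cassandra code: the data directory path and the 
expected write size are assumed values, and java.io.File.getUsableSpace() 
stands in for whatever the directories layer actually exposes.

{code:java}
import java.io.File;
import java.io.IOException;

public class FreeSpaceCheckSketch
{
    public static void main(String[] args) throws IOException
    {
        // Assumed values, for illustration only.
        File dataDir = new File("/var/lib/cassandra/data");
        long expectedWriteSize = 50L * 1024 * 1024 * 1024; // ~50 GB of output

        // Fail fast instead of running out of space mid-compaction.
        long usable = dataDir.getUsableSpace(); // 0 if the path does not exist
        if (usable < expectedWriteSize)
            throw new IOException(String.format(
                "Not enough space on %s: need %d bytes, only %d usable",
                dataDir, expectedWriteSize, usable));

        System.out.println("Enough space; proceeding with the write");
    }
}
{code}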

By the way, in STCS we first look for the least overloaded disk, and only then 
(if there is more than one such disk) for the one with the most free space 
(note the sort order in Directories.getWriteableLocation()). That's often 
suboptimal (it's usually better to wait for the bigger disk than to compact 
fewer sstables now), but probably not crucial.
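
To make the ordering concrete, here is a small standalone sketch contrasting 
the two sort orders. CandidateDisk is a made-up stand-in type (not 
Directories.DataDirectory) and the numbers are invented:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DiskSortSketch
{
    static class CandidateDisk
    {
        final String path;
        final int pendingTasks; // proxy for how "overloaded" the disk is
        final long freeBytes;

        CandidateDisk(String path, int pendingTasks, long freeBytes)
        {
            this.path = path;
            this.pendingTasks = pendingTasks;
            this.freeBytes = freeBytes;
        }
    }

    public static void main(String[] args)
    {
        List<CandidateDisk> disks = new ArrayList<>(Arrays.asList(
            new CandidateDisk("/d1", 1, 200L << 30),   // busier, but more room
            new CandidateDisk("/d2", 0, 100L << 30))); // idle, but less room

        // Order as described above: least overloaded first, free space as tiebreak.
        disks.sort(Comparator.comparingInt((CandidateDisk d) -> d.pendingTasks)
                             .thenComparing(Comparator.comparingLong(
                                 (CandidateDisk d) -> d.freeBytes).reversed()));
        System.out.println("load-first pick:  " + disks.get(0).path); // /d2

        // Alternative: the disk with the most free space wins outright.
        disks.sort(Comparator.comparingLong((CandidateDisk d) -> d.freeBytes).reversed());
        System.out.println("space-first pick: " + disks.get(0).path); // /d1
    }
}
{code}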

Second, the strategy (used by LCS) of first choosing a target disk and then 
using it for the whole compaction is not the best one. For big compactions 
(e.g. after massive operations like bootstrap or repair, or after LCS trouble 
like in our case) on small drives (e.g. a JBOD of SSDs), such compactions will 
never succeed. A much better strategy would be to choose the target drive for 
each output sstable separately, or at least to round-robin them.
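
A trivial sketch of the round-robin variant (the paths and counts are assumed; 
a real implementation would of course still have to check free space on each 
target before writing to it):

{code:java}
import java.util.Arrays;
import java.util.List;

public class RoundRobinTargetSketch
{
    public static void main(String[] args)
    {
        // Assumed JBOD layout; paths are illustrative only.
        List<String> dataDirs = Arrays.asList("/d1", "/d2", "/d3");

        int outputSstables = 8; // however many sstables the compaction emits
        for (int i = 0; i < outputSstables; i++)
        {
            // Rotate the target so no single drive must hold the whole output.
            String target = dataDirs.get(i % dataDirs.size());
            System.out.println("sstable " + i + " -> " + target);
        }
    }
}
{code}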

Third, it would be helpful if the default check against MAX_COMPACTING_L0 in 
LeveledManifest.getCandidatesFor() were extended to also enforce a limit on the 
total size of the candidates. After the STCS fallback in L0 you end up with 
very big sstables, and 32 of them is simply too much for a single compaction on 
small drives.
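
As a standalone sketch of the combined count-and-size cap (pickCandidates and 
all the numbers here are hypothetical, not actual LeveledManifest code):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class L0CandidateCapSketch
{
    static final int MAX_COMPACTING_L0 = 32; // the existing count limit

    // Stop adding candidates once either the count cap or the byte cap is hit.
    static List<Long> pickCandidates(List<Long> l0SizesBytes, long maxTotalBytes)
    {
        List<Long> picked = new ArrayList<>();
        long total = 0;
        for (long size : l0SizesBytes)
        {
            if (picked.size() >= MAX_COMPACTING_L0 || total + size > maxTotalBytes)
                break;
            picked.add(size);
            total += size;
        }
        return picked;
    }

    public static void main(String[] args)
    {
        // Ten 30 GB sstables after the STCS fallback, but only ~100 GB of headroom.
        List<Long> l0 = new ArrayList<>(Collections.nCopies(10, 30L << 30));
        List<Long> chosen = pickCandidates(l0, 100L << 30);
        // Prints "compacting 3 of 10 sstables": 90 GB fits, a fourth would not.
        System.out.println("compacting " + chosen.size() + " of " + l0.size() + " sstables");
    }
}
{code}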

We finally used a hack similar to the last option (as it was the easiest one to 
implement in a hurry), but any of the improvements described above would have 
saved us from all this.

Thanks,
BR



[jira] [Updated] (CASSANDRA-8571) Free space management does not work very well

2015-01-06 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-8571:
---
Reproduced In: 2.1.2
Fix Version/s: 2.1.3

 Free space management does not work very well
 ----------------------------------------------

 Key: CASSANDRA-8571
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8571
 Project: Cassandra
  Issue Type: Bug
Reporter: Bartłomiej Romański
 Fix For: 2.1.3





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)