[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-04-07 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960403#comment-15960403
 ] 

Jan Karlsson commented on CASSANDRA-13354:
--

Yes, the small change LGTM.

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, we estimate the number of compaction tasks remaining for L0 by
> dividing the total size of L0 by four times the SSTable size, which gives
> 4 * 160 MB = 640 MB with default settings. This calculation is also used to
> decide whether repaired or unrepaired data is compacted next.
> This works well until you take repair into account. Repair streams in many
> sstables which, depending on your use case, can be much smaller than the
> configured SSTable size. In our case we are talking about many thousands of
> tiny SSTables. As the number of files grows you can run into any number of
> problems, including GC issues, too many open files, or a plain increase in
> read latency.
> With the current algorithm we choose between repaired and unrepaired data
> based on whichever side has more data in it, even if the repaired files
> outnumber the unrepaired files by a large margin.
> Similarly, the algorithm that selects compaction candidates takes up to 32
> SSTables at a time in L0, yet the estimated task calculation does not take
> this number into account. These two mechanisms should be aligned with each
> other.
> I propose that we take the number of files in L0 into account when estimating
> the remaining tasks.
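
For a rough feel of the two estimates discussed above, here is a minimal,
hypothetical Java sketch (not the actual Cassandra code; the class and method
names and the max() combination are illustrative assumptions, only the 160 MB
and 32 constants come from the description):

    // Illustrative only: compares a purely size-based L0 task estimate with one
    // that also considers how many sstables a single L0 compaction can consume.
    public final class L0TaskEstimateSketch
    {
        private static final long MAX_SSTABLE_SIZE_BYTES = 160L * 1024 * 1024; // default sstable size
        private static final int MAX_L0_COMPACTING_SSTABLES = 32;              // candidates per L0 compaction

        // Current behaviour as described in the ticket: size only. Thousands of
        // tiny sstables that together hold little data yield a tiny estimate.
        static long sizeBasedEstimate(long totalL0Bytes)
        {
            return (long) Math.ceil((double) totalL0Bytes / (4 * MAX_SSTABLE_SIZE_BYTES));
        }

        // Proposed direction: also account for the number of L0 files, since a
        // single compaction picks at most 32 of them.
        static long fileCountAwareEstimate(long totalL0Bytes, int l0FileCount)
        {
            long bySize = sizeBasedEstimate(totalL0Bytes);
            long byCount = (long) Math.ceil((double) l0FileCount / MAX_L0_COMPACTING_SSTABLES);
            return Math.max(bySize, byCount);
        }

        public static void main(String[] args)
        {
            // Assume 10,000 tiny repaired sstables of ~128 KB each (illustrative numbers).
            long totalBytes = 10_000L * 128 * 1024;
            System.out.println(sizeBasedEstimate(totalBytes));               // 2 tasks
            System.out.println(fileCountAwareEstimate(totalBytes, 10_000));  // 313 tasks
        }
    }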





[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-04-07 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960380#comment-15960380
 ] 

Marcus Eriksson commented on CASSANDRA-13354:
-

Patch LGTM, I just pushed a tiny nit here:
https://github.com/krummas/cassandra/commits/13354

I'm running dtests, since it is not unlikely that some test relies on the old
calculation; I will commit if the tests look good and you agree with my small
change, [~Jan Karlsson]






[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-04-06 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959691#comment-15959691
 ] 

Joshua McKenzie commented on CASSANDRA-13354:
-

[~krummas]: have bandwidth for review on this one?






[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-23 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939107#comment-15939107
 ] 

Jan Karlsson commented on CASSANDRA-13354:
--

I ran some tests simulating traffic on a 4-node cluster. Two of the nodes were
running with my patch while the other two ran without it.
Steps to reproduce:
1. Turn traffic on
2. Turn one of the nodes off
3. Wait 7 minutes
4. Truncate hints on all other nodes
5. Turn the node back on
6. Run repair on the node

As you can see in the attached graphs, the sstable count on the unpatched nodes
kept increasing because unrepaired data from the ongoing traffic was
prioritized. With more discrepancies in the data set, this would just keep
growing until the node hits the configured file descriptor limit or dies from
heap pressure.

Repair completed at 8:11 pm, but the small repaired files were not compacted
because the strategy kept picking the new unrepaired sstables over them. The
count did show a downward trend, since compaction was slightly faster than
insertion, so it would probably have ended with the repaired files eventually
being compacted.

During the unpatched test the node reported only 2 pending compactions while
holding ~22k open file descriptors and ~10k sstables. At 8:33 pm I disabled the
traffic completely to hurry this along.
SSTables in each level: [10347/4, 5, 0, 0, 0, 0, 0, 0, 0]
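
For a rough sense of why the size-only estimate stays so low here (the sstable
size below is an assumption for illustration, not a measurement): if those ~10k
tiny L0 sstables average around 100 KB each they hold only about 1 GB in total,
and 1 GB divided by the 4 * 160 MB = 640 MB task size rounds up to just 2
tasks, which matches the 2 pending compactions reported. A file-count based
estimate, e.g. ceil(10347 / 32) = 324 tasks, would reflect the actual backlog.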




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)