[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197690#comment-15197690 ]

Benjamin Lerer commented on CASSANDRA-10971:

I ran the tests on CI for 3.5 and they were flapping. I had a look at the tests and found 2 problems:
* Some tests were using {{new CommitLog()}} with the same directory as {{CommitLog.INSTANCE}}, which caused 2 commit log instances to run at the same time with 2 different configurations. This resulted in some commit log files not being deleted for the {{replay_StandardMmapped}} test.
* As the unit tests are run in random order, the configuration changes made by the compression and encryption tests were affecting other tests when {{resetUnsafe}} was used.

I fixed the problems by using only {{CommitLog.INSTANCE}} in all the tests and restoring the initial configuration parameters after each test that was modifying them.

Ran the tests on CI and it looks like we are good to go. \o/

Thanks for all the work [~aweisberg]

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
>
> I validated this via a unit test that slowed the ability of the log to drain
> to the filesystem. The compressed commit log will keep allocating buffers
> pending compression until it OOMs.
> I have a fix that I am not very happy with, because the whole "signal a thread
> to allocate a segment that depends on a resource that may not be available"
> approach results in some obtuse usage of {{CompletableFuture}} to rendezvous
> available buffers with the {{CommitLogSegmentManager}} thread waiting to finish
> constructing a new segment. The {{CLSM}} thread is in turn signaled by the
> thread(s) that actually want to write to the next segment, but aren't able
> to do it themselves.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
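The "restore the initial configuration after each test" fix described above can be sketched roughly as follows. This is only an illustration under stated assumptions: {{SketchConfig}} and {{ConfigRestoreSketch}} are hypothetical stand-ins, not Cassandra's configuration classes or the actual test code from the patch.

```java
// Hypothetical stand-in for a process-wide, mutable configuration holder.
class SketchConfig {
    // null means "no commit log compression" in this sketch.
    static volatile String commitLogCompression = null;
}

// Hypothetical helper: runs a test body under a temporary setting and
// always puts the previous value back, even if the body throws.
class ConfigRestoreSketch {
    static void withCompression(String compressor, Runnable testBody) {
        String original = SketchConfig.commitLogCompression; // remember the initial value
        SketchConfig.commitLogCompression = compressor;      // apply the test-specific value
        try {
            testBody.run();
        } finally {
            // Restore, so tests running later (in any order) see the initial configuration.
            SketchConfig.commitLogCompression = original;
        }
    }

    public static void main(String[] args) {
        withCompression("LZ4Compressor",
                        () -> System.out.println("during: " + SketchConfig.commitLogCompression));
        System.out.println("after: " + SketchConfig.commitLogCompression);
    }
}
```

The {{try}}/{{finally}} is the point of the sketch: with tests running in random order, a setting leaked by one test is what made unrelated tests flap.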
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195790#comment-15195790 ]

Ariel Weisberg commented on CASSANDRA-10971:

I'll take a look at it. It's not passing on OS X on trunk for me at all. It does pass on Linux.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195040#comment-15195040 ]

Benjamin Lerer commented on CASSANDRA-10971:

[~aweisberg] Sorry, I missed the last ticket updates.

I am +1 on the patch. My only issue is with {{org.apache.cassandra.db.commitlog.CommitLogTest.replay_Encrypted}}: it always times out on CI and fails on my machine. I do not think that the patch is the reason for the problem, but I would be more confident if the test were passing. Does it work on your machine?

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172141#comment-15172141 ]

Ariel Weisberg commented on CASSANDRA-10971:

Oops. I guess you don't need to, since the decrement is associated with a wakeup anyways.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172127#comment-15172127 ]

Ariel Weisberg commented on CASSANDRA-10971:

That would work. You just need to add a poke to the CLSM thread when decrementing the counter, otherwise it won't know it can create the segment now. I'll get that done.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
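A rough sketch of the "counter plus poke" idea from this comment, assuming a plain monitor with {{wait}}/{{notifyAll}}. {{BufferLimiterSketch}} and its method names are hypothetical and are not taken from the patch or from Cassandra's {{CommitLogSegmentManager}}; the real code may use a different wake-up mechanism.

```java
// Hypothetical sketch: a bound on buffers in use, where every decrement
// also wakes up ("pokes") a manager thread that may be waiting for room.
class BufferLimiterSketch {
    private final int maxBuffersInUse;
    private int buffersInUse = 0;

    BufferLimiterSketch(int maxBuffersInUse) {
        this.maxBuffersInUse = maxBuffersInUse;
    }

    // Called by the manager thread before it creates a segment.
    // Waits while the limit is reached; interrupts are deferred, not lost.
    synchronized void acquireBuffer() {
        boolean interrupted = false;
        while (buffersInUse >= maxBuffersInUse) {
            try {
                wait();
            } catch (InterruptedException e) {
                interrupted = true;
            }
        }
        buffersInUse++;
        if (interrupted)
            Thread.currentThread().interrupt();
    }

    // Called when a segment has released its buffer.
    synchronized void releaseBuffer() {
        buffersInUse--;
        notifyAll(); // the poke: without it a waiting manager would never re-check the counter
    }

    synchronized int inUse() {
        return buffersInUse;
    }

    // Small demonstration: a manager blocked at the limit proceeds after one release.
    static boolean demo() {
        BufferLimiterSketch limiter = new BufferLimiterSketch(1);
        limiter.acquireBuffer();                          // limit reached
        Thread manager = new Thread(limiter::acquireBuffer);
        manager.start();                                  // waits until a buffer is released
        limiter.releaseBuffer();                          // decrement and poke
        try {
            manager.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !manager.isAlive() && limiter.inUse() == 1;
    }
}
```

Because the decrement and the wake-up happen under the same monitor, a separate poke call is unnecessary in this sketch, which matches the follow-up comment that the decrement is already associated with a wakeup.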
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158576#comment-15158576 ]

Benjamin Lerer commented on CASSANDRA-10971:

Could we not just keep your latest design but track the number of buffers in use and use it as a limit? Something like [this|https://github.com/apache/cassandra/compare/trunk...blerer:10971-trunk].

{quote}As a nit, I think it might be safer to use an instanceof FileDirectSegment for enforceSegmentLimit in trunk. In case somebody decides to add a new sub-class to FileDirectSegment{quote}
Forget about that, I did not read the code properly.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157763#comment-15157763 ]

Ariel Weisberg commented on CASSANDRA-10971:

I think that brings us back to where we started with the original design that asynchronously supplies the buffer when it becomes available. I think I can do it without all the {{Future}}s nonsense by poking the CLSM thread when the buffer becomes available.

I need to update that version anyways because of the changes that occurred since this ticket was started.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157651#comment-15157651 ]

Benjamin Lerer commented on CASSANDRA-10971:

If I am not mistaken, a {{CompressedSegment}} or {{FileDirectSegment}} will release its buffer once it has been fully written to the disk, whereas segments will stay active until they are recycled. Consequently, it might be better to use as the limit the number of segments that are not fully written rather than the number of active ones. It seems that this could be done by counting the number of available segments which have a non-null buffer.

As a nit, I think it might be safer to use an {{instanceof FileDirectSegment}} for {{enforceSegmentLimit}} in trunk, in case somebody decides to add a new sub-class to {{FileDirectSegment}}.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.0.x, 3.x
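To make the "count segments with a non-null buffer" suggestion concrete, here is a minimal sketch. {{SketchSegment}} and {{UnwrittenSegmentLimit}} are hypothetical names invented for illustration; they only model the one property discussed here (the buffer is dropped once the segment is fully written) and do not reflect the real segment classes.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical stand-in for a commit log segment; only the buffer matters here.
class SketchSegment {
    ByteBuffer buffer; // set to null once the segment has been fully written to disk

    SketchSegment(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    void markFullyWritten() {
        buffer = null; // the segment stays "active", but no longer holds memory
    }
}

class UnwrittenSegmentLimit {
    // Number of segments that still hold a buffer, i.e. are not fully written.
    static long unwrittenCount(List<SketchSegment> segments) {
        return segments.stream().filter(s -> s.buffer != null).count();
    }

    // A new segment may be allocated only while the count is below the limit.
    static boolean mayAllocate(List<SketchSegment> segments, int limit) {
        return unwrittenCount(segments) < limit;
    }
}
```

With this measure, a segment that is still active but already flushed does not count against the limit, which is the distinction the comment draws between active segments and segments that still hold a buffer.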
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153110#comment-15153110 ]

Ariel Weisberg commented on CASSANDRA-10971:

Pushed an updated and much more succinct version. Tests are running now.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152981#comment-15152981 ]

Ariel Weisberg commented on CASSANDRA-10971:

The memory mapped implementation doesn't need/want to bound the number of buffers in flight. Backpressure comes from the operating system, which will block writer threads when there isn't enough free memory to buffer writes.

You are right that this would be simpler if the {{CLSM}} maintained the bound. It's already being woken up every time a segment is discarded. I'll rewrite it that way. I'll only have it enforce the bound if there is compression.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.x
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150187#comment-15150187 ]

Benjamin Lerer commented on CASSANDRA-10971:

My understanding of the problem is that, when the commit log cannot flush to the disk fast enough due to the compression overhead, the {{CommitLogSegmentManager}} will keep on creating new {{CompressedSegments}}. As each of those segments will use a new buffer (the ones in the pool all being in use), Cassandra can run out of memory.

Would it not be simpler to add backpressure by limiting the number of active segments? What I mean is, if the {{CommitLogSegmentManager}} stops allocating new segments once a certain number of active segments has been reached, the {{CommitLog.add}} method will block until some segments have been reclaimed.

It seems to me that, even in the case of {{MemoryMappedSegment}}, we should be able to apply backpressure if the disk cannot handle the load. Am I wrong on that? As I am not a CommitLog expert, I might have missed something.

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Improvement
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 3.x
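The backpressure idea proposed in this comment (stop allocating once a maximum number of active segments is reached, so that writers block until a segment is reclaimed) could be sketched with a counting semaphore. This is an assumption-laden illustration: {{BoundedSegmentAllocatorSketch}} is a hypothetical class, and nothing here claims to be how {{CommitLogSegmentManager}} or {{CommitLog.add}} are implemented.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: one permit per segment that may be active at a time.
class BoundedSegmentAllocatorSketch {
    private final Semaphore slots;

    BoundedSegmentAllocatorSketch(int maxActiveSegments) {
        this.slots = new Semaphore(maxActiveSegments);
    }

    // Blocks the calling thread until a segment slot is free.
    // This is where the backpressure on writers would come from.
    void allocateSegment() {
        slots.acquireUninterruptibly();
    }

    // Non-blocking variant: returns false immediately when the limit is reached.
    boolean tryAllocateSegment() {
        return slots.tryAcquire();
    }

    // Called once a segment has been reclaimed, freeing its slot.
    void segmentReclaimed() {
        slots.release();
    }

    int freeSlots() {
        return slots.availablePermits();
    }

    public static void main(String[] args) {
        BoundedSegmentAllocatorSketch allocator = new BoundedSegmentAllocatorSketch(2);
        allocator.allocateSegment();
        allocator.allocateSegment();
        System.out.println("third allocation allowed: " + allocator.tryAllocateSegment());
        allocator.segmentReclaimed();
        System.out.println("after reclaim allowed: " + allocator.tryAllocateSegment());
    }
}
```

The bound caps memory at roughly the maximum number of active segments times the buffer size, at the cost of stalling writers when the disk cannot keep up.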
[jira] [Commented] (CASSANDRA-10971) Compressed commit log has no backpressure and can OOM
[ https://issues.apache.org/jira/browse/CASSANDRA-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15086097#comment-15086097 ]

Ariel Weisberg commented on CASSANDRA-10971:

|[trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10971-trunk?expand=1]|[utest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-trunk-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-trunk-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10971-3.0?expand=1]|[utest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-3.0-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-3.0-dtest/]|
|[2.2 code|https://github.com/apache/cassandra/compare/cassandra-2.2...aweisberg:CASSANDRA-10971-2.2?expand=1]|[utest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-2.2-testall/]|[dtest|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10971-2.2-dtest/]|

> Compressed commit log has no backpressure and can OOM
> -----------------------------------------------------
>
> Key: CASSANDRA-10971
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10971
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
> Fix For: 2.2.x, 3.0.x, 3.x