[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184270#comment-16184270 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

Thanks for reviewing~

> Potential OOMs and lock contention in write path streams
> --------------------------------------------------------
>
> Key: CASSANDRA-13299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13299
> Project: Cassandra
> Issue Type: Improvement
> Components: Materialized Views
> Reporter: Benjamin Roth
> Assignee: ZhaoYang
> Fix For: 4.0
>
> I see a potential OOM, when a stream (e.g. repair) goes through the write
> path as it is with MVs.
> StreamReceiveTask gets a bunch of SSTableReaders. These produce rowiterators
> and they again produce mutations. So every partition creates a single
> mutation, which in case of (very) big partitions can result in (very) big
> mutations. Those are created on heap and stay there until they finished
> processing.
> I don't think it is necessary to create a single mutation for each partition.
> Why don't we implement a PartitionUpdateGeneratorIterator that takes a
> UnfilteredRowIterator and a max size and spits out PartitionUpdates to be
> used to create and apply mutations?
> The max size should be something like min(reasonable_absolute_max_size,
> max_mutation_size, commitlog_segment_size / 2). reasonable_absolute_max_size
> could be like 16M or sth.
> A mutation shouldn't be too large as it also affects MV partition locking.
> The longer a MV partition is locked during a stream, the higher chances are
> that WTE's occur during streams.
> I could also imagine that a max number of updates per mutation regardless of
> size in bytes could make sense to avoid lock contention.
> Love to get feedback and suggestions, incl. naming suggestions.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
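The size cap proposed in the issue description can be made concrete with a small sketch. It is written in Python purely for illustration (Cassandra itself is Java). The names `mutation_size_cap` and `split_by_size` are hypothetical, not Cassandra APIs, and the only number taken from the description is the suggested 16M absolute maximum. Rows are modeled as `(name, size_in_bytes)` pairs, which is an assumption of this sketch.

```python
from typing import Iterable, Iterator, List, Tuple

MiB = 1024 * 1024


def mutation_size_cap(max_mutation_size: int,
                      commitlog_segment_size: int,
                      reasonable_absolute_max_size: int = 16 * MiB) -> int:
    """min(reasonable_absolute_max_size, max_mutation_size, commitlog_segment_size / 2)."""
    return min(reasonable_absolute_max_size,
               max_mutation_size,
               commitlog_segment_size // 2)


def split_by_size(rows: Iterable[Tuple[str, int]], cap: int) -> Iterator[List[str]]:
    """Group (row_name, size_in_bytes) pairs into batches whose total size
    stays within cap. A single row larger than cap still gets its own batch."""
    batch: List[str] = []
    batch_bytes = 0
    for name, size in rows:
        # Close the current batch before it would grow past the cap.
        if batch and batch_bytes + size > cap:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(name)
        batch_bytes += size
    if batch:
        yield batch
```

With this shape only one batch is materialized at a time, which is the point of the proposal: the memory held per mutation is bounded by the cap rather than by the partition size.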
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183862#comment-16183862 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

* [Utest|http://jenkins-cassandra.datastax.lan/view/Dev/view/jasonstack/job/jasonstack-CASSANDRA-13299-trunk-testall/lastCompletedBuild/testReport/]: 3 failed; all three pass locally.
* [Dtest|http://jenkins-cassandra.datastax.lan/view/Dev/view/jasonstack/job/jasonstack-CASSANDRA-13299-trunk-dtest/lastCompletedBuild/testReport/]: the failures either pass locally or also fail on trunk.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16183689#comment-16183689 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

Thanks for the fix.

bq. Could you also modify complexThrottleWithTombstoneTest to test range deletions?

Added.

bq. I think that instead of throwing an AssertionError when the returned iterator is not exhausted, we could simply exhaust it

+1

bq. Right now we're verifying the results with all the nodes UP, but it's possible that another node responds the query even though one of the inconsistent nodes did not stream correctly. I think we should check the results on each node individually (with the others down) to ensure they streamed data correctly from other nodes.
bq. Add range deletions since that's when the range tombstones special cases will be properly exercised.

Added.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180923#comment-16180923 ]

Paulo Motta commented on CASSANDRA-13299:
-----------------------------------------

Thanks for the updates, this is looking good! I managed to reproduce an OOM when repairing a wide partition with 100K rows and verified that this patch avoids the OOM by splitting the partition into multiple batches (found CASSANDRA-13899 on the way). Awesome job!

While splitting in the happy case is working nicely, we need to ensure the range tombstone handling (especially range deletions) is working correctly and is well tested before committing this.

I noticed that the previous {{ThrottledUnfilteredIterator}} implementation could [potentially return|https://github.com/jasonstack/cassandra/blob/b8cb49035d3cf77198b31df7c51a174fffe3edaf/src/java/org/apache/cassandra/db/rows/ThrottledUnfilteredIterator.java#L166] {{throttle+2}} unfiltereds, unlike the documentation, which states that the maximum number of unfiltereds per batch is {{throttle+1}}. I also noticed that the case where there is a row between two markers was not covered by existing tests, since we need a range deletion to reproduce this scenario. I fixed this and added more tests in [this commit|https://github.com/pauloricardomg/cassandra/commit/47d8ca3592cb6382bb4c308720646395306a0a69]. Could you also modify [complexThrottleWithTombstoneTest|https://github.com/jasonstack/cassandra/commit/b8cb49035d3cf77198b31df7c51a174fffe3edaf#diff-5162644c24391628b339b88c3619427cR66] to test range deletions?

The previous change [requires|https://github.com/pauloricardomg/cassandra/commit/47d8ca3592cb6382bb4c308720646395306a0a69#diff-2acee8fea5cd82a51fda4af6e38faf13R60] the minimum throttle size to be 2, otherwise it would not be possible to make progress on the iterator in the presence of open and close markers.

I think that instead of throwing an {{AssertionError}} when the returned iterator is not exhausted, we could simply exhaust it, effectively skipping entries, since this might be a possible usage of {{ThrottledUnfilteredIterator}}, so I did this in [this commit|https://github.com/pauloricardomg/cassandra/commit/04ed5ecb5183195601950fc9efd2ca9123596487].

I also added a utility method {{ThrottledUnfilteredIterator.throttle(UnfilteredPartitionIterator partitionIterator, int maxBatchSize)}} to allow throttling an {{UnfilteredPartitionIterator}} transparently and used it in {{StreamReceiveTask}} [in this commit|https://github.com/pauloricardomg/cassandra/commit/4f8c3b8faa2644133d301ac7bf7b748f7ec265ee].

I had another look at the {{throttled_partition_update_test}} [dtest|https://github.com/riptano/cassandra-dtest/commit/f3307adef349f232ec0ae64e902164684f32cca0] and think we can make the following improvements:
* Right now we're [verifying the results|https://github.com/riptano/cassandra-dtest/commit/f3307adef349f232ec0ae64e902164684f32cca0#diff-62ba429edee6a4681782f078246c9893R1410] with all the nodes UP, but it's possible that another node responds to the query even though one of the inconsistent nodes did not stream correctly. I think we should check the results on each node individually (with the others down) to ensure they streamed data correctly from other nodes.
* Add [range deletions|https://issues.apache.org/jira/browse/CASSANDRA-6237], since that's when the range tombstone special cases will be properly exercised.

Please let me know what you think about these suggestions.
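The {{throttle+1}} bound and the minimum throttle of 2 discussed above can be illustrated with a simplified model. This is a Python sketch for illustration only, not the actual {{ThrottledUnfilteredIterator}} code: unfiltereds are modeled as `(kind, value)` tuples with kinds `"row"`, `"open"` and `"close"`, markers are identified by a name rather than by clustering bounds, and the function name `throttle` is reused here only as a label. The assumed strategy is that a batch cut while a range tombstone is open gets one synthetic close marker appended, and the next batch starts by re-opening the same tombstone.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Unfiltered = Tuple[str, object]  # ("row", key) | ("open", name) | ("close", name)


def throttle(unfiltereds: Iterable[Unfiltered], limit: int) -> Iterator[List[Unfiltered]]:
    """Yield batches of at most limit + 1 unfiltereds, keeping every batch
    balanced with respect to range tombstone markers."""
    if limit < 2:
        # A batch that starts with a re-opened marker needs room for at least
        # one more unfiltered, so a limit of 1 cannot keep the limit + 1 bound.
        raise ValueError("limit must be at least 2")
    batch: List[Unfiltered] = []
    open_marker: Optional[Unfiltered] = None  # range tombstone currently open
    for u in unfiltereds:
        batch.append(u)
        if u[0] == "open":
            open_marker = u
        elif u[0] == "close":
            open_marker = None
        if len(batch) >= limit:
            if open_marker is not None:
                # Close the open tombstone for this batch (the +1 element)
                # and carry the open marker over to the next batch.
                batch.append(("close", open_marker[1]))
                yield batch
                batch = [open_marker]
            else:
                yield batch
                batch = []
    if batch:
        yield batch
```

In this model a row that sits between an open and a close marker always ends up in a batch that contains both an open and a close for that tombstone, which is the scenario the range-deletion tests are meant to exercise.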
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169634#comment-16169634 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

Thanks for the feedback. Rebased on the latest trunk; dtest is unstable due to netty.

bq. Make ThrottledUnfilteredIterator an Iterator instead of using hasNextGroup and resetLimit which is analogous to hasNext and next.

It now extends {{AbstractIterator}} and implements {{computeNext()}}.

{quote}
Move to org.apache.cassandra.db.rows package
Add simple javadoc explaining what it does
Move cassandra.mv.mutation.row.count out of ThrottledUnfilteredIterator, and maybe rename it to cassandra.repair.mutation_repair_rows_per_batch (or similar, since it's also used for CDC).
{quote}

+1, fixed.

bq. Add unit test to ThrottledUnfilteredIterator to make sure it's generating range tombstones correctly

Added {{ThrottledUnfilteredIteratorTest}}.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166039#comment-16166039 ]

Paulo Motta commented on CASSANDRA-13299:
-----------------------------------------

Thanks, the patch looks good from an initial look, great job! Some minor comments:
* Generalize {{ThrottledUnfilteredIterator}}, since it can also be useful outside of the streaming package:
** Make {{ThrottledUnfilteredIterator}} an {{Iterator}} instead of using {{hasNextGroup}} and {{resetLimit}}, which are analogous to {{hasNext}} and {{next}}.
** Move it to the {{org.apache.cassandra.db.rows}} package.
** Add a simple javadoc explaining what it does.
** Move {{cassandra.mv.mutation.row.count}} out of {{ThrottledUnfilteredIterator}}, and maybe rename it to {{cassandra.repair.mutation_repair_rows_per_batch}} (or similar, since it's also used for CDC).
* Add a unit test for {{ThrottledUnfilteredIterator}} to make sure it's generating range tombstones correctly.
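The suggestion to expose the throttling as a plain iterator, rather than through {{hasNextGroup}}/{{resetLimit}}, can be sketched as an iterator of lazy batches. This is a Python illustration under stated assumptions, not the patch itself: the class name `BatchIterator` is hypothetical, range tombstone markers are ignored, and draining an unread batch before handing out the next one is one possible policy (the alternative would be to raise an error).

```python
from itertools import chain, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class BatchIterator:
    """Iterator over lazy sub-iterators of at most batch_size items each."""

    def __init__(self, source: Iterable[T], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be positive")
        self._source = iter(source)
        self._batch_size = batch_size
        self._current: Iterator[T] = iter(())

    def __iter__(self) -> "BatchIterator":
        return self

    def __next__(self) -> Iterator[T]:
        # Drain whatever the caller left unread in the previous batch, so the
        # next batch starts at the correct position in the source.
        for _ in self._current:
            pass
        # Raises StopIteration once the source is exhausted, ending iteration.
        first = next(self._source)
        self._current = chain([first], islice(self._source, self._batch_size - 1))
        return self._current
```

A caller can then write `for batch in BatchIterator(rows, 100): apply(batch)` (with `apply` standing in for whatever consumes a batch) and never holds more than one batch at a time.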
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143793#comment-16143793 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

[~brstgt] could you give some feedback?
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137759#comment-16137759 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

[~brstgt] thanks :)

Found one more issue related to RangeTombstoneMarker in MVs while writing the dtest.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134749#comment-16134749 ]

Benjamin Roth commented on CASSANDRA-13299:
-------------------------------------------

Sorry for the late response, I was on vacation. No, I am not working on that ticket. But thanks a lot for your efforts (not only) on that ticket!
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134733#comment-16134733 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

[trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13299-trunk]
[dtest|https://github.com/riptano/cassandra-dtest/commits/CASSANDRA-13299]

Changes:
1. Throttle by the number of base unfiltereds. The default is 100.
2. A pair of open/close range tombstone markers can have any number of unshadowed rows in between. To avoid exceeding the limit, the patch simply caches the range tombstones and applies the cached range tombstones in the next batch.

Note: a single partition deletion or range deletion can cause a huge number of view rows to be removed, so the view mutation may fail to apply due to a WTE or max_mutation_size. That can be resolved separately in CASSANDRA-12783. Here, I only address the issue of holding the entire partition in memory when repairing a base table with MVs.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122903#comment-16122903 ]

ZhaoYang commented on CASSANDRA-13299:
--------------------------------------

[~brstgt] Hi Benjamin, are you working on this ticket?

I think there isn't a perfect base mutation size or number of base rows per mutation that fits all data models. Your suggested min(16MB, max_mutation_size) should be good enough. The first target is to reduce memory pressure when repairing huge partitions with MVs.
[jira] [Commented] (CASSANDRA-13299) Potential OOMs and lock contention in write path streams
[ https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896460#comment-15896460 ]

Benjamin Roth commented on CASSANDRA-13299:
-------------------------------------------

Relating to CASSANDRA-11670, this would also allow writing all streamed mutations to the commitlog without problems.

I also propose doing so for small streams (see CASSANDRA-13290). Writing small streams (e.g. < 100KB) to the commitlog does not require a flush at the end of stream receive. This avoids tons of flushes when tons of tiny streams are sent during a repair session.

These may be apples and oranges, but fixing all these ends makes the whole process less error-prone and probably faster.