[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13808820#comment-13808820 ]

Marcus Eriksson commented on CASSANDRA-6142:
--------------------------------------------

ok, looks good to me

how about 1.2?

Remove multithreaded compaction
-------------------------------

Key: CASSANDRA-6142
URL: https://issues.apache.org/jira/browse/CASSANDRA-6142
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
Fix For: 2.1

There is at best a very small sweet spot for multithreaded compaction (ParallelCompactionIterable). For large rows, we stall the pipeline and fall back to a single LCR pass. For small rows, the overhead of the coordination outweighs the benefits of parallelization (45s to compact 2x1M stress rows with multithreading enabled, vs 35 with it disabled).

--
This message was sent by Atlassian JIRA
(v6.1#6144)
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809198#comment-13809198 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

The 2.0 backport was bad enough, I don't even want to think about 1.2. They're all pretty rare corner cases, so I'm fine with telling people to upgrade to 2.0 if they care.
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809201#comment-13809201 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

Split that out to CASSANDRA-6274 to keep CHANGES clean when I tag it 2.0.3.
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13804861#comment-13804861 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

Posted 2.0 fixes to https://github.com/jbellis/cassandra/commits/6142-2.0. Note that b959e8ff3bccd3437de70d33da91307ab9c12a19 is a different, less-invasive approach than the one taken for trunk.
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801548#comment-13801548 ]

Marcus Eriksson commented on CASSANDRA-6142:
--------------------------------------------

Looks good to me.

Regarding saveOutOfOrderRows, I guess a solution would be to flush a new sstable from the TreeSet when its size exceeds some limit? Unsure how common this is though.
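The TreeSet suggestion in the comment above can be pictured with a minimal sketch. This is not Cassandra's scrub code: `OutOfOrderRowBuffer`, `maxBufferedRows`, and the use of plain `String` keys are all assumptions made for illustration, and the "sstable" here is just a sorted list held in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; names and types are not from Cassandra.
final class OutOfOrderRowBuffer {
    private final int maxBufferedRows;
    // TreeSet keeps the buffered keys sorted, so each flush is already in order.
    private final TreeSet<String> buffered = new TreeSet<>();
    // Stand-in for the sstables that a real implementation would write to disk.
    private final List<List<String>> flushed = new ArrayList<>();

    OutOfOrderRowBuffer(int maxBufferedRows) {
        if (maxBufferedRows < 1) {
            throw new IllegalArgumentException("maxBufferedRows must be >= 1");
        }
        this.maxBufferedRows = maxBufferedRows;
    }

    // Buffer one out-of-order row; flush once the buffer exceeds the limit.
    void add(String rowKey) {
        buffered.add(rowKey);
        if (buffered.size() > maxBufferedRows) {
            flush();
        }
    }

    // Emit the buffered rows as one sorted batch and start over with an empty buffer.
    void flush() {
        if (buffered.isEmpty()) {
            return;
        }
        flushed.add(new ArrayList<>(buffered));
        buffered.clear();
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }
}
```

Bounding the buffer this way trades one large in-memory set for several smaller sorted outputs, which a later compaction would have to merge.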
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801998#comment-13801998 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

I'm guessing not super common because the existing code will just break if it hits that case. (An LCR object will throw errors if you try to use it after advancing the underlying stream to another row.)

I guess the next step is probably for me to pull the fixes out for application to 2.0.
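The parenthetical above describes a lazy row that is only valid while the underlying stream still points at it. A toy model of that constraint, assuming nothing about the real LazilyCompactedRow internals, might look like the following; `SharedRowStream`, `LazyRow`, and the generation counter are hypothetical names invented for this sketch.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a stream that hands out lazy views of its current row.
final class SharedRowStream {
    private final Iterator<List<String>> rows;
    private int generation = 0;

    SharedRowStream(List<List<String>> rows) {
        this.rows = rows.iterator();
    }

    boolean hasNext() {
        return rows.hasNext();
    }

    // Advancing the stream invalidates every view handed out earlier.
    LazyRow next() {
        List<String> columns = rows.next();
        generation++;
        return new LazyRow(this, columns, generation);
    }

    int generation() {
        return generation;
    }
}

// A view that is usable only while the stream has not moved on to another row.
final class LazyRow {
    private final SharedRowStream stream;
    private final List<String> columns;
    private final int generation;

    LazyRow(SharedRowStream stream, List<String> columns, int generation) {
        this.stream = stream;
        this.columns = columns;
        this.generation = generation;
    }

    List<String> columns() {
        if (stream.generation() != generation) {
            throw new IllegalStateException("stream has advanced past this row");
        }
        return columns;
    }
}
```

Under this model, buffering such rows for later (as an out-of-order path would need to) fails as soon as the stream is advanced, which is consistent with the guess that the case is rarely hit in practice.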
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1370#comment-1370 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

Pushed fixes for these to the same branch. (They are indeed existing bugs in LCR.)
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798480#comment-13798480 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

Damn, not sure how I missed that. Suspect another existing bug. Will investigate.
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796515#comment-13796515 ]

Marcus Eriksson commented on CASSANDRA-6142:
--------------------------------------------

CompactionsPurgeTest fails:

{noformat}
[junit] Testsuite: org.apache.cassandra.db.compaction.CompactionsPurgeTest
[junit] Tests run: 6, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 9.365 sec
[junit]
[junit] Testcase: testMinTimestampPurge(org.apache.cassandra.db.compaction.CompactionsPurgeTest): FAILED
[junit] expected:<2> but was:<1>
[junit] junit.framework.AssertionFailedError: expected:<2> but was:<1>
[junit] at org.apache.cassandra.db.compaction.CompactionsPurgeTest.testMinTimestampPurge(CompactionsPurgeTest.java:185)
[junit]
[junit]
[junit] Testcase: testCompactionPurgeTombstonedRow(org.apache.cassandra.db.compaction.CompactionsPurgeTest): FAILED
[junit] expected:<10> but was:<5>
[junit] junit.framework.AssertionFailedError: expected:<10> but was:<5>
[junit] at org.apache.cassandra.db.compaction.CompactionsPurgeTest.testCompactionPurgeTombstonedRow(CompactionsPurgeTest.java:313)
[junit]
[junit]
[junit] Test org.apache.cassandra.db.compaction.CompactionsPurgeTest FAILED
{noformat}

and a few nits:

in LazilyCompactedRow:
* make reducer and merger final
* remove comment about reducer being null on row 123
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785287#comment-13785287 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

I tried parallelizing at the OnDiskAtomIterator level instead (thread-per-iterator-per-partition, buffering into a queue) and for small partitions the performance is ridiculously bad, easily 100x worse than single threaded mode.

Any better ideas [~krummas] [~yukim] [~iamaleksey] [~slebresne]? If not I will post a patch to rip out PCI.
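To make the thread-per-iterator arrangement described above concrete, here is a minimal sketch of the general shape: one producer thread per sorted source, each buffering into its own bounded queue, and a single consumer merging the queue heads. It is not the code that was benchmarked, and it measures nothing. `QueuedMergeSketch`, the `Integer` elements, and the queue capacity of 16 are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; not Cassandra code.
final class QueuedMergeSketch {
    // Merge already-sorted sources, using one producer thread and one queue per source.
    static List<Integer> mergeThreaded(List<List<Integer>> sortedSources) {
        List<BlockingQueue<Optional<Integer>>> queues = new ArrayList<>();
        List<Thread> producers = new ArrayList<>();
        for (List<Integer> source : sortedSources) {
            BlockingQueue<Optional<Integer>> queue = new ArrayBlockingQueue<>(16);
            queues.add(queue);
            Thread producer = new Thread(() -> {
                try {
                    for (Integer value : source) {
                        queue.put(Optional.of(value));
                    }
                    queue.put(Optional.empty()); // end-of-source marker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producer.setDaemon(true);
            producer.start();
            producers.add(producer);
        }
        try {
            // Current head of each source; an empty Optional means that source is exhausted.
            List<Optional<Integer>> heads = new ArrayList<>();
            for (BlockingQueue<Optional<Integer>> queue : queues) {
                heads.add(queue.take());
            }
            List<Integer> merged = new ArrayList<>();
            while (true) {
                int best = -1;
                for (int i = 0; i < heads.size(); i++) {
                    if (heads.get(i).isPresent()
                            && (best < 0 || heads.get(i).get() < heads.get(best).get())) {
                        best = i;
                    }
                }
                if (best < 0) {
                    break; // every source is exhausted
                }
                merged.add(heads.get(best).get());
                heads.set(best, queues.get(best).take());
            }
            for (Thread producer : producers) {
                producer.join();
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while merging", e);
        }
    }
}
```

In this shape every element crosses a blocking-queue handoff, and every partition pays for thread startup and teardown. For partitions with only a handful of cells, that fixed cost plausibly dwarfs the merge work itself, which would fit the slowdown reported above.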
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785321#comment-13785321 ]

Marcus Eriksson commented on CASSANDRA-6142:
--------------------------------------------

I tried improving it a while back as well and got basically the same results, so yes, we should remove it.

I concluded that the best way to improve the speed was to do more compactions in parallel (CASSANDRA-5936; I should finish that up).
[jira] [Commented] (CASSANDRA-6142) Remove multithreaded compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-6142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785612#comment-13785612 ]

Jonathan Ellis commented on CASSANDRA-6142:
-------------------------------------------

Pushed removal to https://github.com/jbellis/cassandra/commits/6142. Also removes PrecompactedRow, which is no longer necessary, and fixes a couple existing bugs in LCR and Scrub that this revealed (last two commits).