[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt

Rebased to trunk.

Allow taking advantage of multiple cores while compacting a single CF
-

Key: CASSANDRA-2901
URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
Fix For: 1.0

Attachments: 2901-0.8.txt, 2901-trunk.txt, 2901-trunk.txt

Moved from CASSANDRA-1876:
There are five stages: read, deserialize, merge, serialize, and write. We
probably want to continue doing read+deserialize and serialize+write
together, or you waste a lot of time copying to/from buffers.
So, what I would suggest is: one thread per input sstable doing read +
deserialize (a row at a time). A thread pool (one per core?) merging
corresponding rows from each input sstable. One thread doing serialize +
writing the output (this has to wait for the merge threads to complete
in-order, obviously). This should take us from being CPU bound on SSDs (since
only one core is compacting) to being I/O bound.
This will require roughly 2x the memory, to allow the reader threads to work
ahead of the merge stage. (I.e. for each input sstable you will have up to
one row in a queue waiting to be merged, and the reader thread working on the
next.) Seems quite reasonable on that front. You'll also want a small queue
size for the serialize-merged-rows executor.
Multithreaded compaction should be either on or off. It doesn't make sense to
try to do things halfway (by doing the reads with a
threadpool whose size you can grow/shrink, for instance): we still have
compaction threads tuned to low priority, by default, so the impact on the
rest of the system won't be very different. Nor do we expect to have so many
input sstables that we lose a lot in context switching between reader threads.
IMO it's acceptable to punt completely on rows that are larger than memory,
and fall back to the old non-parallel code there. I don't see any sane way to
parallelize large-row compactions.
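
The staged design described above can be sketched roughly as follows. This is
an illustration only, not the attached patch: the class name
ParallelCompactionSketch, the Row type, the "newest timestamp wins" merge rule,
and the queue capacities are all assumptions made for the example. It uses one
reader thread per input with a one-row queue, a fixed pool of merge threads,
and a single writer that consumes merged rows in submission order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCompactionSketch {
    // Hypothetical stand-in for a deserialized row.
    public static final class Row {
        public final String key; public final long timestamp; public final String value;
        public Row(String key, long timestamp, String value) {
            this.key = key; this.timestamp = timestamp; this.value = value;
        }
    }

    private static final Row EOF = new Row(null, 0, null); // end-of-input marker

    // Assumed merge rule for the example: the newest version of a row wins.
    static Row merge(List<Row> versions) {
        Row newest = versions.get(0);
        for (Row r : versions) if (r.timestamp > newest.timestamp) newest = r;
        return newest;
    }

    // Each input must already be sorted by key, as sstables are.
    public static List<Row> compact(List<List<Row>> inputs, int mergeThreads) throws Exception {
        int n = inputs.size();
        // Stage 1: one reader per input, at most one row queued ahead of the merge.
        ExecutorService readers = Executors.newFixedThreadPool(Math.max(1, n));
        List<BlockingQueue<Row>> queues = new ArrayList<>();
        for (List<Row> input : inputs) {
            BlockingQueue<Row> q = new ArrayBlockingQueue<>(1);
            queues.add(q);
            readers.submit(() -> {
                for (Row r : input) q.put(r);
                q.put(EOF);
                return null;
            });
        }
        // Stage 2: merge pool. Stage 3: single writer fed by a small bounded queue.
        ExecutorService mergers = Executors.newFixedThreadPool(mergeThreads);
        BlockingQueue<Future<Row>> pending = new ArrayBlockingQueue<>(4);
        Future<Row> done = CompletableFuture.completedFuture(EOF);
        List<Row> output = new ArrayList<>();
        Thread writer = new Thread(() -> {
            try {
                // Futures are taken in submission order, so output stays sorted.
                for (Future<Row> f = pending.take(); f != done; f = pending.take())
                    output.add(f.get());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        Row[] heads = new Row[n];
        for (int i = 0; i < n; i++) heads[i] = queues.get(i).take();
        while (true) {
            String min = null; // smallest key among the current heads
            for (Row h : heads)
                if (h != EOF && (min == null || h.key.compareTo(min) < 0)) min = h.key;
            if (min == null) break; // every input is exhausted
            List<Row> versions = new ArrayList<>();
            for (int i = 0; i < n; i++)
                if (heads[i] != EOF && heads[i].key.equals(min)) {
                    versions.add(heads[i]);
                    heads[i] = queues.get(i).take();
                }
            pending.put(mergers.submit(() -> merge(versions)));
        }
        pending.put(done);
        writer.join();
        readers.shutdown();
        mergers.shutdown();
        return output;
    }
}
```

The bounded queues are what keep the extra memory small: each reader can be at
most one row ahead of the merge stage, and the writer queue holds only a few
merged rows at a time. The sketch does not handle rows larger than memory.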

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-trunk.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-0.8.txt, 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-trunk.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-0.8.txt, 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-0.8.txt, 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt

Committed the CompactionIterable refactoring to trunk. Attaching latest trunk
version, which fixes not closing the Deserializer sources.

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-0.8.txt

... and backported to 0.8.

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 2901-0.8.txt, 2901-trunk.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-refactor-CompactionIterator-CompactionIterable.txt

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 0001-refactor-CompactionIterator-CompactionIterable.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 0001-refactor-CompactionIterator-CompactionIterable.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments: 0001-refactor-CompactionIterator-CompactionIterable.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Sylvain Lebresne (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sylvain Lebresne updated CASSANDRA-2901:

Attachment: 0003-Fix-LCR.patch

The DefsTest and CliTest failures occur because we don't ignore purged
tombstones on the first pass when computing the serializedSize. Attaching a
small patch with the fix. The patch also fixes a failure in
StreamingTransferTest: in SSTII, the columnPosition should be set for non-file
input, otherwise headerSize() returns the wrong value and the assertion in
getColumnFamilyWithColumns is triggered. This seems to fix all unit tests here.

The patch looks good, but each deserializer now gets the full maxInMemorySize
instead of maxInMemorySize / nb(Deserializers). Was that intended?
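
To make the difference concrete, here is a minimal sketch of the two budgeting
choices. The class DeserializerBudget and both method names are hypothetical
and do not come from the patch; the only assumption is that each deserializer
may hold up to its limit in memory at once.

```java
public class DeserializerBudget {
    // Hypothetical helper: split one total budget evenly across deserializers.
    public static long perDeserializerLimit(long maxInMemorySize, int deserializers) {
        if (deserializers <= 0)
            throw new IllegalArgumentException("need at least one deserializer");
        return maxInMemorySize / deserializers;
    }

    // Worst-case bytes held if every deserializer buffers a row at its limit.
    public static long worstCase(long limitPerDeserializer, int deserializers) {
        return Math.multiplyExact(limitPerDeserializer, deserializers);
    }
}
```

With a 64 MB maxInMemorySize and 4 deserializers, giving each one the full
limit allows up to 256 MB in the worst case, while dividing the limit caps the
total at 64 MB (16 MB each).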

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt,
0002-parallel-compaction.txt, 0003-Fix-LCR.patch


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0003-Fix-LCR.patch)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compactions.txt)

Allow taking advantage of multiple cores while compacting a single CF
-

Attachments:
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt,
0002-parallel-compaction.txt


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901.patch)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-v3.txt)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

[
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-v2.txt)

Allow taking advantage of multiple cores while compacting a single CF
-


[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF