[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-09-06 Thread Jonathan Ellis (JIRA)

 [ https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt

Rebased to trunk.

 Allow taking advantage of multiple cores while compacting a single CF
 ---------------------------------------------------------------------

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 1.0

 Attachments: 2901-0.8.txt, 2901-trunk.txt, 2901-trunk.txt


 Moved from CASSANDRA-1876:
 There are five stages: read, deserialize, merge, serialize, and write. We 
 probably want to continue doing read+deserialize and serialize+write 
 together, or you waste a lot of time copying to/from buffers.
 So, what I would suggest is: one thread per input sstable doing read + 
 deserialize (a row at a time). A thread pool (one per core?) merging 
 corresponding rows from each input sstable. One thread doing serialize + 
 writing the output (this has to wait for the merge threads to complete 
 in-order, obviously). This should take us from being CPU bound on SSDs (since 
 only one core is compacting) to being I/O bound.
 This will require roughly 2x the memory, to allow the reader threads to work 
 ahead of the merge stage. (I.e. for each input sstable you will have up to 
 one row in a queue waiting to be merged, and the reader thread working on the 
 next.) Seems quite reasonable on that front.  You'll also want a small queue 
 size for the serialize-merged-rows executor.
 Multithreaded compaction should be either on or off. It doesn't make sense to 
 try to do things halfway (by doing the reads with a
 threadpool whose size you can grow/shrink, for instance): we still have 
 compaction threads tuned to low priority, by default, so the impact on the 
 rest of the system won't be very different. Nor do we expect to have so many 
 input sstables that we lose a lot in context switching between reader threads.
 IMO it's acceptable to punt completely on rows that are larger than memory, 
 and fall back to the old non-parallel code there. I don't see any sane way to 
 parallelize large-row compactions.
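
As a rough, hypothetical illustration of the pipeline described above (a
sketch under simplifying assumptions, not the attached patch): one reader
thread per input sstable feeds a capacity-1 bounded queue, a merge pool sized
to the core count does the CPU-heavy work, and a single writer drains merge
results in submission order. It assumes every input contains every row key so
that "corresponding rows" line up one-for-one; all names here
(ParallelCompactionSketch, Row, merge) are invented, and the println stands in
for serialize + write.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelCompactionSketch
    {
        record Row(long key, String value) {}

        public static void main(String[] args) throws Exception
        {
            final int sstables = 4, rows = 10;
            final int cores = Runtime.getRuntime().availableProcessors();

            // Stage 1: one reader thread per input sstable, each feeding a
            // small bounded queue. Capacity 1 is what gives the ~2x memory
            // estimate: one row queued for merge, one being deserialized.
            List<BlockingQueue<Row>> queues = new ArrayList<>();
            ExecutorService readers = Executors.newFixedThreadPool(sstables);
            for (int i = 0; i < sstables; i++)
            {
                BlockingQueue<Row> q = new ArrayBlockingQueue<>(1);
                queues.add(q);
                final int id = i;
                readers.execute(() -> {
                    try
                    {
                        for (long key = 0; key < rows; key++)
                            q.put(new Row(key, "sstable-" + id)); // "read + deserialize"
                    }
                    catch (InterruptedException e)
                    {
                        Thread.currentThread().interrupt();
                    }
                });
            }

            // Stage 2: a merge pool, one thread per core. Stage 3: a single
            // writer consuming merge results in submission order via a small
            // queue of Futures (the "small queue size" mentioned above).
            ExecutorService mergers = Executors.newFixedThreadPool(cores);
            BlockingQueue<Future<Row>> pending = new ArrayBlockingQueue<>(cores);

            Thread writer = new Thread(() -> {
                try
                {
                    for (int i = 0; i < rows; i++)
                        System.out.println("serialize+write " + pending.take().get());
                }
                catch (Exception e)
                {
                    throw new RuntimeException(e);
                }
            });
            writer.start();

            // Coordinator: take the corresponding row from each input and
            // hand the merge to the pool; pending.put() blocks when the
            // writer falls behind, bounding memory use.
            for (int i = 0; i < rows; i++)
            {
                List<Row> inputs = new ArrayList<>();
                for (BlockingQueue<Row> q : queues)
                    inputs.add(q.take());
                pending.put(mergers.submit(() -> merge(inputs)));
            }

            writer.join();
            readers.shutdown();
            mergers.shutdown();
        }

        // Stand-in for the CPU-heavy row merge; a real compaction reconciles
        // columns and tombstones here.
        static Row merge(List<Row> inputs)
        {
            return new Row(inputs.get(0).key(), "merged-from-" + inputs.size() + "-inputs");
        }
    }

The bounded pending queue is what lets the writer apply backpressure to the
whole pipeline: if serialize + write is the bottleneck, the coordinator and
readers block rather than buffering unboundedly.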





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-09-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-trunk.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-09-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-trunk.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-09-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-trunk.txt

Committed the CompactionIterable refactoring to trunk. Attaching the latest 
trunk version, which fixes a bug where the Deserializer sources were not closed.





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-06 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-0.8.txt

... and backported to 0.8.





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-refactor-CompactionIterator-CompactionIterable.txt





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-05 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Sylvain Lebresne (JIRA)


Sylvain Lebresne updated CASSANDRA-2901:


Attachment: 0003-Fix-LCR.patch

The DefsTest and CliTest problem is because we don't ignore purged tombstones on 
the first pass when computing the serializedSize. Attaching a small patch with 
the fix. The patch also fixes a failure with StreamingTransferTest: in SSTII, 
the columnPosition should be set for non-file input, otherwise headerSize() 
returns the wrong value and the assertion in getColumnFamilyWithColumns is 
triggered. This seems to fix all unit tests here.

The patch looks good, but each deserializer now gets the full maxInMemorySize 
instead of maxInMemorySize / nb(Deserializers). Was that intended?
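
For concreteness, a tiny hypothetical sketch of the two readings of that
limit (the class name and the numbers are invented; only maxInMemorySize and
the split by deserializer count come from the comment above):

    public class MemoryBudgetSketch
    {
        public static void main(String[] args)
        {
            long maxInMemorySize = 64L * 1024 * 1024; // global budget (illustrative 64 MB)
            int deserializers = 4;                    // one per input sstable

            // Reading 1: each deserializer gets the full budget, so the worst
            // case is deserializers * maxInMemorySize buffered at once.
            System.out.println("full budget, worst case: " + deserializers * maxInMemorySize);

            // Reading 2: the budget is split, keeping total buffering near
            // maxInMemorySize no matter how many inputs there are.
            System.out.println("split budget, each: " + maxInMemorySize / deserializers);
        }
    }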






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0003-Fix-LCR.patch)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compaction.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-04 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compaction.txt
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-03 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 0002-parallel-compactions.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-02 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901.patch)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-02 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-v3.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-02 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: (was: 2901-v2.txt)





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-02 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 0002-parallel-compactions.txt
0001-fix-tracker-getting-out-of-sync-with-underlying-data-s.txt





[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-08-01 Thread Jonathan Ellis (JIRA)


Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-v3.txt

bq. PCI.Reducer.getCompactedRow unwraps NotifyingSSTableIterator

fixed

bq. it doesn't seem like we ever call close() on the SSTableIdentityIterator

fixed

bq. we need to reuse the trick of DebuggableThreadPoolExecutor

added ACR.close to fix

bq. we already queue up rows in the MergeTask executor, so it feels like it 
would be simple to use direct handoff here

Simpler, but then any deserializer that can't deserialize the next row by the 
time the next merge slot opens will stall the merge.  My thinking was that 
giving each deserializer its own buffer will do a better job of keeping the 
pipeline full.
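
To illustrate the buffering idea (a hypothetical sketch with made-up names and 
queue sizes, not the patch itself): each deserializer gets its own small bounded 
queue, so a slow reader only stalls the merge once its buffer runs dry, rather 
than on every handoff.

{code}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: one bounded buffer per deserializer thread. A direct handoff would be
// a SynchronousQueue; a capacity of 1 lets each reader work one row ahead of
// the merge stage, keeping the pipeline full when deserialization is bursty.
class BufferedDeserializer<Row> {
    private final BlockingQueue<Row> buffer = new ArrayBlockingQueue<>(1);

    // deserializer thread: blocks only when its buffer is already full
    void publish(Row row) throws InterruptedException { buffer.put(row); }

    // merge stage: blocks only if this particular reader has fallen behind
    Row nextRow() throws InterruptedException { return buffer.take(); }
}
{code}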

bq. the memory blow up is (potentially) much more than the 2x (compared to 
mono-threaded) in the description of this ticket

It's not so bad, because we can assume N = 2 sstables and we restrict each 
deserializer to 1/N of the in-memory limit, so I think we come close to 2x 
overall.  (And if we don't, I'd rather adjust our estimate than make it less 
performant/more complex. :)
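
Back-of-envelope, with made-up numbers (nothing below comes from the patch):

{code}
// Illustrative arithmetic only: with N input sstables each capped at 1/N of the
// in-memory limit, the reader buffers add ~1x on top of the ~1x already in
// flight in the merge/serialize stages -- i.e. ~2x total, independent of N.
class MemoryEstimate {
    public static void main(String[] args) {
        long inMemoryLimit = 64 << 20;               // say, a 64MB in-memory row limit
        int n = 4;                                   // number of input sstables
        long perDeserializer = inMemoryLimit / n;    // each reader capped at 1/N: 16MB
        long readerTotal = n * perDeserializer;      // all readers together: ~1x
        long estimate = readerTotal + inMemoryLimit; // plus rows being merged: ~2x
        System.out.println((estimate >> 20) + "MB buffered vs a 64MB limit: ~2x");
    }
}
{code}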

bq. we should probably remove the occurrence in PreCompactedRow as it's still 
multi-threaded while in the MergeTask

refactored PCR constructors to do this.

bq. we create one ComparatorIColumnIterator each time instead of having a 
private static final one

added RowContainer.comparator

bq. there is a race when updating the bytesRead

fixed, and changed bytesRead to volatile
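
(For context, a sketch of the single-writer pattern that makes a plain volatile 
sufficient -- illustrative only, not the patch's actual code.)

{code}
// volatile guarantees visibility, not atomicity, so this is only safe when each
// counter has exactly one writer thread -- here, the owning deserializer.
class DeserializerProgress {
    private volatile long bytesRead; // written only by the owning deserializer thread

    void add(long n) { bytesRead += n; }   // safe: single writer, so no lost updates
    long bytesRead() { return bytesRead; } // any thread may read for progress reporting
}
{code}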


 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2901-v2.txt, 2901-v3.txt, 2901.patch






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-07-28 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2901:
--

Attachment: 2901-v2.txt

v2 attached.

Summary:

- Extracts common code from CompactionIterator (renamed CompactionIterable) 
into AbstractCompactionIterable.
- One Deserializer thread per input sstable performs read + deserialize (a row 
at a time).
- The resulting ColumnFamilies are added to a queue, which is fed to the merge 
Reducer.
- The merge Reducer creates MergeTasks on a thread-per-core Executor and 
returns FutureColumnFamily objects, which are turned into PrecompactedRow 
objects when complete (the overall flow is sketched below).
- The main complication is handling larger-than-memory rows.  When one is 
encountered, no further deserialization is done until that row has been merged 
and written -- creating a pipeline stall, as it were.  Thus, this is intended 
to be useful with mostly-in-memory row sizes, but it preserves correctness in 
the face of the occasional oversized row.
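
A hypothetical sketch of that flow (all names below are illustrative stand-ins, 
not the patch's actual classes):

{code}
import java.util.List;
import java.util.concurrent.*;

// The shape that matters: a per-core merge pool, plus a writer that drains
// futures in submission order so the output stays sorted by key.
class CompactionFlowSketch {
    static final class Row {}

    static Row merge(List<Row> rowsWithSameKey) { return rowsWithSameKey.get(0); }
    static void serializeAndWrite(Row merged) { /* single writer thread */ }

    final ExecutorService mergePool =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    // Small bounded queue of in-flight merges; FIFO order == input row order.
    final BlockingQueue<Future<Row>> inFlight = new ArrayBlockingQueue<>(4);

    // Reducer side: called once per set of rows sharing a key.
    void submitMerge(List<Row> rowsWithSameKey) throws InterruptedException {
        inFlight.put(mergePool.submit(() -> merge(rowsWithSameKey)));
    }

    // Writer side: one thread, consuming strictly in submission order.
    void writerLoop() throws InterruptedException, ExecutionException {
        while (true) {
            Future<Row> next = inFlight.take(); // oldest merge first
            serializeAndWrite(next.get());      // waits iff that merge isn't done yet
        }
    }
}
{code}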



 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2901-v2.txt, 2901.patch






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-07-28 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2901:
--

Reviewer: slebresne
Assignee: Jonathan Ellis

 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2901-v2.txt, 2901.patch






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-07-27 Thread Yewei Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yewei Zhang updated CASSANDRA-2901:
---

Attachment: 2901.patch

 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.3

 Attachments: 2901.patch






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-07-18 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2901:
--

Fix Version/s: 0.8.2

bq. IMO it's acceptable to punt completely on rows that are larger than memory, 
and fall back to the old non-parallel code there. I don't see any sane way to 
parallelize large-row compactions

Now that we have CASSANDRA-2879 done in trunk we *could* partially parallelize 
large rows -- the first pass (to determine the compacted row size) needs to be 
serial, but actually writing the data could be done concurrently with another 
writer.
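
To make that concrete, one hypothetical shape for the split (2879's actual 
interfaces may differ; every name here is illustrative):

{code}
import java.util.concurrent.*;

// Sketch: the sizing pass stays serial; only the write of the already-sized row
// is handed to a separate writer thread.
class LargeRowSplitSketch {
    interface LargeRow {
        long computeCompactedSize();                     // pass 1: must run serially
        void writeTo(Object output, long compactedSize); // pass 2: size is known
    }

    private final ExecutorService writerPool = Executors.newSingleThreadExecutor();

    Future<?> compactLargeRow(LargeRow row, Object output) {
        long size = row.computeCompactedSize();                    // serial first pass
        return writerPool.submit(() -> row.writeTo(output, size)); // concurrent write
    }
}
{code}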

Unclear how much extra work that would be... and it does depend on the trunk 
work.

My gut: unless it looks super easy to do on top of 2879 (in which case we can 
move this whole ticket to trunk), let's stick to the original plan.

 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2






[jira] [Updated] (CASSANDRA-2901) Allow taking advantage of multiple cores while compacting a single CF

2011-07-18 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2901:
--

Description: 
Moved from CASSANDRA-1876:

There are five stages: read, deserialize, merge, serialize, and write. We 
probably want to continue doing read+deserialize and serialize+write together, 
or you waste a lot copying to/from buffers.

So, what I would suggest is: one thread per input sstable doing read + 
deserialize (a row at a time). A thread pool (one per core?) merging 
corresponding rows from each input sstable. One thread doing serialize + 
writing the output (this has to wait for the merge threads to complete 
in-order, obviously). This should take us from being CPU bound on SSDs (since 
only one core is compacting) to being I/O bound.

This will require roughly 2x the memory, to allow the reader threads to work 
ahead of the merge stage. (I.e. for each input sstable you will have up to one 
row in a queue waiting to be merged, and the reader thread working on the 
next.) Seems quite reasonable on that front.  You'll also want a small queue 
size for the serialize-merged-rows executor.

Multithreaded compaction should be either on or off. It doesn't make sense to 
try to do things halfway (by doing the reads with a
threadpool whose size you can grow/shrink, for instance): we still have 
compaction threads tuned to low priority, by default, so the impact on the rest 
of the system won't be very different. Nor do we expect to have so many input 
sstables that we lose a lot in context switching between reader threads.

IMO it's acceptable to punt completely on rows that are larger than memory, and 
fall back to the old non-parallel code there. I don't see any sane way to 
parallelize large-row compactions.

  was:
Moved from CASSANDRA-1876:

There are five stages: read, deserialize, merge, serialize, and write. We 
probably want to continue doing read+deserialize and serialize+write together, 
or you waste a lot copying to/from buffers.

So, what I would suggest is: one thread per input sstable doing read + 
deserialize (a row at a time). One thread merging corresponding rows from each 
input sstable. One thread doing serialize + writing the output. This should 
give us between 2x and 3x speedup (depending how much doing the merge on 
another thread than write saves us).

This will require roughly 2x the memory, to allow the reader threads to work 
ahead of the merge stage. (I.e. for each input sstable you will have up to one 
row in a queue waiting to be merged, and the reader thread working on the 
next.) Seems quite reasonable on that front.

Multithreaded compaction should be either on or off. It doesn't make sense to 
try to do things halfway (by doing the reads with a
threadpool whose size you can grow/shrink, for instance): we still have 
compaction threads tuned to low priority, by default, so the impact on the rest 
of the system won't be very different. Nor do we expect to have so many input 
sstables that we lose a lot in context switching between reader threads. (If 
this is a concern, we already have a tunable to limit the number of sstables 
merged at a time in a single CF.)

IMO it's acceptable to punt completely on rows that are larger than memory, and 
fall back to the old non-parallel code there. I don't see any sane way to 
parallelize large-row compactions.


 Allow taking advantage of multiple cores while compacting a single CF
 -

 Key: CASSANDRA-2901
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2901
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Priority: Minor
 Fix For: 0.8.2

