[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534123#comment-16534123 ] Michael Burman edited comment on CASSANDRA-7282 at 7/5/18 8:45 PM:
---
Updated the code with hopefully what you meant (need to get some sleep). I see worse performance, which I believe is because of the added complexity of the minimum processing (the improved memory layout does not help enough - maybe the microbenchmark fits the caches in the other cases also). I'll need to investigate more tomorrow.
{code}
Benchmark                        (vnodes)   Mode  Cnt       Score      Error  Units
BSTBench.multiArrayBinarySeek           4  thrpt   16  106134.448 ± 3211.072  ops/ms
BSTBench.multiArrayBinarySeek          16  thrpt   16   21704.218 ±  788.881  ops/ms
BSTBench.multiArrayBinarySeek          64  thrpt   16    4347.090 ±  106.703  ops/ms
BSTBench.multiArrayBinarySeek         256  thrpt   16     667.122 ±  210.476  ops/ms
BSTBench.multiArrayBinarySeek         512  thrpt   16     192.777 ±   29.690  ops/ms
BSTBench.multiArrayLinearSearch         4  thrpt   16  172888.358 ± 7778.132  ops/ms
BSTBench.multiArrayLinearSearch        16  thrpt   16   16703.622 ±  565.180  ops/ms
BSTBench.multiArrayLinearSearch        64  thrpt   16    1096.093 ±   33.050  ops/ms
BSTBench.multiArrayLinearSearch       256  thrpt   16      84.511 ±    2.776  ops/ms
BSTBench.multiArrayLinearSearch       512  thrpt   16      22.130 ±    0.714  ops/ms
{code}

was (Author: burmanm):
Updated the code with hopefully what you meant (need to get some sleep). I see worse performance, which I believe is because of the added complexity of the minimum processing. I'll need to investigate more tomorrow.
{code}
Benchmark                        (vnodes)   Mode  Cnt       Score      Error  Units
BSTBench.multiArrayBinarySeek           4  thrpt   16  106134.448 ± 3211.072  ops/ms
BSTBench.multiArrayBinarySeek          16  thrpt   16   21704.218 ±  788.881  ops/ms
BSTBench.multiArrayBinarySeek          64  thrpt   16    4347.090 ±  106.703  ops/ms
BSTBench.multiArrayBinarySeek         256  thrpt   16     667.122 ±  210.476  ops/ms
BSTBench.multiArrayBinarySeek         512  thrpt   16     192.777 ±   29.690  ops/ms
BSTBench.multiArrayLinearSearch         4  thrpt   16  172888.358 ± 7778.132  ops/ms
BSTBench.multiArrayLinearSearch        16  thrpt   16   16703.622 ±  565.180  ops/ms
BSTBench.multiArrayLinearSearch        64  thrpt   16    1096.093 ±   33.050  ops/ms
BSTBench.multiArrayLinearSearch       256  thrpt   16      84.511 ±    2.776  ops/ms
BSTBench.multiArrayLinearSearch       512  thrpt   16      22.130 ±    0.714  ops/ms
{code}

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benedict
> Assignee: Michael Burman
> Priority: Major
> Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in our memtables. Maintaining this is an O(lg(n)) operation; since the vast majority of users use a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The list would impose the normal order on the collection, but a hash index would live alongside as part of the same data structure, simply mapping into the list and permitting O(1) lookups and inserts.
> I've chosen to implement this initial version as a linked-list node per item, but we can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes to be checked at once, further reducing the constant factor costs for lookups.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
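The multiArrayBinarySeek / multiArrayLinearSearch comparison above comes down to how the owning vnode range is located within a sorted array of token boundaries. A minimal, hypothetical sketch of the two strategies (class and method names are illustrative, not from the patch):

```java
import java.util.Arrays;

// Illustrative only: locate the vnode range owning a token in a sorted
// array of range upper boundaries, by binary seek vs. linear scan.
public class RangeSeek {
    static int binarySeek(long[] boundaries, long token) {
        int idx = Arrays.binarySearch(boundaries, token);
        // A negative result encodes the insertion point, i.e. the index of
        // the first boundary >= token, which is the owning range.
        return idx >= 0 ? idx : -idx - 1;
    }

    static int linearSeek(long[] boundaries, long token) {
        int i = 0;
        while (i < boundaries.length && boundaries[i] < token)
            i++;
        return i;
    }
}
```

Consistent with the numbers above, the linear scan only wins while the boundary array fits comfortably in cache (the 4-vnode case); from 16 vnodes up the binary seek pulls ahead.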
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533127#comment-16533127 ] Benedict edited comment on CASSANDRA-7282 at 7/4/18 11:35 PM:
--
Hi Michael,

I'll try to review this properly as I find time. A couple of things leap out at me, though:
# While we certainly need to handle out-of-range tokens, it seems to me that we should anyway try hard to minimise their occurrence, e.g. by switching memtable on a token range change. Ideally we would switch before the new token range actually takes effect in the case of an expansion; whether or not it makes sense to defer a token range removal probably needs some thought.
# A range tree is probably overkill. Why not binary search over an array?

was (Author: benedict):
Hi Michael,

I'll try to review this properly as I find time. A couple of things leap out at me, though:
# While we certainly need to handle out-of-range tokens, we should try harder to minimise their occurrence, by switching memtable on a token range change. Ideally we would switch before the new token range actually takes effect in the case of an expansion; whether or not it makes sense to defer a token range removal probably needs some thought.
# A range tree is probably overkill. Why not binary search over an array?
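The hybrid structure the ticket describes, an ordered list with a hash index mapping directly into it, can be sketched single-threaded as follows. This is a simplification: the real memtable map is concurrent and lock-free, and this HashOrderedList is illustrative, not code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Single-threaded sketch: an ordered linked list carries the entries,
// while a hash index maps keys straight to their list nodes, so lookups
// are expected O(1) and iteration remains in key order.
public class HashOrderedList<K extends Comparable<K>, V> {
    static final class Node<K, V> {
        final K key; V value; Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node<K, V> head;                                   // ordered by key
    private final Map<K, Node<K, V>> index = new HashMap<>();  // O(1) access

    public V get(K key) {
        Node<K, V> n = index.get(key);
        return n == null ? null : n.value;
    }

    public void put(K key, V value) {
        Node<K, V> existing = index.get(key);
        if (existing != null) { existing.value = value; return; }
        Node<K, V> node = new Node<>(key, value);
        index.put(key, node);
        // Splice into the ordered list. This linear walk is where the real
        // structure exploits the hash index to find the insert point fast.
        if (head == null || key.compareTo(head.key) < 0) {
            node.next = head; head = node; return;
        }
        Node<K, V> prev = head;
        while (prev.next != null && prev.next.key.compareTo(key) < 0)
            prev = prev.next;
        node.next = prev.next;
        prev.next = node;
    }

    public java.util.List<K> keysInOrder() {
        java.util.List<K> keys = new java.util.ArrayList<>();
        for (Node<K, V> n = head; n != null; n = n.next) keys.add(n.key);
        return keys;
    }
}
```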
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273063#comment-14273063 ] Benedict edited comment on CASSANDRA-7282 at 1/11/15 9:19 PM:
--
Just a note for the future (I haven't yet addressed your comments, but appreciate the thorough review): it may be that we want to harden the data structure against determined attackers who produce a lot of colliding tokens with different partition keys. It's not a terribly complex modification to make this a skip-list instead of a linked-list (since it is add-only, maintaining a skip-list is just applying the current linked-list logic N times, and the majority of the complexity here is in the hash index). The hash index would remain an _optimisation_ for expected constant-time access into a logarithmic structure, instead of a linear one, giving us best- and expected-case constant time, worst case logarithmic. The skip-list index levels could perhaps be dropped as the hash index extends to overlap them, although it may be cheaper to encode them all in the nodes themselves.

This should probably be a follow-up ticket, but I wanted to log the thought as it occurred to me.

was (Author: benedict):
Just a note for the future (I haven't yet addressed your comments): it may be that we want to harden the data structure against determined attackers who produce a lot of colliding tokens with different partition keys. It's not a terribly complex modification to make this a skip-list instead of a linked-list (since it is add-only, maintaining a skip-list is just applying the current linked-list logic N times, and the majority of the complexity here is in the hash index). The hash index would remain an _optimisation_ for expected constant-time access into a logarithmic structure, instead of a linear one, giving us best- and expected-case constant time, worst case logarithmic. The skip-list index levels could perhaps be dropped as the hash index extends to overlap them, although it may be cheaper to encode them all in the nodes themselves.

This should probably be a follow-up ticket, but I wanted to log the thought as it occurred to me.
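The hardening idea above leans on the structure being add-only: each insert is just the plain linked-list splice applied once per index level. A hypothetical single-threaded sketch (class name, level cap, and level choice are illustrative, not from any patch):

```java
import java.util.Random;

// Add-only skip list over long tokens: insertion repeats the two-step
// linked-list splice at each level the new node participates in.
public class AddOnlySkipList {
    static final int MAX_LEVEL = 8;

    static final class Node {
        final long token;
        final Node[] next = new Node[MAX_LEVEL];
        Node(long token) { this.token = token; }
    }

    private final Node head = new Node(Long.MIN_VALUE);
    private final Random rng = new Random(42);  // real impls draw geometrically

    public void add(long token) {
        int levels = 1 + rng.nextInt(MAX_LEVEL);
        Node node = new Node(token);
        Node cur = head;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; lvl--) {
            while (cur.next[lvl] != null && cur.next[lvl].token < token)
                cur = cur.next[lvl];
            if (lvl < levels) {                  // the linked-list splice,
                node.next[lvl] = cur.next[lvl];  // applied once per level
                cur.next[lvl] = node;
            }
        }
    }

    public boolean contains(long token) {        // worst case O(log n)
        Node cur = head;
        for (int lvl = MAX_LEVEL - 1; lvl >= 0; lvl--)
            while (cur.next[lvl] != null && cur.next[lvl].token < token)
                cur = cur.next[lvl];
        Node candidate = cur.next[0];
        return candidate != null && candidate.token == token;
    }
}
```

A hash index layered on top of such a structure would then give expected constant-time access with this logarithmic worst case as the fallback.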
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147705#comment-14147705 ] Jason Brown edited comment on CASSANDRA-7282 at 9/25/14 1:05 PM:
-
I ran Benedict's 7282 patch vs. the commit prior in his branch, for several use cases. TL;DR: the patch was a clear winner for the specific use case he called out, and marginally better, if not the same, for a 'typical' use case. For the record, I set mct = 0.5 and was using offheap_objects.

For the first use case, I used Benedict's attached profile.yaml, although I changed the RF from 1 to 2. I then used this stress command to execute (basically the same as Benedict's above):
{code}./bin/cassandra-stress user profile=~/jasobrown/b7282.yaml ops\(insert=5,read=5\) n=2000 -pop seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native prepared -node node_addresses{code}
I've attached the results of running the above command three times successively on both patched and unpatched code, and the results can be summarized:
- 15% improvement in throughput
- 10% improvement at the 95%/99%-iles
- 50% improvement at the 99.9%-ile
- ~40% less garbage created / 40% less time in GC

Note that over the life of the stress run, memtables were flushing and sstables compacting, so not all reads were coming directly from the memtable. One thing I perhaps could have tried (and would have liked to) was switching the CL from ONE to LOCAL_QUORUM, although I think stress would have applied that to both reads and writes for the above command, whereas I would have wanted to affect either reads or writes. I also ran the stress against a more 'standard' use case of mine (I'm still developing it alongside the new stress), and results were about even between patched and unpatched, although there may have been a very slight advantage toward the patched version.
So, I think in the case where you are reading your most recent writes, and the data in the memtable is small (not wide), there is a performance gain in this patch. In a more typical case, it wasn't necessarily proven that the patch boosts performance (but then the patch wasn't attempting to help the general case, anyway).

was (Author: jasobrown):
I ran Benedict's 7282 patch vs. the commit prior in his branch, for several use cases. TL;DR: the patch was a clear winner for the specific use case he called out, and marginally better, if not the same, for a 'typical' use case.

For the first use case, I used Benedict's attached profile.yaml, although I changed the RF from 1 to 2. I then used this stress command to execute (basically the same as Benedict's above):
{code}./bin/cassandra-stress user profile=~/jasobrown/b7282.yaml ops\(insert=5,read=5\) n=2000 -pop seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native prepared -node node_addresses{code}
I've attached the results of running the above command three times successively on both patched and unpatched code, and the results can be summarized:
- 15% improvement in throughput
- 10% improvement at the 95%/99%-iles
- 50% improvement at the 99.9%-ile
- ~40% less garbage created / 40% less time in GC

Note that over the life of the stress run, memtables were flushing and sstables compacting, so not all reads were coming directly from the memtable. One thing I perhaps could have tried (and would have liked to) was switching the CL from ONE to LOCAL_QUORUM, although I think stress would have applied that to both reads and writes for the above command, whereas I would have wanted to affect either reads or writes. I also ran the stress against a more 'standard' use case of mine (I'm still developing it alongside the new stress), and results were about even between patched and unpatched, although there may have been a very slight advantage toward the patched version.
So, I think in the case where you are reading your most recent writes, and the data in the memtable is small (not wide), there is a performance gain in this patch. In a more typical case, it wasn't necessarily proven that the patch boosts performance (but then the patch wasn't attempting to help the general case, anyway).
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140093#comment-14140093 ] Benedict edited comment on CASSANDRA-7282 at 9/19/14 7:01 AM:
--
We should test both. This change is going to apply not only now, but in the future. If we reject it, it won't be revisited for a long time. Not only are VM improvements like Shenandoah and G1GC going to mean we support users with huge heaps that can hold gigantic memtables, but this data structure is easily moved completely offheap once the objects it refers to are. Lots of users run with only single disks and large memory capacities too, and with a single disk an mct of 0.5 is the norm. So even an mct of 0.99 is pretty underwhelming for benching these cases, and an mct of 0.11 is completely inadequate. However, we should test both if that's what you run with, to see how it fares in different climates.

(FTR: my 'realistic' runs were with an mct of 0.4)

was (Author: benedict):
We should test both. This change is going to apply not only now, but in the future. If we reject it, it won't be revisited for a long time. Not only are VM improvements like Shenandoah and G1GC going to mean we support users with huge heaps that can hold gigantic memtables, but this data structure is easily moved completely offheap once the objects it refers to are. Lots of users run with only single disks and large memory capacities too, and with a single disk an mct of 0.5 is the norm. So even an mct of 0.99 is pretty underwhelming for benching these cases, and an mct of 0.11 is completely inadequate. However, we should test both if that's what you run with, to see how it fares in different climates.
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140118#comment-14140118 ] Benedict edited comment on CASSANDRA-7282 at 9/19/14 7:11 AM:
--
We should be certain to test this with offheap_objects, btw, since this also reduces heap use and increases the number of records we can store. We should also not generally be benchmarking use cases with clustering columns (or very few cql rows and very few data columns, i.e. we should have low cell counts), since they are less likely to exhibit the change, and we have plenty of users with partitions with low cell counts.

was (Author: benedict):
We should be certain to test this with offheap_objects, btw, since this also reduces heap use and increases the number of records we can store. We should also not generally be benchmarking use cases with clustering columns (or very few cql rows and very few data columns, i.e. low cell counts), since they are less likely to exhibit the change, and we have plenty of users with partitions with low cell counts.
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136908#comment-14136908 ] Benedict edited comment on CASSANDRA-7282 at 9/17/14 12:52 PM:
---
Some more numbers, with a warmup dataset to populate the map so that variability due to throughput rate is reduced. These numbers show the NBHOM consistently around 3x+ faster, although it introduces per-run variability due to GC.
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}
edit: same principle but fewer items warmed up, so less variability due to GC:
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  2283.934      157.719  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  8850.066      147.894  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1960.077      145.752  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5637.813      688.394  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   706.284      162.845  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3270.920     1545.698  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10  1689.157      176.412  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  2737.195     1042.289  ops/ms
{noformat}

was (Author: benedict):
Some more numbers, with a warmup dataset to populate the map so that variability due to throughput rate is reduced. These numbers show the NBHOM consistently around 3x+ faster, although it introduces per-run variability due to GC.
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136908#comment-14136908 ] Benedict edited comment on CASSANDRA-7282 at 9/17/14 1:22 PM:
--
Some more numbers, with a warmup dataset to populate the map so that variability due to throughput rate is reduced. These numbers show the NBHOM consistently around 3x+ faster.
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  1775.801       57.310  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  5186.264     1183.869  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1600.233       65.174  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5764.222       93.983  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   760.552       72.869  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3771.195      188.487  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10   704.723       74.047  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  3561.286      203.780  ops/ms
{noformat}
(edit: updated to ensure reads hit keys that are present - this seems to have negative consequences for CSLM, as this is more realistic)

was (Author: benedict):
Some more numbers, with a warmup dataset to populate the map so that variability due to throughput rate is reduced. These numbers show the NBHOM consistently around 3x+ faster, although it introduces per-run variability due to GC.
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}
edit: same principle but fewer items warmed up, so less variability due to GC:
{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  2283.934      157.719  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  8850.066      147.894  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1960.077      145.752  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5637.813      688.394  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   706.284      162.845  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3270.920     1545.698  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10  1689.157      176.412  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  2737.195     1042.289  ops/ms
{noformat}
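This is not the JMH harness behind the numbers above, but a self-contained sketch of the traffic shape the readWriteRatio parameter describes: the same deterministic mixed get/put stream driven at a ConcurrentSkipListMap (the CSLM baseline) and, as a stand-in for the hashed structure, a stdlib ConcurrentHashMap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative only: mixed read/write traffic against an O(lg n) ordered
// map vs. an O(1) hashed one. Both see an identical op stream, so their
// accumulated read results must agree; only the cost per op differs.
public class ReadWriteMix {
    static long run(Map<Long, Long> map, double readRatio, int ops) {
        long sink = 0;
        for (int i = 0; i < ops; i++) {
            long key = (i * 2654435761L) % 100_000;  // scrambled key stream
            if (i % 100 < readRatio * 100) {
                Long v = map.get(key);
                sink += v == null ? 0 : v;
            } else {
                map.put(key, (long) i);
            }
        }
        return sink;
    }

    public static void main(String[] args) {
        long a = run(new ConcurrentSkipListMap<>(), 0.5, 100_000);
        long b = run(new ConcurrentHashMap<>(), 0.5, 100_000);
        System.out.println(a == b);  // prints "true": identical results
    }
}
```

Wrapping loops like this in a real JMH @Benchmark (as the b.b.c.HashOrderedCollections.test results above do) is what turns the comparison into the throughput scores quoted.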
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138519#comment-14138519 ] Jason Brown edited comment on CASSANDRA-7282 at 9/18/14 5:17 AM:
-
Just an update: running into some weird problems getting trunk/Benedict's branch to run on my servers - I think some settings I have in the cassandra.in.sh script conflict with 3.0 (I might have had some crazy crap a priori). Will figure it out in the morning when I'm not falling asleep.

was (Author: jasobrown):
Just an update: running into some weird problems getting trunk to run on my servers - I think some settings I have in the cassandra.in.sh script conflict with 3.0. Will figure it out in the morning when I'm not falling asleep.
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132887#comment-14132887 ] Benedict edited comment on CASSANDRA-7282 at 9/13/14 7:14 PM:
--
For the Murmur3Partitioner, the Token (which is the key for this map) is sorted by hash (represented as a Long), so all we require is that the hashCode() method returns a prefix of this hash, i.e. the top 32 bits of the long value.

was (Author: benedict):
For the Murmur3Partitioner, the Token (which is the key for this map) is sorted by hash, so all we require is that the hashCode() method returns a prefix of this hash.
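A small sketch of the property described above, assuming the hashCode is taken as the top 32 bits of the token's long value (the method name is illustrative, not from the patch): an arithmetic right shift keeps the sign bit, so the prefix preserves signed token order up to collisions.

```java
// Illustrative only: an order-preserving 32-bit prefix of a Murmur3 token.
public class TokenHashPrefix {
    static int orderingPrefix(long token) {
        return (int) (token >> 32);  // arithmetic shift keeps the sign
    }

    public static void main(String[] args) {
        long a = -(1L << 40), b = 1L << 40;
        // Token order implies non-strict prefix order...
        System.out.println(orderingPrefix(a) <= orderingPrefix(b));   // true
        // ...but distinct tokens can collide on the prefix, which is why
        // equal prefixes must fall back to the full token comparator.
        System.out.println(orderingPrefix(3L) == orderingPrefix(4L)); // true
    }
}
```

The collision case is exactly why, as discussed below, the stated condition on hashCode() can only be a non-strict inequality.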
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132913#comment-14132913 ] Branimir Lambov edited comment on CASSANDRA-7282 at 9/13/14 7:45 PM:
-
{quote}For the Murmur3Partitioner, the Token (which is key for this map) is sorted by hash (represented as a Long), so all we require is that the hashCode() method returns a prefix of this hash, or the top 32-bits of the long value.{quote}
This makes sense. Thanks for clarifying; the fact that the list is ordered by the Token escaped me. Since this is the case, should we restrict this method to Murmur tokens only? For anything that sorts naturally we could just grab the first four bytes. I wouldn't even call this a hashCode, to avoid the confusion - perhaps an ordering key or ordering prefix? I would still change the condition in the preface to use non-strict inequality, though, because cropping tokens to 32 bits will introduce collisions.

was (Author: blambov):
This makes sense. Thanks for clarifying; the fact that the list is ordered by the Token escaped me. Since this is the case, should we restrict this method to Murmur tokens only? For anything that sorts naturally we could just grab the first four bytes. I wouldn't even call this a hashCode, to avoid the confusion - perhaps an ordering key or ordering prefix? I would still change the condition in the preface to use non-strict inequality, though, because cropping tokens to 32 bits will introduce collisions.
Maintaining this is an O(lg(n)) operation; since the vast majority of users use a hash partitioner, it occurs to me we could maintain a hybrid ordered list / hash map. The list would impose the normal order on the collection, but a hash index would live alongside as part of the same data structure, simply mapping into the list and permitting O(1) lookups and inserts. I've chosen to implement this initial version as a linked-list node per item, but we can optimise this in future by storing fatter nodes that permit a cache-line's worth of hashes to be checked at once, further reducing the constant factor costs for lookups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
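The collision point above is easy to demonstrate. A small sketch (my own naming, not the patch's): two tokens that are strictly ordered as 64-bit values share the same top 32 bits, which is why the ordering condition on the prefix can only be non-strict.

```java
// Hypothetical demo: cropping a 64-bit token to its top 32 bits necessarily
// introduces collisions, so strict token order (t1 < t2) can only imply
// non-strict prefix order (prefix(t1) <= prefix(t2)).
public class PrefixCollision {
    // order-correlated 32-bit prefix of a signed 64-bit token
    static int prefix(long token) { return (int) (token >>> 32); }

    public static void main(String[] args) {
        long t1 = 0x00000001_00000000L;
        long t2 = 0x00000001_FFFFFFFFL;
        System.out.println(t1 < t2);                  // strictly ordered tokens
        System.out.println(prefix(t1) == prefix(t2)); // ...but equal prefixes
    }
}
```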
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953 ]

Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:49 PM:
--

bq. Since this is the case, should we restrict this method to Murmur tokens only?

Well, if I can spend the time ensuring safety we should be able to use this for RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict inequality, though, because cropping tokens to 32 bits will introduce collisions.

-There will be collisions with or without truncation. The fact that there are collisions doesn't affect the constraint I've imposed upon the data; we assume nothing about the dataset when two hashCode()s are equal, and simply resort to the underlying token comparator, so I think the constraint is sufficient. Non-strict inequality is surely more broken, however? It would hold true for k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().- I just reread your comment and realised you meant the RHS only. Agreed; this should be changed.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to index a hash table it seems suitable to use hashCode(), and to impose the extra constraint contextually. I'm fairly neutral, though; we certainly could introduce a new interface. I'll see how it looks.
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953 ]

Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:55 PM:
--

bq. Since this is the case, should we restrict this method to Murmur tokens only?

Well, if I can spend the time ensuring safety we should be able to use this for RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict inequality, though, because cropping tokens to 32 bits will introduce collisions.

-There will be collisions with or without truncation. The fact that there are collisions doesn't affect the constraint I've imposed upon the data; we assume nothing about the dataset when two hashCode()s are equal, and simply resort to the underlying token comparator, so I think the constraint is sufficient. Non-strict inequality is surely more broken, however? It would hold true for k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional implication, i.e. k1 <= k2 <==> k1.hashCode() <= k2.hashCode().- I just reread your comment and realised you meant the RHS only. Agreed; this should be changed. However, we should probably opt for the stronger bidirectional constraint, since it is still more correct.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to index a hash table it seems suitable to use hashCode(), and to impose the extra constraint contextually. I'm fairly neutral, though; we certainly could introduce a new interface. I'll see how it looks.
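The "resort to the underlying token comparator" fallback described above can be sketched like this. The names are mine, not the patch's: when two 32-bit prefixes collide, the comparison falls through to the full 64-bit token, so equal hashCode()s assume nothing about the underlying keys.

```java
// Hypothetical sketch of prefix-first comparison with a full-token fallback:
// the cheap int prefix decides most comparisons; a prefix collision defers
// to the complete (here: 64-bit) comparison, standing in for the underlying
// token comparator.
public class PrefixCompare {
    // order-correlated 32-bit prefix of a signed 64-bit token
    static int prefix(long token) { return (int) (token >>> 32); }

    static int compare(long t1, long t2) {
        int p1 = prefix(t1), p2 = prefix(t2);
        if (p1 != p2)
            return Integer.compare(p1, p2); // prefixes differ: decided cheaply
        return Long.compare(t1, t2);        // prefix collision: full comparison
    }

    public static void main(String[] args) {
        // same prefix (1), different tokens: resolved by the full comparison
        System.out.println(compare(0x00000001_00000000L, 0x00000001_0000000FL) < 0);
    }
}
```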
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104924#comment-14104924 ]

Benedict edited comment on CASSANDRA-7282 at 8/21/14 1:55 AM:
--

memtable_heap_space_in_mb: 4096
memtable_offheap_space_in_mb: 2048
-Xm[xn]8Gb (or more)

Looks like the top two settings were missing from the yaml somehow. Ninja'd them in.
[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map
[ https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104970#comment-14104970 ]

Ryan McGuire edited comment on CASSANDRA-7282 at 8/21/14 4:10 AM:
--

Thanks Benedict, I tried it with those new settings, still using a memtable_cleanup_threshold of 0.5: http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3.json

Getting a bit more interesting now; not sure what's up with that first read and then the drop.

EDIT: a second run of this shows similar results: http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3-redo.json&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=389.62&ymin=0&ymax=196953.9

I'll try scaling up memtable_cleanup_threshold again.