[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534123#comment-16534123
 ] 

Michael Burman edited comment on CASSANDRA-7282 at 7/5/18 8:45 PM:
---

Updated the code with, hopefully, what you meant (I need to get some sleep). I 
see worse performance, which I believe is due to the added complexity of the 
minimum processing (the improved memory layout does not help enough - maybe the 
microbenchmark fits the caches in the other cases as well). I'll need to 
investigate more tomorrow.

{code}
 [java] Benchmark                        (vnodes)   Mode  Cnt       Score      Error   Units
 [java] BSTBench.multiArrayBinarySeek           4  thrpt   16  106134.448 ± 3211.072  ops/ms
 [java] BSTBench.multiArrayBinarySeek          16  thrpt   16   21704.218 ±  788.881  ops/ms
 [java] BSTBench.multiArrayBinarySeek          64  thrpt   16    4347.090 ±  106.703  ops/ms
 [java] BSTBench.multiArrayBinarySeek         256  thrpt   16     667.122 ±  210.476  ops/ms
 [java] BSTBench.multiArrayBinarySeek         512  thrpt   16     192.777 ±   29.690  ops/ms
 [java] BSTBench.multiArrayLinearSearch         4  thrpt   16  172888.358 ± 7778.132  ops/ms
 [java] BSTBench.multiArrayLinearSearch        16  thrpt   16   16703.622 ±  565.180  ops/ms
 [java] BSTBench.multiArrayLinearSearch        64  thrpt   16    1096.093 ±   33.050  ops/ms
 [java] BSTBench.multiArrayLinearSearch       256  thrpt   16      84.511 ±    2.776  ops/ms
 [java] BSTBench.multiArrayLinearSearch       512  thrpt   16      22.130 ±    0.714  ops/ms
{code}
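For reference, a minimal JMH harness of the shape that could produce output like the above. The class and method names and the vnodes parameter mirror the results, but the token-range representation (a sorted long[] of range start tokens) and the method bodies are illustrative assumptions, not the patch's actual code:

{code}
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class BSTBench
{
    @Param({ "4", "16", "64", "256", "512" })
    int vnodes;

    long[] rangeStarts; // sorted start token of each locally owned range

    @Setup
    public void setup()
    {
        rangeStarts = ThreadLocalRandom.current().longs(vnodes).toArray();
        Arrays.sort(rangeStarts);
    }

    @Benchmark
    public int multiArrayBinarySeek()
    {
        long token = ThreadLocalRandom.current().nextLong();
        int idx = Arrays.binarySearch(rangeStarts, token);
        // a miss returns -(insertionPoint) - 1; the owning range starts just
        // before the insertion point (-1 means the token precedes all ranges)
        return idx >= 0 ? idx : -idx - 2;
    }

    @Benchmark
    public int multiArrayLinearSearch()
    {
        long token = ThreadLocalRandom.current().nextLong();
        int idx = 0;
        while (idx < rangeStarts.length && rangeStarts[idx] <= token)
            idx++;
        return idx - 1;
    }
}
{code}

Consistent with the numbers above, a linear scan wins at very small vnode counts (no pivot jumping, perfect prefetching), while binary search wins from 16 vnodes upward.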


was (Author: burmanm):
Updated the code with, hopefully, what you meant (I need to get some sleep). I 
see worse performance, which I believe is due to the added complexity of the 
minimum processing. I'll need to investigate more tomorrow.

{code}
 [java] Benchmark                        (vnodes)   Mode  Cnt       Score      Error   Units
 [java] BSTBench.multiArrayBinarySeek           4  thrpt   16  106134.448 ± 3211.072  ops/ms
 [java] BSTBench.multiArrayBinarySeek          16  thrpt   16   21704.218 ±  788.881  ops/ms
 [java] BSTBench.multiArrayBinarySeek          64  thrpt   16    4347.090 ±  106.703  ops/ms
 [java] BSTBench.multiArrayBinarySeek         256  thrpt   16     667.122 ±  210.476  ops/ms
 [java] BSTBench.multiArrayBinarySeek         512  thrpt   16     192.777 ±   29.690  ops/ms
 [java] BSTBench.multiArrayLinearSearch         4  thrpt   16  172888.358 ± 7778.132  ops/ms
 [java] BSTBench.multiArrayLinearSearch        16  thrpt   16   16703.622 ±  565.180  ops/ms
 [java] BSTBench.multiArrayLinearSearch        64  thrpt   16    1096.093 ±   33.050  ops/ms
 [java] BSTBench.multiArrayLinearSearch       256  thrpt   16      84.511 ±    2.776  ops/ms
 [java] BSTBench.multiArrayLinearSearch       512  thrpt   16      22.130 ±    0.714  ops/ms
{code}

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.
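To make the idea above concrete, here is a minimal, single-threaded sketch of the hybrid shape: an ordered linked list with a hash index pointing directly at its nodes. The class and field names are illustrative; the actual patch is a lock-free, memtable-specific structure:

{code}
import java.util.HashMap;
import java.util.Map;

// An ordered linked list (for range scans / flush in token order) with a
// hash index alongside it (for expected O(1) point lookups). The index maps
// keys straight to their list nodes.
class HashOrderedListSketch<K extends Comparable<K>, V>
{
    static final class Node<K, V>
    {
        final K key;
        V value;
        Node<K, V> next;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private final Node<K, V> head = new Node<>(null, null);   // sentinel
    private final Map<K, Node<K, V>> index = new HashMap<>(); // hash index into the list

    V get(K key)
    {
        Node<K, V> node = index.get(key); // expected O(1), no tree/skip-list walk
        return node == null ? null : node.value;
    }

    void put(K key, V value)
    {
        Node<K, V> existing = index.get(key);
        if (existing != null)
        {
            existing.value = value;
            return;
        }
        // This sketch scans for the insertion point; the real structure exploits
        // the fact that, under a hash partitioner, list order equals hash order,
        // so the hash index itself locates the predecessor in expected O(1).
        Node<K, V> pred = head;
        while (pred.next != null && pred.next.key.compareTo(key) < 0)
            pred = pred.next;
        Node<K, V> node = new Node<>(key, value);
        node.next = pred.next;
        pred.next = node;
        index.put(key, node);
    }
}
{code}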





[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2018-07-04 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533127#comment-16533127
 ] 

Benedict edited comment on CASSANDRA-7282 at 7/4/18 11:35 PM:
--

Hi Michael,

I'll try to review this properly as I find time.  A couple of things leap out 
at me though:
 # While we certainly need to handle out-of-range tokens, it seems to me that 
we should in any case try hard to minimise their occurrence, e.g. by switching 
memtables on a token range change.  Ideally we would switch before the new 
token range actually takes effect in the case of an expansion; whether or not 
it makes sense to defer a token range removal probably needs some thought.
 # A range tree is probably overkill.  Why not binary search over an array?
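A sketch of what that binary search could look like, assuming the node's owned ranges are (start, end] pairs flattened into one sorted array of boundaries (the representation is an assumption for illustration; wrap-around ranges would need the usual unwrapping first):

{code}
import java.util.Arrays;

// Owned token ranges flattened into a sorted boundary array:
// [start0, end0, start1, end1, ...]. A token is in range exactly when its
// position in the array has odd parity, so the whole range tree collapses
// into a single binary search.
final class OwnedRanges
{
    private final long[] boundaries;

    OwnedRanges(long[] sortedBoundaries)
    {
        this.boundaries = sortedBoundaries;
    }

    boolean contains(long token)
    {
        int idx = Arrays.binarySearch(boundaries, token);
        if (idx >= 0)
            return (idx & 1) == 1; // landing on an end boundary means inside (start, end]
        int insertionPoint = -idx - 1;
        return (insertionPoint & 1) == 1; // strictly between a start and its end
    }
}
{code}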


was (Author: benedict):
Hi Michael,

I'll try to review this properly as I find time.  A couple of things leap out 
at me though:
 # While we certainly need to handle out-of-range tokens, we should try harder 
to minimise their occurrence, by switching memtables on a token range change.  
Ideally we would switch before the new token range actually takes effect in 
the case of an expansion; whether or not it makes sense to defer a token range 
removal probably needs some thought.
 # A range tree is probably overkill.  Why not binary search over an array?







[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2015-01-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273063#comment-14273063
 ] 

Benedict edited comment on CASSANDRA-7282 at 1/11/15 9:19 PM:
--

Just a note for the future (I haven't yet addressed your comments, but 
appreciate the thorough review): it may be that we want to harden the data 
structure against determined attackers who produce a lot of colliding tokens 
with different partition keys. 

It's not a terribly complex modification to make this a skip-list instead of a 
linked-list (since it is add-only, maintaining a skip-list is just applying the 
current linked-list logic N times, and the majority of the complexity here is 
for the hash index). The hash index would remain an _optimisation_ for expected 
constant-time access into a logarithmic structure, instead of a linear one, 
giving us best- and expected-case constant time, worst-case logarithmic. The 
skip-list index levels could perhaps be dropped as the hash index extends to 
overlap them, although it may be cheaper to encode them all in the nodes 
themselves. This should probably be a follow-up ticket, but I wanted to log the 
thought as it occurred to me.
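A rough sketch of that observation: in an add-only structure there are no deletions to coordinate with, so a lock-free skip-list insert is just the single-linked-list CAS insert applied once per index level, bottom-up. Node-height selection and the search step are elided, and the names are illustrative, not the proposed code:

{code}
import java.util.concurrent.atomic.AtomicReference;

final class AddOnlyNode<K>
{
    final K key;
    final AtomicReference<AddOnlyNode<K>>[] next; // one successor pointer per level

    @SuppressWarnings("unchecked")
    AddOnlyNode(K key, int height)
    {
        this.key = key;
        this.next = new AtomicReference[height];
        for (int i = 0; i < height; i++)
            next[i] = new AtomicReference<>();
    }

    // The same CAS insert the current linked list uses, parameterised by level.
    // On failure the caller re-walks from 'pred' to find the new predecessor
    // and retries; repeating this bottom-up for each level yields a skip-list.
    static <K> boolean insertAfter(AddOnlyNode<K> pred, AddOnlyNode<K> node, int level)
    {
        AddOnlyNode<K> succ = pred.next[level].get();
        node.next[level].set(succ);
        return pred.next[level].compareAndSet(succ, node);
    }
}
{code}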


was (Author: benedict):
Just a note for the future (I haven't yet addressed your comments): it may be 
that we want to harden the data structure against determined attackers who 
produce a lot of colliding tokens with different partition keys. 

It's not a terribly complex modification to make this a skip-list instead of a 
linked-list (since it is add-only, maintaining a skip-list is just applying the 
current linked-list logic N times, and the majority of the complexity here is 
for the hash index). The hash index would remain an _optimisation_ for expected 
constant-time access into a logarithmic structure, instead of a linear one, 
giving us best- and expected-case constant time, worst-case logarithmic. The 
skip-list index levels could perhaps be dropped as the hash index extends to 
overlap them, although it may be cheaper to encode them all in the nodes 
themselves. This should probably be a follow-up ticket, but I wanted to log the 
thought as it occurred to me.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-25 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147705#comment-14147705
 ] 

Jason Brown edited comment on CASSANDRA-7282 at 9/25/14 1:05 PM:
-

I ran Benedict's 7282 patch vs. the commit prior in his branch, for several use 
cases. TL;DR the patch was a clear winner for the specific use case he called 
out, and marginally better, if not the same, for a 'typical' use case.

For the record, I set mct (memtable_cleanup_threshold) = 0.5 and was using 
offheap_objects.

For the first use case, I used Benedict's attached profile.yaml, although I 
changed the RF from 1 to 2. I then executed this stress command (basically the 
same as Benedict's above): {code}./bin/cassandra-stress user 
profile=~/jasobrown/b7282.yaml ops\(insert=5,read=5\) n=2000 -pop 
seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native 
prepared -node node_addresses{code}

I've attached the results of running the above command three times successively 
on both patched and unpatched code; the results can be summarized:
- 15% improvement in throughput
- 10% improvement at the 95%/99%-iles
- 50% improvement at the 99.9%-ile
- ~40% less garbage created / ~40% less time in GC

Note that over the life of the stress run, memtables were flushing and sstables 
compacting, so not all reads were coming directly from the memtable.

One thing I perhaps could have tried (and would have liked to) was switching 
the CL from ONE to LOCAL_QUORUM, although I think stress would have applied 
that to both reads and writes for the above command, whereas I would have 
wanted to affect only reads or only writes.

I also ran stress against a more 'standard' use case of mine (I'm still 
developing it alongside the new stress), and results were about even between 
patched and unpatched, although there may have been a very slight advantage 
toward the patched version.

So, I think in the case where you are reading your most recent writes, and the 
data in the memtable is small (not wide), there is a performance gain with this 
patch. In a more typical case, it wasn't necessarily proven that the patch 
boosts performance (but then the patch wasn't attempting to help the general 
case, anyway).



was (Author: jasobrown):
I ran Benedict's 7282 patch vs. the commit prior in his branch, for several use 
cases. TL;DR the patch was a clear winner for the specific use case he called 
out, and marginally better, if not the same, for a 'typical' use case.

For the first use case, I used Benedict's attached profile.yaml, although I 
changed the RF from 1 to 2. I then executed this stress command (basically the 
same as Benedict's above): {code}./bin/cassandra-stress user 
profile=~/jasobrown/b7282.yaml ops\(insert=5,read=5\) n=2000 -pop 
seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native 
prepared -node node_addresses{code}

I've attached the results of running the above command three times successively 
on both patched and unpatched code; the results can be summarized:
- 15% improvement in throughput
- 10% improvement at the 95%/99%-iles
- 50% improvement at the 99.9%-ile
- ~40% less garbage created / ~40% less time in GC

Note that over the life of the stress run, memtables were flushing and sstables 
compacting, so not all reads were coming directly from the memtable.

One thing I perhaps could have tried (and would have liked to) was switching 
the CL from ONE to LOCAL_QUORUM, although I think stress would have applied 
that to both reads and writes for the above command, whereas I would have 
wanted to affect only reads or only writes.

I also ran stress against a more 'standard' use case of mine (I'm still 
developing it alongside the new stress), and results were about even between 
patched and unpatched, although there may have been a very slight advantage 
toward the patched version.

So, I think in the case where you are reading your most recent writes, and the 
data in the memtable is small (not wide), there is a performance gain with this 
patch. In a more typical case, it wasn't necessarily proven that the patch 
boosts performance (but then the patch wasn't attempting to help the general 
case, anyway).



[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140093#comment-14140093
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/19/14 7:01 AM:
--

We should test both. This change is going to apply not only now, but in the 
future; if we reject it, it won't be revisited for a long time. Not only are VM 
improvements like Shenandoah and G1GC going to mean we support users with huge 
heaps that can hold gigantic memtables, but this data structure is easily moved 
completely offheap once the objects it refers to are. Lots of users also run 
with only single disks and large memory capacities, and with a single disk an 
mct (memtable_cleanup_threshold) of 0.5 is the norm. So even an mct of 0.99 is 
pretty underwhelming for helping to bench these cases; an mct of 0.11 is 
completely inadequate.

However, we should test both if that's what you run with, to see how it fares 
in different climates.

(FTR: my 'realistic' runs were with an mct of 0.4)


was (Author: benedict):
We should test both. This change is going to apply not only now, but in the 
future; if we reject it, it won't be revisited for a long time. Not only are VM 
improvements like Shenandoah and G1GC going to mean we support users with huge 
heaps that can hold gigantic memtables, but this data structure is easily moved 
completely offheap once the objects it refers to are. Lots of users also run 
with only single disks and large memory capacities, and with a single disk an 
mct of 0.5 is the norm. So even an mct of 0.99 is pretty underwhelming for 
helping to bench these cases; an mct of 0.11 is completely inadequate.

However, we should test both if that's what you run with, to see how it fares 
in different climates.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140118#comment-14140118
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/19/14 7:11 AM:
--

We should be certain to test this with offheap_objects, btw, since this also 
reduces heap use and increases the number of records we can store. We should 
also generally avoid benchmarking use cases with clustering columns, and prefer 
very few CQL rows and very few data columns per partition (i.e. we should have 
low cell counts), since high cell counts are less likely to exhibit the change, 
and we have plenty of users with partitions with low cell counts.


was (Author: benedict):
We should be certain to test this with offheap_objects, btw, since this also 
reduces heap use and increases the number of records we can store. We should 
also generally avoid benchmarking use cases with clustering columns, and prefer 
very few CQL rows and very few data columns per partition (i.e. low cell 
counts), since high cell counts are less likely to exhibit the change, and we 
have plenty of users with partitions with low cell counts.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136908#comment-14136908
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/17/14 12:52 PM:
---

Some more numbers, with a warmup dataset to populate the map so that 
variability due to throughput rate is reduced. These numbers show NBHOM (the 
new map) consistently around 3x+ faster than CSLM (ConcurrentSkipListMap), 
although the warmup introduces per-run variability due to GC.

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}

edit: same principle but fewer items warmed up, so less variability due to GC:

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  2283.934      157.719  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  8850.066      147.894  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1960.077      145.752  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5637.813      688.394  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   706.284      162.845  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3270.920     1545.698  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10  1689.157      176.412  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  2737.195     1042.289  ops/ms
{noformat}



was (Author: benedict):
Some more numbers, with a warmup dataset to populate the map so that 
variability due to throughput rate is reduced. These numbers show NBHOM (the 
new map) consistently around 3x+ faster than CSLM (ConcurrentSkipListMap), 
although the warmup introduces per-run variability due to GC.

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}


[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136908#comment-14136908
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/17/14 1:22 PM:
--

Some more numbers, with a warmup dataset to populate the map so that 
variability due to throughput rate is reduced. These numbers show NBHOM (the 
new map) consistently around 3x+ faster than CSLM (ConcurrentSkipListMap).

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  1775.801       57.310  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  5186.264     1183.869  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1600.233       65.174  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5764.222       93.983  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   760.552       72.869  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3771.195      188.487  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10   704.723       74.047  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  3561.286      203.780  ops/ms
{noformat}

(edit: updated to ensure reads hit keys that are actually present, as this is 
more realistic - it seems to have negative consequences for CSLM)


was (Author: benedict):
Some more numbers, with a warmup dataset to populate the map so that 
variability due to throughput rate is reduced. These numbers show NBHOM (the 
new map) consistently around 3x+ faster than CSLM (ConcurrentSkipListMap), 
although the warmup introduces per-run variability due to GC.

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       500  thrpt        5  1392.273     2918.717  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       500  thrpt        5  5088.408     8964.885  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       500  thrpt        5  1128.637     2589.679  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       500  thrpt        5  3406.299     5606.085  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       500  thrpt        5   924.642     1802.045  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       500  thrpt        5  3311.107      999.896  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       500  thrpt        5   939.757     1776.641  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       500  thrpt        5  2781.503     4723.844  ops/ms
{noformat}

edit: same principle but fewer items warmed up, so less variability due to GC:

{noformat}
Benchmark                          (readWriteRatio)  (type)  (warmup)   Mode  Samples     Score  Score error   Units
b.b.c.HashOrderedCollections.test               0.9    CSLM       100  thrpt       10  2283.934      157.719  ops/ms
b.b.c.HashOrderedCollections.test               0.9   NBHOM       100  thrpt       10  8850.066      147.894  ops/ms
b.b.c.HashOrderedCollections.test               0.5    CSLM       100  thrpt       10  1960.077      145.752  ops/ms
b.b.c.HashOrderedCollections.test               0.5   NBHOM       100  thrpt       10  5637.813      688.394  ops/ms
b.b.c.HashOrderedCollections.test               0.1    CSLM       100  thrpt       10   706.284      162.845  ops/ms
b.b.c.HashOrderedCollections.test               0.1   NBHOM       100  thrpt       10  3270.920     1545.698  ops/ms
b.b.c.HashOrderedCollections.test                 0    CSLM       100  thrpt       10  1689.157      176.412  ops/ms
b.b.c.HashOrderedCollections.test                 0   NBHOM       100  thrpt       10  2737.195     1042.289  ops/ms
{noformat}



[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138519#comment-14138519
 ] 

Jason Brown edited comment on CASSANDRA-7282 at 9/18/14 5:17 AM:
-

Just an update: I'm running into some weird problems getting trunk/Benedict's 
branch to run on my servers - I think some settings I have in the 
cassandra.in.sh script conflict with 3.0 (I might have had some crazy crap in 
there a priori). Will figure it out in the morning when I'm not falling asleep.


was (Author: jasobrown):
Just an update: I'm running into some weird problems getting trunk to run on 
my servers - I think some settings I have in the cassandra.in.sh script 
conflict with 3.0. Will figure it out in the morning when I'm not falling 
asleep.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132887#comment-14132887
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/13/14 7:14 PM:
--

For the Murmur3Partitioner, the Token (which is the key for this map) is sorted 
by hash (represented as a Long), so all we require is that the hashCode() 
method returns a prefix of this hash, or the top 32 bits of the long value.
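In code, the property is essentially a one-liner (a hypothetical sketch, not the actual Murmur3Partitioner token source):

{code}
// A token ordered by its long hash value. Taking the top 32 bits with an
// arithmetic shift preserves order-consistency:
//   t1.compareTo(t2) < 0  implies  t1.hashCode() <= t2.hashCode()
final class LongTokenSketch implements Comparable<LongTokenSketch>
{
    final long token; // Murmur3 hash of the partition key

    LongTokenSketch(long token)
    {
        this.token = token;
    }

    public int compareTo(LongTokenSketch that)
    {
        return Long.compare(this.token, that.token);
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof LongTokenSketch && ((LongTokenSketch) o).token == token;
    }

    @Override
    public int hashCode()
    {
        return (int) (token >> 32); // signed top 32 bits: monotone in token order
    }
}
{code}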


was (Author: benedict):
For the Murmur3Partitioner, the Token (which is the key for this map) is sorted 
by hash, so all we require is that the hashCode() method returns a prefix of 
this hash.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132913#comment-14132913
 ] 

Branimir Lambov edited comment on CASSANDRA-7282 at 9/13/14 7:45 PM:
-

{quote}For the Murmur3Partitioner, the Token (which is the key for this map) is 
sorted by hash (represented as a Long), so all we require is that the 
hashCode() method returns a prefix of this hash, or the top 32 bits of the long 
value.{quote}

This makes sense. Thanks for clarifying; the fact that the list is ordered by 
the Token escaped me. Since this is the case, should we restrict this method to 
Murmur tokens only? For anything that sorts naturally we could just grab the 
first four bytes. I wouldn't even call this a hashCode, to avoid confusion - 
perhaps an ordering key or ordering prefix?

I would still change the condition in the preface to use non-strict inequality, 
though, because cropping tokens to 32 bits will introduce collisions.
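A hypothetical shape for that suggestion - a dedicated method, distinct from hashCode(), whose only contract is order-consistency with compareTo(). The interface name and the bytes-ordered example are illustrative, not proposed API:

{code}
interface OrderingPrefix
{
    // Contract: a.compareTo(b) < 0  implies  a.orderingPrefix() <= b.orderingPrefix()
    int orderingPrefix();
}

// For a token that sorts lexicographically on its unsigned bytes, the first
// four bytes (mapped from unsigned onto signed int order) satisfy the contract.
final class BytesOrderedToken implements OrderingPrefix, Comparable<BytesOrderedToken>
{
    final byte[] bytes;

    BytesOrderedToken(byte[] bytes)
    {
        this.bytes = bytes;
    }

    public int compareTo(BytesOrderedToken that)
    {
        int len = Math.min(bytes.length, that.bytes.length);
        for (int i = 0; i < len; i++)
        {
            int cmp = Integer.compare(bytes[i] & 0xFF, that.bytes[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(bytes.length, that.bytes.length);
    }

    public int orderingPrefix()
    {
        int prefix = 0;
        for (int i = 0; i < 4; i++)
            prefix = (prefix << 8) | (i < bytes.length ? bytes[i] & 0xFF : 0);
        return prefix ^ Integer.MIN_VALUE; // shift unsigned byte order onto signed ints
    }
}
{code}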


was (Author: blambov):
This makes sense. Thanks for clarifying; the fact that the list is ordered by 
the Token escaped me. Since this is the case, should we restrict this method to 
Murmur tokens only? For anything that sorts naturally we could just grab the 
first four bytes. I wouldn't even call this a hashCode, to avoid confusion - 
perhaps an ordering key or ordering prefix?

I would still change the condition in the preface to use non-strict inequality, 
though, because cropping tokens to 32 bits will introduce collisions.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:49 PM:
--

bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

-There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict inequality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <=> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this 
should be changed.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.
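Restated as a checkable predicate (a sketch of the constraint being agreed on here, not code from the patch):

{code}
final class HashOrderContract
{
    // Order-consistency for the 32-bit crop of a 64-bit token:
    //   k1 < k2  implies  hash(k1) <= hash(k2)
    // Equal hash codes carry no information on their own; the structure falls
    // back to the underlying token comparator, which is why only the
    // non-strict form is required (collisions make the strict form
    // unsatisfiable).
    static <K extends Comparable<K>> boolean orderConsistent(K k1, K k2, int h1, int h2)
    {
        return k1.compareTo(k2) >= 0 || h1 <= h2;
    }
}
{code}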


was (Author: benedict):
bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict inequality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <=> k1.hashCode() <= k2.hashCode().

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14132953#comment-14132953
 ] 

Benedict edited comment on CASSANDRA-7282 at 9/13/14 9:55 PM:
--

bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

-There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict inequality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <=> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this 
should be changed. However we should probably opt for the stronger 
bidirectional constraint, since it is still more correct.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.


was (Author: benedict):
bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

-There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict inequality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 <= k2 <=> k1.hashCode() <= k2.hashCode().-

I just reread your comment and realise you meant the RHS only. Agreed this 
should be changed.

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104924#comment-14104924
 ] 

Benedict edited comment on CASSANDRA-7282 at 8/21/14 1:55 AM:
--

memtable_heap_space_in_mb: 4096
memtable_offheap_space_in_mb: 2048
-Xm[xn]8Gb (or more)

Looks like the top two settings were missing from the yaml somehow. Ninja'd 
them in.


was (Author: benedict):
memtable_heap_space_in_mb: 4096
memtable_offheap_space_in_mb: 2048
-Xmx8Gb






[jira] [Comment Edited] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104970#comment-14104970
 ] 

Ryan McGuire edited comment on CASSANDRA-7282 at 8/21/14 4:10 AM:
--

Thanks Benedict, I tried it with those new settings, still using a 
memtable_cleanup_threshold of 0.5:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3.json

Getting a bit more interesting now; not sure what's up with that first read and 
then the drop.

EDIT: a second run of this shows similar results: 
http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3-redo.json&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=389.62&ymin=0&ymax=196953.9

I'll try scaling up memtable_cleanup_threshold again.


was (Author: enigmacurry):
Thanks Benedict, I tried it with those new settings, still using a 
memtable_cleanup_threshold of 0.5:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3.json

Getting a bit more interesting now, not sure what's up with that first read and 
then the drop.

I'll try scaling up memtable_cleanup_threshold again.



