[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537417#comment-16537417
 ] 

Benedict commented on CASSANDRA-7282:
-

Also, as a random aside, I have no idea where I got the idea it was called Van 
Emde Boas layout.  I'm sure the internet mislead me one day, and I never 
verified the nomenclature, but apparently it's actually Eytzinger layout, and 
we should probably avoid propagating this error.

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-09 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537398#comment-16537398
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. but without Value Types that's a pretty long shot with Java

Well, you can do n-ary (i.e. B-Tree) equivalents of the Van Emde Boas layout; 
or you could do a simpler binary Van Emde Boas initial search (sized to about a 
cache fetch unit), to select the correct range of a sorted array to search, 
which would probably yield similar results for our domain sizes.  

But to be honest I only really put the binary Van Emde Boas approach in here to 
save time, as it was faster than digging too closely into the differences in 
the benchmark between the two more basic approaches.

bq. I'll start the other ticket by moving the range detection to an upper level 
by creating a new Memtable per range

Great.  Thanks

bq.  I'll need to find a good way from getting that event from bootstrap & 
streaming 

So thinking about this a bit more, it seems there are four options:

1) Locate all the places the token ring changes, insert some code
2) Subscribe to changes in the token ring, and flush if we introduce a new 
range (or expand an existing range)
3) Detect an out-of-range message, and synchronise our ranges with the ring
4) Detect an out-of-range message, and simply route it to an out-of-range bucket

The thing is, there's nothing right now preventing us receiving records for a 
range not assigned to us.  So we will need an out-of-range bucket.  But at the 
same time, we will have to update ourselves, consciously, when the ring 
updates.  If we're moving up out of the memtable to impose this, we cannot rely 
on flush to resynchronise ourselves.

So, probably we will need an out-of-range bucket anyway - that IMO should be 
served by a CSLM until we implement NBHOM degrading to a skip list under skew.

But we also want to know when our owned ranges change - and this can happen for 
a variety of circumstances.  Probably the most robust thing to do, is to 
subscribe to changes in TokenMetadata.  Which isn't currently possible, but 
shouldn't be onerous to add - though I hate adding to an already ugly class.

bq. Also, any table not using ranges (such as system, maybe some new virtual 
ones?) don't use any.

Those tables that don't use any ranges all use the LocalStrategy for 
replication, which right now looks like it would yield a separate range for 
every token in the cluster, all mapping to localhost.  We should probably 
confirm this is the case, and override the method to just return a single range 
owned by localhost.

We should probably take this discussion to another ticket though.


> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-09 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537046#comment-16537046
 ] 

Michael Burman commented on CASSANDRA-7282:
---

Yeah, VamEmdeBoas is fine. In theory there are few better (according to "Array 
layouts for comparison-based searching" including B-Tree in array, but without 
Value Types that's a pretty long shot with Java) but as said, this is perhaps 
not the best place to spend more time on - considering how small amount from 
overall execution time it takes and how small improvements are expected. So 
I'll change the implementation to veb.

I'll start the other ticket by moving the range detection to an upper level by 
creating a new Memtable per range and use the array for finding the correct 
Memtable (but still make the NonBlockingHashMap aware of the range it processes 
as it needs that for the hash normalization). I'll need to find a good way from 
getting that event from bootstrap & streaming (did I forget any other event 
that could create more - or do you have other idea where to get it best?) back 
to CFS if we want it before there's any actual writes happening. Also, any 
table not using ranges (such as system, maybe some new virtual ones?) don't use 
any.

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-06 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535478#comment-16535478
 ] 

Benedict commented on CASSANDRA-7282:
-

I've posted a slightly modified gist 
[here|https://gist.github.com/belliottsmith/b559b6a09882009fb229e1485dcb7ac4]

This simply ensures that the tests for each algorithm run identical operations 
(by ensuring we use a consistent seed across forks/runs), randomises a larger 
number of search keys, introduces a new polluteCache parameter that reduces 
cache occupancy by generating many tests, and walking through them linearly for 
each invocation.

I also fixed a couple of minor bugs - or at least inconsistencies in logic.

It also introduces two new variants: A simpler (IMO) binary search variant, 
using some old code of mine from 
[here|https://github.com/belliottsmith/jjoost/blob/master/jjoost-base/src/org/jjoost/util/order/IntOrder.java],
 and a Van Emde Boas array layout, that should exhibit the best cache 
behaviour, by netting the possible benefits of a tree (highest level nodes 
retaining cache occupancy across operations) with those of the array (much 
smaller footprint).  In reality there are better still variants, but the 
project is a long way from benefitting from those, here, I expect.

 
||Benchmark||(polluteCache)||(vnodes)||Mode||Cnt||Score Error Units||
|BSTBench.multiArrayBinarySearchFloor|1|4|thrpt|16|12065.122 ± 401.339 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|1|64|thrpt|16|6292.591 ± 135.290 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|1|512|thrpt|16|4702.458 ± 54.880 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|4|thrpt|16|4691.372 ± 151.983 ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|64|thrpt|16|4218.482 ± 145.470 
ops/ms|
|BSTBench.multiArrayBinarySearchFloor|4096|512|thrpt|16|3737.212 ± 110.495 
ops/ms|
|BSTBench.multiArrayBinarySeek|1|4|thrpt|16|12657.483 ± 266.605 ops/ms|
|BSTBench.multiArrayBinarySeek|1|64|thrpt|16|7002.695 ± 105.505 ops/ms|
|BSTBench.multiArrayBinarySeek|1|512|thrpt|16|4339.367 ± 235.567 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|4|thrpt|16|4517.014 ± 94.628 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|64|thrpt|16|3984.565 ± 173.088 ops/ms|
|BSTBench.multiArrayBinarySeek|4096|512|thrpt|16|3682.319 ± 125.062 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|4|thrpt|16|15872.360 ± 390.277 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|64|thrpt|16|7285.622 ± 100.053 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|1|512|thrpt|16|4568.883 ± 603.805 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|4|thrpt|16|5217.134 ± 12.719 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|64|thrpt|16|4978.465 ± 55.890 ops/ms|
|BSTBench.multiArrayVanEmdeBoasLayout|4096|512|thrpt|16|4361.356 ± 240.606 
ops/ms|
|BSTBench.treeSeek|1|4|thrpt|16|14766.468 ± 134.707 ops/ms|
|BSTBench.treeSeek|1|64|thrpt|16|6400.449 ± 42.273 ops/ms|
|BSTBench.treeSeek|1|512|thrpt|16|4116.524 ± 102.472 ops/ms|
|BSTBench.treeSeek|4096|4|thrpt|16|5132.769 ± 183.395 ops/ms|
|BSTBench.treeSeek|4096|64|thrpt|16|4003.251 ± 69.857 ops/ms|
|BSTBench.treeSeek|4096|512|thrpt|16|3458.094 ± 17.092 ops/ms|

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534146#comment-16534146
 ] 

Benedict commented on CASSANDRA-7282:
-

Thanks.  I'm surprised, and will dig into this a little more myself.

However, we have to be careful with these microbenchmarks.  The whole state 
fits in L1 cache here, but there's a good chance this will not be hot in L1 in 
live operation, given how poor our general cache usage is; it's certainly less 
likely with the tree approach (given it is much larger in memory).  So if the 
scores are similar (which they are), I would still prefer the array variant.  
Still, it should be outright faster, so I'll take a look and see if I can 
figure out why you're seeing this behaviour.

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534129#comment-16534129
 ] 

Michael Burman commented on CASSANDRA-7282:
---

I updated the original gist

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534127#comment-16534127
 ] 

Benedict commented on CASSANDRA-7282:
-

Do you want to post a link to the code before hitting the sack?

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534123#comment-16534123
 ] 

Michael Burman commented on CASSANDRA-7282:
---

Updated the code with hopefully what you meant.. (need to get some sleep). I 
see worse performance, which I believe is because the added complexity of the 
minimum processing. I'll need to investigate more tomorrow.

{code}
 [java] Benchmark(vnodes)   Mode  Cnt   Score   
   Error   Units
 [java] BSTBench.multiArrayBinarySeek   4  thrpt   16  106134.448 ± 
3211.072  ops/ms
 [java] BSTBench.multiArrayBinarySeek  16  thrpt   16   21704.218 ± 
 788.881  ops/ms
 [java] BSTBench.multiArrayBinarySeek  64  thrpt   164347.090 ± 
 106.703  ops/ms
 [java] BSTBench.multiArrayBinarySeek 256  thrpt   16 667.122 ± 
 210.476  ops/ms
 [java] BSTBench.multiArrayBinarySeek 512  thrpt   16 192.777 ± 
  29.690  ops/ms
 [java] BSTBench.multiArrayLinearSearch 4  thrpt   16  172888.358 ± 
7778.132  ops/ms
 [java] BSTBench.multiArrayLinearSearch16  thrpt   16   16703.622 ± 
 565.180  ops/ms
 [java] BSTBench.multiArrayLinearSearch64  thrpt   161096.093 ± 
  33.050  ops/ms
 [java] BSTBench.multiArrayLinearSearch   256  thrpt   16  84.511 ± 
   2.776  ops/ms
 [java] BSTBench.multiArrayLinearSearch   512  thrpt   16  22.130 ± 
   0.714  ops/ms
{code}

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533884#comment-16533884
 ] 

Benedict commented on CASSANDRA-7282:
-

Great, thanks.  The mismatch in expectations is probably because this is not 
the optimal way to implement a binary search variant.

Ideally it would have two (or three) backing arrays, as opposed to a backing 
array of nodes.  One primitive array of one side of bounds (e.g. lower bounds) 
to search on; another of keys, and another primitive array of the opposing 
bounds to filter results (or if the bounds are encoded in the keys already, 
that could be skipped).

Do you mind adding that variant to the comparison?

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533829#comment-16533829
 ] 

Michael Burman commented on CASSANDRA-7282:
---

Here you go: https://gist.github.com/burmanm/7fb14589d5c0d11ae79c018bd7849653

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533770#comment-16533770
 ] 

Benedict commented on CASSANDRA-7282:
-

Could you share the code you used for this benchmark?  The result is 
counterintuitive.

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-05 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533582#comment-16533582
 ] 

Michael Burman commented on CASSANDRA-7282:
---

Out of range tokens were also important for system tables as they do not follow 
the normal ranges. I agree that this type of information should not be inside 
the Memtable, but outside the Memtable. I however designed the patch to be 
"plug-in" without touching other parts too much. There's certainly going to be 
more overhead for one memtable per range, but that might not be an issue 
depending on the amount of vnodes per node. Which then is part of the reason 
for your next question. The tree is certainly an overkill, but I made it 
because it behaves more predictably with the amount of vnodes currently in use:
{code:java}
 [java] Benchmark  (vnodes)   Mode  Cnt   Score   Error 
  Units
 [java] BSTBench.binarySeek   4  thrpt   16  122117.853 ±  6815.672 
 ops/ms
 [java] BSTBench.binarySeek  16  thrpt   16   22711.007 ±   691.020 
 ops/ms
 [java] BSTBench.binarySeek  64  thrpt   16    4547.383 ±   180.749 
 ops/ms
 [java] BSTBench.binarySeek 256  thrpt   16 892.406 ±    42.791 
 ops/ms
 [java] BSTBench.binarySeek 512  thrpt   16 224.545 ± 6.469 
 ops/ms
 [java] BSTBench.linearSearch 4  thrpt   16  214901.639 ±  3264.386 
 ops/ms
 [java] BSTBench.linearSearch    16  thrpt   16   19594.770 ±   718.201 
 ops/ms
 [java] BSTBench.linearSearch    64  thrpt   16    1302.657 ±    43.841 
 ops/ms
 [java] BSTBench.linearSearch   256  thrpt   16 105.317 ± 2.950 
 ops/ms
 [java] BSTBench.linearSearch   512  thrpt   16  25.932 ± 3.488 
 ops/ms
 [java] BSTBench.treeSeek 4  thrpt   16  162053.915 ±  4228.035 
 ops/ms
 [java] BSTBench.treeSeek    16  thrpt   16   31650.599 ±   789.500 
 ops/ms
 [java] BSTBench.treeSeek    64  thrpt   16    5901.952 ±   137.575 
 ops/ms
 [java] BSTBench.treeSeek   256  thrpt   16    1055.737 ±    22.786 
 ops/ms
 [java] BSTBench.treeSeek   512  thrpt   16 208.169 ± 4.617 
 ops/ms
{code}

(The above test creates vnodes size of token ranges and then tries to find each 
one in a random order) And the amount of code for binary search vs tree search 
is almost equivalent. But between 4-256 it is a fast method (with linear search 
being the quickest one for very small amount of token ranges). So I thought it 
as a best "compromise".

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-04 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533153#comment-16533153
 ] 

Benedict commented on CASSANDRA-7282:
-

I actually wonder if we should make this ticket depend on a new ticket to 
separate all memtables by token range.  By doing this at a slightly higher 
level, we can flush data for each token range on an independent schedule.  This 
might permit more even use our disks, and reduce overall compaction work by 
permitting us to flush larger sstables on average.

It's also an easier level to integrate decisions about token range changes, and 
avoids the unsightly slicing (on flush) of already sliced data.

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-07-04 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533127#comment-16533127
 ] 

Benedict commented on CASSANDRA-7282:
-

Hi Michael,

I'll try to review this properly, as I find time.  A couple of things leap out 
to me though:
 # While we certainly need to handle out of range tokens, we should try harder 
to minimise their occurrence, by switching memtable on a token range change.  
Ideally we would switch before this new token range actually takes effect in 
the case of an expansion; whether or not it makes sense to defer a token range 
removal probably needs some thought.
 # A range tree is probably overkill.  Why not binary search over an array?

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Assignee: Michael Burman
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2018-06-29 Thread Michael Burman (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527305#comment-16527305
 ] 

Michael Burman commented on CASSANDRA-7282:
---

I've updated this work with the following updates to get this process restarted:
 * Ported to newer storage format and rebased to current master
 * Removed visibility optimization from resize operation as I could make this 
NPE with a race condition if large resizes happen in short duration
 ** I tested different resize algorithms, such as tracking the read and write 
path chains and resizing if certain criteria is met. This is the strategy used 
by the Linux kernel's RCU, but I did not manage to make any notable performance 
gains using that method so I left the original resize algorithm.
 * Each node token range is now backed by a separate index.
 ** Reduces the contention on the index updates, resizes and size parameter 
update (which was a contention point for all threads previously).
 ** Apply a normalization for token range hash updates to improve the 
distribution of hashes and that way the coverage of the index
 *** Center the hash range to around 0 first and then apply a scaling 
multiplier (proportional size of the token range compared to the available hash 
range [Long.MIN_VALUE, Long.MAX_VALUE])
 ** Allows different growth sizes for each token range
 ** For system tables and cases where node might see writes that were not known 
during the initilization and overflow index with range [Long.MIN_VALUE, 
Long.MAX_VALUE] is used (works like the original patch)
 ** For lookup to the correct index I used a balanced BST. If there's a better 
way, I'm all ears. Linear search would be as fast if the amount of vnodes is 
small, but this scales to larger amount of vnodes also.

I've tried to run different workloads against it and also tried to use 
different hashing methods, including the 32-bit hash from MurmurHash to trigger 
a bad hash distribution, but I didn't couldn't see any scenario with 
performance degradation compared to CSLM. With a hash attack (which Murmur is 
vulnerable to) this can be done of course, but those will break Cassandra node 
distribution also.

Link to the branch: [https://github.com/burmanm/cassandra/tree/7282-trunk-2]

> Faster Memtable map
> ---
>
> Key: CASSANDRA-7282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Benedict
>Priority: Major
>  Labels: performance
> Fix For: 4.x
>
> Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
> run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipLastMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once,  further reducing the 
> constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-03-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14385740#comment-14385740
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. +1 for massive understatement.

Thanks. I spent a day working on just that, so glad it panned out :)

I'm currently leaning towards postponing this ticket until 3.1, since some 
careful consideration is needed to ensure a uniform distribution of hash keys 
within the map, especially without vnodes on large clusters. It's possible we 
could only enable this optimisation on nodes that can predict their 
distribution will be fair. In either case I think it may be helpful to consider 
the ticket in relation to CASSANDRA-7032 and CASSANDRA-6696, by e.g. having a 
separate hash table for each vnode range. Depending on 3.0 release timeline, 
the incorporation of these tickets, and on the progression of my other 
commitments, I may still aim to deliver this in 3.0, but just alerting that at 
the moment my view is this is uncertain and on balance less than likely.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-03-16 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14363708#comment-14363708
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


From the comments:
bq. We also use a somewhat different explanation here for behaviour, which is 
perhaps easier to comprehend.
+1 for massive understatement. :) The binary reversal magic for determining 
bucket to hash into is very well explained in the comments now. That's the 
subtlest part to wrap my head around out of all this and your explanation is 
spot on.

bq. I'd prefer to see it used more widely in the codebase though.
I don't disagree, but when in rome... that, and a piece of code as dense as 
NBHOM can use all the familiarity we can get in here.

So my last few minor outstanding concerns:
# hashCode conformance
bq. The best option is probably to make all Token implement a HashComparable 
interface, and throw UnsupportedOperationException if they don't really 
implement it, but that is also pretty ugly.
Not *too* ugly in my opinion - a little boilerplate, sure, but not ugly. The 
distinction of whether a hashCode implementation conforms to the requirements 
of NBHOM is subtle enough that I believe it warrants codifying.
# i value / i(...) is a strange combo. For instance, the following:
{noformat}
int i = i(hash, index);
{noformat}
took me by surprise. It's relatively clear what you're doing when you read the 
surrounding code  - just could use different naming imo.
# maybeResize and the load factor of 66%. Given that it's a lazy, async move 
the buckets, not the items approach I can see how it's not going to 
make-or-break at 50% vs. 75%. I'd like *some* kind of data on this though; I'd 
be content with back-of-the-napkin theorization here - just something more than 
seems pretty reasonable. :)

On the whole - looking very good.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-02-16 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14322942#comment-14322942
 ] 

Branimir Lambov commented on CASSANDRA-7282:


bq. To add some more data to this discussion, I decided quickly to isolate just 
the CSLM and NBHOM for comparison.

This benchmark uses Longs as keys, whose hashCode() does not satisfy the 
requirements of the NBHOM. The results aren't materially different when this is 
fixed (NBHOM is still dramatically faster).

However, if even the author of the code can make this mistake so easily I don't 
think reusing {{hashCode()}} for the NBHOM ordering key is acceptable.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-02-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14323080#comment-14323080
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. This benchmark uses Longs as keys

Thanks for spotting that

bq. However, if even the author of the code can make this mistake so easily

It's worth noting that I was recovering from brain surgery performed a couple 
of weeks prior to posting this benchmark, so it's probably not _quite_ as 
indicative of the problem as it might appear. I agree it would be preferable to 
use a different method, though. The problem is doing so neatly and without 
penalty. The best option is probably to make all Token implement a 
HashComparable interface, and throw UnsupportedOperationException if they don't 
really implement it, but that is also pretty ugly. I'm pretty agnostic to the 
solution to this particular problem, so I'll defer to strong opinions.

A potentially more damning criticism of that benchmark is that it assumes a 
uniform address space for the hashes. As the number of nodes in a cluster 
grows, the entropy in the top bits decreases, and so the performance of this 
map could degrade. Besides the aforementioned improvement to bound worst case 
behaviour at O(lg(N)) we could also normalise the top order bits across the 
owned ranges. Possibly there are some other strategies, but I need to think on 
that.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-02-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315157#comment-14315157
 ] 

Benedict commented on CASSANDRA-7282:
-

I've pushed an updated branch 
[here|https://github.com/belliottsmith/cassandra/tree/7282-withcomments] that I 
hope addresses all of your concerns.

bq. LongNonBlockingHashOrderedMapTest times out on default settings for ant 
long-test on my box

The test is really designed to be run on a machine with at least 8 physical 
cores, and it may take an excessively long time on fewer. In general it's hard 
to make a test like this terminate in a predictable time horizon, and the 
longer it runs the better (ideally we'd just leave it running for days). I've 
tweaked the settings slightly to make it run better on lower core counts but 
still actually perform some useful work.

bq. this.index aliasing in predecessor appears redundant

This is necessary to save fetching a different version of the index each time, 
and incurring the memory access costs with each reference

bq. Would it be worth making maybeResize threshold configurable? What's our 
reasoning for targeting 66%?

66% seems pretty reasonable (defaults are usually somewhere between 50% and 75% 
for hash tables). We don't generally expose this kind of internal feature to 
users, but I am not totally opposed to doing so. I doubt a value much different 
from this is likely to help much, though; a value of  50% could mean the index 
becomes more stale and see we have more scanning forwards to do on lookups, 
whereas a value of above 100% means we start seeing significant worst case 
collisions.

bq. in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on 
the executor to drop duplicate resize calls on the floor? If so, it might be 
worth documenting.

It doesn't matter if it does or does not; it only ever resizes upwards. We also 
shouldn't ever have a duplicate resize call, unless we haven't executed by the 
time the table's contents have doubled.

bq. Why do we bit shift on generation of index node instead of just using 
integer value?

This is a standard practice for me: shifts in multiples of 10 are powers of 
~1000 (i.e. 1024). I find it much neater than lots of magic constants when 
you get above 1024 (e.g. 1048576 or 1073741824) , or snippets of  1024L * 1024L 
* 1024L. 1  30 is much clearer (representing 1024^3). However it's pretty 
marginal utility when working with small quantities, so I've switched it. I'd 
prefer to see it used more widely in the codebase though.

I've spent a while trying to comment very clearly the algorithm at the top of 
the class, with diagrams. Please let me know if it's a success, and if there's 
anything more I can do to clarify it.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-01-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276213#comment-14276213
 ] 

Jonathan Ellis commented on CASSANDRA-7282:
---

bq. It may be we want to harden the data structure against determined attackers 
that produce a lot of colliding tokens with different partition keys

I'm happy to leave this as a non-goal for now.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2015-01-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14273063#comment-14273063
 ] 

Benedict commented on CASSANDRA-7282:
-

Just a note for the future (I haven't yet addressed your comments): It may be 
we want to harden the data structure against determined attackers that produce 
a lot of colliding tokens with different partition keys. 

It's not a terribly complex modification to make this a skip-list, instead of a 
linked-list (since it is add-only, maintaining a skip-list is just applying the 
current linked-list logic N times, and the majority of complexity here is for 
the hash index). The hash index would remain an _optimisation_ for expected 
constant time access into a logarithmic structure, instead of a linear one, 
giving us best- and expected- case constant time, worst case logarithmic. The 
skip-list index levels could perhaps be dropped as the hash index extends to 
overlap them, although it may be cheaper to encode them all in the nodes 
themselves. This should probably be a follow up ticket, but I wanted to log the 
thought as it occurred to me.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-26 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14149929#comment-14149929
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


(nits interleaved with the rest)

General:
* Could use a comment on LongToken.hashCode() explaining reasoning for shift / 
truncation.  While it makes sense in the context of this patch, there's little 
context within the class itself to explain why we're just lopping off half the 
bits.
* Consider more descriptive variable names rather than abbreviations in 
AtomicReferenceArrayUpdater (rather than V exp, V upd)
* AtomicReferenceArrayUpdater: [could use some 
comments|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/utils/concurrent/AtomicReferenceArrayUpdater.java#L35-41]
 clarifying index calculation logic

NonBlockingHashOrderedMap.java:
* I think a diagram showing how the container works ([see 
CSLM|http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/concurrent/ConcurrentSkipListMap.java])
 would be quite helpful
* Document why INDEX_SHIFT == 18. Currently just a magic number
* Why do we bit shift on generation of index node instead of just using integer 
value?
  {code} private volatile NodeK, V[][] index = new Node[1][1  10]; 
{code}
* It might help to reference the shalev whitepaper's title as a reference for 
some more formal reading on a similar data structure to what we're using here
* this.index aliasing in predecessor appears redundant
* [predecessor() is still very 
dense|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L182-203].
  Not a bad thing if necessary, but I'm wondering if there aren't opportunities 
to abstract out some of the guts of what's going on in there to segregate the 
algorithm from the implementation.
* The comments on indexHash and firstHashOfIndex are quite helpful.
* As non-blocking algorithms are still somewhat new, it might be worth 
documenting why we're using [lazy 
updates|http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6275329] for future 
readers of the code.  Might not.
* Would it be worth making maybeResize threshold configurable?  What's our 
reasoning for targeting 66%?
* 
[maybeResize|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L268-274]
 logic could use some more clarity on documentation.
* [Signal-to-noise ratio in resize() seems 
off|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/src/java/org/apache/cassandra/concurrent/NonBlockingHashOrderedMap.java#L293-302].
  Might be worth factoring out all the INDEX_SHIFTING and Math.max/min to get 
implementation details out of the immediate visibility of the algorithm's 
scope, assuming we can do that without impacting performance.
* in putIfAbsent / maybeResize / resize: Just to confirm - are we relying on 
the executor to drop duplicate resize calls on the floor?  If so, it might be 
worth documenting.

LongNonBlockingHashOrderedMapTest.java
* Reinforces the feeling that the heavy bitwise operations could serve to be 
abstracted out or more heavily commented to help reduce complexity.  For 
example, 
[this|https://github.com/belliottsmith/cassandra/blob/7282-fastmemtablemap/test/long/org/apache/cassandra/concurrent/LongNonBlockingHashOrderedMapTest.java#L213-216]
 is opaque.  While it's parsable and appears correct, some trivial commenting 
could help smooth out the reading / maintaining process
* LongNonBlockingHashOrderedMapTest times out on default settings for ant 
long-test on my box
* And still times out at 10x timeout threshold.  :)  This may be a Windows 
thing - do they run to completion within the time on your setup?

In summary - the code looks good but in my opinion could use some more 
documentation and possibly some abstracting out of lower-level implementation 
details.  The logic of the container itself is definitely simpler than the CSLM 
and seems more tailored to our needs.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: jasobrown-sample-run.txt, profile.yaml, reads.svg, 
 run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal 

[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-25 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147705#comment-14147705
 ] 

Jason Brown commented on CASSANDRA-7282:


I ran benedict's 7282 patch vs. the commit prior in his branch, for several use 
cases. TL;DR the patch was a clear winner for the the specific use case he 
called out, and marginally better, if not the same, for a 'typical' use case.

For the first use case, I used Benedict's attached profile.yaml, although i 
changed the RF from 1 to 2. I then used this stress command to execute 
(basically the same as Benedict's above) {code}./bin/cassandra-stress user 
profile=~/jasobrown/b7282.yaml ops\(insert=5,read=5\) n=2000 -pop 
seq=1..10M read-lookback=extreme\(1..1M,2\) -rate threads=200 -mode cql3 native 
prepared -node node_addresses{code}

I've attached the results of running the above command three times successively 
on both patch and unpatched code, and the results can be summarized:
- 15% improvement in throughput
- 10% improvement at 95%/99%-iles
- 50% improvement at 99.9%-ile
~ 40% less garbage created / 40% less time in GC

Note that over the life of stress run, memtables were flushing and sstables 
compacting, so not all reads were coming directly from the memtable.

One thing I perhaps could have tried (and would have liked to) was switching 
the CL from ONE to LOCAL_QUORUM, although I think stress would have applied 
that to both reads and writes for the above command, whereas I would have 
wanted to affect either read or write.

I also ran the stress a more 'standard' use case of mine (I'm still developing 
it alongside the new stress), and results were about even between patch and 
unpatched, although there may have been very slight advantage toward the 
patched version.

So, I think in the case where you are reading your most recent writes, and the 
data in the memtable is small (not wide), there is a performance gain in this 
patch. In a more typical case, it wasn't necessarily proven that the patch 
boosts performance (but then the patch wasn't attempting to help the general 
case, anyway).


 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140093#comment-14140093
 ] 

Benedict commented on CASSANDRA-7282:
-

We should test both. This change is going to apply not only now, but in the 
future. If we reject it, it won't be revisited for a long time. Not only are VM 
improvements like shenandoah and G1GC going to mean we support users with huge 
heaps that can support gigantic memtables, this data structure is easily moved 
completely offheap once the objects it refers to are. Lots of users run with 
only single disks and large memory capacities also, and with a single disk an 
mct of 0.5 is the norm. So even an mct of 0.99 is pretty underwhelming for 
helping to bench these cases. An mct of 0.11 is completely inadequate.

However we should test both if that's what you run with, to see how it fairs in 
different climates.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140118#comment-14140118
 ] 

Benedict commented on CASSANDRA-7282:
-

We should be certain to test this with offheap_objects, btw, since this also 
reduces heap use and increases the number of records we can store. We should 
also not generally be benchmarking use cases with clustering columns (or very 
few cql rows and very few data columns, i.e. low cell counts), since they are 
less likely to exhibit the change, and we have plenty of users with partitions 
with low cell counts.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-18 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139755#comment-14139755
 ] 

Jason Brown commented on CASSANDRA-7282:


Looks like it's not something on my end that's causing weirdness: [~benedict]'s 
recommendation of setting memtable_cleanup_threshold = 0.99 causes all (three) 
of the nodes in my cluster to more or less lock up doing GC during startup. 
I've got a large enough memory space on the machine (128GB), and typically use 
8G for the jvm, so I'll play with some values for memtable_cleanup_threshold to 
get it running reasonably.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139943#comment-14139943
 ] 

Benedict commented on CASSANDRA-7282:
-

There's no good reason for that unless your commit log directory isn't empty... 
Could you confirm you're clearing your cluster state between runs...?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-18 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139955#comment-14139955
 ] 

Jason Brown commented on CASSANDRA-7282:


Yes, I clear the the cluster every time (rm -rf $data_dir). 

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139962#comment-14139962
 ] 

Benedict commented on CASSANDRA-7282:
-

I hope $data_dir is a shared dir with all the state, not just the data :)

If so, mct should have zero bearing on anything at start up. It only plays a 
part in deciding when to flush, and at start up there should never be a reason 
to flush due to memtable occupancy... Something funny is going on

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-18 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14140010#comment-14140010
 ] 

Jason Brown commented on CASSANDRA-7282:


bq. I hope $data_dir is a shared dir with all the state, not just the data

Yeah, yeah, I got some things right today :)

I should note that when I set mct back to 0.111, the cluster launches properly 
- admittedly, I didn't actually do anything with the cluster after I saw it 
successfully came and stayed up, but it's still alive six hours later. I have 
not investigated further.

I agree that mct should be relatively harmless on new cluster launch, but let's 
back up for a moment - would we ever set mct that high in a real world 
cluster? I understand we're trying to avoid flushing (and losing the 
opportunity to actually test what the patch does), but if those kind of mct 
settings are so artificial, shouldn't we be testing with more realistic, or, at 
least, less crazy-town, values? 

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136884#comment-14136884
 ] 

Benedict commented on CASSANDRA-7282:
-

To add some more data to this discussion, I decided quickly to isolate just the 
CSLM and NBHOM for comparison. Code available 
[here|https://github.com/belliottsmith/bench]. 

Using JMH for writes to an append-only collection is tricky, since any 
increased throughput rate is naturally throttling and incurs greater GC and 
algorithmic overhead, so these numbers are likely to be understating the yield, 
but the NBHOM is anywhere from 1.75-2.75x the throughput of CSLM, depending on 
if the workload is read-heavy or write-heavy. It is likely the real yield for 
writes is inbetween these two numbers.

{noformat}
b.b.c.HashOrderedCollections.test  4000   0.9CSLM  
thrpt   10  3593.788  196.459  ops/ms
b.b.c.HashOrderedCollections.test  4000   0.9   NBHOM  
thrpt   10  9299.991  179.408  ops/ms
b.b.c.HashOrderedCollections.test  4000   0.5CSLM  
thrpt   10  2402.021   79.321  ops/ms
b.b.c.HashOrderedCollections.test  4000   0.5   NBHOM  
thrpt   10  5648.160  294.727  ops/ms
b.b.c.HashOrderedCollections.test  4000   0.1CSLM  
thrpt   10  2089.597   71.228  ops/ms
b.b.c.HashOrderedCollections.test  4000   0.1   NBHOM  
thrpt   10  3665.048  138.384  ops/ms
b.b.c.HashOrderedCollections.test  4000 0CSLM  
thrpt   10  2008.595   46.308  ops/ms
b.b.c.HashOrderedCollections.test  4000 0   NBHOM  
thrpt   10  3515.328  214.304  ops/ms
{noformat}

So, as we reduce other overheads in the application and increase memtable 
capacity through further offheap improvements, any yield we are seeing 
application side can only go up.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136908#comment-14136908
 ] 

Benedict commented on CASSANDRA-7282:
-

Some more numbers, with a warmup dataset to populate the map so that 
variability due to throughput rate is reduced. These numbers show the NBHOM 
consistently around 3x+ faster, although it introduces per-run variability due 
to GC.

{noformat}
Benchmark(readWriteRatio)  (type)  (warmup)   Mode  
Samples Score  Score error   Units
b.b.c.HashOrderedCollections.test 0.9CSLM   500  thrpt  
  5  1392.273 2918.717  ops/ms
b.b.c.HashOrderedCollections.test 0.9   NBHOM   500  thrpt  
  5  5088.408 8964.885  ops/ms
b.b.c.HashOrderedCollections.test 0.5CSLM   500  thrpt  
  5  1128.637 2589.679  ops/ms
b.b.c.HashOrderedCollections.test 0.5   NBHOM   500  thrpt  
  5  3406.299 5606.085  ops/ms
b.b.c.HashOrderedCollections.test 0.1CSLM   500  thrpt  
  5   924.642 1802.045  ops/ms
b.b.c.HashOrderedCollections.test 0.1   NBHOM   500  thrpt  
  5  3311.107  999.896  ops/ms
b.b.c.HashOrderedCollections.test   0CSLM   500  thrpt  
  5   939.757 1776.641  ops/ms
b.b.c.HashOrderedCollections.test   0   NBHOM   500  thrpt  
  5  2781.503 4723.844  ops/ms
{noformat}

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137603#comment-14137603
 ] 

Jason Brown commented on CASSANDRA-7282:


Food for thought:

I've reviewed the branch Benedict has linked, and while I'll run the stress 
later today, I'd like to add some thoughts on the code itself. 

To perhaps state the obvious from reading the patch, the field Memtable.rows is 
now a InsertOnlyOrderedMap instead of the previous ConcurrentSkipListMap. If 
the Partitioner sorts data by hash code (sortsByHashCode()), Memtable will use 
the simple IOOM.Adapter implementation; else Memtable uses the 
NonBlockingHashOrderedMap. I think we can agree NBHOM is considerably more code 
and more involved than IIOM.Adapter (wait 'til the end, please, Benedict :) ).

Currently, only the Murmur3Partitioner is written to sort by hash code. I would 
venture to guess the vast majority of new clusters created = cassandra 1.2 use 
the Murmur3Partitioner, and, further, the vast majority of clusters created 
before 1.2 used RandomPartitioner. Thus, not many clusters use the ordered 
partitioners; we've also steered folks away from those unless they really know 
what they are doing (and then we still try to dissuade them).If we take the 
assumption that most clusters' partioner is either Murmur3 or Random, and weigh 
it against the the current patch, the majority of the code (and, what some have 
argued, the complexity) is to support the RandomPartition and the ordered 
partitioners. If we can make RandomPartitioner be able to sort by hash code, 
and thus be able to take advantage of IIOM.Adapter's functionality, then the 
NBHOM-related functionality would only be necessary for for ordered 
partitioners (let's leave LocalPartitioner, used for indices, out for a 
moment). It seems to me that we could pass on the NBHOM-related functionality 
as it would only be beneficial for a small subset of uses case; not zero use 
cases, mind you, but perhaps enough to take a pass on it for this round. I 
think some commenters concerns about the perceived complexity of the patch 
would then be assuaged, and we can deliver an improvement performance for the 
majority of uses.

[~benedict], Please let me know if I've misunderstood something fundamental 
about the patch that would invalidate my arguments; the 
conclusion/recommendation is, of course, open for discussion.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137609#comment-14137609
 ] 

Benedict commented on CASSANDRA-7282:
-

I think you may have misunderstood, or I have misunderstood what you're saying. 
I assume by IIOM you mean IOOM. But the IOOM.Adapter is not a feature, it's a 
support for old functionality. RandomPartitioner is currently supported by 
IOOM.Adapter backed by CSLM, because I was too lazy at the time to update RP's 
Tokens.

NBHOM is essentially the entirety of the patch, _not_ a legacy support 
mechanism. It is an efficient means for supporting Murmur3 (and optionally 
Random) partitioners.


 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137638#comment-14137638
 ] 

Jason Brown commented on CASSANDRA-7282:


damnit, there's a ! operator there

{code}if (!cfs.partitioner.sortsByHashCode()) {code}

oh, well, I tried :)


 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-17 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138519#comment-14138519
 ] 

Jason Brown commented on CASSANDRA-7282:


just an update: running into some weird problems getting trunk to run on my 
servers - I think it might be some settings I have in the cassandra.in.sh 
script conflict with 3.0. will figure it out in the morning when i'm not 
falling asleep.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136370#comment-14136370
 ] 

Jonathan Ellis commented on CASSANDRA-7282:
---

bq.  it does factor in to the decision as to whether the benefits outweigh the 
costs associated with moving the complexity from the CSLM into our context as 
well as flushing out any subtle bugs and/or downstream unintended side-effects 
that come along with the change.

I do agree in principle that we have a complexity budget and we need to be 
wise about choosing where to spend it.

I look forward to seeing Jason's numbers.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136745#comment-14136745
 ] 

Benedict commented on CASSANDRA-7282:
-

I'm not sure which statement I made gave the impression I disagree with this. I 
do not. My argumentation has all been around which facts are most pertinent to 
this decision, and how they should be interpreted. 

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133122#comment-14133122
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Creating an interface would be problematic, since we need to have our map key 
be a shared type for both CSLM keys and NBHOM keys. So I'm going to stick with 
the current situation.

If you meant from a purely documentation point of view, it is absolutely 
essential that the value is a _hash_, otherwise performance will be O\(n\^2\), 
so whilst it may be worth clarifying it is essential we call it a hashCode(). 

To elaborate on this in documentation, I've included the following extra comment

{quote}
This data structure essentially only works for keys that are first sorted by 
some hash value (and may then be sorted 
within those hashes arbitrarily), where a 32-bit prefix of the hash we sort by 
is returned by hashCode()
{quote}

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14133164#comment-14133164
 ] 

Benedict commented on CASSANDRA-7282:
-

Uploaded a version with updated comments 
[here|https://github.com/belliottsmith/cassandra/tree/7282-fastmemtablemap]

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132586#comment-14132586
 ] 

Benedict commented on CASSANDRA-7282:
-

Perhaps this is a good opportunity, for our second set of data points, to try 
taking the latest 'realistic workload' features of cassandra-stress for a whirl

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132885#comment-14132885
 ] 

Branimir Lambov commented on CASSANDRA-7282:


I have a question about the implementation:

You state in the preface of the NBHOM that the following condition must be 
satisfied for two keys k1, k2: k1  k2 = k1.hashCode()  k2.hashCode(). I 
can see that if the condition is satisfied the structure will work very well, 
but this condition is much too strong. On one hand, I don't believe the 
inequality on the right side of the condition should be strict, because this 
forbids all collisions in the hash function in a totally ordered key set. More 
importantly, even with = on the right, I don't think the condition is 
satisfiable for any usable hash function and in particular, I don't see how the 
Murmur3 partitioner tokens satisfy it.

Could you elaborate on why this condition is satisfied by the proposed code?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132887#comment-14132887
 ] 

Benedict commented on CASSANDRA-7282:
-

For the Murmur3Partitioner, the Token (which is key for this map) is sorted by 
hash, so all we require is that the hashCode() method returns a prefix of this 
hash.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132901#comment-14132901
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


{quote}
but if the effect is not seen, or is lost in the noise, on the specific 
hardware / use case we're benchmarking that doesn't isolate, my only point is 
that this doesn't IMO reflect badly on the change.
{quote}

It doesn't reflect badly on the change, per se, but it does factor in to the 
decision as to whether the benefits outweigh the costs associated with moving 
the complexity from the CSLM into our context as well as flushing out any 
subtle bugs and/or downstream unintended side-effects that come along with the 
change.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132906#comment-14132906
 ] 

Benedict commented on CASSANDRA-7282:
-

As already stated, I disagree. Always, when making these decisions, the 
important factors are: 1) what workloads / portions of workloads are affected; 
and 2) do we consider these common enough (or to become common enough) to 
warrant inclusion?

Since we are very constrained on our hardware and workload generation, picking 
a realistic workload that we can perform at best helps us rule out 
regressions if it is not compatible with exhibiting the change. The important 
question is simply: do we consider it likely it will impact other workloads we 
are not capable of benchmarking, given the known information we have from 
isolating its effect in a manner we _can_.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132913#comment-14132913
 ] 

Branimir Lambov commented on CASSANDRA-7282:


This makes sense. Thanks for clarifying, the fact that the list is ordered by 
the Token escaped me. Since this is the case, should we restrict this method to 
Murmur tokens only? For anything that sorts naturally we could just grab the 
first four bytes. I wouldn't even call this a hashCode to avoid the confusion, 
perhaps an ordering key or ordering prefix?

I would still change the condition in the preface to use non-strict inequality, 
though, because cropping tokens to 32 bits will introduce collisions.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132924#comment-14132924
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


{quote}
does factor in to the decision
{quote}
{quote}
1) what workloads / portions of workloads are affected;
{quote}
That's what I was referring to.

To be clear: I'm not saying we shouldn't use the container or that I'm against 
it in any way,  I'm just re-iterating that we need to be clear as to which 
use-cases and configurations we expect this to help with (which you've done) 
and weigh that relative to the added risk of untested code and the added 
complexity associated with the change.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132953#comment-14132953
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. Since this is the case, should we restrict this method to Murmur tokens 
only?

Well, if I can spend the time ensuring safety we should be able to use this for 
RandomPartitioner also.

bq. I would still change the condition in the preface to use non-strict 
inequality, though, because cropping tokens to 32 bits will introduce 
collisions.

There will be collisions with or without truncation. The fact that there are 
collisions doesn't affect the constraint I've imposed upon the data; we assume 
nothing about the dataset when two hashCode()s are equal, and simply resort to 
the underlying token comparator, so I think the constraint is sufficient. 
Non-strict equality is surely more broken, however? It would hold true for 
k1:1, k2:2, hash(k1):2, hash(k2):2. We could strengthen it to a bi-directional 
implication, i.e. k1 = k2 == k1.hashCode() = k2.hashCode().

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
ordering key or ordering prefix?

Perhaps. This was the simplest approach, and since it *is* a hash key used to 
index a hash table it seems suitable to use hashCode(), and impose the extra 
constraint contextually. I'm fairly neutral though; we certainly could 
introduce a new interface. I'll see how it looks.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132983#comment-14132983
 ] 

Branimir Lambov commented on CASSANDRA-7282:


{quote}I think the constraint is sufficient.{quote}
The constraint is sufficient, but it is too strong and isn't satisfied by the 
current version of the code.

Assuming that  is a total ordering (since any ordering defined by a Comparable 
must be total), the problem with the current k1  k2 - hash(k1)  hash(k2) is 
that it implies k1 != k2 - hash(k1) != hash(k2) and thus hash(k1) == hash(k2) 
- k1 == k2, because k1 != k2 means that either k1  k2 or k2  k1. This means 
that the condition will only be satisfied for a hash without collisions.

A bidirectional = has the same problem, since hash(k1) == hash(k2) -- 
hash(k1) = hash(k2)  hash(k2) = hash(k1) -- k1 = k2  k2 = k1 -- k1 == 
k2. It is actually a stronger statement than the original condition, hence it 
is not a surprise that it doesn't solve the problem. We need something weaker.

To make it weaker we either strengthen the premise, or weaken the conclusion. 
We can do the latter by making the comparison non-strict: k1  k2 - hash(k1) 
= hash(k2). For this one k1 != k2 implies only hash(k1) = hash(k2) || 
hash(k1) = hash(k2) which is trivially true. Looking at the code this is 
actually the condition you use, because you choose predecessors based on hash 
order rather than direct comparison.

{quote}It would hold true for k1:1, k2:2, hash(k1):2, hash(k2):2.{quote}
I don't see anything wrong with this, just a hash collision.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132991#comment-14132991
 ] 

Benedict commented on CASSANDRA-7282:
-

Hmm. Yes. I shouldn't get into these discussions at this time of night.

Agreed.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: profile.yaml, reads.svg, run1.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132137#comment-14132137
 ] 

Jonathan Ellis commented on CASSANDRA-7282:
---

[~jasobrown] could you try this on your cluster?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132212#comment-14132212
 ] 

Jason Brown commented on CASSANDRA-7282:


[~jbellis] Sure, will try it out early next week.

[~benedict] How are you running stress? Any special use case?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132494#comment-14132494
 ] 

Benedict commented on CASSANDRA-7282:
-

Yes, we want to restrict the workload to memtables only, and we want to make 
them big to generate enough numbers on insert to avoid noise. So from a fresh 
cluster, I start it with a memtable_cleanup_threshold of 0.99, 
memtable_allocation_type: offheap_objects, and I run a stress test with only 
_one column_ per partition, and make that column size 1 (although any size is 
fine if it's offheap). I'm just trying to make the memtable as large as 
possible, which with my 4Gb laptop is difficult. I then set the 
memtable_heap_space_in_mb to 1024 (feel free to make it much bigger), and then 
insert around 5M items. If you stick to one column, ensure your offheap space 
is sufficiently large for the space you insert into it, then 5M items per node 
per Gb of on-heap space is achievable (my math tells me around 9M should be 
possible, but I overshot slightly and decided to be conservative). I then 
follow up immediately with a read run hitting a random selection of those PKs.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132504#comment-14132504
 ] 

Aleksey Yeschenko commented on CASSANDRA-7282:
--

bq. Yes, we want to restrict the workload to memtables only

I understand why, but doesn't it sort of defeats the purpose of benchmarking in 
an actual real world situation?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132508#comment-14132508
 ] 

Benedict commented on CASSANDRA-7282:
-

No. You benchmark to isolate performance improvements. These numbers 
demonstrate a benefit _to any portion of a workload that fits these 
characteristics_. Different use cases will exhibit different ratios of this 
effect; some considerable, some not so much. Certainly the read performance 
enhancements will be more generally applicable, but being able to fill your 
memtables faster and with less garbage is also not a bad thing, even if you end 
up bottlenecking on disk. Especially since server characteristics are changing 
rapidly, so that disk bottlenecks are disappearing.

As optimisation work goes deeper, isolating the specific portion of work that 
is affected is pretty essential to clearly delineating the benefit.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132511#comment-14132511
 ] 

Aleksey Yeschenko commented on CASSANDRA-7282:
--

True, but the new more complex code replacing the old code tested by years of 
production stays there for all the use cases. So maybe you want to benchmark 
both the niche and the general use case as well - to make a decision whether or 
not a particular change is justified. (No idea if it is or it isn't in this 
case, but would love to see some real world data on this).

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132523#comment-14132523
 ] 

Benedict commented on CASSANDRA-7282:
-

The new code is not significantly more complex. In fact, its arguably less 
complex, the only difference being some of that complexity is not derived from 
an external source (CSLM is significantly more complex than the NBHOM). However 
it is a pretty simple piece of code.

The point of isolating is that we cannot possibly test all possible real world 
use cases. DSE, for instance, delivers Memory-Only SSTables, which depend on 
memtables, so all workloads hitting these would be affected fully by this 
change. All other workloads will be hit to different degrees.

In general we have to make a judgement call about how common such workloads 
are, vs how significant the effect is for those workloads. However trying to 
define a success metric based on a narrow definition of 'realistic workloads' 
that do not isolate the behaviour doesn't buy us anything in my book. Our 
hardware and workload definitions are not sufficiently universal. If we see a 
15% bump for accesses to memtables, that is IMO worth incorporating, esp. as 
the bump will increase over time. As we move memtables fully offheap we can 
support larger and larger memtables, and since the algorithmic quality of this 
change is that the performance does not increase as memtable size grows, 
whereas CSLM grows logarithmically, this change also has the likelihood of 
paying further future dividends.



 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132530#comment-14132530
 ] 

Jason Brown commented on CASSANDRA-7282:


+1 to [~iamaleksey]'s sentiment. Both the general and specific use cases must 
be tested, and if the general case has a regression, we should not optimize for 
the specific case. I'm sure we all agree about that. Let's do some testing over 
the next week and observe what happens.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132532#comment-14132532
 ] 

Aleksey Yeschenko commented on CASSANDRA-7282:
--

For the record, I'm not saying that we shouldn't do it - just that having the 
data for different use cases, and not just one specific one, would be nice, and 
ultimately necessary to make that judgement call. I'm expecting that it would 
probably be worth incorporating.

But there definitely is *some* value in heavily battle-tested code when you 
work on a database.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-09-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132536#comment-14132536
 ] 

Benedict commented on CASSANDRA-7282:
-

bq. But there definitely is some value in heavily battle-tested code when you 
work on a database.

Sure, agreed :)

Ok, I'm not disputing there's value in _seeing extra data_ (there always is), 
but if the effect is not seen, or is lost in the noise, on the specific 
hardware / use case we're benchmarking that _doesn't_ isolate, my only point is 
that this doesn't IMO reflect badly on the change.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0

 Attachments: reads.svg, writes.svg


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103866#comment-14103866
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


Some interesting results on the 
[99.9th|http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.defaults.jsonmetric=99.9th_latencyoperation=1_writesmoothing=1show_aggregates=truexmin=0xmax=688.16ymin=0ymax=1600]
 and 
[max|http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.defaults.jsonmetric=max_latencyoperation=1_writesmoothing=1show_aggregates=truexmin=0xmax=688.16ymin=0ymax=3000]
 on write latency.  While their aggregated values look better on the new 
non-blocking hashordered map there look to be more outliers.  Any ideas on why 
that might be [~benedict]?  95th and 99th shows pretty [marked 
improvement|http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.defaults.jsonmetric=99th_latencyoperation=1_writesmoothing=1show_aggregates=truexmin=0xmax=688.16ymin=0ymax=140]
 so the differential looks to be on the extremes only.

Either way - I'll be very interested to see the results of the test w/memtable 
and row size changes.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104014#comment-14104014
 ] 

Benedict commented on CASSANDRA-7282:
-

A few possibilities:

1) very few data points, so difficult to draw much conclusion
2) the 99.9th%ile border is arbitrary with so few data points, as evidenced by 
the better max latency
3) the largest latencies are ordinarily encountered either during CL or 
memtable flush; since we're writing faster, it's possible we are simply putting 
the disks under increased pressure and hitting higher latencies more often, 
bringing the 99.9th%ile closer to max

Ryan - I'd suggest running with offheap_objects, raising the commit log size 
limit to 16Gb, a memtable_cleanup_threshold of 0.99, setting your on heap 
memtable capacity to 4Gb, your offheap to 2Gb, and running the test with just 
one column of around 32 bytes. Then use 48M items and perform your normal run 
of insert then read; we *shouldn't* see any flush result from that if I did my 
quick bit of math correctly.


 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104307#comment-14104307
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

Here's a trial that incorporates some of the above suggestions:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.default_mct.json

 * 48M 32 byte rows.
 * commitlog_total_space_in_mb: 16384
 * memtable_allocation_type: offheap_objects

I'm not familiar with the settings that heap memtable capacity to 4Gb, your 
offheap to 2Gb refers to. 

I tried adding a memtable_cleanup_threshold of 0.99 but the write failed on 
both runs when it started GC, so I'll try scaling that back some more (the 
above results are just using the default instead.)

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104607#comment-14104607
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

memtable_commit_threshold: 0.5 is the highest I've been able to go without the 
write failing mid-operation.

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.mct_05.json

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104924#comment-14104924
 ] 

Benedict commented on CASSANDRA-7282:
-

memtable_heap_space_in_mb: 4096
memtable_offheap_space_in_mb: 2048
-Xmx8Gb

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104970#comment-14104970
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

Thanks Benedict, I tried it with those new settings. Still using 
memtable_cleanup_threshold of 0.5 :

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.3.json

Getting a bit more interesting now, not sure what's up with that first read and 
then the drop.

I'll try scaling up memtable_cleanup_threshold again.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104988#comment-14104988
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

memtable_cleanup_threshold:0.99 is working with these new settings:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.4.json

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14105022#comment-14105022
 ] 

Benedict commented on CASSANDRA-7282:
-

Ryan provided me with logs, and we can see a flush trigger exactly as the 0.5 
mct write test finishes, explaining the dip. However the temporary dip below 
normal is a bit odd and I'd like to explain it better - best guess is key cache 
being populated; there is a slower ramp up for the normal run as well, here, 
which would be consistent with it (since it flushed earlier). We could try a 
run with key cache disabled, to confirm.

I'd also like to be able to explain the different GC characteristics - we GC 
less often with the new code, but have one _very_ long ParNew pause, fairly 
consistently. Currently I don't have a great explanation for this, except 
possibly the very large array allocations we need to do. Possibly switching to 
an array-of-arrays would be superior, keeping each of the arrays preferably  
1M or so

For the mct 0.99 run, what's most surprising is reads are _not_ particularly 
faster, which is not what I found running locally, and also conflicts with the 
non-trivial increase in write performance. Back to the drawing board to try and 
figure out exactly what's happening.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-19 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102900#comment-14102900
 ] 

Joshua McKenzie commented on CASSANDRA-7282:


[~enigmacurry] Any chance you got some performance numbers for the branch 
Benedict linked above?

[~benedict] Do you have any particular use-cases you think best showcase the 
benefits of the code-changes here?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103245#comment-14103245
 ] 

Benedict commented on CASSANDRA-7282:
-

The only workload we'll easily measure it in at the moment is in-memory reads. 
So, increase total memtable space substantially, insert small records up to an 
amount small enough to fit entirely in the memtable but large enough to fill as 
much as possible, and then perform aggressive random reads.

It will also have a similar level of impact on improving _insert_ times, 
however since these are ultimately disk bound we are unlikely to be able to 
measure it as easily.


 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-19 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103252#comment-14103252
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

This fell under my radar, I've started a basic test comparing trunk to this. 
However, I notice this branch has diverged quite a bit, is that still the best 
comparison, or did you want to merge in changes?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103259#comment-14103259
 ] 

Benedict commented on CASSANDRA-7282:
-

Can you compare to the SHA it was based on? If not I'll rebase

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-19 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103324#comment-14103324
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

Here's a test run with default settings. We fared just slightly better than 
stock.

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.7282.defaults.json

Tomorrow I'll run a test incorporating [~benedict]'s suggestions regarding 
memtable and row size adjustments.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089019#comment-14089019
 ] 

Benedict commented on CASSANDRA-7282:
-

Just pushed a minor update removing an extraneous comment and making the resize 
threshold triggering uniform + clearer.

Agree it would be good to get some performance numbers on this, but I'm not 
sure which magical facilities you're referring to? New stress isn't likely to 
stress this bit out any more interestingly than old stress, and we don't yet 
have a working magical performance service I don't think...

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-07 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14089071#comment-14089071
 ] 

Aleksey Yeschenko commented on CASSANDRA-7282:
--

bq. I'm not sure which magical facilities you're referring to

Just Ryan's stress setup (:

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14087964#comment-14087964
 ] 

Jonathan Ellis commented on CASSANDRA-7282:
---

[~JoshuaMcKenzie] to review

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-08-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14088631#comment-14088631
 ] 

Aleksey Yeschenko commented on CASSANDRA-7282:
--

Can we perf-test it first, using our new magical facilities, just to be sure?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019074#comment-14019074
 ] 

Benedict commented on CASSANDRA-7282:
-

First version available 
[here|https://github.com/belliottsmith/cassandra/tree/7282-fastmemtablemap]

Need to perform some more rigorous performance tests, but I see a 7-10% bump on 
my box for naive stress test hitting memtable reads.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid skip list / hash map. The skip list would impose the order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the skip list and permitting O(1) lookups. It 
 should be possible to define the hash map to also permit O(1) inserts. Our 
 decorated keys are in fact perfectly designed for this scheme.
 At the same time, we can potentially improve the data locality in the skip 
 list by baking the initial 64 token bits directly into the structure, and 
 storing multiple values per skip list entry to improve cache performance, 
 bringing down memory and constant factor cpu overheads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019075#comment-14019075
 ] 

Benedict commented on CASSANDRA-7282:
-

[~enigmacurry] how easy would it be to kick off a quick comparison of this with 
stock? Do we have any hardware that's unoccupied?

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid skip list / hash map. The skip list would impose the order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the skip list and permitting O(1) lookups. It 
 should be possible to define the hash map to also permit O(1) inserts. Our 
 decorated keys are in fact perfectly designed for this scheme.
 At the same time, we can potentially improve the data locality in the skip 
 list by baking the initial 64 token bits directly into the structure, and 
 storing multiple values per skip list entry to improve cache performance, 
 bringing down memory and constant factor cpu overheads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019082#comment-14019082
 ] 

Benedict commented on CASSANDRA-7282:
-

Hmm. Scratch that, I've broken something.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid skip list / hash map. The skip list would impose the order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the skip list and permitting O(1) lookups. It 
 should be possible to define the hash map to also permit O(1) inserts. Our 
 decorated keys are in fact perfectly designed for this scheme.
 At the same time, we can potentially improve the data locality in the skip 
 list by baking the initial 64 token bits directly into the structure, and 
 storing multiple values per skip list entry to improve cache performance, 
 bringing down memory and constant factor cpu overheads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019081#comment-14019081
 ] 

Ryan McGuire commented on CASSANDRA-7282:
-

Probably not til next week..

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid skip list / hash map. The skip list would impose the order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the skip list and permitting O(1) lookups. It 
 should be possible to define the hash map to also permit O(1) inserts. Our 
 decorated keys are in fact perfectly designed for this scheme.
 At the same time, we can potentially improve the data locality in the skip 
 list by baking the initial 64 token bits directly into the structure, and 
 storing multiple values per skip list entry to improve cache performance, 
 bringing down memory and constant factor cpu overheads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019086#comment-14019086
 ] 

Benedict commented on CASSANDRA-7282:
-

Next week is fine

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid skip list / hash map. The skip list would impose the order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the skip list and permitting O(1) lookups. It 
 should be possible to define the hash map to also permit O(1) inserts. Our 
 decorated keys are in fact perfectly designed for this scheme.
 At the same time, we can potentially improve the data locality in the skip 
 list by baking the initial 64 token bits directly into the structure, and 
 storing multiple values per skip list entry to improve cache performance, 
 bringing down memory and constant factor cpu overheads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-7282) Faster Memtable map

2014-06-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019210#comment-14019210
 ] 

Benedict commented on CASSANDRA-7282:
-

There was a minor bug with compatability mode for non-murmur3 partitioners. 
Unfortunately it looks like the unit tests and dtests all specify 
ByteOrderedPartitioner, making this difficult to test thoroughly across all 
database features. It works fine with stress, but I will need to write some 
thorough bespoke unit tests.

 Faster Memtable map
 ---

 Key: CASSANDRA-7282
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 3.0


 Currently we maintain a ConcurrentSkipLastMap of DecoratedKey - Partition in 
 our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
 majority of users use a hash partitioner, it occurs to me we could maintain a 
 hybrid ordered list / hash map. The list would impose the normal order on the 
 collection, but a hash index would live alongside as part of the same data 
 structure, simply mapping into the list and permitting O(1) lookups and 
 inserts.
 I've chosen to implement this initial version as a linked-list node per item, 
 but we can optimise this in future by storing fatter nodes that permit a 
 cache-line's worth of hashes to be checked at once,  further reducing the 
 constant factor costs for lookups.



--
This message was sent by Atlassian JIRA
(v6.2#6252)