[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2016-03-20 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203267#comment-15203267
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

With CASSANDRA-11206 we eliminate a big pain (IndexInfo GC pressure and 
serialization).

We _could_ move the key-cache into off-heap after 11206. But it's also possible 
that other optimizations (CASSANDRA-8931, CASSANDRA-9754 and CASSANDRA-9786) 
may result in a completely different structure. Having that said, I tend to 
prefer at least 8931 and 9786 before tackling this one and see what the outcome 
of these is.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.x
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2016-02-10 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140764#comment-15140764
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Yes, it is.
Just want to get the current UDF/UDA stuff done before.
Generally, I'd like to see what's coming with CASSANDRA-9754. The current 
"problem" is the amount of off-heap serializations for big partitions as we do 
a binary search for the lookup. If that gets better with CASSANDRA-9754, that 
would be great.
We can optimize the whole path with CASSANDRA-8931 and CASSANDRA-9786. If these 
give promising results, a follow-up _might_ be to merge the summary into 
index-info.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.x
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2016-02-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139134#comment-15139134
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

Is this still on your plate, Robert?

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.x
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743048#comment-14743048
 ] 

Sylvain Lebresne commented on CASSANDRA-9738:
-

bq. If the code is still being majorly revised so close to RC, we should 
consider pushing this back

I agree.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-14 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743058#comment-14743058
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

bq. consider pushing this back

Also +1.
The major open tasks are to eliminate the need to copy the RIE on cache-get, 
which requires bullet-proof ref-counting, and double buffering during RIE 
building, which is sometimes just a waste of CPU/heap as the returned RIE 
object is never used.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-14 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743321#comment-14743321
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

One thing to consider for RC1 IMO is to change the index file format. Having 
that for RC1 gives us the flexibility to add 9738 for 3.0.0 or 3.0.x - so I've 
created CASSANDRA-10314.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742670#comment-14742670
 ] 

Benedict commented on CASSANDRA-9738:
-

bq. In a lot of cases the RIE is going to be enclosed by a Closable iterator 
anyways.

Given our historically poor handling of resource release, and how many nooks 
and crannies they end up in now, I'd prefer we didn't depend on the closure of 
iterators until we have some better facilities in place for catching leaks, 
like CASSANDRA-9918. Right now I don't think we're likely to leak much on a 
per-request/leak basis (just prevent some shared cleanup, per discarded major 
resource, such as an sstable) if we fail to close an iterator. So this could 
magnify the effect of any such bug.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-13 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742700#comment-14742700
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

OK, well I don't have an answer for 3.0 on how to completely eliminate the risk 
that the copying overhead won't be an issue.

If it keeps changing there is no way I will get to the point where I can +1 the 
performance dimension. I might be able to get to +1 correctness if things stop 
changing long enough for me to checkout it out and walk through everything. 
Performance could very well be good enough/better for realistically sized 
indexed entries. If we benchmark the final code and it's better then I guess I 
could be +1 on performance and ignore the overhead of allocating and copying 
for each cache access.

In hindsight wholesale replacing the existing key cache was not a good 
incremental way to effect this change in a low risk way. Introducing a new key 
cache and iterating on it would have allowed for faster development, more 
frequent merging to trunk, smaller reviews etc. We would also always have 
something to ship once release time rolls around even if something change was 
pending.

* [IndexEntryStats, I don't get why you are casting down to int, why not 
checked cast on the way 
out?|https://github.com/apache/cassandra/commit/5d53c29ceb915b1b95bca0fff15b1591d508e8c8#diff-75146ba408a51071a0b19ffdfbb2bb3cR1434]
* [I don't understand overflow protection here. It's a long is it really going 
to 
overflow?|https://github.com/apache/cassandra/commit/5d53c29ceb915b1b95bca0fff15b1591d508e8c8#diff-75146ba408a51071a0b19ffdfbb2bb3cR1434]
* Average not taking into account the distribution is suspect to me. Should the 
allocation be slightly larger than average to prevent reallocation?
* When allocating the offset buffer it's using the count of index infos without 
multiplying by their size?
* We are writing sampled records. I am skeptical this is worth optimizing when 
we have to get this done and correct. There are better options in subsequent 
versions like appending to a file.
* Did this change have an impact and improve some metric?


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742716#comment-14742716
 ] 

Benedict commented on CASSANDRA-9738:
-

bq. In hindsight wholesale replacing the existing key cache was not a good 
incremental way to effect this change in a low risk way

So, looking at the latest performance comparisons, it doesn't look like the 
benefit is so uniformly profound as to be worth rushing this out. It looks like 
whatever was particularly affecting 3.0 in the early benchmarks (that notably 
wasn't affecting prior versions) has been ironed out. 

If the code is still being majorly revised so close to RC, we should consider 
pushing this back, and if incorporated before GA at least consider making it 
optional. It's turning into a patch of really significant size, and we already 
have a great deal of risk associated with this release.


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-12 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742095#comment-14742095
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Pushed another update with the following changes:
* Legacy IndexEntry objects / missing offsets: re-serializing seemed too 
expensive to me but not having the offsets in key-cache is also bad. For 3.0+ 
IndexEntry structures, we have the offsets - so that's not a problem. For 
legacy IndexEntry structures, it generates _all_ offsets. The change is that 
the key-cache will include the calculated offsets even for legacy tables. This 
eliminates the need to re-serialize the whole thing. (legacyIndexInfoSearch() 
is gone.)
* Moved the offsets-"array" back to the end of serialized IndexEntry structure.
* Sizing the serialization buffers: Tried to find easily and cheaply accessibly 
stats for partition size and (more important) serialized IndexEntry size for a 
CF but didn't find anything. So I added a (poor man) stats to CFMetaData that 
tracks the numbers to size the serialization buffers appropriately. These stats 
are not persisted, just calculated during runtime - so it's not a "perfect" 
thing. The problem with existing stats is that we know the mean partition size 
but not the mean size of the ClusteringPrefixes which could be used to 
initialize the poor-man stats during startup. (Note: current 
TableMetrics.meanPartitionSize iterates over all sstables to retrieve the value)


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741107#comment-14741107
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

When deserializing an old format IndexedEntry I think you have to rewrite it to 
generate the offsets. Otherwise the cache will never be populated with an entry 
where the offsets are calculated. That will make it slower than the older 
version. I also think this will be faster than generating it incrementally 
since it's a nice tight loop doing a scan of memory. If you are doing a binary 
search you will end up doing most of that work anyways and if it's a scan you 
will also end up doing most of it.

RowIndexEntry.java line 423, legacyIndexInfoSearch. It's still doing a loop 
from the beginning of the offsets to the last offset it calculated. There is no 
need we should know the last calculated offset available and skip to it. For a 
scan this operation becomes n^2 with that loop. I think it should go away 
completely. Just rewrite the IndexedEntry during deserialization since you are 
making a copy anyways when you bring it out of the field.

Jonathan told me the expectation is that people run upgrade sstables so we 
don't need to be heroic. Let's go for the simples possible solution which is 
making the old and new formats match after deserialization. Hopefully this 
means we can remove a bunch of paths based in which format we are looking at.

For cache hits we have to copy the entire IndexedEntry onto the heap into 
unpooled memory. That is making an operation that was lg N a linear operation 
to the size of the IndexedEntry. In terms of raw speed the on heap cache is 
going to be better off using the new serialization, but it will really poke the 
garbage collector in the eye. At least with the OHC cache the garbage is short 
lived.

I don't like to give people options they have to choose from, but I am more 
afraid of making the product unworkable for some use case. Maybe we should 
allow the key cache to be selectable for 3.0? Alternatively could you make 
RowIndexEntry closable and go with ref counting? I feel like these are the two 
options that get us to 3.0 while minimizing regret post release.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741308#comment-14741308
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

bq. For cache hits we have to copy the entire IndexedEntry onto the heap into 
unpooled memory. That is making an operation that was lg N a linear operation 
to the size of the IndexedEntry. 

Is that the entire partition's worth of index then that we're copying on heap?

Can we use the partition size statistics to automatically pick the appropriate 
implementation?

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740441#comment-14740441
 ] 

Sylvain Lebresne commented on CASSANDRA-9738:
-

To put your mind at ease, I'm not particularly attached to any of the changes 
from CASSANDRA-10232. The goal of that ticket was to make very simple change 
that, given how we used RIE at the time, provided hopefully clear gains (not 
only for performance btw, getting smaller files on disk is a nice thing by 
itself). But if the uses of RIE change and other trade-offs makes more sense, 
so be it. Of course, the more we have time to validate those choices with 
benchmarks, the better.

bq. I am questioning this on the grounds that it requires walking the entire 
partition index and doing work even to random access?

If you mean that the cost of iterating over the index to recompute the offsets 
could outweight the gain of reading less data, then you could be right but my 
hunch is that this probably doesn't make much measurable difference in 
practice. So the goal was more to save some space on disk than anything else 
(it's not hugely important, but it's a nice to have). But anyway, I'll remark 
that even if we want to avoid that walk, keeping both the width and offset of 
each entry is still redundant. The other alternative is to keep the offset but 
ditch the width, as the width can be recomputed from the current index offset 
and the next one (so without a full walk). The reason I didn't went with that 
alternative originally is that I suspect the code to read old entries (the 
backward compatibility code) will be more awkward. Anyway, if we keep vint 
encoding, I guess keeping both the width and offset is not such a big deal and 
I'm fine with that if that's simpler, but it's definitively possible to keep 
only one.


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741403#comment-14741403
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

bq. Is that the entire partition's worth of index then that we're copying on 
heap?
Yes

bq. Can we use the partition size statistics to automatically pick the 
appropriate implementation?
Possibly. If we have two cache implementations at once how do you configure how 
big each one is? It seems tricky to do in one week. 

I think we would have better luck trying to get reference counting working so 
there doesn't need to be a copy. In a lot of cases the RIE is going to be 
enclosed by a Closable iterator anyways.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-11 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741467#comment-14741467
 ] 

Michael Kjellman commented on CASSANDRA-9738:
-

[~aweisberg] i'm having trouble keeping up with the number of diff's created 
from editing your comments on this thread. Any chance you could create new 
comments instead of editing the previous ones? thanks

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-10 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739297#comment-14739297
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

I've got a couple of cstar perf tests done. Here's a short summary:

* OHC-KC is at least as fast as vanilla 3.0, if not faster.
* 3.0 generally beats 2.2 - especially with "big" partitions. Most of the 
remaining cases are mitigated with OHC-KC.
* 2.2 shows better numbers in a view cases - especially reading small rows. But 
that's not a OHC-KC issue - it's also true for vanilla 3.0.
* OHC-KC generally shows equal if not better GC pressure.

Now going to write a "wall of text" describing OHC-KC changes/architecture and 
a more sophisticated performance test interpretation.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739436#comment-14739436
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

[~slebresne]
For the OHC key cache we are looking at rolling back the core changes for 
#10232 because they get in the way of accessing the data off heap when we want 
to random access fields. Also the RIE is sampling so the value of slightly 
shrinking the entries seems low to me. There is a time/space tradeoff to make 
since we have to binary search the indexed RIE.

Some of the other optimizations don't work like dropping fields and 
recalculating them because we don't want to process the index to read it just 
copy the bytes. I think this also prevented the optimization where you rebased 
the offset at 64k because that also requires a pass through the data to parse.

I asked snazy to benchmark with large partitions with and without vints, but I 
think it's splitting hairs either way.

It's a bad situation in general because when we cache/uncache these things we 
copy the entire thing which is yucky. It will be faster this way because at 
least it is a straight memory copy without an object graph or parsing.

My pony for 3.0 would be to 64-bit map the entire index file, then get 32-bit 
ByteBuffer slices out of it and not use the key cache entirely, but there is a 
time constraint.

I think just mapping it naively might get us 80 in the 80/20 of CASSANDRA-9754.

Benedict also mentioned CASSANDRA-8931  which sounds like good next release 
fodder for making the index smaller.


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739576#comment-14739576
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

Thanks to Benedict for providing some clarity on this.

So there are two general access paths to an Indexed RIE, there is a scan and a 
binary search to support random access. For the scan it is fine to materialize 
an entire IndexInfo. For the binary search case we don't want to materialize an 
IndexInfo object as this would hurt performance compared to the current POJO 
implementation.

The current code has cases where it access fields from the IndexInfo by index. 
I would like to get away from that and just return the POJO and access fields 
from the POJO. As far as we know there is no degenerate case where it's pulling 
all the fields from different indexes interleaved.

We are getting a big win from this compared to the POJO implementation simply 
by reducing the cost of loading/unloading an IndexedEntry to a memory copy, as 
well as reducing the cost of building an IndexedEntry by serializing it up 
front instead of building a list of POJOs and then coming back and serializing 
it.

We should be able to preserve the use of vints. We should optimize the layout 
of an IndexInfo by having the clustering prefix field as the first field so 
that binary search doesn't have to do extra decoding. During a scan the cost of 
materializing and extra decoding (which we can avoid later if we want) is small 
compared to total operation cost for each entry materialized.

Another optimization in addition to vints (and Benedict we didn't talk about 
this in the hangout) was dropping the offset field from IndexInfo.
[This was then recalculated in 
AbstractSSTableIterator.IndexState|https://github.com/apache/cassandra/compare/trunk...pcmanus:10232#diff-fb1874f891c1a014fb57f8b4e42b5247R431].
 I don't see a conflict between 9738 and this choice, but now I am questioning 
this on the grounds that it requires walking the entire partition index and 
doing work even to random access? I didn't pick up on that in my review of 
10232. We have also agreed that because the IndexedEntries are sampling (and 
not per row or per partition) they are not as size constrained so keeping the 
field seems like the right choice.

The last optimization from 10232 I want to consider is using the 64k WIDTH_BASE 
to reduce the size of the offset field. I don't see why we can't preserve that.

We also want to keep the [reduction in serializer 
allocations|https://github.com/apache/cassandra/commit/4dfbba680620fef985cb2b3f00456ee8155404e0].
 I checked and at it looks like that has been preserved.





> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736549#comment-14736549
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Some of the missing test-coverage is blocked by CASSANDRA-10237.
Will work on the rest of the issues.

Also scheduled a bunch of perf tests on cstar with a few but big partitions 
(100k and 500k rows, gaussian distribution).
Will post a summary of these tests later.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-08 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735591#comment-14735591
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

Just going off of coverage from "ant test" not including your work today. I 
have had some bad luck recently with emma claiming stuff is not covered. 
Hopefully it's accurate in these instances.

* Serializers.LegacyClusteringPrefixSerializer.deserialize isn't tested. Is 
that blocked by CASSANDRA10237?
* IndexInfo.skip isn't labeled as covered and neither is serializedSize
* RowIndexEntry.equals is not covered, neither is hashCode, same for 
RIE.IndexedEntry
* RowIndexEntry.Serializer.serialize doesn't test manually serializing the old 
version
* RowIndexEntry.IndexedEntry constructor doesn't test loading the old version
* RowIndexEntry.IndexedEntry.indexInfo doesn't test the old version, also the 
branch at the end where it stores the offset it happens to know about after 
finishing
* RowIndexEntry.IndexedEntry.indexInfo does it have to do that loop for already 
discovered offsets? Can you store that value and not loop?
* RIE.IndexedEntry.indexCount() could store the offset as a constant, also 
firstIndexInfoOffset
* RIE.IndexedEntry.promotedSize for the old version not covered
* Slice.Serializer.skip() is not tested/covered, neither is serialized size or 
part of deserialize, but you didn't change the latter two
* [RowIndexEntyrTest has commented out 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-7972c5d16db58cd0a0e63e95c4de264aR203]

Reviewing test contents now. Also going to update and look at your latest. 




> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-08 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735326#comment-14735326
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---


* [This line could just be 
instanceof?|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R164]
* [Same 
here?|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R544]
* I think we may want to consider not closing streams that wrap fixed buffers. 
The close operation does nothing, it throws an exception, and all that is going 
to do is generate useless instructions and prevent optimization. Maybe as a 
separate ticket we should go back and look for things that do that.
* DataInputBuffer doesn't have an efficient skip method. The skip method 
implementation in super classes expects that it is wrapping a channel or stream 
and skipping requires reading. For a fixed ByteBuffer there is no need to 
allocate or copy memory. MemoryInputStream has the same issue in that it could 
be efficient but isn't. Maybe they should have a common base class since they 
both wrap memory like things and can have memory (as opposed to stream or 
channel) specific implementations.
* [Dead 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R389]
* [I don't think you need to incorporate the buffer in the hash code. It's 
wasted cycles, position is sufficient? Not sure if we are using 
it.|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R552}
* What's the difference between the ClusteringPrefix serializer in Serializers 
and ClusteringPrefix? Did you have to add the legacy support?

That's my reaction to the code. Working on tests and coverage now.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735342#comment-14735342
 ] 

Benedict commented on CASSANDRA-9738:
-

It can actually be faster to compare the exact class instance, as this is 
pretty well optimised by hotspot (and should just involve comparing the object 
headers). It is unlikely to make _much_ of a difference here, especially with 
the shallow class hierarchy here, just noting it.

In this case it looks like for correctness in the {{RowIndexEntry}} class it 
needs to be an exact match anyway though.

The IndexedEntry doesn't need to duplicate those checks though, since they're 
performed by the call to RIE.equals

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-04 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731817#comment-14731817
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Had to rebuild the branch as rebase w/ resolving conflicts would have been like 
opening a can of worms.
Unit tests look good and cstar shows the known numbers.

The old state is still there for reference 
[here|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc-before-10136-notest]
 and [here 
(squashed)|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc-before-10136-squashed-notest].

Optimizations in the "new" version are not that sophisticated as in the "old" 
one for backward compatibility as implemented by CASSANDRA-10136.


> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-03 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728610#comment-14728610
 ] 

Michael Kjellman commented on CASSANDRA-9738:
-

I've been making a good amount of progress on 9754. I'm not sure if I can give 
you an exact date at this point but I'm hoping it will be in the near future.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-01 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726135#comment-14726135
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Pushed some more commits.
Includes serialization of IndexInfo offsets.
Adds tests for RowIndexEntry class and legacy sstables based on CASSANDRA-10236 
(latter not working yet due to CASSANDRA-10237).

Open todo: benchmarking/comparison against "old" implementation, verification 
of compatibility with legacy sstables.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-09-01 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724942#comment-14724942
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

bq. index info offsets in the serialization

Yes, we can still change the on-disk format for indexes in 3.0. It's a todo for 
this ticket.

bq. Caching the last used IndexInfo is going to force a little bit of promotion

Did that because some places in the code regularly access the "current" 
IndexInfo. Feels cheaper than having to deserialize the whole object multiple 
times.

bq. operate on the IndexInfo as a ByteBuffer 

Would be beneficial (could save some two memory copies per IndexInfo) - but as 
you said it's not easy. I gave it a quick try but ref-counting watchdog 
complained about unreleased references. I think we still save a lot with the 
serialized index-offsets and only deserializing the really needed objects.

bq. Reference counting shouldn't be too bad

Seems doable (together with IndexInfo as a ByteBuffer). But I'd like to defer 
this to a follow-up ticket.

I think we can get this working for 3.0 - so saving temporary garbage on the 
heap for cached and non-cached RIE.
As I said in 9754, I think that RIE+II (including key cache) has probably 
reached its EOL with all the ongoing effort to optimize/replace the structure 
and algorithms.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-31 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723736#comment-14723736
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

OK, looks like what I was thinking at a high level. Since it is in progress I 
won't go deeper yet. I would include the index info offsets in the 
serialization and just populate them from the start so random access is simple. 
It's not going to be a worse memory impact than the existing Java object graph 
that has the array. Caching the last used IndexInfo is going to force a little 
bit of promotion. Not sure how much.

It's not going to be easy, but if we can operate on the IndexInfo as a 
ByteBuffer we can avoid deserializing them during the binary search. Getting 
ClusteringComparator and ClusteringPrefix to that point does look hard. We 
measure as is and see if it's enough garbage and to matter.

Reference counting shouldn't be too bad. In all the cases I could find the 
index info is consumed immediately and discarded or composed into an iterator 
implementing Closable. There is a bit of rope to hang to hang ourselves with 
there for long lived iterators pinning stuff in the cache. This is probably 
something we should test in OHC.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-31 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723761#comment-14723761
 ] 

sankalp kohli commented on CASSANDRA-9738:
--

It looks like there is lot of overlap with CASSANDRA-9754

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-31 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723917#comment-14723917
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

I agree there is overlap in. The scope of what we need to make the key cache 
work is smaller, but it might be game over for the key cache and large 
partitions anyways even with Robert's improvements. Constantly loading and 
unloading large partitions is never going to work. We are at least reaching 
parity with the old key cache.

If we could map the entire partition index then you could point this code at 
the mapping and it would work. It would not be ideal because what we are doing 
is a basic translation of the object graph which is an array pointing to 
variable size objects. Binary searches are going to result in access to two 
pages for each comparison. Possibly still better than what we have today?

I think the trick right now is figuring out what we can do for 3.0, and what 
the next intermediate step is. If Robert goes and makes this work off-heap then 
the key cache can maybe go off heap for 3.0. Seems like removing the key cache 
and reliably operating against a memory map is unlikely for 3.0, but maybe 
shortly after.

What would the timeline be for 9754? That kind of determines what intermediate 
steps make sense.

> Migrate key-cache to be fully off-heap
> --
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
> Fix For: 3.0 beta 2
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-30 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721715#comment-14721715
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Code's been refactored and unit tests have no failures, but some dtests failed 
recently - so it's not yet fully done.

There are several commits on the branch.
It starts with cleanup refactorings and preparation by encapsulating all 
RowIndexEntry and IndexInfo accesses.
Next is introduction of ByteBuffer (744dde329a084a7fdab52fbbed11a15b296bab35), 
which is the main commit.
Following commits are optimizations on top of that commit.

The main change is that all IndexInfo objects are just present as a single 
(on-heap) ByteBuffer. IndexInfo objects are created when needed. This helps to 
reduce the amount of small, temporary and maybe unused objects - e.g. during 
binary search.

I've got no benchmark in place yet and latest cstar tests neither show anything 
new nor a regression. I just didn't succeed to get really big partitions - so 
I'll try to build some artificial microbenchmarks to get some comparable 
numbers and prove that the approach also works for really big partitions (at 
least in theory it should, but who knows).

The plan to do binary search in off-heap key cache or deserialize IndexInfo 
directly from off-heap or even just retrieve requested information without 
deserialization at all is not yet done. It requires more refactoring than I 
expected as RowIndexEntry and IndexInfo objects are used at many distinct 
places and exposing the off-heap ByteBuffer containing the serialized 
RowIndexEntry+IndexInfos requires proper reference counting (or risk off-heap 
memory leaks).

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-29 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721020#comment-14721020
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

bq. If you can do it in time it's fine for 3.0.

K, will try :)

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14719307#comment-14719307
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

For observers. We hit a snag. The key cache values can be quite large. There is 
an entry in each value for every row per partition so it can be in the 
thousands. This likely means that copying the entire thing on heap to operate 
on it once per read is not going to match the performance of the existing POJO 
implementation. Robert is going to benchmark a more representative 
configuration.

It's a tractable problem but we will need to an off heap list implementation 
for variable size objects that supports binary search without materializing 
each entry in the search.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720240#comment-14720240
 ] 

sankalp kohli commented on CASSANDRA-9738:
--

[~mkjellman] is cooking something like this in CASSANDRA-9754. I will let him 
give more details. 

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720378#comment-14720378
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

As an intermediate solution (it's actually more a workaround) we could just not 
add key-cache entries bigger than a configurable size (maybe 64kB?) and find a 
solution that works for both key-cache and disk (CASSANDRA-9754). Having 
something that can do the search in a {{ByteBuffer}} might be nice as it could 
be used in off-heap and on disk (assuming it's mmapped). /cc [~mkjellman]

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720557#comment-14720557
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

I haven't looked to deep into it yet, but why would it be out for 3.0? Is it 
not possible to bridge an off heap list with the on heap list implementation 
that is already there?

The current formulation is out unless a benchmark of many rows per partition 
demonstrates no slow down compared to the POJO version on a read intensive 
benchmark with a cacheable distribution (not uniform).

If you can do it in time it's fine for 3.0.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Michael Kjellman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720518#comment-14720518
 ] 

Michael Kjellman commented on CASSANDRA-9738:
-

[~snazy] I had floated the idea of a size limit with [~iamaleksey] and he 
wasn't a fan

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720530#comment-14720530
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

yea - the idea's bad. It locks out big partitions (that really need the key 
cache). Assume that idea moved to /dev/null.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-28 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720486#comment-14720486
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

It's just brain dumping an idea: we could change RowIndexEntry and related 
IndexInfo plus its comparators to be off heap compatible (means work on a 
ByteBuffer but not change the serialization format or algorithm itself). Having 
this would also save polluting the heap with RIE+II objects when reading it 
from disk as it could just load one ByteBuffer and operate on that.
In other words: make {{RowIndexEntry}} a wrapper for a {{ByteBuffer}} without 
the need to deserialize any object structure during reads.

CASSANDRA-9754 could further optimize this to handle really big partitions?

But having this problem with big RowIndexEntries (lots of IndexInfo objects) I 
guess migrating KC off-heap is out of 3.0?

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-27 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716905#comment-14716905
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

How big is the realistic size range of key cache values?

The only thing that makes me uncomfortable right now is the buffer handling for 
the serializing the key cache value in serializedSize(). Resizing in the loop 
is 4k at a time and not doubling. I think it at least needs to double to manage 
whatever the worst case is.

It seems to me like the right tack on failure is to allocate the correct size 
buffer by calculating the size the old way or double (for lg(n)). Doubling is 
fine if we really never expect sane data models to hit it. Doubling is not 
going to be great for people to hit under real world conditions especially if 
they have to do it several times.

It's tempting to ask to have 10189 expanded to be make this stuff fast and 
then do it the old way in this change set, but I could be convinced if there 
isn't a really bad corner case.

Also next time could you wait +1 to squash? I can't see your last few change 
sets once you squash. I am guilty of this too and now I know how it feels.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-25 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712038#comment-14712038
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

As discussed offline:
* current branch uses its own string (de)serialization, is squashed and rebased
* opened CASSANDRA-10189 as a follow-up for unified read/writeUTF code paths


 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-24 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709670#comment-14709670
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

For reading a string there is no reason not to use the last bit and allow 
strings 65k strings. It also doesn't throw if the length is too large.

I really would like to get to the one true string reading/writing path if 
possible. So we can put together one good implementation and also to optimize 
icache. To allocate or pool temporary buffers is kind of a question of how 
allocation bound you are. I think the answer for C* right now is very.

Is it feasible to share the same pooling buffer for both the key serialization 
and value serialization? Maybe a little finicky given that size is sometimes 
retrieved in an order that is different from the order they are serialized in.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-23 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708431#comment-14708431
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Branch is ready for another review round.
It's rebased against current 3.0 and trunk.

Changes are:
* updated to ohc 0.4.2 (to get rid of {{ByteBuffer.order()}} calls and the 
try-finally-clauses)
* no more duplicate serialization effort in {{serializedSize()}} + 
{{serialize()}} - uses a thread-local serialization buffer. Can be further 
improved with CASSANDRA-9929.
* added/moved string serialization methods to {{ByteBufferUtil}}, which has 
distinct code paths for 7-bit ASCII and strings with wider chars


 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702991#comment-14702991
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Pushed changes to existing branches (plus a new branch for merge to trunk). 
Included is a new {{KeyCacheCqlTest}} to test 2i and large partitions (verified 
via key cache metrics) and addressing the review comments (in a separate 
commit).

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-19 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703452#comment-14703452
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

In OHCKeyCache when calculating the length of strings you are calculating the 
length using String.length() which returns a wrong answer if you encode using 
UTF-8. 

We went to some length to come up with an optimal no garbage string writing 
function for BufferedDataOutputStream (you can see it as a static method in 
UnbufferedDataOutputStream). It would be great if we could do the same thing 
here and not allocate byte arrays for the encoded names. Would it be crazy for 
it take a lambda that describes how to write the generated byte array to some 
output? Then we could use the same code everywhere. [~benedict] what do you 
think?

Since you are using a short length prefix what is your level of confidence it 
will always be enough? How does it fail if it isn't?

The 2i path testing is much appreciated. The utests didn't seem to make it in 
cassci.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703613#comment-14703613
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Ouch, yes, names can contain UTF8 chars.
[Pushed a 
commit|https://github.com/snazy/cassandra/commit/2e2d572ea0c30c2ef1bba49df9a6667f7b51fc4a#diff-df8196b2b182d7e311c455c5d6115f80]
 as _a demo_ of how the lambda approach could look like. writeUTF could be nice 
- but readUTF would require two method refs (one for reading the len and one 
for reading bytes).
read/writeUTF in java.io also use unsigned short for serialization.

Hm - Mrs. cassci seems to be annoyed... (Slave went offline during the build)

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702731#comment-14702731
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Thanks :)

bq. How are the 2i paths tested?

Currently implicitly via other unit tests. But I'll add some unit test for that.

bq. The null case in makeVal isn't tested, maybe not that interesting

Oh. Although it doesn't cause any problems, passing null to OHC is not allowed. 
I've added an assertion.

bq. SerializationHeader forKeyCache

Yes. It's intentionally racy to prevent blocking operations. The chance for 
such a race is probably low and heals itself. Added a comment for that.

bq. comment about singleton weigher

removed

bq. NIODataInputStream has a derived class DataInputBuffer

changed to use DIB

bq. string encoding and decoding helpers

yup - makes sense. refactored.

bq. An enhancement we can file for later is to replace those strings with vints 
that reference a map of possible table names.

My idea is to remove strings at all from the key cache. Keyspace + CF names can 
be handled by CASSANDRA-10028. Not sure how to handle file/path names - maybe 
using some sparse list structure for sstable generations (in another ticket).

Haven't pushed anything yet - but will update my branch soon.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700387#comment-14700387
 ] 

Ariel Weisberg commented on CASSANDRA-9738:
---

Good stuff. Coverage of the OHC key cache provider looks good.

* How are the 2i paths tested?
* The null case in makeVal isn't tested, maybe not that interesting
* SerializationHeader forKeyCache is racy and can result in an undersize array 
clobbering a properly sized one. But... it doesn't retrieve the value it sets 
so odds are eventually it will work out to be the longer one. It works it's 
just intentionally racy.
* In CacheService does that comment about singleton weigher even make sense 
anymore?
* NIODataInputStream has a derived class DataInputBuffer that exposes the 
constructor you made public.
* The string encoding and decoding helpers you wrote seem like they should be 
factored out somewhere else, maybe ByteBufferUtil? Also you don't specify a 
string encoding and there may be some issues with serialized size of non-latin 
characters lurking as well.
* An enhancement we can file for later is to replace those strings with vints 
that reference a map of possible table names. For persistence definitely fully 
qualify, but in memory we can store more entries that way.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0 beta 2


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-14 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697543#comment-14697543
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

Code coverage looks not bad so far. Important code is covered (by nature) from 
nearly every unit test.
getter/setter on CacheService for all caches are uncovered.
Key cache's hotKeyIterator() is also uncovered.
Will add unit tests for them in the respective test classes.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-12 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693369#comment-14693369
 ] 

Aleksey Yeschenko commented on CASSANDRA-9738:
--

[~aweisberg] Could you review?

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659996#comment-14659996
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

What does code coverage show?  Because I know that would be Ariel's first 
question. :)

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-06 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659619#comment-14659619
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

I’d like to propose this patch to be included in 3.0. I hope the cstar tests 
are sufficient but otherwise I can deliver more with different workloads.

h2. cstar tests

All cstar tests mentioned below perform three operations: write-only, mixed and 
read-only.
Unfortunately, cassandra-stress seems to reduce the really possible write 
throughput for workloads with clustering keys.

All tests on this patch show reduced GC pressure (for reads, of course).
By that it gives G1 more headroom to operate and and often gains about 10-15% 
read perf improvement depending on the hardware (in this case bdplab vs. 
blade_11_b) - bdplab (spinning disks, less RAM) shows a bigger improvement.

h3. one big clustering key

user, native, cql3, user 
[profile|https://gist.github.com/snazy/b6c160c65001eb074784]

[blade_11_b|http://cstar.datastax.com/tests/id/7f7265a2-3aee-11e5-b022-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/af344e3e-3af0-11e5-b379-42010af0688f]

h3. big clustering over two clustering columns

user, native, cql3, user 
[profile|https://gist.github.com/snazy/351156424929d868baf3]

[blade_11_b|http://cstar.datastax.com/tests/id/e919725a-3b68-11e5-b590-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]

h3. big clustering over two clustering columns, reduced threads for pure-write 
and mixed operations

user, native, cql3, user 
[profile|https://gist.github.com/snazy/e4579499f61911802fcd]

[blade_11_b|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/07754e44-3b8d-11e5-9c9e-42010af0688f]

h3. stress _write_, _mixed_, _read_

[blade_11_b|http://cstar.datastax.com/tests/id/def04c20-3b8d-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/f3f5c172-3b8d-11e5-9c9e-42010af0688f]

h2. Git branch + cassci

[git branch|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc]
[unit 
tests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-testall/]
[dtests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-dtest/]

I didn’t see any failed tests related to this patch.

There is another branch on github as well which contains [optimizations not 
purely related to 
key-cache|https://github.com/snazy/cassandra/tree/9738-key-cache-ref]. 
{{9738-key-cache-ohc}} is based on that branch and contains:
* “singletons” for key-cache {{o.a.c.db.SerializationHeader}} instances 
(dynamically extended, if required)
* “singletons” for {{IndexInfo.Serializer}} in {{o.a.c.db.Serializers}} 
(dynamically extended, if required)
* “singletons” for {{BigVersion}} instances for {{ma}}, {{la}}, {{ka}}, {{jb}} 
- other versions get temporary objects (some tests use older sstable versions)

h2. Further optimisations

There are some things that can be optimised in the future:
* Currently we need to serialise keyspace and cf names _and_ cfId. This is 
necessary since cfID of secondary indexes is inherited from the base table. If 
all tables and all secondary indexes have unique IDs, we can omit KS and CF 
name serialisation (and it’s weird {{cfName.contains(‘.’)}} 2i detection). Can 
be built with or after 2i API redesign.
* The full directory path is serialised. This appears to be less expensive than 
iterating of the whole {{List}} of sstables and identifying an sstable by its 
generation.
* As [~benedict] suggested, we can switch to very tiny key-cache entries and 
also omit serialisation of {{IndexInfo}}.


 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-05 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655012#comment-14655012
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

[~kohlisankalp] ok :)

Here are two more cstar results using [this 
profile|https://gist.github.com/snazy/b6c160c65001eb074784].
[blade_11_b|http://cstar.datastax.com/graph?stats=7f7265a2-3aee-11e5-b022-42010af0688fmetric=op_rateoperation=3_usersmoothing=1show_aggregates=truexmin=0xmax=190.96ymin=0ymax=113547.5]
 shows the same OHC having the same throughput as 2.2 but with less GC pauses ; 
both are faster than 3.0 alone.
[bdplab|http://cstar.datastax.com/graph?stats=af344e3e-3af0-11e5-b379-42010af0688fmetric=op_rateoperation=3_usersmoothing=1show_aggregates=truexmin=0xmax=254.65ymin=0ymax=97972.6]
 shows OHC having better throughput than 2.2 and 3.0 ; also less GC effort.
Both environments show this behaviour in every test - guess, it's ok to blame 
it spinning disks and less RAM.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-04 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654610#comment-14654610
 ] 

sankalp kohli commented on CASSANDRA-9738:
--

Yes I am +1 on the change. We should test this with different partition sizes. 

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-04 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654195#comment-14654195
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

FTR: I've added a chunked implementation to OHC but it is not included in this 
ticket. A thorough review of that chunked implementation would be required. The 
chunked impl has the advantage to reduce malloc/free-rate to nearly 0 at the 
cost that the key-cache cannot be resized dynamically (at least yet).

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-04 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654333#comment-14654333
 ] 

sankalp kohli commented on CASSANDRA-9738:
--

[~snazy] In your test, how big are the CQL partitions and how many IndexInfo 
objects are being generated? That will determine the improvement we will see. 

Also since we need to deserialize these objects on key cache hit, I am not sure 
how it will affect large CQL partitions. 

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-08-04 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654374#comment-14654374
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

[~kohlisankalp] all previous tests were plain old simple stress profiles - so 
no big partitions. The 
[current|https://gist.github.com/snazy/b6c160c65001eb074784] and next cstar 
tests have different stress profiles with different partition sizes. I 
appreciate suggestions for real-world stress profiles.
But even if this gives no throughput benefit, reduced GC pressure (or not 
letting objects escape from new-gen) is worth the change.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0.0 rc1


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-29 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645767#comment-14645767
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

[This cstar test] is probably interesting. It compares 2.2, trunk, trunk w/ 
ParNewCMS, 9738, 9738 w/ ParNewCMS. The reason for the additional ParNewCMS 
configs is that it was the only reasonable change that can explain the weird 
throughput graphs (and huge GC pauses) for trunk.
After some offline discussion yesterday with [~benedict] and [~tjake], 
comparing G1 w/ ParNewCMS we concluded that G1 is the cause of the weird 
graphs + long GC pauses for trunk. G1 requires more head room to perform 
better. And 9738 gives it more head room (by moving stuff from heap to 
off-heap).

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645807#comment-14645807
 ] 

Benedict commented on CASSANDRA-9738:
-

Looks like improved occupancy is also a factor (comparing run 2 to run 3). I'd 
prefer to eliminate the malloc/free from cache maintenance before we hook OHC 
into the key cache, but these numbers are pretty compelling and we should 
probably consider it for 3.0.

I suspect CASSANDRA-8930 will permit us to deliver a more efficient and high 
occupancy key-cache, but that shouldn't prevent us doing something in the 
meantime.

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646112#comment-14646112
 ] 

Benedict commented on CASSANDRA-9738:
-

Yes. Thanks

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646091#comment-14646091
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

Did you mean to link a different issue?

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620551#comment-14620551
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

My next question was going to be why GC activity went *up* but I see you edited 
the table. :)

Very promising!

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-09 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619987#comment-14619987
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

bq. why would the cache improvements improve write speed?

TBH - I was surprised by that, too. I guess, it's key cache invalidation during 
compaction ({{SSTablRewriter.InvalidateKeys}}).

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619099#comment-14619099
 ] 

Jonathan Ellis commented on CASSANDRA-9738:
---

That definitely looks interesting.  But why would the cache improvements 
improve write speed?

 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2015-07-08 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618972#comment-14618972
 ] 

Robert Stupp commented on CASSANDRA-9738:
-

I did some [coding|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc] 
just to see what this might buy us.
[cstar 
bench|http://cstar.datastax.com/graph?stats=e0c7514c-258f-11e5-8c01-42010af0688fmetric=op_rateoperation=3_readsmoothing=1show_aggregates=truexmin=0xmax=138.16ymin=0ymax=185288.4]
 gives these nice numbers:

| || reads/sec || writes/sec || total gc time, 2nd read
|| trunk | 120k | 122k | 2sec (111 gcs, 18ms avg)
|| with CASSANDRA-9718 | 128k | 131k | 24sec (106 gcs, 222ms avg)
|| with key-cache off-heap | 163k | 136k | 23sec (104 gcs, 222ms avg)


 Migrate key-cache to be fully off-heap
 --

 Key: CASSANDRA-9738
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Robert Stupp
 Fix For: 3.x


 Key cache still uses a concurrent map on-heap. This could go to off-heap and 
 feels doable now after CASSANDRA-8099.
 Evaluation should be done in advance based on a POC to prove that pure 
 off-heap counter cache buys a performance and/or gc-pressure improvement.
 In theory, elimination of on-heap management of the map should buy us some 
 benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)