[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203267#comment-15203267 ]

Robert Stupp commented on CASSANDRA-9738:

With CASSANDRA-11206 we eliminate a big pain point (IndexInfo GC pressure and serialization). We _could_ move the key-cache off-heap after 11206. But it's also possible that other optimizations (CASSANDRA-8931, CASSANDRA-9754 and CASSANDRA-9786) may result in a completely different structure. That said, I'd prefer to see at least 8931 and 9786 done before tackling this one, and to see what their outcome is.

> Migrate key-cache to be fully off-heap
> --------------------------------------
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Robert Stupp
> Assignee: Robert Stupp
> Fix For: 3.x
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some
> benefit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140764#comment-15140764 ]

Robert Stupp commented on CASSANDRA-9738:

Yes, it is. I just want to get the current UDF/UDA stuff done first.

Generally, I'd like to see what's coming with CASSANDRA-9754. The current "problem" is the number of off-heap serializations for big partitions, as we do a binary search for the lookup. If that gets better with CASSANDRA-9754, that would be great. We can optimize the whole path with CASSANDRA-8931 and CASSANDRA-9786. If these give promising results, a follow-up _might_ be to merge the summary into index-info.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139134#comment-15139134 ]

Jonathan Ellis commented on CASSANDRA-9738:

Is this still on your plate, Robert?
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743048#comment-14743048 ]

Sylvain Lebresne commented on CASSANDRA-9738:

bq. If the code is still being majorly revised so close to RC, we should consider pushing this back

I agree.

> Migrate key-cache to be fully off-heap
> --------------------------------------
>
> Key: CASSANDRA-9738
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Robert Stupp
> Assignee: Robert Stupp
> Fix For: 3.0.0 rc1
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some
> benefit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743058#comment-14743058 ]

Robert Stupp commented on CASSANDRA-9738:

bq. consider pushing this back

Also +1.

The major open tasks are to eliminate (1) the need to copy the RIE on cache-get, which requires bullet-proof ref-counting, and (2) the double buffering during RIE building, which is sometimes just a waste of CPU/heap as the returned RIE object is never used.
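Editorial sketch of the ref-counting idea mentioned above, under stated assumptions: a cache hit hands out a view of the serialized entry instead of a copy, and the memory is released only when the last holder closes it. `RefCountedEntry` and all of its methods are hypothetical names for illustration; this is not the API of Cassandra or OHC.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical ref-counted wrapper around a serialized entry (illustration only). */
final class RefCountedEntry implements AutoCloseable {
    private final ByteBuffer data;                            // stands in for the off-heap bytes
    private final AtomicInteger refs = new AtomicInteger(1);  // 1 = the cache's own reference
    private volatile boolean freed = false;

    RefCountedEntry(ByteBuffer data) { this.data = data; }

    /** Take a reference for a reader; fails once the entry has been released. */
    boolean tryRetain() {
        while (true) {
            int n = refs.get();
            if (n == 0) return false;                         // too late, memory is gone
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Read-only view of the bytes; no copy onto the heap is made. */
    ByteBuffer view() {
        if (refs.get() == 0) throw new IllegalStateException("entry already freed");
        return data.asReadOnlyBuffer();
    }

    /** Drop one reference; the last close is where real code would free the memory. */
    @Override
    public void close() {
        while (true) {
            int n = refs.get();
            if (n == 0) throw new IllegalStateException("released more often than retained");
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) freed = true;
                return;
            }
        }
    }

    boolean isFreed() { return freed; }
}
```

The "bullet-proof" part is the hard one: every successful `tryRetain()` must be paired with exactly one `close()`, for example through try-with-resources, otherwise the entry either leaks or is freed while still in use.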
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743321#comment-14743321 ]

Robert Stupp commented on CASSANDRA-9738:

One thing to consider for RC1 IMO is to change the index file format. Having that for RC1 gives us the flexibility to add 9738 for 3.0.0 or 3.0.x - so I've created CASSANDRA-10314.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742670#comment-14742670 ]

Benedict commented on CASSANDRA-9738:

bq. In a lot of cases the RIE is going to be enclosed by a Closable iterator anyways.

Given our historically poor handling of resource release, and how many nooks and crannies they end up in now, I'd prefer we didn't depend on the closure of iterators until we have some better facilities in place for catching leaks, like CASSANDRA-9918. Right now I don't think we're likely to leak much on a per-request/leak basis (just prevent some shared cleanup, per discarded major resource, such as an sstable) if we fail to close an iterator. So this could magnify the effect of any such bug.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742700#comment-14742700 ]

Ariel Weisberg commented on CASSANDRA-9738:

OK, well I don't have an answer for 3.0 on how to completely eliminate the risk that the copying overhead will be an issue. If it keeps changing there is no way I will get to the point where I can +1 the performance dimension. I might be able to get to +1 on correctness if things stop changing long enough for me to check it out and walk through everything.

Performance could very well be good enough/better for realistically sized indexed entries. If we benchmark the final code and it's better then I guess I could be +1 on performance and ignore the overhead of allocating and copying for each cache access.

In hindsight wholesale replacing the existing key cache was not a good incremental way to effect this change in a low risk way. Introducing a new key cache and iterating on it would have allowed for faster development, more frequent merging to trunk, smaller reviews etc. We would also always have something to ship once release time rolls around even if some change was pending.

* [IndexEntryStats, I don't get why you are casting down to int, why not checked cast on the way out?|https://github.com/apache/cassandra/commit/5d53c29ceb915b1b95bca0fff15b1591d508e8c8#diff-75146ba408a51071a0b19ffdfbb2bb3cR1434]
* [I don't understand overflow protection here. It's a long is it really going to overflow?|https://github.com/apache/cassandra/commit/5d53c29ceb915b1b95bca0fff15b1591d508e8c8#diff-75146ba408a51071a0b19ffdfbb2bb3cR1434]
* Average not taking into account the distribution is suspect to me. Should the allocation be slightly larger than average to prevent reallocation?
* When allocating the offset buffer it's using the count of index infos without multiplying by their size?
* We are writing sampled records. I am skeptical this is worth optimizing when we have to get this done and correct. There are better options in subsequent versions like appending to a file.
* Did this change have an impact and improve some metric?
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742716#comment-14742716 ]

Benedict commented on CASSANDRA-9738:

bq. In hindsight wholesale replacing the existing key cache was not a good incremental way to effect this change in a low risk way

So, looking at the latest performance comparisons, it doesn't look like the benefit is so uniformly profound as to be worth rushing this out. It looks like whatever was particularly affecting 3.0 in the early benchmarks (that notably wasn't affecting prior versions) has been ironed out.

If the code is still being majorly revised so close to RC, we should consider pushing this back, and if incorporated before GA at least consider making it optional. It's turning into a patch of really significant size, and we already have a great deal of risk associated with this release.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742095#comment-14742095 ]

Robert Stupp commented on CASSANDRA-9738:

Pushed another update with the following changes:

* Legacy IndexEntry objects / missing offsets: re-serializing seemed too expensive to me, but not having the offsets in the key-cache is also bad. For 3.0+ IndexEntry structures, we have the offsets - so that's not a problem. For legacy IndexEntry structures, it generates _all_ offsets. The change is that the key-cache will include the calculated offsets even for legacy tables. This eliminates the need to re-serialize the whole thing. (legacyIndexInfoSearch() is gone.)
* Moved the offsets-"array" back to the end of the serialized IndexEntry structure.
* Sizing the serialization buffers: I tried to find easily and cheaply accessible stats for partition size and (more important) serialized IndexEntry size for a CF but didn't find anything. So I added (poor man's) stats to CFMetaData that track the numbers to size the serialization buffers appropriately. These stats are not persisted, just calculated at runtime - so it's not a "perfect" thing. The problem with the existing stats is that we know the mean partition size but not the mean size of the ClusteringPrefixes, which could be used to initialize the poor-man's stats during startup. (Note: the current TableMetrics.meanPartitionSize iterates over all sstables to retrieve the value.)
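Editorial sketch of the "poor man's stats" idea from the message above: track the serialized sizes seen at runtime and derive an initial buffer size from their mean plus some headroom. The class `SerializedSizeStats`, its method names, and the headroom factor are assumptions made for illustration; they are not the code that was added to CFMetaData.

```java
/** Hypothetical runtime-only statistics used to pre-size serialization buffers. */
final class SerializedSizeStats {
    private long count;       // number of entries recorded so far
    private long totalBytes;  // sum of their serialized sizes, kept as a long

    synchronized void record(int serializedSize) {
        count++;
        totalBytes += serializedSize;
    }

    /**
     * Mean size scaled by a headroom factor, so that entries slightly above
     * average do not force a reallocation. Falls back to defaultSize until
     * at least one entry has been recorded.
     */
    synchronized int suggestedBufferSize(int defaultSize, double headroom) {
        if (count == 0) return defaultSize;
        double mean = (double) totalBytes / count;
        // Checked narrowing on the way out: throws instead of silently truncating.
        return Math.toIntExact((long) Math.ceil(mean * headroom));
    }
}
```

Keeping the accumulators as `long` and narrowing once at the end with `Math.toIntExact` is one way to do the narrowing late. A plain mean still ignores the distribution, which is why the headroom factor is a parameter here rather than a fixed choice.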
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741107#comment-14741107 ]

Ariel Weisberg commented on CASSANDRA-9738:

When deserializing an old format IndexedEntry I think you have to rewrite it to generate the offsets. Otherwise the cache will never be populated with an entry where the offsets are calculated. That will make it slower than the older version. I also think this will be faster than generating it incrementally since it's a nice tight loop doing a scan of memory. If you are doing a binary search you will end up doing most of that work anyways and if it's a scan you will also end up doing most of it.

RowIndexEntry.java line 423, legacyIndexInfoSearch. It's still doing a loop from the beginning of the offsets to the last offset it calculated. There is no need for that: we should know the last calculated offset and skip to it. For a scan this operation becomes n^2 with that loop. I think it should go away completely. Just rewrite the IndexedEntry during deserialization since you are making a copy anyways when you bring it out of the field.

Jonathan told me the expectation is that people run upgrade sstables so we don't need to be heroic. Let's go for the simplest possible solution, which is making the old and new formats match after deserialization. Hopefully this means we can remove a bunch of paths based on which format we are looking at.

For cache hits we have to copy the entire IndexedEntry onto the heap into unpooled memory. That is making an operation that was lg N a linear operation to the size of the IndexedEntry. In terms of raw speed the on heap cache is going to be better off using the new serialization, but it will really poke the garbage collector in the eye. At least with the OHC cache the garbage is short lived.

I don't like to give people options they have to choose from, but I am more afraid of making the product unworkable for some use case. Maybe we should allow the key cache to be selectable for 3.0? Alternatively could you make RowIndexEntry closable and go with ref counting? I feel like these are the two options that get us to 3.0 while minimizing regret post release.
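Editorial sketch of the "generate all offsets once, during deserialization" approach argued for above, as opposed to rediscovering offsets on every lookup. The byte layout (a 4-byte length prefix before each entry payload) and the class name `LegacyOffsets` are assumptions for illustration, not the legacy Cassandra index format.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: one linear pass that records where each entry starts. */
final class LegacyOffsets {
    /**
     * Assumed layout: [int payloadLength][payload bytes] repeated entryCount times,
     * starting at the buffer's current position.
     */
    static int[] computeOffsets(ByteBuffer serialized, int entryCount) {
        int[] offsets = new int[entryCount];
        int pos = serialized.position();
        for (int i = 0; i < entryCount; i++) {
            offsets[i] = pos;
            int payloadLength = serialized.getInt(pos);  // absolute read, position is untouched
            pos += Integer.BYTES + payloadLength;        // jump straight to the next entry
        }
        return offsets;
    }
}
```

With the offsets stored next to the entry, each later access is a single array lookup. Restarting the walk from the first entry on every access is what makes a scan quadratic.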
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741308#comment-14741308 ]

Jonathan Ellis commented on CASSANDRA-9738:

bq. For cache hits we have to copy the entire IndexedEntry onto the heap into unpooled memory. That is making an operation that was lg N a linear operation to the size of the IndexedEntry.

Is that the entire partition's worth of index then that we're copying on heap?

Can we use the partition size statistics to automatically pick the appropriate implementation?
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740441#comment-14740441 ]

Sylvain Lebresne commented on CASSANDRA-9738:

To put your mind at ease, I'm not particularly attached to any of the changes from CASSANDRA-10232. The goal of that ticket was to make a very simple change that, given how we used RIE at the time, provided hopefully clear gains (not only for performance btw, getting smaller files on disk is a nice thing by itself). But if the uses of RIE change and other trade-offs make more sense, so be it. Of course, the more time we have to validate those choices with benchmarks, the better.

bq. I am questioning this on the grounds that it requires walking the entire partition index and doing work even to random access?

If you mean that the cost of iterating over the index to recompute the offsets could outweigh the gain of reading less data, then you could be right, but my hunch is that this probably doesn't make much measurable difference in practice. So the goal was more to save some space on disk than anything else (it's not hugely important, but it's a nice to have).

But anyway, I'll remark that even if we want to avoid that walk, keeping both the width and offset of each entry is still redundant. The other alternative is to keep the offset but ditch the width, as the width can be recomputed from the current index offset and the next one (so without a full walk). The reason I didn't go with that alternative originally is that I suspect the code to read old entries (the backward compatibility code) will be more awkward.

Anyway, if we keep vint encoding, I guess keeping both the width and offset is not such a big deal and I'm fine with that if that's simpler, but it's definitely possible to keep only one.
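Editorial sketch of the "keep the offset, drop the width" alternative described above: the width of entry i is the distance to the next entry's offset, so it can be recovered without walking the whole index. This assumes the indexed blocks are contiguous and that the total size of the indexed range is known for the last entry; `WidthFromOffsets` and its parameters are hypothetical names.

```java
/** Hypothetical helper: recover an entry's width from two neighbouring offsets. */
final class WidthFromOffsets {
    /**
     * @param offsets   start offset of each indexed block, in ascending order
     * @param totalSize end of the last block (assumed known, e.g. the partition's data size)
     * @param i         index of the block whose width is wanted
     */
    static long width(long[] offsets, long totalSize, int i) {
        long end = (i + 1 < offsets.length) ? offsets[i + 1] : totalSize;
        return end - offsets[i];
    }
}
```

The converse direction, keeping widths and dropping offsets, needs a prefix sum over all earlier entries, which is the full walk questioned in the quoted remark.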
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741403#comment-14741403 ]

Ariel Weisberg commented on CASSANDRA-9738:

bq. Is that the entire partition's worth of index then that we're copying on heap?

Yes

bq. Can we use the partition size statistics to automatically pick the appropriate implementation?

Possibly. If we have two cache implementations at once how do you configure how big each one is? It seems tricky to do in one week.

I think we would have better luck trying to get reference counting working so there doesn't need to be a copy. In a lot of cases the RIE is going to be enclosed by a Closable iterator anyways.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741467#comment-14741467 ]

Michael Kjellman commented on CASSANDRA-9738:

[~aweisberg] I'm having trouble keeping up with the number of diffs created from editing your comments on this thread. Any chance you could create new comments instead of editing the previous ones? Thanks.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739297#comment-14739297 ]

Robert Stupp commented on CASSANDRA-9738:

I've got a couple of cstar perf tests done. Here's a short summary:

* OHC-KC is at least as fast as vanilla 3.0, if not faster.
* 3.0 generally beats 2.2 - especially with "big" partitions. Most of the remaining cases are mitigated with OHC-KC.
* 2.2 shows better numbers in a few cases - especially reading small rows. But that's not an OHC-KC issue - it's also true for vanilla 3.0.
* OHC-KC generally shows equal if not lower GC pressure.

Now going to write a "wall of text" describing OHC-KC changes/architecture and a more sophisticated performance test interpretation.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739436#comment-14739436 ]

Ariel Weisberg commented on CASSANDRA-9738:

[~slebresne] For the OHC key cache we are looking at rolling back the core changes for #10232 because they get in the way of accessing the data off heap when we want to random access fields. Also the RIE is sampling, so the value of slightly shrinking the entries seems low to me. There is a time/space tradeoff to make since we have to binary search the indexed RIE.

Some of the other optimizations, like dropping fields and recalculating them, don't work because we don't want to process the index to read it, just copy the bytes. I think this also prevented the optimization where you rebased the offset at 64k, because that also requires a pass through the data to parse.

I asked snazy to benchmark with large partitions with and without vints, but I think it's splitting hairs either way. It's a bad situation in general because when we cache/uncache these things we copy the entire thing, which is yucky. It will be faster this way because at least it is a straight memory copy without an object graph or parsing.

My pony for 3.0 would be to 64-bit map the entire index file, then get 32-bit ByteBuffer slices out of it and not use the key cache at all, but there is a time constraint. I think just mapping it naively might get us 80 in the 80/20 of CASSANDRA-9754. Benedict also mentioned CASSANDRA-8931 which sounds like good next release fodder for making the index smaller.
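Editorial sketch of the "map the index file and hand out ByteBuffer slices" wish from the message above, using only standard `java.nio` calls. A single `FileChannel.map` call is limited to `Integer.MAX_VALUE` bytes, so this sketch maps one region at a given 64-bit file offset. `IndexFileSlices` and `mapSlice` are hypothetical names, and nothing here reflects how Cassandra actually reads its index files.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: read-only memory-mapped view of one region of an index file. */
final class IndexFileSlices {
    /**
     * @param indexFile file to map (assumed to exist and to be at least offset + length bytes)
     * @param offset    64-bit position of the region inside the file
     * @param length    size of the region; a single mapping cannot exceed Integer.MAX_VALUE
     */
    static ByteBuffer mapSlice(Path indexFile, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        }
    }
}
```

A reader would then address entries inside the returned buffer with 32-bit positions, leaving it to the operating system's page cache to decide which parts of the index stay in memory.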
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739576#comment-14739576 ]

Ariel Weisberg commented on CASSANDRA-9738:

Thanks to Benedict for providing some clarity on this.

So there are two general access paths to an Indexed RIE: a scan, and a binary search to support random access. For the scan it is fine to materialize an entire IndexInfo. For the binary search case we don't want to materialize an IndexInfo object as this would hurt performance compared to the current POJO implementation.

The current code has cases where it accesses fields from the IndexInfo by index. I would like to get away from that and just return the POJO and access fields from the POJO. As far as we know there is no degenerate case where it's pulling all the fields from different indexes interleaved.

We are getting a big win from this compared to the POJO implementation simply by reducing the cost of loading/unloading an IndexedEntry to a memory copy, as well as reducing the cost of building an IndexedEntry by serializing it up front instead of building a list of POJOs and then coming back and serializing it.

We should be able to preserve the use of vints. We should optimize the layout of an IndexInfo by having the clustering prefix field as the first field so that binary search doesn't have to do extra decoding. During a scan the cost of materializing and extra decoding (which we can avoid later if we want) is small compared to the total operation cost for each entry materialized.

Another optimization in addition to vints (and Benedict we didn't talk about this in the hangout) was dropping the offset field from IndexInfo. [This was then recalculated in AbstractSSTableIterator.IndexState|https://github.com/apache/cassandra/compare/trunk...pcmanus:10232#diff-fb1874f891c1a014fb57f8b4e42b5247R431]. I don't see a conflict between 9738 and this choice, but now I am questioning this on the grounds that it requires walking the entire partition index and doing work even to random access? I didn't pick up on that in my review of 10232. We have also agreed that because the IndexedEntries are sampling (and not per row or per partition) they are not as size constrained, so keeping the field seems like the right choice.

The last optimization from 10232 I want to consider is using the 64k WIDTH_BASE to reduce the size of the offset field. I don't see why we can't preserve that.

We also want to keep the [reduction in serializer allocations|https://github.com/apache/cassandra/commit/4dfbba680620fef985cb2b3f00456ee8155404e0]. I checked and it looks like that has been preserved.
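Editorial sketch of the binary-search access path discussed above: search directly over the serialized bytes through an offsets array, reading only the leading key field of each probed entry, so that no per-entry object is materialized. The layout (fixed 8-byte key as the first field, an int offsets array at the end of the buffer) and the class name `SerializedIndexSearch` are assumptions for illustration; real clustering prefixes are variable-length and need a comparator.

```java
import java.nio.ByteBuffer;

/** Hypothetical binary search over serialized entries, without materializing them. */
final class SerializedIndexSearch {
    /**
     * Assumed layout: [entry 0]...[entry n-1][int offset 0]...[int offset n-1],
     * where every entry begins with an 8-byte key and entries are sorted by key.
     * Returns the index of the matching entry, or -(insertionPoint + 1) if absent.
     */
    static int search(ByteBuffer buf, int count, long key) {
        int offsetsStart = buf.limit() - count * Integer.BYTES;
        int lo = 0;
        int hi = count - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int entryOffset = buf.getInt(offsetsStart + mid * Integer.BYTES);
            long midKey = buf.getLong(entryOffset);  // only the first field is decoded
            if (midKey < key) lo = mid + 1;
            else if (midKey > key) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }
}
```

Putting the comparison key first means each probe touches one offset and one key, which is the motivation given above for making the clustering prefix the first field of an IndexInfo.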
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736549#comment-14736549 ] Robert Stupp commented on CASSANDRA-9738: - Some of the missing test-coverage is blocked by CASSANDRA-10237. Will work on the rest of the issues. Also scheduled a bunch of perf tests on cstar with a few big partitions (100k and 500k rows, gaussian distribution). Will post a summary of these tests later.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735591#comment-14735591 ] Ariel Weisberg commented on CASSANDRA-9738: ---
Just going off of coverage from "ant test" not including your work today. I have had some bad luck recently with emma claiming stuff is not covered. Hopefully it's accurate in these instances.
* Serializers.LegacyClusteringPrefixSerializer.deserialize isn't tested. Is that blocked by CASSANDRA-10237?
* IndexInfo.skip isn't labeled as covered and neither is serializedSize
* RowIndexEntry.equals is not covered, neither is hashCode, same for RIE.IndexedEntry
* RowIndexEntry.Serializer.serialize doesn't test manually serializing the old version
* RowIndexEntry.IndexedEntry constructor doesn't test loading the old version
* RowIndexEntry.IndexedEntry.indexInfo doesn't test the old version, also the branch at the end where it stores the offset it happens to know about after finishing
* RowIndexEntry.IndexedEntry.indexInfo does it have to do that loop for already discovered offsets? Can you store that value and not loop?
* RIE.IndexedEntry.indexCount() could store the offset as a constant, also firstIndexInfoOffset
* RIE.IndexedEntry.promotedSize for the old version not covered
* Slice.Serializer.skip() is not tested/covered, neither is serialized size or part of deserialize, but you didn't change the latter two
* [RowIndexEntryTest has commented out code|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-7972c5d16db58cd0a0e63e95c4de264aR203]
Reviewing test contents now. Also going to update and look at your latest.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735326#comment-14735326 ] Ariel Weisberg commented on CASSANDRA-9738: ---
* [This line could just be instanceof?|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R164]
* [Same here?|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R544]
* I think we may want to consider not closing streams that wrap fixed buffers. The close operation does nothing, it throws an exception, and all that is going to do is generate useless instructions and prevent optimization. Maybe as a separate ticket we should go back and look for things that do that.
* DataInputBuffer doesn't have an efficient skip method. The skip method implementation in super classes expects that it is wrapping a channel or stream and skipping requires reading. For a fixed ByteBuffer there is no need to allocate or copy memory. MemoryInputStream has the same issue in that it could be efficient but isn't. Maybe they should have a common base class since they both wrap memory-like things and can have memory (as opposed to stream or channel) specific implementations.
* [Dead code|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R389]
* [I don't think you need to incorporate the buffer in the hash code. It's wasted cycles, position is sufficient? Not sure if we are using it.|https://github.com/apache/cassandra/compare/cassandra-3.0...snazy:9738-key-cache-ohc#diff-9b961593e363ad6a553e5af1ff1663b1R552]
* What's the difference between the ClusteringPrefix serializer in Serializers and ClusteringPrefix? Did you have to add the legacy support?
That's my reaction to the code. Working on tests and coverage now.
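A minimal sketch of the skip point raised in the review above, assuming a plain {{java.nio.ByteBuffer}} as the backing store. {{BufferSkipSketch}} and {{skipBytes}} are hypothetical names for illustration and are not the actual DataInputBuffer or MemoryInputStream code. The idea is only that skipping over a fixed buffer can be a position update, with no read, allocation, or copy.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for illustration; not Cassandra's DataInputBuffer. */
final class BufferSkipSketch {
    /**
     * Skips up to n bytes by moving the buffer position forward.
     * Nothing is read or copied. Returns the number of bytes actually skipped.
     */
    static int skipBytes(ByteBuffer buf, int n) {
        if (n <= 0) {
            return 0;
        }
        int skipped = Math.min(n, buf.remaining());
        buf.position(buf.position() + skipped);
        return skipped;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5});
        int skipped = skipBytes(buf, 3);
        // The next unread byte is the fourth one in the array.
        System.out.println("skipped=" + skipped + " next=" + buf.get());
    }
}
```

A shared base class for memory-backed inputs, as suggested in the comment, could expose this kind of skip once instead of inheriting the read-and-discard behaviour of stream-oriented superclasses.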
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735342#comment-14735342 ] Benedict commented on CASSANDRA-9738: - It can actually be faster to compare the exact class instance, as this is pretty well optimised by hotspot (and should just involve comparing the object headers). It is unlikely to make _much_ of a difference here, especially with the shallow class hierarchy here, just noting it. In this case it looks like for correctness in the {{RowIndexEntry}} class it needs to be an exact match anyway though. The IndexedEntry doesn't need to duplicate those checks though, since they're performed by the call to RIE.equals
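To illustrate the equals point above, here is a small sketch using two hypothetical classes, {{EntrySketch}} and {{IndexedEntrySketch}}, standing in for RowIndexEntry and its IndexedEntry subclass. The fields are invented for the example. The base class does the exact-class check, so a base instance never equals a subclass instance, and the subclass relies on {{super.equals}} rather than repeating the checks.

```java
/** Hypothetical stand-in for a base entry type; fields are invented for illustration. */
class EntrySketch {
    final long position;

    EntrySketch(long position) {
        this.position = position;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        // Exact class match: an instanceof test would let a base entry
        // compare equal to a subclass entry and break symmetry.
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        return position == ((EntrySketch) o).position;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(position);
    }
}

/** Hypothetical subclass; it does not repeat the identity, null, or class checks. */
final class IndexedEntrySketch extends EntrySketch {
    final int indexCount;

    IndexedEntrySketch(long position, int indexCount) {
        super(position);
        this.indexCount = indexCount;
    }

    @Override
    public boolean equals(Object o) {
        // super.equals has already verified that o has exactly this class,
        // so the cast below is safe whenever it returns true.
        return super.equals(o) && indexCount == ((IndexedEntrySketch) o).indexCount;
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + indexCount;
    }
}
```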
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731817#comment-14731817 ] Robert Stupp commented on CASSANDRA-9738: - Had to rebuild the branch as rebase w/ resolving conflicts would have been like opening a can of worms. Unit tests look good and cstar shows the known numbers. The old state is still there for reference [here|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc-before-10136-notest] and [here (squashed)|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc-before-10136-squashed-notest]. Optimizations in the "new" version are not as sophisticated as in the "old" one for backward compatibility as implemented by CASSANDRA-10136.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728610#comment-14728610 ] Michael Kjellman commented on CASSANDRA-9738: - I've been making a good amount of progress on 9754. I'm not sure if I can give you an exact date at this point but I'm hoping it will be in the near future.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726135#comment-14726135 ] Robert Stupp commented on CASSANDRA-9738: - Pushed some more commits. Includes serialization of IndexInfo offsets. Adds tests for RowIndexEntry class and legacy sstables based on CASSANDRA-10236 (latter not working yet due to CASSANDRA-10237). Open todo: benchmarking/comparison against "old" implementation, verification of compatibility with legacy sstables.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724942#comment-14724942 ] Robert Stupp commented on CASSANDRA-9738: -
bq. index info offsets in the serialization
Yes, we can still change the on-disk format for indexes in 3.0. It's a todo for this ticket.
bq. Caching the last used IndexInfo is going to force a little bit of promotion
Did that because some places in the code regularly access the "current" IndexInfo. Feels cheaper than having to deserialize the whole object multiple times.
bq. operate on the IndexInfo as a ByteBuffer
Would be beneficial (could save about two memory copies per IndexInfo) - but as you said it's not easy. I gave it a quick try but the ref-counting watchdog complained about unreleased references. I think we still save a lot with the serialized index-offsets and only deserializing the really needed objects.
bq. Reference counting shouldn't be too bad
Seems doable (together with IndexInfo as a ByteBuffer). But I'd like to defer this to a follow-up ticket.
I think we can get this working for 3.0 - so saving temporary garbage on the heap for cached and non-cached RIE. As I said in 9754, I think that RIE+II (including key cache) has probably reached its EOL with all the ongoing effort to optimize/replace the structure and algorithms.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723736#comment-14723736 ] Ariel Weisberg commented on CASSANDRA-9738: --- OK, looks like what I was thinking at a high level. Since it is in progress I won't go deeper yet. I would include the index info offsets in the serialization and just populate them from the start so random access is simple. It's not going to be a worse memory impact than the existing Java object graph that has the array. Caching the last used IndexInfo is going to force a little bit of promotion. Not sure how much. It's not going to be easy, but if we can operate on the IndexInfo as a ByteBuffer we can avoid deserializing them during the binary search. Getting ClusteringComparator and ClusteringPrefix to that point does look hard. We should measure as is and see if it's enough garbage to matter. Reference counting shouldn't be too bad. In all the cases I could find the index info is consumed immediately and discarded or composed into an iterator implementing Closeable. There is a bit of rope to hang ourselves with there for long lived iterators pinning stuff in the cache. This is probably something we should test in OHC.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723761#comment-14723761 ] sankalp kohli commented on CASSANDRA-9738: -- It looks like there is a lot of overlap with CASSANDRA-9754.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14723917#comment-14723917 ] Ariel Weisberg commented on CASSANDRA-9738: --- I agree there is overlap. The scope of what we need to make the key cache work is smaller, but it might be game over for the key cache and large partitions anyways even with Robert's improvements. Constantly loading and unloading large partitions is never going to work. We are at least reaching parity with the old key cache. If we could map the entire partition index then you could point this code at the mapping and it would work. It would not be ideal because what we are doing is a basic translation of the object graph which is an array pointing to variable size objects. Binary searches are going to result in access to two pages for each comparison. Possibly still better than what we have today? I think the trick right now is figuring out what we can do for 3.0, and what the next intermediate step is. If Robert goes and makes this work off-heap then the key cache can maybe go off heap for 3.0. Seems like removing the key cache and reliably operating against a memory map is unlikely for 3.0, but maybe shortly after. What would the timeline be for 9754? That kind of determines what intermediate steps make sense.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721715#comment-14721715 ] Robert Stupp commented on CASSANDRA-9738: - Code's been refactored and unit tests have no failures, but some dtests failed recently - so it's not yet fully done. There are several commits on the branch. It starts with cleanup refactorings and preparation by encapsulating all RowIndexEntry and IndexInfo accesses. Next is the introduction of ByteBuffer (744dde329a084a7fdab52fbbed11a15b296bab35), which is the main commit. Following commits are optimizations on top of that commit. The main change is that all IndexInfo objects are just present as a single (on-heap) ByteBuffer. IndexInfo objects are created when needed. This helps to reduce the amount of small, temporary and maybe unused objects - e.g. during binary search. I've got no benchmark in place yet and the latest cstar tests show neither anything new nor a regression. I just didn't succeed in getting really big partitions - so I'll try to build some artificial microbenchmarks to get some comparable numbers and prove that the approach also works for really big partitions (at least in theory it should, but who knows). The plan to do binary search in the off-heap key cache, or deserialize IndexInfo directly from off-heap, or even just retrieve requested information without deserialization at all, is not yet done. It requires more refactoring than I expected as RowIndexEntry and IndexInfo objects are used at many distinct places and exposing the off-heap ByteBuffer containing the serialized RowIndexEntry+IndexInfos requires proper reference counting (or we risk off-heap memory leaks).
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721020#comment-14721020 ] Robert Stupp commented on CASSANDRA-9738: - bq. If you can do it in time it's fine for 3.0. K, will try :)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14719307#comment-14719307 ] Ariel Weisberg commented on CASSANDRA-9738: --- For observers. We hit a snag. The key cache values can be quite large. There is an entry in each value for every row per partition so it can be in the thousands. This likely means that copying the entire thing on heap to operate on it once per read is not going to match the performance of the existing POJO implementation. Robert is going to benchmark a more representative configuration. It's a tractable problem but we will need an off-heap list implementation for variable size objects that supports binary search without materializing each entry in the search.
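A rough sketch of what such a list could look like, under loudly stated assumptions: the layout ({{[int count][int offset per entry][entries]}}, each entry {{[unsigned short keyLen][key bytes][long payload]}}), the class {{SerializedIndexSketch}}, and the unsigned bytewise key comparison are all invented for illustration. The real IndexInfo format, vint encoding, and ClusteringComparator are not modelled here. What the sketch does show is binary search using only absolute reads against the buffer, so no per-entry object is created during the search.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical serialized list of variable-size entries; not Cassandra's format. */
final class SerializedIndexSketch {
    /** Serializes keys (already sorted in unsigned byte order) with one payload each. */
    static ByteBuffer build(String[] sortedKeys, long[] payloads) {
        int count = sortedKeys.length;
        byte[][] keys = new byte[count][];
        int size = 4 + 4 * count; // count header plus offset table
        for (int i = 0; i < count; i++) {
            keys[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + keys[i].length + 8;
        }
        // Heap buffer for simplicity; ByteBuffer.allocateDirect(size) would be off-heap.
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(count);
        int offset = 4 + 4 * count;
        for (int i = 0; i < count; i++) { // offset table enables random access
            buf.putInt(offset);
            offset += 2 + keys[i].length + 8;
        }
        for (int i = 0; i < count; i++) {
            buf.putShort((short) keys[i].length);
            buf.put(keys[i]);
            buf.putLong(payloads[i]);
        }
        buf.flip();
        return buf;
    }

    /** Compares the key of entry number index with probe, in place, byte by byte. */
    static int compareKeyAt(ByteBuffer buf, int index, byte[] probe) {
        int entry = buf.getInt(4 + 4 * index);
        int keyLen = buf.getShort(entry) & 0xFFFF;
        int n = Math.min(keyLen, probe.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(buf.get(entry + 2 + i) & 0xFF, probe[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(keyLen, probe.length);
    }

    /** Binary search; returns the payload of the matching entry, or -1 if absent. */
    static long find(ByteBuffer buf, byte[] probe) {
        int lo = 0;
        int hi = buf.getInt(0) - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareKeyAt(buf, mid, probe);
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                int entry = buf.getInt(4 + 4 * mid);
                int keyLen = buf.getShort(entry) & 0xFFFF;
                return buf.getLong(entry + 2 + keyLen);
            }
        }
        return -1;
    }
}
```

Each comparison touches the offset table and then the entry itself, which matches the later observation in this thread that a binary search over such a structure may access two pages per comparison.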
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720240#comment-14720240 ] sankalp kohli commented on CASSANDRA-9738: -- [~mkjellman] is cooking something like this in CASSANDRA-9754. I will let him give more details.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720378#comment-14720378 ] Robert Stupp commented on CASSANDRA-9738: - As an intermediate solution (it's actually more a workaround) we could just not add key-cache entries bigger than a configurable size (maybe 64kB?) and find a solution that works for both key-cache and disk (CASSANDRA-9754). Having something that can do the search in a {{ByteBuffer}} might be nice as it could be used in off-heap and on disk (assuming it's mmapped). /cc [~mkjellman]
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720557#comment-14720557 ] Ariel Weisberg commented on CASSANDRA-9738: --- I haven't looked too deep into it yet, but why would it be out for 3.0? Is it not possible to bridge an off heap list with the on heap list implementation that is already there? The current formulation is out unless a benchmark of many rows per partition demonstrates no slow down compared to the POJO version on a read intensive benchmark with a cacheable distribution (not uniform). If you can do it in time it's fine for 3.0.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720518#comment-14720518 ] Michael Kjellman commented on CASSANDRA-9738: - [~snazy] I had floated the idea of a size limit with [~iamaleksey] and he wasn't a fan
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720530#comment-14720530 ] Robert Stupp commented on CASSANDRA-9738: - yea - the idea's bad. It locks out big partitions (that really need the key cache). Assume that idea moved to /dev/null.
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720486#comment-14720486 ] Robert Stupp commented on CASSANDRA-9738: - Just brain dumping an idea: we could change RowIndexEntry and related IndexInfo plus its comparators to be off heap compatible (meaning they work on a ByteBuffer without changing the serialization format or algorithm itself). Having this would also save polluting the heap with RIE+II objects when reading it from disk as it could just load one ByteBuffer and operate on that. In other words: make {{RowIndexEntry}} a wrapper for a {{ByteBuffer}} without the need to deserialize any object structure during reads. CASSANDRA-9754 could further optimize this to handle really big partitions? But having this problem with big RowIndexEntries (lots of IndexInfo objects) I guess migrating KC off-heap is out of 3.0?
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716905#comment-14716905 ] Ariel Weisberg commented on CASSANDRA-9738: --- How big is the realistic size range of key cache values? The only thing that makes me uncomfortable right now is the buffer handling for serializing the key cache value in serializedSize(). Resizing in the loop is 4k at a time and not doubling. I think it at least needs to double to manage whatever the worst case is. It seems to me like the right tack on failure is to allocate the correct size buffer by calculating the size the old way or double (for lg(n)). Doubling is fine if we really never expect sane data models to hit it. Doubling is not going to be great for people to hit under real world conditions especially if they have to do it several times. It's tempting to ask to have 10189 expanded to make this stuff fast and then do it the old way in this change set, but I could be convinced if there isn't a really bad corner case. Also next time could you wait for a +1 before squashing? I can't see your last few change sets once you squash. I am guilty of this too and now I know how it feels.
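For a sense of scale on the resize point above: growing from 4 KiB to 1 MiB takes 255 reallocations in fixed 4 KiB steps but only 8 when doubling (4096 * 2^8 = 1,048,576). A minimal sketch of the doubling variant follows. {{GrowableBufferSketch}}, {{nextCapacity}} and {{ensureRemaining}} are hypothetical names, and this is not the buffer handling on the branch under review.

```java
import java.nio.ByteBuffer;

/** Hypothetical growth helper for a write-mode ByteBuffer; names are illustrative. */
final class GrowableBufferSketch {
    /** Doubles the capacity until it covers required; O(lg n) steps. */
    static int nextCapacity(int current, int required) {
        long cap = Math.max(current, 1);
        while (cap < required) {
            cap <<= 1;
        }
        return (int) Math.min(cap, Integer.MAX_VALUE);
    }

    /**
     * Returns a buffer with at least needed bytes remaining. The buffer is
     * assumed to be in write mode; already written bytes are copied over.
     */
    static ByteBuffer ensureRemaining(ByteBuffer buf, int needed) {
        if (buf.remaining() >= needed) {
            return buf;
        }
        int required = Math.addExact(buf.position(), needed);
        ByteBuffer bigger = ByteBuffer.allocate(nextCapacity(buf.capacity(), required));
        buf.flip();      // switch the old buffer to read mode for the copy
        bigger.put(buf); // bigger.position() now equals the old write position
        return bigger;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(42);
        buf = ensureRemaining(buf, 8);
        buf.putLong(7L);
        System.out.println("capacity=" + buf.capacity() + " position=" + buf.position());
    }
}
```

Computing the exact serialized size up front, the other option mentioned in the comment, avoids resizing entirely at the cost of a sizing pass.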
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712038#comment-14712038 ] Robert Stupp commented on CASSANDRA-9738: - As discussed offline: * current branch uses its own string (de)serialization, is squashed and rebased * opened CASSANDRA-10189 as a follow-up for unified read/writeUTF code paths Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14709670#comment-14709670 ] Ariel Weisberg commented on CASSANDRA-9738: --- For reading a string there is no reason not to use the last bit and allow 65k strings. It also doesn't throw if the length is too large. I really would like to get to the one true string reading/writing path if possible, so we can put together one good implementation and also optimize for icache. Whether to allocate or pool temporary buffers is kind of a question of how allocation bound you are. I think the answer for C* right now is very. Is it feasible to share the same pooling buffer for both the key serialization and value serialization? Maybe a little finicky given that size is sometimes retrieved in an order that is different from the order they are serialized in. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
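The two points about the length prefix (use all 16 bits, and fail loudly when a string does not fit) could look roughly like the sketch below. The class {{ShortPrefixedStrings}} is hypothetical and is not the code under review; it simply masks the short on read so lengths up to 65535 bytes work, and throws on write when the encoded form is longer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for strings with an unsigned 16-bit length prefix. */
final class ShortPrefixedStrings {
    static final int MAX_LENGTH = 0xFFFF;

    static void write(String s, ByteBuffer out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_LENGTH) {
            throw new IllegalArgumentException("encoded string too long: " + utf8.length + " bytes");
        }
        // The cast keeps the low 16 bits; values above 32767 appear negative as a short.
        out.putShort((short) utf8.length);
        out.put(utf8);
    }

    static String read(ByteBuffer in) {
        // Mask so the prefix is interpreted as unsigned (0..65535).
        int length = in.getShort() & 0xFFFF;
        byte[] utf8 = new byte[length];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 40000 bytes does not fit a signed short but does fit an unsigned one.
        String big = new String(new char[40000]).replace('\0', 'a');
        ByteBuffer buf = ByteBuffer.allocate(2 + 40000);
        write(big, buf);
        buf.flip();
        System.out.println(read(buf).length());
    }
}
```

This version allocates a temporary byte array per string for brevity, which is exactly the kind of garbage the thread wants to avoid; it is meant to show the prefix handling only.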
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708431#comment-14708431 ] Robert Stupp commented on CASSANDRA-9738: - Branch is ready for another review round. It's rebased against current 3.0 and trunk. Changes are: * updated to ohc 0.4.2 (to get rid of {{ByteBuffer.order()}} calls and the try-finally-clauses) * no more duplicate serialization effort in {{serializedSize()}} + {{serialize()}} - uses a thread-local serialization buffer. Can be further improved with CASSANDRA-9929. * added/moved string serialization methods to {{ByteBufferUtil}}, which has distinct code paths for 7-bit ASCII and strings with wider chars Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
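A rough sketch of the thread-local buffer idea mentioned above: {{serializedSize()}} does the real serialization once into a per-thread scratch buffer, and {{serialize()}} only copies those bytes. Everything here is an assumption for illustration, including the class name {{ScratchSerializer}}, the {{long[]}} stand-in for a cache value, and the varint encoding (chosen so that the size is not known without walking the value). It also assumes the caller invokes {{serializedSize(v)}} immediately before {{serialize(v, target)}} on the same thread.

```java
import java.nio.ByteBuffer;

/** Hypothetical serializer that avoids serializing a value twice. */
final class ScratchSerializer {
    // One reusable buffer per thread; grown on demand and never shared.
    private static final ThreadLocal<ByteBuffer> SCRATCH =
            ThreadLocal.withInitial(() -> ByteBuffer.allocate(4096));

    /** Writes v as a little-endian base-128 varint (1 to 10 bytes). */
    private static void putVarLong(ByteBuffer buf, long v) {
        while ((v & ~0x7FL) != 0) {
            buf.put((byte) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        buf.put((byte) v);
    }

    /** Serializes into the scratch buffer and returns the exact byte count. */
    static int serializedSize(long[] value) {
        int worstCase = (value.length + 1) * 10; // at most 10 bytes per varint
        ByteBuffer buf = SCRATCH.get();
        if (buf.capacity() < worstCase) {
            buf = ByteBuffer.allocate(worstCase);
            SCRATCH.set(buf);
        }
        buf.clear();
        putVarLong(buf, value.length);
        for (long v : value) {
            putVarLong(buf, v);
        }
        buf.flip();
        return buf.remaining();
    }

    /**
     * Copies the bytes produced by the preceding serializedSize(value) call.
     * Assumption: same thread, same value, no other call in between.
     */
    static void serialize(long[] value, ByteBuffer target) {
        target.put(SCRATCH.get().duplicate());
    }

    public static void main(String[] args) {
        long[] value = {1, 300};
        ByteBuffer target = ByteBuffer.allocate(serializedSize(value));
        serialize(value, target);
        System.out.println(target.position() + " bytes written");
    }
}
```

The implicit ordering contract between the two calls is the fragile part of this design; a sturdier variant would remember which value the scratch buffer currently holds and re-serialize on a mismatch.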
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702991#comment-14702991 ] Robert Stupp commented on CASSANDRA-9738: - Pushed changes to existing branches (plus a new branch for merge to trunk). Included is a new {{KeyCacheCqlTest}} to test 2i and large partitions (verified via key cache metrics) and addressing the review comments (in a separate commit). Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703452#comment-14703452 ] Ariel Weisberg commented on CASSANDRA-9738: --- In OHCKeyCache when calculating the length of strings you are calculating the length using String.length() which returns a wrong answer if you encode using UTF-8. We went to some length to come up with an optimal no garbage string writing function for BufferedDataOutputStream (you can see it as a static method in UnbufferedDataOutputStream). It would be great if we could do the same thing here and not allocate byte arrays for the encoded names. Would it be crazy for it take a lambda that describes how to write the generated byte array to some output? Then we could use the same code everywhere. [~benedict] what do you think? Since you are using a short length prefix what is your level of confidence it will always be enough? How does it fail if it isn't? The 2i path testing is much appreciated. The utests didn't seem to make it in cassci. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
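The {{String.length()}} problem raised above is easy to reproduce: {{length()}} counts UTF-16 code units, not encoded bytes. A small sketch of computing the standard UTF-8 length without allocating a byte array follows. The class name {{Utf8Length}} is hypothetical, and the code assumes well-formed strings (no unpaired surrogates). Note that {{java.io.DataOutput.writeUTF}} uses modified UTF-8, whose lengths differ for {{\u0000}} and for supplementary characters.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: UTF-8 encoded length without allocating a byte[]. */
final class Utf8Length {
    static int encodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // 7-bit ASCII
            } else if (c < 0x800) {
                bytes += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // surrogate pair = one supplementary code point
                i++;
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String name = "t\u00e4ble_\u8868";        // non-ASCII table name
        System.out.println("String.length() = " + name.length());
        System.out.println("UTF-8 bytes     = " + encodedLength(name));
        System.out.println("getBytes length = " + name.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For pure ASCII names the two numbers agree, which is why a length computed with {{String.length()}} can pass tests and still corrupt entries once a name contains wider characters.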
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703613#comment-14703613 ] Robert Stupp commented on CASSANDRA-9738: - Ouch, yes, names can contain UTF8 chars. [Pushed a commit|https://github.com/snazy/cassandra/commit/2e2d572ea0c30c2ef1bba49df9a6667f7b51fc4a#diff-df8196b2b182d7e311c455c5d6115f80] as _a demo_ of what the lambda approach could look like. writeUTF could be nice - but readUTF would require two method refs (one for reading the len and one for reading bytes). read/writeUTF in java.io also use an unsigned short for serialization. Hm - Mrs. cassci seems to be annoyed... (Slave went offline during the build) Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
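For readers who cannot follow the linked commit, the general shape of a lambda-based writer might look like the sketch below. It is not the code from that commit: {{LambdaUtf8Writer}} and {{ByteSink}} are hypothetical names, the encoder emits standard UTF-8 one byte at a time, and it assumes well-formed input (no unpaired surrogates).

```java
import java.nio.ByteBuffer;

/** Hypothetical UTF-8 writer whose output target is supplied as a lambda. */
final class LambdaUtf8Writer {
    /** Receives one encoded byte at a time; only the low 8 bits of b are meaningful. */
    @FunctionalInterface
    interface ByteSink {
        void put(int b);
    }

    /** Encodes s as standard UTF-8 straight into the sink and returns the bytes written. */
    static int writeUtf8(String s, ByteSink sink) {
        int written = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 0x80) {
                sink.put(cp);
                written += 1;
            } else if (cp < 0x800) {
                sink.put(0xC0 | (cp >> 6));
                sink.put(0x80 | (cp & 0x3F));
                written += 2;
            } else if (cp < 0x10000) {
                sink.put(0xE0 | (cp >> 12));
                sink.put(0x80 | ((cp >> 6) & 0x3F));
                sink.put(0x80 | (cp & 0x3F));
                written += 3;
            } else {
                sink.put(0xF0 | (cp >> 18));
                sink.put(0x80 | ((cp >> 12) & 0x3F));
                sink.put(0x80 | ((cp >> 6) & 0x3F));
                sink.put(0x80 | (cp & 0x3F));
                written += 4;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        int n = writeUtf8("t\u00e4ble", b -> buf.put((byte) b));
        System.out.println(n + " bytes written");
    }
}
```

The same encoder can then target a {{ByteBuffer}}, a stream or an off-heap address by passing a different lambda. As noted in the comment above, the read side is less symmetric because it needs one callback for the length and another for the bytes.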
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702731#comment-14702731 ] Robert Stupp commented on CASSANDRA-9738: - Thanks :)
bq. How are the 2i paths tested?
Currently implicitly via other unit tests. But I'll add some unit tests for that.
bq. The null case in makeVal isn't tested, maybe not that interesting
Oh. Although it doesn't cause any problems, passing null to OHC is not allowed. I've added an assertion.
bq. SerializationHeader forKeyCache
Yes. It's intentionally racy to prevent blocking operations. The chance of such a race is probably low, and it heals itself. Added a comment for that.
bq. comment about singleton weigher
removed
bq. NIODataInputStream has a derived class DataInputBuffer
changed to use DIB
bq. string encoding and decoding helpers
yup - makes sense. refactored.
bq. An enhancement we can file for later is to replace those strings with vints that reference a map of possible table names.
My idea is to remove strings from the key cache altogether. Keyspace + CF names can be handled by CASSANDRA-10028. Not sure how to handle file/path names - maybe using some sparse list structure for sstable generations (in another ticket). Haven't pushed anything yet - but will update my branch soon.
Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700387#comment-14700387 ] Ariel Weisberg commented on CASSANDRA-9738: --- Good stuff. Coverage of the OHC key cache provider looks good.
* How are the 2i paths tested?
* The null case in makeVal isn't tested, maybe not that interesting
* SerializationHeader forKeyCache is racy and can result in an undersize array clobbering a properly sized one. But... it doesn't retrieve the value it sets so odds are eventually it will work out to be the longer one. It works; it's just intentionally racy.
* In CacheService does that comment about singleton weigher even make sense anymore?
* NIODataInputStream has a derived class DataInputBuffer that exposes the constructor you made public.
* The string encoding and decoding helpers you wrote seem like they should be factored out somewhere else, maybe ByteBufferUtil? Also you don't specify a string encoding and there may be some issues with serialized size of non-latin characters lurking as well.
* An enhancement we can file for later is to replace those strings with vints that reference a map of possible table names. For persistence definitely fully qualify, but in memory we can store more entries that way.
Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0 beta 2 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
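The last review point above (store a small integer instead of the table name, and look the name up in a map) might be sketched as follows. {{NameDictionary}} is a hypothetical class for illustration, not an existing Cassandra structure, and it covers only the in-memory mapping; encoding the id as a vint and persisting fully qualified names are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory dictionary that maps table names to small integer ids. */
final class NameDictionary {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Returns the existing id for name, or assigns the next free one. */
    synchronized int idFor(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size();
            names.add(name);
            ids.put(name, id);
        }
        return id;
    }

    /** Reverse lookup, e.g. when a cache entry has to be written out fully qualified. */
    synchronized String nameFor(int id) {
        return names.get(id);
    }

    public static void main(String[] args) {
        NameDictionary dict = new NameDictionary();
        int a = dict.idFor("ks1.users");
        int b = dict.idFor("ks1.events");
        // The same name always maps to the same id, so a cache key can carry the id alone.
        System.out.println(a + " " + b + " " + dict.idFor("ks1.users") + " " + dict.nameFor(b));
    }
}
```

Small ids encode to one or two bytes as vints, so each cache key shrinks by roughly the length of the keyspace and table names, which is where the "more entries in memory" benefit would come from.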
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697543#comment-14697543 ] Robert Stupp commented on CASSANDRA-9738: - Code coverage looks not bad so far. Important code is covered (by nature) from nearly every unit test. getter/setter on CacheService for all caches are uncovered. Key cache's hotKeyIterator() is also uncovered. Will add unit tests for them in the respective test classes. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693369#comment-14693369 ] Aleksey Yeschenko commented on CASSANDRA-9738: -- [~aweisberg] Could you review? Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659996#comment-14659996 ] Jonathan Ellis commented on CASSANDRA-9738: --- What does code coverage show? Because I know that would be Ariel's first question. :) Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659619#comment-14659619 ] Robert Stupp commented on CASSANDRA-9738: - I’d like to propose this patch to be included in 3.0. I hope the cstar tests are sufficient but otherwise I can deliver more with different workloads.
h2. cstar tests
All cstar tests mentioned below perform three operations: write-only, mixed and read-only. Unfortunately, cassandra-stress seems to reduce the actually achievable write throughput for workloads with clustering keys. All tests on this patch show reduced GC pressure (for reads, of course). This gives G1 more headroom to operate and often yields about 10-15% read perf improvement depending on the hardware (in this case bdplab vs. blade_11_b) - bdplab (spinning disks, less RAM) shows a bigger improvement.
h3. one big clustering key
user, native, cql3, user
[profile|https://gist.github.com/snazy/b6c160c65001eb074784] [blade_11_b|http://cstar.datastax.com/tests/id/7f7265a2-3aee-11e5-b022-42010af0688f] [bdplab|http://cstar.datastax.com/tests/id/af344e3e-3af0-11e5-b379-42010af0688f]
h3. big clustering over two clustering columns
user, native, cql3, user
[profile|https://gist.github.com/snazy/351156424929d868baf3] [blade_11_b|http://cstar.datastax.com/tests/id/e919725a-3b68-11e5-b590-42010af0688f] [bdplab|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]
h3. big clustering over two clustering columns, reduced threads for pure-write and mixed operations
user, native, cql3, user
[profile|https://gist.github.com/snazy/e4579499f61911802fcd] [blade_11_b|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f] [bdplab|http://cstar.datastax.com/tests/id/07754e44-3b8d-11e5-9c9e-42010af0688f]
h3. stress
_write_, _mixed_, _read_
[blade_11_b|http://cstar.datastax.com/tests/id/def04c20-3b8d-11e5-9c9e-42010af0688f] [bdplab|http://cstar.datastax.com/tests/id/f3f5c172-3b8d-11e5-9c9e-42010af0688f]
h2. Git branch + cassci
[git branch|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc] [unit tests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-testall/] [dtests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-dtest/]
I didn’t see any failed tests related to this patch. There is another branch on github as well which contains [optimizations not purely related to key-cache|https://github.com/snazy/cassandra/tree/9738-key-cache-ref]. {{9738-key-cache-ohc}} is based on that branch and contains:
* “singletons” for key-cache {{o.a.c.db.SerializationHeader}} instances (dynamically extended, if required)
* “singletons” for {{IndexInfo.Serializer}} in {{o.a.c.db.Serializers}} (dynamically extended, if required)
* “singletons” for {{BigVersion}} instances for {{ma}}, {{la}}, {{ka}}, {{jb}} - other versions get temporary objects (some tests use older sstable versions)
h2. Further optimisations
There are some things that can be optimised in the future:
* Currently we need to serialise keyspace and cf names _and_ cfId. This is necessary since the cfId of secondary indexes is inherited from the base table. If all tables and all secondary indexes have unique IDs, we can omit KS and CF name serialisation (and its weird {{cfName.contains('.')}} 2i detection). Can be built with or after the 2i API redesign.
* The full directory path is serialised. This appears to be less expensive than iterating over the whole {{List}} of sstables and identifying an sstable by its generation.
* As [~benedict] suggested, we can switch to very tiny key-cache entries and also omit serialisation of {{IndexInfo}}.
Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14655012#comment-14655012 ] Robert Stupp commented on CASSANDRA-9738: - [~kohlisankalp] ok :) Here are two more cstar results using [this profile|https://gist.github.com/snazy/b6c160c65001eb074784]. [blade_11_b|http://cstar.datastax.com/graph?stats=7f7265a2-3aee-11e5-b022-42010af0688fmetric=op_rateoperation=3_usersmoothing=1show_aggregates=truexmin=0xmax=190.96ymin=0ymax=113547.5] shows OHC having the same throughput as 2.2 but with fewer GC pauses; both are faster than 3.0 alone. [bdplab|http://cstar.datastax.com/graph?stats=af344e3e-3af0-11e5-b379-42010af0688fmetric=op_rateoperation=3_usersmoothing=1show_aggregates=truexmin=0xmax=254.65ymin=0ymax=97972.6] shows OHC having better throughput than 2.2 and 3.0, also with less GC effort. Both environments show this behaviour in every test - I guess it's ok to blame it on spinning disks and less RAM. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654610#comment-14654610 ] sankalp kohli commented on CASSANDRA-9738: -- Yes I am +1 on the change. We should test this with different partition sizes. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654195#comment-14654195 ] Robert Stupp commented on CASSANDRA-9738: - FTR: I've added a chunked implementation to OHC but it is not included in this ticket. A thorough review of that chunked implementation would be required. The chunked impl has the advantage of reducing the malloc/free rate to nearly 0, at the cost that the key-cache cannot be resized dynamically (at least not yet). Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654333#comment-14654333 ] sankalp kohli commented on CASSANDRA-9738: -- [~snazy] In your test, how big are the CQL partitions and how many IndexInfo objects are being generated? That will determine the improvement we will see. Also since we need to deserialize these objects on key cache hit, I am not sure how it will affect large CQL partitions. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654374#comment-14654374 ] Robert Stupp commented on CASSANDRA-9738: - [~kohlisankalp] all previous tests were plain old simple stress profiles - so no big partitions. The [current|https://gist.github.com/snazy/b6c160c65001eb074784] and next cstar tests have different stress profiles with different partition sizes. I appreciate suggestions for real-world stress profiles. But even if this gives no throughput benefit, reduced GC pressure (or not letting objects escape from new-gen) is worth the change. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.0.0 rc1 Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645767#comment-14645767 ] Robert Stupp commented on CASSANDRA-9738: - [This cstar test] is probably interesting. It compares 2.2, trunk, trunk w/ ParNewCMS, 9738, 9738 w/ ParNewCMS. The reason for the additional ParNewCMS configs is that it was the only reasonable change that can explain the weird throughput graphs (and huge GC pauses) for trunk. After some offline discussion yesterday with [~benedict] and [~tjake], comparing G1 w/ ParNewCMS we concluded that G1 is the cause of the weird graphs + long GC pauses for trunk. G1 requires more head room to perform better. And 9738 gives it more head room (by moving stuff from heap to off-heap). Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645807#comment-14645807 ] Benedict commented on CASSANDRA-9738: - Looks like improved occupancy is also a factor (comparing run 2 to run 3). I'd prefer to eliminate the malloc/free from cache maintenance before we hook OHC into the key cache, but these numbers are pretty compelling and we should probably consider it for 3.0. I suspect CASSANDRA-8930 will permit us to deliver a more efficient and high occupancy key-cache, but that shouldn't prevent us doing something in the meantime. Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646112#comment-14646112 ] Benedict commented on CASSANDRA-9738: - Yes. Thanks Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14646091#comment-14646091 ] Jonathan Ellis commented on CASSANDRA-9738: --- Did you mean to link a different issue? Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Assignee: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620551#comment-14620551 ] Jonathan Ellis commented on CASSANDRA-9738: --- My next question was going to be why GC activity went *up* but I see you edited the table. :) Very promising! Migrate key-cache to be fully off-heap -- Key: CASSANDRA-9738 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 Project: Cassandra Issue Type: Sub-task Reporter: Robert Stupp Fix For: 3.x Key cache still uses a concurrent map on-heap. This could go to off-heap and feels doable now after CASSANDRA-8099. Evaluation should be done in advance based on a POC to prove that pure off-heap counter cache buys a performance and/or gc-pressure improvement. In theory, elimination of on-heap management of the map should buy us some benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619987#comment-14619987 ]

Robert Stupp commented on CASSANDRA-9738:

bq. why would the cache improvements improve write speed?

TBH, I was surprised by that, too. I guess it's key cache invalidation during compaction ({{SSTableRewriter.InvalidateKeys}}).
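The guess above can be illustrated with a minimal sketch. This is not Cassandra's code: the class {{KeyCacheInvalidationSketch}}, its {{CacheKey}} type, and all method names are hypothetical, and the cache is modelled as a plain on-heap concurrent map from (sstable generation, partition key) to an index position. It only shows the kind of work that has to happen when compaction replaces an sstable: every cached key of the old sstable is removed from the shared map, and that work competes with the write path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual Cassandra key-cache implementation.
public class KeyCacheInvalidationSketch {

    // Assumed cache key: the sstable generation plus the partition key.
    private static final class CacheKey {
        final int generation;
        final String partitionKey;

        CacheKey(int generation, String partitionKey) {
            this.generation = generation;
            this.partitionKey = partitionKey;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof CacheKey)) return false;
            CacheKey other = (CacheKey) o;
            return generation == other.generation && partitionKey.equals(other.partitionKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(generation, partitionKey);
        }
    }

    // On-heap concurrent map, standing in for the key cache discussed in this issue.
    private final ConcurrentMap<CacheKey, Long> cache = new ConcurrentHashMap<>();

    public void put(int generation, String partitionKey, long indexPosition) {
        cache.put(new CacheKey(generation, partitionKey), indexPosition);
    }

    public Long get(int generation, String partitionKey) {
        return cache.get(new CacheKey(generation, partitionKey));
    }

    // Called once an sstable has been replaced by compaction: drop its cached keys.
    // Returns how many entries were actually removed.
    public int invalidate(int generation, List<String> partitionKeys) {
        int removed = 0;
        for (String key : partitionKeys) {
            if (cache.remove(new CacheKey(generation, key)) != null) {
                removed++;
            }
        }
        return removed;
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        KeyCacheInvalidationSketch keyCache = new KeyCacheInvalidationSketch();
        keyCache.put(1, "a", 10L);
        keyCache.put(1, "b", 20L);
        keyCache.put(2, "a", 30L);
        // sstable generation 1 was compacted away, so its entries are invalidated
        int removed = keyCache.invalidate(1, Arrays.asList("a", "b"));
        System.out.println("removed=" + removed + " remaining=" + keyCache.size());
    }
}
```

If invalidation really is the cause, a cheaper remove in the cache would shorten compaction's bookkeeping, which would fit the higher write rate seen in the benchmark. That remains a guess until it is measured.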
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619099#comment-14619099 ]

Jonathan Ellis commented on CASSANDRA-9738:

That definitely looks interesting. But why would the cache improvements improve write speed?
[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618972#comment-14618972 ]

Robert Stupp commented on CASSANDRA-9738:

I did some [coding|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc] just to see what this might buy us.

[cstar bench|http://cstar.datastax.com/graph?stats=e0c7514c-258f-11e5-8c01-42010af0688f&metric=op_rate&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=138.16&ymin=0&ymax=185288.4] gives these nice numbers:

|| || reads/sec || writes/sec || total gc time, 2nd read ||
| trunk | 120k | 122k | 2sec (111 gcs, 18ms avg) |
| with CASSANDRA-9718 | 128k | 131k | 24sec (106 gcs, 222ms avg) |
| with key-cache off-heap | 163k | 136k | 23sec (104 gcs, 222ms avg) |
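To put the table in relative terms, the throughput change against trunk and the GC total implied by each "count x average" pair can be recomputed with a few lines of Java. The class and method names ({{KeyCacheBenchSummary}}, {{percentGain}}, {{totalGcSeconds}}) are made up for this illustration; the input numbers are copied from the table as posted in this message, and nothing beyond them is assumed.

```java
import java.util.Locale;

// Illustrative arithmetic over the benchmark table above; not part of the patch.
public class KeyCacheBenchSummary {

    // Relative change of value against baseline, in percent.
    public static double percentGain(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    // Total GC time in seconds implied by a GC count and an average pause in milliseconds.
    public static double totalGcSeconds(int gcCount, double avgPauseMs) {
        return gcCount * avgPauseMs / 1000.0;
    }

    public static void main(String[] args) {
        String[] names = {"trunk", "with CASSANDRA-9718", "with key-cache off-heap"};
        double[] readsK = {120, 128, 163};   // thousand reads/sec
        double[] writesK = {122, 131, 136};  // thousand writes/sec
        int[] gcCounts = {111, 106, 104};
        double[] avgPauseMs = {18, 222, 222};

        for (int i = 0; i < names.length; i++) {
            System.out.printf(Locale.ROOT,
                    "%-24s reads %+.1f%%  writes %+.1f%%  gc total %.1fs%n",
                    names[i],
                    percentGain(readsK[0], readsK[i]),
                    percentGain(writesK[0], writesK[i]),
                    totalGcSeconds(gcCounts[i], avgPauseMs[i]));
        }
    }
}
```

The first row is the baseline, so its gains are zero by construction. The GC column is only a consistency check of count times average pause against the totals in the table.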