[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285916#comment-14285916 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

If you turn the OOME throwing on in C*, I am +1. I did a quick performance test with the cache and compared it to the SerializingCache. I didn't test a scenario where it would be better/faster, but the performance looked just as good. It was a very noisy test with different results every time I restarted, so maybe not a great way to measure.

Serializing Row cache alternative (Fully off heap)
--
Key: CASSANDRA-7438
URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: Linux
Reporter: Vijay
Assignee: Robert Stupp
Labels: performance
Fix For: 3.0
Attachments: 0001-CASSANDRA-7438.patch, tests.zip

Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as ByteBuffers:
* There is a higher GC cost for a reasonably big cache.
* Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
* The memory overhead of the cache entries is relatively high.

So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible. We might also want to make this cache configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
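The "completely off heap" idea in the ticket description means that both key and value bytes live in native memory rather than as on-heap objects. As a minimal illustrative sketch (not OHC's actual layout), a single entry can be serialized into a direct ByteBuffer, so the GC never sees the key or value bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only (not OHC's real entry format): one cache entry
// serialized into native memory, layout [keyLen][key][valueLen][value].
// A direct ByteBuffer is backed by native memory, so neither key nor value
// bytes are on the JVM heap.
public class OffHeapEntry {
    private final ByteBuffer buf;

    public OffHeapEntry(byte[] key, byte[] value) {
        buf = ByteBuffer.allocateDirect(8 + key.length + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
    }

    public byte[] key() {
        ByteBuffer b = buf.duplicate();
        b.position(0);
        byte[] k = new byte[b.getInt()];
        b.get(k);
        return k;
    }

    public byte[] value() {
        ByteBuffer b = buf.duplicate();
        b.position(0);
        int keyLen = b.getInt();
        b.position(4 + keyLen);
        byte[] v = new byte[b.getInt()];
        b.get(v);
        return v;
    }
}
```

The real implementation uses raw allocations and a native hash table; this only shows where the bytes live.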
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284416#comment-14284416 ]

Robert Stupp commented on CASSANDRA-7438:
-

bq. I am +1 conditional on the library throwing OOME if the allocator fails.
Will add a configuration switch to explicitly enable this behavior.

bq. There are also still some internal properties inside OHC that don't have a prefix.
Yes, debug-mode and disable-jemalloc did not have the prefix. Will be changed.

bq. I noticed you fixed some C* bugs ... need to be backported?
It's only the {{==}} to {{equals}} change in {{ColumnFamilyStore.cleanupCache}}. It's not necessary to fix it for older versions, since the {{UUID}} instance is taken from {{CFMetaData}} - so the {{==}} is (was) correct.

bq. Can you publish a new version to maven central so I can benchmark it vs the old cache implementation?
OHC 0.3 + 0.3.1 are on Maven Central. Note: OHC 0.3.1 incorporates the changes above (it might not be found using the Maven Central search, but the artifacts are there).

C* git branch updated to use 0.3.1.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284230#comment-14284230 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I am +1 conditional on the library throwing OOME if the allocator fails. It should be the caller of the library that decides how to handle the situation, not the library, IMO.

There are also still some internal properties inside OHC that don't have a prefix.

I noticed you fixed some C* bugs https://github.com/snazy/cassandra/compare/7438-pluggable#diff-98f5acb96aa6d684781936c141132e2aL1915 Do those fixes need to be backported?

Can you publish a new version to maven central so I can benchmark it vs the old cache implementation?
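The "throw OOME if the allocator fails" condition can be sketched as a thin wrapper that turns an allocation failure (address 0, like malloc returning NULL) into an OutOfMemoryError for the caller to handle. The interface and names here are hypothetical, not OHC's real API:

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of the behavior Ariel asks for: the library raises
// OutOfMemoryError on allocation failure and leaves the recovery decision
// to the caller. rawAlloc stands in for the native allocator (e.g. Unsafe
// or jemalloc via JNA); 0 models malloc() returning NULL.
public class StrictAlloc {
    public static long allocateOrThrow(LongUnaryOperator rawAlloc, long bytes) {
        long adr = rawAlloc.applyAsLong(bytes);
        if (adr == 0L)
            throw new OutOfMemoryError("off-heap allocation of " + bytes + " bytes failed");
        return adr;
    }
}
```

This mirrors the behavior of ByteBuffer.allocateDirect, which also signals native allocation failure with OutOfMemoryError.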
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282467#comment-14282467 ]

Robert Stupp commented on CASSANDRA-7438:
-

I think the best alternative to access malloc/free is possibly {{Unsafe}} with jemalloc in LD_PRELOAD. The native code of {{Unsafe.allocateMemory}} is basically just a wrapper around {{malloc()}}/{{free()}}.

Updated the git branch with the following changes:
* update to OHC 0.3
* benchmark: add new command line option to specify key length (-kl)
* free capacity handling moved to segments
* allow to specify the preferred memory allocator via the system property org.caffinitas.ohc.allocator
* allow to specify defaults of OHCacheBuilder via system properties prefixed with org.caffinitas.ohc.
* benchmark: make metrics local to the driver threads
* benchmark: disable bucket histogram in stats by default

I did not change the default number of segments = 2 * CPUs - but I thought about it (since you experienced that 256 segments on a c3.8xlarge gives some improvement). A naive approach of, say, 8 * CPUs feels too heavy for small systems (with one socket) and might be too much outside of benchmarking. If someone wants to get the most out of it in production and really hits the number-of-segments limit, he can always configure it better. WDYT?

Using jemalloc on Linux via LD_PRELOAD is probably the way to go in C* (since off-heap is also used elsewhere). I think we should leave the OS allocator on OSX. I don't know much about allocator performance on Windows.

For now I do not plan any new features for C* - so maybe we shall start a final review round?
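The {{Unsafe}} + LD_PRELOAD approach described above can be sketched as follows: {{Unsafe.allocateMemory}}/{{freeMemory}} compile down to malloc()/free(), so running the JVM with LD_PRELOAD pointing at libjemalloc makes these calls hit jemalloc without any JNA in the path. Reflection is needed because sun.misc.Unsafe is not public API; this is a generic sketch, not OHC's allocator code:

```java
import java.lang.reflect.Field;

// Sketch of the Unsafe-based allocation path: allocateMemory/freeMemory are
// thin wrappers over malloc()/free(), so LD_PRELOAD=libjemalloc.so routes
// them through jemalloc with no JNA lock involved.
public class UnsafeMalloc {
    private static final sun.misc.Unsafe UNSAFE;
    static {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (sun.misc.Unsafe) f.get(null);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // write a long to freshly allocated native memory and read it back
    public static long roundTrip(long value) {
        long adr = UNSAFE.allocateMemory(8);   // effectively malloc(8)
        try {
            UNSAFE.putLong(adr, value);
            return UNSAFE.getLong(adr);
        } finally {
            UNSAFE.freeMemory(adr);            // effectively free()
        }
    }
}
```

Note that unlike a raw malloc binding, {{Unsafe.allocateMemory}} itself throws OutOfMemoryError when the underlying allocation fails.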
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280945#comment-14280945 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I ran the benchmark on the develop branch today using a c3.8xlarge and profiled with flight recorder. There is definitely some contention on the lock in JNA. I also see a little in AbstractQueuedSynchronizer from locking the segments, along with some park/unpark activity.

I built jemalloc (-march=native --disable-fill --disable-stats). The Ubuntu package compiles at -O2 instead of -O3.

I am getting full utilization across 30 threads if I increase the number of segments to 256; otherwise it hovers around 2600% (with 30 threads). It cuts in half the number of instances of contention in the profiler.

The workload settings you ran with resulted in a lot of cache (ohcache, not CPU cache) misses. I think a real workload where the cache is useful will have more hits.

One note about the benchmark: building the histogram of buckets is not a lightweight operation. I think that should be off by default. I removed it for my testing. Otherwise it looks ok. Using the Timer as shared state in a micro-benchmark is probably not the way to go. I would have a timer per driver thread and then aggregate.

I am running 1-30 threads and it will take a few hours to finish. I am going to look into benchmarking inside C* and comparing the existing cache implementation to OHC now.

I used this, which gave me mostly cache hits and filled up quite a bit of RAM. It takes a minute or two to fill the cache.
{noformat}
#!/bin/sh
LD_PRELOAD=~/jemalloc-3.6.0/lib/libjemalloc.so.1 \
java -Xmx8g -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
  -DDISABLE_JEMALLOC=true \
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.local.only=false \
  -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=ec2-54-172-234-230.compute-1.amazonaws.com \
  -jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar \
  -rkd 'gaussian(1..1500,2)' -wkd 'gaussian(1..1500,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 320 \
  -d 120 -t 30 \
  -sc 256
{noformat}

256 segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2503894/2143858/2036336/2459949 count: 295258886
        min/max/mean/stddev: 0.00047/ 0.76172/ 0.00652/ 0.03865
        75/95/98/99/999/median: 0.00439/ 0.00697/ 0.01147/ 0.03458/ 0.75864/ 0.00342
Writes: one/five/fifteen/mean: 278134/238242/226326/273275 count: 32800525
        min/max/mean/stddev: 0.00176/ 0.89665/ 0.00945/ 0.03986
        75/95/98/99/999/median: 0.00719/ 0.01180/ 0.01816/ 0.11640/ 0.89006/ 0.00556
{noformat}

256 segments, jemalloc via jna
{noformat}
Reads : one/five/fifteen/mean: 2343872/1458688/1159829/2387622 count: 286635526
        min/max/mean/stddev: 0.00054/ 0.97114/ 0.00756/ 0.04664
        75/95/98/99/999/median: 0.00435/ 0.00675/ 0.00985/ 0.05139/ 0.95959/ 0.00341
Writes: one/five/fifteen/mean: 260376/162076/128883/265250 count: 31843705
        min/max/mean/stddev: 0.00267/ 0.70586/ 0.01502/ 0.05161
        75/95/98/99/999/median: 0.01049/ 0.01695/ 0.04193/ 0.36639/ 0.70331/ 0.00859
{noformat}

default segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2148677/1630379/1448226/2202878 count: 264549288
        min/max/mean/stddev: 0.00035/ 0.66081/ 0.00820/ 0.03519
        75/95/98/99/999/median: 0.00435/ 0.01247/ 0.05423/ 0.20834/ 0.65286/ 0.00323
Writes: one/five/fifteen/mean: 238699/180945/160641/244767 count: 29395103
        min/max/mean/stddev: 0.00172/ 0.39821/ 0.01120/ 0.03079
        75/95/98/99/999/median: 0.00805/ 0.02124/ 0.08665/ 0.18473/ 0.39776/ 0.00574
{noformat}
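The "timer per driver thread, then aggregate" suggestion from the comment above can be sketched as follows: each thread records into its own unsynchronized recorder (no shared cache lines on the hot path), and the harness merges all recorders once the driver threads have finished. The names here are illustrative, not the ohc-benchmark code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of per-thread timing: each driver thread gets a private Recorder via
// ThreadLocal, so record() never touches shared state; all recorders are
// registered in a queue so the harness can aggregate them at the end.
public class PerThreadTimers {
    static final class Recorder {
        long count;
        long totalNanos;
        void record(long nanos) { count++; totalNanos += nanos; }
    }

    private final Queue<Recorder> all = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<Recorder> local = ThreadLocal.withInitial(() -> {
        Recorder r = new Recorder();
        all.add(r);   // registration happens once per thread
        return r;
    });

    public void record(long nanos) { local.get().record(nanos); }

    // called once, after the driver threads have finished
    public long totalCount() {
        long c = 0;
        for (Recorder r : all) c += r.count;
        return c;
    }
}
```

This avoids the contention of a single shared Timer while still producing an aggregate at the end of the run.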
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275809#comment-14275809 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

bq. Making freeCapacity a per-segment field: Then I'd prefer to reverse the stuff - i.e. add an allocatedBytes field to each segment. Operations would get the (calculated) free capacity as a parameter and act on that value. Or were you thinking about dividing capacity by number of segments and use that as the max capacity for each segment?
The goal of splitting the locks and tables into segments is to eliminate any globally shared cache lines that are written to by common operations on the cache. Having every put modify the free capacity introduces a potential point of contention. The only way to really understand the impact is to have a good micro benchmark you trust and try it both ways. I think that you would split the capacity across the segments: do exactly what you are doing now, but do the check inside the segment. Since puts are allowed to fail, I don't think you have to do anything else.

bq. Regarding rehash/iterators: Could be simply worked around by counting the number of active iterators and just don't rehash while an iterator is active. That's better than e.g. returning duplicate keys or keys not at all - i.e. people relying on that functionality.
This is an OHC issue, not a C* issue. I think from C*'s perspective it can be wrong rarely and it doesn't matter, since it doesn't affect correctness. Definitely worth documenting though.

bq. I lean towards removing the new tables implementation in OHC. It has the big drawback that it only allows a specific number of entries per bucket (e.g. 8). But I'd like to defer that decision after some tests on a NUMA machine.
You are on to something in terms of making a faster hash table, but it doesn't seem like a huge win given the short length of most chains (1 or 2) and the overhead of the allocator and locking etc. It would show up in a micro-benchmark, but not in C*. I would like to stick with linked for C* for now since it's easy to understand and I've looked at it a few times. I think I already sent you a link to this https://www.cs.cmu.edu/~dga/papers/silt-sosp2011.pdf but there are a lot of ideas there for dense hash tables. You can chain together multiple buckets so the entries per bucket becomes a function of cache line size.
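The per-segment capacity scheme described above (divide capacity across segments, check inside the segment, let puts fail) can be sketched like this. The class and names are illustrative, not OHC's code:

```java
// Sketch of per-segment capacity accounting: total capacity is split evenly
// across segments, each segment tracks its own usage under its own lock, and
// a put that does not fit simply fails - there is no globally shared counter
// that every operation writes to.
public class Segment {
    private final long capacity;   // this segment's share of the total
    private long used;

    public Segment(long totalCapacity, int segmentCount) {
        this.capacity = totalCapacity / segmentCount;
    }

    // returns false when the entry does not fit; callers tolerate failed puts
    // (a real cache would first try to evict LRU entries in this segment)
    public synchronized boolean put(long entryBytes) {
        if (used + entryBytes > capacity)
            return false;
        used += entryBytes;
        return true;
    }

    public synchronized void remove(long entryBytes) { used -= entryBytes; }
}
```

Because each segment only writes its own fields, common operations on different segments never contend on a shared freeCapacity cache line.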
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275047#comment-14275047 ]

Robert Stupp commented on CASSANDRA-7438:
-

Thanks for the review. Many useful hints in it :)

I'll reduce the configuration stuff in the C* integration as suggested and add some default-by-system-property mechanism (as suggested).

Making freeCapacity a per-segment field: Then I'd prefer to reverse the stuff - i.e. add an allocatedBytes field to each segment. Operations would get the (calculated) free capacity as a parameter and act on that value. Or were you thinking about dividing capacity by the number of segments and using that as the max capacity for each segment?

Regarding rehash/iterators: Could be simply worked around by counting the number of active iterators and just not rehashing while an iterator is active. That's better than e.g. returning duplicate keys or not returning keys at all - i.e. for people relying on that functionality.

I just started JMH without any additional parameters. It's called during the Maven test phase (unless you specify -DskipTests).

You're right. Murmur3 + UTF8 need more tests.

Didn't notice that fastutil is that fat. Already replaced it with an own implementation.

I lean towards removing the new tables implementation in OHC. It has the big drawback that it only allows a specific number of entries per bucket (e.g. 8). But I'd like to defer that decision until after some tests on a NUMA machine.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274005#comment-14274005 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

If you go all the way down the JMH rabbit hole you don't need to do any of your own timing; JMH will actually do some smart things to give you accurate timing and ameliorate the impact of non-scalable/expensive timing measurement. Metrics uses System.nanoTime() internally, so it isn't really any better as far as I can tell. System.nanoTime() on Linux is pretty scalable http://shipilev.net/blog/2014/nanotrusting-nanotime/. When I tested it in JMH it actually seemed to be linearly scalable, but JMH will solve that for you even on platforms where nanoTime is finicky.

The C* integration looks good. I'm glad it was easy. When it comes to exposing configuration parameters, less is more. The stress tool, when used without workload profiles, does some validation. It checks that values are there and that the contents are correct.

I did not know about the JNA synchronized block. That is surprising, but I am glad to hear it is getting fixed. For access to jemalloc I recommend using Unsafe and LD_PRELOADing jemalloc. I think that would be the recommended approach and the one you should benchmark against, with JNA there as a fallback. That gives you a JNI call for allocation/deallocation.

I am trying out the JMH benchmark and looking at the new linked implementation right now. How are you starting the JMH benchmark?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274409#comment-14274409 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I did another review. The additional test coverage looks great.

Don't throw Error; throw runtime exceptions on things like serialization issues. The only place it makes sense to throw Error is when allocating memory fails. That would match the behavior of ByteBuffer.allocateDirect. I don't see failure to allocate from the heap allocator as recoverable, even in the context of the cache. IOError is thrown from one place in the entire JDK (Console), so it's an odd choice.

freeCapacity should really be a field inside each segment, and full/not-full and eviction decisions should be made inside each segment independently. In practice inside C* it's probably fine as just an AtomicLong, but I want to see OHC be all it can be.

The rehash test could validate the data after the rehash. It could also validate the rehash under concurrent access, say with a reader thread that is randomly accessing already inserted values. I can't tell if the crosscheck test inserts enough values to trigger rehashing.

Inlining the murmur3 changes makes me a little uncomfortable. It's good to see some test coverage comparing with another implementation, but it's over a small set of data. It seems like the Unsigned stuff necessary to perfectly mimic the native version of murmur3 is missing? Add 2-4 byte code points for the UTF-8 tests.

FastUtil is a 17 megabyte dependency all to get one array list.

The cross checking implementation is really nice.

Looking at the AbstractKeyIterator, I don't see how it can do the right thing when a segment rehashes. It will point to a random spot in the segment after a rehash, right? In practice maybe this doesn't matter, since they should size up promptly and it's just an optimization that we dump this stuff at all. I can understand what the current code does, so I lean towards keeping it.

There are a couple of places (serializeForPut, putInternal, maybe others) where there are two exception handlers that each de-allocate the same piece of memory. The deallocation could go in a finally instead of the exception handlers since it always happens.
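The cleanup suggestion in the last paragraph can be sketched as follows: rather than two exception handlers that each free the same address, put the single free in a finally block, which runs on success and on every failure mode alike. The allocator interface and method names here are illustrative, not the actual serializeForPut/putInternal code:

```java
import java.util.function.Supplier;

// Sketch of the finally-based cleanup: one deallocation path instead of
// duplicated frees in multiple catch blocks. The NativeAllocator interface
// is hypothetical; it stands in for whatever frees off-heap memory.
public class FinallyFree {
    public interface NativeAllocator { void free(long address); }

    // runs op (e.g. serialize + insert, which may throw) against a scratch
    // allocation and guarantees exactly one free, whatever happens
    public static <T> T withScratch(NativeAllocator alloc, long address, Supplier<T> op) {
        try {
            return op.get();
        } finally {
            alloc.free(address);   // runs on success and on every exception
        }
    }
}
```

The win is that adding a new failure mode inside op can never leak the allocation or double-free it.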
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271745#comment-14271745 ]

Robert Stupp commented on CASSANDRA-7438:
-

Note: OHC now has cache-loader support (https://github.com/snazy/ohc/issues/3). Could be an alternative for RowCacheSentinel.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267558#comment-14267558 ]

Robert Stupp commented on CASSANDRA-7438:
-

BTW: Is there any single-node-cluster test that has been used to test the 'old' row cache, or a test that runs against a single-node cluster and verifies the data being written during a long run - i.e. several hours?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265947#comment-14265947 ]

Robert Stupp commented on CASSANDRA-7438:
-

The latest (just checked in) benchmark implementation gives much better results. Using {{com.codahale.metrics.Timer#time(java.util.concurrent.Callable<T>)}} eliminates the use of {{System.nanoTime()}} or {{ThreadMXBean.getCurrentThreadCpuTime()}} - it can directly use its internal clock. The benchmark {{java -jar ohc-benchmark/target/ohc-benchmark-0.2-SNAPSHOT.jar -rkd 'gaussian(1..2000,2)' -wkd 'gaussian(1..2000,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 16 -d 30 -t 30}} improved from 800k reads to 3.3M reads per second (w/ 8 cores). So yes - the benchmark was measuring its own mad code. Due to that, I edited my previous comment with the benchmark results, since those are invalid now.

I've added a (yet simple) JMH benchmark as a separate module. This one can cause high system CPU usage - at operation rates of 2M per second or more (8 cores). I think these rates are really fine. Note: these rates cannot be achieved in production, since then you'll obviously have to pay for (de)serialization, too.

So we want to address these topics as follow-up:
* own off-heap allocator
* C* ability to access off-heap cached rows
* C* ability to serialize hot keys directly from off-heap (might be a minor win since it's not triggered that often)
* per-table knob to control whether to add to the row cache on writes - I strongly believe that this is a useful feature (maybe LHF) on workloads where read and written data work on different (row) keys
* investigate if the counter cache can benefit
* investigate if the key cache can benefit

bq. You could start with it outside and publish to maven central and if there an issue getting patches applied quickly we can always fork it in C*.
OK

bq. pluggable row cache
Then I'll start with that - just make the row cache pluggable and the implementation configurable.

Note: JNA has a synchronized block that's executed at every call - version 4.2.0 fixes this (I don't know when it will be released).
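The {{Timer#time(Callable)}} pattern Robert describes can be sketched without the Metrics dependency: the timer owns its clock, so the benchmarked code never calls {{System.nanoTime()}} itself and the clock can be swapped for testing. This is a generic illustration, not the Metrics implementation:

```java
import java.util.concurrent.Callable;

// Sketch of the Timer#time(Callable) idea: the timer wraps the task, reads
// its own clock before and after, and accumulates the elapsed time, so the
// caller writes no timing code of its own.
public class SimpleTimer {
    public interface Clock { long nanos(); }

    private final Clock clock;
    private long count;
    private long totalNanos;

    public SimpleTimer(Clock clock) { this.clock = clock; }

    public <T> T time(Callable<T> task) throws Exception {
        long start = clock.nanos();
        try {
            return task.call();
        } finally {
            totalNanos += clock.nanos() - start;   // recorded even if task throws
            count++;
        }
    }

    public long count() { return count; }
    public long totalNanos() { return totalNanos; }
}
```

Usage would look like {{timer.time(() -> cache.get(key))}}, with the clock injected once when the timer is built.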
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266809#comment-14266809 ] Robert Stupp commented on CASSANDRA-7438: -

OHC works in Cassandra:
* unit tests pass ({{ant test}}, no difference against trunk)
* get and put verified in the debugger and with a (simple) table
* row cache saving and loading work, too
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263083#comment-14263083 ] Ariel Weisberg commented on CASSANDRA-7438: ---

I went to run the benchmark myself and noticed you used a uniform distribution for the keys. I don't think that makes sense for testing a cache, where the primary benefit is going to come from cacheable access patterns. I would use extreme with .6 or .5 for the shape.

I am also confused by the benchmark implementation. There are threads generating the tasks and then handing them off to other threads for execution. This means the benchmark is measuring unrelated things, like the performance of the queue used for receiving tasks and returning results, as well as the general design of the harness. It makes me wonder if that is the source of the under-utilization issue. I think this might work well as a JMH benchmark, and the parameterization would make it easy to put together a full test matrix that anyone can run with one command.

I tried to run it and it seems to go for longer than expected. I specified -d 300 and it is still going. The benchmark is doing work according to top. I ran on a c3.8xlarge using the RightScale 14.1 base server template running Ubuntu 14.04 and Oracle JDK 8u25; I got jemalloc from the libjemalloc1 package. I cloned OHC today and, after running mvn package, ran the benchmark using
bq. java -jar ohc-benchmark/target/ohc-benchmark-0.2-SNAPSHOT.jar -rkd 'gaussian(1..2000,2)' -wkd 'gaussian(1..2000,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 160 -d 300 -t 30 -dr 8
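The distribution specs like 'gaussian(1..2000,2)' concentrate accesses around the middle of the key range, which is what makes the workload cacheable (a uniform distribution gives every key the same heat). A sketch of what a bounded gaussian key sampler might look like - this is a guess at the spec's semantics, not the actual ohc-benchmark code:

```java
import java.util.Random;

// Hypothetical sampler for a 'gaussian(min..max,stdvrng)' key spec:
// mean at the middle of the range, stddev = half-range / stdvrng,
// samples clamped into [min, max]. Illustrative only - the real
// ohc-benchmark implementation may interpret the spec differently.
class GaussianKeys {
    private final long min, max;
    private final double mean, stddev;
    private final Random rnd = new Random();

    GaussianKeys(long min, long max, double stdvrng) {
        this.min = min;
        this.max = max;
        this.mean = (min + max) / 2.0;
        this.stddev = (max - min) / (2.0 * stdvrng);
    }

    long next() {
        double v = mean + rnd.nextGaussian() * stddev;
        // clamp so every sample is a valid key in the configured range
        return Math.max(min, Math.min(max, Math.round(v)));
    }
}
```

With stdvrng = 2, roughly 95% of accesses fall inside the range without clamping, and keys near the mean are hit far more often - exactly the skew a cache benchmark wants.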
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262472#comment-14262472 ] Ariel Weisberg commented on CASSANDRA-7438: ---

I have an in-progress response to your earlier comment. I'll address the benchmark here.

I wouldn't sweat allocator performance. Ultimately we will have to have our own, if only to accurately enforce memory utilization (user asks for 200 megabytes, we use 400: not cool). I think the blueprint for how to do this already exists in something like memcached in terms of how to allocate and defragment. We just need to adapt it for our approach, where it is a pool of independently locked hash tables.

The overhead of copying is where zero-deserialization and ref-counting start to be a win, since you don't have to copy at all. I wouldn't get worked up about optimizing for that yet, since it requires upstream to be smarter about how it uses the cache. If upstream can parse the cache value and extract a subset without copying the entire thing, it will handle larger values more gracefully. At some point upstream might also hold partial rows.

I would like to see the ability to spin all cores against the cache, at least for relatively small values. Not being able to do that is a little concerning. Are threads blocking inside the allocator? Do the utilization issues occur with large or small values?

I don't have a real baseline to say whether these numbers are good or bad. They sound okay, and as you say you would expect the allocator to be one of the slowest parts. I am not sure testing with 500 threads is realistic, since threads have a pretty good chance of being descheduled while holding a lock, and that isn't as likely to happen under real usage conditions. I would test with, say, 30 threads on that hardware. For, say, 16k values, measuring scaling from 1-30 threads would give us an idea of how well things are going.
That would also give you better feedback on whether different numbers of stripes help or not.
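The accounting Ariel describes (charging the user-visible budget for what is actually consumed, not just payload bytes) can be sketched with an atomic byte counter. The per-entry overhead constant here is made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of capacity accounting that charges each entry for its real
// footprint (payload + per-entry overhead) so the enforced budget matches
// what the allocator actually consumes. PER_ENTRY_OVERHEAD is a
// hypothetical figure for entry header, LRU links, and alignment.
class CapacityAccount {
    static final long PER_ENTRY_OVERHEAD = 64;

    private final long capacityBytes;
    private final AtomicLong used = new AtomicLong();

    CapacityAccount(long capacityBytes) { this.capacityBytes = capacityBytes; }

    /** Try to reserve room for an entry; caller must evict and retry on false. */
    boolean reserve(long payloadBytes) {
        long charge = payloadBytes + PER_ENTRY_OVERHEAD;
        while (true) {
            long cur = used.get();
            if (cur + charge > capacityBytes) return false; // over budget - evict first
            if (used.compareAndSet(cur, cur + charge)) return true;
        }
    }

    void release(long payloadBytes) {
        used.addAndGet(-(payloadBytes + PER_ENTRY_OVERHEAD));
    }

    long used() { return used.get(); }
}
```

The key property is that the rejection test runs against the charged footprint, so "200 megabytes configured" can never quietly become 400 megabytes resident.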
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262487#comment-14262487 ] Ariel Weisberg commented on CASSANDRA-7438: ---

bq. Whether to migrate whole OHC code into org.apache.cassandra codebase (with the option to either turn it on or off).
I am open to either. I asked Benedict, and he prefers having it inside C* so we can patch it. The advantage of having it outside is that it might see use elsewhere and get additional eyes/contributions. You could start with it outside and publish to Maven Central, and if there's an issue getting patches applied quickly we can always fork it in C*.

bq. Whether to implement a "pluggable row cache" (to allow multiple implementations)
I think that we aren't going to need multiple cache implementations in the long run. It seems like we should be able to have one that can be configured to have the desired behavior. Benedict doesn't feel strongly about it either. If Vijay wants to continue working on another implementation, then we would want to keep it pluggable the way it currently is. It looks like the KeyCache and CounterCache both use a different implementation and not SerializingCache. I am not clear on why they don't use SerializingCache; it's worth evaluating why that is before converging on a single implementation.

bq. New per-table knob to enable whether to populate entries to the row cache on reads+writes or just on reads (to target different workloads)
Sounds like it would be useful, but first we have to come up with someone somewhere that says "I want this", or a workload where this is the right call. There may also be correctness issues to think about - see the next item.

bq. Rethink about whether to keep the current RowCacheSentinel implementation as is - if I understand it correctly, it just reduces the number of cache-put operations (cache hit on a sentinel performs a disk read). A compromise regarding additional serialization cost?
I think it is for correctness? https://issues.apache.org/jira/browse/CASSANDRA-3862 I'm still reading up on this.

bq. Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
There is some trickiness here because the AutoSavingCache breaks apart the keys to determine where the data goes.

bq. Optimizations of value deserialization effort - let C* directly access a cached row in off-heap memory instead of the deserialization (and on-heap object construction) overhead.
I think these two together would make a good follow-up ticket. Another good follow-up ticket would be addressing the allocator, for both performance and fragmentation.
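The sentinel pattern referenced here (CASSANDRA-3862) prevents a race where a slow read repopulates the cache with a row that a concurrent write has since invalidated. A minimal on-heap sketch of the idea - names and details are illustrative, not C*'s actual RowCacheSentinel:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of the read-path sentinel pattern: a reader installs a unique
// sentinel before going to disk and only publishes its result if that
// sentinel is still in place; the write path invalidates by removing the
// key, so a stale row can never land in the cache. A hit on a foreign
// sentinel just reads through (matching "cache hit on a sentinel
// performs a disk read" from the comment above).
class SentinelCache {
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    Object get(String key, Supplier<Object> readFromDisk) {
        Object cur = cache.get(key);
        if (cur != null && !(cur instanceof Sentinel)) return cur; // real hit
        Sentinel mine = new Sentinel();
        if (cache.putIfAbsent(key, mine) != null) {
            // another reader is loading (or a value raced in): read through
            return readFromDisk.get();
        }
        Object value = readFromDisk.get();
        // publish only if our sentinel survived; an intervening write removed it
        cache.replace(key, mine, value);
        return value;
    }

    /** Write path: drop whatever is cached, including in-flight sentinels. */
    void invalidate(String key) { cache.remove(key); }

    Object peek(String key) {
        Object o = cache.get(key);
        return (o instanceof Sentinel) ? null : o;
    }

    private static final class Sentinel {}
}
```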
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257721#comment-14257721 ] Robert Stupp commented on CASSANDRA-7438: -

I had the opportunity to test OHC on a big machine. First: it works - very happy about that :) Some things I want to note:
* a high number of segments does not have any really measurable influence (the default of 2 * # of cores is fine)
* throughput heavily depends on serialization (hash entry size) - Java 8 gave a 10% to 15% improvement in some tests (either on {{Unsafe.copyMemory}} or something related like the JNI barrier)
* the number of entries per bucket stays pretty low with the default load factor of .75 - the vast majority have 0 or 1 entries, some have 2 or 3, and a few have up to 8

Issue (not solvable yet): It works great for hash entries up to approx. 64kB, with good to great throughput. Above that barrier it works well at first, but after some time the system spends a huge amount of CPU time (~95%) in {{malloc()}} / {{free()}} (with jemalloc; Unsafe.allocate is not worth discussing at all on Linux). I tried to add a "memory buffer cache" that caches freed hash entries for reuse, but it turned out that in the end it would be too complex if done right. The current implementation is still in the code, but must be explicitly enabled with a system property. Workloads with small entries and a high number of threads easily trigger the Linux OOM protection (which kills the process). Please note that it does work with large hash entries - but throughput drops dramatically to just a few thousand writes per second.

Some numbers (value sizes have a gaussian distribution). I had to do these tests in a hurry because I had to give back the machine. The code used during these tests is tagged as {{0.1-SNAP-Bench}} in git. Throughput is limited by {{malloc()}} / {{free()}}, and most tests only used 50% of the available CPU capacity (on _c3.8xlarge_ - 32 cores, Intel Xeon E5-2680v2 @2.8GHz, 64GB).
* 1k..200k value size, 32 threads, 1M keys, 90% read ratio, 32GB: 22k writes/sec, 200k reads/sec, ~8k evictions/sec, write: 8ms (99perc), read: 3ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 90% read ratio, 32GB: 55k writes/sec, 499k reads/sec, ~2k evictions/sec, write: .1ms (99perc), read: .03ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 50% read ratio, 32GB: 195k writes/sec, 195k reads/sec, ~9k evictions/sec, write: .2ms (99perc), read: .1ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 10% read ratio, 32GB: 185k writes/sec, 20k reads/sec, ~7k evictions/sec, write: 4ms (99perc), read: .07ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 90% read ratio, 32GB: 110k writes/sec, 1M reads/sec, 30k evictions/sec, write: .04ms (99perc), read: .01ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 50% read ratio, 32GB: 420k writes/sec, 420k reads/sec, 125k evictions/sec, write: .06ms (99perc), read: .01ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 10% read ratio, 32GB: 435k writes/sec, 48k reads/sec, 130k evictions/sec, write: .06ms (99perc), read: .01ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 90% read ratio, 32GB: 140k writes/sec, 1.25M reads/sec, 50k evictions/sec, write: .02ms (99perc), read: .005ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 50% read ratio, 32GB: 530k writes/sec, 530k reads/sec, 220k evictions/sec, write: .04ms (99perc), read: .005ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 10% read ratio, 32GB: 665k writes/sec, 74k reads/sec, 250k evictions/sec, write: .04ms (99perc), read: .005ms (99perc)

Command line to execute the benchmark:
{code}
java -jar ohc-benchmark/target/ohc-benchmark-0.1-SNAPSHOT.jar -rkd 'uniform(1..2000)' -wkd 'uniform(1..2000)' -vs 'gaussian(1024..4096,2)' -r .1 -cap 320 -d 86400 -t 500 -dr 8

-r = read rate
-d = duration
-t = # of threads
-dr = # of driver threads that feed the worker threads
-rkd = read key distribution
-wkd = write key distribution
-vs = value size
-cap = capacity
{code}
Sample bucket histogram from the 20M test:
{code}
[0..0]: 8118604
[1..1]: 5892298
[2..2]: 2138308
[3..3]: 518089
[4..4]: 94441
[5..5]: 13672
[6..6]: 1599
[7..7]: 189
[8..9]: 16
{code}
After running into that memory management issue with varying allocation sizes of a few kB to several MB, I think it's still worth working on our own off-heap memory management - maybe some block-based approach (fixed or variable). But that's out of the scope of this ticket.
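The histogram is close to what uniform hashing predicts: with n entries spread over m buckets, bucket occupancy is approximately Poisson with lambda = n/m. Summing the histogram gives m = 16,777,216 buckets (exactly 2^24, consistent with a power-of-two table); a quick check of the first few occupancy counts (n is estimated from the histogram itself, so this is illustrative only):

```java
// Check the 20M-test bucket histogram against the Poisson prediction for
// uniform hashing: occupancy ~ Poisson(lambda = entries / buckets).
public class BucketPoisson {
    public static void main(String[] args) {
        long[] histogram = {8118604, 5892298, 2138308, 518089, 94441, 13672, 1599, 189, 16};
        long buckets = 0, entries = 0;
        for (int k = 0; k < histogram.length; k++) {
            buckets += histogram[k];
            entries += (long) k * histogram[k];
        }
        double lambda = (double) entries / buckets;  // roughly 0.73 here
        double p = Math.exp(-lambda);                // Poisson P(0)
        for (int k = 0; k <= 3; k++) {
            System.out.printf("k=%d expected=%.0f observed=%d%n",
                    k, p * buckets, histogram[k]);
            p = p * lambda / (k + 1);                // P(k+1) from P(k)
        }
    }
}
```

The predicted counts for 0, 1, and 2 entries per bucket land within about 1% of the observed values, which suggests the murmur3 hash is distributing keys essentially uniformly across buckets.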
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251537#comment-14251537 ] Robert Stupp commented on CASSANDRA-7438: -

I've nearly finished the OHC implementation. Unit tests cover all functionality required by C*, and a separate test-only implementation is now used to verify the implementation (entry (de)serialization is not extensively covered by the tests yet). The OHC interface has been changed towards the functionality required by C*. Maven executes the unit tests both with and without jemalloc (only if jemalloc is installed, of course).

[~aweisberg], [~benedict], can you have a look at the current OHC code? I'd like to know how it could/should be integrated in C*. IMO there are two decisions to be made:
* Whether to migrate the whole OHC code into the org.apache.cassandra codebase (with the option to either turn it on or off).
* Whether to implement a "pluggable row cache" (to allow multiple implementations)

I've got some ideas regarding the row cache which are out of scope of this ticket:
* New per-table knob to control whether to populate the row cache on reads+writes or just on reads (to target different workloads)
* Rethink whether to keep the current {{RowCacheSentinel}} implementation as is - if I understand it correctly, it just reduces the number of cache-put operations (a cache hit on a sentinel performs a disk read). A compromise regarding additional serialization cost?
* Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
* Optimization of value deserialization effort - let C* directly access a cached row in off-heap memory instead of paying the deserialization (and on-heap object construction) overhead.

Note: although the jemalloc allocator provides a {{getTotalAllocated()}} method, the result is not correct and I don't know why. The result depends on jemalloc configure settings ({{--en/disable-tcache}}).
According to the man page the result should be correct (the sum of {{stats.allocated}} and {{stats.huge.allocated}}), but it isn't (verified with a coded memory leak of small allocations that didn't increase the value). Iterating over the jemalloc _arenas_ and _bins_ does not help, since the two mentioned values are aggregations of these.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251756#comment-14251756 ] Robert Stupp commented on CASSANDRA-7438: - Ah - a burn test is still missing. I'll add some code that is able to verify the cache contents, key iterators, and such stuff.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239729#comment-14239729 ] Ariel Weisberg commented on CASSANDRA-7438: ---

Lots of cool stuff here, Robert.

Unit-test wise, there is a lot of code that is only covered indirectly or not at all, and the behaviors are not checked for explicitly. I don't think it makes sense to include code that doesn't have a unit test claiming it does what it says it does. The various input/output streams, buffering, and compression all need unit tests. Uns.java needs a unit test for pretty much every method, as well as for the validation functionality. HashEntry has a bunch of uncovered functions. For me the lack of test coverage is the biggest barrier. What I am reacting to is that the tests are black box and miss things. OffHeapMap containsEntry has no tests. removeEntry has untested code. removeLink still has untested code. There is untested histogram stuff, deserializeEntry, serializeEntry. HashEntry classes have untested functions. HashEntries has many predicates that are untested.

Having a unit test that fuzzes against a parallel implementation at the same time, using a different LRU map implementation, would be great for a black-box test. You can stripe the other implementation the same way so that the eviction matches.

One of my previous comments was that SegmentCacheImpl duplicates reference-counting code from OffHeapMap and should just delegate. It ends up doing that anyways.

I would really like to see the cleanup/eviction code go away. If inserting an entry would blow capacity, remove entries until it doesn't. I don't see a reason to monkey with thresholds.

At some point the existing C* cache interface needs to gel with your work. Right now C* uses the hotN and getKeys interface to return the contents of the cache for persistence.
I think the path of least resistance to start would be to implement the existing interface and then come back and look at how to get compression and more efficient IO into all the implementations. The existing stuff in C* doesn't do compression and doesn't buffer its IO. I would prefer to minimize major changes to the existing C* code: I want to get it working and then iterate further on other improvements like more efficient cache serialization. You could change the OHC interface or implement an adapter. I think it's fine to modify ICache to return iterables or iterators instead of collections, to incrementally produce the key set and hot keys. For everything else I would really like to see things stay the same unless there is something to be gained by changing the interface.
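The "remove entries until it doesn't blow capacity" suggestion can be sketched with an access-ordered map and a byte budget - an on-heap illustration of the policy, not OHC's off-heap code (a value larger than the whole capacity would still be inserted over budget in this simplified sketch):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "evict until it fits": no cleanup thresholds - an insert that
// would exceed capacity removes least-recently-used entries until there
// is room for the new value.
class EvictUntilFits {
    private final long capacityBytes;
    private long usedBytes;
    private final LinkedHashMap<String, byte[]> map =
            new LinkedHashMap<>(16, 0.75f, true); // accessOrder=true: LRU first

    EvictUntilFits(long capacityBytes) { this.capacityBytes = capacityBytes; }

    void put(String key, byte[] value) {
        byte[] old = map.remove(key);
        if (old != null) usedBytes -= old.length;
        // walk from the LRU end, evicting until the new value fits
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes + value.length > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
        map.put(key, value);
        usedBytes += value.length;
    }

    byte[] get(String key) { return map.get(key); }
    int size() { return map.size(); }
    long usedBytes() { return usedBytes; }
}
```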
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240193#comment-14240193 ] Robert Stupp commented on CASSANDRA-7438: -

bq. Lots of cool stuff
Thx :)

Unit testing: you are absolutely right. (Will go on with that next.)

bq. unit test that fuzzes against a parallel implementation at the same time using a different LRU map implementation
Do you mean something like LinkedHashMap with removeEldestEntry()? It's some effort to get a nice implementation for unit tests - but yeah, it makes sense.

bq. duplicates reference counting code
Removed the duplicated code.

bq. cleanup/eviction code go away ... remove entries until it [fits]
Much easier, cleaner code; implemented - but I'm not completely sold on the new implementation yet (still a quick hack).

bq. C* cache interface ... get compression and more efficient IO [later]
That's fair. I just saw a few minutes ago that row-cache serialization only persists the keys and not the values - so the existing implementation in OHC would need to be changed/extended. I thought it persisted the values, too.
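The reference model Robert mentions - LinkedHashMap with removeEldestEntry() - is a few lines of JDK code. A fuzz test would apply identical random put/get/remove sequences to this model and to OHC (striped the same way so eviction order matches) and compare contents after each step:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A count-bounded LRU reference model: an access-ordered LinkedHashMap
// whose removeEldestEntry() evicts once maxEntries is exceeded. Intended
// as the "parallel implementation" in a fuzz test, not as a cache itself.
class ReferenceLru<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    ReferenceLru(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the LRU entry past capacity
    }
}
```

Because gets count as accesses in access order, the model's eviction victim is always the true least-recently-used entry, which is exactly the property the fuzz test wants to check against the off-heap implementation.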
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240235#comment-14240235 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. That's fair. I just saw some minutes ago that row-cache serialization only persists the keys and not the values - so the existing implementation in OHC would need to be changed / extended / whatever. I thought it persists the value, too.
I was also confused by that. Persisting the values would break cache invalidation in a way that is hard to correct without integrating with the commit log.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240305#comment-14240305 ] Robert Stupp commented on CASSANDRA-7438: - Yep - persisting the values would cause inconsistencies - either on its own or through users deleting saved caches.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237162#comment-14237162 ] Robert Stupp commented on CASSANDRA-7438: - Also pushed persistence of cache content using Snappy compression.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237170#comment-14237170 ] Robert Stupp commented on CASSANDRA-7438:
Rehashing: hm - at {{o.a.c.db.ColumnFamilyStore#getThroughCache}} (better: {{RowCacheKey}}) we only have the token/key but no (good) hash for the key. The saving from using a 32-bit hash is about 8 bytes per cache entry (the reference-counter field can then be reduced from 64 bit to 32 bit while still keeping the 8-byte boundaries for key and value data). But this seems to have no measurable effect if e.g. jemalloc aligns allocated memory blocks on bigger page sizes depending on the whole cache entry size (e.g. several kB or MB).
OHC always calculates its own murmur3 hash over the serialized cache key. I _hope_ to achieve a better distribution across segments and buckets by using 64 bits - but I'm not sure about this. My preference for 64 hash bits is basically that it feels better.
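The split described in this thread (most significant hash bits select the segment, least significant bits the in-segment bucket) can be sketched as follows. The class and method names are illustrative, not OHC's actual API, and both counts are assumed to be powers of two:

```java
// Hypothetical sketch of the segment/bucket hash split discussed above.
// Names are illustrative; OHC's real code differs.
public final class HashSplit {
    // segmentCount must be a power of two (OHC caps segments and buckets at 2^30)
    public static int segmentIndex(long hash, int segmentCount) {
        if (segmentCount == 1) return 0;      // single segment: nothing to select
        int bits = Integer.numberOfTrailingZeros(segmentCount);
        // most significant bits pick the segment
        return (int) ((hash >>> (64 - bits)) & (segmentCount - 1));
    }

    // bucketCount must be a power of two
    public static int bucketIndex(long hash, int bucketCount) {
        // least significant bits pick the bucket inside the segment
        return (int) (hash & (bucketCount - 1));
    }
}
```

Using disjoint ends of the 64-bit hash is what makes both indices independent; with a 32-bit hash the two masks would start to overlap as segment and bucket counts grow.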
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234696#comment-14234696 ] Robert Stupp commented on CASSANDRA-7438:
Just pushed some OHC additions to github:
* key-iterator (used by the CacheService class to invalidate column families)
* (de)serialization of cache content to disk using direct I/O from off-heap. This means the row cache content does not need to go through the heap for serialization and deserialization. Compression should also be possible off-heap using the static methods in the Snappy class, since these expect direct buffers - so there's nearly no heap pressure for that.
Background: the implementation basically places the address and length of the hash entry into the DirectByteBuffer class, so FileChannel is able to read into it/write from it.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232833#comment-14232833 ] Benedict commented on CASSANDRA-7438:
re: hash bits: there's not really a dramatic benefit to using more than 32 bits. We will always use the upper bits for the segment and the lower bits for the bucket, for which 4B items is plenty. Although we don't have proper entropy for all the bits - we may have only 28 bits of good collision-freeness - we will want to rehash the murmur hash to ensure it is spread evenly, to avoid a grow boundary consistently failing to reduce collisions.
The one advantage of having some spare hash bits is that we can use them to avoid running a potentially expensive comparison on a large key until we have high confidence we've found the correct item - and as the number of hash bits left unused for indexing dwindles, the value of this goes up. But the number of instances where this helps will be vanishingly small, since the head of the key will be on the same cache line, and a simultaneous hash collision and key prefix collision is pretty unlikely. It might be more significant if we were to use open-address hashing, as we would have excellent locality and would reduce the number of expected cache misses for a lookup. But this won't be measurable above the cache serialization costs.
We do already have these hash bits calculated in C*, typically. We are also unlikely to notice the overhead - allocations are likely to have ~16 bytes of overhead, be padded to the nearest 8 or 16 bytes, and a row has a lot of bumpf to encode. I doubt there will be any variation in storage costs from using all 64 bits.
i.e., whatever floats your boat
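The "rehash to spread entropy evenly" step mentioned above is usually done with a cheap bit-mixing finalizer; MurmurHash3's 64-bit finalizer (commonly called fmix64) is the canonical example of such a rehash:

```java
// MurmurHash3's 64-bit finalizer ("fmix64"): a cheap rehash that spreads
// entropy across all 64 bits, the kind of step discussed above.
public final class Mix {
    public static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
}
```

Applying a finalizer like this before masking means a table grow (which exposes one more low bit) actually sees fresh entropy instead of the same collisions.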
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231327#comment-14231327 ] Vijay commented on CASSANDRA-7438:
[~snazy] I was trying to compare OHC and found a few major bugs.
1) You have per-method synchronization on the Map, which doesn't ensure that your get is locked relative to a concurrent put (same with clean, hot(N), remove etc.) - look at the SynchronizedMap source code to do it right, else it will crash soon.
2) Even after I fix that, there is a correctness issue in the hashing algorithm, I think. Get returns a lot of errors, and it looks like there are some memory leaks too.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231627#comment-14231627 ] Ariel Weisberg commented on CASSANDRA-7438:
Robert, I don't seem to be getting the latest code for your work on master? For instance, the key comparison code does 8 bytes at a time and doesn't handle trailing bytes as far as I can tell.
To Vijay's point: a pseudo-random test against the map that does, say, 200 million operations against a keyspace of several million entries, mirrors the operations on a regular hash map, and periodically checks that they have the same contents would be helpful in having some confidence in the map. Size it so the LRU doesn't do anything. Print the seed at the beginning of the test so it can be reproduced. I think this basically duplicates the benchmark, but having it as a unit test is nice. We can tune the number of operations and keys down for running in CI. You could also look at the unit tests for Guava's cache or j.u.HashMap and borrow those. The nice thing about data structure APIs is that the tests already exist.
bq. Yes, basically from JDK. Could not get that via inheritance.
What are the licensing and attribution requirements for that code?
bq. IMO hash code should be 64 bits because 32 bits might not be sufficient.
[~benedict] might have some opinions on how to get the best bits out of MurmurHash3. 32 bits is 256-512 gigabytes of cache at 128 bytes per entry, which is not bad. I don't feel strongly either way since I don't know whether callers will have the hash precomputed.
bq. Nope - would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments and max# of buckets). Worth taking a look at it - it's weird, yes.
In OffHeapMap, line 222 seems to have a gate preventing rehashing to 2^24 buckets.
bq. (Hope I caught all of your comments)
I'll check them once you update.
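The pseudo-random mirror test suggested above could look roughly like this. `OffHeapish` is a hypothetical stand-in for the real cache API, and the operation mix and counts are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the randomized mirror test suggested above: drive random
// put/get/remove operations against the cache under test and a
// java.util.HashMap in lockstep, and check they agree.
// "OffHeapish" is a hypothetical minimal cache surface, not OHC's API.
public final class MirrorTest {
    interface OffHeapish {
        void put(long key, long value);
        Long get(long key);
        void remove(long key);
    }

    static void run(OffHeapish cache, long seed, int ops, int keySpace) {
        System.out.println("seed=" + seed); // print the seed so failures reproduce
        Random rnd = new Random(seed);
        Map<Long, Long> mirror = new HashMap<>();
        for (int i = 0; i < ops; i++) {
            long key = rnd.nextInt(keySpace);
            switch (rnd.nextInt(3)) {
                case 0:
                    long v = rnd.nextLong();
                    cache.put(key, v);
                    mirror.put(key, v);
                    break;
                case 1:
                    cache.remove(key);
                    mirror.remove(key);
                    break;
                default: // compare the two views on every read
                    Long expected = mirror.get(key);
                    Long actual = cache.get(key);
                    if (expected == null ? actual != null : !expected.equals(actual))
                        throw new AssertionError("mismatch at op " + i + " key " + key);
            }
        }
    }
}
```

As the comment notes, the cache must be sized so the LRU never evicts during the run, otherwise the mirror map and the cache legitimately diverge.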
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231817#comment-14231817 ] Robert Stupp commented on CASSANDRA-7438:
[~vijay2...@yahoo.com] can you explain what kind of bugs?
bq. licensing and attribution requirements
It's already in the C* code base in exactly the same way.
Also pushed some changes:
* increased max# of segments and buckets to 2^30 (means approx 1B segments times 1B buckets)
* added a prototype for direct I/O for row cache serialization (zero copy) - just as a demo (just coded, not tested yet)
* uses Unsafe for value (de)serialization
* moved (most) statistic counters to OffHeapMap to reduce contention caused by volatile (really makes sense)
* removed use of the Guava cache API
* corrected and improved key comparison
Regarding the 64-bit hash: it's 64 bit since OHC takes the most significant bits for the segment and the least significant bits for the hash inside a segment. Both are limited to 30 bits = 60 bits.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231878#comment-14231878 ] Vijay commented on CASSANDRA-7438:
Never mind, my bad - it was related to the below (which needs to be more configurable instead), and the items were going missing earlier than I thought they should. It looks like you just evict items per segment (if a segment is used more, more items will disappear from that segment, while the least used segment's items remain).
{code}
// 12.5% if capacity less than 8GB
// 10% if capacity less than 16 GB
// 5% if capacity is higher than 16GB
{code}
Also noticed you don't have replace(), which Cassandra uses. Anyway, I am going to stop working on this for now; let me know if someone wants any other info.
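The capacity-dependent thresholds quoted above can be expressed as a small function. The name `triggerFraction` and the byte-based signature are illustrative, not OHC's actual code:

```java
// Sketch of the capacity-dependent cleanup trigger quoted in the comment
// above; thresholds mirror the quoted comment, names are illustrative.
public final class CleanupTrigger {
    static final long GB = 1L << 30;

    // fraction of total capacity kept free before eviction kicks in
    public static double triggerFraction(long capacityBytes) {
        if (capacityBytes < 8 * GB)  return 0.125; // 12.5% below 8 GB
        if (capacityBytes < 16 * GB) return 0.10;  // 10% below 16 GB
        return 0.05;                               // 5% at 16 GB and above
    }
}
```

Vijay's point is that these steps are hard-coded; exposing the fraction (or the absolute free-space target) as a configuration option would let operators trade eviction frequency against usable capacity.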
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230675#comment-14230675 ] Ariel Weisberg commented on CASSANDRA-7438:
Looks pretty nice. Suggestions:
* Push the stats into the segments and gather them the way you do free capacity and cleanup count. You can drop the volatile (technically you will have to synchronize on read). Inside each OffHeapMap, put the stats members (and anything mutable) as the first declared fields. In practice this can put them on the same cache line as the lock field in the object header. It will also be just one flush at the end of the critical section. Stats collection should be free, so there's no reason not to leave it on all the time.
* I am not sure batch cleanup makes sense. When inserting an item into the cache would blow the size requirement, I would just evict elements until inserting it wouldn't. Is there a specific efficiency you think you are going to get from doing it in batches?
* Cache is the wrong API to use since it doesn't allow lazy deserialization and zero copy. Since entries are refcounted there is no need to make a copy. Might be something to save for later since everything upstream expects a POJO of some sort.
* Key buffer might be worth a thread local sized to a high watermark.
Do we have a decent way to do line-level code review? I can't leave comments on github unless there is a pull request.
Line level stuff:
* Don't catch exceptions and handle them inside the map. Let them all propagate to the caller and use try/finally to do cleanup. I know you have to wrap and rethrow some things, but avoid it where possible.
* Key comparison compares 8 bytes at a time - how does it handle trailing bytes and alignment?
* Agrona has an Unsafe ByteBuffer implementation that looks like it makes a little better use of various intrinsics than AbstractDataOutput. Does some other nifty stuff as well. https://github.com/real-logic/Agrona/blob/master/src/main/java/uk/co/real_logic/agrona/concurrent/UnsafeBuffer.java
* In OffHeapMap.touch, lines 439 and 453 are not covered by tests. Coverage looks a little weird in that a lot of the cases are always hit but some don't touch both branches. If lruTail == hashEntryAddr, maybe assert next is null.
* Rename the mutating OffHeapMap lruNext and lruPrev to reflect that they mutate. In general, rename mutating methods to reflect that they do that, such as the two versions of first.
* I don't see why the cache can't use CPU endianness since the key/value are just copied.
* Did you get the UTF encoded string stuff from somewhere? I see something similar in the JDK - can you get that via inheritance?
* HashEntryInput and AbstractDataOutput are low on the coverage scale and have no tests for some pretty gnarly UTF-8 stuff.
* Continuing on that theme, there is a lot of unused code to satisfy the interfaces being implemented; it would be nice to avoid that.
* By hashing the key yourself you prevent caching the hash code in the POJO. Maybe hashes should be 32 bits and provided by the POJO?
* If an allocation fails, maybe throw OutOfMemoryError with a message.
* If an entry is too large, maybe return an error of some sort? Seems like the caller should decide whether not caching is OK.
* put on allocation failure calls removeInternal, but the key doesn't appear to be in the map yet? Is that to handle the put invalidating the previous entry?
* In put, why catch VirtualMachineError and not Error? Seems like it wants a finally, and it shouldn't throw checked exceptions.
* If a key serializer is necessary, throw in the constructor and remove the other checks.
* Hot N could use a more thorough test?
* In practice, how is hot N used in C*? When people save the cache to disk, do they save the entire cache?
* In the value loading case, I think there is some subtlety to the concurrency of invocations of the loader, in that it doesn't call it on all of them in a race. It might be a minor change in behavior compared to Guava.
* Maybe do the value loading timing in nanoseconds? Performance is the same but precision is better.
* OffHeapMap.Table.removeLink(long,long) has no test coverage of the second branch that walks a bucket to find the previous entry.
* I don't think storage for 16 million keys is enough? At 128 bytes per entry that is only 2 gigabytes. You would have to run a lot of segments, which is probably fine, but that presents a configuration issue. Maybe allow more than 24 bits of buckets in each segment?
* SegmentedCacheImpl contains duplicate code for dereferencing and still has to delegate part of the work to OffHeapMap. Maybe keep it all in OffHeapMap?
* Unit-test wise, there are some things not tested: the value loader interface, various things like putAll or invalidateAll.
* Release is not synchronized. Release should null pointers out so you get a good clean segfault. Close should maybe lock and close one segment
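The "8 bytes at a time with trailing bytes" comparison raised in this review can be illustrated on plain byte arrays; OHC itself compares off-heap memory, so this is a sketch of the technique, not its code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Word-wise equality check as discussed in the review: compare 8 bytes
// at a time, then fall back to per-byte comparison for trailing bytes.
// Shown on byte[] for testability; OHC does this against off-heap memory.
public final class KeyCompare {
    public static boolean equal(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.nativeOrder());
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.nativeOrder());
        int i = 0;
        int words = a.length & ~7;         // largest multiple of 8 <= length
        for (; i < words; i += 8)
            if (ba.getLong(i) != bb.getLong(i)) return false;
        for (; i < a.length; i++)          // trailing 0..7 bytes, one by one
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

Alignment is the other half of Ariel's question: the word-wise loads are only safe (or fast) off-heap when entries start on 8-byte boundaries, which is why OHC aligns key and value data that way.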
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231132#comment-14231132 ] Robert Stupp commented on CASSANDRA-7438:
[~aweisberg], thanks for the review :)
Some of the changes you suggested are already in. Much of the code has been moved to OffHeapMap. Batch cleanup has been completely removed - it's now handled inside OffHeapMap. It makes runtime and code much nicer.
I've deferred decent unit tests until later - the stuff changed too often. And there would be a big change when merging the stuff into the C* code base, removing all that unused code and the Cache interface implementation. All of the duplicated stuff has been removed - we don't need it; even for a general-purpose cache it would not have been useful.
bq. Key buffer might be worth a thread local sized to a high watermark
Hm - do you mean sth like {{static final ThreadLocalKeyBuffer perThreadBuffer;}} inside SegmentedCacheImpl?
Regarding the line-level code review (it's fine the way you did it IMO):
bq. Don't catch exceptions
Done.
bq. 8 bytes at a time, how does it handle trailing bytes and alignment?
Trailing bytes: falls back to per-byte comparison. Alignment: key and value are aligned on 8-byte boundaries.
bq. Agrona has an Unsafe ByteBuffer implementation that looks like it makes a little better use of various intrinsics than AbstractDataOutput.
Good hint! Will definitely take a look at it!
bq. I don't see why the cache can't use CPU endianness since the key/value are just copied.
Ah - you mean that stuff in HashEntryInput/Output. No - you can't always copy it using the Unsafe API. I don't recall exactly why I removed that optimization (had it implemented before), but it had sth to do with data serialized for KeyBuffer and putting it into off-heap. But it makes sense for values (since these are always directly serialized to off-heap).
bq. UTF encoded string stuff ... get that via inheritance?
Yes, basically from the JDK. Could not get that via inheritance.
bq. hashing the key yourself ... 32-bits
Thought about it (and had that previously). Yes - if we have a good hash code, we can use it. But I don't know whether the calling code has a hash code. IMO the hash code should be 64 bits because 32 bits might not be sufficient.
bq. allocation fails maybe throw OutOfMemoryError
That would shut down the C* daemon ;) Maybe. Not sure about that. I think if you run into such a situation (out of off-heap/system memory) you are completely lost. It just ignores that put() and removes the old entry.
bq. entry is too large maybe return an error of some sort
No. The calling code cannot do anything meaningful with it. But the calling code could check for that in advance (before constructing any object related to caching), if it has enough information.
bq. catch VirtualMachineError and not Error
Done.
bq. hotN()
I _think_ it is used to persist the hot set of the cache.
bq. concerned about materializing the full list on heap
Agree. Thought about patching cache off-heap addresses into DirectByteBuffer and using that for serialization.
bq. I don't think storage for 16 million keys is enough?
Nope - it would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments and max# of buckets). Worth taking a look at - it's weird, yes.
bq. value loading case
Don't think we need that API.
bq. Release is not synchronized.
Yep - will do that.
(Hope I caught all of your comments)
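The thread-local key buffer "sized to a high watermark" discussed above could be sketched like this; the class name and power-of-two growth policy are assumptions for illustration, not OHC's implementation:

```java
// Sketch of a per-thread reusable key buffer that grows to a high
// watermark, as suggested in the review. Names and growth policy are
// illustrative, not OHC's actual code.
public final class KeyBufferPool {
    private static final ThreadLocal<byte[]> BUFFER =
        ThreadLocal.withInitial(() -> new byte[64]);

    // Returns this thread's buffer, at least `size` bytes long; grows to
    // the next power of two and remembers it (the high watermark).
    public static byte[] acquire(int size) {
        byte[] buf = BUFFER.get();
        if (buf.length < size) {
            buf = new byte[Integer.highestOneBit(size - 1) << 1];
            BUFFER.set(buf);
        }
        return buf;
    }
}
```

The point of the suggestion is to avoid a fresh allocation for key serialization on every cache lookup; after warmup, each thread serializes into the same array.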
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229284#comment-14229284 ] Robert Stupp commented on CASSANDRA-7438:
Have pushed the latest changes of OHC to https://github.com/snazy/ohc. It has been nearly completely rewritten.
Architecture (in brief):
* OHC consists of multiple segments (default: 2 x #CPUs). Fewer segments lead to more contention; more segments give no measurable improvement.
* Each segment consists of an off-heap hash map (defaults: table-size=8192, load-factor=.75). (The hash table requires 8 bytes per bucket.)
* Hash entries in a bucket are organized in a double-linked list.
* The LRU replacement policy is built in via its own double-linked list.
* Critical sections that mutually lock a segment are pretty short (code + CPU) - just a 'synchronized' keyword, no StampedLock/ReentrantLock.
* Capacity for the cache is configured globally and managed locally in each segment.
* Eviction (or replacement or cleanup) is triggered when free capacity goes below a trigger value, and cleans up to a target free capacity.
* Uses a murmur hash on the serialized key. The most significant bits are used to find the segment, the least significant bits for the segment's hash map.
Non-production relevant stuff:
* Allows starting off-heap access in debug mode, which checks for accesses outside of the allocated region and produces exceptions instead of SIGSEGV or jemalloc errors.
* ohc-benchmark updated to reflect the changes.
About the replacement policy: currently LRU is built in - but I'm not really sold on LRU as is. Alternatives could be:
* timestamp (not sold on this either - basically the same as LRU)
* LIRS (https://en.wikipedia.org/wiki/LIRS_caching_algorithm), big overhead (space)
* 2Q (counts accesses, divides the counter regularly)
* LRU+random (50/50) (may give the same result as LIRS, but without LIRS' overhead)
But replacing LRU with something else is out of scope for this ticket and should be done with real workloads in C* - although the last one is just an additional config parameter.
IMO we should add a per-table option that configures whether the row cache receives data on reads+writes or just on reads. Might prevent garbage in the cache caused by write-heavy tables.
{{Unsafe.allocateMemory()}} gives about a 5-10% performance improvement compared to jemalloc. The reason for it might be the JNA library (which has some synchronized blocks in it).
IMO OHC is ready to be merged into the C* code base.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228693#comment-14228693 ] Benedict commented on CASSANDRA-7438:
Invert those two statements and the behaviour is still broken:
B: 154: map.get()
A: 187: map.remove()
A: 191: queue.deleteFromQueue()
B: 158: queue.addToQueue()
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228871#comment-14228871 ] Vijay commented on CASSANDRA-7438:
Should be taken care of too - it should become a duplicate delete on the queue and should work normally (via itemUnlinkQueue). Here is the adjusted test case for it: https://github.com/Vijay2win/lruc/blob/master/src/test/java/com/lruc/unsafe/UnsafeQueueTest.java#L81
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228306#comment-14228306 ] Ariel Weisberg commented on CASSANDRA-7438: --- Pauseless resizing is a worthy design goal, but might not be necessary if you call it a warmup cost. I would break out the performance comparison with and without warming up the cache so we know how it performs when you aren't measuring the resize pauses. Those should only happen at startup when the cache is populated.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228397#comment-14228397 ] Benedict commented on CASSANDRA-7438: - I suspect segmenting the table at a finer granularity, so that each segment is maintained with mutual exclusivity, would achieve better percentiles in both cases due to keeping the maximum resize cost down. We could settle for a separate LRU-q per segment, even, to keep the complexity of this code down significantly - it is unlikely having a global LRU-q is significantly more accurate at predicting reuse than ~128 of them. It would also make it much easier to improve the replacement strategy beyond LRU, which would likely yield a bigger win for performance than any potential loss from reduced concurrency. The critical section for reads could be kept sufficiently small that competition would be very unlikely with the current state of C*, by performing the deserialization outside of it. There's a good chance this would yield a net positive performance impact, by reducing the cost per access without increasing the cost due to contention measurably (because contention would be infrequent).
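The striped design Benedict describes can be sketched roughly like this (an on-heap toy that uses an access-ordered {{LinkedHashMap}} as the per-segment LRU queue; the real cache would keep entries off-heap, and all names here are mine, not lruc/OHC code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative on-heap sketch of the per-segment idea: ~128 segments,
// each owning its own lock and its own LRU queue.
final class StripedLruCache<K, V>
{
    private static final int SEGMENTS = 128;          // ~128 LRU-qs, as suggested
    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedLruCache(int capacityPerSegment)
    {
        segments = new Segment[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++)
            segments[i] = new Segment<K, V>(capacityPerSegment);
    }

    private Segment<K, V> segmentFor(K key)
    {
        int h = key.hashCode();
        h ^= (h >>> 16);                              // spread high bits before picking a segment
        return segments[h & (SEGMENTS - 1)];
    }

    V get(K key)             { return segmentFor(key).get(key); }
    void put(K key, V value) { segmentFor(key).put(key, value); }

    // One segment: coarse lock plus an access-ordered map doubling as the LRU queue.
    private static final class Segment<K, V>
    {
        private final ReentrantLock lock = new ReentrantLock();
        private final LinkedHashMap<K, V> map;

        Segment(final int capacity)
        {
            map = new LinkedHashMap<K, V>(16, 0.75f, true)
            {
                @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
                {
                    return size() > capacity;         // evict this segment's LRU entry
                }
            };
        }

        V get(K key)
        {
            lock.lock();
            try { return map.get(key); } finally { lock.unlock(); }
        }

        void put(K key, V value)
        {
            lock.lock();
            try { map.put(key, value); } finally { lock.unlock(); }
        }
    }
}
```

Each segment resizes and evicts independently, so a rehash stalls at most 1/128th of the keyspace at a time, and the replacement policy can later be swapped out per segment.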
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228515#comment-14228515 ] Ariel Weisberg commented on CASSANDRA-7438: --- +1 to what Benedict suggests. One minor nit: resize pauses will happen across stripes at almost exactly the same time. I know that with, say, 12 stripes it's very bad. With more than that it might start to spread them out, but I haven't seen that in action. We can iterate on resize pause issues later if necessary. It's a warmup issue which will be a problem for some, but might not cripple the feature.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228523#comment-14228523 ] Vijay commented on CASSANDRA-7438: -- {quote}I would break out the performance comparison with and without warming up the cache so we know how it performs when you aren't measuring the resize pauses.{quote} Yep, and in steady state it is similar to get; I have verified that the latency is due to rehash. Better benchmarks on big machines will be done on Monday. Unfortunately -1 on partitions: it will be a lot more complex and will be hard for users to understand. If we have to expand the partitions, we have to figure out a better consistent hashing algo (Cassandra within Cassandra, maybe). Moreover, we will end up keeping the current code as-is to move the maps and queues off-heap. Sorry, I don't understand the argument about code complexity; if we are talking about code complexity, the unsafe code is 1000 lines including the license headers :) The current contention topic is whether to use CAS for locks, which is showing higher CPU cost, and I agree with Pavel on latencies as shown in the numbers.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228524#comment-14228524 ] Vijay commented on CASSANDRA-7438: -- PS: all the latency spikes are in the 100s of microseconds. It's a day-and-night comparison to the current cache :)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228563#comment-14228563 ] Benedict commented on CASSANDRA-7438: - [~aweisberg]: In my experience segments tend to be imperfectly distributed, so whilst there is bunching of resizes simply because they take so long, with real work going on at the same time they should be a _little_ spread out. Though with murmur3 the distribution may be significantly more uniform than my prior experiments. Either way, they're performed in parallel (without coordination) if they coincide, so it's still an improvement.
[~vijay2...@yahoo.com]: When I talk about complexity, I mean the difficulties of concurrent programming magnified without the normal tools. For instance, there are the following concerns:
* We have a spin-lock - admittedly one that should _generally_ be uncontended, but on a grow or a small map this is certainly not the case, which could result in really problematic behaviour. Pure spin locks should not be used outside of the kernel.
* The queue is maintained by a separate thread that requires signalling if it isn't currently performing work; in a real C* instance, where the cost of linking the queue item is a fraction of the other work done to service a request, this means we are likely to incur a costly unpark() for a majority of operations.
* Reads can interleave with put/replace/remove and abort the removal of an item from the queue, resulting in a memory leak.
* We perform the grow on a separate thread, but prevent all reader _or_ writer threads from making progress by taking the locks for all buckets immediately.
* Freeing of oldSegments is still dangerous; it's just probabilistically less likely to happen.
* During a grow, we can lose puts because we unlock the old segments, so with the right (again, unlikely) interleaving of events a writer can think the old table is still valid.
* When growing, we only double the size of the backing table; however, since grows happen in the background, the updater can get ahead, meaning we remain behind and multiply the constant-factor overheads, collisions and contention until total size tails off.
These are only the obvious problems that spring to mind from 15m perusing the code; I'm sure there are others. This kind of stuff is really hard, and the approach I'm suggesting is comparatively a doddle to get right, and is likely faster to boot. I'm not sure I understand your concern with segmentation creating complexity with the hashing... I'm proposing the exact method used by CHM. We have an excellent hash algorithm to distribute the data over the segments: murmurhash3. Although we need to be careful to not use the bits that don't have the correct entropy for selecting a segment. It's really no more than a two-tier hash table. The user doesn't need to know anything about this.
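One way to read the entropy caveat, in the style of CHM's two-tier lookup (a hypothetical helper, not lruc or OHC code), is to take the segment index and the in-segment bucket index from disjoint bits of the 64-bit murmurhash3 token, so segment selection never consumes the same bits the segment's own table will mask:

```java
// Hypothetical sketch: segment from the high bits, bucket from the low bits,
// so the two tiers of the hash table use independent parts of the token.
final class SegmentSelector
{
    static final int SEGMENT_BITS = 7;                 // 2^7 = 128 segments

    static int segment(long hash64)
    {
        return (int) (hash64 >>> (64 - SEGMENT_BITS)); // top bits pick the segment
    }

    static int bucket(long hash64, int tableSize)
    {
        // low bits pick the bucket; tableSize must be a power of two
        return (int) (hash64 & (tableSize - 1));
    }
}
```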
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228569#comment-14228569 ] Vijay commented on CASSANDRA-7438: -- {quote}The queue is maintained by a separate thread that requires signalling{quote} The thread is only signalled if it is not performing an operation. I am lost. {quote}resulting in a memory leak{quote} I am 100% sure that this is not true. Can you write a test case for it to make this happen plz? {quote}but prevent all reader or writer threads from making progress by taking the locks for all buckets immediately{quote} I am sure this cannot be done; if you don't write you lose coherence and consistency. {quote}During a grow, we can lose puts because we unlock the old segments{quote} Test case again plz. I don't think this can happen too. I spent a lot of time testing the exact scenario.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228575#comment-14228575 ] Benedict commented on CASSANDRA-7438: - bq. I am 100% sure
Never be 100% sure with concurrency, please :)
bq. test case again plz. I don't think this can happen too. I spend a lot of time testing the exact scenario.
You have too much faith in tests. You are testing under ideal conditions - two of the race conditions I highlighted will only rear their heads infrequently, most likely when the system is under uncharacteristic load causing very choppy scheduling. Analysis of the code is paramount. I will not produce a test case as I do not have time; however, I will give you an interleaving of events that would trigger one of them. Thread A is deleting an item, and is in LRUC.invalidate(); Thread B is looking up the same item, in LRUC.get().
A: 187: map.remove()
B: 154: map.get()
A: 191: queue.deleteFromQueue()
B: 158: queue.addToQueue()
In particular, addToQueue() sets the markAsDeleted flag to false, undoing the prior work of deleteFromQueue.
bq. Thread is only signalled if they are not performing operation. I am lost.
It will generally not be performing an operation, because its work will be faster than any of the producers can produce work in normal C* operation.
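The interleaving above can be replayed deterministically with a toy model (names such as markAsDeleted, deleteFromQueue and addToQueue mirror the comment but are hypothetical stand-ins, not the actual lruc code):

```java
// Toy model of the suspected leak: replay the interleaving A:187, B:154, A:191, B:158
// in sequence and observe that B's addToQueue() undoes A's deleteFromQueue().
final class LeakModel
{
    static final class Node
    {
        volatile boolean markAsDeleted;
    }

    // invalidate() path (line 191 in the comment): mark the node so the
    // queue-maintenance thread will unlink and free it.
    static void deleteFromQueue(Node n) { n.markAsDeleted = true; }

    // get() path (line 158 in the comment): re-link the node for LRU
    // bookkeeping, which clears the deletion mark.
    static void addToQueue(Node n) { n.markAsDeleted = false; }

    // Returns the final state of the mark after the problematic interleaving.
    static boolean replayRace()
    {
        Node node = new Node();
        // A: map.remove() succeeds; node is now unreachable via the map.
        // B: map.get() already obtained a reference to node before the remove.
        deleteFromQueue(node);   // A: marks the node for unlinking/freeing
        addToQueue(node);        // B: clears the mark, undoing A's work
        return node.markAsDeleted;
    }
}
```

After the replay the node is unmarked yet absent from the map, so the queue thread never unlinks or frees it: the leak Benedict describes.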
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228577#comment-14228577 ] Vijay commented on CASSANDRA-7438: -- Maybe you know better than me, but map.remove cannot be followed by a successful map.get because the remove is within a lock on the segment...
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226474#comment-14226474 ] Tupshin Harper commented on CASSANDRA-7438: --- [~xedin] I'm lost in too many layers of snark and indirection (not just yours). Can you elaborate on what strategy you actually find appealing?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226516#comment-14226516 ] Robert Stupp commented on CASSANDRA-7438: - Some short notes about the last changes in OHC:
* changed from block-oriented allocation to Unsafe or JEMalloc (if available)
* added stamped locks in off-heap (quite simple and very efficient)
* triggering cleanup + rehash via cas-side trigger works fine
* extended the benchmark tool to specify different workload characteristics (read/write ratio, key distribution, value length distribution - distribution code taken from cassandra-stress)
* still working on a good (mostly contention free) LRU strategy
One thing I noticed during benchmarking is that (concurrent?) allocations of large areas (several MB) take up to 50/60ms (OSX 10.10, 2.6GHz Core i7 - no swap, of course) - small regions are allocated quite fast (total roundtrip for a put ~0.1ms for the 98th percentile). It might be viable to implement some mixture for memory allocation: Unsafe/JEMalloc for small regions (e.g. 1MB) and pre-allocated blocks for large regions. A configuration value could determine the amount of large region blocks to keep immediately available. Just an idea...
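The mixture Robert floats might look roughly like this sketch (class and method names are made up, and direct ByteBuffers stand in for Unsafe/jemalloc allocations; the recycling policy is my assumption, not OHC's):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the mixed-allocation idea: small regions allocated on demand,
// large regions served from a pre-allocated pool to avoid slow multi-MB mallocs.
final class MixedAllocator
{
    static final int LARGE_THRESHOLD = 1 << 20;       // 1 MB boundary, as floated in the comment
    private final ConcurrentLinkedQueue<ByteBuffer> largePool = new ConcurrentLinkedQueue<>();

    MixedAllocator(int pooledLargeBlocks, int largeBlockSize)
    {
        // The configuration value determines how many large blocks stay immediately available.
        for (int i = 0; i < pooledLargeBlocks; i++)
            largePool.add(ByteBuffer.allocateDirect(largeBlockSize));
    }

    ByteBuffer allocate(int size)
    {
        if (size < LARGE_THRESHOLD)
            return ByteBuffer.allocateDirect(size);   // small: allocate on demand (fast path)
        ByteBuffer pooled = largePool.poll();
        return pooled != null ? pooled : ByteBuffer.allocateDirect(size); // fall back if pool is empty
    }

    void free(ByteBuffer buf)
    {
        if (buf.capacity() >= LARGE_THRESHOLD)
        {
            buf.clear();
            largePool.add(buf);                       // recycle large blocks instead of freeing
        }
        // small direct buffers are left to the GC in this sketch
    }
}
```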
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226531#comment-14226531 ] Ariel Weisberg commented on CASSANDRA-7438: --- When are large regions being allocated? How common is the use case? Large would normally only be for table resizing right? Could the row cache contain very large values with wide rows?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226552#comment-14226552 ] Vijay commented on CASSANDRA-7438: -- {quote}One thing I noticed during benchmarking is that (concurrent?){quote} Yes, use these options, feel free to make it more configurable if you need.
{code}
public static final String TYPE = "c";
public static final String THREADS = "t";
public static final String SIZE = "s";
public static final String ITERATIONS = "i";
public static final String PREFIX_SIZE = "p";
{code}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226795#comment-14226795 ] Ariel Weisberg commented on CASSANDRA-7438: --- Caching entire rows of very large rows seems like a problem workload for a variety of reasons. The overhead of repopulating each cache entry on insertion is not good. Does the storage engine always materialize entire rows into memory for every query? 60 milliseconds is much longer than it takes to copy several megabytes, so it is expensive even with large rows, although the rest of the cost of materializing the row might dominate.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226776#comment-14226776 ] Robert Stupp commented on CASSANDRA-7438: - The row cache can contain very large rows AFAIK. The idea is to pre-allocate some portion of the configured capacity for large blocks - new blocks could be allocated on demand (edge-trigger). OTOH, if it stores that amount of data in a cache, that amount of time (20...60ms) might be irrelevant compared to the time needed for serialization - so maybe it would be wasted effort. Not sure about that. Table resizing may take as long as it takes - I am not really bothered about allocation time for that, because no reads or writes are locked while allocating the new partition (segment) table.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226861#comment-14226861 ] Pavel Yaskevich commented on CASSANDRA-7438: [~tupshin] The original idea of this was to get the thing we know does the job, which is memcached, strip out some of the unnecessary parts, and pack it as a lib we can use over JNI, the same way snappy and others do. But now we are getting into the business of re-inventing things that are pretty hard to get right and properly test, so the argument against having lruc in its original form - that it would be hard to test/maintain - is, in my opinion, no longer valid.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227248#comment-14227248 ] Jonathan Ellis commented on CASSANDRA-7438: ---
bq. The row cache can contain very large rows [partitions] AFAIK
Well, it *can*, but it's almost always a bad idea. Not something we should optimize for. (http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1)
bq. Does the storage engine always materialize entire rows [partitions] into memory for every query?
Only when it's pulling them from the off-heap cache.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227250#comment-14227250 ] Jonathan Ellis commented on CASSANDRA-7438: ---
Looking at the discussion, I wonder if we're overcomplicating things. I think it got a bit lost in the noise when Ariel said earlier,
bq. I also wonder if splitting the cache into several instances, each with a coarse lock per instance, wouldn't result in simpler, fast-enough code. I don't want to advocate doing something different for performance, but rather that there is the possibility of a relatively simple implementation via Unsafe.
Why not start with something like that and see if it's Good Enough? I suspect that at that point other bottlenecks will be much more important, so paying a high complexity cost to optimize the cache further would be a bad trade overall.
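The "several instances, each with a coarse lock per instance" idea quoted above has roughly the following shape. This is a hypothetical minimal sketch (class and method names are invented, and an on-heap HashMap stands in for the off-heap table), not the actual proposed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the cache is split into 2^n segments, each guarded by one coarse
// lock. Contention is bounded by the segment count, and each segment can
// stay simple - no lock-free tricks needed inside it.
public class SegmentedCache<K, V> {
    private static final int SEGMENT_COUNT = 16; // must be a power of two

    private final Map<K, V>[] maps;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    public SegmentedCache() {
        maps = new Map[SEGMENT_COUNT];
        locks = new ReentrantLock[SEGMENT_COUNT];
        for (int i = 0; i < SEGMENT_COUNT; i++) {
            maps[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Spread the hash so keys that differ only in high-order bits
    // still distribute across segments.
    private int segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & (SEGMENT_COUNT - 1);
    }

    public void put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            maps[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return maps[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

Only operations on keys that hash to the same segment ever contend, which is the "fast-enough" property Ariel is betting on.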
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224611#comment-14224611 ] Robert Stupp commented on CASSANDRA-7438: -
[~aweisberg] thanks for that write-up! A lot of very good findings, ideas and recommendations. I have already implemented some of them - short story:
* tending to move from fixed-block allocation to {{Unsafe.alloc}} - quick benchmarks show similar results
* StampedLock and LongAdder in Java 8 are great
* will see how to implement better partition management and an overall LRU story
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224904#comment-14224904 ] Vijay commented on CASSANDRA-7438: --
{quote} sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue? StatsHolder, same AtomicLongArray suggestion. Also consider LongAdder. {quote}
Yep, and let me find alternatives for Java 8 (and, until 8, for LongAdder).
{quote} The queue really needs to be bounded, producer and consumer could proceed at different rates. In Segment.java in the replace path AtomicLong.addAndGet is called back to back, could be called once with the math already done. I believe each of those stalls processing until the store buffers have flushed. The put path does something similar and could have the same optimization. {quote}
Yeah, those were an oversight.
{quote} Tasks submitted to executor services via submit will wrap the result including exceptions in a future which silently discards them. The library might take at initialization time a listener for these errors, or if it is going to be C* specific it could use the wrapped runnable or similar. {quote}
Are you suggesting configurable logging/exception handling in case the 2 threads throw exceptions? If yes, sure. Other exceptions AFAIK are already propagated. (Still needs cleanup, though.)
{quote} A lot of locking that was spin locking (which unbounded I don't think is great) is now blocking locking. There is no adaptive spinning if you don't use synchronized. If you are already using unsafe maybe you could do monitor enter/exit. Never tried it. Having the table (segments) on heap is pretty undesirable to me. Happy to be proved wrong, but I think a flyweight over off heap would be better. {quote}
Segments are small in memory so far in my tests. The spin lock is to make sure the lock checks whether the segment was rehashed or not; this is better than having a separate lock, which would be central. (No different from Java or memcached.) Not sure I understand the Unsafe lock - an example would help. The segments are on heap mainly to handle the locking. I think we can do a bit of CAS, but a global lock on rehashing will be a problem (maybe an alternate approach is required).
{quote} It looks like concurrent calls to rehash could cause the table to rehash twice since the rebalance field is not CASed. You should do the volatile read, and then attempt the CAS (avoids putting the cache line in exclusive state every time). {quote}
Nope, it is a single-threaded executor and the rehash boolean is already volatile :) The next commit will have conditions instead (similar to the C implementation).
{quote} If the expiration lock is already locked some other thread is doing the expiration work. You might keep a semaphore for puts that bypass the lock so other threads can move on during expiration. I suppose after the first few evictions new puts will move on anyways. This would show up in a profiler if it were happening. {quote}
Good point... or a tryLock to spin and check if some other thread released enough memory.
{quote} hotN looks like it could lock for quite a while (hundreds of milliseconds, seconds) depending on the size of N. You don't need to use a linked list for the result just allocate an array list of size N. Maybe hotN should be able to yield, possibly leaving behind an iterator that evictors will have to repair. Maybe also depends on how top N handles duplicate or multiple versions of keys. Alternatively hotN could take a read lock, and writers could skip the cache? {quote}
We cannot have duplicates in the queue (remember it is a doubly linked list of items in cache). A read lock on q_expiry_lock is all we need; let me fix it.
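The StatsHolder/LongAdder suggestion quoted above can be sketched like this. A hypothetical stats holder (names are invented, not OHC's actual class): LongAdder internally stripes each counter across padded cells, so contended increments from many threads don't all serialize on one cache line the way a single AtomicLong would:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical cache-stats holder: one LongAdder per counter. Under
// contention each thread increments its own internal cell; sum() folds
// the cells together on read.
public class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    public void recordHit()  { hits.increment(); }
    public void recordMiss() { misses.increment(); }

    public long hitCount()  { return hits.sum(); }
    public long missCount() { return misses.sum(); }

    public double hitRate() {
        long h = hitCount(), m = missCount();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

The trade-off versus a padded AtomicLongArray: writes get cheaper under contention, but reads (sum) are weakly consistent snapshots, which is fine for metrics. LongAdder is Java 8 only, hence Vijay's note about needing an alternative until then.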
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225371#comment-14225371 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. Are you suggesting configurable logging/exception handling in case the 2 threads throw exceptions? If yes, sure. Other exceptions AFAIK are already propagated. (Still needs cleanup, though.)
Something has to happen to exceptions generated there. Since it is a library and there is no caller to propagate them to, it implies that people need to provide a listener or a logger.
bq. Segments are small in memory so far in my tests
Segments are hash buckets, correct? They aren't segments of several hash buckets. If the goal of the hash table is to have at most two or three entries per segment, then having an on-heap Java object per segment would be a lot of overhead. Just as a guess we are talking about two objects: there is the Segment/ReentrantLock, and then the AbstractQueuedSynchronizer allocated by ReentrantLock, which has three additional fields. That's 48 bytes without alignment or object headers. There is also the overhead of having an atomic array of pointers to each segment object. A hash table bucket only has to be a pointer plus a lock field if you are going to lock buckets. You could do that in 8-12 bytes. Whether it's too much data on heap is a question of how big a cache you want and how small the cached values are. The smaller the values being cached, the more the metadata overhead of the cache (and the JVM overhead) matters. Locking-wise, if you are only doing spin locks you can use an Unsafe compare-and-swap to implement a lock in off-heap memory. You do have to be careful about alignment.
bq. Nope, it is a single-threaded executor and the rehash boolean is already volatile. The next commit will have conditions instead (similar to the C implementation).
The task submitted to the executor doesn't check whether another rehash is required; it just does it. The check before submitting a task to do rehashing appears to have a race where two threads could submit the task at the same time. There is no isolation between the threads as they read the volatile field and then write to it. Two or more threads could read and see that no rehash is in progress, update the value to rehash-in-progress, and then submit the task.
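The race described here (volatile read followed by a separate volatile write) and its fix (read, then CAS) can be sketched as follows. This is hypothetical code with invented names, not lruc's actual UnsafeConcurrentMap:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the suggested fix: plain volatile read first, then a CAS, so
// at most one of N racing threads ever submits the rehash task.
public class RehashTrigger {
    private final AtomicBoolean rehashInProgress = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Returns true if this call won the race and submitted the task. */
    public boolean maybeTriggerRehash(long size, long threshold) {
        if (size <= threshold)
            return false;
        // Volatile read first: avoids pulling the cache line into
        // exclusive state when a rehash is already known to be running.
        if (rehashInProgress.get())
            return false;
        // CAS: exactly one thread flips false -> true and submits.
        if (!rehashInProgress.compareAndSet(false, true))
            return false;
        executor.submit(() -> {
            try {
                rehash();
            } finally {
                rehashInProgress.set(false);
            }
        });
        return true;
    }

    protected void rehash() { /* double the table and move entries */ }

    public void shutdown() { executor.shutdown(); }
}
```

With a plain volatile boolean and `if (!flag) { flag = true; submit(); }`, every racing thread can pass the check before any of them writes, which is exactly the doubled-N-times failure Ariel points out below.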
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225453#comment-14225453 ] Vijay commented on CASSANDRA-7438: --
{quote} Segments are hash buckets correct? {quote}
Yes, and the way memcached and lruc do the rehashing is based on this algorithm - hence, yes... That was the argument earlier about the JNI-based solution. (Also another reason I was talking about configurable hash expansion capability in my previous comment.)
{code}
unsigned long current_size = cas_incr(stats.hash_items, 1);
if (current_size > (hashsize(hashpower) * 3) / 2) {
    assoc_start_expand();
}
{code}
If we don't like the constant overhead of the cache on heap, and if you are talking about CAS, which we already do for ref counting: as mentioned before, we need an alternative strategy for the global locks on rebalance if we go with a lock-less strategy.
{quote} The task submitted to the executor doesn't check whether another rehash is required it just does it. {quote}
Until you complete a rehash you don't know if you need to rehash again or not... Am I missing something?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225478#comment-14225478 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. if we don't like the constant overhead of the cache on heap, and if you are talking about CAS, which we already do for ref counting: as mentioned before, we need an alternative strategy for the global locks on rebalance if we go with a lock-less strategy.
Just take what you have and do it off heap. You don't need to change anything about how locking is done - just put the segments off heap, so each segment would be a 4-byte lock field and an 8-byte pointer to the first entry.
bq. Until you complete a rehash you don't know if you need to rehash again or not... Am I missing something?
https://github.com/Vijay2win/lruc/blob/master/src/main/java/com/lruc/unsafe/UnsafeConcurrentMap.java#L38
The check on line 38 races with the assignment on line 39. N threads could do the check and think a rehash is necessary. Each would submit a rehash task, and the table size would be doubled N times instead of once.
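The "4-byte lock field plus 8-byte pointer per segment" layout can be illustrated with an on-heap stand-in. This is a hypothetical sketch: a real off-heap version would keep both fields in one native allocation and CAS the lock word through Unsafe at `baseAddress + segment * stride`, but the locking protocol is the same:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLongArray;

// Stand-in for an off-heap segment table: lock words and first-entry
// pointers in two parallel arrays. Off heap, both would live in a single
// native block and the CAS would go through Unsafe on an aligned address.
public class SegmentTable {
    private static final int UNLOCKED = 0, LOCKED = 1;

    private final AtomicIntegerArray locks;   // 4-byte lock field per segment
    private final AtomicLongArray firstEntry; // 8-byte pointer per segment

    public SegmentTable(int segments) {
        locks = new AtomicIntegerArray(segments);
        firstEntry = new AtomicLongArray(segments);
    }

    public void lock(int segment) {
        // Spin until we CAS the lock word from UNLOCKED to LOCKED.
        while (!locks.compareAndSet(segment, UNLOCKED, LOCKED))
            Thread.onSpinWait();
    }

    public void unlock(int segment) {
        locks.set(segment, UNLOCKED); // volatile write releases the lock
    }

    public void setFirstEntry(int segment, long address) {
        firstEntry.set(segment, address);
    }

    public long getFirstEntry(int segment) {
        return firstEntry.get(segment);
    }
}
```

This keeps Vijay's per-segment locking scheme intact while shrinking per-segment on-heap cost from two objects (~48+ bytes) to zero: only the metadata words themselves remain, wherever they are stored.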
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225531#comment-14225531 ] Robert Stupp commented on CASSANDRA-7438: -
bq. alignment requirements for 4 or 8 byte CAS
Intel P6 and later: an unaligned locked access works as long as it stays within a single cache line; crossing a cache line is where it gets expensive - but: _However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance_ (https://stackoverflow.com/questions/1415256/alignment-requirements-for-atomic-x86-instructions).
Side note: heh - there's even support for 128-bit atomic operations (cmpxchg16b) - but where's the primitive for that in Java... :(
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225532#comment-14225532 ] Vijay commented on CASSANDRA-7438: --
{quote} so each segment would be a 4-byte lock {quote}
Are you talking about just setting 1 for lock and 0 for unlock? Hmmm, alright, that's doable... I am guessing you have already seen how ReentrantLock implements locking.
{quote} The check on line 38 races with the assignment on line 39. {quote}
I thought we discussed this already... That was supposed to be taken care of per my earlier comment ("Next commit will have conditions instead (similar to the C implementation)"); I have not committed it yet :)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225592#comment-14225592 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. I thought we discussed this already...
Sorry, my bad.
A reentrant lock is just a counter of the number of acquisitions. You could do an 8-byte lock field: store the thread id in the first 4 bytes and the counter in the next 4 bytes. We probably want these to be 8-byte aligned so they don't cross cache lines.
bq. Are you talking about just setting 1 for lock and 0 for unlock? Hmmm, alright, that's doable... I am guessing you have already seen how ReentrantLock implements locking.
You could do it in 8 bytes, since pointers are actually only six bytes - the two higher-order bytes are just the highest-order bit sign-extended on current Intel processors. CAS the pointer and use the highest-order bit to represent locked/unlocked.
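The reentrant encoding Ariel describes - owner thread id in the high 32 bits of one 64-bit word, acquisition count in the low 32 - can be sketched with an AtomicLong standing in for the 8-byte-aligned off-heap lock field (hypothetical code, not the eventual OHC/lruc implementation; it assumes thread ids fit in 32 bits, as the 4-byte layout does):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reentrant spinlock packed into one 64-bit word:
// high 32 bits = owner thread id, low 32 bits = acquisition count.
// 0 means unlocked. Off heap this would be an Unsafe CAS on an
// 8-byte-aligned native address instead of an AtomicLong.
public class PackedReentrantLock {
    private final AtomicLong word = new AtomicLong(0);

    private static long pack(long threadId, long count) {
        return (threadId << 32) | (count & 0xFFFFFFFFL);
    }

    public void lock() {
        long tid = Thread.currentThread().getId();
        for (;;) {
            long cur = word.get();
            if (cur == 0) {
                // Unlocked: try to take it with count = 1.
                if (word.compareAndSet(0, pack(tid, 1)))
                    return;
            } else if ((cur >>> 32) == tid) {
                // Already ours: bump the count. A plain set is safe here -
                // only the owner writes while it holds the lock.
                word.set(pack(tid, (cur & 0xFFFFFFFFL) + 1));
                return;
            }
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        long cur = word.get();
        if ((cur >>> 32) != Thread.currentThread().getId())
            throw new IllegalMonitorStateException();
        long count = cur & 0xFFFFFFFFL;
        word.set(count == 1 ? 0 : pack(cur >>> 32, count - 1));
    }

    public int holdCount() {
        return (int) (word.get() & 0xFFFFFFFFL);
    }
}
```

Compared to ReentrantLock this costs 8 bytes instead of two heap objects, at the price of unbounded spinning under contention - the trade-off discussed throughout this thread.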
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225616#comment-14225616 ] Pavel Yaskevich commented on CASSANDRA-7438:
I guess the only thing that we haven't yet re-invented in Cassandra would be an off-heap lock based on architecture-specific details; great that we're finally about to correct this historical injustice. {color:red}This all definitely sounds a lot more reasonable than having C code as a dependency.{color}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222799#comment-14222799 ] Robert Stupp commented on CASSANDRA-7438: -
* rehashing: growing (x2) is already implemented; shrinking (/2) shouldn't be a big issue either. The implementation only locks the currently processed partitions during a rehash.
* put operation: fixed (was definitely a bug); cleanup runs concurrently and is triggered on an out-of-memory condition
* block sizes: will give it a try (fixed vs. different sizes vs. variable-sized (no blocks))
* per-partition locks: already thought about it - not sure whether it's worth the additional RW-lock overhead, since partition lock time is very low during normal operation
* metrics: some (very basic) metrics are already in it - will add some more timer metrics (configurable)
[~vijay2...@yahoo.com] can you catch {{OutOfMemoryError}} for Unsafe.allocate()? It should not go up the whole call stack.
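To the question above: Unsafe.allocateMemory signals native allocation failure by throwing OutOfMemoryError, and that error can be caught locally rather than being allowed to travel up the call stack. A hedged sketch (it assumes sun.misc.Unsafe is reachable via the usual reflection idiom, as it is on the JDKs Cassandra targets; the class and method names are invented):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class NativeAlloc {
    private static final Unsafe UNSAFE;
    static {
        try {
            // Standard idiom for obtaining the Unsafe singleton.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /**
     * Returns the address of the new block, or 0 if the native allocator
     * failed. Unsafe.allocateMemory throws OutOfMemoryError on failure;
     * catching it here keeps the error from propagating any further.
     */
    public static long tryAllocate(long bytes) {
        try {
            return UNSAFE.allocateMemory(bytes);
        } catch (OutOfMemoryError e) {
            return 0L;
        }
    }

    public static void free(long address) {
        if (address != 0)
            UNSAFE.freeMemory(address);
    }
}
```

A caller can then treat 0 as "cache full / allocation failed" and fall back to eviction instead of crashing the request path.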
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222802#comment-14222802 ] Pavel Yaskevich commented on CASSANDRA-7438:
bq. per-partition locks: already thought about it - not sure whether it's worth the additional RW-lock overhead since partition lock time is very low during normal operation
It depends on the operation mode: if there are, e.g., 75% reads and 25% writes, it makes more sense to use locks, because an RW lock is going to be optimized by the JVM to a CAS operation when there is no contention. Anyhow, it's a valid test to run with different modes to check CAS vs. RW.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223479#comment-14223479 ] Ariel Weisberg commented on CASSANDRA-7438: --- I think that for caches the behavior you want to avoid most is a slowly growing heap. People hate that because it's unpredictable and they don't know when it's going to stop. You can always start with jemalloc, get the feature working, and then iterate on memory management.
Fixed block sizes is a baby-and-bathwater scenario to get the desirable fixed-memory-utilization property. When you want to build everything out of fixed-size pages, you have to slot the pages or do some other internal page-management strategy so you can pack multiple things and rewrite pages as they fragment. You also need size-tiered free lists and fragmentation metadata for pages so you can find partially free pages. That kind of thing only makes sense in ye olden database land, where rewriting an already dirty page is cheaper than more IOPS. In memory you can relocate objects. Memcached used to have the problem that, instead of the heap growing, the cache would lose capacity to fragmentation. FB implemented slab rebalancing in their fork, and then Memcached did its own implementation. The issue was internal fragmentation due to having too many of the wrong-size slabs.
For Robert:
* Executor service shutdown: never really got why it takes a timeout, nor why there is no blocking version. 99% of the time, if it doesn't shut down within the timeout it's a bug and you don't want to ignore it. We are pedantic about everything else, why not this? It's also unused right now.
* Stats could go into an atomic long array with padding. It really depends on the access pattern. You want data that is read/written at the same time on the same cache line. These are global counters, so they will be contended by everyone accessing the cache; better that they only have to pull in one cache line with all the counters than multiple, and have to wait for exclusive access before writing to each one. Also consider LongAdder.
* If you want to do your own memory-management strategy, I think something like segregated storage as in Boost.Pool, with size tiers for powers of two and each power of two plus the previous power of two. You can CAS the head of the free list for each tier to make it thread safe, and lock when allocating a new block instead of from the free list. This won't adapt to changing size distributions; for that, stuff needs to be relocatable.
* I'll bet you could use a stamped-lock pattern and readers might not have to lock at all. I think getting it working with just a lock is fine.
* I am not sure shrinking is very important? The table is pretty dense and should be a small portion of total memory once all the other memory is accounted for. You would need a lot of tiny cache entries to really bloat the table, and then the population distribution would need to change to make that a waste.
* LRU lists per segment seems like it's not viable. That isn't a close enough approximation to LRU, since we want at most two or three entries per partition.
* Some loops of very similar byte munging in HashEntryAccess.
* The periodic cleanup check is maybe not so nice. An edge trigger via a CAS field would be nicer, and move that up to 80%, since on a big-memory machine that is a lot of wasted cache space. Walking the entire LRU could take several seconds, but if it is amortized across a lot of expiration maybe it is ok.
* Some rehash-required checking is duplicated in OHCacheImpl.
For Vijay:
* sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue?
* The queue really needs to be bounded; producer and consumer could proceed at different rates. With striped
* Tasks submitted to executor services via submit will wrap the result, including exceptions, in a future which silently discards them. The library might take a listener for these errors at initialization time, or if it is going to be C*-specific it could use the wrapped runnable or similar.
* A lot of locking that was spin locking (which, unbounded, I don't think is great) is now blocking locking. There is no adaptive spinning if you don't use synchronized. If you are already using Unsafe, maybe you could do monitor enter/exit. Never tried it.
* It looks like concurrent calls to rehash could cause the table to rehash twice, since the rebalance field is not CASed. You should do the volatile read, and then attempt the CAS (avoids putting the cache line in exclusive state every time).
* StatsHolder: same AtomicLongArray suggestion. Also consider LongAdder.
* In Segment.java, in the replace path, AtomicLong.addAndGet is called back to back; it could be called once with the math already done. I believe each of those stalls processing until the store buffers have flushed. The put path does something similar and could have the same
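The "volatile read, then CAS" advice for the rebalance field is a test-and-test-and-set: most threads bail out on a plain volatile read, which keeps the cache line in shared state, and only the one thread that wins the CAS performs the rehash. A minimal sketch (class and method names are invented for illustration, not taken from the patch):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RehashGuard {
    private final AtomicBoolean rehashing = new AtomicBoolean(false);
    private int rehashCount = 0;

    /** Returns true only for the thread that actually performed the rehash. */
    public boolean maybeRehash() {
        // 1. cheap volatile read: most threads exit here without forcing
        //    the cache line into exclusive state
        if (rehashing.get())
            return false;
        // 2. only now attempt the CAS; exactly one thread wins it
        if (!rehashing.compareAndSet(false, true))
            return false;
        try {
            rehashCount++;   // stand-in for the actual table rebuild
            return true;
        } finally {
            rehashing.set(false);
        }
    }

    public int rehashCount() { return rehashCount; }
}
```

Without step 1, every caller's compareAndSet would bounce the cache line into exclusive state even when a rehash is already in flight.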
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222365#comment-14222365 ] Robert Stupp commented on CASSANDRA-7438: - I've spent some evenings on an alternative approach for an off-heap row cache, too. It uses a different concept and architecture.
* Based on a big hash table.
* Each hash partition (segment) has a reference to an LRU linked list of hash entries. Each get operation moves the accessed entry to the head of the LRU linked list.
* Data memory is divided into uniform blocks (a few kB) and managed by multiple (8) free-block linked lists. Just one big memory allocation during initialization. Pro: no fragmentation of free memory, easier to handle. Con: fragmentation of data.
* Proactive eviction with the goal of keeping a percentage of memory free.
* Put operation (currently) fails if there's not enough memory available to store the data. The idea is not to block the calling code (don't put additional latency on an overloaded system).
* Locks (CAS based) exist on each hash partition, each hash entry and each free list, and are held as short as possible (e.g. put allocates data blocks, fills them with the data of the new entry, acquires the lock on the hash partition, updates the LRU linked-list pointers and finishes).
* To keep the linked lists on each hash partition (segment) short, large hash tables should be used.
* No rehash yet - could be manageable by locking one hash partition at a time and splitting it into two new partitions (more logic, but no global lock).
* No overhead in the JVM heap for the cache itself (although accesses require short-lived objects for serialization).
* The only stolen thing is Vijay's benchmark (asked him before ;) ).
Pushed here: https://github.com/snazy/ohc - more descriptive Readme, too.
Other ideas:
* If we have off-heap data, it might be possible to (de)serialize the hot set directly to/from that off-heap data (zero-copy I/O), at the cost of changing the on-disk data format.
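The uniform-block scheme above (one big allocation, carved into fixed blocks, handed out via free lists) can be simulated on-heap. This is only a sketch of the bookkeeping with invented names; the real OHC code works on off-heap memory and keeps multiple free lists to reduce contention:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BlockPool {
    private final byte[] region;          // the single big up-front allocation
    private final int blockSize;
    private final Deque<Integer> freeList = new ArrayDeque<>();  // stack of free block indexes

    public BlockPool(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        this.region = new byte[blockSize * blockCount];
        for (int i = 0; i < blockCount; i++)
            freeList.push(i);             // every block starts out free
    }

    /** Returns a block index, or -1 if the pool is exhausted (the put fails). */
    public int allocate() {
        Integer b = freeList.poll();
        return b == null ? -1 : b;
    }

    public void free(int block) { freeList.push(block); }

    public int freeBlocks() { return freeList.size(); }

    /** Byte offset of a block inside the region. */
    public int offsetOf(int block) { return block * blockSize; }
}
```

The pro/con trade Robert describes falls out directly: free memory never fragments (the free list is just indexes), but an entry larger than one block must be chained across blocks, fragmenting the data itself.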
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222660#comment-14222660 ] Pavel Yaskevich commented on CASSANDRA-7438: Personally I like what Vijay did a bit more, just because the main ideas were taken from memcached, which is proven to work fine for the majority of use-cases and is pretty simple inside. Regarding Robert's implementation, I have a few comments which he'll have to address (if not already) before I would consider it for inclusion:
- rehashing is a must-have; we want to grow/shrink caches based on usage to lessen the burden on users trying to size them appropriately from day 1;
- if a put operation fails, it should at least invalidate the previously inserted value, if any, and probably kick off maintenance activities like LRU cleanup and/or rehashing;
- fixed-size data blocks create a lot of allocation slop, which can sometimes take the majority of allocated memory (e.g. Firefox had that problem); the cache should at least have blocks of different sizes to minimize that;
- it would be great to have benchmarks for per-partition CAS vs. per-partition RW lock in different operation modes; cache invalidation could be a noticeable factor for performance, as could CAS races;
- metrics (if not yet added).
Also, based on the discussion [~snazy] had with [~vijay2...@yahoo.com], I would avoid using DirectByteBuffer because they are problematic for GC.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1495#comment-1495 ] Vijay commented on CASSANDRA-7438: -- Alright, the first version of the pure-Java version of LRUCache is pushed.
* Basically a port from the C version. (Most of the test cases pass, and they are the same for both versions.)
* As Ariel mentioned before, we can use Disruptor for the ring buffer, but this doesn't use it yet.
* Expiry in the queue thread is not implemented yet.
* The algorithm to start the rehash needs to be more configurable and based on the capacity; will be pushing that soon.
* Overhead in the JVM heap is just the segments array.
https://github.com/Vijay2win/lruc/tree/master/src/main/java/com/lruc/unsafe
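The core trick of the pure-Java port ("overhead in the JVM heap is just the segments array") is that entry bytes live in memory obtained from sun.misc.Unsafe, so the heap only holds long addresses. A toy illustration, not the lruc code (recent JDKs print deprecation warnings for these methods but they still work; class and method names are invented):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapBytes {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Standard reflective access to the singleton; Unsafe has no public constructor.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Copies value off heap; returns the raw address. Caller must free it. */
    public static long store(byte[] value) {
        long addr = UNSAFE.allocateMemory(value.length);
        for (int i = 0; i < value.length; i++)
            UNSAFE.putByte(addr + i, value[i]);
        return addr;
    }

    public static byte[] load(long addr, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++)
            out[i] = UNSAFE.getByte(addr + i);
        return out;
    }

    public static void free(long addr) { UNSAFE.freeMemory(addr); }
}
```

Unlike DirectByteBuffer (which the thread warns against), memory allocated this way is invisible to the GC entirely; the price is that nothing reclaims it if you forget to call free.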
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199976#comment-14199976 ] Robert Stupp commented on CASSANDRA-7438: - Debugging C code via JNI and debugging Unsafe code on large data structures are both a nightmare. And a simple, stupid bug of either kind can quickly make the JVM core dump. The advantage of the Unsafe approach is that all OSes are directly supported. The advantage of the JNI approach is that the code that handles the data structures is much easier to read. Proposal:
* Extract the changes that support a pluggable ICacheProvider from this ticket to a separate ticket and commit that stuff.
* Let Vijay continue his work on this one.
* Provide an alternative implementation using Unsafe.
* Let both implementations compete in some long-running tests.
This is much effort - but I don't know how to validate either solution theoretically.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198368#comment-14198368 ] Jonathan Ellis commented on CASSANDRA-7438: --- Here's me in July: bq. I'm not wild about taking on the complexity of building and distributing native libraries if we have a reasonable alternative. Vijay what do we win with the native approach over using java unsafe? The objection at the time was, bq. The win is that we can now have Caches which can be bigger than JVM with zero GC overhead on the items. Unsafe approach will hold the references in memory and the overhead on them is reasonably high compared to the native approach (example of it is an integer key's) and in addition if we use hash map we have segments with locks (also there the references in the queue), so it is not a straight forward approach either. ... but as Ariel said, we can use the same technique to hold references off-heap with Unsafe, as with JNI. Am I missing something?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198523#comment-14198523 ] Vijay commented on CASSANDRA-7438: -- Alright, looks like the objection is not on the design but the language choice. If I had known the implementation details, it would have been an easier choice in the first place (the argument earlier was that we don't have a way to lock and use the queue easily), for example the map vs. queue etc. The thing which we are missing is 4 months of dev, testing and reviewers' time :). It's alright, let me give it a shot; after all, we'll have an alternative to benchmark against.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196296#comment-14196296 ] Ariel Weisberg commented on CASSANDRA-7438: --- bq. No we don't. We have locks per Segment, this is very similar to lock stripping/Java's concurrent hash map.
Thanks for clearing that up.
bq. Not really we lock globally when we reach 100% of the space and we freeup to 80% of the space and we spread the overhead to other threads based on who ever has the item partition lock. It won't be hard to make this part of the queue thread and will try it for the next release of lruc.
OK, that makes sense. 20% of the cache could be many milliseconds of work if you are using many gigabytes of cache. That's not a great thing to foist on a random victim thread. If you handed that to the queue thread, well, I think you run into another issue, which is that the ring buffer doesn't appear to check for queue full? The queue thread could go out to lunch for a while. Not a big deal, but finer-grained scheduling will probably be necessary.
bq. If you look at the code closer to memcached. Actually I started of stripping memcached code so we can run it in process instead of running as a separate process and removing the global locks in queue reallocation etc and eventually diverged too much from it. The other reason it doesn't use slab allocators is because we wanted the memory allocators to do the right thing we already have tested Cassandra with Jemalloc.
Ah, very cool. jemalloc is not a moving allocator, whereas it looks like memcached slabs implement rebalancing to accommodate changes in size distribution. That would actually be one of the really nice things to keep, IMO. On large-memory systems with a cache that scales and performs, you would end up dedicating as much RAM as possible to the row cache/key cache and not the page cache, since the page cache is not as granular (correct me if the story for C* is different). If you dedicate 80% of RAM to the cache, that doesn't leave a lot of space for fragmentation. By using a heap allocator you also lose the ability to implement hard, predictable limits on memory used by the cache, since you didn't map it yourself. I could be totally off base and jemalloc might be good enough.
bq. There is some comments above which has the reasoning for it (please see the above comments). PS: I believe there was some tickets on Current RowCache complaining about the overhead.
I don't have a performance beef with JNI, especially the way you have done it, which I think is pretty efficient. I think the overhead of JNI (one or two slightly more expensive function calls) would be eclipsed by things like the cache misses, coherence, and pipeline stalls that are part of accessing and maintaining a concurrent cache (Java or C++). It's all just intuition without comparative microbenchmarks of the two caches. Java might look a little faster just due to allocator performance, but we know you pay for that in other ways. I think what you have made scratches the itch for a large cache quite well, and beats the status quo.
I don't agree that Unsafe couldn't do the exact same thing with no on-heap references. The hash table, ring buffer, and individual item entries are all being malloced, and you can do that from Java using Unsafe. You don't need to implement a ring buffer because you can use Disruptor. I also wonder if splitting the cache into several instances, each with a coarse lock per instance, wouldn't result in simpler - and I know performance is not an issue - fast enough code. I don't want to advocate doing something different for performance, but rather that there is the possibility of a relatively simple implementation via Unsafe. You could coalesce all the contended fields for each instance (stats, lock field, LRU head) into a single cache line, and then rely on a single barrier when releasing a coarse-grained lock. The fine-grained locking and CASing results in several pipeline stalls, because the memory barriers that are implicit in each one require the store buffers to drain. There may even be a suitable off-heap map implementation out there already.
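The stamped-lock pattern Ariel mentions earlier in the thread (readers not taking a lock at all on the fast path) is available out of the box since Java 8 as java.util.concurrent.locks.StampedLock: readers grab an optimistic stamp, read, then validate; only if a writer intervened do they retry with a real read lock. A minimal sketch with invented counter fields:

```java
import java.util.concurrent.locks.StampedLock;

public class StampedCounterPair {
    private final StampedLock lock = new StampedLock();
    private long hits, misses;

    public void recordHit()  { long s = lock.writeLock(); try { hits++; }   finally { lock.unlockWrite(s); } }
    public void recordMiss() { long s = lock.writeLock(); try { misses++; } finally { lock.unlockWrite(s); } }

    public long total() {
        long stamp = lock.tryOptimisticRead();   // no CAS, no lock on the fast path
        long h = hits, m = misses;
        if (!lock.validate(stamp)) {             // a writer slipped in: retry pessimistically
            stamp = lock.readLock();
            try { h = hits; m = misses; }
            finally { lock.unlockRead(stamp); }
        }
        return h + m;
    }
}
```

Uncontended optimistic reads touch the lock's cache line only to read the stamp, which fits the single-cache-line-of-contended-fields design Ariel sketches above.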
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196492#comment-14196492 ] Vijay commented on CASSANDRA-7438: -- {quote} well I think you run into another issue which is that the ring buffer doesn't appear to check for queue full? {quote} Yeah, I thought about it; we need to handle those, and that's why it wasn't there in the first place. Should not be really bad though. {quote} I don't agree that Unsafe couldn't do the exact same thing with no on heap references {quote} Probably - since we figured out most of the implementation details, sure we can - but there are always many different ways to solve the problem (even though it will be inefficient to copy multiple bytes to get to the next items in the map etc.; GC and CPU overhead would be more, IMHO). For example, Memcached used the expiration time set by the clients to remove items, which made it easier for them to do the slab allocator, but this is something we removed in lruc, leaving just a queue. {quote} I also wonder if splitting the cache into several instances each with a coarse lock per instance wouldn't result in simpler {quote} The problem there is how you will invalidate the least-recently-used items; since they are in different partitions, you really don't know which ones to invalidate... There is also a problem of load balancing, when to expand the buckets etc., which will bring us back to the current lock-striping solutions, IMHO. I can do some benchmarks if that's exactly what we need at this point. Thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196885#comment-14196885 ] Jonathan Ellis commented on CASSANDRA-7438: --- bq. There is some comments above which has the reasoning for [why JNI is justified]. PS: I believe there was some tickets on Current RowCache complaining about the overhead. Aren't all those objections to the current design and not to Unsafe per se? Adding native libraries + JNI is a pretty huge step in build, QA, and runtime complexity. I'd like to avoid it if at all possible.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197029#comment-14197029 ] Vijay commented on CASSANDRA-7438: -- {quote} Aren't all those objections to the current design {quote} I am fine with making it configurable and maintaining it in a separate project, but I didn't realize that was the case.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195418#comment-14195418 ] Ariel Weisberg commented on CASSANDRA-7438: --- RE refcount: I think hazard pointers (never used them personally) are the no-GC, no-refcount way of handling this. It also won't be fetched twice if it is uncontended, which in many cases it will be, since it should be decrefed as soon as the data is copied. I think that with the right QA work this solves the problem of running arbitrarily large caches. That means running a validating workload in continuous integration that demonstrates the cache doesn't lock up, leak, or return the wrong answer. I would probably test directly against the cache to get more iterations in.
RE implementation as a library via JNI: We give up something by using JNI, so it only makes sense if we get something else in return. The QA and release work created by JNI is pretty large. You really need a plan for running something like Valgrind or similar against a comprehensive suite of tests. Valgrind doesn't run well with Java AFAIK, so you end up doing things like running the native code in a separate process, and you have to write an interface amenable to that. Valgrind is also slow enough that if you try to run all your tests against a configuration using it a lot, you end up with timeouts and many hours to run all the tests, plus time spent interpreting results. Unsafe is worse in that respect because there is no Valgrind, and I can attest that debugging an off-heap red-black tree is not fun. I am not clear on why the JNI is justified. It really seems like this could be written against Unsafe, and then it would work on any platform. There are no libraries or system calls in use that are only accessible via JNI. I think JNI would make more sense if we were pulling in existing code like memcached that already handles memory pooling, fragmentation, and concurrency. If it were in Java you could use Disruptor for the queue and would only need to implement a thread-safe off-heap hash table.
RE performance and implementation: What kind of hardware was the benchmark run on? Server-class NUMA? I am just wondering if there are enough cores to bring out any scalability issues in the cache implementation. It would be nice to see a benchmark that showed the on-heap cache falling over while the off-heap cache provides good performance. Subsequent comments aren't particularly useful if performance is satisfactory under relevant configurations. Given the use of a heap allocator and locking, it might not make sense to have a background thread do expiration. I think that splitting the cache into several instances with one lock around each instance might result in less contention overall, and it would scale up in a more straightforward way. It appears that some common operations will hit a global lock in may_expire() quite frequently? It seems like there are other globally shared, frequently mutated cache lines in the write path, like stats. Is there something subtle in the locking that makes the use of the custom queue and maps necessary, or could you use stuff from Intel TBB and still make it work? It is hypothetically less code to have to QA and maintain. I still need to dig more, but I am also not clear on why locks are necessary for individual items. It looks like there is a table for all of them? Random intuition is that it could be done without a lock, or at least a discrete lock. Striping against a padded pool of locks might make sense if that isn't going to cause deadlocks. Apparently every pthread_mutex_t is 40 bytes, according to a random Stack Overflow post. It might make sense to use the same cache line as the refcount to store a lock field, or the bucket in the hash table? Another implementation question is: do we want to use C++11? It would remove a lot of platform- and compiler-specific code.
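For the globally shared, frequently mutated stats cache lines mentioned above, the thread repeatedly suggests LongAdder: under write contention it stripes updates across per-thread cells and only sums them on read, avoiding the exclusive-cache-line ping-pong of a single AtomicLong. A sketch with invented field names:

```java
import java.util.concurrent.atomic.LongAdder;

public class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();
    private final LongAdder evictions = new LongAdder();

    public void hit()      { hits.increment(); }      // cheap even when many threads write
    public void miss()     { misses.increment(); }
    public void eviction() { evictions.increment(); }

    public long hitCount()  { return hits.sum(); }    // reads are rare: sum the cells
    public long missCount() { return misses.sum(); }

    public double hitRate() {
        long h = hitCount(), m = missCount();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

The trade-off matches the access pattern Ariel describes: counters are written on every cache operation but read only when someone polls metrics, which is exactly the write-heavy/read-rare case LongAdder is designed for.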
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195679#comment-14195679 ] Vijay commented on CASSANDRA-7438: -- Thanks for reviewing! {quote} I am also not clear on why locks are necessary for individual items. {quote} We don't lock individual items. We have locks per segment; this is very similar to lock striping, or to what Java's ConcurrentHashMap does. {quote} global lock in may_expire() quite frequently? {quote} Not really. We lock globally only when we reach 100% of the space, and then we free down to 80% of the space, spreading the overhead across other threads based on whoever holds the item's partition lock. It won't be hard to make this part of the queue thread; I will try that for the next release of lruc. {quote} What kind of hardware was the benchmark run on? {quote} 32 cores, 100 GB RAM, with NUMA and Intel Xeon. There is a benchmark utility checked in as part of the lruc code which does exactly this kind of test. {quote} You really need a plan for running something like Valgrind {quote} Good point. I was partway down that road and still have the code; I can resurrect it for the next lruc version. {quote} I am not clear on why the JNI is justified {quote} The reasoning for it is in the comments above (please see them). PS: I believe there were some tickets complaining about the overhead of the current RowCache. {quote} I think JNI would make more sense if we were pulling in existing code like memcached {quote} The code is actually quite close to memcached. I started off by stripping down the memcached code so we could run it in-process instead of as a separate process, removing the global locks in queue reallocation etc., and eventually diverged too much from it. The other reason it doesn't use slab allocators is that we wanted the memory allocator to do the right thing, and we have already tested Cassandra with jemalloc.
To comfort you a bit: lruc is already running in our production :)
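The 100%-full / free-to-80% policy Vijay describes above can be sketched as a pair of pure functions (hypothetical names; the thresholds are the ones stated in the comment):

```java
// Hypothetical sketch of the eviction policy described above: nothing is
// evicted until usage hits 100% of capacity; eviction then frees memory
// down to an 80% low watermark, so the global path is taken only rarely.
public class WatermarkPolicy {
    static final double LOW_WATERMARK = 0.80;

    static boolean shouldEvict(long usedBytes, long capacityBytes) {
        return usedBytes >= capacityBytes;
    }

    static long bytesToFree(long usedBytes, long capacityBytes) {
        long target = (long) (capacityBytes * LOW_WATERMARK);
        return Math.max(0, usedBytes - target);
    }

    public static void main(String[] args) {
        // At exactly full capacity, 20% of the space gets freed.
        System.out.println(shouldEvict(1000, 1000)); // true
        System.out.println(bytesToFree(1000, 1000)); // 200
    }
}
```

Freeing a whole batch down to the low watermark amortizes the cost of taking the global lock, instead of paying for it on every insert once the cache is full.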
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193760#comment-14193760 ] Vijay commented on CASSANDRA-7438: -- Pushed, thanks! {quote} We should ensure that changes in the serialized format of saved row caches are detected {quote} I don't think we changed the format, did I? {quote} item.refcount - if refcount is updated, the whole cache line needs to be re-fetched (CPU) {quote} The refcount is per item in the cache; for every item inserted, we track this in its memory location.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193785#comment-14193785 ] Robert Stupp commented on CASSANDRA-7438: - bq. I don't think we changed the format, did i? Ah - no. Sorry - got confused with the in-memory serialization. bq. item.refcount What I mean is the (Intel) CPU L1+L2 cache line size (64 bytes). If 'refcount' is updated (e.g. just for a cache get), the whole cache line is invalidated (twice) and needs to be re-fetched from RAM although its content did not change. It's just a point for optimization - if we find a viable solution for that, we should implement it.
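One standard mitigation for this kind of false sharing is the manual padding trick (used by e.g. the Disruptor before `@Contended` existed). This is a sketch of the technique in general, not code from this patch; the field names are made up:

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Sketch of the cache-line point above: surround a hot counter with enough
// unused longs that it occupies a 64-byte line by itself, so concurrent
// updates to neighboring counters do not invalidate each other's lines.
// Caveat: the JVM may reorder or eliminate unused fields, which is why
// Java 8 later added @sun.misc.Contended for exactly this purpose.
public class PaddedCounter {
    @SuppressWarnings("unused")
    private long p1, p2, p3, p4, p5, p6, p7;  // 56 bytes of padding before
    private volatile long value;
    @SuppressWarnings("unused")
    private long q1, q2, q3, q4, q5, q6, q7;  // 56 bytes of padding after

    private static final AtomicLongFieldUpdater<PaddedCounter> UPDATER =
            AtomicLongFieldUpdater.newUpdater(PaddedCounter.class, "value");

    public long increment() { return UPDATER.incrementAndGet(this); }
    public long get()       { return value; }

    public static void main(String[] args) {
        PaddedCounter hits = new PaddedCounter();
        hits.increment();
        hits.increment();
        System.out.println(hits.get()); // 2
    }
}
```

The same idea applies to the native `stats` struct: align each CAS-updated field to its own 64-byte boundary rather than packing them contiguously.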
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193074#comment-14193074 ] Robert Stupp commented on CASSANDRA-7438: - bq. total 1 byte profit bikeshed: could be two if serialized unencoded :) I took some time to see whether it would be low-hanging fruit to change that - but it isn't really, because some uses of {{DataIn/Output(Plus)}} would need to be changed too, and it is used widely - even (if I saw that correctly) in SSTables (the point at which I stopped investigating ;) )
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192874#comment-14192874 ] Jonathan Ellis commented on CASSANDRA-7438: --- [~aweisberg], would be useful to get your take on this too.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192922#comment-14192922 ] Pavel Yaskevich commented on CASSANDRA-7438: I looked through the patch and everything looks good, except one small thing: - FBUtilities.newRowCacheProvider needs its arguments renamed; it looks like it was copied from FBUtilities.newPartitioner and kept the old names, so instead of partitioner it should be rowCacheClassName, with rowCache as the last argument to FBUtilities.construct(...). I also want to address Robert's comment regarding EncodedData{Input/Output}Stream: I agree that the longs of a version-1 UUID are not very compressible, and vint encoding actually adds 1 byte on top of a long (which is pretty easy to test). But the good news is that although we lose 2 bytes on long serialization, we win back at least 2 bytes by vint-encoding the length of the key - and in the best case, if the key size is less than 127 (which is highly likely), we actually win 3 bytes, which makes a total profit of 1 byte from the encoding :)
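Pavel's byte accounting is easy to reproduce with a size function for a Hadoop-style vint. This is a sketch under the assumption that the encoding uses the one-marker-byte scheme (1 byte for small values, otherwise a marker byte plus the minimal big-endian bytes); the exact scheme in EncodedDataOutputStream may differ:

```java
// Sketch of the byte accounting above, assuming a Hadoop-style vint:
// values in [-112, 127] take 1 byte; anything else takes a marker byte
// plus the minimal big-endian representation (at most 1 + 8 = 9 bytes).
public class VintMath {
    static int vintSize(long i) {
        if (i >= -112 && i <= 127)
            return 1;
        long v = (i < 0) ? ~i : i;   // fold negatives into the positive range
        int dataBytes = 0;
        while (v != 0) {
            v >>>= 8;
            dataBytes++;
        }
        return 1 + dataBytes;
    }

    public static void main(String[] args) {
        // A v1 UUID long generally needs all 8 data bytes: 9 encoded vs.
        // 8 raw, so the two UUID longs lose 2 bytes in total...
        System.out.println(vintSize(0x1234567890abcdefL)); // 9
        // ...but a key length under 128 takes 1 byte instead of a 4-byte
        // int, winning 3 bytes: the net "1 byte profit" mentioned above.
        System.out.println(vintSize(100));                 // 1
    }
}
```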
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190711#comment-14190711 ] Robert Stupp commented on CASSANDRA-7438: - Did a walkthrough of lruc 0.7, too... Altogether +1 on the current state :) Just one nit:
* Move {{Preconditions.checkArgument(capacity > 0 ...)}} in {{LRUCache.java}} from {{capacity()}} to {{setCapacity()}}
One thing regarding saved row caches: we should ensure that changes in the serialized format of saved row caches are detected (and either converted during load or just discarded).
Comments would be nice to have in a future version:
* I think you need to add the APLv2 license header to all source files ;)
* The NEWS, COPYING and AUTHORS files in {{lruc/src/native}} and {{lruc}} are blank
* The {{stats}} struct is heavily updated using CAS - maybe think of aligning the individual values to separate CPU cache lines to reduce CPU cache refreshes
* Similar for {{item.refcount}} - if refcount is updated, the whole cache line needs to be re-fetched (CPU)
* {{o.a.c.cache.ICacheProvider.RowKeySerializer}} tries to "compress" the two {{long}} values of a UUID via {{EncodedDataOutputStream}}/{{EncodedDataInputStream}} - this is usually not possible for the long values of a UUID, resulting in a bigger serialized representation than necessary (which is why the default serialization, e.g. UUIDSerializer, does not encode them)
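The saved-row-cache format check requested above can be as small as a magic-plus-version header. This is a minimal sketch with hypothetical magic/version values, not the actual C* mechanism: write the header when saving, and treat any mismatch on load as "discard the saved cache and start cold".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: prepend a magic number and a format version to the
// saved row cache; any mismatch (or truncation) on load means "discard".
public class SavedCacheHeader {
    static final int MAGIC = 0x0CAC4E05;   // hypothetical magic value
    static final int CURRENT_VERSION = 1;

    static void writeHeader(DataOutput out) throws IOException {
        out.writeInt(MAGIC);
        out.writeInt(CURRENT_VERSION);
    }

    /** Returns true if the saved cache may be loaded; false means discard it. */
    static boolean checkHeader(DataInput in) {
        try {
            return in.readInt() == MAGIC && in.readInt() == CURRENT_VERSION;
        } catch (IOException e) {
            return false;  // truncated or garbled file: discard
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeHeader(new DataOutputStream(buf));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(checkHeader(in)); // true
    }
}
```

Bumping CURRENT_VERSION on any serialization change then makes stale saved caches self-discarding, with conversion-on-load as an optional refinement.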
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189224#comment-14189224 ] Robert Stupp commented on CASSANDRA-7438: - LGTM - but some comments:
* the comments in cassandra.yaml could be fleshed out more (see below)
* the version of lruc.jar should not be a SNAPSHOT version, and should be a bit higher than 0.0.1 (although it's just a number, people usually don't trust something with a '0' in front :) ) - the [lruc repo|https://github.com/Vijay2win/lruc/commits/master] shows v0.7 as the current version - I recommend using the latest lruc release in C*
* it would be very nice to have these released on Maven Central
* after lruc-0.7 is used for this ticket, we should run a stress test against a cluster using OffheapCacheProvider as some kind of smoke test
{code}
# Number of keys from the row cache to save.
# Disabled by default, meaning all keys are going to be saved.
# row_cache_keys_to_save: 100

# Row cache provider to use.
# Possible values are SerializingCacheProvider and OffheapCacheProvider.
# Default is no row cache.
#
# SerializingCacheProvider is the one used in previous versions of Cassandra.
# It is available on all platforms and uses offheap memory for the rows but
# structures on the Java heap to manage the offheap row data.
#
# OffheapCacheProvider is new in Cassandra 3.0 and only available on
# Unix platforms (Linux and OSX).
# It uses a native code library to manage the whole row cache including
# management information in native memory thus reducing heap
# pressure compared to SerializingCacheProvider.
#
# row_cache_provider: SerializingCacheProvider
{code}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189364#comment-14189364 ] Vijay commented on CASSANDRA-7438: -- Rebased and pushed with the latest binaries. {quote} the comments in cassandra.yaml could be more fleshy (see below) {quote} Sorry, my bad - I missed it before. Thanks for the write-up; I copied it into the fork. {quote} recommend to use the latest lruc release in C* {quote} Yeah, I set up the release and publishing to Maven Central a few weeks ago.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172078#comment-14172078 ] Robert Stupp commented on CASSANDRA-7438: - Will take a look at this this week.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160044#comment-14160044 ] Vijay commented on CASSANDRA-7438: -- Pushed most of the changes to https://github.com/Vijay2win/cassandra/commits/7438; I am not sure about moving the tests and code into the Cassandra code base (I am really neutral on that). Other related changes, tests and refactoring are pushed as part of 3 main commits in https://github.com/Vijay2win/lruc/commits/master. cc [~xedin]
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157989#comment-14157989 ] Jonathan Ellis commented on CASSANDRA-7438: --- Are you still working on this, [~vijay2...@gmail.com]?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158005#comment-14158005 ] Vijay commented on CASSANDRA-7438: -- Hi Jonathan, yes - I am adding more tests and fixing a test failure in lruc; going to post the patch soon.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145338#comment-14145338 ] Robert Stupp commented on CASSANDRA-7438: - (note: [~vijay2...@gmail.com], please use the other nick) Some quick notes:
* Can you add the assertion for {{capacity > 0}} to {{OffheapCacheProvider.create}} - the current error message ("capacity should be set") if {{row_cache_size_in_mb}} is not set (or invalid) could be more fleshy
* Additionally the {{capacity}} check should also reject negative values (it starts with a negative value - I don't know what happens if it stays negative...)
* {{org.apache.cassandra.db.RowCacheTest#testRowCacheCleanup}} fails at the last assertion - all other unit tests seem to work
* Documentation in cassandra.yaml for row_cache_provider could be a bit more verbose - just some abstract about the characteristics and limitations (e.g. Offheap only works on Linux + OSX) of both implementations
* IMO it would be fine to have a general unit test for {{com.lruc.api.LRUCache}} in C* code, too
* Please add an adapted copy of {{RowCacheTest}} for OffheapCacheProvider
* Unit tests using OffheapCacheProvider must not start on Windows builds - please add an assertion in OffheapCacheProvider to assert that it runs on Linux or OSX
Sorry for the late reply
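The two capacity checks requested above amount to a fail-fast guard. A hypothetical sketch (not the actual OffheapCacheProvider code) rejecting zero and negative values with a message that points at the yaml setting:

```java
// Hypothetical sketch of the requested validation: reject zero and
// negative capacities up front, naming the yaml option in the message.
public class CapacityCheck {
    static long validateCapacity(long capacityBytes) {
        if (capacityBytes <= 0)
            throw new IllegalArgumentException(
                "row cache capacity must be > 0 - check row_cache_size_in_mb (got "
                + capacityBytes + ")");
        return capacityBytes;
    }

    public static void main(String[] args) {
        System.out.println(validateCapacity(64L << 20)); // 67108864
    }
}
```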
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144260#comment-14144260 ] Vijay commented on CASSANDRA-7438: -- Hi [~rst...@pironet-ndh.com], I don't see a problem in copying or rewriting the code; once you complete the rest of the review we can see what we can do. I am guessing you were not waiting for my response :) Thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080708#comment-14080708 ] Robert Stupp commented on CASSANDRA-7438: - bq. Yeah it works in unix, but the problem is i don't have a handle since its a temp file after restart. So it is a best effort for cleanups. It's a really sick problem. I changed our Snappy integration in a similar way. IMO there's no better solution than messing up the temp dir. bq. The problem is it produces a circular dependency Ah - I meant that the lruc code would be copied into the C* code base (if the others agree). But this could be a second step, since it's only a bit of refactoring.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075611#comment-14075611 ] Robert Stupp commented on CASSANDRA-7438: - [~vijay2...@gmail.com] do you have a C* branch with lruc integrated? Or: what should I do to bring lruc + C* together? Is the patch up to date? I've pushed a new branch 'native-plugin' with the changes for native-maven-plugin - separate from the other code. The Windows stuff is a bit more complicated - it doesn't compile. I have to dig a bit deeper. Maybe delay the Win port...
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075630#comment-14075630 ] Robert Stupp commented on CASSANDRA-7438: - Surely not a complete list, but a start...
Java code:
* com.lruc.util.Utils.getUnsafe can be safely deleted
* com.lruc.util.Utils.extractLibraryFile
** should check the return code of {{File.createNewFile}}
** a call to {{File.delete()}} for the extracted library file should be added to {{com.lruc.util.Utils.loadNative}}, since an unclean shutdown (kill -9) does not delete the so/dylib file. Possible on Unix systems - but not on Windows.
* The classes com.lruc.jni.lruc, SWIGTYPE_p_item and SWIGTYPE_p_p_item are unused (refactoring relic?)
* Generally the lruc code could be more integrated into the C* code base.
** Let the lruc classes implement org.apache.cassandra.io.util.DataOutputPlus and java.io.DataInput so that they can be used directly by C*'s ColumnFamilySerializer (no temporary Input/OutputStreams necessary).
** Maybe {{DataOutputPlus.write(Memory)}} can be removed in C* when lruc is used - not sure about that.
** Implement most DataInput/Output methods in EntryInput/OutputStream to benefit from Unsafe (e.g. Unsafe.getLong/putLong) - I've seen that you removed Abstract... some weeks ago ;)
** Using Unsafe for DataInput/Output of short/int/long/float/double has the drawback that Unsafe always uses the system's byte order - not (necessarily) the portable Java byte order. There's of course no drawback if all reads and writes are paired.
** {{Unsafe.copyMemory}} could be used for {{write(byte[])}}/{{read(byte[])}}.
* Naming of max_size vs. capacity - use one common term which also makes clear that it's a maximum memory size, e.g. max_size_bytes. _Capacity_ is often used for the number of elements in a collection.
* Memory leak: {{com.lruc.api.LRUCache.hotN}} may keep references in native code (no {{lruc_deref}} calls) if not all items are retrieved from the iterator - remove _hotN_ or return an array/list instead?
* Generally I think all classes can be merged into a single package if only a few are left (see above)
C code:
* {{#define item_lock(hv) while (item_trylock(hv)) continue;}} - shouldn't there be something like a _yield_ there?
* It seems the C code was not cleaned up after you began using Unsafe.allocateMemory :)
* I did not follow all possible code paths (due to the previous point)
Common:
* {{prefix_delimiter}} seems to be unused
Altogether I like it :)
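The byte-order caveat in the review can be demonstrated without Unsafe: ByteBuffer lets the order be set explicitly, which makes it easy to see why paired native-order reads/writes are safe, while mixing a native-order writer with a big-endian DataInput-style reader is not. This is an illustrative sketch, not lruc code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrates the byte-order drawback noted above: Unsafe.putLong writes in
// the platform's native order, while DataOutput is defined as big-endian.
// ByteBuffer stands in for Unsafe so the effect is observable portably.
public class ByteOrderCaveat {
    public static void main(String[] args) {
        long value = 0x0102030405060708L;

        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
        buf.putLong(0, value); // what an Unsafe.putLong-style writer produces

        // Paired access (same order on both sides) always round-trips:
        System.out.println(buf.getLong(0) == value); // true

        // Re-reading the same bytes as big-endian (DataInput semantics) only
        // matches on big-endian hosts; on x86 the value comes back reversed.
        long asBigEndian = buf.order(ByteOrder.BIG_ENDIAN).getLong(0);
        System.out.println("native order is big-endian: " + (asBigEndian == value));
    }
}
```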
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075676#comment-14075676 ] Vijay commented on CASSANDRA-7438: -- Pushed the branch to https://github.com/Vijay2win/cassandra/tree/7438
{quote}Maybe delay Win port{quote} We should be fine; lruc is configurable alongside the SerializingCache.
{quote}unclean shutdown (kill -9) does not delete the so/dylib file{quote} Yeah, it works on Unix, but the problem is I don't have a handle to the temp file after a restart. So cleanup is best effort.
{quote}SWIGTYPE_p_item and SWIGTYPE_p_p_item are unused{quote} Auto-generated; they can be removed, but will be regenerated every time SWIG is run.
{quote}Generally the lruc code could be more integrated in C* code{quote} The problem is it produces a circular dependency; please look at df3857e4b9637ed6a5099506e95d84de15bf2eb7 where I removed those (the DOSP added back will still need to be wrapped by Cassandra's DOSP).
{quote}Naming of max_size, capacity{quote} Yeah, let me make it consistent; the problem was I was trying to fit everything into the Guava interface.
{quote}remove hotN or return an array/list instead{quote} Or maybe do a memcpy on the keys, since this path doesn't need optimization (will fix).
{quote}shouldn't there be something like a yield{quote} Actually I removed it recently; adding or removing it doesn't give much of a performance difference, but as a good citizen I should add it back.
{quote}Seems like the C code was not cleaned up{quote} This cannot be removed; it is needed for the test cases.
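The hotN fix Vijay agrees to ("do memcpy on keys") can be sketched as follows: instead of handing out an iterator that pins native entries (leaking references if the caller abandons it), copy the hottest keys into on-heap arrays up front and release every native reference before returning. NativeEntry and its refCount are stand-ins simulating the native item and lruc_deref; none of these names are from the actual lruc API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a leak-free hotN: keys are copied on-heap eagerly
// and every simulated native reference is released before returning.
public class HotNCopy {
    static final class NativeEntry {
        final byte[] key;
        int refCount = 1;                 // reference taken when hotN pinned the item
        NativeEntry(byte[] key) { this.key = key; }
        void deref() { refCount--; }      // stand-in for lruc_deref
    }

    static List<byte[]> hotN(List<NativeEntry> hottest) {
        List<byte[]> keys = new ArrayList<>(hottest.size());
        for (NativeEntry e : hottest) {
            keys.add(e.key.clone());      // the "memcpy" of the key on-heap
            e.deref();                    // release immediately; nothing can leak
        }
        return keys;
    }

    public static void main(String[] args) {
        List<NativeEntry> entries = List.of(
                new NativeEntry(new byte[]{1}), new NativeEntry(new byte[]{2}));
        List<byte[]> keys = hotN(entries);
        System.out.println(keys.size());             // 2
        System.out.println(entries.get(0).refCount); // 0
    }
}
```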
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075516#comment-14075516 ] Vijay commented on CASSANDRA-7438: -- {quote}unsafe.memoryAllocate instead and replicate what we do with lruc_item_allocate(){quote} Done, thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074452#comment-14074452 ] Robert Stupp commented on CASSANDRA-7438: - [~jbellis] Yes, I can review. I agree with [~vijay2...@yahoo.com] - Unsafe can only work when big regions are allocated. Then use our own malloc/free implementation to manage these big memory regions, which are split into small blocks. On top of that we need to implement a concurrent map that stores data only in off-heap memory. I think we can manage that, but it takes time - we need to avoid synchronization, use CAS, and prevent fragmentation (best effort).
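The "big region + own allocator + CAS" approach described above can be sketched minimally: allocate one large off-heap region and hand out small blocks from it with a lock-free CAS bump pointer. A real implementation would also need free-lists, eviction, and fragmentation handling; all names here are illustrative, and ByteBuffer.allocateDirect stands in for Unsafe.allocateMemory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a slab allocator: one large off-heap region,
// sub-allocated with a CAS bump pointer (no synchronization).
public class SlabAllocator {
    private final ByteBuffer region;              // the big off-heap slab
    private final AtomicLong next = new AtomicLong(0);

    SlabAllocator(int capacityBytes) {
        this.region = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Returns the offset of a new block, or -1 when the slab is exhausted. */
    long allocate(int size) {
        while (true) {
            long cur = next.get();
            if (cur + size > region.capacity()) return -1;
            // CAS instead of a lock: concurrent allocators retry on contention.
            if (next.compareAndSet(cur, cur + size)) return cur;
        }
    }

    public static void main(String[] args) {
        SlabAllocator slab = new SlabAllocator(64);
        System.out.println(slab.allocate(16)); // 0
        System.out.println(slab.allocate(16)); // 16
        System.out.println(slab.allocate(64)); // -1 (would overflow the slab)
    }
}
```

A bump pointer alone cannot reuse freed blocks; that is exactly why the comment calls out fragmentation as a best-effort concern.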
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072935#comment-14072935 ] Robert Stupp commented on CASSANDRA-7438: - My username on GitHub is snazy. Do you know {{org.codehaus.mojo:native-maven-plugin}}? It allows JNI compilation on almost all platforms directly from Maven and does not interfere with SWIG - I have used it on OSX, Linux and Windows.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072950#comment-14072950 ] Benedict commented on CASSANDRA-7438: -
bq. Not sure what we are talking about, this == lruc? if yes the RB is fronting the queue so we don't need a global lock.
I was referring to [~rst...@pironet-ndh.com]'s assertion of the need for some kind of memory management - my only point was that you use no tools that aren't available through Unsafe/NativeAllocator.