[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285916#comment-14285916 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

If you turn the OOME throwing on in C*, I am +1. I did a quick performance test with the cache and compared it to the SerializingCache. I didn't test a scenario where it would be better/faster, but the performance looked just as good. It was a very noisy test with different results every time I restarted, so maybe not a great way to measure.

Serializing Row cache alternative (Fully off heap)
--
Key: CASSANDRA-7438
URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: Linux
Reporter: Vijay
Assignee: Robert Stupp
Labels: performance
Fix For: 3.0
Attachments: 0001-CASSANDRA-7438.patch, tests.zip

Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as ByteBuffers:
* There is a higher GC cost for a reasonably big cache.
* Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
* The memory overhead of the cache entries is relatively high.

So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible. We might also want to make this cache configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
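The "completely off heap" idea in the ticket description means that both key and value bytes live in native memory rather than as on-heap objects. As a minimal illustrative sketch (not OHC's actual layout), a single entry can be serialized into a direct ByteBuffer, so the GC never sees the key or value bytes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only (not OHC's real entry format): one cache entry
// serialized into native memory, layout [keyLen][key][valueLen][value].
// A direct ByteBuffer is backed by native memory, so neither key nor value
// bytes are on the JVM heap.
public class OffHeapEntry {
    private final ByteBuffer buf;

    public OffHeapEntry(byte[] key, byte[] value) {
        buf = ByteBuffer.allocateDirect(8 + key.length + value.length);
        buf.putInt(key.length).put(key).putInt(value.length).put(value);
    }

    public byte[] key() {
        ByteBuffer b = buf.duplicate();
        b.position(0);
        byte[] k = new byte[b.getInt()];
        b.get(k);
        return k;
    }

    public byte[] value() {
        ByteBuffer b = buf.duplicate();
        b.position(0);
        int keyLen = b.getInt();
        b.position(4 + keyLen);
        byte[] v = new byte[b.getInt()];
        b.get(v);
        return v;
    }
}
```

The real implementation uses raw allocations and a native hash table; this only shows where the bytes live.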
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284416#comment-14284416 ]

Robert Stupp commented on CASSANDRA-7438:
-

bq. I am +1 conditional on the library throwing OOME if the allocator fails.
Will add a configuration switch to explicitly enable this behavior.

bq. There are also still some internal properties inside OHC that don't have a prefix.
Yes, debug-mode and disable-jemalloc did not have the prefix. Will be changed.

bq. I noticed you fixed some C* bugs ... need to be backported?
It's only the {{==}} to {{equals}} change in {{ColumnFamilyStore.cleanupCache}}. It's not necessary to fix it for older versions, since the {{UUID}} instance is taken from {{CFMetaData}} - so the {{==}} is (was) correct.

bq. Can you publish a new version to maven central so I can benchmark it vs the old cache implementation?
OHC 0.3 + 0.3.1 are on Maven Central. Note: OHC 0.3.1 incorporates the changes above (it might not be found using the Maven Central search, but the artifacts are there).

C* git branch updated to use 0.3.1.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284230#comment-14284230 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I am +1 conditional on the library throwing OOME if the allocator fails. It should be the caller of the library that decides how to handle the situation, not the library, IMO.

There are also still some internal properties inside OHC that don't have a prefix.

I noticed you fixed some C* bugs https://github.com/snazy/cassandra/compare/7438-pluggable#diff-98f5acb96aa6d684781936c141132e2aL1915 Do those fixes need to be backported?

Can you publish a new version to maven central so I can benchmark it vs the old cache implementation?
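The "throw OOME if the allocator fails" condition can be sketched as a thin wrapper that turns an allocation failure (address 0, like malloc returning NULL) into an OutOfMemoryError for the caller to handle. The interface and names here are hypothetical, not OHC's real API:

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of the behavior Ariel asks for: the library raises
// OutOfMemoryError on allocation failure and leaves the recovery decision
// to the caller. rawAlloc stands in for the native allocator (e.g. Unsafe
// or jemalloc via JNA); 0 models malloc() returning NULL.
public class StrictAlloc {
    public static long allocateOrThrow(LongUnaryOperator rawAlloc, long bytes) {
        long adr = rawAlloc.applyAsLong(bytes);
        if (adr == 0L)
            throw new OutOfMemoryError("off-heap allocation of " + bytes + " bytes failed");
        return adr;
    }
}
```

This mirrors the behavior of ByteBuffer.allocateDirect, which also signals native allocation failure with OutOfMemoryError.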
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282467#comment-14282467 ]

Robert Stupp commented on CASSANDRA-7438:
-

I think the best alternative to access malloc/free is possibly {{Unsafe}} with jemalloc in LD_PRELOAD. The native code of {{Unsafe.allocateMemory}} is basically just a wrapper around {{malloc()}}/{{free()}}.

Updated the git branch with the following changes:
* update to OHC 0.3
* benchmark: add new command line option to specify key length (-kl)
* free capacity handling moved to segments
* allow to specify the preferred memory allocator via the system property org.caffinitas.ohc.allocator
* allow to specify defaults of OHCacheBuilder via system properties prefixed with org.caffinitas.ohc.
* benchmark: make metrics local to the driver threads
* benchmark: disable bucket histogram in stats by default

I did not change the default number of segments = 2 * CPUs - but I thought about it (since you experienced that 256 segments on a c3.8xlarge gives some improvement). A naive approach of, say, 8 * CPUs feels too heavy for small systems (with one socket) and might be too much outside of benchmarking. If someone wants to get the most out of it in production and really hits the number-of-segments limit, he can always configure it better. WDYT?

Using jemalloc on Linux via LD_PRELOAD is probably the way to go in C* (since off-heap is also used elsewhere). I think we should leave the OS allocator on OSX. I don't know much about allocator performance on Windows.

For now I do not plan any new features for C* - so maybe we shall start a final review round?
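The {{Unsafe}} + LD_PRELOAD approach described above can be sketched as follows: {{Unsafe.allocateMemory}}/{{freeMemory}} compile down to malloc()/free(), so running the JVM with LD_PRELOAD pointing at libjemalloc makes these calls hit jemalloc without any JNA in the path. Reflection is needed because sun.misc.Unsafe is not public API; this is a generic sketch, not OHC's allocator code:

```java
import java.lang.reflect.Field;

// Sketch of the Unsafe-based allocation path: allocateMemory/freeMemory are
// thin wrappers over malloc()/free(), so LD_PRELOAD=libjemalloc.so routes
// them through jemalloc with no JNA lock involved.
public class UnsafeMalloc {
    private static final sun.misc.Unsafe UNSAFE;
    static {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (sun.misc.Unsafe) f.get(null);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // write a long to freshly allocated native memory and read it back
    public static long roundTrip(long value) {
        long adr = UNSAFE.allocateMemory(8);   // effectively malloc(8)
        try {
            UNSAFE.putLong(adr, value);
            return UNSAFE.getLong(adr);
        } finally {
            UNSAFE.freeMemory(adr);            // effectively free()
        }
    }
}
```

Note that unlike a raw malloc binding, {{Unsafe.allocateMemory}} itself throws OutOfMemoryError when the underlying allocation fails.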
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280945#comment-14280945 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I ran the benchmark on the develop branch today using a c3.8xlarge and profiled with flight recorder. There is definitely some contention on the lock in JNA. I also see a little in AbstractQueuedSynchronizer from locking the segments, along with some park/unpark activity.

I built jemalloc (-march=native --disable-fill --disable-stats). The Ubuntu package compiles at -O2 instead of -O3.

I am getting full utilization across 30 threads if I increase the number of segments to 256; otherwise it hovers around 2600% (with 30 threads). It cuts in half the number of instances of contention in the profiler.

The workload settings you ran with resulted in a lot of cache (ohcache, not CPU cache) misses. I think a real workload where the cache is useful will have more hits.

One note about the benchmark: building the histogram of buckets is not a lightweight operation. I think that should be off by default. I removed it for my testing. Otherwise it looks ok. Using the Timer as shared state in a micro-benchmark is probably not the way to go. I would have a timer per driver thread and then aggregate.

I am running 1-30 threads and it will take a few hours to finish. I am going to look into benchmarking inside C* and comparing the existing cache implementation to OHC now.

I used this, which gave me mostly cache hits and filled up quite a bit of RAM. It takes a minute or two to fill the cache.
{noformat}
#!/bin/sh
LD_PRELOAD=~/jemalloc-3.6.0/lib/libjemalloc.so.1 \
java -Xmx8g -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
  -DDISABLE_JEMALLOC=true \
  -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.local.only=false \
  -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=ec2-54-172-234-230.compute-1.amazonaws.com \
  -jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar \
  -rkd 'gaussian(1..1500,2)' -wkd 'gaussian(1..1500,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 320 \
  -d 120 -t 30 \
  -sc 256
{noformat}

256 segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2503894/2143858/2036336/2459949 count: 295258886
        min/max/mean/stddev: 0.00047/ 0.76172/ 0.00652/ 0.03865
        75/95/98/99/999/median: 0.00439/ 0.00697/ 0.01147/ 0.03458/ 0.75864/ 0.00342
Writes: one/five/fifteen/mean: 278134/238242/226326/273275 count: 32800525
        min/max/mean/stddev: 0.00176/ 0.89665/ 0.00945/ 0.03986
        75/95/98/99/999/median: 0.00719/ 0.01180/ 0.01816/ 0.11640/ 0.89006/ 0.00556
{noformat}

256 segments, jemalloc via jna
{noformat}
Reads : one/five/fifteen/mean: 2343872/1458688/1159829/2387622 count: 286635526
        min/max/mean/stddev: 0.00054/ 0.97114/ 0.00756/ 0.04664
        75/95/98/99/999/median: 0.00435/ 0.00675/ 0.00985/ 0.05139/ 0.95959/ 0.00341
Writes: one/five/fifteen/mean: 260376/162076/128883/265250 count: 31843705
        min/max/mean/stddev: 0.00267/ 0.70586/ 0.01502/ 0.05161
        75/95/98/99/999/median: 0.01049/ 0.01695/ 0.04193/ 0.36639/ 0.70331/ 0.00859
{noformat}

default segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2148677/1630379/1448226/2202878 count: 264549288
        min/max/mean/stddev: 0.00035/ 0.66081/ 0.00820/ 0.03519
        75/95/98/99/999/median: 0.00435/ 0.01247/ 0.05423/ 0.20834/ 0.65286/ 0.00323
Writes: one/five/fifteen/mean: 238699/180945/160641/244767 count: 29395103
        min/max/mean/stddev: 0.00172/ 0.39821/ 0.01120/ 0.03079
        75/95/98/99/999/median: 0.00805/ 0.02124/ 0.08665/ 0.18473/ 0.39776/ 0.00574
{noformat}
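The "timer per driver thread, then aggregate" suggestion from the comment above can be sketched as follows: each thread records into its own unsynchronized recorder (no shared cache lines on the hot path), and the harness merges all recorders once the driver threads have finished. The names here are illustrative, not the ohc-benchmark code:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of per-thread timing: each driver thread gets a private Recorder via
// ThreadLocal, so record() never touches shared state; all recorders are
// registered in a queue so the harness can aggregate them at the end.
public class PerThreadTimers {
    static final class Recorder {
        long count;
        long totalNanos;
        void record(long nanos) { count++; totalNanos += nanos; }
    }

    private final Queue<Recorder> all = new ConcurrentLinkedQueue<>();
    private final ThreadLocal<Recorder> local = ThreadLocal.withInitial(() -> {
        Recorder r = new Recorder();
        all.add(r);   // registration happens once per thread
        return r;
    });

    public void record(long nanos) { local.get().record(nanos); }

    // called once, after the driver threads have finished
    public long totalCount() {
        long c = 0;
        for (Recorder r : all) c += r.count;
        return c;
    }
}
```

This avoids the contention of a single shared Timer while still producing an aggregate at the end of the run.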
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275809#comment-14275809 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

bq. Making freeCapacity a per-segment field: Then I'd prefer to reverse the stuff - i.e. add an allocatedBytes field to each segment. Operations would get the (calculated) free capacity as a parameter and act on that value. Or were you thinking about dividing capacity by number of segments and use that as the max capacity for each segment?
The goal of splitting the locks and tables into segments is to eliminate any globally shared cache lines that are written to by common operations on the cache. Having every put modify the free capacity introduces a potential point of contention. The only way to really understand the impact is to have a good micro benchmark you trust and try it both ways. I think that you would split the capacity across the segments: do exactly what you are doing now, but do the check inside the segment. Since puts are allowed to fail, I don't think you have to do anything else.

bq. Regarding rehash/iterators: Could be simply worked around by counting the number of active iterators and just don't rehash while an iterator is active. That's better than e.g. returning duplicate keys or keys not at all - i.e. people relying on that functionality.
This is an OHC issue, not a C* issue. I think from C*'s perspective it can be wrong rarely and it doesn't matter, since it doesn't affect correctness. Definitely worth documenting though.

bq. I lean towards removing the new tables implementation in OHC. It has the big drawback that it only allows a specific number of entries per bucket (e.g. 8). But I'd like to defer that decision after some tests on a NUMA machine.
You are on to something in terms of making a faster hash table, but it doesn't seem like a huge win given the short length of most chains (1 or 2) and the overhead of the allocator and locking etc. It would show up in a micro-benchmark, but not in C*. I would like to stick with linked for C* for now since it's easy to understand and I've looked at it a few times. I think I already sent you a link to this https://www.cs.cmu.edu/~dga/papers/silt-sosp2011.pdf but there are a lot of ideas there for dense hash tables. You can chain together multiple buckets so the entries per bucket becomes a function of cache line size.
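The per-segment capacity scheme described above (divide capacity across segments, check inside the segment, let puts fail) can be sketched like this. The class and names are illustrative, not OHC's code:

```java
// Sketch of per-segment capacity accounting: total capacity is split evenly
// across segments, each segment tracks its own usage under its own lock, and
// a put that does not fit simply fails - there is no globally shared counter
// that every operation writes to.
public class Segment {
    private final long capacity;   // this segment's share of the total
    private long used;

    public Segment(long totalCapacity, int segmentCount) {
        this.capacity = totalCapacity / segmentCount;
    }

    // returns false when the entry does not fit; callers tolerate failed puts
    // (a real cache would first try to evict LRU entries in this segment)
    public synchronized boolean put(long entryBytes) {
        if (used + entryBytes > capacity)
            return false;
        used += entryBytes;
        return true;
    }

    public synchronized void remove(long entryBytes) { used -= entryBytes; }
}
```

Because each segment only writes its own fields, common operations on different segments never contend on a shared freeCapacity cache line.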
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275047#comment-14275047 ]

Robert Stupp commented on CASSANDRA-7438:
-

Thanks for the review. Many useful hints in it :)

I'll reduce the configuration stuff in the C* integration as suggested and add some default-by-system-property mechanism (as suggested).

Making freeCapacity a per-segment field: Then I'd prefer to reverse the stuff - i.e. add an allocatedBytes field to each segment. Operations would get the (calculated) free capacity as a parameter and act on that value. Or were you thinking about dividing capacity by the number of segments and using that as the max capacity for each segment?

Regarding rehash/iterators: Could be simply worked around by counting the number of active iterators and just not rehashing while an iterator is active. That's better than e.g. returning duplicate keys or not returning keys at all - i.e. for people relying on that functionality.

I just started JMH without any additional parameters. It's called during the Maven test phase (unless you specify -DskipTests).

You're right. Murmur3 + UTF8 need more tests.

Didn't notice that fastutil is that fat. Already replaced it with an own implementation.

I lean towards removing the new tables implementation in OHC. It has the big drawback that it only allows a specific number of entries per bucket (e.g. 8). But I'd like to defer that decision until after some tests on a NUMA machine.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274005#comment-14274005 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

If you go all the way down the JMH rabbit hole you don't need to do any of your own timing; JMH will actually do some smart things to give you accurate timing and ameliorate the impact of non-scalable/expensive timing measurement. Metrics uses System.nanoTime() internally, so it isn't really any better as far as I can tell. System.nanoTime() on Linux is pretty scalable http://shipilev.net/blog/2014/nanotrusting-nanotime/. When I tested it in JMH it actually seemed to be linearly scalable, but JMH will solve that for you even on platforms where nanoTime is finicky.

The C* integration looks good. I'm glad it was easy. When it comes to exposing configuration parameters, less is more. The stress tool, when used without workload profiles, does some validation. It checks that values are there and that the contents are correct.

I did not know about the JNA synchronized block. That is surprising, but I am glad to hear it is getting fixed. For access to jemalloc I recommend using Unsafe and LD_PRELOADing jemalloc. I think that would be the recommended approach and the one you should benchmark against, with JNA there as a fallback. That gives you a JNI call for allocation/deallocation.

I am trying out the JMH benchmark and looking at the new linked implementation right now. How are you starting the JMH benchmark?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274409#comment-14274409 ]

Ariel Weisberg commented on CASSANDRA-7438:
---

I did another review. The additional test coverage looks great.

Don't throw Error; throw runtime exceptions on things like serialization issues. The only place it makes sense to throw Error is when allocating memory fails. That would match the behavior of ByteBuffer.allocateDirect. I don't see failure to allocate from the heap allocator as recoverable, even in the context of the cache. IOError is thrown from one place in the entire JDK (Console), so it's an odd choice.

freeCapacity should really be a field inside each segment, and full/not-full and eviction decisions should be made inside each segment independently. In practice inside C* it's probably fine as just an AtomicLong, but I want to see OHC be all it can be.

The rehash test could validate the data after the rehash. It could also validate the rehash under concurrent access, say with a reader thread that is randomly accessing already inserted values. I can't tell if the crosscheck test inserts enough values to trigger rehashing.

Inlining the murmur3 changes makes me a little uncomfortable. It's good to see some test coverage comparing with another implementation, but it's over a small set of data. It seems like the Unsigned stuff necessary to perfectly mimic the native version of murmur3 is missing? Add 2-4 byte code points for the UTF-8 tests.

FastUtil is a 17 megabyte dependency all to get one array list.

The cross checking implementation is really nice.

Looking at the AbstractKeyIterator, I don't see how it can do the right thing when a segment rehashes. It will point to a random spot in the segment after a rehash, right? In practice maybe this doesn't matter, since they should size up promptly and it's just an optimization that we dump this stuff at all. I can understand what the current code does, so I lean towards keeping it.

There are a couple of places (serializeForPut, putInternal, maybe others) where there are two exception handlers that each de-allocate the same piece of memory. The deallocation could go in a finally instead of the exception handlers since it always happens.
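The cleanup suggestion in the last paragraph can be sketched as follows: rather than two exception handlers that each free the same address, put the single free in a finally block, which runs on success and on every failure mode alike. The allocator interface and method names here are illustrative, not the actual serializeForPut/putInternal code:

```java
import java.util.function.Supplier;

// Sketch of the finally-based cleanup: one deallocation path instead of
// duplicated frees in multiple catch blocks. The NativeAllocator interface
// is hypothetical; it stands in for whatever frees off-heap memory.
public class FinallyFree {
    public interface NativeAllocator { void free(long address); }

    // runs op (e.g. serialize + insert, which may throw) against a scratch
    // allocation and guarantees exactly one free, whatever happens
    public static <T> T withScratch(NativeAllocator alloc, long address, Supplier<T> op) {
        try {
            return op.get();
        } finally {
            alloc.free(address);   // runs on success and on every exception
        }
    }
}
```

The win is that adding a new failure mode inside op can never leak the allocation or double-free it.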
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271745#comment-14271745 ]

Robert Stupp commented on CASSANDRA-7438:
-

Note: OHC now has cache-loader support (https://github.com/snazy/ohc/issues/3). Could be an alternative for RowCacheSentinel.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267558#comment-14267558 ]

Robert Stupp commented on CASSANDRA-7438:
-

BTW: Is there any single-node-cluster test that has been used to test the 'old' row cache, or a test that runs against a single-node cluster and verifies the data being written during a long run - i.e. several hours?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265947#comment-14265947 ]

Robert Stupp commented on CASSANDRA-7438:
-

The latest (just checked in) benchmark implementation gives much better results. Using {{com.codahale.metrics.Timer#time(java.util.concurrent.Callable<T>)}} eliminates the use of {{System.nanoTime()}} or {{ThreadMXBean.getCurrentThreadCpuTime()}} - it can directly use its internal clock. The benchmark {{java -jar ohc-benchmark/target/ohc-benchmark-0.2-SNAPSHOT.jar -rkd 'gaussian(1..2000,2)' -wkd 'gaussian(1..2000,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 16 -d 30 -t 30}} improved from 800k reads to 3.3M reads per second (w/ 8 cores). So yes - the benchmark was measuring its own mad code. Due to that, I edited my previous comment with the benchmark results, since those are invalid now.

I've added a (yet simple) JMH benchmark as a separate module. This one can cause high system CPU usage - at operation rates of 2M per second or more (8 cores). I think these rates are really fine. Note: these rates cannot be achieved in production, since then you'll obviously have to pay for (de)serialization, too.

So we want to address these topics as follow-up:
* own off-heap allocator
* C* ability to access off-heap cached rows
* C* ability to serialize hot keys directly from off-heap (might be a minor win since it's not triggered that often)
* per-table knob to control whether to add to the row cache on writes - I strongly believe that this is a useful feature (maybe LHF) on workloads where read and written data work on different (row) keys
* investigate if the counter cache can benefit
* investigate if the key cache can benefit

bq. You could start with it outside and publish to maven central and if there an issue getting patches applied quickly we can always fork it in C*.
OK

bq. pluggable row cache
Then I'll start with that - just make the row cache pluggable and the implementation configurable.

Note: JNA has a synchronized block that's executed at every call - version 4.2.0 fixes this (I don't know when it will be released).
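The {{Timer#time(Callable)}} pattern Robert describes can be sketched without the Metrics dependency: the timer owns its clock, so the benchmarked code never calls {{System.nanoTime()}} itself and the clock can be swapped for testing. This is a generic illustration, not the Metrics implementation:

```java
import java.util.concurrent.Callable;

// Sketch of the Timer#time(Callable) idea: the timer wraps the task, reads
// its own clock before and after, and accumulates the elapsed time, so the
// caller writes no timing code of its own.
public class SimpleTimer {
    public interface Clock { long nanos(); }

    private final Clock clock;
    private long count;
    private long totalNanos;

    public SimpleTimer(Clock clock) { this.clock = clock; }

    public <T> T time(Callable<T> task) throws Exception {
        long start = clock.nanos();
        try {
            return task.call();
        } finally {
            totalNanos += clock.nanos() - start;   // recorded even if task throws
            count++;
        }
    }

    public long count() { return count; }
    public long totalNanos() { return totalNanos; }
}
```

Usage would look like {{timer.time(() -> cache.get(key))}}, with the clock injected once when the timer is built.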
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14266809#comment-14266809 ] Robert Stupp commented on CASSANDRA-7438: -

OHC works in Cassandra:
* unit tests pass ({{ant test}}, no difference against trunk)
* get and put verified in the debugger and with a (simple) table
* row cache saving and loading work, too
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14263083#comment-14263083 ] Ariel Weisberg commented on CASSANDRA-7438: ---

I went to run the benchmark myself and noticed you used a uniform distribution for the keys. I don't think that makes sense for testing a cache, where the primary benefit is going to come from cacheable access patterns. I would use extreme with .6 or .5 for the shape.

I am also confused by the benchmark implementation. There are threads generating the tasks and then handing them off to other threads for execution. This means the benchmark is measuring unrelated things, like the performance of the queue used for receiving tasks and returning results, as well as the general design of the harness. It makes me wonder if that is the source of the under-utilization issue. I think this might work well as a JMH benchmark, and the parameterization would make it easy to put together a full test matrix that anyone can run with one command.

I tried to run it and it seems to go for longer than expected. I specified -d 300 and it is still going. The benchmark is doing work according to top. I ran on a c3.8xlarge using the RightScale 14.1 base server template running Ubuntu 14.04 and Oracle JDK 8u25; I got jemalloc from the libjemalloc1 package. I cloned OHC today and, after running mvn package, ran the benchmark using
bq. java -jar ohc-benchmark/target/ohc-benchmark-0.2-SNAPSHOT.jar -rkd 'gaussian(1..2000,2)' -wkd 'gaussian(1..2000,2)' -vs 'gaussian(1024..4096,2)' -r .9 -cap 160 -d 300 -t 30 -dr 8
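The distribution specs like 'gaussian(1..2000,2)' concentrate accesses around the middle of the key range, which is what makes the workload cacheable (a uniform distribution gives every key the same heat). A sketch of what a bounded gaussian key sampler might look like - this is a guess at the spec's semantics, not the actual ohc-benchmark code:

```java
import java.util.Random;

// Hypothetical sampler for a 'gaussian(min..max,stdvrng)' key spec:
// mean at the middle of the range, stddev = half-range / stdvrng,
// samples clamped into [min, max]. Illustrative only - the real
// ohc-benchmark implementation may interpret the spec differently.
class GaussianKeys {
    private final long min, max;
    private final double mean, stddev;
    private final Random rnd = new Random();

    GaussianKeys(long min, long max, double stdvrng) {
        this.min = min;
        this.max = max;
        this.mean = (min + max) / 2.0;
        this.stddev = (max - min) / (2.0 * stdvrng);
    }

    long next() {
        double v = mean + rnd.nextGaussian() * stddev;
        // clamp so every sample is a valid key in the configured range
        return Math.max(min, Math.min(max, Math.round(v)));
    }
}
```

With stdvrng = 2, roughly 95% of accesses fall inside the range without clamping, and keys near the mean are hit far more often - exactly the skew a cache benchmark wants.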
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262472#comment-14262472 ] Ariel Weisberg commented on CASSANDRA-7438: ---

I have an in-progress response to your earlier comment. I'll address the benchmark here.

I wouldn't sweat allocator performance. Ultimately we will have to have our own, if only to accurately enforce memory utilization (user asks for 200 megabytes, we use 400: not cool). I think the blueprint for how to do this already exists in something like memcached in terms of how to allocate and defragment. We just need to adapt it for our approach, where it is a pool of independently locked hash tables.

The overhead of copying is where zero-deserialization and ref-counting start to be a win, since you don't have to copy at all. I wouldn't get worked up about optimizing for that yet, since it requires upstream to be smarter about how it uses the cache. If upstream can parse the cache value and extract a subset without copying the entire thing, it will handle larger values more gracefully. At some point upstream might also hold partial rows.

I would like to see the ability to spin all cores against the cache, at least for relatively small values. Not being able to do that is a little concerning. Are threads blocking inside the allocator? Do the utilization issues occur with large or small values?

I don't have a real baseline to say whether these numbers are good or bad. They sound okay, and as you say you would expect the allocator to be one of the slowest parts. I am not sure testing with 500 threads is realistic, since threads have a pretty good chance of being descheduled while holding a lock, and that isn't as likely to happen under real usage conditions. I would test with, say, 30 threads on that hardware. For, say, 16k values, measuring scaling from 1-30 threads would give us an idea of how well things are going.
That would also give you better feedback on whether different numbers of stripes help or not.
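The accounting Ariel describes (charging the user-visible budget for what is actually consumed, not just payload bytes) can be sketched with an atomic byte counter. The per-entry overhead constant here is made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of capacity accounting that charges each entry for its real
// footprint (payload + per-entry overhead) so the enforced budget matches
// what the allocator actually consumes. PER_ENTRY_OVERHEAD is a
// hypothetical figure for entry header, LRU links, and alignment.
class CapacityAccount {
    static final long PER_ENTRY_OVERHEAD = 64;

    private final long capacityBytes;
    private final AtomicLong used = new AtomicLong();

    CapacityAccount(long capacityBytes) { this.capacityBytes = capacityBytes; }

    /** Try to reserve room for an entry; caller must evict and retry on false. */
    boolean reserve(long payloadBytes) {
        long charge = payloadBytes + PER_ENTRY_OVERHEAD;
        while (true) {
            long cur = used.get();
            if (cur + charge > capacityBytes) return false; // over budget - evict first
            if (used.compareAndSet(cur, cur + charge)) return true;
        }
    }

    void release(long payloadBytes) {
        used.addAndGet(-(payloadBytes + PER_ENTRY_OVERHEAD));
    }

    long used() { return used.get(); }
}
```

The key property is that the rejection test runs against the charged footprint, so "200 megabytes configured" can never quietly become 400 megabytes resident.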
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262487#comment-14262487 ] Ariel Weisberg commented on CASSANDRA-7438: ---

bq. Whether to migrate whole OHC code into org.apache.cassandra codebase (with the option to either turn it on or off).
I am open to either. I asked Benedict, and he prefers having it inside C* so we can patch it. The advantage of having it outside is that it might see use elsewhere and get additional eyes/contributions. You could start with it outside and publish to Maven Central, and if there's an issue getting patches applied quickly we can always fork it in C*.

bq. Whether to implement a "pluggable row cache" (to allow multiple implementations)
I think that we aren't going to need multiple cache implementations in the long run. It seems like we should be able to have one that can be configured to have the desired behavior. Benedict doesn't feel strongly about it either. If Vijay wants to continue working on another implementation, then we would want to keep it pluggable the way it currently is. It looks like the KeyCache and CounterCache both use a different implementation and not SerializingCache. I am not clear on why they don't use SerializingCache; it's worth evaluating why that is before converging on a single implementation.

bq. New per-table knob to enable whether to populate entries to the row cache on reads+writes or just on reads (to target different workloads)
Sounds like it would be useful, but first we have to come up with someone somewhere that says "I want this", or a workload where this is the right call. There may also be correctness issues to think about - see the next item.

bq. Rethink about whether to keep the current RowCacheSentinel implementation as is - if I understand it correctly, it just reduces the number of cache-put operations (cache hit on a sentinel performs a disk read). A compromise regarding additional serialization cost?
I think it is for correctness? https://issues.apache.org/jira/browse/CASSANDRA-3862 I'm still reading up on this.

bq. Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
There is some trickiness here because the AutoSavingCache breaks apart the keys to determine where the data goes.

bq. Optimizations of value deserialization effort - let C* directly access a cached row in off-heap memory instead of the deserialization (and on-heap object construction) overhead.
I think these two together would make a good follow-up ticket. Another good follow-up ticket would be addressing the allocator, for both performance and fragmentation.
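The sentinel pattern referenced here (CASSANDRA-3862) prevents a race where a slow read repopulates the cache with a row that a concurrent write has since invalidated. A minimal on-heap sketch of the idea - names and details are illustrative, not C*'s actual RowCacheSentinel:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Sketch of the read-path sentinel pattern: a reader installs a unique
// sentinel before going to disk and only publishes its result if that
// sentinel is still in place; the write path invalidates by removing the
// key, so a stale row can never land in the cache. A hit on a foreign
// sentinel just reads through (matching "cache hit on a sentinel
// performs a disk read" from the comment above).
class SentinelCache {
    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    Object get(String key, Supplier<Object> readFromDisk) {
        Object cur = cache.get(key);
        if (cur != null && !(cur instanceof Sentinel)) return cur; // real hit
        Sentinel mine = new Sentinel();
        if (cache.putIfAbsent(key, mine) != null) {
            // another reader is loading (or a value raced in): read through
            return readFromDisk.get();
        }
        Object value = readFromDisk.get();
        // publish only if our sentinel survived; an intervening write removed it
        cache.replace(key, mine, value);
        return value;
    }

    /** Write path: drop whatever is cached, including in-flight sentinels. */
    void invalidate(String key) { cache.remove(key); }

    Object peek(String key) {
        Object o = cache.get(key);
        return (o instanceof Sentinel) ? null : o;
    }

    private static final class Sentinel {}
}
```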
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257721#comment-14257721 ] Robert Stupp commented on CASSANDRA-7438: -

I had the opportunity to test OHC on a big machine. First: it works - very happy about that :) Some things I want to note:
* a high number of segments does not have any really measurable influence (the default of 2 * # of cores is fine)
* throughput heavily depends on serialization (hash entry size) - Java 8 gave a 10% to 15% improvement in some tests (either on {{Unsafe.copyMemory}} or something related like the JNI barrier)
* the number of entries per bucket stays pretty low with the default load factor of .75 - the vast majority have 0 or 1 entries, some have 2 or 3, and a few have up to 8

Issue (not solvable yet): It works great for hash entries up to approx. 64kB, with good to great throughput. Above that barrier it works well at first, but after some time the system spends a huge amount of CPU time (~95%) in {{malloc()}} / {{free()}} (with jemalloc; Unsafe.allocate is not worth discussing at all on Linux). I tried to add a "memory buffer cache" that caches freed hash entries for reuse, but it turned out that in the end it would be too complex if done right. The current implementation is still in the code, but must be explicitly enabled with a system property. Workloads with small entries and a high number of threads easily trigger the Linux OOM protection (which kills the process). Please note that it does work with large hash entries - but throughput drops dramatically to just a few thousand writes per second.

Some numbers (value sizes have a gaussian distribution). I had to do these tests in a hurry because I had to give back the machine. The code used during these tests is tagged as {{0.1-SNAP-Bench}} in git. Throughput is limited by {{malloc()}} / {{free()}}, and most tests only used 50% of the available CPU capacity (on _c3.8xlarge_ - 32 cores, Intel Xeon E5-2680v2 @2.8GHz, 64GB).
* 1k..200k value size, 32 threads, 1M keys, 90% read ratio, 32GB: 22k writes/sec, 200k reads/sec, ~8k evictions/sec, write: 8ms (99perc), read: 3ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 90% read ratio, 32GB: 55k writes/sec, 499k reads/sec, ~2k evictions/sec, write: .1ms (99perc), read: .03ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 50% read ratio, 32GB: 195k writes/sec, 195k reads/sec, ~9k evictions/sec, write: .2ms (99perc), read: .1ms (99perc)
* 1k..64k value size, 500 threads, 1M keys, 10% read ratio, 32GB: 185k writes/sec, 20k reads/sec, ~7k evictions/sec, write: 4ms (99perc), read: .07ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 90% read ratio, 32GB: 110k writes/sec, 1M reads/sec, 30k evictions/sec, write: .04ms (99perc), read: .01ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 50% read ratio, 32GB: 420k writes/sec, 420k reads/sec, 125k evictions/sec, write: .06ms (99perc), read: .01ms (99perc)
* 1k..16k value size, 500 threads, 5M keys, 10% read ratio, 32GB: 435k writes/sec, 48k reads/sec, 130k evictions/sec, write: .06ms (99perc), read: .01ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 90% read ratio, 32GB: 140k writes/sec, 1.25M reads/sec, 50k evictions/sec, write: .02ms (99perc), read: .005ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 50% read ratio, 32GB: 530k writes/sec, 530k reads/sec, 220k evictions/sec, write: .04ms (99perc), read: .005ms (99perc)
* 1k..4k value size, 500 threads, 20M keys, 10% read ratio, 32GB: 665k writes/sec, 74k reads/sec, 250k evictions/sec, write: .04ms (99perc), read: .005ms (99perc)

Command line to execute the benchmark:
{code}
java -jar ohc-benchmark/target/ohc-benchmark-0.1-SNAPSHOT.jar -rkd 'uniform(1..2000)' -wkd 'uniform(1..2000)' -vs 'gaussian(1024..4096,2)' -r .1 -cap 320 -d 86400 -t 500 -dr 8

-r = read rate
-d = duration
-t = # of threads
-dr = # of driver threads that feed the worker threads
-rkd = read key distribution
-wkd = write key distribution
-vs = value size
-cap = capacity
{code}
Sample bucket histogram from the 20M test:
{code}
[0..0]: 8118604
[1..1]: 5892298
[2..2]: 2138308
[3..3]: 518089
[4..4]: 94441
[5..5]: 13672
[6..6]: 1599
[7..7]: 189
[8..9]: 16
{code}
After running into that memory management issue with varying allocation sizes of a few kB to several MB, I think it's still worth working on our own off-heap memory management - maybe some block-based approach (fixed or variable). But that's out of the scope of this ticket.
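The histogram is close to what uniform hashing predicts: with n entries spread over m buckets, bucket occupancy is approximately Poisson with lambda = n/m. Summing the histogram gives m = 16,777,216 buckets (exactly 2^24, consistent with a power-of-two table); a quick check of the first few occupancy counts (n is estimated from the histogram itself, so this is illustrative only):

```java
// Check the 20M-test bucket histogram against the Poisson prediction for
// uniform hashing: occupancy ~ Poisson(lambda = entries / buckets).
public class BucketPoisson {
    public static void main(String[] args) {
        long[] histogram = {8118604, 5892298, 2138308, 518089, 94441, 13672, 1599, 189, 16};
        long buckets = 0, entries = 0;
        for (int k = 0; k < histogram.length; k++) {
            buckets += histogram[k];
            entries += (long) k * histogram[k];
        }
        double lambda = (double) entries / buckets;  // roughly 0.73 here
        double p = Math.exp(-lambda);                // Poisson P(0)
        for (int k = 0; k <= 3; k++) {
            System.out.printf("k=%d expected=%.0f observed=%d%n",
                    k, p * buckets, histogram[k]);
            p = p * lambda / (k + 1);                // P(k+1) from P(k)
        }
    }
}
```

The predicted counts for 0, 1, and 2 entries per bucket land within about 1% of the observed values, which suggests the murmur3 hash is distributing keys essentially uniformly across buckets.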
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251537#comment-14251537 ] Robert Stupp commented on CASSANDRA-7438: -

I've nearly finished the OHC implementation. Unit tests cover all functionality required by C*, and a separate test-only implementation is now used to verify the implementation (entry (de)serialization is not extensively covered by the tests yet). The OHC interface has been changed towards the functionality required by C*. Maven executes the unit tests both with and without jemalloc (only if jemalloc is installed, of course).

[~aweisberg], [~benedict], can you have a look at the current OHC code? I'd like to know how it could/should be integrated in C*. IMO there are two decisions to be made:
* Whether to migrate the whole OHC code into the org.apache.cassandra codebase (with the option to either turn it on or off).
* Whether to implement a "pluggable row cache" (to allow multiple implementations)

I've got some ideas regarding the row cache which are out of scope of this ticket:
* New per-table knob to control whether to populate the row cache on reads+writes or just on reads (to target different workloads)
* Rethink whether to keep the current {{RowCacheSentinel}} implementation as is - if I understand it correctly, it just reduces the number of cache-put operations (a cache hit on a sentinel performs a disk read). A compromise regarding additional serialization cost?
* Improvement of key (de)serialization (saving the row cache to disk) - use direct I/O
* Optimization of value deserialization effort - let C* directly access a cached row in off-heap memory instead of paying the deserialization (and on-heap object construction) overhead.

Note: although the jemalloc allocator provides a {{getTotalAllocated()}} method, the result is not correct and I don't know why. The result depends on jemalloc configure settings ({{--en/disable-tcache}}).
According to the man page the result should be correct (the sum of {{stats.allocated}} and {{stats.huge.allocated}}), but it isn't (verified with a coded memory leak of small allocations that didn't increase the value). Iterating over the jemalloc _arenas_ and _bins_ does not help, since the two mentioned values are aggregations of these.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251756#comment-14251756 ] Robert Stupp commented on CASSANDRA-7438: - Ah - a burn test is still missing. I'll add some code that is able to verify the cache contents, key iterators, and such stuff.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239729#comment-14239729 ] Ariel Weisberg commented on CASSANDRA-7438: ---

Lots of cool stuff here, Robert.

Unit-test wise, there is a lot of code that is only covered indirectly or not at all, and the behaviors are not checked for explicitly. I don't think it makes sense to include code that doesn't have a unit test claiming it does what it says it does. The various input/output streams, buffering, and compression all need unit tests. Uns.java needs a unit test for pretty much every method, as well as for the validation functionality. HashEntry has a bunch of uncovered functions. For me the lack of test coverage is the biggest barrier. What I am reacting to is that the tests are black box and miss things. OffHeapMap containsEntry has no tests. removeEntry has untested code. removeLink still has untested code. There is untested histogram stuff, deserializeEntry, serializeEntry. HashEntry classes have untested functions. HashEntries has many predicates that are untested.

Having a unit test that fuzzes against a parallel implementation at the same time, using a different LRU map implementation, would be great for a black-box test. You can stripe the other implementation the same way so that the eviction matches.

One of my previous comments was that SegmentCacheImpl duplicates reference-counting code from OffHeapMap and should just delegate. It ends up doing that anyways.

I would really like to see the cleanup/eviction code go away. If inserting an entry would blow capacity, remove entries until it doesn't. I don't see a reason to monkey with thresholds.

At some point the existing C* cache interface needs to gel with your work. Right now C* uses the hotN and getKeys interface to return the contents of the cache for persistence.
I think the path of least resistance to start would be to implement the existing interface and then come back and look at how to get compression and more efficient IO into all the implementations. The existing stuff in C* doesn't do compression and doesn't buffer its IO. I would prefer to minimize major changes to the existing C* code: I want to get it working and then iterate further on other improvements like more efficient cache serialization. You could change the OHC interface or implement an adapter. I think it's fine to modify ICache to return iterables or iterators instead of collections, to incrementally produce the key set and hot keys. For everything else I would really like to see things stay the same unless there is something to be gained by changing the interface.
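The "remove entries until it doesn't blow capacity" suggestion can be sketched with an access-ordered map and a byte budget - an on-heap illustration of the policy, not OHC's off-heap code (a value larger than the whole capacity would still be inserted over budget in this simplified sketch):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "evict until it fits": no cleanup thresholds - an insert that
// would exceed capacity removes least-recently-used entries until there
// is room for the new value.
class EvictUntilFits {
    private final long capacityBytes;
    private long usedBytes;
    private final LinkedHashMap<String, byte[]> map =
            new LinkedHashMap<>(16, 0.75f, true); // accessOrder=true: LRU first

    EvictUntilFits(long capacityBytes) { this.capacityBytes = capacityBytes; }

    void put(String key, byte[] value) {
        byte[] old = map.remove(key);
        if (old != null) usedBytes -= old.length;
        // walk from the LRU end, evicting until the new value fits
        Iterator<Map.Entry<String, byte[]>> it = map.entrySet().iterator();
        while (usedBytes + value.length > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
        map.put(key, value);
        usedBytes += value.length;
    }

    byte[] get(String key) { return map.get(key); }
    int size() { return map.size(); }
    long usedBytes() { return usedBytes; }
}
```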
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240193#comment-14240193 ] Robert Stupp commented on CASSANDRA-7438: -

bq. Lots of cool stuff
Thx :)

Unit testing: you are absolutely right. (Will go on with that next.)

bq. unit test that fuzzes against a parallel implementation at the same time using a different LRU map implementation
Do you mean something like LinkedHashMap with removeEldestEntry()? It's some effort to get a nice implementation for unit tests - but yeah, it makes sense.

bq. duplicates reference counting code
Removed the duplicated code.

bq. cleanup/eviction code go away ... remove entries until it [fits]
Much easier, cleaner code; implemented - but I'm not completely sold on the new implementation yet (still a quick hack).

bq. C* cache interface ... get compression and more efficient IO [later]
That's fair. I just saw a few minutes ago that row-cache serialization only persists the keys and not the values - so the existing implementation in OHC would need to be changed/extended. I thought it persisted the values, too.
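The reference model Robert mentions - LinkedHashMap with removeEldestEntry() - is a few lines of JDK code. A fuzz test would apply identical random put/get/remove sequences to this model and to OHC (striped the same way so eviction order matches) and compare contents after each step:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A count-bounded LRU reference model: an access-ordered LinkedHashMap
// whose removeEldestEntry() evicts once maxEntries is exceeded. Intended
// as the "parallel implementation" in a fuzz test, not as a cache itself.
class ReferenceLru<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    ReferenceLru(int maxEntries) {
        super(16, 0.75f, true);  // accessOrder=true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict the LRU entry past capacity
    }
}
```

Because gets count as accesses in access order, the model's eviction victim is always the true least-recently-used entry, which is exactly the property the fuzz test wants to check against the off-heap implementation.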
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240235#comment-14240235 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. That's fair. I just saw some minutes ago that row-cache serialization only persists the keys and not the values - so the existing implementation in OHC would need to be changed / extended / whatever. I thought it persists the value, too.
I was also confused by that. Persisting the values would break cache invalidation in a way that is hard to correct without integrating with the commit log.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240305#comment-14240305 ] Robert Stupp commented on CASSANDRA-7438: - Yep - persisting the values would cause inconsistencies - either on its own or through users deleting saved caches.
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237162#comment-14237162 ] Robert Stupp commented on CASSANDRA-7438: - Also pushed persistence of cache content using Snappy compression.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237170#comment-14237170 ] Robert Stupp commented on CASSANDRA-7438:
Rehashing: hm - at {{o.a.c.db.ColumnFamilyStore#getThroughCache}} (better: {{RowCacheKey}}) we only have the token/key but no (good) hash for the key. The saving from using a 32-bit hash is about 8 bytes per cache entry (the reference-counter field can then be reduced from 64 bit to 32 bit while still keeping the 8-byte boundaries for key and value data). But this seems to have no measurable effect if e.g. jemalloc aligns allocated memory blocks on bigger page sizes depending on the whole cache entry size (e.g. several kB or MB).
OHC always calculates its own murmur3 hash over the serialized cache key. I _hope_ to achieve a better distribution across segments and buckets by using 64 bits - but I'm not sure about this. My preference for 64 hash bits is basically that it feels better.
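The split described in this thread (most significant hash bits select the segment, least significant bits the in-segment bucket) can be sketched as follows. The class and method names are illustrative, not OHC's actual API, and both counts are assumed to be powers of two:

```java
// Hypothetical sketch of the segment/bucket hash split discussed above.
// Names are illustrative; OHC's real code differs.
public final class HashSplit {
    // segmentCount must be a power of two (OHC caps segments and buckets at 2^30)
    public static int segmentIndex(long hash, int segmentCount) {
        if (segmentCount == 1) return 0;      // single segment: nothing to select
        int bits = Integer.numberOfTrailingZeros(segmentCount);
        // most significant bits pick the segment
        return (int) ((hash >>> (64 - bits)) & (segmentCount - 1));
    }

    // bucketCount must be a power of two
    public static int bucketIndex(long hash, int bucketCount) {
        // least significant bits pick the bucket inside the segment
        return (int) (hash & (bucketCount - 1));
    }
}
```

Using disjoint ends of the 64-bit hash is what makes both indices independent; with a 32-bit hash the two masks would start to overlap as segment and bucket counts grow.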
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234696#comment-14234696 ] Robert Stupp commented on CASSANDRA-7438:
Just pushed some OHC additions to github:
* key-iterator (used by the CacheService class to invalidate column families)
* (de)serialization of cache content to disk using direct I/O from off-heap. This means the row cache content does not need to go through the heap for serialization and deserialization. Compression should also be possible off-heap using the static methods in the Snappy class, since these expect direct buffers - so there's nearly no heap pressure for that.
Background: the implementation basically places the address and length of the hash entry into the DirectByteBuffer class, so FileChannel is able to read into it/write from it.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232833#comment-14232833 ] Benedict commented on CASSANDRA-7438:
re: hash bits: there's not really a dramatic benefit to using more than 32 bits. We will always use the upper bits for the segment and the lower bits for the bucket, for which 4B items is plenty. Although we don't have proper entropy for all the bits - we may have only 28 bits of good collision-freeness - we will want to rehash the murmur hash to ensure it is spread evenly, to avoid a grow boundary consistently failing to reduce collisions.
The one advantage of having some spare hash bits is that we can use them to avoid running a potentially expensive comparison on a large key until we have high confidence we've found the correct item - and as the number of hash bits left unused for indexing dwindles, the value of this goes up. But the number of instances where this helps will be vanishingly small, since the head of the key will be on the same cache line, and a simultaneous hash collision and key prefix collision is pretty unlikely. It might be more significant if we were to use open-address hashing, as we would have excellent locality and would reduce the number of expected cache misses for a lookup. But this won't be measurable above the cache serialization costs.
We do already have these hash bits calculated in C*, typically. We are also unlikely to notice the overhead - allocations are likely to have ~16 bytes of overhead, be padded to the nearest 8 or 16 bytes, and a row has a lot of bumpf to encode. I doubt there will be any variation in storage costs from using all 64 bits.
i.e., whatever floats your boat
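The "rehash to spread entropy evenly" step mentioned above is usually done with a cheap bit-mixing finalizer; MurmurHash3's 64-bit finalizer (commonly called fmix64) is the canonical example of such a rehash:

```java
// MurmurHash3's 64-bit finalizer ("fmix64"): a cheap rehash that spreads
// entropy across all 64 bits, the kind of step discussed above.
public final class Mix {
    public static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }
}
```

Applying a finalizer like this before masking means a table grow (which exposes one more low bit) actually sees fresh entropy instead of the same collisions.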
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231327#comment-14231327 ] Vijay commented on CASSANDRA-7438:
[~snazy] I was trying to compare OHC and found a few major bugs.
1) You have per-method synchronization on the Map, which doesn't ensure that your get is locked relative to a concurrent put (same with clean, hot(N), remove etc.) - look at the SynchronizedMap source code to do it right, else it will crash soon.
2) Even after I fix that, there is a correctness issue in the hashing algorithm, I think. Get returns a lot of errors, and it looks like there are some memory leaks too.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231627#comment-14231627 ] Ariel Weisberg commented on CASSANDRA-7438:
Robert, I don't seem to be getting the latest code for your work on master? For instance, the key comparison code does 8 bytes at a time and doesn't handle trailing bytes as far as I can tell.
To Vijay's point: a pseudo-random test against the map that does, say, 200 million operations against a keyspace of several million entries, mirrors the operations on a regular hash map, and periodically checks that they have the same contents would be helpful in having some confidence in the map. Size it so the LRU doesn't do anything. Print the seed at the beginning of the test so it can be reproduced. I think this basically duplicates the benchmark, but having it as a unit test is nice. We can tune the number of operations and keys down for running in CI. You could also look at the unit tests for Guava's cache or j.u.HashMap and borrow those. The nice thing about data structure APIs is that the tests already exist.
bq. Yes, basically from JDK. Could not get that via inheritance.
What are the licensing and attribution requirements for that code?
bq. IMO hash code should be 64 bits because 32 bits might not be sufficient.
[~benedict] might have some opinions on how to get the best bits out of MurmurHash3. 32 bits is 256-512 gigabytes of cache at 128 bytes per entry, which is not bad. I don't feel strongly either way since I don't know whether callers will have the hash precomputed.
bq. Nope - would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments and max# of buckets). Worth taking a look at it - it's weird, yes.
In OffHeapMap, line 222 seems to have a gate preventing rehashing to 2^24 buckets.
bq. (Hope I caught all of your comments)
I'll check them once you update.
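The pseudo-random mirror test suggested above could look roughly like this. `OffHeapish` is a hypothetical stand-in for the real cache API, and the operation mix and counts are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of the randomized mirror test suggested above: drive random
// put/get/remove operations against the cache under test and a
// java.util.HashMap in lockstep, and check they agree.
// "OffHeapish" is a hypothetical minimal cache surface, not OHC's API.
public final class MirrorTest {
    interface OffHeapish {
        void put(long key, long value);
        Long get(long key);
        void remove(long key);
    }

    static void run(OffHeapish cache, long seed, int ops, int keySpace) {
        System.out.println("seed=" + seed); // print the seed so failures reproduce
        Random rnd = new Random(seed);
        Map<Long, Long> mirror = new HashMap<>();
        for (int i = 0; i < ops; i++) {
            long key = rnd.nextInt(keySpace);
            switch (rnd.nextInt(3)) {
                case 0:
                    long v = rnd.nextLong();
                    cache.put(key, v);
                    mirror.put(key, v);
                    break;
                case 1:
                    cache.remove(key);
                    mirror.remove(key);
                    break;
                default: // compare the two views on every read
                    Long expected = mirror.get(key);
                    Long actual = cache.get(key);
                    if (expected == null ? actual != null : !expected.equals(actual))
                        throw new AssertionError("mismatch at op " + i + " key " + key);
            }
        }
    }
}
```

As the comment notes, the cache must be sized so the LRU never evicts during the run, otherwise the mirror map and the cache legitimately diverge.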
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231817#comment-14231817 ] Robert Stupp commented on CASSANDRA-7438:
[~vijay2...@yahoo.com] can you explain what kind of bugs?
bq. licensing and attribution requirements
It's already in the C* code base in exactly the same way.
Also pushed some changes:
* increased max# of segments and buckets to 2^30 (means approx 1B segments times 1B buckets)
* added a prototype for direct I/O for row cache serialization (zero copy) - just as a demo (just coded, not tested yet)
* uses Unsafe for value (de)serialization
* moved (most) statistic counters to OffHeapMap to reduce contention caused by volatile (really makes sense)
* removed use of the Guava cache API
* corrected and improved key comparison
Regarding the 64-bit hash: it's 64 bit since OHC takes the most significant bits for the segment and the least significant bits for the hash inside a segment. Both are limited to 30 bits = 60 bits.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231878#comment-14231878 ] Vijay commented on CASSANDRA-7438:
Never mind, my bad - it was related to the below (which needs to be more configurable instead), and the items were going missing earlier than I thought they should. It looks like you just evict items per segment (if a segment is used more, more items will disappear from that segment, while the least used segment's items remain).
{code}
// 12.5% if capacity less than 8GB
// 10% if capacity less than 16 GB
// 5% if capacity is higher than 16GB
{code}
Also noticed you don't have replace(), which Cassandra uses. Anyway, I am going to stop working on this for now; let me know if someone wants any other info.
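The capacity-dependent thresholds quoted above can be expressed as a small function. The name `triggerFraction` and the byte-based signature are illustrative, not OHC's actual code:

```java
// Sketch of the capacity-dependent cleanup trigger quoted in the comment
// above; thresholds mirror the quoted comment, names are illustrative.
public final class CleanupTrigger {
    static final long GB = 1L << 30;

    // fraction of total capacity kept free before eviction kicks in
    public static double triggerFraction(long capacityBytes) {
        if (capacityBytes < 8 * GB)  return 0.125; // 12.5% below 8 GB
        if (capacityBytes < 16 * GB) return 0.10;  // 10% below 16 GB
        return 0.05;                               // 5% at 16 GB and above
    }
}
```

Vijay's point is that these steps are hard-coded; exposing the fraction (or the absolute free-space target) as a configuration option would let operators trade eviction frequency against usable capacity.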
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14230675#comment-14230675 ] Ariel Weisberg commented on CASSANDRA-7438:
Looks pretty nice. Suggestions:
* Push the stats into the segments and gather them the way you do free capacity and cleanup count. You can drop the volatile (technically you will have to synchronize on read). Inside each OffHeapMap, put the stats members (and anything mutable) as the first declared fields. In practice this can put them on the same cache line as the lock field in the object header. It will also be just one flush at the end of the critical section. Stats collection should be free, so there's no reason not to leave it on all the time.
* I am not sure batch cleanup makes sense. When inserting an item into the cache would blow the size requirement, I would just evict elements until inserting it wouldn't. Is there a specific efficiency you think you are going to get from doing it in batches?
* Cache is the wrong API to use since it doesn't allow lazy deserialization and zero copy. Since entries are refcounted there is no need to make a copy. Might be something to save for later since everything upstream expects a POJO of some sort.
* Key buffer might be worth a thread local sized to a high watermark.
Do we have a decent way to do line-level code review? I can't leave comments on github unless there is a pull request.
Line level stuff:
* Don't catch exceptions and handle them inside the map. Let them all propagate to the caller and use try/finally to do cleanup. I know you have to wrap and rethrow some things, but avoid it where possible.
* Key comparison compares 8 bytes at a time - how does it handle trailing bytes and alignment?
* Agrona has an Unsafe ByteBuffer implementation that looks like it makes a little better use of various intrinsics than AbstractDataOutput. Does some other nifty stuff as well. https://github.com/real-logic/Agrona/blob/master/src/main/java/uk/co/real_logic/agrona/concurrent/UnsafeBuffer.java
* In OffHeapMap.touch, lines 439 and 453 are not covered by tests. Coverage looks a little weird in that a lot of the cases are always hit but some don't touch both branches. If lruTail == hashEntryAddr, maybe assert next is null.
* Rename the mutating OffHeapMap lruNext and lruPrev to reflect that they mutate. In general, rename mutating methods to reflect that they do that, such as the two versions of first.
* I don't see why the cache can't use CPU endianness since the key/value are just copied.
* Did you get the UTF encoded string stuff from somewhere? I see something similar in the JDK - can you get that via inheritance?
* HashEntryInput and AbstractDataOutput are low on the coverage scale and have no tests for some pretty gnarly UTF-8 stuff.
* Continuing on that theme, there is a lot of unused code to satisfy the interfaces being implemented; it would be nice to avoid that.
* By hashing the key yourself you prevent caching the hash code in the POJO. Maybe hashes should be 32 bits and provided by the POJO?
* If an allocation fails, maybe throw OutOfMemoryError with a message.
* If an entry is too large, maybe return an error of some sort? Seems like the caller should decide whether not caching is OK.
* put on allocation failure calls removeInternal, but the key doesn't appear to be in the map yet? Is that to handle the put invalidating the previous entry?
* In put, why catch VirtualMachineError and not Error? Seems like it wants a finally, and it shouldn't throw checked exceptions.
* If a key serializer is necessary, throw in the constructor and remove the other checks.
* Hot N could use a more thorough test?
* In practice, how is hot N used in C*? When people save the cache to disk, do they save the entire cache?
* In the value loading case, I think there is some subtlety to the concurrency of invocations of the loader, in that it doesn't call it on all of them in a race. It might be a minor change in behavior compared to Guava.
* Maybe do the value loading timing in nanoseconds? Performance is the same but precision is better.
* OffHeapMap.Table.removeLink(long,long) has no test coverage of the second branch that walks a bucket to find the previous entry.
* I don't think storage for 16 million keys is enough? At 128 bytes per entry that is only 2 gigabytes. You would have to run a lot of segments, which is probably fine, but that presents a configuration issue. Maybe allow more than 24 bits of buckets in each segment?
* SegmentedCacheImpl contains duplicate code for dereferencing and still has to delegate part of the work to OffHeapMap. Maybe keep it all in OffHeapMap?
* Unit-test wise, there are some things not tested: the value loader interface, various things like putAll or invalidateAll.
* Release is not synchronized. Release should null pointers out so you get a good clean segfault. Close should maybe lock and close one segment
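The "8 bytes at a time with trailing bytes" comparison raised in this review can be illustrated on plain byte arrays; OHC itself compares off-heap memory, so this is a sketch of the technique, not its code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Word-wise equality check as discussed in the review: compare 8 bytes
// at a time, then fall back to per-byte comparison for trailing bytes.
// Shown on byte[] for testability; OHC does this against off-heap memory.
public final class KeyCompare {
    public static boolean equal(byte[] a, byte[] b) {
        if (a.length != b.length) return false;
        ByteBuffer ba = ByteBuffer.wrap(a).order(ByteOrder.nativeOrder());
        ByteBuffer bb = ByteBuffer.wrap(b).order(ByteOrder.nativeOrder());
        int i = 0;
        int words = a.length & ~7;         // largest multiple of 8 <= length
        for (; i < words; i += 8)
            if (ba.getLong(i) != bb.getLong(i)) return false;
        for (; i < a.length; i++)          // trailing 0..7 bytes, one by one
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

Alignment is the other half of Ariel's question: the word-wise loads are only safe (or fast) off-heap when entries start on 8-byte boundaries, which is why OHC aligns key and value data that way.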
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14231132#comment-14231132 ] Robert Stupp commented on CASSANDRA-7438:
[~aweisberg], thanks for the review :)
Some of the changes you suggested are already in. Much of the code has been moved to OffHeapMap. Batch cleanup has been completely removed - it's now handled inside OffHeapMap. It makes runtime and code much nicer.
I've deferred decent unit tests until later - the stuff changed too often. And there would be a big change when merging the stuff into the C* code base, removing all that unused code and the Cache interface implementation. All of the duplicated stuff has been removed - we don't need it; even for a general-purpose cache it would not have been useful.
bq. Key buffer might be worth a thread local sized to a high watermark
Hm - do you mean sth like {{static final ThreadLocalKeyBuffer perThreadBuffer;}} inside SegmentedCacheImpl?
Regarding the line-level code review (it's fine the way you did it IMO):
bq. Don't catch exceptions
Done.
bq. 8 bytes at a time, how does it handle trailing bytes and alignment?
Trailing bytes: falls back to per-byte comparison. Alignment: key and value are aligned on 8-byte boundaries.
bq. Agrona has an Unsafe ByteBuffer implementation that looks like it makes a little better use of various intrinsics than AbstractDataOutput.
Good hint! Will definitely take a look at it!
bq. I don't see why the cache can't use CPU endianness since the key/value are just copied.
Ah - you mean that stuff in HashEntryInput/Output. No - you can't always copy it using the Unsafe API. I don't recall exactly why I removed that optimization (had it implemented before), but it had sth to do with data serialized for KeyBuffer and putting it into off-heap. But it makes sense for values (since these are always directly serialized to off-heap).
bq. UTF encoded string stuff ... get that via inheritance?
Yes, basically from the JDK. Could not get that via inheritance.
bq. hashing the key yourself ... 32-bits
Thought about it (and had that previously). Yes - if we have a good hash code, we can use it. But I don't know whether the calling code has a hash code. IMO the hash code should be 64 bits because 32 bits might not be sufficient.
bq. allocation fails maybe throw OutOfMemoryError
That would shut down the C* daemon ;) Maybe. Not sure about that. I think if you run into such a situation (out of off-heap/system memory) you are completely lost. It just ignores that put() and removes the old entry.
bq. entry is too large maybe return an error of some sort
No. The calling code cannot do anything meaningful with it. But the calling code could check for that in advance (before constructing any object related to caching), if it has enough information.
bq. catch VirtualMachineError and not Error
Done.
bq. hotN()
I _think_ it is used to persist the hot set of the cache.
bq. concerned about materializing the full list on heap
Agree. Thought about patching cache off-heap addresses into DirectByteBuffer and using that for serialization.
bq. I don't think storage for 16 million keys is enough?
Nope - it would not be. But it's 2^27 (limited by a stupid constant used for both max# of segments and max# of buckets). Worth taking a look at - it's weird, yes.
bq. value loading case
Don't think we need that API.
bq. Release is not synchronized.
Yep - will do that.
(Hope I caught all of your comments)
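The thread-local key buffer "sized to a high watermark" discussed above could be sketched like this; the class name and power-of-two growth policy are assumptions for illustration, not OHC's implementation:

```java
// Sketch of a per-thread reusable key buffer that grows to a high
// watermark, as suggested in the review. Names and growth policy are
// illustrative, not OHC's actual code.
public final class KeyBufferPool {
    private static final ThreadLocal<byte[]> BUFFER =
        ThreadLocal.withInitial(() -> new byte[64]);

    // Returns this thread's buffer, at least `size` bytes long; grows to
    // the next power of two and remembers it (the high watermark).
    public static byte[] acquire(int size) {
        byte[] buf = BUFFER.get();
        if (buf.length < size) {
            buf = new byte[Integer.highestOneBit(size - 1) << 1];
            BUFFER.set(buf);
        }
        return buf;
    }
}
```

The point of the suggestion is to avoid a fresh allocation for key serialization on every cache lookup; after warmup, each thread serializes into the same array.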
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229284#comment-14229284 ] Robert Stupp commented on CASSANDRA-7438:
Have pushed the latest changes of OHC to https://github.com/snazy/ohc. It has been nearly completely rewritten.
Architecture (in brief):
* OHC consists of multiple segments (default: 2 x #CPUs). Fewer segments lead to more contention; more segments give no measurable improvement.
* Each segment consists of an off-heap hash map (defaults: table-size=8192, load-factor=.75). (The hash table requires 8 bytes per bucket.)
* Hash entries in a bucket are organized in a double-linked list.
* The LRU replacement policy is built in via its own double-linked list.
* Critical sections that mutually lock a segment are pretty short (code + CPU) - just a 'synchronized' keyword, no StampedLock/ReentrantLock.
* Capacity for the cache is configured globally and managed locally in each segment.
* Eviction (or replacement or cleanup) is triggered when free capacity goes below a trigger value, and cleans up to a target free capacity.
* Uses a murmur hash on the serialized key. The most significant bits are used to find the segment, the least significant bits for the segment's hash map.
Non-production relevant stuff:
* Allows starting off-heap access in debug mode, which checks for accesses outside of the allocated region and produces exceptions instead of SIGSEGV or jemalloc errors.
* ohc-benchmark updated to reflect the changes.
About the replacement policy: currently LRU is built in - but I'm not really sold on LRU as is. Alternatives could be:
* timestamp (not sold on this either - basically the same as LRU)
* LIRS (https://en.wikipedia.org/wiki/LIRS_caching_algorithm), big overhead (space)
* 2Q (counts accesses, divides the counter regularly)
* LRU+random (50/50) (may give the same result as LIRS, but without LIRS' overhead)
But replacing LRU with something else is out of scope for this ticket and should be done with real workloads in C* - although the last one is just an additional config parameter.
IMO we should add a per-table option that configures whether the row cache receives data on reads+writes or just on reads. Might prevent garbage in the cache caused by write-heavy tables.
{{Unsafe.allocateMemory()}} gives about a 5-10% performance improvement compared to jemalloc. The reason for it might be the JNA library (which has some synchronized blocks in it).
IMO OHC is ready to be merged into the C* code base.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228693#comment-14228693 ] Benedict commented on CASSANDRA-7438:
Invert those two statements and the behaviour is still broken:
B: 154: map.get()
A: 187: map.remove()
A: 191: queue.deleteFromQueue()
B: 158: queue.addToQueue()
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228871#comment-14228871 ] Vijay commented on CASSANDRA-7438:
Should be taken care of too - it should become a duplicate delete on the queue and should work normally (via itemUnlinkQueue). Here is the adjusted test case for it: https://github.com/Vijay2win/lruc/blob/master/src/test/java/com/lruc/unsafe/UnsafeQueueTest.java#L81
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228306#comment-14228306 ] Ariel Weisberg commented on CASSANDRA-7438: --- Pauseless resizing is a worthy design goal, but might not be necessary if you call it a warmup cost. I would break out the performance comparison with and without warming up the cache so we know how it performs when you aren't measuring the resize pauses. Those should only happen at startup when the cache is populated.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228397#comment-14228397 ] Benedict commented on CASSANDRA-7438: - I suspect segmenting the table at a finer granularity, so that each segment is maintained with mutual exclusivity, would achieve better percentiles in both cases due to keeping the maximum resize cost down. We could settle for a separate LRU-q per segment, even, to keep the complexity of this code down significantly - it is unlikely having a global LRU-q is significantly more accurate at predicting reuse than ~128 of them. It would also make it much easier to improve the replacement strategy beyond LRU, which would likely yield a bigger win for performance than any potential loss from reduced concurrency. The critical section for reads could be kept sufficiently small that competition would be very unlikely with the current state of C*, by performing the deserialization outside of it. There's a good chance this would yield a net positive performance impact, by reducing the cost per access without increasing the cost due to contention measurably (because contention would be infrequent).
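The striped design Benedict describes can be sketched roughly like this (an on-heap toy that uses an access-ordered {{LinkedHashMap}} as the per-segment LRU queue; the real cache would keep entries off-heap, and all names here are mine, not lruc/OHC code):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative on-heap sketch of the per-segment idea: ~128 segments,
// each owning its own lock and its own LRU queue.
final class StripedLruCache<K, V>
{
    private static final int SEGMENTS = 128;          // ~128 LRU-qs, as suggested
    private final Segment<K, V>[] segments;

    @SuppressWarnings("unchecked")
    StripedLruCache(int capacityPerSegment)
    {
        segments = new Segment[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++)
            segments[i] = new Segment<K, V>(capacityPerSegment);
    }

    private Segment<K, V> segmentFor(K key)
    {
        int h = key.hashCode();
        h ^= (h >>> 16);                              // spread high bits before picking a segment
        return segments[h & (SEGMENTS - 1)];
    }

    V get(K key)             { return segmentFor(key).get(key); }
    void put(K key, V value) { segmentFor(key).put(key, value); }

    // One segment: coarse lock plus an access-ordered map doubling as the LRU queue.
    private static final class Segment<K, V>
    {
        private final ReentrantLock lock = new ReentrantLock();
        private final LinkedHashMap<K, V> map;

        Segment(final int capacity)
        {
            map = new LinkedHashMap<K, V>(16, 0.75f, true)
            {
                @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
                {
                    return size() > capacity;         // evict this segment's LRU entry
                }
            };
        }

        V get(K key)
        {
            lock.lock();
            try { return map.get(key); } finally { lock.unlock(); }
        }

        void put(K key, V value)
        {
            lock.lock();
            try { map.put(key, value); } finally { lock.unlock(); }
        }
    }
}
```

Each segment resizes and evicts independently, so a rehash stalls at most 1/128th of the keyspace at a time, and the replacement policy can later be swapped out per segment.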
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228515#comment-14228515 ] Ariel Weisberg commented on CASSANDRA-7438: --- +1 to what Benedict suggests. One minor nit: resize pauses will happen across stripes at almost exactly the same time. I know that with, say, 12 stripes it's very bad. With more than that it might start to spread them out, but I haven't seen that in action. We can iterate on resize pause issues later if necessary. It's a warmup issue which will be a problem for some, but might not cripple the feature.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228523#comment-14228523 ] Vijay commented on CASSANDRA-7438: -- {quote}I would break out the performance comparison with and without warming up the cache so we know how it performs when you aren't measuring the resize pauses.{quote} Yep, and in steady state it is similar to get; I have verified that the latency is due to rehash. Better benchmarks on big machines will be done on Monday. Unfortunately -1 on partitions: it will be a lot more complex and will be hard for users to understand. If we have to expand the partitions, we have to figure out a better consistent hashing algo (Cassandra within Cassandra, maybe). Moreover, we will end up keeping the current code as-is to move the maps and queues off-heap. Sorry, I don't understand the argument about code complexity; if we are talking about code complexity, the unsafe code is 1000 lines including the license headers :) The current contention topic is whether to use CAS for locks, which is showing higher CPU cost, and I agree with Pavel on latencies as shown in the numbers.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228524#comment-14228524 ] Vijay commented on CASSANDRA-7438: -- PS: all the latency spikes are in the 100s of microseconds. It's a day-and-night comparison to the current cache :)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228563#comment-14228563 ] Benedict commented on CASSANDRA-7438: - [~aweisberg]: In my experience segments tend to be imperfectly distributed, so whilst there is bunching of resizes simply because they take so long, with real work going on at the same time they should be a _little_ spread out. Though with murmur3 the distribution may be significantly more uniform than my prior experiments. Either way, they're performed in parallel (without coordination) if they coincide, so it's still an improvement.
[~vijay2...@yahoo.com]: When I talk about complexity, I mean the difficulties of concurrent programming magnified without the normal tools. For instance, there are the following concerns:
* We have a spin-lock - admittedly one that should _generally_ be uncontended, but on a grow or a small map this is certainly not the case, which could result in really problematic behaviour. Pure spin locks should not be used outside of the kernel.
* The queue is maintained by a separate thread that requires signalling if it isn't currently performing work; in a real C* instance, where the cost of linking the queue item is a fraction of the other work done to service a request, this means we are likely to incur a costly unpark() for a majority of operations.
* Reads can interleave with put/replace/remove and abort the removal of an item from the queue, resulting in a memory leak.
* We perform the grow on a separate thread, but prevent all reader _or_ writer threads from making progress by taking the locks for all buckets immediately.
* Freeing of oldSegments is still dangerous; it's just probabilistically less likely to happen.
* During a grow, we can lose puts because we unlock the old segments, so with the right (again, unlikely) interleaving of events a writer can think the old table is still valid.
* When growing, we only double the size of the backing table; however, since grows happen in the background, the updater can get ahead, meaning we remain behind and multiply the constant-factor overheads, collisions and contention until total size tails off.
These are only the obvious problems that spring to mind from 15m perusing the code; I'm sure there are others. This kind of stuff is really hard, and the approach I'm suggesting is comparatively a doddle to get right, and is likely faster to boot. I'm not sure I understand your concern with segmentation creating complexity with the hashing... I'm proposing the exact method used by CHM. We have an excellent hash algorithm to distribute the data over the segments: murmurhash3. Although we need to be careful to not use the bits that don't have the correct entropy for selecting a segment. It's really no more than a two-tier hash table. The user doesn't need to know anything about this.
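One way to read the entropy caveat, in the style of CHM's two-tier lookup (a hypothetical helper, not lruc or OHC code), is to take the segment index and the in-segment bucket index from disjoint bits of the 64-bit murmurhash3 token, so segment selection never consumes the same bits the segment's own table will mask:

```java
// Hypothetical sketch: segment from the high bits, bucket from the low bits,
// so the two tiers of the hash table use independent parts of the token.
final class SegmentSelector
{
    static final int SEGMENT_BITS = 7;                 // 2^7 = 128 segments

    static int segment(long hash64)
    {
        return (int) (hash64 >>> (64 - SEGMENT_BITS)); // top bits pick the segment
    }

    static int bucket(long hash64, int tableSize)
    {
        // low bits pick the bucket; tableSize must be a power of two
        return (int) (hash64 & (tableSize - 1));
    }
}
```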
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228569#comment-14228569 ] Vijay commented on CASSANDRA-7438: -- {quote}The queue is maintained by a separate thread that requires signalling{quote} The thread is only signalled if it is not performing an operation. I am lost. {quote}resulting in a memory leak{quote} I am 100% sure that this is not true. Can you write a test case for it to make this happen plz? {quote}but prevent all reader or writer threads from making progress by taking the locks for all buckets immediately{quote} I am sure this cannot be done; if you don't write you lose coherence and consistency. {quote}During a grow, we can lose puts because we unlock the old segments{quote} Test case again plz. I don't think this can happen too. I spent a lot of time testing the exact scenario.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228575#comment-14228575 ] Benedict commented on CASSANDRA-7438: - bq. I am 100% sure
Never be 100% sure with concurrency, please :)
bq. test case again plz. I don't think this can happen too. I spend a lot of time testing the exact scenario.
You have too much faith in tests. You are testing under ideal conditions - two of the race conditions I highlighted will only rear their heads infrequently, most likely when the system is under uncharacteristic load causing very choppy scheduling. Analysis of the code is paramount. I will not produce a test case as I do not have time; however, I will give you an interleaving of events that would trigger one of them. Thread A is deleting an item, and is in LRUC.invalidate(); Thread B is looking up the same item, in LRUC.get().
A: 187: map.remove()
B: 154: map.get()
A: 191: queue.deleteFromQueue()
B: 158: queue.addToQueue()
In particular, addToQueue() sets the markAsDeleted flag to false, undoing the prior work of deleteFromQueue.
bq. Thread is only signalled if they are not performing operation. I am lost.
It will generally not be performing an operation, because its work will be faster than any of the producers can produce work in normal C* operation.
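The interleaving above can be replayed deterministically with a toy model (names such as markAsDeleted, deleteFromQueue and addToQueue mirror the comment but are hypothetical stand-ins, not the actual lruc code):

```java
// Toy model of the suspected leak: replay the interleaving A:187, B:154, A:191, B:158
// in sequence and observe that B's addToQueue() undoes A's deleteFromQueue().
final class LeakModel
{
    static final class Node
    {
        volatile boolean markAsDeleted;
    }

    // invalidate() path (line 191 in the comment): mark the node so the
    // queue-maintenance thread will unlink and free it.
    static void deleteFromQueue(Node n) { n.markAsDeleted = true; }

    // get() path (line 158 in the comment): re-link the node for LRU
    // bookkeeping, which clears the deletion mark.
    static void addToQueue(Node n) { n.markAsDeleted = false; }

    // Returns the final state of the mark after the problematic interleaving.
    static boolean replayRace()
    {
        Node node = new Node();
        // A: map.remove() succeeds; node is now unreachable via the map.
        // B: map.get() already obtained a reference to node before the remove.
        deleteFromQueue(node);   // A: marks the node for unlinking/freeing
        addToQueue(node);        // B: clears the mark, undoing A's work
        return node.markAsDeleted;
    }
}
```

After the replay the node is unmarked yet absent from the map, so the queue thread never unlinks or frees it: the leak Benedict describes.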
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228577#comment-14228577 ] Vijay commented on CASSANDRA-7438: -- Maybe you know better than me, but map.remove cannot be followed by a successful map.get because the remove is within a lock on the segment...
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226474#comment-14226474 ] Tupshin Harper commented on CASSANDRA-7438: --- [~xedin] I'm lost in too many layers of snark and indirection (not just yours). Can you elaborate on what strategy you actually find appealing?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226516#comment-14226516 ] Robert Stupp commented on CASSANDRA-7438: - Some short notes about the last changes in OHC:
* changed from block-oriented allocation to Unsafe or JEMalloc (if available)
* added stamped locks in off-heap (quite simple and very efficient)
* triggering cleanup + rehash via cas-side trigger works fine
* extended the benchmark tool to specify different workload characteristics (read/write ratio, key distribution, value length distribution - distribution code taken from cassandra-stress)
* still working on a good (mostly contention free) LRU strategy
One thing I noticed during benchmarking is that (concurrent?) allocations of large areas (several MB) take up to 50/60ms (OSX 10.10, 2.6GHz Core i7 - no swap, of course) - small regions are allocated quite fast (total roundtrip for a put ~0.1ms for the 98th percentile). It might be viable to implement some mixture for memory allocation: Unsafe/JEMalloc for small regions (e.g. 1MB) and pre-allocated blocks for large regions. A configuration value could determine the amount of large region blocks to keep immediately available. Just an idea...
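The mixture Robert floats might look roughly like this sketch (class and method names are made up, and direct ByteBuffers stand in for Unsafe/jemalloc allocations; the recycling policy is my assumption, not OHC's):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the mixed-allocation idea: small regions allocated on demand,
// large regions served from a pre-allocated pool to avoid slow multi-MB mallocs.
final class MixedAllocator
{
    static final int LARGE_THRESHOLD = 1 << 20;       // 1 MB boundary, as floated in the comment
    private final ConcurrentLinkedQueue<ByteBuffer> largePool = new ConcurrentLinkedQueue<>();

    MixedAllocator(int pooledLargeBlocks, int largeBlockSize)
    {
        // The configuration value determines how many large blocks stay immediately available.
        for (int i = 0; i < pooledLargeBlocks; i++)
            largePool.add(ByteBuffer.allocateDirect(largeBlockSize));
    }

    ByteBuffer allocate(int size)
    {
        if (size < LARGE_THRESHOLD)
            return ByteBuffer.allocateDirect(size);   // small: allocate on demand (fast path)
        ByteBuffer pooled = largePool.poll();
        return pooled != null ? pooled : ByteBuffer.allocateDirect(size); // fall back if pool is empty
    }

    void free(ByteBuffer buf)
    {
        if (buf.capacity() >= LARGE_THRESHOLD)
        {
            buf.clear();
            largePool.add(buf);                       // recycle large blocks instead of freeing
        }
        // small direct buffers are left to the GC in this sketch
    }
}
```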
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226531#comment-14226531 ] Ariel Weisberg commented on CASSANDRA-7438: --- When are large regions being allocated? How common is the use case? Large would normally only be for table resizing right? Could the row cache contain very large values with wide rows?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226552#comment-14226552 ] Vijay commented on CASSANDRA-7438: -- {quote}One thing I noticed during benchmarking is that (concurrent?){quote} Yes, use these options, feel free to make it more configurable if you need.
{code}
public static final String TYPE = "c";
public static final String THREADS = "t";
public static final String SIZE = "s";
public static final String ITERATIONS = "i";
public static final String PREFIX_SIZE = "p";
{code}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226795#comment-14226795 ] Ariel Weisberg commented on CASSANDRA-7438: --- Caching entire rows of very large rows seems like a problem workload for a variety of reasons. The overhead of repopulating each cache entry on insertion is not good. Does the storage engine always materialize entire rows into memory for every query? 60 milliseconds is much longer than it takes to copy several megabytes, so it is expensive even with large rows, although the rest of the cost of materializing the row might dominate.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226776#comment-14226776 ] Robert Stupp commented on CASSANDRA-7438: - The row cache can contain very large rows AFAIK. The idea is to pre-allocate some portion of the configured capacity for large blocks - new blocks could be allocated on demand (edge-trigger). OTOH, if it stores that amount of data in a cache, that amount of time (20...60ms) might be irrelevant compared to the time needed for serialization - so maybe it would be wasted effort. Not sure about that. Table resizing may take as long as it takes - I am not really bothered about allocation time for that, because no reads or writes are locked while allocating the new partition (segment) table.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226861#comment-14226861 ] Pavel Yaskevich commented on CASSANDRA-7438: [~tupshin] The original idea of this was to get the thing we know does the job, which is memcached, strip out some of the unnecessary parts, and pack it as a lib we can use over JNI, the same way snappy and others do. But now we are getting into the business of re-inventing things that are pretty hard to get right and properly test, so the argument against having lruc in its original form - that it would be hard to test/maintain - is, in my opinion, no longer valid.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227248#comment-14227248 ] Jonathan Ellis commented on CASSANDRA-7438: ---
bq. The row cache can contain very large rows [partitions] AFAIK
Well, it *can*, but it's almost always a bad idea. Not something we should optimize for. (http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1)
bq. Does the storage engine always materialize entire rows [partitions] into memory for every query?
Only when it's pulling them from the off-heap cache.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227250#comment-14227250 ] Jonathan Ellis commented on CASSANDRA-7438: ---
Looking at the discussion, I wonder if we're overcomplicating things. I think it got a bit lost in the noise when Ariel said earlier,
bq. I also wonder if splitting the cache into several instances, each with a coarse lock per instance, wouldn't result in simpler, fast-enough code. I don't want to advocate doing something different for performance, but rather that there is the possibility of a relatively simple implementation via Unsafe.
Why not start with something like that and see if it's Good Enough? I suspect that at that point other bottlenecks will be much more important, so paying a high complexity cost to optimize the cache further would be a bad trade overall.
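The "several instances, each with a coarse lock per instance" idea quoted above has roughly the following shape. This is a hypothetical minimal sketch (class and method names are invented, and an on-heap HashMap stands in for the off-heap table), not the actual proposed implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the cache is split into 2^n segments, each guarded by one coarse
// lock. Contention is bounded by the segment count, and each segment can
// stay simple - no lock-free tricks needed inside it.
public class SegmentedCache<K, V> {
    private static final int SEGMENT_COUNT = 16; // must be a power of two

    private final Map<K, V>[] maps;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    public SegmentedCache() {
        maps = new Map[SEGMENT_COUNT];
        locks = new ReentrantLock[SEGMENT_COUNT];
        for (int i = 0; i < SEGMENT_COUNT; i++) {
            maps[i] = new HashMap<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Spread the hash so keys that differ only in high-order bits
    // still distribute across segments.
    private int segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & (SEGMENT_COUNT - 1);
    }

    public void put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            maps[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    public V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return maps[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

Only operations on keys that hash to the same segment ever contend, which is the "fast-enough" property Ariel is betting on.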
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224611#comment-14224611 ] Robert Stupp commented on CASSANDRA-7438: -
[~aweisberg] thanks for that write-up! A lot of very good findings, ideas and recommendations. I have already implemented some of them - short story:
* tending to move from fixed-block allocation to {{Unsafe.alloc}} - quick benchmarks show similar results
* StampedLock and LongAdder in Java 8 are great
* will see how to implement better partition management and an overall LRU story
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224904#comment-14224904 ] Vijay commented on CASSANDRA-7438: --
{quote} sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue? StatsHolder, same AtomicLongArray suggestion. Also consider LongAdder. {quote}
Yep, and let me find alternatives for Java 8 (and, until 8, for LongAdder).
{quote} The queue really needs to be bounded, producer and consumer could proceed at different rates. In Segment.java in the replace path AtomicLong.addAndGet is called back to back, could be called once with the math already done. I believe each of those stalls processing until the store buffers have flushed. The put path does something similar and could have the same optimization. {quote}
Yeah, those were an oversight.
{quote} Tasks submitted to executor services via submit will wrap the result including exceptions in a future which silently discards them. The library might take at initialization time a listener for these errors, or if it is going to be C* specific it could use the wrapped runnable or similar. {quote}
Are you suggesting configurable logging/exception handling in case the 2 threads throw exceptions? If yes, sure. Other exceptions AFAIK are already propagated. (Still needs cleanup, though.)
{quote} A lot of locking that was spin locking (which unbounded I don't think is great) is now blocking locking. There is no adaptive spinning if you don't use synchronized. If you are already using unsafe maybe you could do monitor enter/exit. Never tried it. Having the table (segments) on heap is pretty undesirable to me. Happy to be proved wrong, but I think a flyweight over off heap would be better. {quote}
Segments are small in memory so far in my tests. The spin lock is to make sure the lock checks whether the segment was rehashed or not; this is better than having a separate lock, which would be central. (No different from Java or memcached.) Not sure I understand the Unsafe lock - an example would help. The segments are on heap mainly to handle the locking. I think we can do a bit of CAS, but a global lock on rehashing will be a problem (maybe an alternate approach is required).
{quote} It looks like concurrent calls to rehash could cause the table to rehash twice since the rebalance field is not CASed. You should do the volatile read, and then attempt the CAS (avoids putting the cache line in exclusive state every time). {quote}
Nope, it is a single-threaded executor and the rehash boolean is already volatile :) The next commit will have conditions instead (similar to the C implementation).
{quote} If the expiration lock is already locked some other thread is doing the expiration work. You might keep a semaphore for puts that bypass the lock so other threads can move on during expiration. I suppose after the first few evictions new puts will move on anyways. This would show up in a profiler if it were happening. {quote}
Good point... or a tryLock to spin and check if some other thread released enough memory.
{quote} hotN looks like it could lock for quite a while (hundreds of milliseconds, seconds) depending on the size of N. You don't need to use a linked list for the result just allocate an array list of size N. Maybe hotN should be able to yield, possibly leaving behind an iterator that evictors will have to repair. Maybe also depends on how top N handles duplicate or multiple versions of keys. Alternatively hotN could take a read lock, and writers could skip the cache? {quote}
We cannot have duplicates in the queue (remember it is a doubly linked list of items in cache). A read lock on q_expiry_lock is all we need; let me fix it.
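The StatsHolder/LongAdder suggestion quoted above can be sketched like this. A hypothetical stats holder (names are invented, not OHC's actual class): LongAdder internally stripes each counter across padded cells, so contended increments from many threads don't all serialize on one cache line the way a single AtomicLong would:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical cache-stats holder: one LongAdder per counter. Under
// contention each thread increments its own internal cell; sum() folds
// the cells together on read.
public class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    public void recordHit()  { hits.increment(); }
    public void recordMiss() { misses.increment(); }

    public long hitCount()  { return hits.sum(); }
    public long missCount() { return misses.sum(); }

    public double hitRate() {
        long h = hitCount(), m = missCount();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

The trade-off versus a padded AtomicLongArray: writes get cheaper under contention, but reads (sum) are weakly consistent snapshots, which is fine for metrics. LongAdder is Java 8 only, hence Vijay's note about needing an alternative until then.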
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225371#comment-14225371 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. Are you suggesting configurable logging/exception handling in case the 2 threads throw exceptions? If yes, sure. Other exceptions AFAIK are already propagated. (Still needs cleanup, though.)
Something has to happen to exceptions generated there. Since it is a library and there is no caller to propagate them to, it implies that people need to provide a listener or a logger.
bq. Segments are small in memory so far in my tests
Segments are hash buckets, correct? They aren't segments of several hash buckets. If the goal of the hash table is to have at most two or three entries per segment, then having an on-heap Java object per segment would be a lot of overhead. Just as a guess we are talking about two objects: there is the Segment/ReentrantLock, and then the AbstractQueuedSynchronizer allocated by ReentrantLock, which has three additional fields. That's 48 bytes without alignment or object headers. There is also the overhead of having an atomic array of pointers to each segment object. A hash table bucket only has to be a pointer plus a lock field if you are going to lock buckets. You could do that in 8-12 bytes. Whether it's too much data on heap is a question of how big a cache you want and how small the cached values are. The smaller the values being cached, the more the metadata overhead of the cache (and the JVM overhead) matters. Locking-wise, if you are only doing spin locks you can use an Unsafe compare-and-swap to implement a lock in off-heap memory. You do have to be careful about alignment.
bq. Nope, it is a single-threaded executor and the rehash boolean is already volatile. The next commit will have conditions instead (similar to the C implementation).
The task submitted to the executor doesn't check whether another rehash is required; it just does it. The check before submitting a task to do rehashing appears to have a race where two threads could submit the task at the same time. There is no isolation between the threads as they read the volatile field and then write to it. Two or more threads could read and see that no rehash is in progress, update the value to rehash-in-progress, and then submit the task.
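The race described here (volatile read followed by a separate volatile write) and its fix (read, then CAS) can be sketched as follows. This is hypothetical code with invented names, not lruc's actual UnsafeConcurrentMap:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the suggested fix: plain volatile read first, then a CAS, so
// at most one of N racing threads ever submits the rehash task.
public class RehashTrigger {
    private final AtomicBoolean rehashInProgress = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Returns true if this call won the race and submitted the task. */
    public boolean maybeTriggerRehash(long size, long threshold) {
        if (size <= threshold)
            return false;
        // Volatile read first: avoids pulling the cache line into
        // exclusive state when a rehash is already known to be running.
        if (rehashInProgress.get())
            return false;
        // CAS: exactly one thread flips false -> true and submits.
        if (!rehashInProgress.compareAndSet(false, true))
            return false;
        executor.submit(() -> {
            try {
                rehash();
            } finally {
                rehashInProgress.set(false);
            }
        });
        return true;
    }

    protected void rehash() { /* double the table and move entries */ }

    public void shutdown() { executor.shutdown(); }
}
```

With a plain volatile boolean and `if (!flag) { flag = true; submit(); }`, every racing thread can pass the check before any of them writes, which is exactly the doubled-N-times failure Ariel points out below.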
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225453#comment-14225453 ] Vijay commented on CASSANDRA-7438: --
{quote} Segments are hash buckets correct? {quote}
Yes, and the way memcached and lruc do the rehashing is based on this algorithm - hence, yes... That was the argument earlier about the JNI-based solution. (Also another reason I was talking about configurable hash expansion capability in my previous comment.)
{code}
unsigned long current_size = cas_incr(stats.hash_items, 1);
if (current_size > (hashsize(hashpower) * 3) / 2) {
    assoc_start_expand();
}
{code}
If we don't like the constant overhead of the cache on heap, and if you are talking about CAS, which we already do for ref counting: as mentioned before, we need an alternative strategy for the global locks on rebalance if we go with a lock-less strategy.
{quote} The task submitted to the executor doesn't check whether another rehash is required it just does it. {quote}
Until you complete a rehash you don't know if you need to rehash again or not... Am I missing something?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225478#comment-14225478 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. if we don't like the constant overhead of the cache on heap, and if you are talking about CAS, which we already do for ref counting: as mentioned before, we need an alternative strategy for the global locks on rebalance if we go with a lock-less strategy.
Just take what you have and do it off heap. You don't need to change anything about how locking is done - just put the segments off heap, so each segment would be a 4-byte lock field and an 8-byte pointer to the first entry.
bq. Until you complete a rehash you don't know if you need to rehash again or not... Am I missing something?
https://github.com/Vijay2win/lruc/blob/master/src/main/java/com/lruc/unsafe/UnsafeConcurrentMap.java#L38
The check on line 38 races with the assignment on line 39. N threads could do the check and think a rehash is necessary. Each would submit a rehash task, and the table size would be doubled N times instead of once.
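The "4-byte lock field plus 8-byte pointer per segment" layout can be illustrated with an on-heap stand-in. This is a hypothetical sketch: a real off-heap version would keep both fields in one native allocation and CAS the lock word through Unsafe at `baseAddress + segment * stride`, but the locking protocol is the same:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.atomic.AtomicLongArray;

// Stand-in for an off-heap segment table: lock words and first-entry
// pointers in two parallel arrays. Off heap, both would live in a single
// native block and the CAS would go through Unsafe on an aligned address.
public class SegmentTable {
    private static final int UNLOCKED = 0, LOCKED = 1;

    private final AtomicIntegerArray locks;   // 4-byte lock field per segment
    private final AtomicLongArray firstEntry; // 8-byte pointer per segment

    public SegmentTable(int segments) {
        locks = new AtomicIntegerArray(segments);
        firstEntry = new AtomicLongArray(segments);
    }

    public void lock(int segment) {
        // Spin until we CAS the lock word from UNLOCKED to LOCKED.
        while (!locks.compareAndSet(segment, UNLOCKED, LOCKED))
            Thread.onSpinWait();
    }

    public void unlock(int segment) {
        locks.set(segment, UNLOCKED); // volatile write releases the lock
    }

    public void setFirstEntry(int segment, long address) {
        firstEntry.set(segment, address);
    }

    public long getFirstEntry(int segment) {
        return firstEntry.get(segment);
    }
}
```

This keeps Vijay's per-segment locking scheme intact while shrinking per-segment on-heap cost from two objects (~48+ bytes) to zero: only the metadata words themselves remain, wherever they are stored.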
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225531#comment-14225531 ] Robert Stupp commented on CASSANDRA-7438: -
bq. alignment requirements for 4 or 8 byte CAS
Intel P6 and later: an unaligned locked access works as long as it stays within a single cache line; crossing a cache line is where it gets expensive - but: _However, it is recommended that locked accesses be aligned on their natural boundaries for better system performance_ (https://stackoverflow.com/questions/1415256/alignment-requirements-for-atomic-x86-instructions).
Side note: heh - there's even support for 128-bit atomic operations (cmpxchg16b) - but where's the primitive for that in Java... :(
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225532#comment-14225532 ] Vijay commented on CASSANDRA-7438: --
{quote} so each segment would be a 4-byte lock {quote}
Are you talking about just setting 1 for lock and 0 for unlock? Hmmm, alright, that's doable... I am guessing you have already seen how ReentrantLock implements locking.
{quote} The check on line 38 races with the assignment on line 39. {quote}
I thought we discussed this already... That was supposed to be taken care of per my earlier comment ("Next commit will have conditions instead (similar to the C implementation)"); I have not committed it yet :)
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225592#comment-14225592 ] Ariel Weisberg commented on CASSANDRA-7438: ---
bq. I thought we discussed this already...
Sorry, my bad.
A reentrant lock is just a counter of the number of acquisitions. You could do an 8-byte lock field: store the thread id in the first 4 bytes and the counter in the next 4 bytes. We probably want these to be 8-byte aligned so they don't cross cache lines.
bq. Are you talking about just setting 1 for lock and 0 for unlock? Hmmm, alright, that's doable... I am guessing you have already seen how ReentrantLock implements locking.
You could do it in 8 bytes, since pointers are actually only six bytes - the two higher-order bytes are just the highest-order bit sign-extended on current Intel processors. CAS the pointer and use the highest-order bit to represent locked/unlocked.
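The reentrant encoding Ariel describes - owner thread id in the high 32 bits of one 64-bit word, acquisition count in the low 32 - can be sketched with an AtomicLong standing in for the 8-byte-aligned off-heap lock field (hypothetical code, not the eventual OHC/lruc implementation; it assumes thread ids fit in 32 bits, as the 4-byte layout does):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical reentrant spinlock packed into one 64-bit word:
// high 32 bits = owner thread id, low 32 bits = acquisition count.
// 0 means unlocked. Off heap this would be an Unsafe CAS on an
// 8-byte-aligned native address instead of an AtomicLong.
public class PackedReentrantLock {
    private final AtomicLong word = new AtomicLong(0);

    private static long pack(long threadId, long count) {
        return (threadId << 32) | (count & 0xFFFFFFFFL);
    }

    public void lock() {
        long tid = Thread.currentThread().getId();
        for (;;) {
            long cur = word.get();
            if (cur == 0) {
                // Unlocked: try to take it with count = 1.
                if (word.compareAndSet(0, pack(tid, 1)))
                    return;
            } else if ((cur >>> 32) == tid) {
                // Already ours: bump the count. A plain set is safe here -
                // only the owner writes while it holds the lock.
                word.set(pack(tid, (cur & 0xFFFFFFFFL) + 1));
                return;
            }
            Thread.onSpinWait();
        }
    }

    public void unlock() {
        long cur = word.get();
        if ((cur >>> 32) != Thread.currentThread().getId())
            throw new IllegalMonitorStateException();
        long count = cur & 0xFFFFFFFFL;
        word.set(count == 1 ? 0 : pack(cur >>> 32, count - 1));
    }

    public int holdCount() {
        return (int) (word.get() & 0xFFFFFFFFL);
    }
}
```

Compared to ReentrantLock this costs 8 bytes instead of two heap objects, at the price of unbounded spinning under contention - the trade-off discussed throughout this thread.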
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225616#comment-14225616 ] Pavel Yaskevich commented on CASSANDRA-7438:
I guess the only thing that we haven't yet re-invented in Cassandra would be an off-heap lock based on architecture-specific details; great that we're finally about to correct this historical injustice. {color:red}This all definitely sounds a lot more reasonable than having C code as a dependency.{color}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222799#comment-14222799 ] Robert Stupp commented on CASSANDRA-7438: -
* rehashing: growing (x2) is already implemented; shrinking (/2) shouldn't be a big issue either. The implementation only locks the currently processed partitions during a rehash.
* put operation: fixed (was definitely a bug); cleanup runs concurrently and is triggered on an out-of-memory condition
* block sizes: will give it a try (fixed vs. different sizes vs. variable-sized (no blocks))
* per-partition locks: already thought about it - not sure whether it's worth the additional RW-lock overhead, since partition lock time is very low during normal operation
* metrics: some (very basic) metrics are already in it - will add some more timer metrics (configurable)
[~vijay2...@yahoo.com] can you catch {{OutOfMemoryError}} for Unsafe.allocate()? It should not go up the whole call stack.
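To the question above: Unsafe.allocateMemory signals native allocation failure by throwing OutOfMemoryError, and that error can be caught locally rather than being allowed to travel up the call stack. A hedged sketch (it assumes sun.misc.Unsafe is reachable via the usual reflection idiom, as it is on the JDKs Cassandra targets; the class and method names are invented):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class NativeAlloc {
    private static final Unsafe UNSAFE;
    static {
        try {
            // Standard idiom for obtaining the Unsafe singleton.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /**
     * Returns the address of the new block, or 0 if the native allocator
     * failed. Unsafe.allocateMemory throws OutOfMemoryError on failure;
     * catching it here keeps the error from propagating any further.
     */
    public static long tryAllocate(long bytes) {
        try {
            return UNSAFE.allocateMemory(bytes);
        } catch (OutOfMemoryError e) {
            return 0L;
        }
    }

    public static void free(long address) {
        if (address != 0)
            UNSAFE.freeMemory(address);
    }
}
```

A caller can then treat 0 as "cache full / allocation failed" and fall back to eviction instead of crashing the request path.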
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14222802#comment-14222802 ] Pavel Yaskevich commented on CASSANDRA-7438:
bq. per-partition locks: already thought about it - not sure whether it's worth the additional RW-lock overhead since partition lock time is very low during normal operation
It depends on the operation mode: if there are, e.g., 75% reads and 25% writes, it makes more sense to use locks, because an RW lock is going to be optimized by the JVM to a CAS operation when there is no contention. Anyhow, it's a valid test to run with different modes to check CAS vs. RW.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223479#comment-14223479 ] Ariel Weisberg commented on CASSANDRA-7438: --- I think that for caches the behavior you want to avoid most is a slowly growing heap. People hate that because it's unpredictable and they don't know when it's going to stop. You can always start with jemalloc, get the feature working, and then iterate on memory management.
Fixed block sizes is a baby-and-bathwater scenario to get the desirable fixed-memory-utilization property. When you want to build everything out of fixed-size pages, you have to slot the pages or do some other internal page-management strategy so you can pack multiple things and rewrite pages as they fragment. You also need size-tiered free lists and fragmentation metadata for pages so you can find partially free pages. That kind of thing only makes sense in ye olden database land, where rewriting an already dirty page is cheaper than more IOPS. In memory you can relocate objects. Memcached used to have the problem that, instead of the heap growing, the cache would lose capacity to fragmentation. FB implemented slab rebalancing in their fork, and then Memcached did its own implementation. The issue was internal fragmentation due to having too many of the wrong-size slabs.
For Robert:
* Executor service shutdown: never really got why it takes a timeout, nor why there is no blocking version. 99% of the time, if it doesn't shut down within the timeout it's a bug and you don't want to ignore it. We are pedantic about everything else, why not this? It's also unused right now.
* Stats could go into an atomic long array with padding. It really depends on the access pattern. You want data that is read/written at the same time on the same cache line. These are global counters, so they will be contended by everyone accessing the cache; better that they only have to pull in one cache line with all the counters than multiple, and have to wait for exclusive access before writing to each one. Also consider LongAdder.
* If you want to do your own memory-management strategy, I think something like segregated storage as in Boost.Pool, with size tiers for powers of two and each power of two plus the previous power of two. You can CAS the head of the free list for each tier to make it thread safe, and lock when allocating a new block instead of from the free list. This won't adapt to changing size distributions; for that, stuff needs to be relocatable.
* I'll bet you could use a stamped-lock pattern and readers might not have to lock at all. I think getting it working with just a lock is fine.
* I am not sure shrinking is very important? The table is pretty dense and should be a small portion of total memory once all the other memory is accounted for. You would need a lot of tiny cache entries to really bloat the table, and then the population distribution would need to change to make that a waste.
* LRU lists per segment seems like it's not viable. That isn't a close enough approximation to LRU, since we want at most two or three entries per partition.
* Some loops of very similar byte munging in HashEntryAccess.
* The periodic cleanup check is maybe not so nice. An edge trigger via a CAS field would be nicer, and move that up to 80%, since on a big-memory machine that is a lot of wasted cache space. Walking the entire LRU could take several seconds, but if it is amortized across a lot of expiration maybe it is ok.
* Some rehash-required checking is duplicated in OHCacheImpl.
For Vijay:
* sun.misc.Hashing doesn't seem to exist for me, maybe a Java 8 issue?
* The queue really needs to be bounded; producer and consumer could proceed at different rates. With striped
* Tasks submitted to executor services via submit will wrap the result, including exceptions, in a future which silently discards them. The library might take a listener for these errors at initialization time, or if it is going to be C*-specific it could use the wrapped runnable or similar.
* A lot of locking that was spin locking (which, unbounded, I don't think is great) is now blocking locking. There is no adaptive spinning if you don't use synchronized. If you are already using Unsafe, maybe you could do monitor enter/exit. Never tried it.
* It looks like concurrent calls to rehash could cause the table to rehash twice, since the rebalance field is not CASed. You should do the volatile read, and then attempt the CAS (avoids putting the cache line in exclusive state every time).
* StatsHolder: same AtomicLongArray suggestion. Also consider LongAdder.
* In Segment.java, in the replace path, AtomicLong.addAndGet is called back to back; it could be called once with the math already done. I believe each of those stalls processing until the store buffers have flushed. The put path does something similar and could have the same
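The "volatile read, then CAS" advice for the rebalance field is a test-and-test-and-set: most threads bail out on a plain volatile read, which keeps the cache line in shared state, and only the one thread that wins the CAS performs the rehash. A minimal sketch (class and method names are invented for illustration, not taken from the patch):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RehashGuard {
    private final AtomicBoolean rehashing = new AtomicBoolean(false);
    private int rehashCount = 0;

    /** Returns true only for the thread that actually performed the rehash. */
    public boolean maybeRehash() {
        // 1. cheap volatile read: most threads exit here without forcing
        //    the cache line into exclusive state
        if (rehashing.get())
            return false;
        // 2. only now attempt the CAS; exactly one thread wins it
        if (!rehashing.compareAndSet(false, true))
            return false;
        try {
            rehashCount++;   // stand-in for the actual table rebuild
            return true;
        } finally {
            rehashing.set(false);
        }
    }

    public int rehashCount() { return rehashCount; }
}
```

Without step 1, every caller's compareAndSet would bounce the cache line into exclusive state even when a rehash is already in flight.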
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222365#comment-14222365 ] Robert Stupp commented on CASSANDRA-7438: - I've spent some evenings on an alternative approach for an off-heap row cache, too. It uses a different concept and architecture.
* Based on a big hash table.
* Each hash partition (segment) has a reference to an LRU linked list of hash entries. Each get operation moves the accessed entry to the head of the LRU linked list.
* Data memory is divided into uniform blocks (a few kB) and managed by multiple (8) free-block linked lists. Just one big memory allocation during initialization. Pro: no fragmentation of free memory, easier to handle. Con: fragmentation of data.
* Proactive eviction with the goal of keeping a percentage of memory free.
* Put operation (currently) fails if there's not enough memory available to store the data. The idea is not to block the calling code (don't put additional latency on an overloaded system).
* Locks (CAS based) exist on each hash partition, each hash entry and each free list, and are held as short as possible (e.g. put allocates data blocks, fills them with the data of the new entry, acquires the lock on the hash partition, updates the LRU linked-list pointers and finishes).
* To keep the linked lists on each hash partition (segment) short, large hash tables should be used.
* No rehash yet - could be manageable by locking one hash partition at a time and splitting it into two new partitions (more logic, but no global lock).
* No overhead in the JVM heap for the cache itself (although accesses require short-lived objects for serialization).
* The only stolen thing is Vijay's benchmark (asked him before ;) ).
Pushed here: https://github.com/snazy/ohc - more descriptive Readme, too.
Other ideas:
* If we have off-heap data, it might be possible to (de)serialize the hot set directly to/from that off-heap data (zero-copy I/O), at the cost of changing the on-disk data format.
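The uniform-block scheme above (one big allocation, carved into fixed blocks, handed out via free lists) can be simulated on-heap. This is only a sketch of the bookkeeping with invented names; the real OHC code works on off-heap memory and keeps multiple free lists to reduce contention:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BlockPool {
    private final byte[] region;          // the single big up-front allocation
    private final int blockSize;
    private final Deque<Integer> freeList = new ArrayDeque<>();  // stack of free block indexes

    public BlockPool(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        this.region = new byte[blockSize * blockCount];
        for (int i = 0; i < blockCount; i++)
            freeList.push(i);             // every block starts out free
    }

    /** Returns a block index, or -1 if the pool is exhausted (the put fails). */
    public int allocate() {
        Integer b = freeList.poll();
        return b == null ? -1 : b;
    }

    public void free(int block) { freeList.push(block); }

    public int freeBlocks() { return freeList.size(); }

    /** Byte offset of a block inside the region. */
    public int offsetOf(int block) { return block * blockSize; }
}
```

The pro/con trade Robert describes falls out directly: free memory never fragments (the free list is just indexes), but an entry larger than one block must be chained across blocks, fragmenting the data itself.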
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222660#comment-14222660 ] Pavel Yaskevich commented on CASSANDRA-7438: Personally I like what Vijay did a bit more, just because the main ideas were taken from memcached, which is proven to work fine for the majority of use-cases and is pretty simple inside. Regarding Robert's implementation, I have a few comments which he'll have to address (if not already) before I would consider it for inclusion:
- rehashing is a must-have; we want to grow/shrink caches based on usage to lessen the burden on users trying to size them appropriately from day 1;
- if a put operation fails, it should at least invalidate the previously inserted value, if any, and probably kick off maintenance activities like LRU cleanup and/or rehashing;
- fixed-size data blocks create a lot of allocation slop, which can sometimes take the majority of allocated memory (e.g. Firefox had that problem); the cache should at least have blocks of different sizes to minimize that;
- it would be great to have benchmarks for per-partition CAS vs. per-partition RW lock in different operation modes; cache invalidation could be a noticeable factor for performance, as could CAS races;
- metrics (if not yet added).
Also, based on the discussion [~snazy] had with [~vijay2...@yahoo.com], I would avoid using DirectByteBuffer because they are problematic for GC.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1495#comment-1495 ] Vijay commented on CASSANDRA-7438: -- Alright, the first version of the pure-Java version of LRUCache is pushed.
* Basically a port from the C version. (Most of the test cases pass, and they are the same for both versions.)
* As Ariel mentioned before, we can use Disruptor for the ring buffer, but this doesn't use it yet.
* Expiry in the queue thread is not implemented yet.
* The algorithm to start the rehash needs to be more configurable and based on the capacity; will be pushing that soon.
* Overhead in the JVM heap is just the segments array.
https://github.com/Vijay2win/lruc/tree/master/src/main/java/com/lruc/unsafe
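The core trick of the pure-Java port ("overhead in the JVM heap is just the segments array") is that entry bytes live in memory obtained from sun.misc.Unsafe, so the heap only holds long addresses. A toy illustration, not the lruc code (recent JDKs print deprecation warnings for these methods but they still work; class and method names are invented):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapBytes {
    private static final Unsafe UNSAFE = loadUnsafe();

    // Standard reflective access to the singleton; Unsafe has no public constructor.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Copies value off heap; returns the raw address. Caller must free it. */
    public static long store(byte[] value) {
        long addr = UNSAFE.allocateMemory(value.length);
        for (int i = 0; i < value.length; i++)
            UNSAFE.putByte(addr + i, value[i]);
        return addr;
    }

    public static byte[] load(long addr, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++)
            out[i] = UNSAFE.getByte(addr + i);
        return out;
    }

    public static void free(long addr) { UNSAFE.freeMemory(addr); }
}
```

Unlike DirectByteBuffer (which the thread warns against), memory allocated this way is invisible to the GC entirely; the price is that nothing reclaims it if you forget to call free.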
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199976#comment-14199976 ] Robert Stupp commented on CASSANDRA-7438: - Debugging C code via JNI and debugging Unsafe code on large data structures are both a nightmare. And a simple, stupid bug of either kind can quickly make the JVM core dump. The advantage of the Unsafe approach is that all OSes are directly supported. The advantage of the JNI approach is that the code that handles the data structures is much easier to read. Proposal:
* Extract the changes that support a pluggable ICacheProvider from this ticket to a separate ticket and commit that stuff.
* Let Vijay continue his work on this one.
* Provide an alternative implementation using Unsafe.
* Let both implementations compete in some long-running tests.
This is much effort - but I don't know how to validate either solution theoretically.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198368#comment-14198368 ] Jonathan Ellis commented on CASSANDRA-7438: --- Here's me in July: bq. I'm not wild about taking on the complexity of building and distributing native libraries if we have a reasonable alternative. Vijay what do we win with the native approach over using java unsafe? The objection at the time was, bq. The win is that we can now have Caches which can be bigger than JVM with zero GC overhead on the items. Unsafe approach will hold the references in memory and the overhead on them is reasonably high compared to the native approach (example of it is an integer key's) and in addition if we use hash map we have segments with locks (also there the references in the queue), so it is not a straight forward approach either. ... but as Ariel said, we can use the same technique to hold references off-heap with Unsafe, as with JNI. Am I missing something?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14198523#comment-14198523 ] Vijay commented on CASSANDRA-7438: -- Alright, looks like the objection is not on the design but the language choice. If I had known the implementation details, it would have been an easier choice in the first place (the argument earlier was that we don't have a way to lock and use the queue easily), for example the map vs. queue etc. The thing which we are missing is 4 months of dev, testing and reviewers' time :). It's alright, let me give it a shot; after all, we'll have an alternative to benchmark against.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196296#comment-14196296 ] Ariel Weisberg commented on CASSANDRA-7438: --- bq. No we don't. We have locks per Segment, this is very similar to lock stripping/Java's concurrent hash map.
Thanks for clearing that up.
bq. Not really we lock globally when we reach 100% of the space and we freeup to 80% of the space and we spread the overhead to other threads based on who ever has the item partition lock. It won't be hard to make this part of the queue thread and will try it for the next release of lruc.
OK, that makes sense. 20% of the cache could be many milliseconds of work if you are using many gigabytes of cache. That's not a great thing to foist on a random victim thread. If you handed that to the queue thread, well, I think you run into another issue, which is that the ring buffer doesn't appear to check for queue full? The queue thread could go out to lunch for a while. Not a big deal, but finer-grained scheduling will probably be necessary.
bq. If you look at the code closer to memcached. Actually I started of stripping memcached code so we can run it in process instead of running as a separate process and removing the global locks in queue reallocation etc and eventually diverged too much from it. The other reason it doesn't use slab allocators is because we wanted the memory allocators to do the right thing we already have tested Cassandra with Jemalloc.
Ah, very cool. jemalloc is not a moving allocator, whereas it looks like memcached slabs implement rebalancing to accommodate changes in size distribution. That would actually be one of the really nice things to keep, IMO. On large-memory systems with a cache that scales and performs, you would end up dedicating as much RAM as possible to the row cache/key cache and not the page cache, since the page cache is not as granular (correct me if the story for C* is different). If you dedicate 80% of RAM to the cache, that doesn't leave a lot of space for fragmentation. By using a heap allocator you also lose the ability to implement hard, predictable limits on memory used by the cache, since you didn't map it yourself. I could be totally off base and jemalloc might be good enough.
bq. There is some comments above which has the reasoning for it (please see the above comments). PS: I believe there was some tickets on Current RowCache complaining about the overhead.
I don't have a performance beef with JNI, especially the way you have done it, which I think is pretty efficient. I think the overhead of JNI (one or two slightly more expensive function calls) would be eclipsed by things like the cache misses, coherence, and pipeline stalls that are part of accessing and maintaining a concurrent cache (Java or C++). It's all just intuition without comparative microbenchmarks of the two caches. Java might look a little faster just due to allocator performance, but we know you pay for that in other ways. I think what you have made scratches the itch for a large cache quite well, and beats the status quo.
I don't agree that Unsafe couldn't do the exact same thing with no on-heap references. The hash table, ring buffer, and individual item entries are all being malloced, and you can do that from Java using Unsafe. You don't need to implement a ring buffer because you can use Disruptor. I also wonder if splitting the cache into several instances, each with a coarse lock per instance, wouldn't result in simpler - and I know performance is not an issue - fast enough code. I don't want to advocate doing something different for performance, but rather that there is the possibility of a relatively simple implementation via Unsafe. You could coalesce all the contended fields for each instance (stats, lock field, LRU head) into a single cache line, and then rely on a single barrier when releasing a coarse-grained lock. The fine-grained locking and CASing results in several pipeline stalls, because the memory barriers that are implicit in each one require the store buffers to drain. There may even be a suitable off-heap map implementation out there already.
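The stamped-lock pattern Ariel mentions earlier in the thread (readers not taking a lock at all on the fast path) is available out of the box since Java 8 as java.util.concurrent.locks.StampedLock: readers grab an optimistic stamp, read, then validate; only if a writer intervened do they retry with a real read lock. A minimal sketch with invented counter fields:

```java
import java.util.concurrent.locks.StampedLock;

public class StampedCounterPair {
    private final StampedLock lock = new StampedLock();
    private long hits, misses;

    public void recordHit()  { long s = lock.writeLock(); try { hits++; }   finally { lock.unlockWrite(s); } }
    public void recordMiss() { long s = lock.writeLock(); try { misses++; } finally { lock.unlockWrite(s); } }

    public long total() {
        long stamp = lock.tryOptimisticRead();   // no CAS, no lock on the fast path
        long h = hits, m = misses;
        if (!lock.validate(stamp)) {             // a writer slipped in: retry pessimistically
            stamp = lock.readLock();
            try { h = hits; m = misses; }
            finally { lock.unlockRead(stamp); }
        }
        return h + m;
    }
}
```

Uncontended optimistic reads touch the lock's cache line only to read the stamp, which fits the single-cache-line-of-contended-fields design Ariel sketches above.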
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196492#comment-14196492 ] Vijay commented on CASSANDRA-7438: -- {quote} well I think you run into another issue which is that the ring buffer doesn't appear to check for queue full? {quote} Yeah, I thought about it; we need to handle those, and that's why it wasn't there in the first place. Should not be really bad though. {quote} I don't agree that Unsafe couldn't do the exact same thing with no on heap references {quote} Probably - since we figured out most of the implementation details, sure we can - but there are always many different ways to solve the problem (even though it will be inefficient to copy multiple bytes to get to the next items in the map etc.; GC and CPU overhead would be more, IMHO). For example, Memcached used the expiration time set by the clients to remove items, which made it easier for them to do the slab allocator, but this is something we removed in lruc, leaving just a queue. {quote} I also wonder if splitting the cache into several instances each with a coarse lock per instance wouldn't result in simpler {quote} The problem there is how you will invalidate the least-recently-used items; since they are in different partitions, you really don't know which ones to invalidate... There is also a problem of load balancing, when to expand the buckets etc., which will bring us back to the current lock-striping solutions, IMHO. I can do some benchmarks if that's exactly what we need at this point. Thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196885#comment-14196885 ] Jonathan Ellis commented on CASSANDRA-7438: --- bq. There is some comments above which has the reasoning for [why JNI is justified]. PS: I believe there was some tickets on Current RowCache complaining about the overhead. Aren't all those objections to the current design and not to Unsafe per se? Adding native libraries + JNI is a pretty huge step in build, QA, and runtime complexity. I'd like to avoid it if at all possible.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197029#comment-14197029 ] Vijay commented on CASSANDRA-7438: -- {quote} Aren't all those objections to the current design {quote} I am fine with making it configurable and maintaining it in a separate project, but I didn't realize that was the case.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195418#comment-14195418 ] Ariel Weisberg commented on CASSANDRA-7438: --- RE refcount: I think hazard pointers (never used them personally) are the no-GC, no-refcount way of handling this. It also won't be fetched twice if it is uncontended, which in many cases it will be, since it should be decrefed as soon as the data is copied. I think that with the right QA work this solves the problem of running arbitrarily large caches. That means running a validating workload in continuous integration that demonstrates the cache doesn't lock up, leak, or return the wrong answer. I would probably test directly against the cache to get more iterations in.
RE implementation as a library via JNI: We give up something by using JNI, so it only makes sense if we get something else in return. The QA and release work created by JNI is pretty large. You really need a plan for running something like Valgrind or similar against a comprehensive suite of tests. Valgrind doesn't run well with Java AFAIK, so you end up doing things like running the native code in a separate process, and you have to write an interface amenable to that. Valgrind is also slow enough that if you try to run all your tests against a configuration using it a lot, you end up with timeouts and many hours to run all the tests, plus time spent interpreting results. Unsafe is worse in that respect because there is no Valgrind, and I can attest that debugging an off-heap red-black tree is not fun. I am not clear on why the JNI is justified. It really seems like this could be written against Unsafe, and then it would work on any platform. There are no libraries or system calls in use that are only accessible via JNI. I think JNI would make more sense if we were pulling in existing code like memcached that already handles memory pooling, fragmentation, and concurrency. If it were in Java you could use Disruptor for the queue and would only need to implement a thread-safe off-heap hash table.
RE performance and implementation: What kind of hardware was the benchmark run on? Server-class NUMA? I am just wondering if there are enough cores to bring out any scalability issues in the cache implementation. It would be nice to see a benchmark that showed the on-heap cache falling over while the off-heap cache provides good performance. Subsequent comments aren't particularly useful if performance is satisfactory under relevant configurations. Given the use of a heap allocator and locking, it might not make sense to have a background thread do expiration. I think that splitting the cache into several instances with one lock around each instance might result in less contention overall, and it would scale up in a more straightforward way. It appears that some common operations will hit a global lock in may_expire() quite frequently? It seems like there are other globally shared, frequently mutated cache lines in the write path, like stats. Is there something subtle in the locking that makes the use of the custom queue and maps necessary, or could you use stuff from Intel TBB and still make it work? It is hypothetically less code to have to QA and maintain. I still need to dig more, but I am also not clear on why locks are necessary for individual items. It looks like there is a table for all of them? Random intuition is that it could be done without a lock, or at least a discrete lock. Striping against a padded pool of locks might make sense if that isn't going to cause deadlocks. Apparently every pthread_mutex_t is 40 bytes, according to a random Stack Overflow post. It might make sense to use the same cache line as the refcount to store a lock field, or the bucket in the hash table? Another implementation question is: do we want to use C++11? It would remove a lot of platform- and compiler-specific code.
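For the globally shared, frequently mutated stats cache lines mentioned above, the thread repeatedly suggests LongAdder: under write contention it stripes updates across per-thread cells and only sums them on read, avoiding the exclusive-cache-line ping-pong of a single AtomicLong. A sketch with invented field names:

```java
import java.util.concurrent.atomic.LongAdder;

public class CacheStats {
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();
    private final LongAdder evictions = new LongAdder();

    public void hit()      { hits.increment(); }      // cheap even when many threads write
    public void miss()     { misses.increment(); }
    public void eviction() { evictions.increment(); }

    public long hitCount()  { return hits.sum(); }    // reads are rare: sum the cells
    public long missCount() { return misses.sum(); }

    public double hitRate() {
        long h = hitCount(), m = missCount();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```

The trade-off matches the access pattern Ariel describes: counters are written on every cache operation but read only when someone polls metrics, which is exactly the write-heavy/read-rare case LongAdder is designed for.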
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195679#comment-14195679 ] Vijay commented on CASSANDRA-7438: -- Thanks for reviewing! {quote} I am also not clear on why locks are necessary for individual items. {quote} We don't lock individual items. We have locks per segment; this is very similar to lock striping, or to what Java's ConcurrentHashMap does. {quote} global lock in may_expire() quite frequently? {quote} Not really. We lock globally only when we reach 100% of the space, and then we free down to 80% of the space, spreading the overhead across other threads based on whoever holds the item's partition lock. It won't be hard to make this part of the queue thread; I will try that for the next release of lruc. {quote} What kind of hardware was the benchmark run on? {quote} 32 cores, 100 GB RAM, with NUMA and Intel Xeon. There is a benchmark utility checked in as part of the lruc code which does exactly this kind of test. {quote} You really need a plan for running something like Valgrind {quote} Good point. I was partway down that road and still have the code; I can resurrect it for the next lruc version. {quote} I am not clear on why the JNI is justified {quote} The reasoning for it is in the comments above (please see them). PS: I believe there were some tickets complaining about the overhead of the current RowCache. {quote} I think JNI would make more sense if we were pulling in existing code like memcached {quote} The code is actually quite close to memcached. I started off by stripping down the memcached code so we could run it in-process instead of as a separate process, removing the global locks in queue reallocation etc., and eventually diverged too much from it. The other reason it doesn't use slab allocators is that we wanted the memory allocator to do the right thing, and we have already tested Cassandra with jemalloc.
To comfort you a bit: lruc is already running in our production :)
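The 100%-full / free-to-80% policy Vijay describes above can be sketched as a pair of pure functions (hypothetical names; the thresholds are the ones stated in the comment):

```java
// Hypothetical sketch of the eviction policy described above: nothing is
// evicted until usage hits 100% of capacity; eviction then frees memory
// down to an 80% low watermark, so the global path is taken only rarely.
public class WatermarkPolicy {
    static final double LOW_WATERMARK = 0.80;

    static boolean shouldEvict(long usedBytes, long capacityBytes) {
        return usedBytes >= capacityBytes;
    }

    static long bytesToFree(long usedBytes, long capacityBytes) {
        long target = (long) (capacityBytes * LOW_WATERMARK);
        return Math.max(0, usedBytes - target);
    }

    public static void main(String[] args) {
        // At exactly full capacity, 20% of the space gets freed.
        System.out.println(shouldEvict(1000, 1000)); // true
        System.out.println(bytesToFree(1000, 1000)); // 200
    }
}
```

Freeing a whole batch down to the low watermark amortizes the cost of taking the global lock, instead of paying for it on every insert once the cache is full.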
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193760#comment-14193760 ] Vijay commented on CASSANDRA-7438: -- Pushed, thanks! {quote} We should ensure that changes in the serialized format of saved row caches are detected {quote} I don't think we changed the format, did I? {quote} item.refcount - if refcount is updated, the whole cache line needs to be re-fetched (CPU) {quote} The refcount is per item in the cache; for every item inserted, we track this in its memory location.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193785#comment-14193785 ] Robert Stupp commented on CASSANDRA-7438: - bq. I don't think we changed the format, did i? Ah - no. Sorry - got confused with the in-memory serialization. bq. item.refcount What I mean is the (Intel) CPU L1+L2 cache line size (64 bytes). If 'refcount' is updated (e.g. just for a cache get), the whole cache line is invalidated (twice) and needs to be re-fetched from RAM although its content did not change. It's just a point for optimization - if we find a viable solution for that, we should implement it.
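One standard mitigation for this kind of false sharing is the manual padding trick (used by e.g. the Disruptor before `@Contended` existed). This is a sketch of the technique in general, not code from this patch; the field names are made up:

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

// Sketch of the cache-line point above: surround a hot counter with enough
// unused longs that it occupies a 64-byte line by itself, so concurrent
// updates to neighboring counters do not invalidate each other's lines.
// Caveat: the JVM may reorder or eliminate unused fields, which is why
// Java 8 later added @sun.misc.Contended for exactly this purpose.
public class PaddedCounter {
    @SuppressWarnings("unused")
    private long p1, p2, p3, p4, p5, p6, p7;  // 56 bytes of padding before
    private volatile long value;
    @SuppressWarnings("unused")
    private long q1, q2, q3, q4, q5, q6, q7;  // 56 bytes of padding after

    private static final AtomicLongFieldUpdater<PaddedCounter> UPDATER =
            AtomicLongFieldUpdater.newUpdater(PaddedCounter.class, "value");

    public long increment() { return UPDATER.incrementAndGet(this); }
    public long get()       { return value; }

    public static void main(String[] args) {
        PaddedCounter hits = new PaddedCounter();
        hits.increment();
        hits.increment();
        System.out.println(hits.get()); // 2
    }
}
```

The same idea applies to the native `stats` struct: align each CAS-updated field to its own 64-byte boundary rather than packing them contiguously.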
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193074#comment-14193074 ] Robert Stupp commented on CASSANDRA-7438: - bq. total 1 byte profit bikeshed: could be two if serialized unencoded :) I took some time to see whether it would be low-hanging fruit to change that - but it isn't really, because some uses of {{DataIn/Output(Plus)}} would need to be changed too, and it is used widely - even (if I saw that correctly) in SSTables (the point at which I stopped investigating ;) )
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192874#comment-14192874 ] Jonathan Ellis commented on CASSANDRA-7438: --- [~aweisberg], would be useful to get your take on this too.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14192922#comment-14192922 ] Pavel Yaskevich commented on CASSANDRA-7438: I looked through the patch and everything looks good, except one small thing: - FBUtilities.newRowCacheProvider needs its arguments renamed; it looks like it was copied from FBUtilities.newPartitioner and kept the old names, so instead of partitioner it should be rowCacheClassName, with rowCache as the last argument to FBUtilities.construct(...). I also want to address Robert's comment regarding EncodedData{Input/Output}Stream: I agree that the longs of a version-1 UUID are not very compressible, and vint encoding actually adds 1 byte on top of a long (which is pretty easy to test). But the good news is that although we lose 2 bytes on long serialization, we win back at least 2 bytes by vint-encoding the length of the key - and in the best case, if the key size is less than 127 (which is highly likely), we actually win 3 bytes, which makes a total profit of 1 byte from the encoding :)
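Pavel's byte accounting is easy to reproduce with a size function for a Hadoop-style vint. This is a sketch under the assumption that the encoding uses the one-marker-byte scheme (1 byte for small values, otherwise a marker byte plus the minimal big-endian bytes); the exact scheme in EncodedDataOutputStream may differ:

```java
// Sketch of the byte accounting above, assuming a Hadoop-style vint:
// values in [-112, 127] take 1 byte; anything else takes a marker byte
// plus the minimal big-endian representation (at most 1 + 8 = 9 bytes).
public class VintMath {
    static int vintSize(long i) {
        if (i >= -112 && i <= 127)
            return 1;
        long v = (i < 0) ? ~i : i;   // fold negatives into the positive range
        int dataBytes = 0;
        while (v != 0) {
            v >>>= 8;
            dataBytes++;
        }
        return 1 + dataBytes;
    }

    public static void main(String[] args) {
        // A v1 UUID long generally needs all 8 data bytes: 9 encoded vs.
        // 8 raw, so the two UUID longs lose 2 bytes in total...
        System.out.println(vintSize(0x1234567890abcdefL)); // 9
        // ...but a key length under 128 takes 1 byte instead of a 4-byte
        // int, winning 3 bytes: the net "1 byte profit" mentioned above.
        System.out.println(vintSize(100));                 // 1
    }
}
```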
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14190711#comment-14190711 ] Robert Stupp commented on CASSANDRA-7438: - Did a walkthrough of lruc 0.7, too... Altogether +1 on the current state :) Just one nit:
* Move {{Preconditions.checkArgument(capacity > 0 ...)}} in {{LRUCache.java}} from {{capacity()}} to {{setCapacity()}}
One thing regarding saved row caches: we should ensure that changes in the serialized format of saved row caches are detected (and either converted during load or just discarded).
Comments would be nice to have in a future version:
* I think you need to add the APLv2 license header to all source files ;)
* The NEWS, COPYING and AUTHORS files in {{lruc/src/native}} and {{lruc}} are blank
* The {{stats}} struct is heavily updated using CAS - maybe think of aligning the individual values to separate CPU cache lines to reduce CPU cache refreshes
* Similar for {{item.refcount}} - if refcount is updated, the whole cache line needs to be re-fetched (CPU)
* {{o.a.c.cache.ICacheProvider.RowKeySerializer}} tries to "compress" the two {{long}} values of a UUID via {{EncodedDataOutputStream}}/{{EncodedDataInputStream}} - this is usually not possible for the long values of a UUID, resulting in a bigger serialized representation than necessary (which is why the default serialization, e.g. UUIDSerializer, does not encode them)
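The saved-row-cache format check requested above can be as small as a magic-plus-version header. This is a minimal sketch with hypothetical magic/version values, not the actual C* mechanism: write the header when saving, and treat any mismatch on load as "discard the saved cache and start cold".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: prepend a magic number and a format version to the
// saved row cache; any mismatch (or truncation) on load means "discard".
public class SavedCacheHeader {
    static final int MAGIC = 0x0CAC4E05;   // hypothetical magic value
    static final int CURRENT_VERSION = 1;

    static void writeHeader(DataOutput out) throws IOException {
        out.writeInt(MAGIC);
        out.writeInt(CURRENT_VERSION);
    }

    /** Returns true if the saved cache may be loaded; false means discard it. */
    static boolean checkHeader(DataInput in) {
        try {
            return in.readInt() == MAGIC && in.readInt() == CURRENT_VERSION;
        } catch (IOException e) {
            return false;  // truncated or garbled file: discard
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeHeader(new DataOutputStream(buf));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(checkHeader(in)); // true
    }
}
```

Bumping CURRENT_VERSION on any serialization change then makes stale saved caches self-discarding, with conversion-on-load as an optional refinement.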
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189224#comment-14189224 ] Robert Stupp commented on CASSANDRA-7438: - LGTM - but some comments:
* the comments in cassandra.yaml could be fleshed out more (see below)
* the version of lruc.jar should not be a SNAPSHOT version, and should be a bit higher than 0.0.1 (although it's just a number, people usually don't trust something with a '0' in front :) ) - the [lruc repo|https://github.com/Vijay2win/lruc/commits/master] shows v0.7 as the current version - I recommend using the latest lruc release in C*
* it would be very nice to have these released on Maven Central
* after lruc-0.7 is used for this ticket, we should run a stress test against a cluster using OffheapCacheProvider as some kind of smoke test
{code}
# Number of keys from the row cache to save.
# Disabled by default, meaning all keys are going to be saved.
# row_cache_keys_to_save: 100

# Row cache provider to use.
# Possible values are SerializingCacheProvider and OffheapCacheProvider.
# Default is no row cache.
#
# SerializingCacheProvider is the one used in previous versions of Cassandra.
# It is available on all platforms and uses offheap memory for the rows but
# structures on the Java heap to manage the offheap row data.
#
# OffheapCacheProvider is new in Cassandra 3.0 and only available on
# Unix platforms (Linux and OSX).
# It uses a native code library to manage the whole row cache including
# management information in native memory thus reducing heap
# pressure compared to SerializingCacheProvider.
#
# row_cache_provider: SerializingCacheProvider
{code}
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189364#comment-14189364 ] Vijay commented on CASSANDRA-7438: -- Rebased and pushed with the latest binaries. {quote} the comments in cassandra.yaml could be more fleshy (see below) {quote} Sorry, my bad - I missed it before. Thanks for the write-up; I copied it into the fork. {quote} recommend to use the latest lruc release in C* {quote} Yeah, I set up the release and publishing to Maven Central a few weeks ago.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14172078#comment-14172078 ] Robert Stupp commented on CASSANDRA-7438: - Will take a look at this this week.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160044#comment-14160044 ] Vijay commented on CASSANDRA-7438: -- Pushed most of the changes to https://github.com/Vijay2win/cassandra/commits/7438; I am not sure about moving the tests and code into the Cassandra code base (I am really neutral on that). Other related changes, tests and refactoring are pushed as part of 3 main commits in https://github.com/Vijay2win/lruc/commits/master. cc [~xedin]
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14157989#comment-14157989 ] Jonathan Ellis commented on CASSANDRA-7438: --- Are you still working on this, [~vijay2...@gmail.com]?
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158005#comment-14158005 ] Vijay commented on CASSANDRA-7438: -- Hi Jonathan, yes - I am adding more tests and fixing a test failure in lruc; going to post the patch soon.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145338#comment-14145338 ] Robert Stupp commented on CASSANDRA-7438: - (note: [~vijay2...@gmail.com], please use the other nick) Some quick notes:
* Can you add the assertion for {{capacity > 0}} to {{OffheapCacheProvider.create}} - the current error message ("capacity should be set") if {{row_cache_size_in_mb}} is not set (or invalid) could be more fleshy
* Additionally the {{capacity}} check should also reject negative values (it starts with a negative value - I don't know what happens if it stays negative...)
* {{org.apache.cassandra.db.RowCacheTest#testRowCacheCleanup}} fails at the last assertion - all other unit tests seem to work
* Documentation in cassandra.yaml for row_cache_provider could be a bit more verbose - just some abstract about the characteristics and limitations (e.g. Offheap only works on Linux + OSX) of both implementations
* IMO it would be fine to have a general unit test for {{com.lruc.api.LRUCache}} in C* code, too
* Please add an adapted copy of {{RowCacheTest}} for OffheapCacheProvider
* Unit tests using OffheapCacheProvider must not start on Windows builds - please add an assertion in OffheapCacheProvider to assert that it runs on Linux or OSX
Sorry for the late reply
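The two capacity checks requested above amount to a fail-fast guard. A hypothetical sketch (not the actual OffheapCacheProvider code) rejecting zero and negative values with a message that points at the yaml setting:

```java
// Hypothetical sketch of the requested validation: reject zero and
// negative capacities up front, naming the yaml option in the message.
public class CapacityCheck {
    static long validateCapacity(long capacityBytes) {
        if (capacityBytes <= 0)
            throw new IllegalArgumentException(
                "row cache capacity must be > 0 - check row_cache_size_in_mb (got "
                + capacityBytes + ")");
        return capacityBytes;
    }

    public static void main(String[] args) {
        System.out.println(validateCapacity(64L << 20)); // 67108864
    }
}
```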
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144260#comment-14144260 ] Vijay commented on CASSANDRA-7438: -- Hi [~rst...@pironet-ndh.com], I don't see a problem in copying or rewriting the code; once you complete the rest of the review we can see what we can do. I am guessing you were not waiting for my response :) Thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14080708#comment-14080708 ] Robert Stupp commented on CASSANDRA-7438: - bq. Yeah it works in unix, but the problem is i don't have a handle since its a temp file after restart. So it is a best effort for cleanups. It's a really sick problem. I changed our Snappy integration in a similar way. IMO there's no better solution than messing up the temp dir. bq. The problem is it produces a circular dependency Ah - I meant that the lruc code would be copied into the C* code base (if the others agree). But this could be a second step, since it's only a bit of refactoring.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075611#comment-14075611 ] Robert Stupp commented on CASSANDRA-7438: - [~vijay2...@gmail.com] do you have a C* branch with lruc integrated? Or: what should I do to bring lruc + C* together? Is the patch up to date? I've pushed a new branch 'native-plugin' with the changes for native-maven-plugin - separate from the other code. The Windows stuff is a bit more complicated - it doesn't compile. I have to dig a bit deeper. Maybe delay the Win port...
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075630#comment-14075630 ] Robert Stupp commented on CASSANDRA-7438: - Surely not a complete list, but a start...
Java code:
* com.lruc.util.Utils.getUnsafe can be safely deleted
* com.lruc.util.Utils.extractLibraryFile
** should check the return code of {{File.createNewFile}}
** a call to {{File.delete()}} for the extracted library file should be added to {{com.lruc.util.Utils.loadNative}}, since an unclean shutdown (kill -9) does not delete the so/dylib file. Possible on Unix systems - but not on Windows.
* The classes com.lruc.jni.lruc, SWIGTYPE_p_item and SWIGTYPE_p_p_item are unused (refactoring relic?)
* Generally the lruc code could be more integrated into the C* code base.
** Let the lruc classes implement org.apache.cassandra.io.util.DataOutputPlus and java.io.DataInput so that they can be used directly by C*'s ColumnFamilySerializer (no temporary Input/OutputStreams necessary).
** Maybe {{DataOutputPlus.write(Memory)}} can be removed in C* when lruc is used - not sure about that.
** Implement most DataInput/Output methods in EntryInput/OutputStream to benefit from Unsafe (e.g. Unsafe.getLong/putLong) - I've seen that you removed Abstract... some weeks ago ;)
** Using Unsafe for DataInput/Output of short/int/long/float/double has the drawback that Unsafe always uses the system's byte order - not (necessarily) the portable Java byte order. There's of course no drawback if all reads and writes are paired.
** {{Unsafe.copyMemory}} could be used for {{write(byte[])}}/{{read(byte[])}}.
* Naming of max_size vs. capacity - use one common term which also makes clear that it's a maximum memory size, e.g. max_size_bytes. _Capacity_ is often used for the number of elements in a collection.
* Memory leak: {{com.lruc.api.LRUCache.hotN}} may keep references in native code (no {{lruc_deref}} calls) if not all items are retrieved from the iterator - remove _hotN_ or return an array/list instead?
* Generally I think all classes can be merged into a single package if only a few are left (see above)
C code:
* {{#define item_lock(hv) while (item_trylock(hv)) continue;}} - shouldn't there be something like a _yield_ there?
* It seems the C code was not cleaned up after you began using Unsafe.allocateMemory :)
* I did not follow all possible code paths (due to the previous point)
Common:
* {{prefix_delimiter}} seems to be unused
Altogether I like it :)
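The byte-order caveat in the review can be demonstrated without Unsafe: ByteBuffer lets the order be set explicitly, which makes it easy to see why paired native-order reads/writes are safe, while mixing a native-order writer with a big-endian DataInput-style reader is not. This is an illustrative sketch, not lruc code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrates the byte-order drawback noted above: Unsafe.putLong writes in
// the platform's native order, while DataOutput is defined as big-endian.
// ByteBuffer stands in for Unsafe so the effect is observable portably.
public class ByteOrderCaveat {
    public static void main(String[] args) {
        long value = 0x0102030405060708L;

        ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
        buf.putLong(0, value); // what an Unsafe.putLong-style writer produces

        // Paired access (same order on both sides) always round-trips:
        System.out.println(buf.getLong(0) == value); // true

        // Re-reading the same bytes as big-endian (DataInput semantics) only
        // matches on big-endian hosts; on x86 the value comes back reversed.
        long asBigEndian = buf.order(ByteOrder.BIG_ENDIAN).getLong(0);
        System.out.println("native order is big-endian: " + (asBigEndian == value));
    }
}
```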
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075676#comment-14075676 ] Vijay commented on CASSANDRA-7438: -- Pushed the branch to https://github.com/Vijay2win/cassandra/tree/7438
{quote}Maybe delay Win port{quote} We should be fine; lruc is configurable alongside the SerializingCache.
{quote}unclean shutdown (kill -9) does not delete the so/dylib file{quote} Yeah, it works on Unix, but the problem is I don't have a handle to the temp file after a restart. So cleanup is best effort.
{quote}SWIGTYPE_p_item and SWIGTYPE_p_p_item are unused{quote} Auto-generated; they can be removed, but will be regenerated every time SWIG is run.
{quote}Generally the lruc code could be more integrated in C* code{quote} The problem is it produces a circular dependency; please look at df3857e4b9637ed6a5099506e95d84de15bf2eb7 where I removed those (the DOSP added back will still need to be wrapped by Cassandra's DOSP).
{quote}Naming of max_size, capacity{quote} Yeah, let me make it consistent; the problem was I was trying to fit everything into the Guava interface.
{quote}remove hotN or return an array/list instead{quote} Or maybe do a memcpy on the keys, since this path doesn't need optimization (will fix).
{quote}shouldn't there be something like a yield{quote} Actually I removed it recently; adding or removing it doesn't give much of a performance difference, but as a good citizen I should add it back.
{quote}Seems like the C code was not cleaned up{quote} This cannot be removed; it is needed for the test cases.
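The hotN fix Vijay agrees to ("do memcpy on keys") can be sketched as follows: instead of handing out an iterator that pins native entries (leaking references if the caller abandons it), copy the hottest keys into on-heap arrays up front and release every native reference before returning. NativeEntry and its refCount are stand-ins simulating the native item and lruc_deref; none of these names are from the actual lruc API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a leak-free hotN: keys are copied on-heap eagerly
// and every simulated native reference is released before returning.
public class HotNCopy {
    static final class NativeEntry {
        final byte[] key;
        int refCount = 1;                 // reference taken when hotN pinned the item
        NativeEntry(byte[] key) { this.key = key; }
        void deref() { refCount--; }      // stand-in for lruc_deref
    }

    static List<byte[]> hotN(List<NativeEntry> hottest) {
        List<byte[]> keys = new ArrayList<>(hottest.size());
        for (NativeEntry e : hottest) {
            keys.add(e.key.clone());      // the "memcpy" of the key on-heap
            e.deref();                    // release immediately; nothing can leak
        }
        return keys;
    }

    public static void main(String[] args) {
        List<NativeEntry> entries = List.of(
                new NativeEntry(new byte[]{1}), new NativeEntry(new byte[]{2}));
        List<byte[]> keys = hotN(entries);
        System.out.println(keys.size());             // 2
        System.out.println(entries.get(0).refCount); // 0
    }
}
```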
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14075516#comment-14075516 ] Vijay commented on CASSANDRA-7438: -- {quote}unsafe.memoryAllocate instead and replicate what we do with lruc_item_allocate(){quote} Done, thanks!
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14074452#comment-14074452 ] Robert Stupp commented on CASSANDRA-7438: - [~jbellis] Yes, I can review. I agree with [~vijay2...@yahoo.com] - Unsafe can only work when big regions are allocated. Then use our own malloc/free implementation to manage these big memory regions, which are split into small blocks. On top of that we need to implement a concurrent map that stores data only in off-heap memory. I think we can manage that, but it takes time - we need to avoid synchronization, use CAS, and prevent fragmentation (best effort).
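The "big region + own allocator + CAS" approach described above can be sketched minimally: allocate one large off-heap region and hand out small blocks from it with a lock-free CAS bump pointer. A real implementation would also need free-lists, eviction, and fragmentation handling; all names here are illustrative, and ByteBuffer.allocateDirect stands in for Unsafe.allocateMemory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of a slab allocator: one large off-heap region,
// sub-allocated with a CAS bump pointer (no synchronization).
public class SlabAllocator {
    private final ByteBuffer region;              // the big off-heap slab
    private final AtomicLong next = new AtomicLong(0);

    SlabAllocator(int capacityBytes) {
        this.region = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Returns the offset of a new block, or -1 when the slab is exhausted. */
    long allocate(int size) {
        while (true) {
            long cur = next.get();
            if (cur + size > region.capacity()) return -1;
            // CAS instead of a lock: concurrent allocators retry on contention.
            if (next.compareAndSet(cur, cur + size)) return cur;
        }
    }

    public static void main(String[] args) {
        SlabAllocator slab = new SlabAllocator(64);
        System.out.println(slab.allocate(16)); // 0
        System.out.println(slab.allocate(16)); // 16
        System.out.println(slab.allocate(64)); // -1 (would overflow the slab)
    }
}
```

A bump pointer alone cannot reuse freed blocks; that is exactly why the comment calls out fragmentation as a best-effort concern.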
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072935#comment-14072935 ] Robert Stupp commented on CASSANDRA-7438: - My username on GitHub is snazy. Do you know {{org.codehaus.mojo:native-maven-plugin}}? It allows JNI compilation on almost all platforms directly from Maven and does not interfere with SWIG - I have used it on OSX, Linux and Windows.
[jira] [Commented] (CASSANDRA-7438) Serializing Row cache alternative (Fully off heap)
[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14072950#comment-14072950 ] Benedict commented on CASSANDRA-7438: -
bq. Not sure what we are talking about, this == lruc? if yes the RB is fronting the queue so we don't need a global lock.
I was referring to [~rst...@pironet-ndh.com]'s assertion of the need for some kind of memory management - my only point was that you use no tools that aren't available through Unsafe/NativeAllocator.