[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946318#comment-13946318
 ] 

Benedict commented on CASSANDRA-6689:
-

Agreed. Pushed updated version to branch, which also modifies the 
FileUtils.clean() code to be the neater direct cast, and to check for the 
possibility by actually attempting a clean of a DirectByteBuffer.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946494#comment-13946494
 ] 

Tupshin Harper commented on CASSANDRA-6689:
---

+1

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946986#comment-13946986
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

You good with the latest, Pavel?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947001#comment-13947001
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Can we at least make return code  0 so automated tools can interpret this 
correctly? 

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947020#comment-13947020
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Sure.  Can you do that on commit?  I think Benedict is gone for the night.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947123#comment-13947123
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Sure, I will do -1 return there.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-25 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13947219#comment-13947219
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Thanks [~benedict] [~xedin] [~krummas]!

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-24 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945906#comment-13945906
 ] 

Benedict commented on CASSANDRA-6689:
-

I've updated the repository to include your remaining changes.

One last change I did not include was your suppression of errors from calling 
the direct cleaner: IMO this is dangerous, and the user should immediately find 
out that we do not have full support for off-heap functionality by crashing, so 
that they can reconfigure to use the on-heap allocator only. If we cannot 
clean, we are at risk of completely trashing the file system page cache through 
competition with our direct buffers. Better to fail badly ASAP.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-24 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945949#comment-13945949
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. One last change I did not include was your suppression of errors from 
calling the direct cleaner: IMO this is dangerous, and the user should 
immediately find out that we do not have full support for off-heap 
functionality by crashing, so that they can reconfigure to use the on-heap 
allocator only. If we cannot clean, we are at risk of completely trashing the 
file system page cache through competition with our direct buffers. Better to 
fail badly ASAP.

First of all it would not fail badly ASAP because that cause ClassCast or 
NoSuchMethod exception, which would just get logged but the executing thread. 
It's better to error to the log with something useful e.g. that the cleanup is 
not going to happen rather then fails with ClassCastException which doesn't say 
much to the users the same way we do for FileUtils.

[~jbellis] Can you remind us what is the general policy on the crashing? :)

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-24 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13945955#comment-13945955
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


After thinking about this once again, it would be better if we can detect (e.g. 
FileUtils.isCleanerAvailable()) that on startup and just fail over to the heap 
allocation, also we can reuse FileUtils.clean to accept ByteBuffer...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-24 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13946168#comment-13946168
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

bq. it would be better if we can detect (e.g. FileUtils.isCleanerAvailable()) 
that on startup and just fail over to the heap allocation

That's along the lines of what I'd expect as a user.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944136#comment-13944136
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. One more thing I forgot to mention which is for consideration, we might 
take advantage of SlabAllocator (on-heap) to accommodate all of the temporary 
buffers created by copying from Memtable so when that allocation spills to 
oldgen it would create less fragmentation in there.

This is likely to require as much complexity as RefAction to ensure resource 
cleanup, at which point we might as well just have proper zero copy.

As to your suggested changes, I am fine with most of the generics changes, but 
-1 on several of your renames and your config note changes: 
unslabbed_heap_buffers is *not* intended to be included in the config file, as 
it is very much discouraged. Including slab in the name adds nothing, whereas 
buffers permits other non-buffer implementations in future without confusion in 
naming/renaming. I might change offheap_buffers to direct_buffers for 
consistency with the JDK though.

heapAllocOnly is uglier and no better than allocateOnHeap. If you want, make it 
allocateOnHeapOnly. 

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944176#comment-13944176
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. unslabbed_heap_buffers is not intended to be included in the config file

I realise that my comments in the config file do imply a non-slabbed allocation 
method, but I propose this is fixed instead of renaming + including the 
unslabbed option.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944213#comment-13944213
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. This is likely to require as much complexity as RefAction to ensure 
resource cleanup, at which point we might as well just have proper zero copy.

I has thinking of thing as easy as replacing HeapAllocator.instance with 
SlabAllocator.instance (which would only have a heap allocation enabled).

bq. As to your suggested changes, I am fine with most of the generics changes, 
but -1 on several of your renames and your config note changes: 
unslabbed_heap_buffers is not intended to be included in the config file, as it 
is very much discouraged. Including slab in the name adds nothing, whereas 
buffers permits other non-buffer implementations in future without confusion in 
naming/renaming. I might change offheap_buffers to direct_buffers for 
consistency with the JDK though.

If you have unslabbed_heap_buffers in your code then you better document it in 
the config, because people would find it and create a issue for that. Rather 
from that I don't care much about names as soon as they present what they do - 
{on}heap_slab - means that those buffers are going to be allocated in the 
on-heap slabs, same goes for offheap_slab (I was thinking of using 
direct_slab but all of the code has it as offheap), jvm_default is just a 
default way of allocating single buffers.

bq. heapAllocOnly is uglier and no better than allocateOnHeap. If you want, 
make it allocateOnHeapOnly.

I'm fine with allocateOnHeapOnly if you like big names better, just 
allocateOnHeap doesn't coalesce with a usage of the variable.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944214#comment-13944214
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. because people would find it and create a issue for that

My understanding (not my policy) is that this is explicitly not a default 
supported option (in the current state of affairs) because of the expectation 
of increased heap fragmentation and GC death spiral, which is why I'm happy 
with the ugly name for that one option.

bq. I has thinking of thing as easy as replacing HeapAllocator.instance with 
SlabAllocator.instance (which would only have a heap allocation enabled).

In this case what is to be gained? Seems likely it would only increase the need 
for old gen collections, as we wouldn't be able to do any reuse?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944217#comment-13944217
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. My understanding (not my policy) is that this is explicitly not a default 
supported option (in the current state of affairs) because of the expectation 
of increased heap fragmentation and GC death spiral, which is why I'm happy 
with the ugly name for that one option.

So we can go both ways - remove it completely from the allocation type enum and 
don't support it or leave it in place and add it to documentation so users can 
make informed decision weather to use it or not meanwhile default would still 
be heap_slab or similar name.

bq. In this case what is to be gained? Seems likely it would only increase the 
need for old gen collections, as we wouldn't be able to do any reuse?

It could perform the same function as previous heap slab, for par new it 
doesn't make much different but if slab slips to the old get then it would 
create less fragmentation comparing to individual buffers getting promoted 
(plus we can pack some fresh allocations inside), but that's just an idea and 
isn't critical for this ticket...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944247#comment-13944247
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

bq. So we can go both ways - remove it completely from the allocation type enum 
and don't support it or leave it in place and add it to documentation 

Why are those our only choices?  It's the way it is for a reason; for a very 
few users, it's legitimately useful for running on the Azul JVM or for creating 
huge numbers of CFs, but for mainstream users it's a Very Bad Idea.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944262#comment-13944262
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~jbellis] Sorry if I want clear, by 'it' i mean unslabbed_buffers or 
jvm_default as I call it. I think we should mention it as legitimate option 
in the config instead of having that it only in the code...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944310#comment-13944310
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Right, I'm saying that it's undocumented right now (CASSANDRA-5935) and I'd 
rather leave it that way.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13944314#comment-13944314
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Ok, I don't like that we don't mention things like that in the config with big 
fat warning, but if you guys think opposite (as you have more context) I am 
fine with leaving it out... 

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942954#comment-13942954
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

bq. Reviewing under the assumption that 1-3 will go in 2.1 and the rest into 3.0

So at the risk of adding more debate, I do think one of the point of separating 
things is also that we can evaluate things separately. Typically, I would be 
happy to see a quick evaluation of what #1+#2 gives us, and what #3 adds to the 
table. If 1+2 is borderline unusable without 3 (and if that's the case, it 
should be easy to cook up simple graph that shows it), then fine, let's do 3 in 
2.1. But if it turns out that the difference is no that big, I would personally 
be keen to move #3 to 3.0, cause we're shoving enough big patches in 2.1 post 
beta1 as it is imo. To put it another way, if we're making steps, let's try to 
be true to them, and not make too strong assumptions about what goes where, 
because as far as I'm concerned, nothing should never be set in stone until 
it's committed (and even then really).

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942958#comment-13942958
 ] 

Benedict commented on CASSANDRA-6689:
-

Can everyone please settle on what functionality is desired for this 
ticket/release, so I do not need to keep reorganising it. I am in the middle of 
rebasing and reorganising it again, but I will wait until some consensus is 
reached. If we only want #2 (#1 is useless on its own), the patch set will be 
quite different; I will drop all of the refactoring. If we plan to include #3 
(note that this quite provably reduces fixed on-heap overhead by a minimum of 
75%, and easily  90% for clustering columns), I will retain #1 with some minor 
tweaks, and do my best to further make the commits incremental, but I am not 
going to completely reorder everything for no good reason.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942966#comment-13942966
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

Well, this was kind of my point. If we decide to make steps that are properly 
defined and self contained (which is my understanding of what people are asking 
here, and what I though was agreed on truly), then we do not have to decide 
right away if we sign for step 2 by taking step 1. Otherwise, what's the point 
of separating it in steps?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942968#comment-13942968
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. what's the point of separating it in steps?

The point here was to make review easier. This is one single ticket. If we want 
to split it out into multiple tickets, please let's agree that and then I'll do 
it.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942995#comment-13942995
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

That is a good way to think about it -- we are asking for patches or patchsets 
that are self contained enough to be split out into a ticket apiece if 
necessary.  If the divisions I proposed are a poor fit for that, by all means 
let's come up with ones that make better sense.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943005#comment-13943005
 ] 

Benedict commented on CASSANDRA-6689:
-

FTR, this is unlikely to improve review, as ridding ourselves of #1 means #3 
will simply be much bigger.

However it seems this is the consensus, so proceeding. This may take some time. 
I will reopen CASSANDRA-6694 to take step #2, and have this ticket take #1.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943285#comment-13943285
 ] 

Benedict commented on CASSANDRA-6689:
-

Patch for this ticket available 
[here|https://github.com/belliottsmith/cassandra/tree/iss-6689-final]

I have made absolutely no unnecessary changes or refactors. There is one commit 
only, as there was no good way to make any of the changes for this specific 
commit whilst leaving it in a compilable state. For CASSANDRA-6694 I will split 
the patch up into multiple commits, but this is likely to take quite some time.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943299#comment-13943299
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Thanks! That looks fairly straightforward now.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943787#comment-13943787
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


One more thing I forgot to mention which is for consideration, we might take 
advantage of SlabAllocator (on-heap) to accommodate all of the temporary 
buffers created by copying from Memtable so when that allocation spills to 
oldgen it would create less fragmentation in there.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13941498#comment-13941498
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


[~xedin] it is here: https://github.com/belliottsmith/cassandra/tree/iss-6689-3

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942480#comment-13942480
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~benedict] I looked at the 
[#1|https://github.com/belliottsmith/cassandra/tree/iss-6689-1] and I have a 
concern... Can you elaborate why did you introduce all these Impl static 
classes? That looks especially interesting in context of Cell.Impl where e.g. 
ExpiringCell.Impl extends Cell.Impl all of the methods of which are static 
and then overrides Cell.Impl methods with it's own implementations, why can't 
this be solved by making Cell and descendants abstract classes and overriding 
required methods? 

Another example I see is in DecoratedKey - only Impl method used outside is 
Impl.compareTo(IPartitioner, ByteBuffer, RowPosition), which could be changed 
(moved) to be a static method of RowPosition and would call 
DecorateKey.compareTo(DecoratedKey) when needed, so there is no jump from 
Impl.compareTo(IPartitioner, ByteBuffer, RowPosition) to 
DecorateKey.compareTo(RowPosition) and back to Impl.compareTo(DecoratedKey, 
RowPosition).

One more nit thing, in DecoratedKey.java there is no need to mark token() and 
key() explicitly abstract, also token() is already defined in RingPosition so 
no need to declare it in DecoratedKey.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942490#comment-13942490
 ] 

Benedict commented on CASSANDRA-6689:
-

See my comment explaining 
[here|https://issues.apache.org/jira/browse/CASSANDRA-6694?focusedCommentId=13904708page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13904708]

bq. One more nit thing, in DecoratedKey.java there is no need to mark token() 
and key() explicitly abstract, also token() is already defined in 
RingPosition so no need to declare it in DecoratedKey.

The latter is an artefact of merging the getToken() and token() into the same 
method, and I agree. Although there's no harm in either, happy to change them.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942521#comment-13942521
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


How many times do I have to ask for the same thing in this ticket, *let's keep 
the changes to the point of the current ticket, please*?... As we don't need 
Impl for this ticket (if we need it at all in that way is the topic of the 
separate discussion, did we at least test how much do we actually safe by doing 
it that way? I think [~jbellis] with me on this one after reading 
CASSANDRA-6694) let's remove it, the less code we have to read the faster 
review process would go. So if the NativeCell is going to be introduced in #3 
that means all of that Impl stuff should go there too.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942528#comment-13942528
 ] 

Benedict commented on CASSANDRA-6689:
-

This ticket includes everything up to and including #3, so we DO need it for 
this ticket.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942550#comment-13942550
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Let me repeat this again, if we need Impl for #3 it should be in #3 not #1.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942553#comment-13942553
 ] 

Benedict commented on CASSANDRA-6689:
-

That is not repeating yourself, that is a different statement. You said 
anything not needed for this ticket. I introduced #1 specifically, prior to 
starting the split, as introducing all of the necessary refactors. I do not 
intend to change it again now, it has been waiting long enough.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942573#comment-13942573
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Benedict, 1-3 were intended to be distinct steps so that review could be 
correspondingly simplified.  It does look to me like there is no clean 
breakdown as such in your branch, e.g. commits named in progress muddy the 
water considerably.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942576#comment-13942576
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Here is the quote from my previous comment:

_As we don't need Impl for this ticket before #3 (if we need it at all in that 
way is the topic of the separate discussion, did we at least test how much do 
we actually safe by doing it that way? I think Jonathan Ellis with me on this 
one after reading CASSANDRA-6694) let's remove it, the less code we have to 
read the faster review process would go. *So if the NativeCell is going to be 
introduced in #3 that means all of that Impl stuff should go there too*._

So first of all it's not clear if we even want to move in Impl direction as 
mentioned in CASSANDRA-6694, secondly we all agree that we are going to keep 
this to the point with roadmap we have so each of the points is individually 
deliverable/commitable, which essentially means less work for everybody and the 
ticket moves on faster, thirdly it looks like 4 people are reaching to one 
target reflected at least in Jonathan's roadmap where we want to get off-heap 
memtables with on-heap read side copy and so on _incrementally_, and you are 
just trying to do as such refactoring/changes as possible in all three of the 
bullet points, it looks that way because iss-6689-{1,2,3} all have at least 
*3K* additions and *2K* deletions, concerns about the size of those changes 
have already been raised by multiple people multiple times and yet we are still 
at the same spot with this.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13942623#comment-13942623
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Just for the reference, [a good example for the patch-set done 
right|https://github.com/iamaleksey/cassandra/compare/apache:cassandra-2.1...6506]
 all commits are to the point and built on top of each other, even thought it's 
~4K total I can review whole thing quickly because there is no noise and it's 
efficient to traceback in isolated changes.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-19 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940231#comment-13940231
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~jbellis] I am, sorry, I got mixed up in something else but I'm going to look 
at it mid this week (tomorrow or thursday).

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-19 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940983#comment-13940983
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~benedict] So what is the branch to look at for #1? 
https://github.com/belliottsmith/cassandra/tree/iss-6689-1 still has DataGroup 
inside, which should have been removed after Marcus's review, shouldn't it?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-18 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940202#comment-13940202
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Are you still planning to look at this soon, Pavel?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931656#comment-13931656
 ] 

Benedict commented on CASSANDRA-6689:
-

Pushed an updated version with your changes merged, the cassandra.yaml 
reverted, and a theoretical deadlock risk eliminated in KeysSearcher and 
CompositesSearcher

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-12 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932277#comment-13932277
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Thanks, [~benedict]! I'm going to take a look at this tomorrow (i hope).

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930239#comment-13930239
 ] 

Benedict commented on CASSANDRA-6689:
-

Uploaded a fixed version of #3 to the same repository, to fix a couple of bugs 
spotted by [~krummas]

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930396#comment-13930396
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


Reviewing under the assumption that 1-3 will go in 2.1 and the rest into 3.0, 
otherwise there is a some of stuff in #1 that should be in #3 etc, but leaving 
that aside for now.

Main point when reviewing was that I found myself trying to wrap my head around 
the Group concept several times, especially since it is not actually adding any 
functionality at this stage (I know it will when we do GC). We should probably 
remove it since it adds indirection that we don't need right now. Pushed a 
branch with the DataGroup and various Group classes in o.a.c.u.memory removed 
here: https://github.com/krummas/cassandra/commits/bes/6689-3.1 , wdyt?



 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930404#comment-13930404
 ] 

Benedict commented on CASSANDRA-6689:
-

[~krummas]:

It will make me sad, but you're absolutely right, it isn't necessary just yet.

Only thing to question is the change of default in conf/cassandra.yaml, but 
guessing this is a debugging oversight.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924780#comment-13924780
 ] 

Benedict commented on CASSANDRA-6689:
-

Note we will probably want to either include CASSANDRA-6781 at this point, or 
introduce another change to copy all records on-heap during memtable flush, 
else we may see the slower flush performance Marcus indicated.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923770#comment-13923770
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. One native Buffer each for name and value.

Probably we'll stick to a Buffer per name component here, rather than per name, 
for simplicity at this step.

This timeline translates to mostly what I suggested already, with a little 
tweaking, so I'll work with that (I have a new step 1, though):

# I will introduce the - .data refactor as my starting point, and refactors of 
.memory in 6694, all ignoring any Native implementations;
# Introduce a NativeByteBufferAllocator, and always copy onto heap if reading 
from it
# Introduce NativeCell, and a simplified NativeAllocator (i.e. without any 
magic to reclaim it)
# Introduce RefAction + RefAction driven reclaim
# Introduce GC


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923907#comment-13923907
 ] 

Benedict commented on CASSANDRA-6689:
-

Okay, so #1 is now 
[here|https://github.com/belliottsmith/cassandra/tree/iss-6689-1]

It also includes necessary patches not yet incorporated from CASSANDRA-6692, 
and redefines CellName.dataSize() to return the size on disk, not the size of 
the components, but otherwise only performs all the necessary refactoring.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924194#comment-13924194
 ] 

Benedict commented on CASSANDRA-6689:
-

#2 is now [here|https://github.com/belliottsmith/cassandra/tree/iss-6689-2]

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924471#comment-13924471
 ] 

Benedict commented on CASSANDRA-6689:
-

[#3|https://github.com/belliottsmith/cassandra/tree/iss-6689-3]

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924711#comment-13924711
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Thanks, Benedict.  Let's have Marcus and/or Pavel review those and create new 
tickets for 4/5.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922259#comment-13922259
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. Well, it seems like you never operated a real Cassandra cluster, did you? 

You seem to have interpreted my query as an attack on the veracity of your 
statement. It was not. I only wanted more specific facts that could be used to 
target a solution, and preferably a new ticket on which to discuss them.

This discussion has all the hallmarks of approaching unproductivity, so after 
this I do not think I have anything useful to add to the discussion, and will 
leave the committers to decide whether or not to include this work or to wait 
for you to produce your alternative:

# Any scheme that copies data will inherently incur larger GC pressure, as we 
then copy for memtable reads as well as disk reads. Object overhead is in fact 
_larger_ than the payload for many workloads, so even if we have arenas this 
effect is not eliminated or even appreciably ameliorated.
# Temporary reader space (and hence your approach) is *not* predictable: it is 
not proportional to the number of readers, but to the number and size of 
columns the readers read. In fact it is larger than this, as we probably have 
to copy anything we *might* want to use (given the way the code is 
encapsulated, this is what I do currently when copying on-heap - anything else 
would introduce notable complexity), not just columns that end up in the result 
set.
# We appear to be in agreement that your approach has higher costs associated 
with it. Further, copying potentially GB/s of (randomly located) data around 
destroys the CPU cache, reduces peak memory bandwidth by inducing strobes, 
consumes bandwidth directly, wastes CPU cycles waiting for the random lookups; 
all to no good purpose. We should be reducing these costs, not introducing more.
# It is simply not clear, despite your assertion of clarity, how you would 
reclaim any freed memory without separate GC (what else is GC but this 
reclamation?), however you want to call it, when it will be interspersed with 
non-freed memory, nor how you would guard the non-atomic copying (ref-counting, 
OpOrder, Lock: what?). Without this information it is not clear to me that it 
would be any simpler either.
# Your approach is currently (still poorly defined) vaporware.

Some further advantages specific to my approach:
# Pauseless operation, so improved predictability
# Absolute bound on memory utilisation, that can be rolled out to other data 
structures, further improving overall performance predictability
# Lock-freedom and low overhead, so we move closer to being able to answer 
queries directly from the messaging threads themselves, improving latency and 
throughput

An alternative approach needs, IMO, to demonstrate a clear superiority to the 
patch that is already available, especially when it will incur further work to 
produce. It is not clear to me that your solution is superior in any regard, 
nor any simpler. It also seems to be demonstrably less predictable and more 
costly, so I struggle to see how it could be considered preferable.

Also: 
bq. would that keep memtable around longer than expected

I'm not sure why you suppose this would be so. We can already happily reclaim 
any subportion of a region or memtable, so there is no reason to think this 
would be necessary, even if they resided in the same structure.

bq. there seems to be a low once off-heap feature is enabled which is no 
surprise once you look at how much complexity does it actually add.

This is certainly addressable. The off-heap feature by itself I have 
performance tested somewhat, and competes with Java GC for throughput (beating 
it as number of live objects increases), whilst being _pauseless_, so the 
complexity you refer to is no slouch and highly unlikely to be the culprit. 
There are issues with the way we manage IO for direct byte buffers, but I have 
addressed these in CASSANDRA-6781.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922830#comment-13922830
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


You probably still don't understand my point so let me clarify, I only care 
about 3 things: maintainability, consistency, performance. This is a big chunk 
of code which somebody has to maintain, which allows of inconsistent style (can 
do it with refferer = null or maybe somehow else situation, maybe put RefAction 
as null in the argument or maybe RefAction.impossible() but it really need to 
look throught it to make sure that it check for null everywhere and so on), 
and brings it's own assumptions e.g. _ in front, also adding a poor 
performance.  Before that is addressed, I'm -1 of this. vnodes were a big chunk 
of work but people were able to split into roadmap and successfully finish, so 
I don't see any reason why we can't do it here.

bq. Any scheme that copies data will inherently incur larger GC pressure, as we 
then copy for memtable reads as well as disk reads. Object overhead is in fact 
larger than the payload for many workloads, so even if we have arenas this 
effect is not eliminated or even appreciably ameliorated.

For disk reads we have to copy even for mmap, so we don't keep any references 
on deletion time and files can be safely deallocated. So why don't copy 
directly to the memory allocated by the pool?... Object overhead would stay 
inside ParNew bounds (for ( p999)) so object allocation is relatively cheap 
comparing to everything else, that's the goal of JVM as a whole.

bq. Temporary reader space (and hence your approach) is not predictable: it is 
not proportional to the number of readers, but to the number and size of 
columns the readers read. In fact it is larger than this, as we probably have 
to copy anything we might want to use (given the way the code is encapsulated, 
this is what I do currently when copying on-heap - anything else would 
introduce notable complexity), not just columns that end up in the result set.

Doesn't matter how many emphasises you put here it won't make it this argument 
stronger because, as the main idea is to have those pools of a fixed size which 
would create back-pressure to client in the situations of heavy load which is 
exactly what operators want - go gradually slower without extreme latency 
disturbance.

bq. We appear to be in agreement that your approach has higher costs associated 
with it. Further, copying potentially GB/s of (randomly located) data around 
destroys the CPU cache, reduces peak memory bandwidth by inducing strobes, 
consumes bandwidth directly, wastes CPU cycles waiting for the random lookups; 
all to no good purpose. We should be reducing these costs, not introducing more.

Let's say we live in the modern NUMA world, so we are going to do the following 
pin the group threads to CPU cores so we have fixed scope of allocation of 
different things, that why there is no significant bus pressure for copy among 
other things JVM/Cassandra does with memory (not even significant cache 
coherency traffic).

bq. It is simply not clear, despite your assertion of clarity, how you would 
reclaim any freed memory without separate GC (what else is GC but this 
reclamation?), however you want to call it, when it will be interspersed with 
non-freed memory, nor how you would guard the non-atomic copying (ref-counting, 
OpOrder, Lock: what?). Without this information it is not clear to me that it 
would be any simpler either.

The same way as jemalloc or any other allocator does it, it least that is not 
reinventing the wheel.

bq. Pauseless operation, so improved predictability

What do you mean by this, we still leave on the JVM, do we not? Also what would 
it do in the low memory situation? allocate from heap? wait? This is not 
pauseless operation.

bq. Lock-freedom and low overhead, so we move closer to being able to answer 
queries directly from the messaging threads themselves, improving latency and 
throughput

We won't be able to answer queries directly from the messaging threads for the 
number of reasons not even indirectly related to your approach, at least for 
not breaking SEDA, which also supposed to be a safe guide for over utilization.

bq. An alternative approach needs, IMO, to demonstrate a clear superiority to 
the patch that is already available, especially when it will incur further work 
to produce. It is not clear to me that your solution is superior in any regard, 
nor any simpler. It also seems to be demonstrably less predictable and more 
costly, so I struggle to see how it could be considered preferable.

Overall, I'm not questioning the idea of being able to track what goes where 
would be great, I'm questioning implementation and trade-offs comparing to 
other approaches.




 Partially Off Heap Memtables
 

  

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922861#comment-13922861
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. Before that is addressed, I'm -1 of this

These are already addressed in CASSANDRA-6694.

bq. Object overhead would stay inside ParNew bounds (for ( p999)) 

The more we rely on staying within ParNew, the more often we are going to 
exceed it; and reducing the number of ParNew runs is also a good thing. You 
said you have 300ms ParNew pauses, occuring every second? So reducing the max 
latency and total latency is surely a good thing?

bq.  as the main idea is to have those pools of a fixed size

How does this work without knowing the maximum size of a result set? We can't 
have a client block forever because we didn't provide enough room in the pools. 
Potentially we could have it error, but this seems inelegant to me, when it can 
be avoided. It also seems a suboptimal way to introduce back pressure, since it 
only affects concurrent reads / large reads. We should raise a ticket 
specifically to address back pressure, IMO, and try to come up with a good all 
round solution to the problem.

bq. Let's say we live in the modern NUMA world, so we are going to do the 
following pin the group threads to CPU cores so we have fixed scope of 
allocation of different things, that why there is no significant bus pressure 
for copy among other things JVM/Cassandra does with memory

It would be great to be more NUMA aware, but this is not about traffic over the 
interconnect, but simply with the arrays/memory banks themselves, and doesn't 
address any of the other negative consequences. You'll struggle to get more 
than a few GB/s bandwidth out of a modern CPU given that we are copying object 
trees (even shallow ones - they're still randomly distributed), and we don't 
want to waste any of that if we can avoid it

bq. What do you mean by this, we still leave on the JVM, do we not? Also what 
would it do in the low memory situation? allocate from heap? wait? This is not 
pauseless operation.

I did not mean to imply pauseless globally, but the memory reclaim operations 
introduced here are pauseless, thus reducing pauses overall, as whenever we 
would have had a pause from ParNew/FullGC to reclaim, we would not here.

bq. We won't be able to answer queries directly from the messaging threads for 
the number of reasons not even indirectly related to your approach, at least 
for not breaking SEDA, which also supposed to be a safe guide for over 
utilization.

I'm not sure why you think this would be a bad thing. It would only help for 
CL=1, but we are often benchmarked using this, so it's an important thing to be 
fast on if possible, and there are definitely a number of our users who are 
okay with CL=1 for whom faster responses would be great. Faster query answering 
should reduce over-utilisation, assuming some back-pressure built in to 
MessagingService or the co-ordinator managing its outstanding proxied requests 
to ensure it isn't overwhelmed by the responses.

bq. The same way as jemalloc or any other allocator does it, it least that is 
not reinventing the wheel.

Do you mean you would use jemalloc for every allocation? In which case there 
are further costs incurred for crossing the JNA barrier so frequently, almost 
certainly outweighing any benefit to using jemalloc. Otherwise we would need to 
maintain free-lists ourselves, or perform compacting GC. Personally I think 
compacting GC is actually much simpler.



 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922983#comment-13922983
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. These are already addressed in CASSANDRA-6694.

Is there a branch/patch to see all of the changes involved?

bq. The more we rely on staying within ParNew, the more often we are going to 
exceed it; and reducing the number of ParNew runs is also a good thing. You 
said you have 300ms ParNew pauses, occuring every second? So reducing the max 
latency and total latency is surely a good thing?

I'm not trying to imply that we should rely on the ParNew, I'm just saying that 
all of the read/write requests are short lived enough to stay inside young 
generation region, even if we slip the effect are masked to all other long term 
allocations that we do which get promoted.

bq. How does this work without knowing the maximum size of a result set? We 
can't have a client block forever because we didn't provide enough room in the 
pools. Potentially we could have it error, but this seems inelegant to me, when 
it can be avoided. It also seems a suboptimal way to introduce back pressure, 
since it only affects concurrent reads / large reads. We should raise a ticket 
specifically to address back pressure, IMO, and try to come up with a good all 
round solution to the problem.

Let the users specify directly or if not specified just take a guess based on 
total system memory, plus we can add an option to extend in the run time, of 
any problem that uses database there is a capacity planing stage and use-case 
spec or at least experimentation which would allow to size pools correctly.

bq. I did not mean to imply pauseless globally, but the memory reclaim 
operations introduced here are pauseless, thus reducing pauses overall, as 
whenever we would have had a pause from ParNew/FullGC to reclaim, we would not 
here.

Sorry but I still don't get it, do you mean lock-free/non-blocking or that it 
does no syscalls or something similar? But that doesn't matter for pauses as 
much as allocation throughput and fragmentation of Java GC.

bq. I'm not sure why you think this would be a bad thing. It would only help 
for CL=1, but we are often benchmarked using this, so it's an important thing 
to be fast on if possible, and there are definitely a number of our users who 
are okay with CL=1 for whom faster responses would be great. Faster query 
answering should reduce over-utilisation, assuming some back-pressure built in 
to MessagingService or the co-ordinator managing its outstanding proxied 
requests to ensure it isn't overwhelmed by the responses.

The fact is that we have SEDA at least as a first line of defense for 
over-utilization, so local reads a scheduled directly a different stage, we 
shouldn't be trying to do anything directly in the messaging stage, it adds 
another complications not related to this very ticket.

bq. Do you mean you would use jemalloc for every allocation? In which case 
there are further costs incurred for crossing the JNA barrier so frequently, 
almost certainly outweighing any benefit to using jemalloc. Otherwise we would 
need to maintain free-lists ourselves, or perform compacting GC. Personally I 
think compacting GC is actually much simpler.

As I mentioned, there is a jemalloc implementation in Netty project already 
which is pure Java, so we at least should consider it before trying to 
re-invent.

bq. It would be great to be more NUMA aware, but this is not about traffic over 
the interconnect, but simply with the arrays/memory banks themselves, and 
doesn't address any of the other negative consequences. You'll struggle to get 
more than a few GB/s bandwidth out of a modern CPU given that we are copying 
object trees (even shallow ones - they're still randomly distributed), and we 
don't want to waste any of that if we can avoid it

I'm still not sure how worse it would make the things, Java is the worst of 
cache locality with it's object placement anyway but we are not going to be 
copying deep trees. Let me outline the steps that I want to see to be taken to 
make incremental, which is how we usually do things for Cassandra project: 

# code an off-heap allocator or use existing one like on of the ByteBufAlloc 
implementations (evaluate new vs. existing);
# Change memtables to use allocator for step #1 and copy data to heap buffers 
when it's read from memtable so it's easy to track the life time of buffers;
# Do an extensive testing to check how horrible the copy really is for 
performance, find ways to optimize;
# If everything is bad, switch from copy to reference tracking (in all of the 
commands, native protocol etc.);
# Do an extensive testing to check if it improves the situation;
# Change serialization/deserialization to use new allocator (pooled buffer 
instead of always allocating on heap);
# Same as 

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923029#comment-13923029
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. Is there a branch/patch to see all of the changes involved?

Yes, 
[offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781] 
and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a 
separate ticket.

bq.  I'm just saying that all of the read/write requests are short lived enough 
to stay inside young generation region
Well, yes and no. We have to wait until any client finishes processing the 
data, so there's no absolute guarantee they'll not survive. But either way, 
ParNew pauses are almost as bad as full GC, only they happen much more often. 
300ms pauses are not a good thing, and if we can then reduce the size of YG so 
that when these pauses do happen they're shorter (or, say, maybe even use G1GC) 
then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that 
it does no syscalls or something similar? But that doesn't matter for pauses as 
much as allocation throughput and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed. 
There is no STW for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway 

Well, exactly. That's my point here :-)

bq.  it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very 
useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement 
copying approach; 2) if slow, implement this approach? 

The issue with this, really, though, is that if we simply allocate ByteBuffers 
off-heap like we have in this ticket, we get none of the benefit of increased 
memtable capacity, since the on-heap overheads are still huge. Since that's one 
of the main goals here, it seems problematic to lose out on it - we don't need 
to test to find out what the result would be. This ticket was supposed to only 
be a stepping stone. 

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for 
RefAction.refer(), and always performing a copy onto heap from the NativeCell 
implementations, and spending some time ripping out any of the GC or any of the 
code at Memtable discard for coping with RefAction.refer(). But honestly this 
seems like a waste of effort to me, as the majority of the code would remain, 
we'll just not have as good a final solution. But it could be done if that is 
the community preference. We could probably split it up into further commits, 
but each commit adds the potential for more errors in my book, when we have a 
good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators, and 
RefAction, but only allocateOnHeap
# introduce all .data.Native* implementations and a cut down native allocator, 
using OpOrder to guard copying like we currently do for referencing (we need to 
use something, and it is simpler than ref counting or anything else)
# introduce RefAction.refer(), GC, etc. (i.e. final patch)

I would rather not split it up as, as I say, each new patch is an opportunity 
to mess up, but it could be done. We can do performance testing to our hearts 
content at each stage, although personally I think such testing would not be 
sufficient to demonstrate no benefit to the current approach, as even with 
little benefit seen it presupposes no future performance benefit is capped by 
the slower solution. So I would push for the final patch anyway. That said, I 
would be surprised if we did not see any improvement by comparison.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923147#comment-13923147
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Doing everything together is even bigger opportunity to mess up. Let me clarify 
my point, hopefully you will hear it this time as you plead for committers to 
decide so I'm representing. From my side nothing from this ticket would go in 
until we separate all of the work into multiple committable stages, namely (but 
not necessarily exactly like that): 

# Make use off-heap memory inside of Memtables, which delivers increased 
capacity and less frequent/painful CMS (commit #1):
  a. Determine if we actually need to re-implement custom allocator or can go 
with existing one e.g. netty's implementation
  b. plug allocator from #1 into Memtable for memory usage tracking, do all 
necessary bookkeeping to release memory back to the pool once it's no longer 
needed (overwrite, flush etc.), make Memtable memory life time limited by 
copying data from Memtable on the read path.

# Optimizations after #1 which determine how big of impact the copying actually 
makes comparing to original situation on different workloads and how good is 
ref tracking comparing to that (commit #2)
   Delivers - memory reference tracking (RefAction, GC + what ever is needed)

(if needed) # Further enhancements to allocators/ref accounting (commit #3)

This way we can reuse already existing RefAction and related code that we 
already have, commit #1 doesn't require any substantial changes in the way we 
treat memtables and adds decoupled allocator framework, commit #2 plugs 
reference tracking into all previous changes, commit #3 further improves things 
if needed.


   

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923157#comment-13923157
 ] 

Benedict commented on CASSANDRA-6689:
-

I think my suggested split meets your requirement, but I'm not absolutely sure.

My 1-2 are refactors to bring abstractions closer; 3 is your step 1.b; 4+5 is 
your step 2.

The only difference is I assume we want our own allocator, since we won't be 
able to use Netty's for steps 4+5, and that we allocate Native* within the 
memtable to save substantially more space than allocating ByteBuffer would.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923186#comment-13923186
 ] 

Benedict commented on CASSANDRA-6689:
-

Whilst it would be nice to depend on an external project here, it doesn't 
resolve the problem that it's heavier-weight on heap than what we can achieve 
ourselves is. In CASSANDRA-6694 we really try to push the heap overhead to its 
minimum, and every byte counts; also, we need an extra field we can manipulate 
for maintaining the allocation list for GC, when we come to introduce it. So we 
really do need control of it, otherwise we'll only get us as far as step 1.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923188#comment-13923188
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Can you elaborate why is it heavier-weight on heap?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923201#comment-13923201
 ] 

Benedict commented on CASSANDRA-6689:
-

The smallest direct byte buffer from Netty is on the order of 64-bytes minimum, 
whereas we manage 24-bytes; plus for maintaining the allocation list we then 
need to have a separate list, rather than chaining the allocations, so at least 
another 4-bytes overhead (more, but in best case it tends to 4).

[https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java]
[https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java]


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923561#comment-13923561
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

It seems like we have a pretty broad consensus (Pavel + Sylvain + Aleksey; 
Marcus?) that we should break this up more.  How about something like this?

# Trading some performance for much bigger memtables is useful, so introduce 
off-heap memtables as an option, default to off.  No GC, just free after flush. 
 One native Buffer each for name and value.  Copy onto heap when reading.
# Add NativeCell a la 6694 to reduce at-rest on-heap overhead.  Still copying 
into an on-heap Cell for reads.  Still no GC.
# Add the simplest possible GC to avoid copying into on-heap Cells.  If we 
still have to flush for overwrite-heavy workloads that's okay.
# Full GC that doesn't need to flush unless we really have used up our memory 
in live Cells.
# Get rid of on-heap memtables.

I'm waving my hands a bit b/c I'm not sure what 3 looks like but I think/hope 
we can probably come up with something simpler than what we have today as an 
intermediate step.  And maybe 5 comes after 3.  But I think 1 and 2 make sense 
for 2.1b2 with the others for 3.0.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923642#comment-13923642
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


Sounds good, though i would have been good with shipping the whole patch but 
marking the off-heap parts as experimental and defaulting to HeapSlabPool

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920720#comment-13920720
 ] 

Benedict commented on CASSANDRA-6689:
-

bq.  sort of RCU (i'm looking at you OpOrder)

What do you mean here? If you mean read-copy-update, OpOrder is nothing like 
this.

bq. I'm not sure what is to retain here if we do that copy when we send to the 
wire

Ultimately, doing this copying before sending to the wire is something I would 
like to avoid. Using the RefAction.allocateOnHeap() on top of this copying sees 
wire transfer speeds for thrift drop by about 10% in my fairly rough-and-ready 
benchmarks, so obviously copying has a cost. Possibly this cost is due to 
unavoidably copying data you don't necessarily want to serialise, but it seems 
to be there. Ultimately if we want to get in-memory read operations to 10x 
their current performance, we can't go cutting any corners.

bq. introducing separate gc

I've stated clearly what this introduces as a benefit: overwrite workloads no 
longer cause excessive flushes

bq.  things but as we have a fixed number of threads it is going to work out 
the same way as for buffering open files in the steady system state

Your next sentence states how this is a large cause of memory consumption, so 
surely we should be using that memory if possible for other uses (returning it 
to the buffer cache, or using it internally for more caching)?

bq. Temporary memory allocated by readers is exactly what we should be managing 
at the first place because they allocate the most and it always the biggest 
concern for us

I agree we should be moving to managing this as well, however I disagree about 
how we should be managing it. In the medium term we should be bringing the 
buffer cache in process, so that we can answer some queries without handing off 
to the mutation stage (anything known to be non-blocking and fast should be 
answered immediately by the thread that processed the connection), at which 
point we will benefit from shared use of the memory pool, and concrete control 
over how much memory readers are using, and zero-copy reads from the buffer 
cache. I hope we may be able to do this for 3.0.

bq. do a simple memcpy test and see how much mb/s can you get from copying from 
one pre-allocated pool to another

Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, *it probably makes the 
situation worse*.

bq. it's not the memtable which creates the most of the noise and memory 
presure in the system (even tho it uses big chunk of heap) 

It may not be causing the young gen pressure you're seeing, but it certainly 
offers some benefit here by keeping more rows in memory so recent queries are 
more likely to be answered with zero allocation, so reducing young gen 
pressure; it is also a foundation for improving the row cache and introducing a 
shared page cache which could bring us closer to zero allocation reads.

It's also not clear to me how you would be managing the reclaim of the off-heap 
allocations without OpOrder, or do you mean to only use off-heap buffers for 
readers, or to ref-count any memory as you're reading it? Not using off-heap 
memory for the memtables would negate the main original point of this ticket: 
to support larger memtables, thus reducing write amplification. Ref-counting 
incurs overhead linear to the size of the result set, much like copying, and is 
also fiddly to get right (not convinced it's cleaner or neater), whereas 
OpOrder incurs overhead proportional to the number of times you reclaim. So if 
you're using OpOrder, all you're really talking about is a new RefAction: 
copyToAllocator() or something. So it doesn't notably reduce complexity, it 
just reduces the quality of the end result.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920721#comment-13920721
 ] 

Benedict commented on CASSANDRA-6689:
-

bq.  but the reads and internode communication (especially the latter).

Also, I'd love to see some evidence for this (particularly the latter). I'm not 
disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921110#comment-13921110
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. I've stated clearly what this introduces as a benefit: overwrite workloads 
no longer cause excessive flushes

If you do a copy before of the memtable buffer, you can clearly put it back to 
the allocator once it's overwritten or becomes otherwise useless, in the 
process of merging columns with previous row contents.

bq. Your next sentence states how this is a large cause of memory consumption, 
so surely we should be using that memory if possible for other uses (returning 
it to the buffer cache, or using it internally for more caching)?

It doesn't state that is a *large cause of memory consumption*, it states that 
it has additional cost but it the steady state it don't be allocating over the 
limit because of the properties of the system that we have, namely the fixed 
number of threads.

bq. Are you performing a full object tree copy, and doing this with a running 
system to see how it affects the performance of other system components? If 
not, it doesn't seem to be a useful comparison. Note that this will still 
create a tremendous amount of heap churn, as most of the memory used by objects 
right now is on-heap. So copying the records is almost certainly no better for 
young gen pressure than what we currently do - in fact, it probably makes the 
situation worse.

Do you mean this? Let's say we copy a Cell (or Column object), which is 1 level 
deep so just allocate additional space for the object headers and do a copy, 
most of the work would be spend by doing a copy of the data (name/value) 
anyway, so as we want to live inside of ParNew, see how many such allocations 
you will be able to do in e.g. 1 second then wipe the whole thing and do it 
again. We are doing mlockall too which should make that even faster as we are 
sure that heap is pre-faulted already.

bq. It may not be causing the young gen pressure you're seeing, but it 
certainly offers some benefit here by keeping more rows in memory so recent 
queries are more likely to be answered with zero allocation, so reducing young 
gen pressure; it is also a foundation for improving the row cache and 
introducing a shared page cache which could bring us closer to zero allocation 
reads. _And so on_

I'm not sure how this would help in the case of row cache, once reference is 
added to the row cache it means that memtable would hang in there until that 
row is purged, so if there is a long lived row (write once, read multiple 
times) in each of the regions (and we reclaim based on regions) would that keep 
memtable around longer than expected?

bq. It's also not clear to me how you would be managing the reclaim of the 
off-heap allocations without OpOrder, or do you mean to only use off-heap 
buffers for readers, or to ref-count any memory as you're reading it? Not using 
off-heap memory for the memtables would negate the main original point of this 
ticket: to support larger memtables, thus reducing write amplification. 
Ref-counting incurs overhead linear to the size of the result set, much like 
copying, and is also fiddly to get right (not convinced it's cleaner or 
neater), whereas OpOrder incurs overhead proportional to the number of times 
you reclaim. So if you're using OpOrder, all you're really talking about is a 
new RefAction: copyToAllocator() or something. So it doesn't notably reduce 
complexity, it just reduces the quality of the end result.

In terms of memory usage copy adds additional linear cost yes but at the same 
time it makes the system behavior more controllable/predictable which is what 
ops usually care about where, even on the artificial stress test, there seems 
to be a low once off-heap feature is enabled which is no surprise once you look 
at how much complexity does it actually add.

bq. Also, I'd love to see some evidence for this (particularly the latter). I'm 
not disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.

Well, it seems like you never operated a real Cassandra cluster, did you? All 
of the problems that I have listed here are well known, you can even simulate 
this with docker VMs and making internal network gradually slower, there is no 
back pressure mechanism built-in so right now Cassandra would accept a bunch or 
operations on the normal speed (if the outgoing link is physically different 
than internal) but suddenly would just stop accepting anything and fail 
internally because of GC storm caused by all of the internode buffers hanging 
around.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: 

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919225#comment-13919225
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. What I want is to see more control over how we deal with cleaner threads, 
as the main point to be able to run with smaller heaps, and thread stacks could 
consume ~300K we are risking to OOM in certain situations because we don't 
control whole thread life time, that would also help for people in environments 
guided by ulimit or similar.

Just for clarity, it's the _collector_ threads, not the _cleaner_ threads 
you're worried about, right? Because there is only ever one cleaner thread. 
Like I say, this is mostly trying to help with an anti-pattern, so I'm not sure 
it's worth addressing too aggressively in this ticket. In real terms you'll 
struggle to spin up very many threads at all, but let's put an arbitrary limit 
of 100 threads on the pool for now. I don't think this is worth offering 
configuration options to the user here.

bq. Please do think about it, we need to try to reduce the complexity of this 
to minimize bus factor

As I said, I (still) can't see a good way to use this information to simplify 
matters usefully. If we flattened Pool, AllocatorGroup and Allocator all into 
the one object, and we had a per-CF memory limit, we would be able to simplify 
all of the SubPool and SubAllocator memory management considerably, but then we 
have the issue that there's no global management of memory resources, nor 
shared re-use. I don't think this would simplify the actual recycling of memory 
regions that is introduced in this ticket at all though, but it's possible I'm 
missing something.

Suggestions always welcome :-)

bq. Exactly, maybe it's time to re-evaluate trade-off between copying data to 
per-allocated regions with limited ttl vs. tracking data all around the place.

This is the opposite suggestion, right? :-)   It is definitely something worth 
exploring in future.

bq. Have you had a chance to do more on vs. off heap performance testing?

I'm not sure Marcus was performance testing. Personally I'm most interested in 
correctness right now: performance should follow (in particular, slow flushing 
is certainly a bug which can be fixed). Certainly for overwrite workloads 
I've demonstrated the reduced flushing on my box, but further testing of the 
impact is likely to be a long drawn out process, and probably beyond scope of 
this ticket IMO.

Performance testing scope might include:
- Establish how much heap limit reduction this permits
- Establish a severe GC pause inducing workload to see if this mitigates
- Establish if G1GC is made viable with this, and if not, what remaining 
impacts there are to adoption that might be lifted
- Establish how much improvement to overwrite workloads (or mixed workloads, 
where reduced overwrite workload memory consumption = reduced non-overwrite 
workload flushing)
- Establish new reasonable upper memory limit for heap and memtable (both 
independently), for different workloads. e.g. large data payload workloads may 
now practicably support much larger memtables and dramatically reduced flushing
- Establish impact on upper *data* limit for different workloads, with same 
memory limit as olde, and the impact on size-of-flush and concomitant reduction 
in write amplification


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919511#comment-13919511
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


bq. As I said, I (still) can't see a good way to use this information to 
simplify matters usefully.
+1, my initial review-comment was probably badly phrased, the patch adds a 
complicated feature and it will take effort to learn these things for anyone 
working with the Cassandra code base, but I think it will be worth it in the end

bq. I'm not sure Marcus was performance testing.
no, not at all, just ran stress a couple of times to get ballpark numbers

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920328#comment-13920328
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


So the only question that I still have - what is the main goal of the ticket? 
Is it to move to zero copy for memtables or minimize number of allocations on 
java heap? If it's the latter then I don't see why we can't have separate 
allocator for memtable and use a separate allocator for operations which would 
do a copy from memtable memory, that way we don't have to track as such as we 
do right now and allocator interface is going to be better encapsulated 
(basically expose alloc/release), I'm not sure if we need sub pools or groups 
of allocators in there, I really like how jemalloc and others do it also Netty 
already has implementation of jemalloc (ByteBufAlloc) so maybe we can reuse 
that?...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920357#comment-13920357
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. So the only question that I still have - what is the main goal of the 
ticket? Is it to move to zero copy for memtables or minimize number of 
allocations on java heap?

Zero copy would certainly be nice to retain. Otherwise why not just ref-count 
the memory?

bq.  allocator interface is going to be better encapsulated (basically expose 
alloc/release)

I don't follow how this would improve encapsulation, sorry. Could you sketch an 
example? It sounds to me like this actually would create more complexity 
outside of the main body of code. It might reduce complexity a little where the 
fun stuff happens, but is it all really so difficult to understand that we want 
to start opting for less good solutions that require more work? Also makes it 
difficult to impose tight caps on memory used, as we can't practicably 
impose/manage temporary memory allocated by readers. We already have this 
problem for disk reads, but no point opening up further doors for OOM.

bq.  I'm not sure if we need sub pools or groups of allocators in there, I 
really like how jemalloc and others do it also Netty already has implementation 
of jemalloc (ByteBufAlloc) so maybe we can reuse that?...

I'm unclear how this addresses the concerns of resource management. The 
ByteBufAllocator appears to simply allocate byte buffers, which by itself is a 
pretty simple problem. jemalloc is also just a low level replacement to malloc, 
so I'm not sure where it enters the discussion?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920405#comment-13920405
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. Zero copy would certainly be nice to retain. Otherwise why not just 
ref-count the memory?

I'm not sure what is to retain here if we do that copy when we send to the 
wire. Wouldn't we better of with ref count and copy but having separate 
allocator for temporary things (e.g. write/read buffers) and for more long term 
(e.g. memtable, compression buffers)?

So can you please answer the question - what is the biggest advantage of 
tracking everything, introducing separate gc, sort of RCU (i'm looking at you 
OpOrder) vs. mem copy + different allocators for different things if there is a 
fixed size pre-allocated buffer pool off heap (we can also do COW for some of 
the chunks)? Copy is *not* be a big problem (regarding throughput/latency) 
comparing to other operations (do a simple memcpy test and see how much mb/s 
can you get from copying from one pre-allocated pool to another), there is a 
trade-off of course as we are going to spend more memory to store temporary 
things but as we have a fixed number of threads it is going to work out the 
same way as for buffering open files in the steady system state... Temporary 
memory allocated by readers is *exactly* what we should be managing at the 
first place because they allocate the most and it always the biggest concern 
for us (ParNew pauses are reaching 300 ms in 1 sec. intervals), that's why I'm 
saying let's use low level allocator for things like reads with it's own 
arena and chunk sizes, as we originally wanted and have a separate low level 
allocator for the memtable, preallocate at the startup, return buffers to the 
pool when we serialize for the wire and go from there with global thresholds 
etc.

After all that said, I would suggest that we do a separate low level 
allocator in this ticket (the same way as jemalloc) and plug it into the 
memtable and do a copy on read path, once that step is done we can plug-in 
allocator for all of the serialization/deserialization logic that we have when 
it's going from/to the wire and the last step would be to use allocator for all 
of the sstable read operations.

All this might seem irrelevant for some people but I can say that after 
fire-fighting Cassandra for some time for different use cases, it's not the 
memtable which creates the most of the noise and memory presure in the system 
(even tho it uses big chunk of heap) but the reads and internode communication 
(especially the latter).

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918701#comment-13918701
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


I read through most of the code in offheap1b skipping queue implementations add 
there and I have couple of questions:

I see this pattern in the code (e.g. {RangeSlice, Read}VerbHandler.java) where 
Referrer is allocated try { operation(); referrer = null } finally { 
referrer.setDone() if (referrer != null); } where documentation says that that 
setDone() should be called when we are done with the referrer, can you 
elaborate why do we need to explicitly set it to null?

Pool.java - can we schedule cleaners to a static thread pool (or something 
similar) instead of creating a thread per cleaner? With would add more control 
in the situation when we flush memtables in short intervals. Related question, 
why can't we have per-cf (or global) pool instead of creating it per memtable, 
might be possible to remove allocator groups if we do that?

Also I just want to throw an idea here maybe there is something useful in it - 
as we have fixed number of stages and threads in them and don't normally share 
much between them, maybe it would be possible to use that as an advantage for 
allocator gc? What I mean is, it might be possible to track life-time of the 
allocation not by how buffer was allocated but where it was allocated, more 
like what RCU...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918726#comment-13918726
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. can you elaborate why do we need to explicitly set it to null?

I do this in places where I'm handing off control of the referrer to another 
thread/context, so we only call setDone() in the case of an error before we did 
this. Admittedly, looking at these places now, it looks like replacing the 
finally block with a catch (Throwable t) block would be fine, and given we can 
do untyped rethrow there isn't anything lost by doing this. Probably an old 
habit from Java 6 to do it the way I did.

bq. With would add more control in the situation when we flush memtables in 
short intervals.

If I understand you correctly, you mean to deal with the situation where we 
want to flush memtables quick enough that the single cleaner thread may become 
the bottleneck? In which case, I don't think this should ever be a problem: the 
work done by the cleaner itself is very minimal unless it's got some GC to do, 
in which case it is potentially non-trivial, but still unlikely to be a 
bottleneck. There might be a case for handing off the GC candidate selection to 
another thread, except that we expect it to certainly be quicker than any 
flush, and we want to wait for it to complete before deciding if flushing is 
actually worthwhile, so I'm not sure it buys us much. Maybe we could do this 
and continue to check the memory state isn't worsening, and if it is just force 
a flush without waiting for the GC, but it probably is too unlikely to be of 
benefit to be worth the extra complexity. That is, unless you meant some other 
benefit I missed?

bq. Related question, why can't we have per-cf (or global) pool instead of 
creating it per memtable, might be possible to remove allocator groups if we do 
that?

The pool _is_ global; a per-cf pool would have the issue of poor sharing of 
resources, though, no? I would probably like to see us move to allocators 
surviving across memtable replacement, so that we could then flatten allocator 
group with allocator, but I'd prefer to save that for a later date.

bq. Also I just want to throw an idea here maybe there is something useful in 
it - as we have fixed number of stages and threads in them and don't normally 
share much between them, maybe it would be possible to use that as an advantage 
for allocator gc? What I mean is, it might be possible to track life-time of 
the allocation not by how buffer was allocated but where it was allocated, more 
like what RCU...

Not really sure what you're suggesting here, could you elaborate?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918785#comment-13918785
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. If I understand you correctly, you mean to deal with the situation where we 
want to flush memtables quick enough that the single cleaner thread may become 
the bottleneck? In which case, I don't think this should ever be a problem: the 
work done by the cleaner itself is very minimal unless it's got some GC to do, 
in which case it is potentially non-trivial, but still unlikely to be a 
bottleneck. There might be a case for handing off the GC candidate selection to 
another thread, except that we expect it to certainly be quicker than any 
flush, and we want to wait for it to complete before deciding if flushing is 
actually worthwhile, so I'm not sure it buys us much. Maybe we could do this 
and continue to check the memory state isn't worsening, and if it is just force 
a flush without waiting for the GC, but it probably is too unlikely to be of 
benefit to be worth the extra complexity. That is, unless you meant some other 
benefit I missed?

I can image the situation when we frequently switch memtables which, in current 
code, starts a new thread, so I wonder how quickly we would be able to see a 
slowdown affect caused by that + potentially OOM on stack space...

bq. The pool is global; a per-cf pool would have the issue of poor sharing of 
resources, though, no? I would probably like to see us move to allocators 
surviving across memtable replacement, so that we could then flatten allocator 
group with allocator, but I'd prefer to save that for a later date.

I'm thinking maybe it would lessen internal complexity if we do (sort-of) arena 
allocation inside instead of having groups and sub pools?

bq. Not really sure what you're suggesting here, could you elaborate?

It's just an idea so people can just ignore it but what I meant there is maybe 
it would be a better first step for us to consider making all of the operations 
that we do more self-contained so we can track exactly where it started and 
ends (by operation I mean all types of reads, writes), which could potentially 
make allocator job easier as we would be able to container  per operation 
memory...

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918822#comment-13918822
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. I can image the situation when we frequently switch memtables which, in 
current code, starts a new thread, so I wonder how quickly we would be able to 
see a slowdown affect caused by that + potentially OOM on stack space...

It only starts a new thread if there is GC work to be done, room to do it in, 
and the static pool doesn't have any threads that haven't timed out. The 
cleaner thread itself is started once and remains forever. If you mean you're 
worried about flushing so many memtables that all have GC work possible on them 
and that this would spam the collector thread pool, I guess I agree it's a 
theoretical possibility, but it would have to be pretty extreme. We can spin a 
thousand threads and not break a sweat, but even with tens of thousands of 
memtables we'll hit our memory limit and stop being able to do any more GC 
before we cause major problems. That said, we could safely impose a cap on the 
pool if we wanted. Set it largeish so it isn't a bottleneck, but prevent this 
kind of problem.

bq. I'm thinking maybe it would lessen internal complexity if we do (sort-of) 
arena allocation inside instead of having groups and sub pools?

Probably not appreciably. Most of complexity is likely to remain, as it's 
mostly about dealing with the interactions of writers with readers, and those 
will be mostly the same regardless, since readers can touch anything at any 
time. The allocator group does currently allow for some slight _optimisations_, 
but I don't immediately see a way that merging with pool would reduce 
_complexity_. I will give it some thought though. On this vein, I would like to 
see thread local allocation buffers, but again this is an optimisation rather 
than a complexity reducer.

bq. It's just an idea so people can just ignore it but what I meant there is 
maybe it would be a better first step for us to consider making all of the 
operations that we do more self-contained so we can track exactly where it 
started and ends (by operation I mean all types of reads, writes), which could 
potentially make allocator job easier as we would be able to container per 
operation memory...

We're currently dealing with allocations that necessarily outlive the duration 
of the operation, so I'm not sure how well that would apply here, but possibly 
you have some specific examples? It may be that we decide to use this code for 
creating the mutations from thrift/native/MS etc., though, in which case it 
would be very helpful.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918825#comment-13918825
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. Almost forgot, I have some minor changes to the code: removed couple of 
used methods, added couple of RefAction.impossible() instead of null, changed 
Memtable code so it avoids double cast, and very minor typo.

Thanks. I'll incorporate these into CASSANDRA-6694 instead if that's okay with 
you?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918866#comment-13918866
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. It only starts a new thread if there is GC work to be done, room to do it 
in, and the static pool doesn't have any threads that haven't timed out. The 
cleaner thread itself is started once and remains forever. If you mean you're 
worried about flushing so many memtables that all have GC work possible on them 
and that this would spam the collector thread pool, I guess I agree it's a 
theoretical possibility, but it would have to be pretty extreme. We can spin a 
thousand threads and not break a sweat, but even with tens of thousands of 
memtables we'll hit our memory limit and stop being able to do any more GC 
before we cause major problems. That said, we could safely impose a cap on the 
pool if we wanted. Set it largeish so it isn't a bottleneck, but prevent this 
kind of problem.

What I want is to see more control over how we deal with cleaner threads, as 
the main point to be able to run with smaller heaps, and thread stacks could 
consume ~300K we are risking to OOM in certain situations because we don't 
control whole thread life time, that would also help for people in environments 
guided by ulimit or similar.

bq. I will give it some thought though. On this vein, I would like to see 
thread local allocation buffers, but again this is an optimisation rather than 
a complexity reducer.

Please do think about it, we need to try to reduce the complexity of this to 
minimize bus factor :)

bq. We're currently dealing with allocations that necessarily outlive the 
duration of the operation, so I'm not sure how well that would apply here, but 
possibly you have some specific examples? It may be that we decide to use this 
code for creating the mutations from thrift/native/MS etc., though, in which 
case it would be very helpful.

Exactly, maybe it's time to re-evaluate trade-off between copying data to 
per-allocated regions with limited ttl vs. tracking data all around the place.

bq. Thanks. I'll incorporate these into CASSANDRA-6694 instead if that's okay 
with you?

Sure.

[~krummas] Have you had a chance to do more on vs. off heap performance testing?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915840#comment-13915840
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

2 remarks that I remarked while skimming over the patch (those are more 
codestyle than anything but well):
* there's tons of method whose name starts with an underscore. I'd really 
rather avoid that since it's inconsistent with the code base (and given the 
time it took us a few years back to remove all underscores to get the code 
consistent, I'd rather not reintroduce some now).
* why changing all the ByteBuffer.hasArray() calls to !ByteBuffer.isDirect()? 
In almost all cases, hasArray() is called to guard some latter calls to array() 
and/or arrayOffset() and the [ByteBuffer 
javadoc|http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html#hasArray()]
 is pretty explicit on the fact that hasArray() is *the* correct method to call 
in that case. 

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915948#comment-13915948
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. there's tons of method whose name starts with an underscore. I'd really 
rather avoid that since it's inconsistent with the code base (and given the 
time it took us a few years back to remove all underscores to get the code 
consistent, I'd rather not reintroduce some now).

These are almost all within one class: AbstractMemory, also a couple in 
NativeAllocation. The reason for this is that these are the base classes for 
building off-heap objects, so we need methods for the implementation of the 
interfaces these objects will represent, but these methods absolutely should 
not escape or interfere with the abstractions we are implementing. As an 
example, we need a size() method in CellName, and an internal _size() method in 
NativeAllocation. They mean two completely different things, and one of them is 
only interesting to the internals of any native implementation. I want to avoid 
any such clash, or any overhead in establishing that there is no such clash, 
and so underscores seem the best solution. We also need to provide lots of 
internal methods for getting/putting data/fields from the off-heap region the 
object is made up of, and I don't want any confusion about what methods mess 
with this, and which methods are actual functionality. 

There are a lot of these methods, and putting internal as a prefix or 
something else doesn't seem any better to me.

bq. why changing all the ByteBuffer.hasArray() calls to !ByteBuffer.isDirect()? 
In almost all cases, hasArray() is called to guard some latter calls to array() 
and/or arrayOffset() and the ByteBuffer javadoc is pretty explicit on the fact 
that hasArray() is the correct method to call in that case.

This is a valid criticism, and in fact I should roll back this change for this 
patch. It was necessary for CASSANDRA-6689 as we were abusing ByteBuffer, and 
breaking this contract. This is no longer the case, now that we use 
NativeAllocation instead.



 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916011#comment-13916011
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

bq. There are a lot of these methods, and putting internal as a prefix or 
something else doesn't seem any better to me.

Well, it's more consistent with the code style, and consistency is better. It's 
imo also more explicit since you had to use the word internal to describe 
what the underscore were standing for in your comment. And if size() and 
_size() mean two completely different things, I think having more than one 
character separating them is not a bad thing. But honestly the latter points 
are almost irrelevant, code base style consistency is (to me at least).

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916020#comment-13916020
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. code base style consistency is (to me at least).

Well, readability of the functional methods is important to me, so having 
dozens of internalGetLongVolatile or internalSize calls around the place seems 
an unnecessary price for such an isolated contradiction to the code style, but 
that's just my take. It seems pretty clear that all of the _ methods are 
internal, to me.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916028#comment-13916028
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

I give up.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916063#comment-13916063
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

bq. it's more consistent with the code style, and consistency is better. It's 
imo also more explicit since you had to use the word internal to describe 
what the underscore were standing for in your comment

+1

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916071#comment-13916071
 ] 

Benedict commented on CASSANDRA-6689:
-

I should note these underscore concerns all apply to CASSANDRA-6694, so I will 
apply the change there, once we have any other concerns here cleared up. As 
merging these changes with those in CASSANDRA-6694 has been a pain already, and 
I don't want to make that worse.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916545#comment-13916545
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


So [~benedict] what's the latest branch, is it offheap1b? On the side note, +1 
with [~slebresne], let's incapsulate and name everything properly, I worked on 
renaming here already and don't want to history to repeat itself.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916720#comment-13916720
 ] 

Benedict commented on CASSANDRA-6689:
-

[~xedin] the latest latest is offheap2c, which I have just uploaded, and is for 
ticket 6694, building on this ticket. I think it's the better one to review, 
personally, but if you want to start with this ticket first offheap1b is the 
latest, yes.

To possibly aid the reviewers to understand the decisions taken here, I will 
outline briefly the main areas of code that have been changed, and why the 
approach was selected. I will leave details about _how_ it works to be 
addressed by the documentation in the codebase, and won't cover if or why we 
want off-heap memtables, only the reasons for implementing them in this way.

First, though, I'd like to potentially put some words in Marcus' mouth, and 
suggest that when he says it is complicated/involved he's mostly talking 
about the complexity in reasoning about thread ordering/behaviour, as we do not 
use locks anywhere, and we endeavour to communicate between threads as little 
as possible, and we use the JMM to its fullest (and in this ticket only, abuse 
ByteBuffer badly, and depend sometimes on x86 ordering guarantees rather than 
the JMM). As such it is definitely a complex patch, but actually almost all of 
the complexity is in OffHeapCleaner (NativeCleaner in 6694) and 
OffHeapAllocator (NativeAllocator), with little extra snippets in the 
concurrency utilities, in Referrer/s, and also in the work introduced in 
CASSANDRA-5549 which is probably new to everyone reviewing. This is not 
complexity in the normal sense of code sprawl (although there are lots of 
places touched by this, mostly these are simple modifications), but in that it 
requires sitting and thinking hard about the model and the object lifecycles as 
documented. This is why I have spent a great deal of time specifying the 
assumptions and safety concerns at any of the danger points. This kind of 
complexity is difficult to justify in a simple paragraph or three, and either 
stands or falls by the code itself and its design decisions; I don't think I 
can summarise it here any better than the code comments, and I have no 
alternative implementation to compare and contrast with.

In this patch (ignoring CASSANDRA-6694 for now) there are three main areas of 
changes:
1) New concurrency utilities
2) OffHeapAllocator et al
3) RefAction changes across the codebase

As to 1, these simply help to make the implementation easier and safer. NBQ 
adds a number of useful behaviours, such as CAS-like modification, safe 
iterator removal, multiple views on the same queue. These facilities improve 
clarity and obviousness in a number of places. WaitQueue and OpOrder are also 
improved from those introduced in CASSANDRA-5549. OpOrder prevents garbage 
accumulation from old Group objects being kept floating around in Referrer 
instances (and makes it faster); WaitQueue reduces the cost of wake-ups by 
eliminating a race through use of NBQ. All of the concurrency utilities could 
be split off into a separate patch if we wanted.

As to 2 and 3, as a quick starting point which I will refer back to, I will 
paste a quick bit about the memory lifecycle from the code:

{code}
 * 1) Initially memory is managed like malloc/free: it is considered referenced 
at least until the allocation is
 *matched by a corresponding free()
 * 2) The memory is then protected by the associated OpOrder(s); an allocation 
is only available for collection
 *once all operations started prior to the free() have completed. These are 
effectively short lived transactions that
 *should always complete in a timely fashion (never block or wait 
indefinitely)
 * 3) During this time any operation protected by the read/write OpOrder may 
optionally 'ref' {@link Referrer} an object,
 *or objects, that each reference (in the normal java sense) some 
allocations; once the read operation finishes these
 *will have been registered with one or more GC roots {@link Referrers} - 
one per participating allocator group (CFS).
 * 4) When a GC (or allocator discard) occurs, any extant {@link Referrer} that 
were created during an operation that 
 *began prior to the collect phase are walked, and any regions that are 
reachable from any of these 'refs' are
 *switched to a refcount phase. When each ref completes it decrements the 
count of any such regions it reached.
 *Once this count hits 0, the region is finally eligible for reuse.
{code}

Now, as Jonathan mentioned, in CASSANDRA-5549, we introduced the OpOrder 
concurrency primitive. This provides a mechanism for cleaning up on-heap 
memtables because we can guarantee that no new writes will touch them. They can 
also guarantee no new reads will touch them. The problem is that a read's 
lifetime lasts beyond 

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-27 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13914452#comment-13914452
 ] 

Benedict commented on CASSANDRA-6689:
-

Okay, pushed a patch 
[here|https://github.com/belliottsmith/cassandra/tree/offheap1b] which includes 
all of your branch, is up-to-date with 2.1, and also:

bq. In NBQV, drainTo does not return the correct number of inserted items.

I've fixed this, and also merged NQB.drainTo() to some extent with 
NBQV.drainTo, to optimise it a little as well, and added some long tests to 
NonBlockingQueueTest to cover it

bq. Make OpOrder.Group and RefAction implement AutoClosable to be able to do 
try-with-resources?

I've introduced this change, and I've updated all of the try/finally blocks to 
use it. Much neater.


Things I haven't touched:

bq. AtomicReferenceArrayUpdater.shift(), really?

I'm really not sure what we can improve with this?

bq. I flush ~40MB/s with HeapSlabPool but only ~15MB/s with OffHeapPool.

I think we should split this into another ticket. I suspect we might be able to 
improve this by introducing an extension of DataOutput which can accept a 
ByteBuffer.

bq. GC metrics! Time spent, amount of data reallocated etc.

Also think this deserves its own ticket.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-26 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913657#comment-13913657
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


I agree with [~krummas] regarding having more granular to-the-point patches, 
plus it would be nice to merge this with 6694 so we can review/test all 
together especially considering the amount of changes. I will try to get some 
time to review this in upcoming days and that change would help us all a lot.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-26 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13913686#comment-13913686
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. more granular to-the-point patches, plus it would be nice to merge this 
with 6694 so we can review/test all together

I don't want to seem obtuse here, I want to help as much as I can, but it 
really is one or the other. Even this patch would be difficult to break up, but 
6694 makes it all even more intertwined. Note that excluding the minor 
suggestions made by [~krummas], 6694 is built on top of this, so contains this 
patch. I will be posting a patch shortly that addresses his concerns, but 
they're pretty small changes, so shouldn't affect the review process. I'll 
merge any changes with 6694 when I post.

If we can think of a good split to perform I'm all for it, but literally the 
only thing I can think of is to disable GC in the first commit. But if we 
commit 6694 excluding GC, it's basically a waste of time to separate it, as 
it's a small fraction of the changes, and almost all of the complicated stuff 
needs to be done for disposal after flush without GC.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-25 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911646#comment-13911646
 ] 

Benedict commented on CASSANDRA-6689:
-

As far as performance testing is concerned, I would say we should test 
CASSANDRA-6694, as this should be faster in general (excluding its caveats), 
and also make sure we're running with cassandra.paranoidgc=off (as this has 
negative performance implications). Also, there are some issues with 
performance with places in the codebase we've assumed heap based bytebuffers, 
and have slow routes otherwise. I didn't see such a reduction in performance in 
my tests, though, so I will have to investigate that a bit more thoroughly. 
There is an issue with SequentialWriter that it expects a byte[], so with DBB 
we are writing one byte at a time. I optimised this single-byte path, though, 
and found my performance problems disappeared then.

Some quick off the cuff answers to some of your points:

bq. Should we add QueryProcessor.process(without RefAction) methods to avoid 
all the RefAction.allocateOnHeap() methods? Or, perhaps it is good to always 
force people to think about where they want the data.

I think the latter. It's not too onerous, and it's better to be forced to think 
about it IMO.

bq. a rebase on 2.1 would be nice, does not work running in a cluster now.

CASSANDRA-6694 is fully up-to-date, but I guess if you'd prefer to split it up 
further, skipping ahead to include more changes might be off the cards :-)   
I'll see about merging it with your changes and 2.1, and then remerging 6694 as 
well.

bq. Also, just throwing the idea out here, maybe we should make this a separate 
library?
bq. Make OpOrder.Group and RefAction implement AutoClosable to be able to do 
try-with-resources?

Good ideas.

bq. Would be nice if we didn't have to copy data to heap for thrift queries, 
but i guess that is out of scope for now.

Yes, and thrift users can continue with heap allocation if they're worried 
about this overhead. But since it's transient they might want to benefit from 
the reduced heap size even if it means more frequent ygen collections.

I'll have a look over your changes more closely and respond to the rest later.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-25 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13911625#comment-13911625
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


Overall this is a very complicated/involved change that we really need to 
test/benchmark (I can't make it go faster than HeapSlabPool)/figure out if we 
want it in. That being said, it looks very solid.

The biggest risks are, I guess (and where I spent most of my time when 
reviewing), that we start corrupting/freeing used/leaking offheap memory, but 
as far as I can tell, the patch avoids that. We should do long-running tests to 
verify it as well.

It would probably have been nice to split this patch up in smaller pieces, say 
doing the GC in a separate ticket and only discarding entire regions on flush 
here, but I understand it would be difficult to separate the things.

A commit on top of this pushed to 
https://github.com/krummas/cassandra/commits/marcuse/6689-3 with comment 
fixups/small changes.

* GC metrics! Time spent, amount of data reallocated etc.
* In NBQV, drainTo does not return the correct number of inserted items.
* Class structure in WaitQueue a bit confusing (mixing static/non-static inner 
classes, signalledUpdater in WaitQueue while updating field in 
RegisteredSignal. Small refactor in my branch, feel free to ignore if you 
prefer your way.
* Should we add QueryProcessor.process(without RefAction) methods to avoid 
all the RefAction.allocateOnHeap() methods? Or, perhaps it is good to always 
force people to think about where they want the data.
* Config file wrong, repeated memtable_cleanup_threshold
* Passing null as RefAction to Rows constructor (ListPermissionsStatement and 
Rows.subcodec
* Make OpOrder.Group and RefAction implement AutoClosable to be able to do 
try-with-resources?
* AtomicReferenceArrayUpdater.shift(), really? :)
* CommitState unused
* Would be nice if we didn't have to copy data to heap for thrift queries, but 
i guess that is out of scope for now.

-- silly single-node perf testing (on my laptop): --
* Seems we get about 10% less write op rate throughput with OffHeapPool.
* I flush ~40MB/s with HeapSlabPool but only ~15MB/s with OffHeapPool.
* Running stress leads to client TimedOutException during flushing (I guess due 
to the slower flushing)

Oh, and a rebase on 2.1 would be nice, does not work running in a cluster now.

Also, just throwing the idea out here, maybe we should make this a separate 
library? It would make fixing bugs etc in it much more painful, but with a bit 
of thinking it could be useful for external users. Probably not possible for a 
2.1 timeframe, but maybe in the future.

I'll spend some more time on this, running actual tests etc.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1 beta2


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13903185#comment-13903185
 ] 

Benedict commented on CASSANDRA-6689:
-

Pushed a slightly revised version 
[here|https://github.com/belliottsmith/cassandra/tree/offheap1], as didn't 
override a couple of methods in the Cell heirarchy. It's also merged with 
latest trunk.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-12 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899081#comment-13899081
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


[~jasobrown] [~xedin] I wouldn't mind more sets of eyes on this patch, so, if 
you have time, please take a look!

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899096#comment-13899096
 ] 

Benedict commented on CASSANDRA-6689:
-

There are some natural boundaries if you want to share the burden. Everything 
inside of utils.concurrent is pretty isolated from everything outside, so could 
easily be vetted independently.

Also, the utilisation of Referrer/RefAction is probably going to be a 
painstaking thing to vet (that's what makes the majority of small touches 
outside of the main changes), and quite independent of their declarations. We 
just need to be certain we always use the correct type of RefAction, and never 
let one disappear somewhere - OutboundTCPConnection and native transport 
writing are the two danger areas here (also Memtable flushing needs a bit of 
care, but is definitely less scary).

The most difficult thing to review is going to be the main body of work inside 
of utils.memory, however. This is pretty hardcore lock-free stuff, and the 
thing we're looking for is _unintended_ race conditions (there are lots of 
intended races) - in particular pay attention to the way in which we now 
asynchronously manage the subpool and suballocator (ledgers of how much 
we've allocated / claimed / are reclaiming), and obviously most importantly 
that we never accidentally overwrite data that is being read elsewhere. This 
should all hopefully be very clearly documented both at the level of 
abstraction and the individual points where interesting / dangerous things 
happen. But try to figure it out for yourself as well, in case I and my tests 
missed something. I will be doing further tests in the near future, but I much 
prefer to catch things by eye if possible.

Always feel free to throw up a this bit isn't well explained flag and I'll 
try to improve it. I want this stuff to be as clearly self documenting as 
possible.



 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-11 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898564#comment-13898564
 ] 

Benedict commented on CASSANDRA-6689:
-

I've uploaded a patch 
[here|https://github.com/belliottsmith/cassandra/tree/offheap.merge]

In the simplest terms this boils down to a new off heap allocator. However it 
is much more involved than that, as we need to now manage the life cycle of the 
memory in question.

I won't go into excessive detail here, because I have tried to do so as much as 
possible in the code itself. For full details see the comments in 
OffHeapCleaner, RefAction/Referrer, OffHeapAllocator, OffHeapRegion.

The basic thrust of the approach is that we guard accesses to the memory 
through the OpOrder synchronisation primitive introduced with the 
CASSANDRA-5549. However to prevent stalls and make management of the memory 
easier, we have some further stages to the lifecycle which are much more 
GC-like. A corollary is that we also effectively perform GC of the data in the 
memtable, so that space that is no longer needed as a result of overwritten 
records can be made available again without first flushing the memtable.


 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-11 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898665#comment-13898665
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Thanks, [~jbellis], I'm already in the watcher list :)

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-11 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898661#comment-13898661
 ] 

Jason Brown commented on CASSANDRA-6689:


I can jump in if [~krummas] is busy.

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
 Fix For: 2.1


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)