[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944213#comment-13944213
]

Pavel Yaskevich commented on CASSANDRA-6689:

bq. This is likely to require as much complexity as RefAction to ensure
resource cleanup, at which point we might as well just have proper zero copy.

I was thinking of something as simple as replacing HeapAllocator.instance with
SlabAllocator.instance (which would have only heap allocation enabled).

bq. As to your suggested changes, I am fine with most of the generics changes,
but -1 on several of your renames and your config note changes:
unslabbed_heap_buffers is not intended to be included in the config file, as it
is very much discouraged. Including slab in the name adds nothing, whereas
buffers permits other non-buffer implementations in future without confusion in
naming/renaming. I might change offheap_buffers to direct_buffers for
consistency with the JDK though.

If you have unslabbed_heap_buffers in your code then you had better document it
in the config, because people would find it and create an issue for that. Other
than that, I don't care much about the names as long as they reflect what they
do: {on}heap_slab means that those buffers are going to be allocated in
on-heap slabs, the same goes for offheap_slab (I was thinking of using
direct_slab, but all of the code has it as offheap), and jvm_default is just
the default way of allocating single buffers.
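
To make the naming proposal concrete, here is a minimal sketch of how such an option could be modelled. Everything in it is an assumption for illustration: the enum name AllocationType, the fromConfig helper and the two predicates are hypothetical, the constant names are only the ones proposed in the comment above (not necessarily what was committed), and the heap_slab default follows the suggestion made later in this thread.

```java
import java.util.Locale;

// Hypothetical enum for illustration; constant names follow the proposal above.
enum AllocationType
{
    jvm_default,  // one buffer per allocation, left to the JVM (the discouraged mode)
    heap_slab,    // slices carved out of larger on-heap slabs
    offheap_slab; // slices carved out of larger off-heap (direct) slabs

    // Parse a config value; an unset or blank value falls back to heap_slab.
    static AllocationType fromConfig(String value)
    {
        if (value == null || value.trim().isEmpty())
            return heap_slab;
        return valueOf(value.trim().toLowerCase(Locale.ROOT));
    }

    boolean isSlabbed() { return this != jvm_default; }

    boolean isOnHeap() { return this != offheap_slab; }
}
```

With names like these the behaviour can be read directly off the config value, which is the property argued for above.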

bq. heapAllocOnly is uglier and no better than allocateOnHeap. If you want,
make it allocateOnHeapOnly.

I'm fine with allocateOnHeapOnly if you like long names better; it's just that
allocateOnHeap doesn't match how the variable is used.

Partially Off Heap Memtables

Attachments: CASSANDRA-6689-final-changes.patch,
CASSANDRA-6689-small-changes.patch

Move the contents of ByteBuffers off-heap for records written to a memtable.
(See comments for details)

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944214#comment-13944214
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. because people would find it and create an issue for that

My understanding (not my policy) is that this is explicitly not a default 
supported option (in the current state of affairs) because of the expectation 
of increased heap fragmentation and GC death spiral, which is why I'm happy 
with the ugly name for that one option.

bq. I was thinking of something as simple as replacing HeapAllocator.instance with 
SlabAllocator.instance (which would have only heap allocation enabled).

In this case what is to be gained? Seems likely it would only increase the need 
for old gen collections, as we wouldn't be able to do any reuse?

 Partially Off Heap Memtables
 

 Key: CASSANDRA-6689
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Benedict
Assignee: Benedict
  Labels: performance
 Fix For: 2.1 beta2

 Attachments: CASSANDRA-6689-final-changes.patch, 
 CASSANDRA-6689-small-changes.patch


 Move the contents of ByteBuffers off-heap for records written to a memtable.
 (See comments for details)




[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944217#comment-13944217
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. My understanding (not my policy) is that this is explicitly not a default 
supported option (in the current state of affairs) because of the expectation 
of increased heap fragmentation and GC death spiral, which is why I'm happy 
with the ugly name for that one option.

So we can go both ways - remove it completely from the allocation type enum and 
don't support it or leave it in place and add it to documentation so users can 
make an informed decision whether to use it or not; meanwhile the default would 
still be heap_slab or a similar name.

bq. In this case what is to be gained? Seems likely it would only increase the 
need for old gen collections, as we wouldn't be able to do any reuse?

It could perform the same function as the previous heap slab. For ParNew it 
doesn't make much difference, but if a slab slips into the old gen then it would 
create less fragmentation compared to individual buffers getting promoted 
(plus we can pack some fresh allocations inside). But that's just an idea and 
isn't critical for this ticket...
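
The fragmentation argument above can be illustrated with a minimal sketch of an on-heap slab. This is an assumption-laden illustration, not Cassandra's SlabAllocator: the class name OnHeapSlabSketch and its methods are hypothetical, it is single-threaded, and it never reuses or frees space. It only shows that many small allocations end up backed by one large array, so the collector promotes one object instead of many small ones.

```java
import java.nio.ByteBuffer;

// Hypothetical, single-threaded sketch; not Cassandra's SlabAllocator.
final class OnHeapSlabSketch
{
    private final int slabSize;
    private ByteBuffer currentSlab;
    private int slabsAllocated;

    OnHeapSlabSketch(int slabSize)
    {
        this.slabSize = slabSize;
    }

    // Returns a slice of the current slab, or a standalone buffer when the
    // request is larger than a whole slab.
    ByteBuffer allocate(int size)
    {
        if (size < 0)
            throw new IllegalArgumentException("negative size");
        if (size > slabSize)
            return ByteBuffer.allocate(size);
        if (currentSlab == null || currentSlab.remaining() < size)
        {
            // One large backing array on the heap; the tail of the old slab is wasted.
            currentSlab = ByteBuffer.allocate(slabSize);
            slabsAllocated++;
        }
        // Carve [position, position + size) out of the slab without copying.
        ByteBuffer view = currentSlab.duplicate();
        view.limit(view.position() + size);
        currentSlab.position(currentSlab.position() + size);
        return view.slice();
    }

    int slabsAllocated() { return slabsAllocated; }
}
```

Two consecutive small allocations from the same instance share one backing array, which is what keeps promoted garbage in a few large blocks.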


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944247#comment-13944247
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

bq. So we can go both ways - remove it completely from the allocation type enum 
and don't support it or leave it in place and add it to documentation 

Why are those our only choices?  It's the way it is for a reason; for a very 
few users, it's legitimately useful for running on the Azul JVM or for creating 
huge numbers of CFs, but for mainstream users it's a Very Bad Idea.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944262#comment-13944262
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~jbellis] Sorry if I wasn't clear, by 'it' I mean unslabbed_buffers, or 
jvm_default as I call it. I think we should mention it as a legitimate option 
in the config instead of having it only in the code...


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-22 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944310#comment-13944310
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Right, I'm saying that it's undocumented right now (CASSANDRA-5935) and I'd 
rather leave it that way.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13944314#comment-13944314
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Ok, I don't like that we don't mention things like that in the config with a big 
fat warning, but if you guys think otherwise (as you have more context) I am 
fine with leaving it out... 


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942954#comment-13942954
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

bq. Reviewing under the assumption that 1-3 will go in 2.1 and the rest into 3.0

So at the risk of adding more debate, I do think one of the points of separating 
things is also that we can evaluate things separately. Typically, I would be 
happy to see a quick evaluation of what #1+#2 gives us, and what #3 brings to 
the table. If 1+2 is borderline unusable without 3 (and if that's the case, it 
should be easy to cook up a simple graph that shows it), then fine, let's do 3 
in 2.1. But if it turns out that the difference is not that big, I would 
personally be keen to move #3 to 3.0, because we're shoving enough big patches 
into 2.1 post beta1 as it is imo. To put it another way, if we're making steps, 
let's try to be true to them, and not make too strong assumptions about what 
goes where, because as far as I'm concerned, nothing should ever be set in 
stone until it's committed (and even then really).


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942958#comment-13942958
 ] 

Benedict commented on CASSANDRA-6689:
-

Can everyone please settle on what functionality is desired for this 
ticket/release, so I do not need to keep reorganising it? I am in the middle of 
rebasing and reorganising it again, but I will wait until some consensus is 
reached. If we only want #2 (#1 is useless on its own), the patch set will be 
quite different; I will drop all of the refactoring. If we plan to include #3 
(note that this quite provably reduces fixed on-heap overhead by a minimum of 
75%, and easily >90% for clustering columns), I will retain #1 with some minor 
tweaks, and do my best to further make the commits incremental, but I am not 
going to completely reorder everything for no good reason.



[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942966#comment-13942966
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

Well, this was kind of my point. If we decide to make steps that are properly 
defined and self contained (which is my understanding of what people are asking 
here, and what I thought was truly agreed on), then we do not have to decide 
right away if we sign up for step 2 by taking step 1. Otherwise, what's the point 
of separating it in steps?


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942968#comment-13942968
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. what's the point of separating it in steps?

The point here was to make review easier. This is one single ticket. If we want 
to split it out into multiple tickets, please let's agree on that and then I'll 
do it.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942995#comment-13942995
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

That is a good way to think about it -- we are asking for patches or patchsets 
that are self contained enough to be split out into a ticket apiece if 
necessary.  If the divisions I proposed are a poor fit for that, by all means 
let's come up with ones that make better sense.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943005#comment-13943005
 ] 

Benedict commented on CASSANDRA-6689:
-

FTR, this is unlikely to improve review, as ridding ourselves of #1 means #3 
will simply be much bigger.

However it seems this is the consensus, so proceeding. This may take some time. 
I will reopen CASSANDRA-6694 to take step #2, and have this ticket take #1.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943285#comment-13943285
 ] 

Benedict commented on CASSANDRA-6689:
-

Patch for this ticket available 
[here|https://github.com/belliottsmith/cassandra/tree/iss-6689-final]

I have made absolutely no unnecessary changes or refactors. There is one commit 
only, as there was no good way to make any of the changes for this specific 
commit whilst leaving it in a compilable state. For CASSANDRA-6694 I will split 
the patch up into multiple commits, but this is likely to take quite some time.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943299#comment-13943299
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Thanks! That looks fairly straightforward now.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-21 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943787#comment-13943787
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


One more thing I forgot to mention, for consideration: we might take 
advantage of SlabAllocator (on-heap) to accommodate all of the temporary 
buffers created by copying from the Memtable, so that when that allocation 
spills to oldgen it would create less fragmentation there.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941498#comment-13941498
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


[~xedin] it is here: https://github.com/belliottsmith/cassandra/tree/iss-6689-3


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942480#comment-13942480
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~benedict] I looked at 
[#1|https://github.com/belliottsmith/cassandra/tree/iss-6689-1] and I have a 
concern... Can you elaborate on why you introduced all these Impl static 
classes? That looks especially interesting in the context of Cell.Impl, where 
e.g. ExpiringCell.Impl extends Cell.Impl, all of the methods of which are 
static, and then overrides Cell.Impl methods with its own implementations. Why 
can't this be solved by making Cell and its descendants abstract classes and 
overriding the required methods? 
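
For readers following the design question, the two styles being contrasted can be sketched as below. This is a simplified illustration with hypothetical names (CellHelpers, ExpiringCellHelpers, AbstractCellSketch, ExpiringCellSketch), not the code under review. It relies only on standard Java semantics: a static method redeclared in a subclass hides the parent's method rather than overriding it, whereas instance methods are dispatched on the runtime type.

```java
// Simplified illustration with hypothetical names; not the code under review.

// Style 1: classes of static helpers. A static method with the same signature
// in a subclass hides the parent's method; it does not override it, so the
// call target is fixed by the class named at the call site.
class CellHelpers
{
    static String describe() { return "cell"; }
}

class ExpiringCellHelpers extends CellHelpers
{
    static String describe() { return "expiring cell"; }
}

// Style 2: an abstract base class with instance methods that subclasses
// override, so the call target is chosen by the runtime type of the object.
abstract class AbstractCellSketch
{
    abstract long timestamp();

    boolean isLive(long now) { return true; }
}

final class ExpiringCellSketch extends AbstractCellSketch
{
    private final long timestamp;
    private final long expiresAt;

    ExpiringCellSketch(long timestamp, long expiresAt)
    {
        this.timestamp = timestamp;
        this.expiresAt = expiresAt;
    }

    @Override
    long timestamp() { return timestamp; }

    @Override
    boolean isLive(long now) { return now < expiresAt; }
}
```

In the second style a caller holding an AbstractCellSketch reference gets the expiring behaviour without knowing the concrete type; in the first, each call site has to name the right helper class.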

Another example I see is in DecoratedKey: the only Impl method used outside is 
Impl.compareTo(IPartitioner, ByteBuffer, RowPosition), which could be changed 
(moved) to be a static method of RowPosition and would call 
DecoratedKey.compareTo(DecoratedKey) when needed, so there is no jump from 
Impl.compareTo(IPartitioner, ByteBuffer, RowPosition) to 
DecoratedKey.compareTo(RowPosition) and back to Impl.compareTo(DecoratedKey, 
RowPosition).

One more nit thing, in DecoratedKey.java there is no need to mark token() and 
key() explicitly abstract, also token() is already defined in RingPosition so 
no need to declare it in DecoratedKey.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942490#comment-13942490
 ] 

Benedict commented on CASSANDRA-6689:
-

See my comment explaining 
[here|https://issues.apache.org/jira/browse/CASSANDRA-6694?focusedCommentId=13904708&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13904708]

bq. One more nit thing, in DecoratedKey.java there is no need to mark token() 
and key() explicitly abstract, also token() is already defined in 
RingPosition so no need to declare it in DecoratedKey.

The latter is an artefact of merging the getToken() and token() into the same 
method, and I agree. Although there's no harm in either, happy to change them.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942521#comment-13942521
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


How many times do I have to ask for the same thing in this ticket: *let's keep 
the changes to the point of the current ticket, please*?... As we don't need 
Impl for this ticket (whether we need it at all in that way is a topic for a 
separate discussion; did we at least test how much we actually save by doing 
it that way? I think [~jbellis] is with me on this one after reading 
CASSANDRA-6694), let's remove it; the less code we have to read, the faster the 
review process will go. So if the NativeCell is going to be introduced in #3, 
that means all of that Impl stuff should go there too.



[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942528#comment-13942528
 ] 

Benedict commented on CASSANDRA-6689:
-

This ticket includes everything up to and including #3, so we DO need it for 
this ticket.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942550#comment-13942550
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Let me repeat this again, if we need Impl for #3 it should be in #3 not #1.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942553#comment-13942553
 ] 

Benedict commented on CASSANDRA-6689:
-

That is not repeating yourself, that is a different statement. You said 
anything not needed for this ticket. I introduced #1 specifically, prior to 
starting the split, as introducing all of the necessary refactors. I do not 
intend to change it again now, it has been waiting long enough.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-20 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942573#comment-13942573
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Benedict, 1-3 were intended to be distinct steps so that review could be 
correspondingly simplified.  It does look to me like there is no clean 
breakdown as such in your branch, e.g. commits named "in progress" muddy the 
water considerably.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942576#comment-13942576
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Here is the quote from my previous comment:

_As we don't need Impl for this ticket before #3 (whether we need it at all in 
that way is a topic for a separate discussion; did we at least test how much 
we actually save by doing it that way? I think Jonathan Ellis is with me on this 
one after reading CASSANDRA-6694), let's remove it; the less code we have to 
read, the faster the review process will go. *So if the NativeCell is going to be 
introduced in #3, that means all of that Impl stuff should go there too*._

So first of all, it's not clear if we even want to move in the Impl direction, 
as mentioned in CASSANDRA-6694. Secondly, we all agree that we are going to keep 
this to the point with the roadmap we have, so each of the points is 
individually deliverable/committable, which essentially means less work for 
everybody and the ticket moves on faster. Thirdly, it looks like 4 people are 
reaching for one target, reflected at least in Jonathan's roadmap, where we want 
to get off-heap memtables with on-heap read side copy and so on _incrementally_, 
and you are just trying to do as much refactoring/changes as possible in all 
three of the bullet points. It looks that way because iss-6689-{1,2,3} all have 
at least *3K* additions and *2K* deletions. Concerns about the size of those 
changes have already been raised by multiple people multiple times, and yet we 
are still at the same spot with this.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942623#comment-13942623
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Just for reference, [a good example of a patch-set done 
right|https://github.com/iamaleksey/cassandra/compare/apache:cassandra-2.1...6506]:
 all commits are to the point and build on top of each other; even though it's 
~4K total I can review the whole thing quickly because there is no noise and 
it's efficient to trace back through isolated changes.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-19 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940231#comment-13940231
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~jbellis] I am, sorry; I got caught up in something else, but I'm going to 
look at it in the middle of this week (tomorrow or Thursday).


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-19 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940983#comment-13940983
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


[~benedict] So what is the branch to look at for #1? 
https://github.com/belliottsmith/cassandra/tree/iss-6689-1 still has DataGroup 
inside, which should have been removed after Marcus's review, shouldn't it?


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-18 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940202#comment-13940202
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Are you still planning to look at this soon, Pavel?


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-12 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931656#comment-13931656
 ] 

Benedict commented on CASSANDRA-6689:
-

Pushed an updated version with your changes merged, the cassandra.yaml 
reverted, and a theoretical deadlock risk eliminated in KeysSearcher and 
CompositesSearcher


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-12 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13932277#comment-13932277
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Thanks, [~benedict]! I'm going to take a look at this tomorrow (I hope).


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930239#comment-13930239
 ] 

Benedict commented on CASSANDRA-6689:
-

Uploaded a fixed version of #3 to the same repository, to fix a couple of bugs 
spotted by [~krummas]


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930396#comment-13930396
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


Reviewing under the assumption that 1-3 will go in 2.1 and the rest into 3.0; 
otherwise there is some stuff in #1 that should be in #3, etc., but leaving 
that aside for now.

Main point when reviewing was that I found myself trying to wrap my head around 
the Group concept several times, especially since it is not actually adding any 
functionality at this stage (I know it will when we do GC). We should probably 
remove it since it adds indirection that we don't need right now. Pushed a 
branch with the DataGroup and various Group classes in o.a.c.u.memory removed 
here: https://github.com/krummas/cassandra/commits/bes/6689-3.1 , wdyt?




[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-11 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930404#comment-13930404
 ] 

Benedict commented on CASSANDRA-6689:
-

[~krummas]:

It will make me sad, but you're absolutely right, it isn't necessary just yet.

Only thing to question is the change of default in conf/cassandra.yaml, but 
guessing this is a debugging oversight.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-08 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924780#comment-13924780
 ] 

Benedict commented on CASSANDRA-6689:
-

Note we will probably want to either include CASSANDRA-6781 at this point, or 
introduce another change to copy all records on-heap during memtable flush, 
else we may see the slower flush performance Marcus indicated.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923770#comment-13923770
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. One native Buffer each for name and value.

Probably we'll stick to a Buffer per name component here, rather than per name, 
for simplicity at this step.

This timeline translates to mostly what I suggested already, with a little 
tweaking, so I'll work with that (I have a new step 1, though):

# I will introduce the - .data refactor as my starting point, and refactors of 
.memory in 6694, all ignoring any Native implementations;
# Introduce a NativeByteBufferAllocator, and always copy onto heap if reading 
from it
# Introduce NativeCell, and a simplified NativeAllocator (i.e. without any 
magic to reclaim it)
# Introduce RefAction + RefAction driven reclaim
# Introduce GC
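
Step 2 of the list above (allocate natively, always copy onto heap when reading) could be sketched roughly as follows. This is only an illustration under assumptions, not the code in the iss-6689 branches: the class `CopyOnReadSketch` and its method names are hypothetical, and the JDK's `ByteBuffer.allocateDirect` stands in for a native allocator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: values are stored off-heap, and every read copies
// the bytes into a fresh heap buffer, so callers never hold a reference
// to native memory.
final class CopyOnReadSketch
{
    // Copy the remaining bytes of 'value' into a newly allocated direct (off-heap) buffer.
    static ByteBuffer storeOffHeap(ByteBuffer value)
    {
        ByteBuffer direct = ByteBuffer.allocateDirect(value.remaining());
        direct.put(value.duplicate()); // duplicate() leaves the caller's position untouched
        direct.flip();
        return direct;
    }

    // Copy an off-heap value back onto the heap for the reader.
    static ByteBuffer readOntoHeap(ByteBuffer offHeap)
    {
        ByteBuffer heap = ByteBuffer.allocate(offHeap.remaining());
        heap.put(offHeap.duplicate());
        heap.flip();
        return heap;
    }

    public static void main(String[] args)
    {
        ByteBuffer stored = storeOffHeap(ByteBuffer.wrap("value".getBytes(StandardCharsets.UTF_8)));
        ByteBuffer read = readOntoHeap(stored);
        System.out.println(stored.isDirect() + " " + read.isDirect() + " "
                           + StandardCharsets.UTF_8.decode(read));
    }
}
```

Because readers only ever see heap copies, nothing in this sketch needs to track when the off-heap buffer may be released, which is the simplification this step relies on.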



[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923907#comment-13923907
 ] 

Benedict commented on CASSANDRA-6689:
-

Okay, so #1 is now 
[here|https://github.com/belliottsmith/cassandra/tree/iss-6689-1]

It also includes necessary patches not yet incorporated from CASSANDRA-6692, 
and redefines CellName.dataSize() to return the size on disk, not the size of 
the components, but otherwise only performs all the necessary refactoring.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924194#comment-13924194
 ] 

Benedict commented on CASSANDRA-6689:
-

#2 is now [here|https://github.com/belliottsmith/cassandra/tree/iss-6689-2]


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924471#comment-13924471
 ] 

Benedict commented on CASSANDRA-6689:
-

[#3|https://github.com/belliottsmith/cassandra/tree/iss-6689-3]


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-07 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13924711#comment-13924711
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

Thanks, Benedict.  Let's have Marcus and/or Pavel review those and create new 
tickets for 4/5.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922259#comment-13922259
]

Benedict commented on CASSANDRA-6689:
-

bq. Well, it seems like you never operated a real Cassandra cluster, did you?

You seem to have interpreted my query as an attack on the veracity of your
statement. It was not. I only wanted more specific facts that could be used to
target a solution, and preferably a new ticket on which to discuss them.

This discussion has all the hallmarks of approaching unproductivity, so after
this I do not think I have anything useful to add to the discussion, and will
leave the committers to decide whether or not to include this work or to wait
for you to produce your alternative:

# Any scheme that copies data will inherently incur larger GC pressure, as we
then copy for memtable reads as well as disk reads. Object overhead is in fact
_larger_ than the payload for many workloads, so even if we have arenas this
effect is not eliminated or even appreciably ameliorated.
# Temporary reader space (and hence your approach) is *not* predictable: it is
not proportional to the number of readers, but to the number and size of
columns the readers read. In fact it is larger than this, as we probably have
to copy anything we *might* want to use (given the way the code is
encapsulated, this is what I do currently when copying on-heap - anything else
would introduce notable complexity), not just columns that end up in the result
set.
# We appear to be in agreement that your approach has higher costs associated
with it. Further, copying potentially GB/s of (randomly located) data around
destroys the CPU cache, reduces peak memory bandwidth by inducing strobes,
consumes bandwidth directly, wastes CPU cycles waiting for the random lookups;
all to no good purpose. We should be reducing these costs, not introducing more.
# It is simply not clear, despite your assertion of clarity, how you would
reclaim any freed memory without separate GC (what else is GC but this
reclamation?), however you want to call it, when it will be interspersed with
non-freed memory, nor how you would guard the non-atomic copying (ref-counting,
OpOrder, Lock: what?). Without this information it is not clear to me that it
would be any simpler either.
# Your approach is currently (still poorly defined) vaporware.
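
Of the guards named in point 4, ref-counting is the easiest to illustrate. The sketch below is hypothetical: `RefCountedRegion` and its methods are names made up here, not classes from the patch or from Cassandra, and "freeing" is just a flag where a real allocator would return memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of guarding a copy with a reference count: a reader
// must take a reference before touching the region, and the region is only
// reclaimed once the owner and every reader have released theirs.
final class RefCountedRegion
{
    // Starts at 1: the reference held by the owner (e.g. the memtable).
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean freed = false;

    // Try to take a reference; fails once the count has reached zero.
    boolean tryRef()
    {
        while (true)
        {
            int current = refs.get();
            if (current == 0)
                return false; // already reclaimed, the reader must look elsewhere
            if (refs.compareAndSet(current, current + 1))
                return true;
        }
    }

    // Drop a reference; whoever drops the last one reclaims the region.
    void release()
    {
        if (refs.decrementAndGet() == 0)
            freed = true; // a real allocator would return the memory here
    }

    boolean isFreed()
    {
        return freed;
    }

    public static void main(String[] args)
    {
        RefCountedRegion region = new RefCountedRegion();
        boolean reading = region.tryRef(); // reader starts copying
        region.release();                  // owner discards the region
        System.out.println(reading + " " + region.isFreed());
        region.release();                  // reader finishes copying
        System.out.println(region.isFreed() + " " + region.tryRef());
    }
}
```

Note that in this scheme every reader pays two atomic updates on a shared counter per region it touches.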

Some further advantages specific to my approach:
# Pauseless operation, so improved predictability
# Absolute bound on memory utilisation, that can be rolled out to other data
structures, further improving overall performance predictability
# Lock-freedom and low overhead, so we move closer to being able to answer
queries directly from the messaging threads themselves, improving latency and
throughput

An alternative approach needs, IMO, to demonstrate a clear superiority to the
patch that is already available, especially when it will incur further work to
produce. It is not clear to me that your solution is superior in any regard,
nor any simpler. It also seems to be demonstrably less predictable and more
costly, so I struggle to see how it could be considered preferable.

Also:
bq. would that keep memtable around longer than expected

I'm not sure why you suppose this would be so. We can already happily reclaim
any subportion of a region or memtable, so there is no reason to think this
would be necessary, even if they resided in the same structure.

bq. there seems to be a low once off-heap feature is enabled which is no
surprise once you look at how much complexity does it actually add.

This is certainly addressable. The off-heap feature by itself I have
performance tested somewhat, and competes with Java GC for throughput (beating
it as number of live objects increases), whilst being _pauseless_, so the
complexity you refer to is no slouch and highly unlikely to be the culprit.
There are issues with the way we manage IO for direct byte buffers, but I have
addressed these in CASSANDRA-6781.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922830#comment-13922830
]

Pavel Yaskevich commented on CASSANDRA-6689:

You probably still don't understand my point, so let me clarify: I only care
about 3 things: maintainability, consistency, performance. This is a big chunk
of code which somebody has to maintain, which allows an inconsistent style (you
can do it with refferer = null, or maybe some other way, maybe pass RefAction
as null in the argument or maybe RefAction.impossible(), but one really needs
to look through it all to make sure that it checks for null everywhere, and so
on), and which brings its own conventions, e.g. _ in front, while also adding
poor performance. Before that is addressed, I'm -1 of this. vnodes were a big
chunk of work too, but people were able to split them into a roadmap and finish
successfully, so I don't see any reason why we can't do it here.

bq. Any scheme that copies data will inherently incur larger GC pressure, as we
then copy for memtable reads as well as disk reads. Object overhead is in fact
larger than the payload for many workloads, so even if we have arenas this
effect is not eliminated or even appreciably ameliorated.

For disk reads we have to copy even for mmap, so that we don't keep any
references at deletion time and files can be safely deallocated. So why not
copy directly to the memory allocated by the pool? Object overhead would stay
inside ParNew bounds (for ( p999)) so object allocation is relatively cheap
compared to everything else; that's the goal of the JVM as a whole.

bq. Temporary reader space (and hence your approach) is not predictable: it is
not proportional to the number of readers, but to the number and size of
columns the readers read. In fact it is larger than this, as we probably have
to copy anything we might want to use (given the way the code is encapsulated,
this is what I do currently when copying on-heap - anything else would
introduce notable complexity), not just columns that end up in the result set.

It doesn't matter how much emphasis you put here, it won't make this argument
stronger, because the main idea is to have those pools of a fixed size, which
would create back-pressure on the client in situations of heavy load. That is
exactly what operators want: to go gradually slower without extreme latency
disturbance.
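
The fixed-size pool idea can be sketched with a semaphore that counts free bytes. This is an illustration under assumptions, not a design from the ticket: `BoundedReadPool`, its byte-granular permits and the timeout are hypothetical choices made here for brevity.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a pool with a fixed byte budget. Readers reserve
// space before copying; when the budget is exhausted they wait, which
// slows clients down instead of letting memory use grow.
final class BoundedReadPool
{
    private final Semaphore freeBytes;

    BoundedReadPool(int capacityBytes)
    {
        this.freeBytes = new Semaphore(capacityBytes, true); // fair: waiters are served FIFO
    }

    // Reserve 'size' bytes, waiting up to the timeout; false means no space became free.
    boolean reserve(int size, long timeout, TimeUnit unit)
    {
        try
        {
            return freeBytes.tryAcquire(size, timeout, unit);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Return 'size' bytes to the pool once the response has been sent.
    void release(int size)
    {
        freeBytes.release(size);
    }

    int available()
    {
        return freeBytes.availablePermits();
    }

    public static void main(String[] args)
    {
        BoundedReadPool pool = new BoundedReadPool(1024);
        System.out.println(pool.reserve(768, 10, TimeUnit.MILLISECONDS)); // fits
        System.out.println(pool.reserve(512, 10, TimeUnit.MILLISECONDS)); // pool full, times out
        pool.release(768);
        System.out.println(pool.reserve(512, 10, TimeUnit.MILLISECONDS)); // fits after release
    }
}
```

A request larger than the whole pool can never succeed in this sketch, so a caller would have to reject it, split it, or size the pool for the largest expected read.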

bq. We appear to be in agreement that your approach has higher costs associated
with it. Further, copying potentially GB/s of (randomly located) data around
destroys the CPU cache, reduces peak memory bandwidth by inducing strobes,
consumes bandwidth directly, wastes CPU cycles waiting for the random lookups;
all to no good purpose. We should be reducing these costs, not introducing more.

Let's say we live in the modern NUMA world, so we are going to do the
following: pin the group threads to CPU cores so we have a fixed scope of
allocation for different things. That's why there is no significant bus
pressure from copying, among the other things JVM/Cassandra does with memory
(not even significant cache coherency traffic).

bq. It is simply not clear, despite your assertion of clarity, how you would
reclaim any freed memory without separate GC (what else is GC but this
reclamation?), however you want to call it, when it will be interspersed with
non-freed memory, nor how you would guard the non-atomic copying (ref-counting,
OpOrder, Lock: what?). Without this information it is not clear to me that it
would be any simpler either.

The same way as jemalloc or any other allocator does it; at least that is not
reinventing the wheel.

bq. Pauseless operation, so improved predictability

What do you mean by this? We still live on the JVM, do we not? Also, what would
it do in a low memory situation: allocate from heap? wait? This is not
pauseless operation.

bq. Lock-freedom and low overhead, so we move closer to being able to answer
queries directly from the messaging threads themselves, improving latency and
throughput

We won't be able to answer queries directly from the messaging threads for a
number of reasons not even indirectly related to your approach, at the very
least so as not to break SEDA, which is also supposed to be a safeguard against
over-utilization.

bq. An alternative approach needs, IMO, to demonstrate a clear superiority to
the patch that is already available, especially when it will incur further work
to produce. It is not clear to me that your solution is superior in any regard,
nor any simpler. It also seems to be demonstrably less predictable and more
costly, so I struggle to see how it could be considered preferable.

Overall, I'm not questioning the idea; being able to track what goes where
would be great. I'm questioning the implementation and the trade-offs compared
to other approaches.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922861#comment-13922861
]

Benedict commented on CASSANDRA-6689:
-

bq. Before that is addressed, I'm -1 of this

These are already addressed in CASSANDRA-6694.

bq. Object overhead would stay inside ParNew bounds (for ( p999))

The more we rely on staying within ParNew, the more often we are going to
exceed it; and reducing the number of ParNew runs is also a good thing. You
said you have 300ms ParNew pauses, occurring every second? So reducing the max
latency and total latency is surely a good thing?

bq. as the main idea is to have those pools of a fixed size

How does this work without knowing the maximum size of a result set? We can't
have a client block forever because we didn't provide enough room in the pools.
Potentially we could have it error, but this seems inelegant to me, when it can
be avoided. It also seems a suboptimal way to introduce back pressure, since it
only affects concurrent reads / large reads. We should raise a ticket
specifically to address back pressure, IMO, and try to come up with a good all
round solution to the problem.

bq. Let's say we live in the modern NUMA world, so we are going to do the
following pin the group threads to CPU cores so we have fixed scope of
allocation of different things, that why there is no significant bus pressure
for copy among other things JVM/Cassandra does with memory

It would be great to be more NUMA aware, but this is not about traffic over the
interconnect, but simply with the arrays/memory banks themselves, and doesn't
address any of the other negative consequences. You'll struggle to get more
than a few GB/s bandwidth out of a modern CPU given that we are copying object
trees (even shallow ones - they're still randomly distributed), and we don't
want to waste any of that if we can avoid it

bq. What do you mean by this, we still live on the JVM, do we not? Also what
would it do in the low memory situation? allocate from heap? wait? This is not
pauseless operation.

I did not mean to imply pauseless globally, but the memory reclaim operations
introduced here are pauseless, thus reducing pauses overall, as whenever we
would have had a pause from ParNew/FullGC to reclaim, we would not here.

bq. We won't be able to answer queries directly from the messaging threads for
the number of reasons not even indirectly related to your approach, at least
for not breaking SEDA, which also supposed to be a safe guide for over
utilization.

I'm not sure why you think this would be a bad thing. It would only help for
CL=1, but we are often benchmarked using this, so it's an important thing to be
fast on if possible, and there are definitely a number of our users who are
okay with CL=1 for whom faster responses would be great. Faster query answering
should reduce over-utilisation, assuming some back-pressure built in to
MessagingService or the co-ordinator managing its outstanding proxied requests
to ensure it isn't overwhelmed by the responses.

bq. The same way as jemalloc or any other allocator does it, at least that is
not reinventing the wheel.

Do you mean you would use jemalloc for every allocation? In which case there
are further costs incurred for crossing the JNA barrier so frequently, almost
certainly outweighing any benefit to using jemalloc. Otherwise we would need to
maintain free-lists ourselves, or perform compacting GC. Personally I think
compacting GC is actually much simpler.
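
For comparison, the simplest form of the free-list option is sketched below, for fixed-size chunks carved from one direct buffer. It is a hypothetical illustration (`ChunkFreeList` is not part of any patch discussed here), and it sidesteps the hard part: variable-sized allocations, where freed space ends up interspersed with live data.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch: one direct slab split into fixed-size chunks, with
// a free-list of chunk offsets. Not thread-safe; a real allocator would
// need synchronisation or per-thread lists.
final class ChunkFreeList
{
    private final ByteBuffer slab;
    private final int chunkSize;
    private final ArrayDeque<Integer> freeOffsets = new ArrayDeque<>();

    ChunkFreeList(int chunks, int chunkSize)
    {
        this.slab = ByteBuffer.allocateDirect(chunks * chunkSize);
        this.chunkSize = chunkSize;
        for (int i = 0; i < chunks; i++)
            freeOffsets.push(i * chunkSize);
    }

    // Returns the offset of a free chunk, or -1 if the slab is exhausted.
    int allocate()
    {
        Integer offset = freeOffsets.poll();
        return offset == null ? -1 : offset;
    }

    // Hands a chunk back; the next allocate() will reuse it.
    void free(int offset)
    {
        freeOffsets.push(offset);
    }

    // A view over one chunk; writes go straight to the off-heap slab.
    ByteBuffer chunk(int offset)
    {
        ByteBuffer view = slab.duplicate();
        view.position(offset);
        view.limit(offset + chunkSize);
        return view.slice();
    }

    public static void main(String[] args)
    {
        ChunkFreeList list = new ChunkFreeList(2, 64);
        int a = list.allocate();
        int b = list.allocate();
        System.out.println(list.chunk(b).remaining()); // size of one chunk
        System.out.println(list.allocate());           // slab exhausted
        list.free(a);
        System.out.println(list.allocate() == a);      // freed chunk is reused
    }
}
```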


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13922983#comment-13922983
]

Pavel Yaskevich commented on CASSANDRA-6689:

bq. These are already addressed in CASSANDRA-6694.

Is there a branch/patch to see all of the changes involved?

bq. The more we rely on staying within ParNew, the more often we are going to
exceed it; and reducing the number of ParNew runs is also a good thing. You
said you have 300ms ParNew pauses, occurring every second? So reducing the max
latency and total latency is surely a good thing?

I'm not trying to imply that we should rely on ParNew; I'm just saying that
all of the read/write requests are short lived enough to stay inside young
generation region. Even if we slip, the effect is masked by all the other
long-term allocations that we do which get promoted.

bq. How does this work without knowing the maximum size of a result set? We
can't have a client block forever because we didn't provide enough room in the
pools. Potentially we could have it error, but this seems inelegant to me, when
it can be avoided. It also seems a suboptimal way to introduce back pressure,
since it only affects concurrent reads / large reads. We should raise a ticket
specifically to address back pressure, IMO, and try to come up with a good all
round solution to the problem.

Let the users specify it directly or, if not specified, just take a guess based
on total system memory; plus we can add an option to extend it at run time. For
any system that uses a database there is a capacity planning stage and a
use-case spec, or at least experimentation, which would allow the pools to be
sized correctly.

bq. I did not mean to imply pauseless globally, but the memory reclaim
operations introduced here are pauseless, thus reducing pauses overall, as
whenever we would have had a pause from ParNew/FullGC to reclaim, we would not
here.

Sorry but I still don't get it, do you mean lock-free/non-blocking or that it
does no syscalls or something similar? But that doesn't matter for pauses as
much as allocation throughput and fragmentation of Java GC.

bq. I'm not sure why you think this would be a bad thing. It would only help
for CL=1, but we are often benchmarked using this, so it's an important thing
to be fast on if possible, and there are definitely a number of our users who
are okay with CL=1 for whom faster responses would be great. Faster query
answering should reduce over-utilisation, assuming some back-pressure built in
to MessagingService or the co-ordinator managing its outstanding proxied
requests to ensure it isn't overwhelmed by the responses.

The fact is that we have SEDA at least as a first line of defense against
over-utilization, so local reads are scheduled directly on a different stage.
We shouldn't be trying to do anything directly in the messaging stage; it adds
further complications not related to this very ticket.

bq. Do you mean you would use jemalloc for every allocation? In which case
there are further costs incurred for crossing the JNA barrier so frequently,
almost certainly outweighing any benefit to using jemalloc. Otherwise we would
need to maintain free-lists ourselves, or perform compacting GC. Personally I
think compacting GC is actually much simpler.

As I mentioned, there is already a jemalloc implementation in the Netty project
which is pure Java, so we should at least consider it before trying to
re-invent one.

bq. It would be great to be more NUMA aware, but this is not about traffic over
the interconnect, but simply with the arrays/memory banks themselves, and
doesn't address any of the other negative consequences. You'll struggle to get
more than a few GB/s bandwidth out of a modern CPU given that we are copying
object trees (even shallow ones - they're still randomly distributed), and we
don't want to waste any of that if we can avoid it

I'm still not sure how much worse it would make things: Java is already bad at
cache locality with its object placement anyway, and we are not going to be
copying deep trees. Let me outline the steps that I want to see taken to make
this incremental, which is how we usually do things in the Cassandra project:

# Code an off-heap allocator or use an existing one, like one of the
ByteBufAlloc implementations (evaluate new vs. existing);
# Change memtables to use the allocator from step #1 and copy data to heap
buffers when it's read from the memtable, so it's easy to track the lifetime of
buffers;
# Do extensive testing to check how horrible the copy really is for
performance, and find ways to optimize;
# If everything is bad, switch from copy to reference tracking (in all of the
commands, native protocol etc.);
# Do extensive testing to check if it improves the situation;
# Change serialization/deserialization to use the new allocator (pooled buffer
instead of always allocating on heap);
# Same as

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923029#comment-13923029
]

Benedict commented on CASSANDRA-6689:
-

bq. Is there a branch/patch to see all of the changes involved?

Yes,
[offheap2c+6781|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]
and [offheap2c|https://github.com/belliottsmith/cassandra/tree/offheap2c+6781]

The former includes performance enhancements for DirectByteBuffer use, from a
separate ticket.

bq. I'm just saying that all of the read/write requests are short lived enough
to stay inside young generation region
Well, yes and no. We have to wait until any client finishes processing the
data, so there's no absolute guarantee they'll not survive. But either way,
ParNew pauses are almost as bad as full GC, only they happen much more often.
300ms pauses are not a good thing, and if we can then reduce the size of YG so
that when these pauses do happen they're shorter (or, say, maybe even use G1GC)
then that's even better.

bq. Sorry but I still don't get it, do you mean lock-free/non-blocking or that
it does no syscalls or something similar? But that doesn't matter for pauses as
much as allocation throughput and fragmentation of Java GC.

I mean no worker threads have to stop in order for memory to be reclaimed.
There is no STW for reclaiming memory. This is unrelated to the lock freedom.

bq. Java is the worst of cache locality with it's object placement anyway

Well, exactly. That's my point here :-)

bq. it adds another complications not related to this very ticket.

I have no intention of doing it in this ticket, just indicating it as a very
useful improvement.

bq. which is how we usually do things for Cassandra project

It looks like you really have two steps, if I boil it down: 1) implement
copying approach; 2) if slow, implement this approach?

The issue with this, really, though, is that if we simply allocate ByteBuffers
off-heap like we have in this ticket, we get none of the benefit of increased
memtable capacity, since the on-heap overheads are still huge. Since that's one
of the main goals here, it seems problematic to lose out on it - we don't need
to test to find out what the result would be. This ticket was supposed to only
be a stepping stone.

Possibly we could scale CASSANDRA-6694 back somewhat, removing any support for
RefAction.refer(), and always performing a copy onto heap from the NativeCell
implementations, and spending some time ripping out any of the GC or any of the
code at Memtable discard for coping with RefAction.refer(). But honestly this
seems like a waste of effort to me, as the majority of the code would remain,
we'll just not have as good a final solution. But it could be done if that is
the community preference. We could probably split it up into further commits,
but each commit adds the potential for more errors in my book, when we have a
good solution that is ready to go.

Maximally separated timeline for separated commits of 6694 would be:
# introduce concurrency primitives
# introduce .memory and .data refactor, but only for ByteBuffer allocators, and
RefAction, but only allocateOnHeap
# introduce all .data.Native* implementations and a cut down native allocator,
using OpOrder to guard copying like we currently do for referencing (we need to
use something, and it is simpler than ref counting or anything else)
# introduce RefAction.refer(), GC, etc. (i.e. final patch)

I would rather not split it up as, as I say, each new patch is an opportunity
to mess up, but it could be done. We can do performance testing to our hearts'
content at each stage, although personally I think such testing would not be
sufficient to demonstrate no benefit to the current approach: even if little
benefit were seen, that would presuppose that no future performance benefit is
capped by the slower solution. So I would push for the final patch anyway. That
said, I would be surprised if we did not see any improvement by comparison.

Partially Off Heap Memtables

Key: CASSANDRA-6689
URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 2.1 beta2

Attachments: CASSANDRA-6689-small-changes.patch

Move the contents of ByteBuffers off-heap for records written to a memtable.
(See comments for details)

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923147#comment-13923147
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Doing everything together is an even bigger opportunity to mess up. Let me
clarify my point; hopefully you will hear it this time, as you plead for
committers to decide, so I'm representing them. From my side, nothing from this
ticket would go in until we separate all of the work into multiple committable
stages, namely (but not necessarily exactly like this):

# Make use of off-heap memory inside of Memtables, which delivers increased
capacity and less frequent/painful CMS (commit #1):
  a. Determine if we actually need to re-implement a custom allocator or can go
with an existing one, e.g. Netty's implementation
  b. Plug the allocator from #1 into Memtable for memory usage tracking, do all
necessary bookkeeping to release memory back to the pool once it's no longer
needed (overwrite, flush etc.), and make Memtable memory lifetime limited by
copying data from the Memtable on the read path.

# Optimizations after #1 which determine how big an impact the copying actually
makes compared to the original situation on different workloads, and how good
ref tracking is compared to that (commit #2)
   Delivers - memory reference tracking (RefAction, GC + whatever is needed)

(if needed) # Further enhancements to allocators/ref accounting (commit #3)

This way we can reuse the RefAction and related code that we already have:
commit #1 doesn't require any substantial changes in the way we treat memtables
and adds a decoupled allocator framework, commit #2 plugs reference tracking
into all previous changes, and commit #3 further improves things if needed.


   


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923157#comment-13923157
 ] 

Benedict commented on CASSANDRA-6689:
-

I think my suggested split meets your requirement, but I'm not absolutely sure.

My 1-2 are refactors to bring abstractions closer; 3 is your step 1.b; 4+5 is 
your step 2.

The only difference is I assume we want our own allocator, since we won't be 
able to use Netty's for steps 4+5, and that we allocate Native* within the 
memtable to save substantially more space than allocating ByteBuffer would.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923186#comment-13923186
 ] 

Benedict commented on CASSANDRA-6689:
-

Whilst it would be nice to depend on an external project here, it doesn't
resolve the problem that it's heavier-weight on heap than what we can achieve
ourselves. In CASSANDRA-6694 we really try to push the heap overhead to its
minimum, and every byte counts; also, we need an extra field we can manipulate
for maintaining the allocation list for GC, when we come to introduce it. So we
really do need control of it, otherwise it will only get us as far as step 1.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923188#comment-13923188
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


Can you elaborate on why it is heavier-weight on heap?


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923201#comment-13923201
 ] 

Benedict commented on CASSANDRA-6689:
-

The smallest direct byte buffer from Netty is on the order of 64-bytes minimum, 
whereas we manage 24-bytes; plus for maintaining the allocation list we then 
need to have a separate list, rather than chaining the allocations, so at least 
another 4-bytes overhead (more, but in best case it tends to 4).

[https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java]
[https://github.com/netty/netty/blob/master/buffer/src/main/java/io/netty/buffer/UnpooledUnsafeDirectByteBuf.java]
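
To put those per-allocation figures in perspective, here is a rough worked
example. The 64-, 24- and 4-byte figures are the ones quoted above; the cell
count of 10 million and the class name HeapOverheadEstimate are assumptions
chosen purely for illustration, not measurements from the patch.

```java
// Rough arithmetic only; the cell count is an assumed figure, not a measurement.
final class HeapOverheadEstimate {
    // Total on-heap bookkeeping for a given number of cells.
    static long overheadBytes(long cells, long bytesPerCell) {
        return cells * bytesPerCell;
    }

    public static void main(String[] args) {
        long cells = 10_000_000L;                    // assumption for illustration
        long netty = overheadBytes(cells, 64 + 4);   // buffer object + separate list entry
        long custom = overheadBytes(cells, 24);      // chained allocations, no extra list
        System.out.printf("Netty-style: %d MB, custom: %d MB%n",
                          netty / 1_000_000, custom / 1_000_000);
    }
}
```

Under these assumptions the difference is 680 MB versus 240 MB of heap spent
on bookkeeping alone, before any payload is stored.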



[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923561#comment-13923561
]

Jonathan Ellis commented on CASSANDRA-6689:
---

It seems like we have a pretty broad consensus (Pavel + Sylvain + Aleksey;
Marcus?) that we should break this up more. How about something like this?

# Trading some performance for much bigger memtables is useful, so introduce
off-heap memtables as an option, default to off. No GC, just free after flush.
One native Buffer each for name and value. Copy onto heap when reading.
# Add NativeCell a la 6694 to reduce at-rest on-heap overhead. Still copying
into an on-heap Cell for reads. Still no GC.
# Add the simplest possible GC to avoid copying into on-heap Cells. If we
still have to flush for overwrite-heavy workloads that's okay.
# Full GC that doesn't need to flush unless we really have used up our memory
in live Cells.
# Get rid of on-heap memtables.

I'm waving my hands a bit b/c I'm not sure what 3 looks like but I think/hope
we can probably come up with something simpler than what we have today as an
intermediate step. And maybe 5 comes after 3. But I think 1 and 2 make sense
for 2.1b2 with the others for 3.0.
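
As an illustration of step 1 only, here is a minimal sketch of a cell whose
name and value each live in a direct (off-heap) ByteBuffer and are copied onto
the heap when read. OffHeapCellSketch and its methods are hypothetical names,
not classes from the patch, and the sketch leaves freeing of the direct memory
to the JVM rather than releasing it explicitly after flush.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: not a class from the Cassandra code base.
final class OffHeapCellSketch {
    private final ByteBuffer name;   // direct buffer, contents live off-heap
    private final ByteBuffer value;  // direct buffer, contents live off-heap

    OffHeapCellSketch(byte[] name, byte[] value) {
        this.name = toDirect(name);
        this.value = toDirect(value);
    }

    // Copy the given bytes into a freshly allocated direct buffer.
    private static ByteBuffer toDirect(byte[] src) {
        ByteBuffer direct = ByteBuffer.allocateDirect(src.length);
        direct.put(src);
        direct.flip(); // make the written bytes readable
        return direct;
    }

    // Copy the off-heap contents into a new heap buffer for the reader.
    private static ByteBuffer copyToHeap(ByteBuffer direct) {
        ByteBuffer heap = ByteBuffer.allocate(direct.remaining());
        heap.put(direct.duplicate()); // duplicate() leaves the source position untouched
        heap.flip();
        return heap;
    }

    ByteBuffer readName()  { return copyToHeap(name); }
    ByteBuffer readValue() { return copyToHeap(value); }
}
```

Because readers only ever see the heap copy, the lifetime of the off-heap
memory is tied to the memtable alone, which is what makes "just free after
flush" sufficient at this stage.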


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-06 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13923642#comment-13923642
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


Sounds good, though I would have been good with shipping the whole patch but
marking the off-heap parts as experimental and defaulting to HeapSlabPool.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920720#comment-13920720
]

Benedict commented on CASSANDRA-6689:
-

bq. sort of RCU (i'm looking at you OpOrder)

What do you mean here? If you mean read-copy-update, OpOrder is nothing like
this.

bq. I'm not sure what is to retain here if we do that copy when we send to the
wire

Ultimately, doing this copying before sending to the wire is something I would
like to avoid. Using the RefAction.allocateOnHeap() on top of this copying sees
wire transfer speeds for thrift drop by about 10% in my fairly rough-and-ready
benchmarks, so obviously copying has a cost. Possibly this cost is due to
unavoidably copying data you don't necessarily want to serialise, but it seems
to be there. Ultimately if we want to get in-memory read operations to 10x
their current performance, we can't go cutting any corners.

bq. introducing separate gc

I've stated clearly what this introduces as a benefit: overwrite workloads no
longer cause excessive flushes

bq. things but as we have a fixed number of threads it is going to work out
the same way as for buffering open files in the steady system state

Your next sentence states how this is a large cause of memory consumption, so
surely we should be using that memory if possible for other uses (returning it
to the buffer cache, or using it internally for more caching)?

bq. Temporary memory allocated by readers is exactly what we should be managing
at the first place because they allocate the most and it always the biggest
concern for us

I agree we should be moving to managing this as well, however I disagree about
how we should be managing it. In the medium term we should be bringing the
buffer cache in process, so that we can answer some queries without handing off
to the mutation stage (anything known to be non-blocking and fast should be
answered immediately by the thread that processed the connection), at which
point we will benefit from shared use of the memory pool, and concrete control
over how much memory readers are using, and zero-copy reads from the buffer
cache. I hope we may be able to do this for 3.0.

bq. do a simple memcpy test and see how much mb/s can you get from copying from
one pre-allocated pool to another

Are you performing a full object tree copy, and doing this with a running
system to see how it affects the performance of other system components? If
not, it doesn't seem to be a useful comparison. Note that this will still
create a tremendous amount of heap churn, as most of the memory used by objects
right now is on-heap. So copying the records is almost certainly no better for
young gen pressure than what we currently do - in fact, *it probably makes the
situation worse*.

bq. it's not the memtable which creates the most of the noise and memory
presure in the system (even tho it uses big chunk of heap)

It may not be causing the young gen pressure you're seeing, but it certainly
offers some benefit here by keeping more rows in memory so recent queries are
more likely to be answered with zero allocation, so reducing young gen
pressure; it is also a foundation for improving the row cache and introducing a
shared page cache which could bring us closer to zero allocation reads.

It's also not clear to me how you would be managing the reclaim of the off-heap
allocations without OpOrder, or do you mean to only use off-heap buffers for
readers, or to ref-count any memory as you're reading it? Not using off-heap
memory for the memtables would negate the main original point of this ticket:
to support larger memtables, thus reducing write amplification. Ref-counting
incurs overhead linear to the size of the result set, much like copying, and is
also fiddly to get right (not convinced it's cleaner or neater), whereas
OpOrder incurs overhead proportional to the number of times you reclaim. So if
you're using OpOrder, all you're really talking about is a new RefAction:
copyToAllocator() or something. So it doesn't notably reduce complexity, it
just reduces the quality of the end result.
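
To make the cost argument concrete, here is a deliberately crude sketch that
contrasts bookkeeping which grows with the result set (reference counting)
with bookkeeping that is constant per read operation. It is only a cost model:
ReclaimCostSketch, RefCounted and Guard are hypothetical names, and Guard is
not how OpOrder is implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cost model; this is NOT Cassandra's OpOrder.
final class ReclaimCostSketch {
    // Per-buffer reference count: every cell in a result pays retain + release.
    static final class RefCounted {
        private final AtomicInteger refs = new AtomicInteger(1);
        void retain() { refs.incrementAndGet(); }
        boolean release() { return refs.decrementAndGet() == 0; } // true when freeable
    }

    // Per-operation guard: one update when a read starts, one when it ends.
    // A single counter is too naive for real reclamation, but suffices here.
    static final class Guard {
        private final AtomicInteger running = new AtomicInteger();
        void start() { running.incrementAndGet(); }
        void finish() { running.decrementAndGet(); }
        boolean safeToReclaim() { return running.get() == 0; }
    }

    // Returns the number of atomic updates: two per cell in the result.
    static int readWithRefCounts(RefCounted[] cells) {
        int updates = 0;
        for (RefCounted c : cells) { c.retain(); updates++; }
        // ... serialise the result here ...
        for (RefCounted c : cells) { c.release(); updates++; }
        return updates;
    }

    // Returns the number of atomic updates: two, regardless of cellCount.
    static int readWithGuard(Guard guard, int cellCount) {
        int updates = 0;
        guard.start();
        updates++;
        try {
            // ... read cellCount cells here without touching any counters ...
        } finally {
            guard.finish();
            updates++;
        }
        return updates;
    }
}
```

For a 1000-cell result the ref-counting path performs 2000 atomic updates,
while the guarded path performs 2 however large the result is.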


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920721#comment-13920721
 ] 

Benedict commented on CASSANDRA-6689:
-

bq.  but the reads and internode communication (especially the latter).

Also, I'd love to see some evidence for this (particularly the latter). I'm not 
disputing it, just would like to see what caused you to reach these 
conclusions. These definitely warrant separate tickets IMO, but if you have 
evidence for it, it would help direct any work.



[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-05 Thread Pavel Yaskevich (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13921110#comment-13921110
]

Pavel Yaskevich commented on CASSANDRA-6689:

bq. I've stated clearly what this introduces as a benefit: overwrite workloads
no longer cause excessive flushes

If you do a copy of the memtable buffer beforehand, you can clearly put it back
into the allocator once it's overwritten or otherwise becomes useless, in the
process of merging columns with previous row contents.

bq. Your next sentence states how this is a large cause of memory consumption,
so surely we should be using that memory if possible for other uses (returning
it to the buffer cache, or using it internally for more caching)?

It doesn't state that it is a *large cause of memory consumption*; it states
that it has an additional cost, but in the steady state it won't be allocating
over the limit because of the properties of the system that we have, namely the
fixed number of threads.

bq. Are you performing a full object tree copy, and doing this with a running
system to see how it affects the performance of other system components? If
not, it doesn't seem to be a useful comparison. Note that this will still
create a tremendous amount of heap churn, as most of the memory used by objects
right now is on-heap. So copying the records is almost certainly no better for
young gen pressure than what we currently do - in fact, it probably makes the
situation worse.

Do you mean this? Let's say we copy a Cell (or Column object), which is 1 level
deep, so we just allocate additional space for the object headers and do a
copy; most of the work would be spent copying the data (name/value) anyway.
So, as we want to live inside of ParNew, see how many such allocations you will
be able to do in e.g. 1 second, then wipe the whole thing and do it again. We
are doing mlockall too, which should make that even faster as we are sure that
the heap is pre-faulted already.
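
A minimal sketch of such a test could look like the following. It only copies
a name and a value out of direct buffers into fresh heap arrays and counts how
many copies complete in a given time; CopyRateSketch and copiesWithin are
hypothetical names, the buffer sizes are up to the caller, and a serious
measurement would need JIT warm-up and a running system around it.

```java
import java.nio.ByteBuffer;

// Hypothetical micro-benchmark sketch; not a substitute for a real harness.
final class CopyRateSketch {
    // Repeatedly copies a name and a value from direct buffers into new heap
    // arrays for roughly `millis` milliseconds; returns the copies completed.
    static long copiesWithin(int nameLen, int valueLen, long millis) {
        ByteBuffer name = ByteBuffer.allocateDirect(nameLen);
        ByteBuffer value = ByteBuffer.allocateDirect(valueLen);
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long copies = 0;
        long sink = 0; // consumed below so the copies are not optimised away
        while (System.nanoTime() < deadline) {
            byte[] n = new byte[nameLen];   // short-lived, young-gen allocation
            byte[] v = new byte[valueLen];
            name.rewind();
            name.get(n);                    // off-heap -> heap copy of the name
            value.rewind();
            value.get(v);                   // off-heap -> heap copy of the value
            sink += n.length + v.length;
            copies++;
        }
        if (sink < 0) throw new IllegalStateException("unreachable");
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesWithin(16, 64, 1000) + " cell copies in ~1s");
    }
}
```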

bq. It may not be causing the young gen pressure you're seeing, but it
certainly offers some benefit here by keeping more rows in memory so recent
queries are more likely to be answered with zero allocation, so reducing young
gen pressure; it is also a foundation for improving the row cache and
introducing a shared page cache which could bring us closer to zero allocation
reads. _And so on_

I'm not sure how this would help in the case of the row cache. Once a reference
is added to the row cache, it means that the memtable would hang in there until
that row is purged, so if there is a long-lived row (write once, read multiple
times) in each of the regions (and we reclaim based on regions), would that
keep the memtable around longer than expected?

bq. It's also not clear to me how you would be managing the reclaim of the
off-heap allocations without OpOrder, or do you mean to only use off-heap
buffers for readers, or to ref-count any memory as you're reading it? Not using
off-heap memory for the memtables would negate the main original point of this
ticket: to support larger memtables, thus reducing write amplification.
Ref-counting incurs overhead linear to the size of the result set, much like
copying, and is also fiddly to get right (not convinced it's cleaner or
neater), whereas OpOrder incurs overhead proportional to the number of times
you reclaim. So if you're using OpOrder, all you're really talking about is a
new RefAction: copyToAllocator() or something. So it doesn't notably reduce
complexity, it just reduces the quality of the end result.

In terms of memory usage, copying adds an additional linear cost, yes, but at
the same time it makes the system behavior more controllable/predictable, which
is what ops usually care about; whereas, even on the artificial stress test,
there seems to be a low once the off-heap feature is enabled, which is no
surprise once you look at how much complexity it actually adds.

bq. Also, I'd love to see some evidence for this (particularly the latter). I'm
not disputing it, just would like to see what caused you to reach these
conclusions. These definitely warrant separate tickets IMO, but if you have
evidence for it, it would help direct any work.

Well, it seems like you never operated a real Cassandra cluster, did you? All
of the problems that I have listed here are well known. You can even simulate
this with docker VMs by making the internal network gradually slower: there is
no back pressure mechanism built in, so right now Cassandra would accept a
bunch of operations at the normal speed (if the outgoing link is physically
different from the internal one) but would suddenly just stop accepting anything
and fail internally because of a GC storm caused by all of the internode buffers
hanging around.


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919225#comment-13919225
]

Benedict commented on CASSANDRA-6689:
-

bq. What I want is to see more control over how we deal with cleaner threads,
as the main point to be able to run with smaller heaps, and thread stacks could
consume ~300K we are risking to OOM in certain situations because we don't
control whole thread life time, that would also help for people in environments
guided by ulimit or similar.

Just for clarity, it's the _collector_ threads, not the _cleaner_ threads
you're worried about, right? Because there is only ever one cleaner thread.
Like I say, this is mostly trying to help with an anti-pattern, so I'm not sure
it's worth addressing too aggressively in this ticket. In real terms you'll
struggle to spin up very many threads at all, but let's put an arbitrary limit
of 100 threads on the pool for now. I don't think this is worth offering
configuration options to the user here.
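
For illustration, a hard cap like that can be expressed with a standard
java.util.concurrent.ThreadPoolExecutor. This is only a sketch under assumed
settings: BoundedCollectorPool is a hypothetical name, and the 60 second idle
timeout and unbounded work queue are choices made here, not taken from the
patch.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a thread pool capped at 100 threads.
final class BoundedCollectorPool {
    static final int MAX_THREADS = 100; // the arbitrary limit discussed above

    static ThreadPoolExecutor create() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                MAX_THREADS, MAX_THREADS,          // never more than 100 threads
                60L, TimeUnit.SECONDS,             // assumed idle timeout
                new LinkedBlockingQueue<Runnable>()); // excess work waits in the queue
        pool.allowCoreThreadTimeOut(true);         // idle threads exit, so the pool shrinks
        return pool;
    }
}
```

Work submitted beyond the cap queues rather than spawning further threads, so
the memory consumed by thread stacks stays bounded.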

bq. Please do think about it, we need to try to reduce the complexity of this
to minimize bus factor

As I said, I (still) can't see a good way to use this information to simplify
matters usefully. If we flattened Pool, AllocatorGroup and Allocator all into
the one object, and we had a per-CF memory limit, we would be able to simplify
all of the SubPool and SubAllocator memory management considerably, but then we
have the issue that there's no global management of memory resources, nor
shared re-use. I don't think this would simplify the actual recycling of memory
regions that is introduced in this ticket at all though, but it's possible I'm
missing something.

Suggestions always welcome :-)

bq. Exactly, maybe it's time to re-evaluate trade-off between copying data to
per-allocated regions with limited ttl vs. tracking data all around the place.

This is the opposite suggestion, right? :-) It is definitely something worth
exploring in future.

bq. Have you had a chance to do more on vs. off heap performance testing?

I'm not sure Marcus was performance testing. Personally I'm most interested in
correctness right now: performance should follow (in particular, slow flushing
is certainly a bug which can be fixed). Certainly for overwrite workloads
I've demonstrated the reduced flushing on my box, but further testing of the
impact is likely to be a long drawn out process, and probably beyond scope of
this ticket IMO.

Performance testing scope might include:
- Establish how much heap limit reduction this permits
- Establish a severe GC pause inducing workload to see if this mitigates
- Establish if G1GC is made viable with this, and if not, what remaining
impacts there are to adoption that might be lifted
- Establish how much improvement to overwrite workloads (or mixed workloads,
where reduced overwrite workload memory consumption = reduced non-overwrite
workload flushing)
- Establish new reasonable upper memory limit for heap and memtable (both
independently), for different workloads. e.g. large data payload workloads may
now practicably support much larger memtables and dramatically reduced flushing
- Establish impact on upper *data* limit for different workloads, with the same
memory limit as before, and the impact on size-of-flush and concomitant
reduction in write amplification


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Marcus Eriksson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13919511#comment-13919511
 ] 

Marcus Eriksson commented on CASSANDRA-6689:


bq. As I said, I (still) can't see a good way to use this information to 
simplify matters usefully.
+1, my initial review-comment was probably badly phrased, the patch adds a 
complicated feature and it will take effort to learn these things for anyone 
working with the Cassandra code base, but I think it will be worth it in the end

bq. I'm not sure Marcus was performance testing.
no, not at all, just ran stress a couple of times to get ballpark numbers


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920328#comment-13920328
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


So the only question that I still have - what is the main goal of the ticket?
Is it to move to zero copy for memtables or to minimize the number of
allocations on the Java heap? If it's the latter, then I don't see why we can't
have a separate allocator for memtables and use a separate allocator for
operations which would do a copy from memtable memory. That way we don't have
to track as much as we do right now, and the allocator interface is going to be
better encapsulated (basically expose alloc/release). I'm not sure if we need
sub pools or groups of allocators in there. I really like how jemalloc and
others do it; also, Netty already has an implementation of jemalloc
(ByteBufAlloc), so maybe we can reuse that?...
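
As a rough sketch of what "basically expose alloc/release" could mean, here is
a minimal allocator interface with a capped implementation. BufferAllocator
and CappedDirectAllocator are hypothetical names, not from the patch or from
Netty, and the sketch only tracks accounting; it does not pool or explicitly
free the direct memory.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical interface; the names are illustrative only.
interface BufferAllocator {
    ByteBuffer allocate(int size);
    void release(ByteBuffer buffer);
}

// Allocates direct buffers while enforcing an upper bound on bytes outstanding.
final class CappedDirectAllocator implements BufferAllocator {
    private final long limitBytes;
    private final AtomicLong allocatedBytes = new AtomicLong();

    CappedDirectAllocator(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    @Override
    public ByteBuffer allocate(int size) {
        // Reserve first; undo the reservation if it would exceed the cap.
        if (allocatedBytes.addAndGet(size) > limitBytes) {
            allocatedBytes.addAndGet(-size);
            throw new IllegalStateException("allocator limit exceeded");
        }
        return ByteBuffer.allocateDirect(size);
    }

    @Override
    public void release(ByteBuffer buffer) {
        // Only the accounting is returned here; the JVM frees the memory itself.
        allocatedBytes.addAndGet(-buffer.capacity());
    }

    long allocatedBytes() {
        return allocatedBytes.get();
    }
}
```

With this shape, one instance could back the memtable and another the
per-operation copies, each with its own limit.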


[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920357#comment-13920357
]

Benedict commented on CASSANDRA-6689:
-

bq. So the only question that I still have - what is the main goal of the
ticket? Is it to move to zero copy for memtables or minimize number of
allocations on java heap?

Zero copy would certainly be nice to retain. Otherwise why not just ref-count
the memory?

bq. allocator interface is going to be better encapsulated (basically expose
alloc/release)

I don't follow how this would improve encapsulation, sorry. Could you sketch an
example? It sounds to me like this actually would create more complexity
outside of the main body of code. It might reduce complexity a little where the
fun stuff happens, but is it all really so difficult to understand that we want
to start opting for less good solutions that require more work? Also makes it
difficult to impose tight caps on memory used, as we can't practicably
impose/manage temporary memory allocated by readers. We already have this
problem for disk reads, but no point opening up further doors for OOM.

bq. I'm not sure if we need sub pools or groups of allocators in there, I
really like how jemalloc and others do it also Netty already has implementation
of jemalloc (ByteBufAlloc) so maybe we can reuse that?...

I'm unclear how this addresses the concerns of resource management. The
ByteBufAllocator appears to simply allocate byte buffers, which by itself is a
pretty simple problem. jemalloc is also just a low level replacement to malloc,
so I'm not sure where it enters the discussion?

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-04 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13920405#comment-13920405
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. Zero copy would certainly be nice to retain. Otherwise why not just 
ref-count the memory?

I'm not sure what there is to retain here if we do that copy when we send to 
the wire. Wouldn't we be better off with ref count and copy, but having a 
separate allocator for temporary things (e.g. write/read buffers) and another 
for more long-term ones (e.g. memtable, compression buffers)?

So can you please answer the question - what is the biggest advantage of 
tracking everything, introducing a separate gc, a sort of RCU (I'm looking at 
you, OpOrder) vs. mem copy + different allocators for different things, if 
there is a fixed size pre-allocated buffer pool off heap (we can also do COW 
for some of the chunks)? Copy would *not* be a big problem (regarding 
throughput/latency) compared to other operations (do a simple memcpy test and 
see how many MB/s you can get from copying from one pre-allocated pool to 
another). There is a trade-off of course, as we are going to spend more memory 
to store temporary things, but as we have a fixed number of threads it is going 
to work out the same way as for buffering open files in the steady system 
state... Temporary memory allocated by readers is *exactly* what we should be 
managing in the first place, because readers allocate the most and it is always 
the biggest concern for us (ParNew pauses are reaching 300 ms in 1 sec. 
intervals). That's why I'm saying let's use a low level allocator for things 
like reads with its own arena and chunk sizes, as we originally wanted, and 
have a separate low level allocator for the memtable: preallocate at startup, 
return buffers to the pool when we serialize for the wire, and go from there 
with global thresholds etc.
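
A minimal sketch of the "simple memcpy test" mentioned above, assuming two 
pre-allocated direct ByteBuffers stand in for the two pools. The class name 
PoolCopySketch, the 16 MiB pool size and the round counts are illustrative 
choices, not anything from the patch, and a single-threaded loop like this only 
gives a ballpark figure.

```java
import java.nio.ByteBuffer;

final class PoolCopySketch {
    /** Copies the full contents of src into dst 'rounds' times; returns total bytes copied. */
    static long copyRounds(ByteBuffer src, ByteBuffer dst, int rounds) {
        long copied = 0;
        for (int i = 0; i < rounds; i++) {
            src.clear();   // reset position/limit only; contents are untouched
            dst.clear();
            dst.put(src);  // bulk buffer-to-buffer transfer
            copied += dst.position();
        }
        return copied;
    }

    public static void main(String[] args) {
        int size = 16 * 1024 * 1024; // hypothetical pool size: 16 MiB
        ByteBuffer src = ByteBuffer.allocateDirect(size);
        ByteBuffer dst = ByteBuffer.allocateDirect(size);

        copyRounds(src, dst, 10); // warm-up so the JIT has compiled the loop

        long start = System.nanoTime();
        long bytes = copyRounds(src, dst, 50);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("copied %d MiB at %.0f MiB/s%n",
                bytes >> 20, (bytes / (1024.0 * 1024.0)) / seconds);
    }
}
```

The number it prints depends heavily on hardware, buffer size and JVM flags, so 
it is only useful as a rough comparison against the cost of other operations on 
the same machine.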

With all that said, I would suggest that we do a separate low level 
allocator in this ticket (the same way as jemalloc), plug it into the 
memtable and do a copy on the read path. Once that step is done we can plug in 
the allocator for all of the serialization/deserialization logic that we have 
when data is going from/to the wire, and the last step would be to use the 
allocator for all of the sstable read operations.
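
One way to picture the alloc/release plus copy-on-read idea is the sketch 
below. It is purely hypothetical: FixedChunkPoolSketch, its method names and 
the fixed chunk size are assumptions made for illustration, and nothing here 
reflects the patch or jemalloc's actual design. It pre-allocates one direct 
slab, hands out fixed-size slices, and copies a value out of memtable-owned 
memory into a temporary chunk that the reader later returns.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;

final class FixedChunkPoolSketch {
    private final ArrayBlockingQueue<ByteBuffer> free;
    private final int chunkSize;

    FixedChunkPoolSketch(int chunkSize, int chunkCount) {
        this.chunkSize = chunkSize;
        this.free = new ArrayBlockingQueue<>(chunkCount);
        // Pre-allocate one off-heap slab at startup and carve it into chunks.
        ByteBuffer slab = ByteBuffer.allocateDirect(chunkSize * chunkCount);
        for (int i = 0; i < chunkCount; i++) {
            slab.limit((i + 1) * chunkSize);
            slab.position(i * chunkSize);
            free.add(slab.slice());
        }
    }

    /** Returns an empty chunk, or null when the pool is exhausted. */
    ByteBuffer alloc() {
        ByteBuffer chunk = free.poll();
        if (chunk != null)
            chunk.clear();
        return chunk;
    }

    /** Returns a chunk to the pool; throws IllegalStateException if the pool is already full. */
    void release(ByteBuffer chunk) {
        free.add(chunk);
    }

    /** Copy-on-read: copies 'length' bytes at 'offset' out of memtable-owned memory. */
    ByteBuffer copyForRead(ByteBuffer memtableMemory, int offset, int length) {
        if (length > chunkSize)
            throw new IllegalArgumentException("value larger than a chunk");
        ByteBuffer tmp = alloc();
        if (tmp == null)
            throw new IllegalStateException("read pool exhausted");
        ByteBuffer src = memtableMemory.duplicate(); // leave the caller's position alone
        src.limit(offset + length);
        src.position(offset);
        tmp.put(src);
        tmp.flip(); // ready for the reader; release(tmp) once it is serialized
        return tmp;
    }
}
```

A real allocator would also need a policy for exhaustion (block, fail, or fall 
back to the heap) and for values larger than one chunk, which this sketch 
deliberately leaves out.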

All this might seem irrelevant for some people but I can say that after 
fire-fighting Cassandra for some time for different use cases, it's not the 
memtable which creates most of the noise and memory pressure in the system 
(even though it uses a big chunk of heap) but the reads and internode 
communication (especially the latter).

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918701#comment-13918701
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


I read through most of the code in offheap1b, skipping the queue 
implementations added there, and I have a couple of questions:

I see this pattern in the code (e.g. {RangeSlice, Read}VerbHandler.java) where 
Referrer is allocated: try { operation(); referrer = null; } finally { 
if (referrer != null) referrer.setDone(); } where documentation says that 
setDone() should be called when we are done with the referrer. Can you 
elaborate why we need to explicitly set it to null?

Pool.java - can we schedule cleaners to a static thread pool (or something 
similar) instead of creating a thread per cleaner? This would add more control 
in the situation when we flush memtables in short intervals. Related question, 
why can't we have per-cf (or global) pool instead of creating it per memtable, 
might be possible to remove allocator groups if we do that?

Also I just want to throw an idea here maybe there is something useful in it - 
as we have fixed number of stages and threads in them and don't normally share 
much between them, maybe it would be possible to use that as an advantage for 
allocator gc? What I mean is, it might be possible to track life-time of the 
allocation not by how buffer was allocated but where it was allocated, more 
like what RCU...

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918726#comment-13918726
]

Benedict commented on CASSANDRA-6689:
-

bq. can you elaborate why do we need to explicitly set it to null?

I do this in places where I'm handing off control of the referrer to another
thread/context, so we only call setDone() in the case of an error before we did
this. Admittedly, looking at these places now, it looks like replacing the
finally block with a catch (Throwable t) block would be fine, and given we can
do untyped rethrow there isn't anything lost by doing this. Probably an old
habit from Java 6 to do it the way I did.
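
The two variants discussed above can be sketched side by side. This is an 
illustration only: the Referrer class here is a hypothetical stand-in that 
keeps nothing but the setDone() call named in the thread, and the handOff 
callback represents passing ownership to another thread or context.

```java
import java.util.function.Consumer;

final class HandOffSketch {
    /** Hypothetical stand-in; only setDone() is taken from the discussion. */
    static final class Referrer {
        private boolean done;
        void setDone() { done = true; }
        boolean isDone() { return done; }
    }

    /** Variant 1: null the local after hand-off so 'finally' cleans up only on failure. */
    static void withFinally(Referrer r, Consumer<Referrer> handOff) {
        Referrer referrer = r;
        try {
            handOff.accept(referrer); // ownership moves to the new context
            referrer = null;          // success: the new owner will call setDone()
        } finally {
            if (referrer != null)
                referrer.setDone();   // still non-null only if handOff threw
        }
    }

    /** Variant 2: catch, clean up, and rethrow without widening the signature. */
    static void withCatch(Referrer referrer, Consumer<Referrer> handOff) {
        try {
            handOff.accept(referrer);
        } catch (Throwable t) {
            referrer.setDone();
            throw t; // Java 7+ precise rethrow: no 'throws Throwable' needed here
        }
    }
}
```

Both variants call setDone() only when the hand-off fails. The second avoids 
the null assignment, and it compiles without a throws clause because the try 
body can only throw unchecked exceptions.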

bq. This would add more control in the situation when we flush memtables in
short intervals.

If I understand you correctly, you mean to deal with the situation where we
want to flush memtables quick enough that the single cleaner thread may become
the bottleneck? In which case, I don't think this should ever be a problem: the
work done by the cleaner itself is very minimal unless it's got some GC to do,
in which case it is potentially non-trivial, but still unlikely to be a
bottleneck. There might be a case for handing off the GC candidate selection to
another thread, except that we expect it to certainly be quicker than any
flush, and we want to wait for it to complete before deciding if flushing is
actually worthwhile, so I'm not sure it buys us much. Maybe we could do this
and continue to check the memory state isn't worsening, and if it is just force
a flush without waiting for the GC, but it probably is too unlikely to be of
benefit to be worth the extra complexity. That is, unless you meant some other
benefit I missed?

bq. Related question, why can't we have per-cf (or global) pool instead of
creating it per memtable, might be possible to remove allocator groups if we do
that?

The pool _is_ global; a per-cf pool would have the issue of poor sharing of
resources, though, no? I would probably like to see us move to allocators
surviving across memtable replacement, so that we could then flatten allocator
group with allocator, but I'd prefer to save that for a later date.

bq. Also I just want to throw an idea here maybe there is something useful in
it - as we have fixed number of stages and threads in them and don't normally
share much between them, maybe it would be possible to use that as an advantage
for allocator gc? What I mean is, it might be possible to track life-time of
the allocation not by how buffer was allocated but where it was allocated, more
like what RCU...

Not really sure what you're suggesting here, could you elaborate?

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918785#comment-13918785
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


bq. If I understand you correctly, you mean to deal with the situation where we 
want to flush memtables quick enough that the single cleaner thread may become 
the bottleneck? In which case, I don't think this should ever be a problem: the 
work done by the cleaner itself is very minimal unless it's got some GC to do, 
in which case it is potentially non-trivial, but still unlikely to be a 
bottleneck. There might be a case for handing off the GC candidate selection to 
another thread, except that we expect it to certainly be quicker than any 
flush, and we want to wait for it to complete before deciding if flushing is 
actually worthwhile, so I'm not sure it buys us much. Maybe we could do this 
and continue to check the memory state isn't worsening, and if it is just force 
a flush without waiting for the GC, but it probably is too unlikely to be of 
benefit to be worth the extra complexity. That is, unless you meant some other 
benefit I missed?

I can imagine the situation when we frequently switch memtables which, in 
current code, starts a new thread, so I wonder how quickly we would be able to 
see a slowdown effect caused by that + potentially OOM on stack space...

bq. The pool is global; a per-cf pool would have the issue of poor sharing of 
resources, though, no? I would probably like to see us move to allocators 
surviving across memtable replacement, so that we could then flatten allocator 
group with allocator, but I'd prefer to save that for a later date.

I'm thinking maybe it would lessen internal complexity if we do (sort-of) arena 
allocation inside instead of having groups and sub pools?

bq. Not really sure what you're suggesting here, could you elaborate?

It's just an idea so people can just ignore it, but what I meant there is maybe 
it would be a better first step for us to consider making all of the operations 
that we do more self-contained so we can track exactly where each one starts 
and ends (by operation I mean all types of reads, writes), which could 
potentially make the allocator's job easier as we would be able to contain 
per-operation memory...

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918822#comment-13918822
]

Benedict commented on CASSANDRA-6689:
-

bq. I can imagine the situation when we frequently switch memtables which, in
current code, starts a new thread, so I wonder how quickly we would be able to
see a slowdown effect caused by that + potentially OOM on stack space...

It only starts a new thread if there is GC work to be done, room to do it in,
and the static pool doesn't have any threads that haven't timed out. The
cleaner thread itself is started once and remains forever. If you mean you're
worried about flushing so many memtables that all have GC work possible on them
and that this would spam the collector thread pool, I guess I agree it's a
theoretical possibility, but it would have to be pretty extreme. We can spin a
thousand threads and not break a sweat, but even with tens of thousands of
memtables we'll hit our memory limit and stop being able to do any more GC
before we cause major problems. That said, we could safely impose a cap on the
pool if we wanted. Set it largeish so it isn't a bottleneck, but prevent this
kind of problem.
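
A cap of the kind mentioned above could look roughly like this, using only 
java.util.concurrent. The class and method names and the numbers in main are 
assumptions for illustration and are not the patch's collector pool. Threads 
are created on demand up to the cap, extra work queues instead of spawning more 
threads, and idle threads time out.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class CappedCollectorPoolSketch {
    /** Builds a pool that never exceeds maxThreads; idleSeconds must be greater than zero. */
    static ThreadPoolExecutor newCappedPool(int maxThreads, long idleSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,             // core == max gives a hard cap on threads
                idleSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>()); // excess tasks wait in the queue
        pool.allowCoreThreadTimeOut(true);          // idle threads exit, freeing their stacks
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newCappedPool(4, 30); // hypothetical "largeish" cap
        for (int i = 0; i < 100; i++)
            pool.execute(() -> { /* placeholder for one memtable's GC work */ });
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```

The unbounded queue trades thread-stack memory for queued tasks, so the cap 
should stay large enough that collection does not become the bottleneck.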

bq. I'm thinking maybe it would lessen internal complexity if we do (sort-of)
arena allocation inside instead of having groups and sub pools?

Probably not appreciably. Most of the complexity is likely to remain, as it's
mostly about dealing with the interactions of writers with readers, and those
will be mostly the same regardless, since readers can touch anything at any
time. The allocator group does currently allow for some slight _optimisations_,
but I don't immediately see a way that merging with pool would reduce
_complexity_. I will give it some thought though. In this vein, I would like to
see thread local allocation buffers, but again this is an optimisation rather
than a complexity reducer.

bq. It's just an idea so people can just ignore it, but what I meant there is
maybe it would be a better first step for us to consider making all of the
operations that we do more self-contained so we can track exactly where each
one starts and ends (by operation I mean all types of reads, writes), which
could potentially make the allocator's job easier as we would be able to
contain per-operation memory...

We're currently dealing with allocations that necessarily outlive the duration
of the operation, so I'm not sure how well that would apply here, but possibly
you have some specific examples? It may be that we decide to use this code for
creating the mutations from thrift/native/MS etc., though, in which case it
would be very helpful.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918825#comment-13918825
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. Almost forgot, I have some minor changes to the code: removed a couple of 
unused methods, added a couple of RefAction.impossible() instead of null, 
changed Memtable code so it avoids double cast, and a very minor typo.

Thanks. I'll incorporate these into CASSANDRA-6694 instead if that's okay with 
you?

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-03-03 Thread Pavel Yaskevich (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13918866#comment-13918866
]

Pavel Yaskevich commented on CASSANDRA-6689:

bq. It only starts a new thread if there is GC work to be done, room to do it
in, and the static pool doesn't have any threads that haven't timed out. The
cleaner thread itself is started once and remains forever. If you mean you're
worried about flushing so many memtables that all have GC work possible on them
and that this would spam the collector thread pool, I guess I agree it's a
theoretical possibility, but it would have to be pretty extreme. We can spin a
thousand threads and not break a sweat, but even with tens of thousands of
memtables we'll hit our memory limit and stop being able to do any more GC
before we cause major problems. That said, we could safely impose a cap on the
pool if we wanted. Set it largeish so it isn't a bottleneck, but prevent this
kind of problem.

What I want is to see more control over how we deal with cleaner threads. The
main point is to be able to run with smaller heaps, and since thread stacks can
consume ~300K each, we risk an OOM in certain situations because we don't
control the whole thread lifetime. That would also help people in environments
constrained by ulimit or similar.

bq. I will give it some thought though. In this vein, I would like to see
thread local allocation buffers, but again this is an optimisation rather than
a complexity reducer.

Please do think about it, we need to try to reduce the complexity of this to
minimize bus factor :)

bq. We're currently dealing with allocations that necessarily outlive the
duration of the operation, so I'm not sure how well that would apply here, but
possibly you have some specific examples? It may be that we decide to use this
code for creating the mutations from thrift/native/MS etc., though, in which
case it would be very helpful.

Exactly, maybe it's time to re-evaluate the trade-off between copying data to
pre-allocated regions with limited ttl vs. tracking data all around the place.

bq. Thanks. I'll incorporate these into CASSANDRA-6694 instead if that's okay
with you?

Sure.

[~krummas] Have you had a chance to do more on vs. off heap performance testing?

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915840#comment-13915840
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

Two remarks from skimming over the patch (these are more code style than 
anything, but well):
* there are tons of methods whose names start with an underscore. I'd really 
rather avoid that since it's inconsistent with the code base (and given the 
time it took us a few years back to remove all underscores to get the code 
consistent, I'd rather not reintroduce some now).
* why change all the ByteBuffer.hasArray() calls to !ByteBuffer.isDirect()? 
In almost all cases, hasArray() is called to guard some later calls to array() 
and/or arrayOffset() and the [ByteBuffer 
javadoc|http://docs.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html#hasArray()]
 is pretty explicit on the fact that hasArray() is *the* correct method to call 
in that case.
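
A small illustration of why the two checks are not interchangeable, using only 
the standard ByteBuffer API: a read-only heap buffer is not direct, yet it 
reports hasArray() == false and array() would throw ReadOnlyBufferException. 
The class and helper names below are hypothetical and not from the patch.

```java
import java.nio.ByteBuffer;

final class HasArraySketch {
    /** Returns the readable bytes of buf without changing its position. */
    static byte[] toBytes(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        if (buf.hasArray()) {
            // Safe: hasArray() guarantees array() and arrayOffset() will not throw.
            System.arraycopy(buf.array(), buf.arrayOffset() + buf.position(), out, 0, out.length);
        } else {
            // Direct or read-only buffers: copy through a duplicate instead.
            buf.duplicate().get(out);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer readOnly = ByteBuffer.wrap(new byte[] {1, 2, 3}).asReadOnlyBuffer();
        System.out.println("isDirect: " + readOnly.isDirect()); // false
        System.out.println("hasArray: " + readOnly.hasArray()); // false
        System.out.println("bytes:    " + toBytes(readOnly).length);
    }
}
```

Guarding with !isDirect() would take the array() branch for this buffer and 
fail at runtime, which is the case the javadoc's recommendation avoids.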

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13915948#comment-13915948
]

Benedict commented on CASSANDRA-6689:
-

bq. there are tons of methods whose names start with an underscore. I'd really
rather avoid that since it's inconsistent with the code base (and given the
time it took us a few years back to remove all underscores to get the code
consistent, I'd rather not reintroduce some now).

These are almost all within one class: AbstractMemory, also a couple in
NativeAllocation. The reason for this is that these are the base classes for
building off-heap objects, so we need methods for the implementation of the
interfaces these objects will represent, but these methods absolutely should
not escape or interfere with the abstractions we are implementing. As an
example, we need a size() method in CellName, and an internal _size() method in
NativeAllocation. They mean two completely different things, and one of them is
only interesting to the internals of any native implementation. I want to avoid
any such clash, or any overhead in establishing that there is no such clash,
and so underscores seem the best solution. We also need to provide lots of
internal methods for getting/putting data/fields from the off-heap region the
object is made up of, and I don't want any confusion about what methods mess
with this, and which methods are actual functionality.

There are a lot of these methods, and putting internal as a prefix or
something else doesn't seem any better to me.

bq. why change all the ByteBuffer.hasArray() calls to !ByteBuffer.isDirect()?
In almost all cases, hasArray() is called to guard some later calls to array()
and/or arrayOffset() and the ByteBuffer javadoc is pretty explicit on the fact
that hasArray() is the correct method to call in that case.

This is a valid criticism, and in fact I should roll back this change for this
patch. It was necessary for CASSANDRA-6689 as we were abusing ByteBuffer, and
breaking this contract. This is no longer the case, now that we use
NativeAllocation instead.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916011#comment-13916011
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

bq. There are a lot of these methods, and putting internal as a prefix or 
something else doesn't seem any better to me.

Well, it's more consistent with the code style, and consistency is better. It's 
imo also more explicit, since you had to use the word internal to describe 
what the underscores were standing for in your comment. And if size() and 
_size() mean two completely different things, I think having more than one 
character separating them is not a bad thing. But honestly the latter points 
are almost irrelevant, code base style consistency is (to me at least).

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916020#comment-13916020
 ] 

Benedict commented on CASSANDRA-6689:
-

bq. code base style consistency is (to me at least).

Well, readability of the functional methods is important to me, so having 
dozens of internalGetLongVolatile or internalSize calls around the place seems 
an unnecessary price for such an isolated contradiction to the code style, but 
that's just my take. It seems pretty clear that all of the _ methods are 
internal, to me.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Sylvain Lebresne (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916028#comment-13916028
 ] 

Sylvain Lebresne commented on CASSANDRA-6689:
-

I give up.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916063#comment-13916063
 ] 

Jonathan Ellis commented on CASSANDRA-6689:
---

bq. it's more consistent with the code style, and consistency is better. It's 
imo also more explicit, since you had to use the word internal to describe 
what the underscores were standing for in your comment

+1

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916071#comment-13916071
 ] 

Benedict commented on CASSANDRA-6689:
-

I should note these underscore concerns all apply to CASSANDRA-6694, so I will 
apply the change there, once we have any other concerns here cleared up. 
Merging these changes with those in CASSANDRA-6694 has been a pain already, and 
I don't want to make that worse.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables

2014-02-28 Thread Pavel Yaskevich (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13916545#comment-13916545
 ] 

Pavel Yaskevich commented on CASSANDRA-6689:


So [~benedict] what's the latest branch, is it offheap1b? On a side note, +1 
with [~slebresne], let's encapsulate and name everything properly. I worked on 
renaming here already and don't want history to repeat itself.

[jira] [Commented] (CASSANDRA-6689) Partially Off Heap Memtables