[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184132#comment-14184132 ]

Jonathan Ellis commented on CASSANDRA-7546:
--------------------------------------------

fixed CHANGES

> AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7546
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: graham sanderson
>            Assignee: graham sanderson
>             Fix For: 2.1.1
>
>         Attachments: 7546.20.txt, 7546.20_2.txt, 7546.20_3.txt, 7546.20_4.txt, 7546.20_5.txt,
>                      7546.20_6.txt, 7546.20_7.txt, 7546.20_7b.txt, 7546.20_alt.txt,
>                      7546.20_async.txt, 7546.21_v1.txt, cassandra-2.1-7546-v2.txt,
>                      cassandra-2.1-7546-v3.txt, cassandra-2.1-7546.txt, graph2_7546.png,
>                      graph3_7546.png, graph4_7546.png, graphs1.png, hint_spikes.png,
>                      suggestion1.txt, suggestion1_21.txt, young_gen_gc.png
>
> In order to preserve atomicity, this code attempts to read, clone/update, then CAS the state of the partition.
> Under heavy contention for updating a single partition this can cause some fairly staggering memory growth (the more cores on your machine, the worse it gets).
> Whilst many usage patterns don't do highly concurrent updates to the same partition, hinting today does, and in this case wild (order(s) of magnitude more than expected) memory allocation rates can be seen (especially when the updates being hinted are small updates to different partitions, which can happen very fast on their own) - see CASSANDRA-7545.
> It would be best to eliminate/reduce/limit the spinning memory allocation whilst not slowing down the very common un-contended case.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
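For readers new to the ticket, the pattern the description refers to is the classic optimistic copy-on-write loop. A minimal, hypothetical sketch follows - Snapshot and Update are illustrative stand-ins, not the actual AtomicSortedColumns types:

{code}
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only of the read/clone/CAS pattern described above.
final class SpinLoopSketch {
    static final class Update { final long size; Update(long size) { this.size = size; } }

    static final class Snapshot {
        static final Snapshot EMPTY = new Snapshot(0);
        final long size;
        Snapshot(long size) { this.size = size; }
        Snapshot cloneAndMerge(Update u) {
            // Stands in for cloning the sorted column container and merging
            // the update into the copy - a fresh allocation on every attempt.
            return new Snapshot(size + u.size);
        }
    }

    private final AtomicReference<Snapshot> ref = new AtomicReference<>(Snapshot.EMPTY);

    long addAllWithSizeDelta(Update update) {
        while (true) {
            Snapshot current = ref.get();
            Snapshot modified = current.cloneAndMerge(update); // allocated per attempt
            if (ref.compareAndSet(current, modified))
                return modified.size - current.size;
            // CAS lost: a concurrent writer won, the clone above becomes
            // garbage, and we spin again. Under heavy contention on a single
            // partition this is the staggering allocation rate described above.
        }
    }
}
{code}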
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175425#comment-14175425 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Thanks [~yukim]... note I just noticed that in CHANGES.txt this is recorded in the "merge from 2.0:" section.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168841#comment-14168841 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Actually, this is the first time I've looked at the Locks.java code in detail myself - it should probably not throw an AssertionError on failure (it should log instead), since it is optional - and maybe the methods should be renamed to indicate that they may be a no-op.
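A sketch of the failure handling being argued for here: acquire Unsafe reflectively and, if that fails, log once and degrade to a no-op rather than asserting. The class and method names are illustrative (note the "maybe" naming suggested above); the actual Locks.java in the tree may differ:

{code}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical shape for an optional lock helper (Java 7/8 era Unsafe).
final class OptionalLocks {
    private static final Unsafe unsafe;
    static {
        Unsafe u = null;
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            u = (Unsafe) f.get(null);
        } catch (Exception e) {
            // Optional optimisation: log and carry on rather than assert.
            System.err.println("Unsafe unavailable; lock hints are a no-op: " + e);
        }
        unsafe = u;
    }

    // "maybe*" signals to callers that these may silently do nothing.
    static void maybeMonitorEnter(Object o) { if (unsafe != null) unsafe.monitorEnter(o); }
    static void maybeMonitorExit(Object o)  { if (unsafe != null) unsafe.monitorExit(o); }
}
{code}

If Unsafe is unavailable, callers simply keep spinning as before - which matches the "without Unsafe you just get the old behavior" point made later in the thread.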
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168489#comment-14168489 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

For what it's worth, I happened to be poking around the JVM source today debugging something, and so stopped to take a look - monitorEnter does indeed just revoke any bias and inflate the lock... so it seems perfectly fine for our purposes (since we expect lock contention anyway).
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167633#comment-14167633 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Just to be clear from the graphs - that is 70GB of GC during the 913-thread-count run!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167613#comment-14167613 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Sorry [~yukim], I somehow missed your update - I'm about to attach the test results here... note they show much worse GC in native_obj than heap_buffers without the fix, I'm guessing because the spinning is much faster with native_obj.

As for monitorEnter/monitorExit, Benedict and I had a discussion about that above (I originally had it with either multiple copies of the code, or nested functions), but it complicated stuff, and I was unable to prove any issues with monitorEnter or monitorExit (or indeed reference any, other than some vague suspicions I had that maybe this excludes biased locking or anything else which assumes these are neatly paired in a stack frame). In any case we don't really care, because if we are using them we've already proved we're contended, and the monitor would be inflated anyway.

The other issue was the use of Unsafe, but Benedict seemed fine with that also, since without Unsafe (which most people have) you just get the old behavior.

So, I say go ahead and promote the fix as is (yes, current 2.1 trunk seemed to have Locks.java already added - I didn't diff them, but I peeked briefly and it looked about the same).

It is possible someone will find a usage scenario that this makes slower, in which case we can look at that, but I suspect, as mentioned before, that in all of the cases where we degrade performance the original performance is probably just on a lucky knife edge between under-utilization and a complete mess!

Finally, I'll summarize what Benedict said up above: whilst we could add a switch for this, this is really an internal implementation fix, the goal of which is that eventually there should be no bottleneck even when mutating the same partition (something he planned to address in version >= 3.0 with lazy updates and repair on read).
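For context, the overall shape of the fix being promoted - a bounded optimistic spin followed by a pessimistic monitor fallback, here using the OptionalLocks helper sketched earlier - might look roughly like the following. The attempt count and structure are assumptions for illustration, not the committed patch:

{code}
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical hybrid update: CAS a couple of times, then serialize on a
// monitor so contended losers stop burning allocations.
final class HybridUpdateSketch<S> {
    interface Merger<S> { S cloneAndMerge(S current); }

    private final AtomicReference<S> ref;
    HybridUpdateSketch(S initial) { ref = new AtomicReference<>(initial); }

    S update(Merger<S> merger) {
        // Fast path: bounded optimistic spin for the common un-contended case.
        for (int attempt = 0; attempt < 2; attempt++) {
            S current = ref.get();
            S modified = merger.cloneAndMerge(current);
            if (ref.compareAndSet(current, modified))
                return modified;
        }
        // Slow path: contention is proven, so the monitor is inflated anyway;
        // holding it means at most one wasted clone per waiter. If Unsafe is
        // unavailable these calls are no-ops and we simply keep spinning.
        OptionalLocks.maybeMonitorEnter(this);
        try {
            while (true) {
                S current = ref.get();
                S modified = merger.cloneAndMerge(current);
                if (ref.compareAndSet(current, modified))
                    return modified;
            }
        } finally {
            OptionalLocks.maybeMonitorExit(this);
        }
    }
}
{code}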
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164208#comment-14164208 ]

Yuki Morishita commented on CASSANDRA-7546:
--------------------------------------------

+1 to v2 (it lacks Locks.java, but I assume it is unchanged). My concern is the use of monitorEnter/monitorExit, as I'm not sure of their downsides, but I don't think I have a better alternative.

[~graham.sanderson] can I go ahead and commit to 2.1, or do you want me to wait until you do the native_objects test?
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14153780#comment-14153780 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Just a little update: I have numbers for one node down & hinting with heap_buffers; I just need to re-run a few tests, since there were a couple of spurious points (which might have been due to not using a totally clean cluster every time - this is not a cluster I can easily re-create) that I want to verify before I post them.

Generally this patch thus far seems to be good, and while there is a non-"sweet spot" where it can be mildly harmful, that is basically on the knife edge of where you are almost overcommitting your hardware, which is probably not where people are hoping to be running.

The other point to note is that while the excess GC allocation here does not cause huge issues on its own, in a busy cluster that had a huge number of resident slabs to start off with, it can cause major knock-on GC headaches (with slabs spilling into old gen along with other garbage, etc.).

The GC issue isn't as much of a problem with the native allocators in 2.1 (though they do seem to become a bottleneck under high allocation rates); the fact that it is still generally faster with this patch suggests we should keep it on for those too.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150409#comment-14150409 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Busy week - I did the native_objects graphs. The patch actually really helps out here too - it seems native allocation starts taking a hit with too much concurrency.

I was about to do the hinting graphs, but cassandra-stress seems to be pulling the server names from the server (so I can't start it with one node down) - or maybe I can, and I should just ignore the errors (I just tried giving it 4/5 nodes on the command line).

What would you like me to do for n= ... I do have the full raw output for all these runs.

!graphs2_7546.png!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141858#comment-14141858 ]

Benedict commented on CASSANDRA-7546:
--------------------------------------

In general the idea for the auto mode is to get a general overview of the various conditions, _especially_ when run in target uncertainty (err<) mode, which is the default. I've just committed a minor change, previously discussed, that supports running all thread counts in the range unconditionally; however, it will log a warning if you run this in target uncertainty mode, as the workloads will be different. Really we should be tearing down and rebuilding the cluster between runs.

However, it looks like the results are pretty much a wash for all modes except those where high contention on a single partition is to be expected. It's a bit strange that the .999%ile is higher with the patch for the highest thread counts but lower contention - that may be noise. Certainly the heap reduction looks promising.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141812#comment-14141812 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Make of this what you will (these are the 1-1024 partition runs with and without the patch, as mentioned above)... You can clearly see the higher memory usage without the patch. Beyond that there looks to be some noise from compaction. As expected, the patch helps under high contention... it doesn't seem to hurt at the low end (some of the low thread count results look like they might be cassandra-stress related), and I'm not sure yet whether the small differences at the middle thread counts are just noise.

!graphs1.png!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141766#comment-14141766 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

I'll try and make a graph of the data I have so far at some point over the weekend anyway.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141764#comment-14141764 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Thanks - I updated, and have run 1/16/256/1024 partitions against both my baseline 2.1.1 and patched (with 7546.21_v1.txt) 2.1.1, using heap_buffers and all 5 nodes up. Things look promising so far; I still need to run with a node down (I assume I take it out of the seeds list), and also with native_objects/native_buffers... this is something I can do in parallel with other work, but it will still take some time.

Random cassandra-stress question: generally the threadCount at which it stops seems to be the one after it has started overloading the system. Maybe this is what is wanted for the final results, but the latency of this final run is not representative of the previous one or two thread counts, which were doing about the same number of ops/second (hence why it stopped). Not sure what the thinking is on that; I'm sure it has come up before.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140158#comment-14140158 ]

Benedict commented on CASSANDRA-7546:
--------------------------------------

Force-pushed another update that both enforces the sample size _if it is likely that multiple visits will be needed_, and reduces local contention by changing the saved seed position from an int[] to a scalar, which can be incremented much more cheaply.
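The scalar-versus-int[] point might look like the following hypothetical sketch (not the actual stress internals): a plain scalar field, with a field updater where atomicity is needed, avoids the indirection and extra allocation of an int[1] cell.

{code}
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Illustrative only: the kind of change described above.
final class SeedPosition {
    // before (hypothetical): int[] position = new int[1]; ... position[0]++;
    private volatile int position;

    private static final AtomicIntegerFieldUpdater<SeedPosition> POS =
            AtomicIntegerFieldUpdater.newUpdater(SeedPosition.class, "position");

    // Increment directly on the field - no boxed cell to chase or allocate.
    int next() { return POS.incrementAndGet(this); }
}
{code}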
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140087#comment-14140087 ]

Benedict commented on CASSANDRA-7546:
--------------------------------------

I meant to mention, but forgot, in case you were worried about this: for simplicity and performance, we don't guarantee that we only generate as many partitions as the sample defines; we only guarantee that when sampling we follow that distribution (and so will ignore any overshoot that we generated). Essentially any thread sampling the working set that hits _past the end of the set_ (i.e. either into an area not yet populated, or one that has been finished and not replaced) will asynchronously generate a new seed, write to it, and _then_ update the sample. This is because updating the sample is itself costly, and for workloads where the work is likely to be completed in one shot we don't want to incur that cost. That said, it should be quite possible to decide upfront whether the workload meets these characteristics and, if it doesn't (like this one), update the sample in advance.

There's also sort-of an off-by-1 error explaining the 1025, though. We're not subtracting the minimum index from the generated sample index, so with a distribution of 1..1024 we're never sampling index 0, and our sample size will be 1025. I've pushed a fix for this.
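A hypothetical reconstruction of the off-by-one being described - using a 1..1024 sampled id directly as a 0-based working-set index never visits slot 0 and implies 1025 slots; subtracting the distribution minimum restores 0..1023 (names are illustrative, not stress code):

{code}
import java.util.Random;

// Illustrative demo of the sample-index off-by-one and its fix.
final class SampleIndexSketch {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        int min = 1, max = 1024;
        int sampled = min + rnd.nextInt(max - min + 1); // uniform 1..1024
        int buggy = sampled;        // 1..1024: index 0 unreachable, needs 1025 slots
        int fixed = sampled - min;  // 0..1023: exactly the 1024 partitions
        System.out.printf("sampled=%d buggy=%d fixed=%d%n", sampled, buggy, fixed);
    }
}
{code}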
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140011#comment-14140011 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Oh, I should mention the warmup ended up generating 20 partitions, and during the course of the whole test it got bumped to 21... maybe that'll give you an "aha" moment.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140007#comment-14140007 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Didn't want to deep dive, but out of curiosity I did do one run configured for a single partition:

{code}
Results:
op rate                   : 5760
partition rate            : 5760
row rate                  : 5760
latency mean              : 158.7
latency median            : 151.2
latency 95th percentile   : 221.5
latency 99th percentile   : 262.3
latency 99.9th percentile : 282.4
latency max               : 396.0
total gc count            : 3
total gc mb               : 18779
total gc time (s)         : 0
avg gc time(ms)           : 67
stdev gc time(ms)         : 26
Total operation time      : 00:00:35
Improvement over 609 threadCount: 4%

 id, total ops, adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
  4 threadCount, 6782,   -0,  120,  120,  120,  33.3,  43.8,  50.6,  63.0,  83.9,  85.7, 56.6, 0.01940, 0,   0,   0,  0,     0
  8 threadCount, 6629,   -0,  212,  212,  212,  37.7,  39.1,  57.0,  75.0, 127.2, 138.2, 31.3, 0.00868, 0,   0,   0,  0,     0
 16 threadCount, 27730,  -0,  566,  566,  566,  28.2,  26.2,  50.6,  75.7, 125.5, 170.4, 49.0, 0.01963, 0,   0,   0,  0,     0
 24 threadCount, 51763,  798,  796,  796,  796,  30.1,  29.5,  51.0,  76.9,  90.8, 144.4, 65.0, 0.01977, 2, 203, 203, 10, 12877
 36 threadCount, 74953,  -0, 1253, 1253, 1253,  28.7,  27.8,  50.7,  60.5,  79.6, 308.0, 59.8, 0.01938, 0,   0,   0,  0,     0
 54 threadCount, 56948,  -0, 1807, 1807, 1807,  29.8,  27.6,  52.6,  63.1,  78.1, 121.1, 31.5, 0.01170, 3, 176, 176, 12, 19816
 81 threadCount, 74856,  -0, 2369, 2369, 2369,  34.1,  33.2,  57.2,  67.6,  76.6, 108.6, 31.6, 0.00946, 0,   0,   0,  0,     0
121 threadCount, 100526, -0, 3158, 3158, 3158,  38.2,  37.8,  63.4,  78.9,  89.1, 446.6, 31.8, 0.01805, 2,  93,  93,  1, 13063
181 threadCount, 277875, -0, 4491, 4491, 4491,  40.2,  40.2,  63.1,  79.1,  94.0, 679.7, 61.9, 0.01985, 5, 286, 286, 28, 32541
271 threadCount, 169870, -0, 5205, 5205, 5205,  52.0,  49.2,  84.9, 110.5, 140.5, 843.9, 32.6, 0.01320, 3, 157, 157, 11, 19408
406 threadCount, 187985, 5648,    ,     ,     ,  73.0,  64.2, 122.1, 156.0, 285.3, 848.6, 33.8, 0.01421, 3, 173, 173, 12, 19570
609 threadCount, 201184, 5540, 5534, 5534, 5534, 110.1, 101.1, 160.5, 230.1, 378.9, 555.6, 36.4, 0.01917, 3, 163, 163, 17, 19709
913 threadCount, 205466, 5787, 5760, 5760, 5760, 158.7, 151.2, 221.5, 262.3, 282.4, 396.0, 35.7, 0.01335, 3, 200, 200, 26, 18779
{code}

Obviously I don't know if the slowdown is on the load end or the server end (though there is some GC increase here - we'll see what the patch for this issue does). Note that if this is still a synchronization problem with the load generator, we do know for a fact that hinting is a good way of turning a large partition domain into a small partition domain (so I'll obviously be testing that too, though that isn't apples to apples either).
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139986#comment-14139986 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

FYI, in case I didn't mention it: this is a 5-node cluster, and we're running LOCAL_QUORUM with replication factor 3.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139982#comment-14139982 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Ok, cool, thanks - I've upgraded my 2.1.0 to 2.1.1... {{7cfd3ed}} for what it's worth. I merged {{7964+7926}} into that and updated my load machine with it. I switched to 40x40x40x40 clustering keys as suggested and changed the 10M entries in the command line args to 256 accordingly (it now runs successfully). The output is below.

Note I ended up with 1275 partitions (and during the warmup I ended up with 1025, so there may be an off-by-one bug there too, either in stress or my config!)... I'm still not sure this is what we expect - each node has only seen about 3M mutations total (and I've run the stress test twice - once without the GC stuff working).

Anyway, let me know what you think - I won't be running more tests until tomorrow US time.

Another question - what do you usually do to get comparable results? Right now I have been blowing away the stresscql keyspace every time, to at least keep compaction out of the equation. Given the length of the cassandra-stress run, I'm not sure there is much to be gained by bouncing the cluster in between runs, but you probably know better, having used it before.

{code}
Results:
op rate                   : 10595
partition rate            : 10595
row rate                  : 10595
latency mean              : 85.8
latency median            : 49.9
latency 95th percentile   : 360.0
latency 99th percentile   : 417.9
latency 99.9th percentile : 491.9
latency max               : 552.2
total gc count            : 3
total gc mb               : 19471
total gc time (s)         : 0
avg gc time(ms)           : 67
stdev gc time(ms)         : 5
Total operation time      : 00:00:40
Improvement over 609 threadCount: -1%

 id, total ops, adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
  4 threadCount, 6939,   -0,   226,   226,   226, 17.6, 16.3,  40.3,  49.4,  51.1, 131.8, 30.6, 0.01464, 0,   0,   0,  0,     0
  8 threadCount, 11827,  385,   385,   385,   385, 20.7, 15.1,  47.5,  51.3,  82.1, 111.7, 30.7, 0.02511, 0,   0,   0,  0,     0
 16 threadCount, 19068,  -0,   612,   612,   612, 26.1, 28.8,  49.9,  60.6,  89.7, 172.1, 31.2, 0.01924, 0,   0,   0,  0,     0
 24 threadCount, 24441,  -0,   775,   775,   775, 30.9, 32.6,  52.1,  80.3,  88.3, 150.4, 31.5, 0.01508, 0,   0,   0,  0,     0
 36 threadCount, 36641,  -0,  1155,  1155,  1155, 31.1, 30.2,  59.0,  78.1,  89.7, 172.1, 31.7, 0.01127, 0,   0,   0,  0,     0
 54 threadCount, 55220,  -0,  1730,  1730,  1730, 31.1, 29.1,  54.5,  74.3,  84.3, 164.4, 31.9, 0.00883, 0,   0,   0,  0,     0
 81 threadCount, 83460,  -0,  2609,  2609,  2609, 31.0, 28.9,  51.2,  71.0,  79.2, 175.4, 32.0, 0.01678, 0,   0,   0,  0,     0
121 threadCount, 140705, -0,  4402,  4402,  4402, 27.4, 25.8,  49.7,  53.2,  70.3, 302.8, 32.0, 0.01438, 2, 462, 462, 11, 12889
181 threadCount, 226213, -0,  7116,  7116,  7116, 25.4, 24.2,  48.8,  51.8,  60.1, 279.0, 31.8, 0.01335, 1, 230, 230,  0,  6401
271 threadCount, 320658, -0, 10089, 10089, 10089, 26.8, 25.0,  48.3,  50.1,  57.4, 297.0, 31.8, 0.01256, 2, 425, 425, 14, 12786
406 threadCount, 342451, -0, 10609, 10609, 10609, 38.2, 40.3,  59.0,  77.5,  81.7, 142.4, 32.3, 0.00920, 0,   0,   0,  0,     0
609 threadCount, 381058, -0, 10651, 10651, 10651, 57.0, 48.6, 171.5, 224.4, 248.4, 342.0, 35.8, 0.01234, 1,  66,  66,  0,  6520
913 threadCount, 432518, -0, 10595, 10595, 10595, 85.8, 49.9, 360.0, 417.9, 491.9, 552.2, 40.8, 0.01471, 3, 200, 200,  5, 19471
END
{code}
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139341#comment-14139341 ]

Benedict commented on CASSANDRA-7546:
--------------------------------------

I've uploaded a patch [here|https://github.com/belliottsmith/cassandra/tree/7964-simultinserts], and another [here|https://github.com/belliottsmith/cassandra/tree/7964+7926] which combines it with another stress patch that reduces the risk of OOM (although this risk is pretty low, and almost certainly not what you were hitting - but as you scale the thread count up it becomes more of a risk).

The main 7964 patch includes a couple of small bug fixes as well, and I've tested it against your schema and some other related schemas that are trickier to process.

One thing I would suggest considering is expanding the clustering column count to increase the speed of generation, as 1200 items is still quite a few to create for only sending 1 item, which might end up reducing contention server side. Possibly reduce to only 30-40 items per tier.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137888#comment-14137888 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Yeah, this is only a cluster for my testing of this... I just don't want a massive breakage that stops it working completely! I'll install head.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137887#comment-14137887 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Hmm - I'll definitely have to try again - it didn't respond to SIGHUP or a non -F jstack, and isn't responding to Ctrl+C, so it's maybe close to OOM.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137885#comment-14137885 ]

Benedict commented on CASSANDRA-7546:
--------------------------------------

It's hard to say for certain, but glancing at CHANGES.txt, it looks like 2.1.1-HEAD is in the same ballpark of safe-to-run as 2.1.0. There are a lot of changes merged, but mostly for tools like cqlsh, and the things in the core application are pretty minor. I don't officially endorse it, though, since we only just shipped 2.1.0 and haven't had much time to QA 2.1.1.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137872#comment-14137872 ]

graham sanderson commented on CASSANDRA-7546:
----------------------------------------------

Cool, thanks, I'll wait on your patch (I have plenty of other things to do ;-) ). That said, am I relatively safe to upgrade the actual nodes to the current head of the 2.1 branch (and thus pick up your latest GC monitoring stuff) if I have a spare moment before then? Ideally I'd upgrade to the last commit in 2.1 that needs to be in place on the test nodes for the latest cassandra-stress to operate correctly.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137856#comment-14137856 ] Benedict commented on CASSANDRA-7546: - Hmm. This looks like a subtle "bug" in the latest stress when operating over such a small domain, but it also highlights a problem with using it for this workload - I may need to do some tweaking tomorrow to make it suitable. To keep our procedurally generated state for a partition intact, we only let one insert thread operate on a given partition at a time. If there is a conflict, we fall back to the underlying id distribution to avoid wasting time. This means that with a small domain we will steadily visit more and more partitions, but also that we will never have competing updates to the same partition, which is a glaring limitation (especially here). As it happens, with the latest version of the procedural generation it is reasonably easy to safely partition the work across multiple threads without mutual exclusivity, so I'll try to patch that ASAP.
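To make the conflict path concrete, here is a toy sketch (illustrative only - not stress's actual code; the class and method names are invented): the losing thread draws fresh seeds from the underlying id distribution, which is why a small domain steadily accumulates distinct partitions.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Toy model of the fallback described above: only one insert thread may
// operate on a partition at a time; a thread that loses the race draws fresh
// seeds from the underlying id distribution until it finds a free partition.
final class ExclusiveSeeds
{
    private final Set<Long> inFlight = ConcurrentHashMap.newKeySet();

    long acquire(long preferredSeed, LongSupplier idDistribution)
    {
        long seed = preferredSeed;
        while (!inFlight.add(seed))            // conflict: another thread owns this partition
            seed = idDistribution.getAsLong(); // fall back to the underlying id distribution
        return seed;
    }

    void release(long seed)
    {
        inFlight.remove(seed);
    }
}
{code}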
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1413#comment-1413 ] graham sanderson commented on CASSANDRA-7546: - OK, so I'm running the latest stress.jar on my load machine - given the number of changes to stress in 2.1.1 (and, by the looks of things, the addition of remote GC logging via cassandra-stress, which would be useful in this case), I guess I'll upgrade the cluster as well. Here is my current config (minus the comments) and the launch command... note there were some typos in our conversation above
{code}
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
table: testtable
table_definition: |
  CREATE TABLE testtable (
        p text,
        c1 int,
        c2 int,
        c3 int,
        v blob,
        PRIMARY KEY(p, c1, c2, c3)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'
columnspec:
  - name: p
    size: fixed(16)
  - name: c1
    cluster: fixed(100)
  - name: c2
    cluster: fixed(100)
  - name: c3
    cluster: fixed(1000) # note I made it slightly bigger since 10M is better than 1M for a max - 1M happens pretty quickly
  - name: v
    size: gaussian(50..250)
queries:
  simple1:
    cql: select * from testtable where k = ? and v = ? LIMIT 10
    fields: samerow
{code}
{code}
./cassandra-stress user profile=~/cqlstress-7546.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 -pop seq=1..10M -insert visits=fixed\(10M\) revisit=uniform\(1..1024\) | tee results/results-2.1.0-p1024-a.txt
{code}
As of right now, we're still (8 minutes later) at:
{code}
INFO 19:11:51 Using data-center name 'Austin' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
Connected to cluster: Austin Multi-Tenant Cassandra 1
INFO 19:11:51 New Cassandra host cassandra4.aus.vast.com/172.17.26.14:9042 added
Datatacenter: Austin; Host: cassandra4.aus.vast.com/172.17.26.14; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.15; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.13; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.12; Rack: 98.9
Datatacenter: Austin; Host: /172.17.26.11; Rack: 98.9
INFO 19:11:51 New Cassandra host /172.17.26.12:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.11:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.13:9042 added
INFO 19:11:51 New Cassandra host /172.17.26.15:9042 added
Created schema. Sleeping 5s for propagation.
Warming up insert with 25 iterations...
Failed to connect over JMX; not collecting these stats
Generating batches with [1..1] partitions and [1..1] rows (of [1000..1000] total rows in the partitions)
{code}
The number of distinct partitions is currently 2365 and growing. Is this what we expect? It doesn't seem like 250,000 should have exhausted any partitions?
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135637#comment-14135637 ] graham sanderson commented on CASSANDRA-7546: - OK, thanks Sylvain; yes, I was a bit confused (also because Benedict's changes included in the incorrect tag had a CHANGES.txt listing his new stress change among the 2.1.0 changes - which of course now makes sense); anyway... this is good news for me. I'll leave the test cluster on what I deployed (2.1.0-tentative == the real 2.1.0, as expected according to how the vote was looking at the time), and update stress.jar on my load machine to come from the head of 2.1.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135201#comment-14135201 ] Sylvain Lebresne commented on CASSANDRA-7546: -
bq. https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=log;h=f099e086f3f002789e24bd6c58e52b7553cd5381 is what was released according to the 2.1.0 tag in git vs what Sylvain Lebresne said in the email thread regarding no changes after c6a2c65a75adea9a62896269da98dd036c8e57f3 which was 2.1.0-tentative
I messed up when tagging it; it's the vote email that was correct, and I apologize for the confusion. I've updated the tag to reflect what was actually released.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135004#comment-14135004 ] Benedict commented on CASSANDRA-7546: - 1: that's great news :) 3: if you want lots of unique clustering key values per partition, stress currently has some limitations, and you will need/want multiple clustering columns for it to be able to generate them smoothly without taking donkey's years per insert (on the workload generation side). Its minimum unit of generation (not insertion) is a single tier of clustering values, so with your spec it would generate all 100B values each time you wanted to insert any number of them. So, you want to consider a yaml like this:
{noformat}
table_definition: |
  CREATE TABLE testtable (
        p text,
        c1 int,
        c2 int,
        c3 int,
        v blob,
        PRIMARY KEY(p, c1, c2, c3)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'

columnspec:
  - name: p
    size: fixed(16)
  - name: c1
    cluster: fixed(100)
  - name: c2
    cluster: fixed(100)
  - name: c3
    cluster: fixed(100)
  - name: v
    size: gaussian(50..250)
{noformat}
Then you want to consider passing -pop seq=1..1M -insert visits=fixed(1M) revisits=uniform(1..1024). The visits parameter here tells stress to split each partition into 1M distinct inserts, which, given the deterministic 1M rows per partition, means exactly 1 row inserted per visit. The revisits distribution defines the number of partition keys we operate over at once, exhausting one before selecting another to include in our working set. Notice I've removed the population spec from your partition key in the yaml. This is because it is not necessary to constrain it here, as you can constrain the _seed_ population with the -pop parameter, which is the better way to do it here (so you can use the same yaml across runs). However, in this case, given our revisits() distribution, we can also leave the seed population unconstrained, since once our first 1024 partitions have been generated no other PK will be visited until one of these has been fully exhausted (i.e. 1024 * 1M inserts - quite a few...). You may also constrain the seed to the same range, which, once a key is exhausted, would always result in filling that key's slot in the working set straight back in. It doesn't matter what distribution you choose in this case, since stress will keep generating values until one not present in the stash crops up, which, if they operate over the same domain, can only result in 1 candidate regardless of distribution, so I suggest a sequential distribution to ensure determinism.
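Putting the suggested profile and flags together, the invocation would look something like the following (a sketch only: the profile path is a placeholder, $NODES is assumed to hold the node list, and note that the released stress spells the flag revisit=, as in the command graham runs above):
{code}
./cassandra-stress user profile=~/cqlstress-7546.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 -pop seq=1..1M -insert visits=fixed\(1M\) revisit=uniform\(1..1024\)
{code}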
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134695#comment-14134695 ] graham sanderson commented on CASSANDRA-7546: -
# https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=log;h=f099e086f3f002789e24bd6c58e52b7553cd5381 is what was released according to the 2.1.0 tag in git, despite what [~slebresne] said in the email thread regarding no changes after c6a2c65a75adea9a62896269da98dd036c8e57f3, which was 2.1.0-tentative
# OK, I'll try offheap_objects instead (or as well)
# I'm still a bit confused about visit/revisit (which are in the 2.1.0 tagged release)... I want to evenly spread the load across all my partitions (generally using a new clustering key every time), though I want to put a practical limit on the size of the partitions, so I was hoping to let it wrap at 10M or so unique clustering key values... so it sounds like I want visits=fixed(1) and revisits=not quite sure
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134674#comment-14134674 ] Benedict commented on CASSANDRA-7546: - Hi Graham, I must admit I'm a bit confused, and it's partially self-inflicted. In 2.1.1 we have changed stress again from what we released in 2.1.0, and I can't tell which version you're referring to, though it seems to be 2.1.1. Neither version has a 'visits' property in the yaml, but 2.1.1 supports -insert visits= revisit=, which are certainly functions worth exploring, and I recommend you use 2.1.1 for stress functionality either way. As far as using these functions is concerned, 'visits' splits a wide row up into multiple inserts: if a visits value of 10 is produced, and there are on average 100 rows generated for the partition, approximately 10 rows will be inserted, then the state of the partition will be stashed away, and the next insert that operates on that partition will pick up where the previous one left off. Which partition is operated on next is decided by the 'revisit' distribution, which selects from the stash of partially completed inserts, with a value of 1 selecting the most recently stashed (the max value of this distribution defines the total number of partitions to stash); if it ever selects outside of the current stash, a new partition is generated instead. So the value for 'visits' is related to the number of unique clustering key values you generate for each partition, whereas the value for 'revisit' is determined by how diverse the data you operate over in a given time window is. Separately, it's worth mentioning that offheap_objects is likely a better choice than offheap_buffers, since it is considerably more memory-dense.
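As a toy model of the visit/revisit mechanics just described (illustrative only - this is not stress's internal code; the class name, the uniform draw, and the fixed rows-per-visit are all assumptions made for the sketch):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Toy model of 'visits'/'revisit': each partition is split into chunks of
// rowsPerVisit rows; a partially completed partition is stashed, and the
// revisit draw decides whether the next insert resumes a stashed partition
// (1 = most recently stashed) or, falling outside the stash, seeds a new one.
final class RevisitModel
{
    static final class PartitionState
    {
        final long seed;
        int rowsLeft;
        PartitionState(long seed, int rowsLeft) { this.seed = seed; this.rowsLeft = rowsLeft; }
    }

    private final List<PartitionState> stash = new ArrayList<>(); // index 0 = most recently stashed
    private final int maxRevisit;   // max of the revisit distribution = max partitions stashed
    private final int rowsPerVisit; // ~ rows per partition / visits
    private long nextSeed = 1;      // stand-in for the -pop seed stream

    RevisitModel(int maxRevisit, int rowsPerVisit)
    {
        this.maxRevisit = maxRevisit;
        this.rowsPerVisit = rowsPerVisit;
    }

    // Returns the seed of the partition the next insert operates on.
    long nextInsert(int rowsPerPartition)
    {
        int r = ThreadLocalRandom.current().nextInt(maxRevisit) + 1; // revisit ~ uniform(1..maxRevisit)
        PartitionState p = r <= stash.size()
                         ? stash.remove(r - 1)                               // resume a stashed partition
                         : new PartitionState(nextSeed++, rowsPerPartition); // outside the stash: new partition
        p.rowsLeft -= rowsPerVisit;
        if (p.rowsLeft > 0)
            stash.add(0, p); // partially complete: stash it, most recent first
        return p.seed;
    }
}
{code}
With maxRevisit=1024 and one row per visit, this reproduces the behaviour described above: at most 1024 partially completed partitions are in play, and a new one is only seeded when the draw falls outside the stash.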
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134635#comment-14134635 ] graham sanderson commented on CASSANDRA-7546: - Finally getting back to this; I've been doing other things (this is slightly lower priority as we have it in production already)... I just realized that the version c6a2c65a75ade being voted on for 2.1.0, which I deployed, is not the same as the 2.1.0 released. I am now upgrading, since cassandra-stress changes snuck in. Note that I plan to stress using 1024, 256, 16, and 1 partitions, with all 5 nodes up, and then with 4 nodes up and one down to test the effect of hinting (note the replication factor of 3 and cl=LOCAL_QUORUM). I want to do one cell insert per batch... I'm upgrading in part because of the new visit/revisit stuff - I'm not 100% sure how to use them correctly; I'll keep playing, but you may answer before I have finished upgrading and tried with this. My first attempt on the original 2.1.0 revision ended up with only one clustering key value per partition, which is not what I wanted (because it'll make trees small). Sample YAML for 1024 partitions:
{code}
#
# This is an example YAML profile for cassandra-stress
#
# insert data
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1)
#
# read, using query simple1:
# cassandra-stress profile=/home/jake/stress1.yaml ops(simple1=1)
#
# mixed workload (90/10)
# cassandra-stress user profile=/home/jake/stress1.yaml ops(insert=1,simple1=9)

#
# Keyspace info
#
keyspace: stresscql

#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

#
# Table info
#
table: testtable

#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE testtable (
        p text,
        c text,
        v blob,
        PRIMARY KEY(p, c)
  ) WITH COMPACT STORAGE
    AND compaction = { 'class':'LeveledCompactionStrategy' }
    AND comment='TestTable'

#
# Optional meta information on the generated columns in the above table
# The min and max only apply to text and blob types
# The distribution field represents the total unique population
# distribution of that column across rows.  Supported types are
#
#      EXP(min..max)                  An exponential distribution over the range [min..max]
#      EXTREME(min..max,shape)        An extreme value (Weibull) distribution over the range [min..max]
#      GAUSSIAN(min..max,stdvrng)     A gaussian/normal distribution, where mean=(min+max)/2, and stdev is (mean-min)/stdvrng
#      GAUSSIAN(min..max,mean,stdev)  A gaussian/normal distribution, with explicitly defined mean and stdev
#      UNIFORM(min..max)              A uniform distribution over the range [min, max]
#      FIXED(val)                     A fixed distribution, always returning the same value
#      Aliases: extr, gauss, normal, norm, weibull
#
#      If preceded by ~, the distribution is inverted
#
# Defaults for all columns are size: uniform(4..8), population: uniform(1..100B), cluster: fixed(1)
#
columnspec:
  - name: p
    size: fixed(16)
    population: uniform(1..1024) # the range of unique values to select for the field (default is 100Billion)
  - name: c
    size: fixed(26)
    #cluster: uniform(1..100B)
  - name: v
    size: gaussian(50..250)

insert:
  partitions: fixed(1)   # number of unique partitions to update in a single operation
                         # if batchcount > 1, multiple batches will be used but all partitions will
                         # occur in all batches (unless they finish early); only the row counts will vary
  batchtype: LOGGED      # type of batch to use
  visits: fixed(10M)     # not sure about this

queries:
  simple1: select * from testtable where k = ? and v = ? LIMIT 10
{code}
Command-line
{code}
./cassandra-stress user profile=~/cqlstress-1024.yaml ops\(insert=1\) cl=LOCAL_QUORUM -node $NODES -mode native prepared cql3 | tee results/results-2.1.0-p1024-a.txt
{code}
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123920#comment-14123920 ] graham sanderson commented on CASSANDRA-7546: - Yes, I'm actually waiting on one of our main Cassandra Ops guys to come back from vacation on Monday to upgrade one of our clusters to 2.1 before I can run the stress tests, but we do have the patch running in production on 2.0.x. It detects hints and, it would also seem (which makes sense), fast hint playback of things with low-cardinality keys. I will certainly change the log level to INFO or DEBUG, though, as this shouldn't really be a WARNING.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14123917#comment-14123917 ] Benedict commented on CASSANDRA-7546: - Overall the patch LVGTM, though I'm not giving it an official +1 until I'm closer to 100%. I look forward to seeing the results.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117896#comment-14117896 ] graham sanderson commented on CASSANDRA-7546: - Hi Benedict, I hope you are OK and get well soon... it will likely be a week or two before we can prove in production that this is fixing the problem. I have also been on vacation and then sick, so I have a lot of other catching up to do. Once I have some time, I will play with the new stress-testing stuff in 2.1 along with this and try to get some firm evidence there. All I ask is that it doesn't get pushed to 3.0.x ;-)
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117882#comment-14117882 ] Benedict commented on CASSANDRA-7546: - Hi Graham, just an FYI: having had an accident, I won't be in a position to perform a formal review of something this critical for a little while. I just wanted to let you know I'm not ignoring progress, though, and will get to it soon enough.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117845#comment-14117845 ] graham sanderson commented on CASSANDRA-7546: - Ok, NP, we can do our own custom builds with it in 2.0.x... I'll make and attach a 2.1.x patch for this sensible (sensitive?) part of the code soon.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117276#comment-14117276 ] Sylvain Lebresne commented on CASSANDRA-7546: -
bq. Assuming all is well, then I would like to request this be targeted for 2.0.11 too
I'm afraid this is a bit too complex in a bit too sensible part of the code to be eligible for 2.0 at this point.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116226#comment-14116226 ] graham sanderson commented on CASSANDRA-7546: - In beta, the patch worked well at detecting hint activity. Next week we will put it on half the production nodes, to verify that those nodes don't go into memory-allocation craziness in response to hinting under heavy load. Assuming all is well, I would then like to request that this be targeted for 2.0.11 too.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092462#comment-14092462 ] graham sanderson commented on CASSANDRA-7546: - Had a lot going on... I have this running in beta right now (without the double counting), but haven't had a chance to deliberately test it with a node down. That said, it does detect OpsCenter.pdps in beta (we generally have OpsCenter turned off in production for high-volume stuff, and this would seem to validate our decision). Anyway, I myself am now on vacation for the next 10 days... I'd be super interested if we could see some results from 2.1.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089617#comment-14089617 ] graham sanderson commented on CASSANDRA-7546: - Doh - I should have asked about the double counting - I didn't see it; now I do.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088738#comment-14088738 ] graham sanderson commented on CASSANDRA-7546: - I ran my smoke test on this and it behaves as expected; I have added the patch (with a warn log statement at memtable flush if we have resorted to pessimistic concurrency for some rows) to our 2.0.9 beta env... I will try to repro there with a node down (though this cluster is pretty much limited by commit volumes under high load, so can't equal production concurrency); that said, I just want to check that everything is OK before I patch a single node in production (also 2.0.9). On a separate note (I don't have access to a 2.1 cluster ATM), it would be interesting to try something similar to http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster with a node down & hinting as a test case for this on 2.1.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087885#comment-14087885 ] Benedict commented on CASSANDRA-7546: - Sounds good, thanks!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087850#comment-14087850 ] Benedict commented on CASSANDRA-7546: -
bq. We probably mean "to the left" of... "before" or "after" are a bit confusing here!
Yep, good catch!
bq. Volatile read of the wasteTracker in the "fast" path.
At the moment we mostly optimise for x86, and it's essentially free here, as you say. Even on platforms where it isn't, it's unlikely to be a significant part of the overall costs, so better to keep it cleaner.
bq. Adjacent-in-memory CASed vars in the AtomicSortedColumns - again, not majorly worried here... I don't think the (CASed) variables themselves are highly contended; it is more that we are doing lots of slow concurrent work and then failing the CAS.
Absolutely not worried about this. Like you say, most of the cost is elsewhere. It would be much worse to pollute the cache with padding to avoid it.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087841#comment-14087841 ] graham sanderson commented on CASSANDRA-7546: - Cool, will do; addColumns also CASes a thread-locally modified Holder anyway. Yes, I agree it is ugly to have a non-final field in something like Holder (which is CASed immutable state), but I think we can live with it since it is not mutated after the CAS. As said, we can revert to monitor enter/exit if you wish... I can't prove it is worse, and there isn't a whole lot that needs optimization here. Note you have a comment:
{quote}
in wasteTracker we maintain within EXCESS_WASTE_OFFSET either side of the current time
{quote}
We probably mean "to the left" of... "before" or "after" are a bit confusing here! I thought about a couple of things while you were on vacation:
# Volatile read of the wasteTracker in the "fast" path. We could avoid this through some ugliness, by hijacking the top bit in the tree size to mark pessimistic locking too. Not too concerned about this - I believe it is free on Intel anyway.
# Adjacent-in-memory CASed vars in the AtomicSortedColumns - again, not majorly worried here... I don't think the (CASed) variables themselves are highly contended; it is more that we are doing lots of slow concurrent work and then failing the CAS.
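For readers following the thread, the basic mechanism being agreed on is roughly the following (a minimal sketch, not the attached patch: Holder is a stand-in for the real cloned state, a fixed byte budget replaces the time-windowed wasteTracker/EXCESS_WASTE_OFFSET accounting discussed above, and a ReentrantLock stands in for the monitor enter/exit being debated):
{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: state is an immutable Holder swapped in by CAS. Every failed CAS
// throws away the freshly allocated copy, so we account for that waste and,
// once a budget is exceeded, serialize further updates behind a lock so the
// spin loop stops burning allocation under heavy contention.
final class ContendedPartition
{
    static final class Holder
    {
        final long size; // stand-in for the cloned column/btree state
        Holder(long size) { this.size = size; }
    }

    private static final long WASTE_BUDGET_BYTES = 10L * 1024 * 1024; // illustrative threshold

    private final AtomicReference<Holder> ref = new AtomicReference<>(new Holder(0));
    private final AtomicLong wastedBytes = new AtomicLong(); // simplified: the patch tracks waste
                                                             // relative to a sliding time window
    private final ReentrantLock pessimistic = new ReentrantLock();

    long add(long delta, long estimatedCopyBytes)
    {
        boolean locked = wastedBytes.get() >= WASTE_BUDGET_BYTES; // partition already flagged contended
        if (locked)
            pessimistic.lock();
        try
        {
            while (true)
            {
                Holder cur = ref.get();
                Holder next = new Holder(cur.size + delta); // the clone/update: this is the allocation
                if (ref.compareAndSet(cur, next))
                    return next.size;
                // CAS lost: the copy above is garbage. Record the waste; once over
                // budget, take the lock so contended writers queue instead of spinning.
                if (wastedBytes.addAndGet(estimatedCopyBytes) >= WASTE_BUDGET_BYTES && !locked)
                {
                    pessimistic.lock();
                    locked = true;
                }
            }
        }
        finally
        {
            if (locked)
                pessimistic.unlock();
        }
    }
}
{code}
The un-contended fast path is unchanged - one volatile read of the waste counter plus the CAS - which is the property the thread is trying to preserve.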
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087814#comment-14087814 ] Benedict commented on CASSANDRA-7546: - Well, technically we never call addColumn() directly, but in 2.0 we haven't removed / UnsupportedOperationException'd that path, so I'm not totally comfortable leaving it as a regular int, as an external call to addColumn would break it (but then, this probably isn't the end of the world). However, I actually introduced a double-counting bug in changing that :/ ... and since we don't want to incur the incAndGet on every change, and we don't want to dup code, let's settle for the possible race in maintaining size if somebody uses the API in a way it isn't used in the codebase right now. However, I think I would prefer to make size final in this case.
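For readers skimming: the race being tolerated is the classic lost update on a plain field (a toy illustration only, not the patch; the class is invented):
{code}
// Two threads calling the legacy addColumn() path concurrently can both read
// the same old value of size, and one increment is lost - tolerable here only
// because the codebase never actually calls addColumn() directly.
final class SizeRace
{
    private int size; // plain field: 'size += delta' is a non-atomic read/modify/write

    void addColumn(int delta)
    {
        size += delta; // lost update possible under concurrent callers
    }

    int size() { return size; }
}
{code}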
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087799#comment-14087799 ] graham sanderson commented on CASSANDRA-7546: - +1 on another set of eyes (yes, the isSynchronized is ugly) - that said, I can move ahead on testing the main functionality of this patch (the waste detection), since I think we are all agreed on the basic mechanism. I am reading your patch (thanks for cleaning up - mine was a bit verbose, for discussion purposes); I will read it in more detail now, but just from an initial glance at its raw form: why did you make the size in Holder volatile/atomically updated? The Holder instances should only be mutated by a single thread.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077057#comment-14077057 ] graham sanderson commented on CASSANDRA-7546: - Ok, thank you... yeah, my only reason for recording something in the actual codebase was to indicate to the user that they had ultra-heavy partition contention that might be detrimental to performance, and that they should perhaps review their schema. Given that this may not be the case at all in 3.0 (i.e. it may be gracefully handled in all cases), I'll try it out locally with a WARN statement instead. I'll probably do it at memtable flush anyway, which has more useful context (e.g. the CF in question) and would be less spammy (i.e. one WARN with the number of contended partitions, though perhaps the contended key(s) are interesting at a lower log level)... whether we include such logging in the final patch I don't know.
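A rough sketch of what that flush-time WARN might look like - the logger, counter, and hook names are hypothetical placeholders, not from any attached patch:
{code}
import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical flush-time logging sketch; all names are illustrative.
final class ContentionLoggingSketch
{
    private static final Logger logger = LoggerFactory.getLogger(ContentionLoggingSketch.class);

    // bumped once per partition that crosses the contention threshold
    final AtomicInteger contendedPartitions = new AtomicInteger();

    // called once when the memtable is flushed, so we emit a single WARN
    // per flush rather than one per contended update
    void onFlush(String columnFamilyName)
    {
        int contended = contendedPartitions.get();
        if (contended > 0)
            logger.warn("{} contended partition(s) in {}; heavy concurrent updates to " +
                        "single partitions may hurt performance - consider reviewing the schema",
                        contended, columnFamilyName);
    }
}
{code}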
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077018#comment-14077018 ] Benedict commented on CASSANDRA-7546: - My biggest concern with metrics is that what we expose as a metric will probably change when we change tack to a lock-free lazy-update design, since it will be more expensive to maintain. Certainly tracking the amount of 'wasted' work will be meaningless then, although possibly we could track the raw occurrences of failure to make a change atomically without interference (which in the lazy case would be failure to acquire exclusivity to merge your changes in). I'm currently on holiday, but will try to review your patch shortly.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074127#comment-14074127 ] graham sanderson commented on CASSANDRA-7546: - Actually, looking at my numbers here on the production-level h/w, I certainly don't think the numbers are too aggressive (i.e. if anything they kick in too late), but as I say it'd be nice to actually watch this in the real world.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074124#comment-14074124 ] graham sanderson commented on CASSANDRA-7546: - Once again numbers - note I'm still using the same test driver as before (hence the 0 up/down, count numbers etc), though I have updated it to pass a column cloner in the transform.
{code}
[junit] --
[junit] 1 THREAD; ELEMENT SIZE 64
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 1020ms maxConcurrency = 1
[junit] GC for PS Scavenge: 37 ms for 3 collections
[junit] Approx allocation = 589MB vs 8MB; ratio to raw data size = 73.61468285714285
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 963ms maxConcurrency = 1
[junit] GC for PS Scavenge: 22 ms for 2 collections
[junit] Approx allocation = 584MB vs 8MB; ratio to raw data size = 72.99738571428571
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 826ms maxConcurrency = 1
[junit] GC for PS Scavenge: 24 ms for 2 collections
[junit] Approx allocation = 496MB vs 8MB; ratio to raw data size = 61.99165047619048
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 746ms maxConcurrency = 1
[junit] GC for PS Scavenge: 25 ms for 2 collections
[junit] Approx allocation = 477MB vs 8MB; ratio to raw data size = 59.63136380952381
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 256
[junit] original code:
[junit] Duration = 617ms maxConcurrency = 1
[junit] GC for PS Scavenge: 11 ms for 1 collections
[junit] Approx allocation = 362MB vs 8MB; ratio to raw data size = 45.24315523809524
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 602ms maxConcurrency = 1
[junit] GC for PS Scavenge: 11 ms for 1 collections
[junit] Approx allocation = 366MB vs 8MB; ratio to raw data size = 45.77833523809524
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1024
[junit] original code:
[junit] Duration = 443ms maxConcurrency = 1
[junit] GC for PS Scavenge: 11 ms for 1 collections
[junit] Approx allocation = 308MB vs 8MB; ratio to raw data size = 38.4688464
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 422ms maxConcurrency = 1
[junit] GC for PS Scavenge: 10 ms for 1 collections
[junit] Approx allocation = 309MB vs 8MB; ratio to raw data size = 38.667831428571425
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] --
[junit] 100 THREADS; ELEMENT SIZE 64
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 2039ms maxConcurrency = 100
[junit] GC for PS Scavenge: 118 ms for 34 collections
[junit] Approx allocation = 11178MB vs 8MB; ratio to raw data size = 1395.417500952381
[junit] loopRatio (closest to 1 best) 18.20478 raw 10/1820478 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 1299ms maxConcurrency = 100
[junit] GC for PS Scavenge: 14 ms for 1 collections
[junit] Approx allocation = 614MB vs 8MB; ratio to raw data size = 76.68355047619048
[junit] loopRatio (closest to 1 best) 1.05291 raw 779/6045 counted 0/0 sync 99246/99246 up 0 down 0
[junit]
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 224ms maxConcurrency = 100
[junit] GC for PS Scavenge: 22 ms for 2 collections
[junit] Approx allocation = 832MB vs 8MB; ratio to raw data size = 103.971206
[junit] loopRatio (closest to 1 best) 1.89634 raw 10/189634 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 226ms maxConcurrency = 99
[junit] GC for PS Scavenge: 22 ms for 2 collections
[junit] Approx allocation = 810MB vs 8MB; ratio to raw data size = 101.20042857142857
[junit] loopRatio (closest to 1 best) 1.92036 r
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072718#comment-14072718 ] graham sanderson commented on CASSANDRA-7546: - Cool - makes sense now. It'll have to be tomorrow, but I'll put up a new version.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072535#comment-14072535 ] Benedict commented on CASSANDRA-7546: - Yes, I'm referring to the memtables - an AtomicSortedColumns instance lives until its containing memtable is flushed. 100MB/s is around 1M snaptree node allocations, so that is maybe a little high for deciding there's too much competition (although with ~1000 items present this is only 100k inserts), so how about we fix it to 10MB/s, to be exceeded by 10MB. We could certainly hit 100MB of waste, no trouble (under high competition we'll see orders of magnitude more wasted than used, and memtables usually store 1GB+), but I think it's better to trigger a little more readily.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072530#comment-14072530 ] graham sanderson commented on CASSANDRA-7546: - Good point - I was mixing the two types of memory allocation in my head... that said, when we see this in production I don't know how long each AtomicSortedColumns instance lives.
bq. they stick around until they fill up
I assume you are referring to the memtables there... what defines "full", besides:
- there is a hard(ish) memory limit in yaml
- MeteredFlusher flushes high-traffic stuff
Basically, I'm just checking that we don't think our 100MB/s wastage trigger may never fire due to aggressive flushing... theoretically we must be wasting MUCH more than we are really writing, but I don't have numbers (I could look at the logs to get them) to see how often hints memtables were being flushed during this process and how big they were.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072494#comment-14072494 ] Benedict commented on CASSANDRA-7546: - Under load they don't last very long (i.e. they stick around until they fill up, which can be just a few minutes, or even faster under really high load); however, we don't care about how much we're allocating _to the memtable_ - we care about how much memory we allocate wastefully that does _not_ make it into the memtable, i.e. all that GC overhead you were seeing - in the worst case you saw 12GB in only 2.5s against one partition. So whatever numbers we fix for this scheme, we will avoid anything like that kind of extreme scenario.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072483#comment-14072483 ] graham sanderson commented on CASSANDRA-7546: - Ignoring the monotonic bit ;-) as you say, it has to be relative to something anyway.
bq. AtomicBTreeColumns is unlikely to live past 3.1
Sorry, I meant: how long is an instance of one of those classes likely to last? i.e. is it possible to see 100MB of allocation into one single instance, or would another instance have taken over by then? I assume since you are suggesting it that it is possible, but thought I'd double-check that that is what you mean.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072469#comment-14072469 ] Benedict commented on CASSANDRA-7546: - bq. it is the low 64 bits of a monotonic number
That's pretty pedantic, since with nanos that stretches to 600 years before overflow! Either way, I'm not sure if I clarified this or not, but we should be offsetting this number from the memtable creation time so we can safely stick within 32 bits. I suggest we use the top bit being set as the indicator that we've hit contention, so we naturally avoid problematic overflow (although really overflow would just result in our optimisation not running properly, so it would also be fine).
bq. how long you expect AtomicSorted/BTreeColumns to last
AtomicBTreeColumns is unlikely to live past 3.1. I would like to get rid of it in 3.0, but that is probably ambitious. So another year or so at the bleeding edge; a few more years at various stages downstream, no doubt. AtomicSortedColumns will be around as long as 2.0.x is, which is decided by the community really. Either way, tuning this value is probably not super helpful, since the goal is simply to avoid lots of wasted memory allocations. We can simply define a sensible, slightly cautious criterion for this, and that should be sufficient, since if we are slightly overly cautious the end result is only a small number of partitions seeing slightly reduced throughput for writes. It is not a huge deal either way. It's only really likely to have a measurable impact at all on very highly contended partitions, on which any sane value will likely yield a very similar improvement.
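A minimal sketch of the "offset from memtable creation, top bit as contention flag" idea above - the class and unit choices are assumptions for illustration, not the actual patch:
{code}
import java.util.concurrent.TimeUnit;

// Sketch only: one int of per-partition state, rebased on memtable creation.
final class ContentionClock
{
    private final long creationNanos = System.nanoTime();

    // Milliseconds since the memtable was created; fits comfortably in an
    // int for any realistic memtable lifetime (2^31 ms is ~24 days).
    int now()
    {
        return (int) TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - creationNanos);
    }

    // Top bit set = "this partition is contended". Because elapsed times are
    // non-negative, a negative state can never be confused with a timestamp,
    // and overflow merely disables the optimisation rather than breaking it.
    static final int CONTENDED = Integer.MIN_VALUE;

    static boolean isContended(int state)
    {
        return state < 0;
    }
}
{code}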
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072465#comment-14072465 ] graham sanderson commented on CASSANDRA-7546: - OK - I had something else come up today, but yeah, I realized my math was wrong... it is certainly a bit of a massage to fit the correct fidelity of information within 32 bits, without overflowing too soon, and with enough padding that bursty allocation under the sustained limit doesn't cause problems.
bq. It is monotonic; that's its main purpose.
I guess we (me?) are being pedantic here... it is the low 64 bits of a monotonic number (even this was broken on early OS/JVM combinations due to bugs, but we can take it as fact now, I think); what the actual number is is undefined. It does seem on UNIX variants to be rebased to nanoseconds since epoch, and on all modern systems it is probably some counter that was reset at least on power cycle, so you are probably ok. In any case, doing the right thing is pretty much always trivial (assuming you don't expect your JVM to run for 200+ years). -- As an aside, can you give me a hint as to how long you expect AtomicSorted/BTreeColumns to last... tuning does seem critical here, since wasting 100MB would probably be a reasonable value, but I don't know in practice whether something else would likely end up flushing the memtable before it ever got that far.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072225#comment-14072225 ] Benedict commented on CASSANDRA-7546: - bq. however I assume that the end result is that you don't want either the Atomic***Columns or the Holder object to grow at all (i.e. another 8 bytes), and I'm assuming you're calculating space based on compressedoops object layout
Right, yes. There's room for one 'free' 32-bit value in AtomicBTreeColumns, is what I meant.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072180#comment-14072180 ] Benedict commented on CASSANDRA-7546: - Well, actually the scheme I outlined isn't _exactly_ requiring a rate of 100MB/s; all that actually needs to happen is for it to consistently exceed a rate of 1MB/s by a total allocation of 100MB (which can happen if > 100MB are allocated in < 1 second, i.e. 100MB/s, but also if 110MB is allocated over 10s). We can tweak those numbers however we like (within some window of representable numbers with enough range). For instance: exceed a rate of 10MB/s consistently by a total of 10MB, which would require e.g. dividing our bytes allocated by 1k, measuring time in 100µs intervals, and offsetting the present by 10 * 1024. To capture a rate of 100MB/s, we would need to either expect that memtables never live for more than 0.5 days (probably reasonable, i.e. represent time in 10µs intervals) or require that a single mutator allocates 10k in one run (also quite reasonable), but we're pushing the limits of what we can safely represent.
bq. nanoTime is not monotonic
It is monotonic; that's its main purpose. Although there are no doubt caveats on a given machine/processor for how strictly that is guaranteed.
bq. which clones are you talking about
Mistype. I mean the number/size of objects we estimate we've allocated wastefully. We can estimate this in 2.0 with 200+100*lg2(N), and in 2.1 we measure it exactly.
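A worked example of the unit arithmetic above (values taken from the discussion; names are illustrative). The shared counter stores "ticks"; each wasted KB advances it one tick, so the permitted sustained rate is 1KB per tick length, and the initial offset is the total excess allowed before flipping:
{code}
// Sketch: the two parameterizations discussed above, not a committed patch.
public final class WasteBudgetExample
{
    // Variant 1: 1ms ticks, waste in KB -> rate = 1KB/ms = ~1MB/s sustained,
    // offset = 100 * 1024 ticks = ~100MB of excess allowed.
    static final long OFFSET_V1 = 100 * 1024;

    // Variant 2: 100us ticks, waste in KB -> rate = 1KB/100us = ~10MB/s,
    // offset = 10 * 1024 ticks = ~10MB of excess allowed.
    static final long OFFSET_V2 = 10 * 1024;

    public static void main(String[] args)
    {
        // The "110MB over 10s" example: 10s = 10,000 ticks of allowance at
        // ~1MB/s (~10MB), plus the ~100MB offset, before the counter catches
        // up with the present and we flip to locking.
        long elapsedTicksV1 = 10_000; // 10 seconds in 1ms ticks
        long budgetKb = elapsedTicksV1 + OFFSET_V1;
        System.out.println("variant 1 trips after ~" + budgetKb / 1024 + "MB over 10s");
    }
}
{code}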
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072084#comment-14072084 ] graham sanderson commented on CASSANDRA-7546: - duh - I'm an idiot: your scheme catches the 100MB/s allocation-waste rate without actually having to get there!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072081#comment-14072081 ] graham sanderson commented on CASSANDRA-7546: - bq. It doesn't look to me like we re-copy the ranges (only the arrays we store them in)
Oops, yeah, you are correct.
{quote} I would rather we didn't increase the amount of memory we use. In 2.1 I'm stricter about this, because in 2.0 we can mitigate it by replacing AtomicReference with a volatile and an AtomicReferenceFieldUpdater. But whatever we do in 2.1 has to be free memory-wise. This means we have 1 integer or 1 reference to play with in the outer class (not the holder), as we can get this for free. We don't need to maintain a size in 2.1 though, so this is easy. We can track the actual amount of memory allocated (since we already do this). {quote}
I'm all for not wasting memory; after all, this is what this patch is about. I'm not sure exactly what "2.1 has to be _free_ memory-wise" means... however, I assume the end result is that you don't want either the Atomic***Columns or the Holder object to grow at all (i.e. by another 8 bytes), and I'm assuming you're calculating space based on compressed-oops object layout (so we may have a chance to fill in a spare 32-bit value somewhere; I'll have to check the two classes in the 2.0 and 2.1 cases). Note the reason I'm confused about "free" is that the Object[]s for the btree are on-heap things and we allocate quite a lot of them. Perhaps by free you mean no increase in memory usage vs today for this change.
bq. get the current time in ms (but from nanoTime since we need monotonicity);
Also slightly confused: nanoTime is not monotonic, but nanoTime minus some static base nanoTime is for all practical purposes, so I assume you mean this. Based on that, I guess we can use Integer.MIN_VALUE as a "no one has wasted work yet" flag.
bq. In 2.0 we multiply the number of updates we had made by lg2(N) (N = current tree size), and multiply this by 100 (approximate size of snaptree nodes) + ~200 per clone
Do you mean individual column attempts? Which clones are you talking about? I have currently moved them outside the loop, which allowed for pre-sharing and for shrinking the locked work later, but this extra int[] is not free (unless we are only talking about retained space vs temporary). I guess we should probably always round up to 1K... that would still be 100,000 CAS fails a second, which is certainly bad. Anyway, I'll double-check the allocation costs in 2.0.x, use an atomic field updater, and make a 2.0.x patch (and see how it behaves). Now "max rate" sounds more like something that should be exposable via config (though since it is an implementation detail that will go away eventually, it doesn't make sense to make it a per-CF thing)... I'll run my test again to see what a good value seems to be. But yeah, if something ever wastes 100MB/s, I think we can mark it as "special". Note, the one other question I have is how big a single Atomic***Columns instance can get - i.e. is it even possible to allocate 100MB into one, or do they turn over too fast?
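For reference, a minimal sketch of the memory-saving swap discussed above - a volatile field plus a static AtomicReferenceFieldUpdater gives the same CAS semantics as an AtomicReference without the extra wrapper object per partition. Class and field names are illustrative, not the actual code:
{code}
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Sketch only; stand-in types, not AtomicSortedColumns itself.
final class AtomicColumnsSketch
{
    static final class Holder {} // stand-in for the real Holder

    // must be volatile for the field updater to work
    private volatile Holder ref;

    private static final AtomicReferenceFieldUpdater<AtomicColumnsSketch, Holder> REF_UPDATER =
        AtomicReferenceFieldUpdater.newUpdater(AtomicColumnsSketch.class, Holder.class, "ref");

    boolean publish(Holder expected, Holder next)
    {
        // Identical semantics to AtomicReference.compareAndSet, but the
        // Holder reference lives inline in this object: one less allocation
        // and one less pointer hop per partition.
        return REF_UPDATER.compareAndSet(this, expected, next);
    }
}
{code}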
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071538#comment-14071538 ] Benedict commented on CASSANDRA-7546: - It doesn't look to me like we re-copy the ranges (only the arrays we store them in).
On your patch, I have a couple of minor concerns:
* I would rather we didn't increase the amount of memory we use. In 2.1 I'm stricter about this, because in 2.0 we can mitigate it by replacing AtomicReference with a volatile and an AtomicReferenceFieldUpdater. But whatever we do in 2.1 has to be free memory-wise. This means we have 1 integer or 1 reference to play with in the outer class (not the holder), as we can get this for free. We don't need to maintain a size in 2.1 though, so this is easy. We can track the actual amount of memory allocated (since we already do this).
* I would rather make the condition for upgrading to locks be based on some rate of wasted work (or, since it works just as well, some rate of wasted memory allocations). The current value seems a bit clunky and difficult to tune, and might be no real indication of contention. However, we need to keep this encoded in an integer, and we need to ensure it is free to maintain in the fast case. So I propose the following:
# we decide on a maximum rate of waste (let's say 100MB/s)
# when we first waste work we:
#* get the current time in ms (but from nanoTime since we need monotonicity);
#* subtract from it our max rate (100MB/s) converted to K/s, i.e. 100 * 1024, so we have present-100*1024;
#* set our shared counter state to this value
# whenever we waste work we:
#* calculate how much we wasted\* in KB;
#* add this to our shared counter;
#* if the shared counter has _gone past the present time_ we know we've exceeded our maximum wastage, and we set our counter to Integer.MAX_VALUE, which is the flag to everyone to upgrade to locks;
#* if we see it's too far in the past, we reset it to present-(100*1024)
\* To calculate wasted work, we track the size you are currently tracking in 2.0, and in 2.1 we use the BTree's existing size-delta tracking. In 2.0 we multiply the number of updates we had made by lg2(N) (N = current tree size), and multiply this by 100 (the approximate size of snaptree nodes) + ~200 per clone.
This is the same scheme I used for tracking wasted cycles in SharedExecutorPool (CASSANDRA-4718) and I think it works pretty well, and is succinctly represented in memory.
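A minimal sketch of that counter, assuming the 1KB-per-ms-tick encoding described above (~1MB/s sustained with a 100MB excess allowance), a clock rebased on creation time, and Integer.MIN_VALUE as the "no waste yet" flag suggested later in the thread. Names and the field-updater choice are simplifications, not the committed patch:
{code}
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Sketch only; one int of shared state per partition.
final class WastedWorkTracker
{
    private static final int UNSET = Integer.MIN_VALUE;  // no one has wasted work yet
    private static final int TRIPPED = Integer.MAX_VALUE; // flag: upgrade to locks
    private static final int ALLOWANCE = 100 * 1024;      // 100MB excess, in KB ticks

    private final long creationNanos = System.nanoTime();
    private volatile int wasteState = UNSET;
    private static final AtomicIntegerFieldUpdater<WastedWorkTracker> STATE =
        AtomicIntegerFieldUpdater.newUpdater(WastedWorkTracker.class, "wasteState");

    // ms since creation: rebasing on creation keeps us safely within 32 bits
    private int nowMillis()
    {
        return (int) ((System.nanoTime() - creationNanos) / 1_000_000);
    }

    boolean shouldUseLock()
    {
        return wasteState == TRIPPED;
    }

    // Called each time a CAS fails and the work we just built is thrown away.
    void recordWaste(int wastedKb)
    {
        while (true)
        {
            int cur = wasteState;
            if (cur == TRIPPED)
                return;                               // someone already flipped us
            int present = nowMillis();
            int floor = present - ALLOWANCE;          // oldest useful counter value
            int base = (cur == UNSET || cur < floor) ? floor : cur; // (re)start window
            int next = base + wastedKb;               // each wasted KB = one tick
            if (next >= present)                      // caught up with "now": the
                next = TRIPPED;                       // sustained rate was exceeded
            if (STATE.compareAndSet(this, cur, next))
                return;
        }
    }
}
{code}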
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071330#comment-14071330 ] graham sanderson commented on CASSANDRA-7546: - Note w.r.t. deletionInfo... I'm a bit confused about who owns what. On 2.1 (and I'm not 100% sure of the exact semantics of when you need to use HeapAllocator.instance vs pure heap allocation, since I haven't looked at the 2.1 code much):
{code}
if (inputDeletionInfoCopy == null)
    inputDeletionInfoCopy = cm.deletionInfo().copy(HeapAllocator.instance);

deletionInfo = current.deletionInfo.copy().add(inputDeletionInfoCopy);
updater.allocated(deletionInfo.unsharedHeapSize() - current.deletionInfo.unsharedHeapSize());
{code}
However, current.deletionInfo.copy() is not done with the HeapAllocator, and the passed inputDeletionInfoCopy's ranges are RE-copied (without using HeapAllocator.instance) on some code paths inside the .add() method but not others.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071171#comment-14071171 ] graham sanderson commented on CASSANDRA-7546: - In case anyone is reading them, here is the latest output - note that with the current wasted-work limit of 100 we actually kick in later, except under the higher-contention loads, but by doing a one-time flip we actually do less work overall...
{code}
[junit] --
[junit] 1 THREAD; ELEMENT SIZE 64
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 996ms maxConcurrency = 1
[junit] GC for PS Scavenge: 36 ms for 3 collections
[junit] Approx allocation = 563MB vs 8MB; ratio to raw data size = 70.37447428571429
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 765ms maxConcurrency = 1
[junit] GC for PS Scavenge: 38 ms for 3 collections
[junit] Approx allocation = 590MB vs 8MB; ratio to raw data size = 73.67167714285715
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 496ms maxConcurrency = 1
[junit] GC for PS Scavenge: 20 ms for 2 collections
[junit] Approx allocation = 448MB vs 8MB; ratio to raw data size = 55.95978857142857
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 574ms maxConcurrency = 1
[junit] GC for PS Scavenge: 27 ms for 2 collections
[junit] Approx allocation = 485MB vs 8MB; ratio to raw data size = 60.56426285714286
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 256
[junit] original code:
[junit] Duration = 662ms maxConcurrency = 1
[junit] GC for PS Scavenge: 12 ms for 1 collections
[junit] Approx allocation = 333MB vs 8MB; ratio to raw data size = 41.59998095238095
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 241ms maxConcurrency = 1
[junit] GC for PS Scavenge: 9 ms for 1 collections
[junit] Approx allocation = 349MB vs 8MB; ratio to raw data size = 43.65317619047619
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1024
[junit] original code:
[junit] Duration = 222ms maxConcurrency = 1
[junit] GC for PS Scavenge: 11 ms for 1 collections
[junit] Approx allocation = 273MB vs 8MB; ratio to raw data size = 34.18085428571428
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 234ms maxConcurrency = 1
[junit] GC for PS Scavenge: 10 ms for 1 collections
[junit] Approx allocation = 286MB vs 8MB; ratio to raw data size = 35.7883064
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] --
[junit] 100 THREADS; ELEMENT SIZE 64
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 1383ms maxConcurrency = 100
[junit] GC for PS Scavenge: 108 ms for 29 collections
[junit] Approx allocation = 9525MB vs 8MB; ratio to raw data size = 1189.0213895238096
[junit] loopRatio (closest to 1 best) 16.74471 raw 10/1674471 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 1728ms maxConcurrency = 100
[junit] GC for PS Scavenge: 14 ms for 1 collections
[junit] Approx allocation = 572MB vs 8MB; ratio to raw data size = 71.49758761904762
[junit] loopRatio (closest to 1 best) 1.00011 raw 144/154 counted 0/0 sync 99856/99857 up 0 down 0
[junit]
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 223ms maxConcurrency = 100
[junit] GC for PS Scavenge: 24 ms for 2 collections
[junit] Approx allocation = 760MB vs 8MB; ratio to raw data size = 94.87286476190476
[junit] loopRatio (closest to 1 best) 1.88353 raw 10/188353 counted 0/0 syn
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070533#comment-14070533 ] graham sanderson commented on CASSANDRA-7546: - Well, that makes sense; I hadn't checked whether there was a limit on mutator threads - we didn't change it... this probably explains the hard upper bound in my synthetic test (which incidentally does not do the transformation). I agree with you on SnapTreeMap: once I saw that the "essentially" free clone operation has to acquire a lock (or at least wait for no mutations), I surmised there were probably dragons there that might cause all kinds of nastiness, whether that be pain on concurrent updates to a horribly unbalanced tree, or dragging huge amounts of garbage along due to overly lazy copy-on-write (again, I didn't look too closely). BTree looks much better (and probably does less rebalancing, since it has wider nodes, I think), though as discussed it doesn't prevent the underlying race. So, I'll see if I have time to work on this later today, but the plan is... for 2.0.x (just checking): a) move the transformation.apply out of the loop and do it once; b) do a one-way flip flag per AtomicSortedColumns instance, flipped when a cost reaches a certain value (see the sketch below) - I was going to calculate the delta in each mutator thread (probably adding a log-like measure, e.g. using Integer.numberOfLeadingZeros(tree.size()), per failing CAS), though looking (ugh) at SnapTreeMap again, it seems that tree.size() is not a good method to call in the presence of mutations, so I guess Holder can just track the tree size itself; c) given this is possibly a temporary solution, is it worth exposing the "cut-off" value, even undocumented, such that it could be overridden in cassandra.yaml? Note the default should be such that most AtomicSortedColumns instances never get cut off, since they are not heavily contended and large (indicating contended inserts, not updates).
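A minimal sketch of plan (b) above - a one-way flip from optimistic CAS to mutual exclusion once the wasted-work cost crosses a threshold. All names, the AtomicInteger cost counter (the thread discusses cheaper ways to track this), and the log-like cost heuristic are placeholders from the discussion, not the committed patch:
{code}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only; stand-in Holder, not the actual AtomicSortedColumns code.
final class OneWayFlipSketch
{
    static final class Holder { final int treeSize; Holder(int s) { treeSize = s; } }

    private static final int CUTOFF = 100;  // the "wasted work limit of 100" above
    private final AtomicReference<Holder> ref = new AtomicReference<>(new Holder(0));
    private final AtomicInteger wastedCost = new AtomicInteger();
    private volatile boolean contended;     // one-way flip: never cleared

    void addAll(int columnsAdded)
    {
        while (!contended)
        {
            Holder cur = ref.get();
            Holder next = new Holder(cur.treeSize + columnsAdded); // clone/update
            if (ref.compareAndSet(cur, next))
                return;                      // common un-contended path: done
            // CAS failed: the clone above was wasted work. Charge a log-like
            // cost per failure (numberOfLeadingZeros as a cheap lg2 of size).
            int cost = 32 - Integer.numberOfLeadingZeros(cur.treeSize + 1);
            if (wastedCost.addAndGet(cost) > CUTOFF)
                contended = true;            // flip to the locked path below
        }
        synchronized (this)                  // contended path: serialize updates,
        {                                    // but still publish via CAS so we
            while (true)                     // cannot lose a race with stragglers
            {                                // still in the optimistic loop
                Holder cur = ref.get();
                if (ref.compareAndSet(cur, new Holder(cur.treeSize + columnsAdded)))
                    return;
            }
        }
    }
}
{code}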
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070133#comment-14070133 ] Benedict commented on CASSANDRA-7546: - bq. let me know if you want me to take another stab at the patch
We're always keen for competent newcomers to start contributing to the project; if you've got the time, that would be great, and I can review. If not, I'm happy to make this change.
bq. we probably have hundreds of concurrent mutator threads for them
This should never be the case. By default there are 32 concurrent writers permitted, and this should never be changed to more than a small multiple of the number of physical cores on the machine (unless running batch CL), so if there are hundreds, something is going wrong. Furthermore, it makes very little sense that this problem wouldn't be hit by as many concurrent large modifications: the race condition is the same, but much easier to hit the more work there is being done per concurrent modifier. I decided to take a peek at the SnapTreeMap code, since this didn't make much sense, and I see that there is very different behaviour if we have many clone()s as opposed to many updates (larger updates would necessarily result in a lower incidence/overlap of clone()), as epochs attempt to be allocated. I don't really have time to waste digging any deeper, but it seems possible that this code path results in a great deal more object allocation (and possibly allocations that are not easily collectible) than simply performing many large updates. If this is the case, then again 2.1 will not suffer this problem. This doesn't feel like a satisfactory explanation, and nor does the slightly different possible synchronization behaviour with larger updates (snaptree is littered with synchronized() calls, which might possibly overlap more often with many updates). Either way, I'm happy to introduce the mitigation strategy we've discussed, since it makes sense in and of itself. However, we clearly do not fully understand what is happening in your specific scenario, and I do not want to dig further into snaptree - it's a really ugly contraption!
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069981#comment-14069981 ] graham sanderson commented on CASSANDRA-7546: - My last piece of speculation... these single-partition hint trees are probably getting thousands of nodes big, and we probably have hundreds of concurrent mutator threads for them. It may just be that we are hitting a "sweet" spot of allocation rate such that none of the on-processor threads actually makes sufficient progress to reach its CAS before we end up needing to GC; at that point they must all safepoint, after which (I assume) they get no preferential dibs at running next, so we have a much higher ratio of wastage than even in my synthetic test, where it was largely proportional to the number of cores, not the number of threads. In this nasty case, where we have enough cores to do lots of concurrent work but enough work per core to cause enough allocation to trigger GC before any of them finish the task at hand, you get the worst of both the locking and the spinning worlds. Anyway, let me know if you want me to take another stab at the patch, including doing the one-time allocation outside the loop (or on first pass) - you are more familiar with the code, but it is always good to learn.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069953#comment-14069953 ] graham sanderson commented on CASSANDRA-7546: -
Note the one-line summary is that lots of small inserts seem to cause far more problems than lots of large inserts, presumably because they can happen faster, and anything bounded by their intrinsic size rather than their actual overhead can fit more of them.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069938#comment-14069938 ] graham sanderson commented on CASSANDRA-7546: -
{quote}
However, whether it is one-way or not is somewhat unimportant to me. This flip would only last the lifetime of a memtable, which is not super lengthy (under heavy load probably only a few minutes), and would not have dramatically negative consequences if it got it slightly wrong
{quote}
Cool, that's what I was asking/thinking.
As for the tree size/rebalancing, I have no particular proof... when things go wrong we are hinting massively, so maybe there are hundreds of hint mutation threads, each with its own in-progress rebalance, pinning a lot of nodes across young-generation GCs. That said, the memory allocation rate is truly spectacular even given the excessive hinting, so I have to suspect the spinning (and, as you say, probably some of the in-arena allocation it does too) - though that would also be surprising, since these are hint updates, each touching only a single cell.
Anyway... we can track cost in the Holder, I guess, to avoid any extra atomic operations, and maybe factor in the tree size there too.
Note, as an aside, that we are partly to blame for this issue (best practices to be learned, and ways we can mitigate), but the result is surprising enough (because things go bad at random, and usually when we are inserting hundreds of times less data than we can easily handle) that others might easily get bitten. I would describe everything that I think is going on in the snowballing of problems, but it is a bit of a comedy of errors.
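As a rough illustration of the "track cost in the Holder" idea (a hypothetical shape - the real Holder and any threshold would differ): because the Holder is already replaced via CAS, a wasted-work counter carried inside it piggybacks on the existing atomic operation.
{code}
// Hypothetical sketch: carrying a wasted-work estimate inside the Holder
// so no additional atomic fields are needed - the counter travels with
// the snapshot that the existing CAS already publishes.
final class CostTrackingHolder
{
    final Object tree;     // stand-in for the immutable column tree
    final long wastedOps;  // cumulative wasted work observed so far

    CostTrackingHolder(Object tree, long wastedOps)
    {
        this.tree = tree;
        this.wastedOps = wastedOps;
    }

    // A winning thread that itself burned 'extraWaste' failed attempts
    // folds that into the total as part of the same CAS publication.
    CostTrackingHolder next(Object newTree, long extraWaste)
    {
        return new CostTrackingHolder(newTree, wastedOps + extraWaste);
    }

    // Could also factor in tree size here, as suggested above.
    boolean shouldFallBackToLocking(long threshold)
    {
        return wastedOps > threshold;
    }
}
{code}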
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069896#comment-14069896 ] Benedict commented on CASSANDRA-7546: -
bq. Alternatively if you are saying, let each thread keep working while they still believe they can win
This was my original rationale for the patch I posted; however, I am now much more in favour of:
bq. a one-way switch per Atomic*Columns instance that flips after a number of wasted "operations"?
However, whether it is one-way or not is somewhat unimportant to me. This flip would only last the lifetime of a memtable, which is not super lengthy (under heavy load probably only a few minutes), and would not have dramatically negative consequences if it got it slightly wrong.
However^2, I'm still having a hard time believing rebalancing costs in snap tree can be that high, and further, if that really is the problem, it should not be an issue in 2.1, as the b-tree rebalances with O(lg(N)) allocations. I'd be a little surprised if the snap tree didn't do the same: if there were more than O(lg(N)) allocations, the algorithmic complexity would be > O(lg(N)) also. It's possible that it somehow manages to inter-reference with on-going copies, so that we get a highly complex graph that retains exponentially more garbage the more competing updates there are, but again I would be very surprised if this were the case. Outside of either of these I would expect the garbage generated to all be immediately collectible, so it would have to be the sheer volume alone that overwhelmed the GC; that is certainly possible, but it would entail a _lot_ of hinting, and I'd be surprised if a node could be receiving a large enough quantity. On the other hand, the arena allocations in 2.0 are definitely incapable of being collected, and could be allocated almost as rapidly.
bq. I'm not sure which changes you are talking about back-porting and whether the "at most twice" refers to looping once then locking
In this instance I'm referring to copying the source ColumnFamily into a local variable once after failing the CAS, so that we do not keep allocating arena space. Alternatively, we could just do it upfront in the method, as the only extra cost is an array allocation proportional in size to the input data, which is fairly cheap.
All of this said, I think the behaviour of locking after wasting an excessive number of cycles is still a good one, so I'm comfortable introducing it either way, and it would certainly help with all of the above causes.
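A sketch of the one-way flip under discussion - entirely illustrative: the threshold, the fields, and the use of the object monitor are assumptions, and a real implementation must ensure stragglers on the optimistic path cannot lose updates:
{code}
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Illustrative one-way "inflate to locking" switch: once enough work has
// been wasted, every later modification takes the monitor instead of
// spinning. The flip lasts only as long as this object (roughly a
// memtable's lifetime), so a slightly wrong decision is cheap.
final class OneWayFlipSketch<T>
{
    private static final long WASTED_WORK_THRESHOLD = 10_000; // made up

    private final AtomicReference<T> ref;
    private volatile boolean locked;   // one-way: never reset once set
    private long wastedWork;           // heuristic; benign races tolerated

    OneWayFlipSketch(T initial) { ref = new AtomicReference<>(initial); }

    void modify(UnaryOperator<T> update)
    {
        while (!locked)
        {
            T current = ref.get();
            T replacement = update.apply(current); // clone + merge stand-in
            if (ref.compareAndSet(current, replacement))
                return;
            if (++wastedWork > WASTED_WORK_THRESHOLD)
                locked = true; // flip once; fall through to the monitor
        }
        synchronized (this)
        {
            // Still CAS under the monitor so optimistic stragglers that
            // haven't yet observed the flip cannot be silently overwritten.
            T current;
            T replacement;
            do
            {
                current = ref.get();
                replacement = update.apply(current);
            }
            while (!ref.compareAndSet(current, replacement));
        }
    }
}
{code}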
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069666#comment-14069666 ] graham sanderson commented on CASSANDRA-7546: -
Alternatively, if you are saying: let each thread keep working while it still believes it can win, or while it has something to do that can be reused if it loses; maybe give it one last shot to try again if it loses and hasn't done anything reusable; then make it block... I'm okay with that. (Of course on 2.0.x today, that pretty much boils down to your patch!)
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069504#comment-14069504 ] graham sanderson commented on CASSANDRA-7546: -
{quote}
I do wonder how much of a problem this is in 2.1, though. I wonder if the largest problem with these racy modifications isn't actually the massive amount of memtable arena allocation they incur in 2.0, with all their transformation.apply() calls (which reallocate the mutation on the arena); that is most likely what causes the promotion failures, as those allocations cannot be collected. I wonder if we shouldn't simply backport the logic to allocate these only once, or at most twice (the first time we race). It seems much more likely to me that this is where the pain is being felt.
{quote}
I'm not sure which changes you are talking about back-porting, or whether the "at most twice" refers to looping once then locking. Certainly avoiding any repeated cloning of the cells is good; however, I'm still pretty sure, based on PrintFLSStatistics, that the slabs themselves are not the biggest problem (I suspect SnapTreeMap nodes, combined with the high rebalancing cost of huge trees in the hint case, since the keys arrive almost entirely sorted).
Are you suggesting a one-way switch per Atomic*Columns instance that flips after a number of wasted "operations"? That sounds reasonable... I'd expect that a partition for a table is likely to have high contention or not based on the schema design/use case. I have no idea how long these instances hang around in practice (presumably not insanely long).
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069396#comment-14069396 ] Benedict commented on CASSANDRA-7546: -
My concern with the approach you've outlined is that we're barely a hair's breadth from a lock: as soon as we hit _any_ contention, we inflate to locking behaviour. This is good for large partitions, most likely bad for small ones, and more to the point seems barely worth the complexity over just making it a lock in the first place.
On further consideration, I think I would prefer to drive this lock-inflation behaviour off the size of the aborted changes: if the amount of work we've wasted exceeds some threshold, we decide it's high time all threads were stopped to let us finish. We could in this scenario flip a switch that requires all modifications to acquire the monitor once we hit that threshold; I would be fine with this behaviour, and it would be simple.
I do wonder how much of a problem this is in 2.1, though. I wonder if the largest problem with these racy modifications isn't actually the massive amount of memtable arena allocation they incur in 2.0, with all their transformation.apply() calls (which reallocate the mutation on the arena); that is most likely what causes the promotion failures, as those allocations cannot be collected. I wonder if we shouldn't simply backport the logic to allocate these only once, or at most twice (the first time we race). It seems much more likely to me that this is where the pain is being felt.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069173#comment-14069173 ] graham sanderson commented on CASSANDRA-7546: -
Excellent - I will take a look in the 2.1 branch - I was wondering if there were some sample profiles.
The main problem we have in 2.0.x shows up under relatively heavy sustained write load: we are already allocating memtable slabs along with all the small short-lived objects in the commit log and write path; add to that hinting, which means more memtable slabs, and - because hints go to a single partition - much larger snap trees (whose somewhat contentious lazy copy-on-write may or may not make things worse, I don't know). Under that allocation rate we spill huge numbers of small objects (possibly snap tree nodes) into the tenured gen along with the slabs, which tends to lead to promotion failure and the need for compaction.
I'll have to play around, but I don't think it is easy to capture the effect of excessive allocation of (intended-to-be) temporary objects in a stress test, as opposed to excessive CPU, because the GC copes really well until it doesn't.
Note, my belief is that your new tree in 2.1 probably mitigates the problem quite a bit (no contention in the tree, wider nodes, less rebalancing, etc.), though I suggest we still fix the CAS loop allocation there too.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069065#comment-14069065 ] Benedict commented on CASSANDRA-7546: -
I'll take a look at your patch shortly, but in the meantime it's worth pointing out that cassandra-stress now supports fairly complex CQL inserts, including various sizes of batch update, with fine-grained control over how large a partition to generate and what percentage of that total partition to update at any point. Take a look at the sample stress profiles under the tools hierarchy on latest 2.1.
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069059#comment-14069059 ] graham sanderson commented on CASSANDRA-7546: -
FYI here are the same synthetic test results for 7546.20_2.txt:
{code}
[junit] --
[junit] 1 THREAD; ELEMENT SIZE 64
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 993ms maxConcurrency = 1
[junit] GC for PS Scavenge: 34 ms for 3 collections
[junit] Approx allocation = 553MB vs 8MB; ratio to raw data size = 69.13799428571429
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 761ms maxConcurrency = 1
[junit] GC for PS Scavenge: 34 ms for 3 collections
[junit] Approx allocation = 579MB vs 8MB; ratio to raw data size = 72.31675047619048
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 780ms maxConcurrency = 1
[junit] GC for PS Scavenge: 25 ms for 2 collections
[junit] Approx allocation = 436MB vs 8MB; ratio to raw data size = 54.48992095238095
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 671ms maxConcurrency = 1
[junit] GC for PS Scavenge: 24 ms for 2 collections
[junit] Approx allocation = 477MB vs 8MB; ratio to raw data size = 59.545997142857146
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 256
[junit] original code:
[junit] Duration = 452ms maxConcurrency = 1
[junit] GC for PS Scavenge: 11 ms for 1 collections
[junit] Approx allocation = 321MB vs 8MB; ratio to raw data size = 40.14510761904762
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 460ms maxConcurrency = 1
[junit] GC for PS Scavenge: 10 ms for 1 collections
[junit] Approx allocation = 341MB vs 8MB; ratio to raw data size = 42.63770857142857
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] Threads = 1 elements = 10 (of size 64) partitions = 1024
[junit] original code:
[junit] Duration = 462ms maxConcurrency = 1
[junit] GC for PS Scavenge: 14 ms for 1 collections
[junit] Approx allocation = 264MB vs 8MB; ratio to raw data size = 32.99879142857143
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 543ms maxConcurrency = 1
[junit] GC for PS Scavenge: 14 ms for 1 collections
[junit] Approx allocation = 272MB vs 8MB; ratio to raw data size = 34.047360952380956
[junit] loopRatio (closest to 1 best) 1.0 raw 10/10 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit]
[junit] --
[junit] 100 THREADS; ELEMENT SIZE 64
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 1
[junit] original code:
[junit] Duration = 2318ms maxConcurrency = 100
[junit] GC for PS Scavenge: 119 ms for 32 collections
[junit] Approx allocation = 10547MB vs 8MB; ratio to raw data size = 1316.62704
[junit] loopRatio (closest to 1 best) 18.35448 raw 10/1835448 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 1315ms maxConcurrency = 100
[junit] GC for PS Scavenge: 14 ms for 1 collections
[junit] Approx allocation = 629MB vs 8MB; ratio to raw data size = 78.62949142857143
[junit] loopRatio (closest to 1 best) 1.11563 raw 13653/13653 counted 0/0 sync 88223/97910 up 0 down 0
[junit]
[junit]
[junit] Threads = 100 elements = 10 (of size 64) partitions = 16
[junit] original code:
[junit] Duration = 215ms maxConcurrency = 100
[junit] GC for PS Scavenge: 23 ms for 2 collections
[junit] Approx allocation = 776MB vs 8MB; ratio to raw data size = 96.92138285714286
[junit] loopRatio (closest to 1 best) 1.95927 raw 10/195927 counted 0/0 sync 0/0 up 0 down 0
[junit]
[junit] modified code:
[junit] Duration = 201ms maxConcurrency = 99
[junit] GC for PS Scavenge: 9 ms for 1 collections
[juni
{code}
[jira] [Commented] (CASSANDRA-7546) AtomicSortedColumns.addAllWithSizeDelta has a spin loop that allocates memory
[ https://issues.apache.org/jira/browse/CASSANDRA-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068803#comment-14068803 ] Benedict commented on CASSANDRA-7546: -
bq. I'm not sure my code (whilst not blazingly pretty) is insanely hard to reason about...
I'm not suggesting it is by any means abhorrent, only that we can achieve the desired goal with fewer changes, so unless there's a lot of evidence that the extra complexity is worth it, we should stick with the simpler approach (this also means less pollution of the instruction cache in a very hot part of the codebase, which is a good thing). If you want to benchmark this closely, I would suggest running a stress workload with a fixed number of threads and an increasing number of partitions (from 1 up to more than the number of threads) and seeing how the curve changes.
As to (b): since we only ever acquire the lock when we are contending, it must always be inflated anyway, so this shouldn't be an issue.
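For anyone who wants to reproduce the suggested experiment without touching Cassandra itself, here is a rough, self-contained harness in the same spirit - all parameters are made up; it exercises the same read/clone/CAS shape and reports how many attempts were wasted as the partition count grows:
{code}
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

// Toy contention-curve harness: fixed thread count, sweep partition count,
// count wasted CAS attempts. Illustrative only - not Cassandra code.
public final class ContentionCurve
{
    public static void main(String[] args) throws InterruptedException
    {
        final int threads = 32;
        final int opsPerThread = 10_000;
        for (int partitions : new int[]{ 1, 2, 4, 8, 16, 32, 64 })
        {
            final AtomicLong wasted = new AtomicLong();
            @SuppressWarnings("unchecked")
            final AtomicReference<NavigableMap<Integer, Integer>>[] parts =
                    new AtomicReference[partitions];
            for (int i = 0; i < partitions; i++)
                parts[i] = new AtomicReference<>(new TreeMap<>());

            final CountDownLatch done = new CountDownLatch(threads);
            for (int t = 0; t < threads; t++)
            {
                final int seed = t;
                new Thread(() -> {
                    for (int i = 0; i < opsPerThread; i++)
                    {
                        AtomicReference<NavigableMap<Integer, Integer>> ref =
                                parts[(seed + i) % parts.length];
                        while (true)
                        {
                            NavigableMap<Integer, Integer> cur = ref.get();
                            NavigableMap<Integer, Integer> copy = new TreeMap<>(cur);
                            copy.put(i % 256, seed); // keep partitions small
                            if (ref.compareAndSet(cur, copy))
                                break;
                            wasted.incrementAndGet(); // one discarded clone
                        }
                    }
                    done.countDown();
                }).start();
            }
            done.await();
            System.out.printf("partitions=%d wastedAttempts=%d%n",
                              partitions, wasted.get());
        }
    }
}
{code}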