[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-26 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559669#comment-14559669
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

https://github.com/tobert/cassandra/tree/g1gc

https://github.com/tobert/cassandra/commit/33bf6719e0c8e84672c3633f8ecce602affc3071
https://github.com/tobert/cassandra/commit/cafee86c3c5798e423689a26b43d05ed9312adc5

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Albert P Tobey
 Fix For: 3.0 beta 1


 See 
 http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
  and https://twitter.com/rbranson/status/482113561431265281
 May want to default 2.1 to G1.
 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
 Suspect this will help G1 even more than CMS.  (NB this is off by default but 
 needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-20 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553067#comment-14553067
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

I'll attach a patch ASAP.

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Albert P Tobey
 Fix For: 3.0 beta 1



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-19 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550741#comment-14550741
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

Did you run into evacuation failures? How big was your heap? I haven't seen any 
evac failures with 2.1 and Java 8. This is one of the things that was worked on 
for Hotspot 1.8. Then again maybe it's Solr that needs the help.

I suspect you can remove a lot of these settings on Java 8, but have also 
discovered that setting the GC threads is necessary on many machines.

Try adding the line below for a nice decrease in p99 latencies.

JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-19 Thread Michael Perrone (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550721#comment-14550721
 ] 

Michael Perrone commented on CASSANDRA-7486:


I have done extensive load testing with G1GC on Java 1.7.0_80 and Cassandra 
2.0.12.x versions with Solr secondary indexes and a 20GB max heap. On 8-core 
systems these options were the sweet spot for the test workload and worked out 
well in a production cluster, providing dramatic improvements in overall GC 
time and eliminating long CMS pauses that we could not tune out. I will try to 
attach some graphs/tables/metrics in another comment. 

{code:yaml}
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
# set these to the number of cores
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=8"
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=8"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
# default is 10, this makes G1 slightly more aggressive
# by starting the marking cycle earlier
# in order to avoid evacuation failure (OOM)
JVM_OPTS="$JVM_OPTS -XX:G1ReservePercent=15"
# default is 45, we should start sooner.
# in a high write large heap (20GB) this was
# found to eliminate Old gen pauses
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=25"
# default is 200, there is a tradeoff between latency
# and throughput. increase this to increase throughput
# at the cost of potential latency, up to 1000
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
# use largest possible region size (32MB)
# speeds up marking phase, tradeoff is efficiency
# comment out to let JVM decide the size
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"
{code}
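
The "set these to the number of cores" step above could be derived at startup rather than hard-coded. A minimal sketch, assuming a POSIX shell in which `getconf _NPROCESSORS_ONLN` reports the online core count; the `gc_thread_opts` helper is hypothetical and not part of cassandra-env.sh:

```shell
# Hypothetical helper: build the two GC thread flags from a core count.
gc_thread_opts() {
    cores=$1
    echo "-XX:ParallelGCThreads=${cores} -XX:ConcGCThreads=${cores}"
}

# Assumption: getconf reports the online core count; fall back to 1 otherwise.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
JVM_OPTS="${JVM_OPTS:-} $(gc_thread_opts "$cores")"
echo "$JVM_OPTS"
```

Whether matching both thread counts to the core count suits a given workload is something to confirm in testing; the values above only mirror the 8-core example.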


 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-19 Thread Michael Perrone (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551202#comment-14551202
 ] 

Michael Perrone commented on CASSANDRA-7486:


Have not seen evacuation failures, but the test systems ran tight enough under 
heavy load that I increased the reserve percentage. Heap was 20GB. 

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Albert P Tobey
 Fix For: 3.0 beta 1



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Matt Stump (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524278#comment-14524278
 ] 

Matt Stump commented on CASSANDRA-7486:
---

Before we talk about changing the defaults I would like to see tests run on 
something more representative of customer hardware. At the very least we should 
be doing comparisons of CMS vs G1 for different workloads on cstar. I did some 
initial testing and didn't see a huge benefit, but I very well could have been 
doing something wrong. I'm both hopeful and skeptical. 

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Albert P Tobey (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523593#comment-14523593
]

Albert P Tobey commented on CASSANDRA-7486:
---

bq. if I am reading correctly there was pretty much never an old generation
collection under the workload I looked at. The old gen was growing but never
reached the point it needed to do an old gen GC.

^ G1 doesn't work that way.

bq. Another behavior to consider is worst case pause time when there is
fragmentation.

^ G1 performs compaction. It's fairly easy to trigger and observe in gc.log
with Cassandra 2.0. It takes more work with 2.1 since it seems to be easier on
the GC.

I'll see if I can find some time to generate graphs to make all this more
convincing, but time is short because I'm spending all of my time tuning users'
clusters where the #1 issue every time is getting CMS to behave.

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Ariel Weisberg (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523630#comment-14523630
]

Ariel Weisberg commented on CASSANDRA-7486:
---

bq. ^ G1 doesn't work that way.
I am talking about CMS. When I looked at the 12 gigabyte heap, the old gen grew
to 4.1 gigabytes and I didn't see it shrink at any point.

bq. I'm spending all of my time tuning users' clusters where the #1 issue
every time is getting CMS to behave.
We can make the case for G1 in different ways. If we want to do it based on
real world results that is fine with me.

To Benedict's point I think looking at all the operations we care about on
realistic time scales is something we would have to do to really know what the
differences are. I wish we had this stuff in CI so it would just be a matter of
changing the flags, but we aren't there yet.

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Albert P Tobey (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523724#comment-14523724
]

Albert P Tobey commented on CASSANDRA-7486:
---

I only started messing with G1 this year, so I only know the old behavior by
lore I've read and heard. I have not observed significant problems with it in
the ~20-40 hours I've spent tuning clusters with G1 recently.

I don't recommend anyone try G1 on JDK 7 older than u75 or JDK 8 older than u40
(although it's probably OK down to u20 according to the docs I've read). I did
some testing on JDK7u75 and it was stable, but I didn't spend much time on it
since JDK8u40 gave a nice bump in performance (5-10% on a customer cluster) by
just switching JDKs and nothing else.
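
A startup script could guard against older JDKs before enabling G1. This is only a sketch: it reads the comment above as suggesting minimums of 7u75 and 8u40, expects version strings of the form "1.8.0_40", and uses a hypothetical `g1_jdk_ok` helper that does not exist in Cassandra:

```shell
# Hypothetical helper: succeed only for JDK versions at or above the minimums
# assumed from this thread (7u75 and 8u40). Expects strings like "1.8.0_40".
g1_jdk_ok() {
    ver=$1
    major=$(echo "$ver" | cut -d. -f2)      # "1.8.0_40" -> "8"
    case "$ver" in
        *_*) update=${ver##*_} ;;           # "1.8.0_40" -> "40"
        *)   update=0 ;;                    # no update suffix present
    esac
    update=${update%%[!0-9]*}               # drop suffixes such as "-b25"
    case "$major" in
        7) [ "${update:-0}" -ge 75 ] ;;
        8) [ "${update:-0}" -ge 40 ] ;;
        *) return 1 ;;                      # other releases are not covered here
    esac
}

for v in 1.7.0_80 1.8.0_25 1.8.0_40; do
    if g1_jdk_ok "$v"; then
        echo "$v: meets the assumed minimum for G1"
    else
        echo "$v: upgrade before trying G1"
    fi
done
```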

From what I've read about the reference clearing issues, there is a new-ish
setting to enable parallel reference collection, -XX:+ParallelRefProcEnabled.
The advice in the docs is to only turn it on if a significant amount of time
is spent on RefProc collection, e.g. [Ref Proc: 5.2 ms]. I pulled that
from a log I had handy and that is high enough that we might want to consider
enabling the flag, but in most of my observations it hovers around 0.1ms under
saturation load.
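
To judge whether reference processing is expensive enough to justify -XX:+ParallelRefProcEnabled, the `[Ref Proc: N ms]` entries can be summarized from a GC log. A rough sketch, assuming the log contains entries in exactly that form; the `ref_proc_stats` helper and the sample log lines are made up for illustration:

```shell
# Hypothetical helper: summarize "[Ref Proc: N ms]" entries in a GC log file.
ref_proc_stats() {
    grep -o 'Ref Proc: [0-9.]* ms' "$1" |
        awk '{ v = $3 + 0; sum += v; if (v > max) max = v; n++ }
             END {
                 if (n) printf "samples=%d mean=%.2f max=%.2f\n", n, sum / n, max
                 else   print "samples=0"
             }'
}

# Illustrative input only; real logs carry many more fields per pause.
log=$(mktemp)
printf '%s\n' '      [Ref Proc: 5.2 ms]' '      [Ref Proc: 0.2 ms]' \
              '      [Other: 0.3 ms]' > "$log"
ref_proc_stats "$log"
rm -f "$log"
```

A mean near 0.1 ms, as reported under saturation load, would suggest leaving the flag off; sustained values in the multi-millisecond range would be the case for trying it.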

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523736#comment-14523736
 ] 

Benedict commented on CASSANDRA-7486:
-

bq. there is a new-ish setting to enable parallel reference collection

Throughput was something like 5% of other collectors in my testing, so 
parallelizing this would only help so much! :)

My point is, we don't really fully understand G1, and unless we undertake a 
research project to fully understand its pathological cases, and how they 
compare/contrast to its history, I'd prefer we ensured it behaved under complex 
loads, and not just under isolated read or write loading.

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523621#comment-14523621
]

Benedict commented on CASSANDRA-7486:
-

bq. ^ G1 doesn't work that way.

While it has no old generation, it does promote regions and if this happens a
lot you can get some weird pathological fragmentation. Now, my experience with
G1 is out of date, and I haven't kept up at all with its latest behaviours, but
I saw some really atrocious behaviour on very simple benchmarks a few years
back. At the time, if you modified references that were randomly distributed
around the heap, it required traversing a majority of the heap to collect very
little, and essentially thrashed. I realise it has improved, but I do not know
in what ways, and so I'm wary of making it the default without being certain it
no longer has pathological cases that are a problem for us. Unless we stress
the collector so that it exercises its suboptimal characteristics, I am not
really super confident. I hope this is simply overly cautious, but we know of
users who also had serious problems with sudden degradation despite looking
good in initial testing, and it would be great for that not to be a widespread
problem.

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-05-01 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523146#comment-14523146
]

Benedict commented on CASSANDRA-7486:
-

bq. if I am reading correctly there was pretty much never an old generation
collection under the workload I looked at. The old gen was growing but never
reached the point it needed to do an old gen GC.
bq. Another behavior to consider is worst case pause time when there is
fragmentation.

These are concerns we should not dismiss out of hand. My concern is that these
benchmarks, run in an idealised world of a steady rate of work production, are
not representative of a workload including repair, validation, long running huge
compactions, hinting, periodic read/write load spikes. If these performance
profiles are dependent on the memtables never being promoted, this is dependent
on the disk keeping up, and under a worse but realistic workload the
characteristics may be nothing like what [~ato...@datastax.com] is seeing.
Changing these defaults should be done with the absolute utmost of care, and I
would like to see a lot of very long running mixed workload tests including all
of the other spanners in the works.

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-30 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521805#comment-14521805
 ] 

Ariel Weisberg commented on CASSANDRA-7486:
---

I would just like to see the data visualized. If it's not better in every 
dimension, then in which dimensions is it better or worse across all the data 
that Al collected?

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-30 Thread Phil Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520926#comment-14520926
 ] 

Phil Yang commented on CASSANDRA-7486:
--

Another small question :)
Since many performance improvements were made to G1 in JDK 8 and its update 
releases, should we recommend that JDK 7 users, especially those on early 
versions, update to the latest JDK 8 release?

I also found a JEP, "Make G1 the default garbage collector on 32- and 64-bit 
server configurations", targeting JDK 9. See http://openjdk.java.net/jeps/248 
if you have not heard about it.

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-29 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520071#comment-14520071
 ] 

Jonathan Ellis commented on CASSANDRA-7486:
---

bq. IMO consistent performance should always take precedence over maximum 
performance/throughput.

Agreed.  I think our bar here should be "Is G1 better for the average user?", 
keeping in mind that the average user is a *lot* worse at tuning CMS than 
Ariel.  Power users can tune for their own workload the way they always have.

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.x



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-29 Thread Albert P Tobey (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519791#comment-14519791
]

Albert P Tobey commented on CASSANDRA-7486:
---

[~aweisberg] comparing promotions between G1 and CMS doesn't really make sense
IMO. G1 promotions simply mark memory where it is without copying. After a
threshold it will compact surviving cards into a single region. What I've
observed is that compaction is rarely necessary with a big enough G1 heap. With
a saturation write workload on Cassandra 2.1 only ~100-200MB seems to stick
around for the long haul with almost all the rest getting cycled every few
minutes (in an 8GB heap).

[~yangzhe1991] I would keep the default heap at 8GB. I have tested with G1 at
16GB on a 30GB m3.2xlarge on EC2 and it generally gets better throughput and
latency because there's more space for G1 to "waste" (that's what they call
it). Intel tested up to 100GB with HBase at a 200ms pause target and said nice
things about it. I don't see much need for C* to hit that size but it's
certainly doable with G1. The main problem is smaller heaps, where G1 starts to
struggle a little, but I found that it still works OK down to 512MB, even if it
is a bit less efficient than CMS, since G1 targets ~10% CPU time for GC while
the others target 1% by default.
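
The 10% and 1% figures appear to correspond to HotSpot's -XX:GCTimeRatio=N setting, under which the targeted GC share of total time is 1 / (1 + N). A small worked sketch; the defaults used here (9 for G1 on JDK 8, 99 for the throughput collector) are my understanding of that era's HotSpot and are not stated in this thread, and `gc_time_pct` is a made-up helper name:

```shell
# Hypothetical helper: percentage of time targeted for GC given GCTimeRatio=N,
# computed as 100 / (1 + N).
gc_time_pct() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", 100 / (1 + n) }'
}

gc_time_pct 9     # assumed G1 default ratio on JDK 8
gc_time_pct 99    # assumed throughput-collector default ratio
```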

Throughput / latency is always a tradeoff and in the case of G1 with
non-aggressive latency targets (-XX:MaxGCPauseMillis=2000) the throughput is
darn close to CMS with considerably improved standard deviation on latency. IMO
that's a great tradeoff as most of the users I talk to in the wild mostly
struggle with getting reliable latency rather than throughput.

IMO consistent performance should always take precedence over maximum
performance/throughput. G1 provides a much more consistent experience with
fewer knobs to mess with (especially tuning eden size, which is still a black
art that nearly every installation I've looked at gets wrong).

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.x


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-29 Thread Ariel Weisberg (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519456#comment-14519456
]

Ariel Weisberg commented on CASSANDRA-7486:
---

For the C* we ship today we should evaluate whether G1 is better. For the
platonic ideal C* (where the heap only needs to be 1 or 2 gig) I suspect we
should ship CMS because I found it has lower baseline pause times for young gen
collections especially on server class hardware.

This is something that we can do in a data driven way. [~ato...@datastax.com]
got some good data, but when I sampled throughput (from the spreadsheet) on
some workloads like the 12g CMS and G1 I saw more throughput under CMS. I think
we should munge the data a bit and visualize throughput and P99 (or P99.9). I
am also not a fan of basing the decision off of that # of cores and a non-NUMA
machine which is not representative of the hardware people use.

I am not comfortable with the measurements for large heaps because if I am
reading correctly there was pretty much never an old generation collection under
the workload I looked at. The old gen was growing but never reached the point
it needed to do an old gen GC. It's great the server can run that long with so
little promotion (TIL that is a thing that happens). That explains the very
long young gen pauses. Lots of survivor copying I guess when I look at the size
of survivor set vs pause time. I saw young gen pauses in the 400+ millisecond
range under both collectors.

Another behavior to consider is worst case pause time when there is
fragmentation.

With all the overhead of survivor copying I start to wonder if a valid strategy
would be to allow promotion and let the concurrent collector run all the time.
That would bring down young-gen GC pauses in exchange for throughput.

I think whether 8099 means no off-heap memtables in 3.0 is also a factor. If G1
scales to larger heaps and larger on heap memtables then it will be a better
choice.

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.5


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-29 Thread Phil Yang (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519604#comment-14519604
]

Phil Yang commented on CASSANDRA-7486:
--

I think we should be prudent and careful about changing the default option.
C* users can usually accept a new version that performs no better, but a new
version that performs worse may be unacceptable. If there is a risk that G1 is
worse than CMS in some cases, it may be better to make G1 optional first by
offering another conf/cassandra-env-g1.sh file that lets people try it, without
changing the default settings.
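
One way such an opt-in could look is a collector switch inside the env script itself. This is a sketch only: the `CASSANDRA_GC` variable and the `gc_opts` helper are hypothetical, the G1 pause target is borrowed from the settings posted earlier in this thread, and the CMS line lists just a few of the flags a real configuration would carry:

```shell
# Hypothetical opt-in switch; CASSANDRA_GC is not an existing Cassandra setting.
gc_opts() {
    case "${1:-cms}" in
        g1)  echo "-XX:+UseG1GC -XX:MaxGCPauseMillis=500" ;;
        cms) echo "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled" ;;
        *)   echo "unknown collector: $1" >&2; return 1 ;;
    esac
}

# CMS stays the default; G1 is used only when explicitly requested.
JVM_OPTS="${JVM_OPTS:-} $(gc_opts "${CASSANDRA_GC:-cms}")"
echo "$JVM_OPTS"
```

Keeping CMS as the fallthrough means an upgrade changes nothing for users who never set the variable.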

Do the tests comparing G1 and CMS cover extreme cases? For example:
bootstrap/rebuild/remove node, repair, lots of queries over
tombstone_failure_threshold... And I think each test should run for at least 24
hours, so that it includes several full GCs, to estimate the latency.

Furthermore, with CMS we currently cap the max heap size at 8GB even if the
node has much more memory. If we decide to change the default GC algorithm,
what is a suitable new limit?

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
Fix For: 2.1.6


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-28 Thread Rick Branson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518707#comment-14518707
 ] 

Rick Branson commented on CASSANDRA-7486:
-

I think it definitely makes sense as a default. My guess is that it'll result 
in fewer headaches for most people.

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.5



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-28 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14518704#comment-14518704
 ] 

Jonathan Ellis commented on CASSANDRA-7486:
---

Sounds like G1 is finally ready to replace CMS.  WDYT [~mstump] [~rbranson] 
[~aweisberg]?

 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.5



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-15 Thread Albert P Tobey (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497603#comment-14497603
]

Albert P Tobey commented on CASSANDRA-7486:
---

My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM
/ 240GB SSD / gigabit ethernet. Cassandra 2.1.4. The CPUs are fairly slow at
1.4GHz. The tests were automated with a complete cluster rebuild between tests
and caches dropped before starting Cassandra each time.

The big win with G1 IMO is that it is auto-tuning. I've been running it on a
few other kinds of machines and it generally does much better with more CPU
power.

cassandra-stress was run with an increased heap but is otherwise unmodified
from Cassandra 2.1.4. I checked the GC log regularly and saw only a few pauses
above 1ms here and there for stress itself, with most pauses in the ~300usec
range.

The final output of the stress is available here:

https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv

The stress commands, system.log, GC logs, conf directory from all the servers,
and full stress logs are available on my webserver here:

http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB)

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-08 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485579#comment-14485579
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

This is with 2.0 / OpenJDK8 since that's what I had running. Same everything 
each run except for heap size. cassandra-stress 2.1.4 read workload / 800 
threads. I'll re-run with 2.1 / Oracle JDK8 and some mixed load.

-XX:+UseG1GC

Also: -XX:+UseTLAB -XX:+ResizeTLAB  -XX:-UseBiasedLocking -XX:+AlwaysPreTouch 
but maybe those should go in a different ticket.

8GB:

op rate   : 139805
partition rate: 139805
row rate  : 139805
latency mean  : 5.7
latency median: 4.2
latency 95th percentile   : 13.2
latency 99th percentile   : 18.5
latency 99.9th percentile : 21.1
latency max   : 303.8

512MB:

op rate   : 114214
partition rate: 114214
row rate  : 114214
latency mean  : 7.0
latency median: 3.7
latency 95th percentile   : 12.4
latency 99th percentile   : 14.7
latency 99.9th percentile : 15.3
latency max   : 307.1

256MB:

op rate   : 60028
partition rate: 60028
row rate  : 60028
latency mean  : 13.3
latency median: 4.0
latency 95th percentile   : 44.7
latency 99th percentile   : 73.5
latency 99.9th percentile : 79.6
latency max   : 1105.4

Same everything with mostly stock CMS settings for 2.0. I added the  
-XX:+UseTLAB -XX:+ResizeTLAB  -XX:-UseBiasedLocking -XX:+AlwaysPreTouch 
settings to keep the numbers comparable to all of my other data.

8GB/1GB:

op rate   : 119155
partition rate: 119155
row rate  : 119155
latency mean  : 6.7
latency median: 4.1
latency 95th percentile   : 11.8
latency 99th percentile   : 15.5
latency 99.9th percentile : 17.3
latency max   : 520.2


512MB ( -XX:+UseAdaptiveSizePolicy):

op rate   : 82375
partition rate: 82375
row rate  : 82375
latency mean  : 9.7
latency median: 4.3
latency 95th percentile   : 28.2
latency 99th percentile   : 49.4
latency 99.9th percentile : 54.8
latency max   : 2642.6


256MB ( -XX:+UseAdaptiveSizePolicy):

op rate   : 77705
partition rate: 77705
row rate  : 77705
latency mean  : 10.3
latency median: 4.8
latency 95th percentile   : 33.6
latency 99th percentile   : 45.3
latency 99.9th percentile : 49.1
latency max   : 1990.0
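
To make the runs above easier to compare, the op rates can be put side by side. The following is only a sketch: the figures are copied from the cassandra-stress summaries in this comment, while the dictionary layout and the helper name `relative_gain` are illustrative and not part of cassandra-stress. The CMS run labelled "8GB/1GB" is treated here as the 8GB data point.

```python
# Op rates (ops/s) copied from the cassandra-stress summaries above.
# Keys are the heap sizes used for each run; the CMS "8GB/1GB" run is
# filed under "8GB" for the purpose of this comparison.
G1_OP_RATE = {"8GB": 139805, "512MB": 114214, "256MB": 60028}
CMS_OP_RATE = {"8GB": 119155, "512MB": 82375, "256MB": 77705}


def relative_gain(g1, cms):
    """Return the G1 op rate relative to CMS as a percentage difference."""
    return (g1 - cms) / cms * 100.0


if __name__ == "__main__":
    for heap in G1_OP_RATE:
        g1 = G1_OP_RATE[heap]
        cms = CMS_OP_RATE[heap]
        print(f"{heap:>6}: G1 {g1:>7} vs CMS {cms:>7} ({relative_gain(g1, cms):+.1f}%)")
```

In these runs G1 has the higher op rate at 8GB and 512MB, while CMS has the higher op rate at 256MB.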


[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-07 Thread Albert P Tobey (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484318#comment-14484318
]

Albert P Tobey commented on CASSANDRA-7486:
---

So far my testing of read workloads matches my experience with writes. An 8GB
heap with generic G1GC settings is good for more workloads out of the box
than haphazardly tuned CMS can be. I've been testing on a mix of Oracle/OpenJDK
and JDK7/8 and the results are fairly consistent across the board with the
exception that performance is a tad higher (~5%) on JDK8 than JDK7 (with G1GC -
I have not tested CMS much on JDK8).

These parameters get better throughput than CMS out of the box with
significantly improved consistency in the max and p99.9 latency.

-Xmx8G -Xms8G -XX:+UseG1GC

If throughput is more critical than latency, the following will get a few %
more throughput at the cost of potentially higher max pause times:

-Xmx8G -Xms8G -XX:+UseG1GC -XX:MaxGCPauseMillis=2000
-XX:InitiatingHeapOccupancyPercent=75

My recommendation is to document the last two options in cassandra-env.sh but
leave them disabled/commented out for end-users to fiddle with. Other knobs for
G1 didn't make a statistically measurable difference in my observations.

G1 scales particularly well with heap size on huge machines. 8 to 16GB doesn't
seem to make a big difference, matching what [~rbranson] saw. At 24GB I started
seeing about 8-10% throughput increase with little variance in pause times.

IMO the simple G1 configuration should be the default for large heaps. It's
simple and provides consistent latency. Because it uses heuristics to determine
the eden size and scanning schedule, it will adapt well to diverse
environments without tweaking. Heap sizes under 8GB should continue to use CMS
or even experiment with serial collectors (e.g. Raspberry Pi, t2.micro,
vagrant). If there is interest, I will write up a patch for cassandra-env.sh to
make the auto-detection code pick G1GC at >= 6GB heap and CMS for < 6GB.
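
The proposed auto-detection could look roughly like the sketch below. It is written in Python purely to illustrate the decision; the real change would live in cassandra-env.sh. It assumes the intended rule is G1 at 6GB of heap and above and CMS below that. The function name `gc_options` and the `favor_throughput` switch are hypothetical, and the CMS branch emits only `-XX:+UseConcMarkSweepGC`, whereas the stock CMS settings are more extensive.

```python
G1_MIN_HEAP_MB = 6 * 1024  # assumed threshold: G1 at 6GB and above


def gc_options(heap_mb, favor_throughput=False):
    """Return illustrative JVM options for a fixed heap of heap_mb megabytes."""
    opts = [f"-Xms{heap_mb}M", f"-Xmx{heap_mb}M"]
    if heap_mb >= G1_MIN_HEAP_MB:
        opts.append("-XX:+UseG1GC")
        if favor_throughput:
            # Optional knobs from the comment above: a little more throughput
            # at the cost of potentially higher max pause times.
            opts += [
                "-XX:MaxGCPauseMillis=2000",
                "-XX:InitiatingHeapOccupancyPercent=75",
            ]
    else:
        # Small heaps stay on CMS; a real configuration would add the
        # remaining CMS tuning flags here.
        opts.append("-XX:+UseConcMarkSweepGC")
    return opts


if __name__ == "__main__":
    print(" ".join(gc_options(8192)))
    print(" ".join(gc_options(8192, favor_throughput=True)))
    print(" ".join(gc_options(2048)))
```

Keeping the two throughput knobs behind an explicit switch mirrors the recommendation to document them but leave them disabled by default.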

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-07 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484645#comment-14484645
 ] 

Jonathan Ellis commented on CASSANDRA-7486:
---

[~ato...@datastax.com] wdyt of mstump's suggestions at CASSANDRA-8150?

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-07 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484647#comment-14484647
 ] 

Jonathan Ellis commented on CASSANDRA-7486:
---

Also, how much better is CMS for small heaps?  Given that sub-8GB heaps aren't 
particularly common or recommended, can we just simplify it to use G1?

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-07 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484657#comment-14484657
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

I'll kick off some tests and find out. All of the Oracle docs say not to bother 
below 6GB, but yeah I agree, if it's basically not bad we should go with simple.

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-04-01 Thread Robert Stupp (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14390359#comment-14390359
 ] 

Robert Stupp commented on CASSANDRA-7486:
-

Nice :)

Looking forward to seeing Oracle JVM and C* 2.1 results. (TBH, I don't expect much 
difference between OpenJDK8 and Oracle JDK8.)

But more interesting would be how G1 behaves w/ read and mixed workloads.


 Compare CMS and G1 pause times
 --

 Key: CASSANDRA-7486
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
 Project: Cassandra
  Issue Type: Test
  Components: Config
Reporter: Jonathan Ellis
Assignee: Shawn Kumar
 Fix For: 2.1.4


 See 
 http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
  and https://twitter.com/rbranson/status/482113561431265281
 May want to default 2.1 to G1.
 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
 Suspect this will help G1 even more than CMS.  (NB this is off by default but 
 needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-03-31 Thread Albert P Tobey (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389850#comment-14389850
 ] 

Albert P Tobey commented on CASSANDRA-7486:
---

I managed to get G1 (Java 8) to beat CMS on both latency and throughput on my 
NUC cluster.

Preliminary results: https://gist.github.com/tobert/ea9328e4873441c7fc34



[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2015-03-09 Thread Jeremy Hanna (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353170#comment-14353170
 ] 

Jeremy Hanna commented on CASSANDRA-7486:
-

Any update on this testing [~shawn.kumar]?  Just wondered as this ticket seemed 
promising initially but hasn't been updated in some time.

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2014-07-02 Thread Rick Branson (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14050677#comment-14050677
]

Rick Branson commented on CASSANDRA-7486:
-

Mad anecdotes:

We ran with G1 enabled for around 4 days in a 33-node cluster running 1.2.17 on
JDK7u45 that has around a 1:5 read:write ratio. We tried a few different
configurations with short durations, but most of the time we ran it with the
out-of-the-box G1 configuration on a 20G heap and 32 parallel GC threads (16
core, 32 hyperthreaded). There were some somewhat scary bugs fixed in 7u60 that
ultimately caused me to roll back to the CMS collector after the experiment.

* The experiment pointed out that our young gen was basically too small and was
pulling latency up significantly. When we returned to CMS, I doubled new
size from 800M to 1600M. We had moved to new hardware and hadn't taken the time
to sit down and play with GC settings. This cut our mean latency dramatically
as perceived from the client, ~50% for writes and ~30% for reads, similar to
what we saw with G1. I was quite thrilled with this result.
* I tried both 100ms and 150ms pause time targets with 12G, 16G, and 20G
heaps, and while these resulted in slightly lower mean latency (~5-10%), Mixed
GC activity caused P99s to suffer greatly. There's compelling evidence that the
200ms default is nearly ideal for the way the G1 algorithm works in its current
incarnation.
* We basically needed a 20G heap to make G1 work well for us, since by default
G1 will use up to half of the max heap for eden space and Cassandra needs quite
a large old gen to stay happy. G1 appears to need a much larger eden space to
work efficiently, sizes that would make ParNew die in a fire. GCs of the eden
space were impressively fast, with a ~10G eden space taking ~120ms on average
to collect.
* G1's huge eden space was helpful working around some issues with compaction
on hints CF which had dozens of very wide partitions, hundreds of thousands of
cells each.
* Overall, at the default 200ms pause time target, we didn't see much of an
increase in CPU usage over CMS.

In the end, my tests basically told us that G1 requires a larger heap to get
the same results with *far* less tuning. If there are GC issues, it seems like
in the vast majority of cases G1 can either eliminate them or make it easy
to just work around them by cranking up the heap size. Someone should probably
test G1 with a variable-sized heap since it's designed to give back RAM when it
thinks it doesn't need it. That might or might not actually work. While we
didn't test this, a configuration of G1 + heap size min of 1/8 RAM and max of
1/2 RAM might make a really nice default for Cassandra at some point.
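
As a rough illustration of that untested idea, the heap bounds could be derived from system RAM as follows. This is a sketch under stated assumptions: the function names are hypothetical, RAM is passed in explicitly rather than detected, and nothing in this thread verifies that G1 actually returns memory with such a configuration.

```python
def g1_heap_bounds_mb(ram_mb):
    """Return (min_heap_mb, max_heap_mb) as 1/8 and 1/2 of system RAM."""
    return ram_mb // 8, ram_mb // 2


def g1_variable_heap_options(ram_mb):
    """Build illustrative JVM options for G1 with a variable-sized heap."""
    min_mb, max_mb = g1_heap_bounds_mb(ram_mb)
    return [f"-Xms{min_mb}M", f"-Xmx{max_mb}M", "-XX:+UseG1GC"]


if __name__ == "__main__":
    # Example: a machine with 64GB of RAM.
    print(" ".join(g1_variable_heap_options(64 * 1024)))
```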

Compare CMS and G1 pause times
--

Key: CASSANDRA-7486
URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
Project: Cassandra
Issue Type: Test
Components: Config
Reporter: Jonathan Ellis
Assignee: Ryan McGuire
Fix For: 2.1.0

See
http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-gc-migration-to-expectations-and-advanced-tuning
and https://twitter.com/rbranson/status/482113561431265281
May want to default 2.1 to G1.
2.1 is a different animal from 2.0 after moving most of memtables off heap.
Suspect this will help G1 even more than CMS. (NB this is off by default but
needs to be part of the test.)

--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2014-07-01 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049127#comment-14049127
 ] 

Jonathan Ellis commented on CASSANDRA-7486:
---

/cc [~rbranson] [~benedict]

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2014-07-01 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049219#comment-14049219
 ] 

T Jake Luciani commented on CASSANDRA-7486:
---

See also 
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase

100gb heaps!

[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times

2014-07-01 Thread Benedict (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14049217#comment-14049217
]

Benedict commented on CASSANDRA-7486:
-

We need to make sure we test over an extended period with a variety of
operations being exercised against the cluster. This is probably a good
opportunity to try and define a real world burn in test as well, and what
parameters should be included.

Some things to consider:

# Range of data distributions, including (esp. for this) large partitions and
very large cells. Possibly run two or three parallel stress profiles with very
different data profiles to really give GC a headache dealing with different
velocities / lifetimes.
# Incremental and full repairs
# Hint accumulation / node death
# Tombstones / Range Tombstones
# Secondary indexes?

I'd suggest ignoring some variables, e.g. sticking with just netty, so we can
define a single complex workload and run it for an extended period and get a
good result. While our client buffers behave quite differently with each, I'm
happy tuning defaults for native now it's faster.

It might also be useful, for this test only, to see for a single node how well
the two degrade as heap pressure increases, by artificially consuming large
portions of the heap for the duration of a more simple stress test.

