[jira] [Updated] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-7486:
--------------------------------------
    Assignee: Benedict  (was: Albert P Tobey)

> Migrate to G1GC by default
> --------------------------
>
>                 Key: CASSANDRA-7486
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Config
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>             Fix For: 3.0 alpha 1
>
> See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
> and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.
> Suspect this will help G1 even more than CMS. (NB this is off by default but
> needs to be part of the test.)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877418#comment-14877418 ]

Albert P Tobey commented on CASSANDRA-7486:
-------------------------------------------
The main point of switching to G1 was to enable most users to get decent - if not the best - performance out of the box without having to guess HEAP_NEWSIZE. Since nobody has the time or inclination to test/discover further, it might as well be rolled back. Users won't notice any difference in pain, since there was never a release with G1.
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877311#comment-14877311 ]

Albert P Tobey commented on CASSANDRA-7486:
-------------------------------------------
Is the picture equally bleak at RF=3?

Do the "2.2 GC" settings include anything other than the defaults from cassandra-env.sh? "ps -efw" output is sufficient.

I'd be happy to take a look at the GC logs if they are available.
[jira] [Commented] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740056#comment-14740056 ]

Albert P Tobey commented on CASSANDRA-10249:
--------------------------------------------
I'm benchmarking this patch as I find time to do so.

https://gist.github.com/tobert/a30ee9b9c2d8aba882f0

> Reduce over-read for standard disk io by 16x
> --------------------------------------------
>
>                 Key: CASSANDRA-10249
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Albert P Tobey
>             Fix For: 2.1.x
>
>         Attachments: patched-2.1.9-dstat-lvn10.png, stock-2.1.9-dstat-lvn10.png, yourkit-screenshot.png
>
> On read workloads, Cassandra 2.1 reads drastically more data than it emits
> over the network. This causes problems throughout the system by wasting disk
> IO and causing unnecessary GC.
> I have reproduced the issue on clusters and locally with a single instance.
> The only requirement to reproduce the issue is enough data to blow through
> the page cache. The default schema and data size with cassandra-stress is
> sufficient to expose the issue.
> With stock 2.1.9 I regularly observed disk:network ratios anywhere from
> 300:1 to 500:1. That is to say, for 1MB/s of network IO, Cassandra was doing
> 300-500MB/s of disk reads, saturating the drive.
> After applying this patch for standard IO mode
> https://gist.github.com/tobert/10c307cf3709a585a7cf the ratio fell to around
> 100:1 on my local test rig. Latency improved considerably and GC became a
> lot less frequent.
> I tested with 512 byte reads as well, but got the same performance, which
> makes sense since all HDDs and SSDs made in the last few years have a 4K
> block size (many of them lie and report 512).
> I'm re-running the numbers now and will post them tomorrow.
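The 16x in the ticket title is just the ratio of the old and new buffer sizes; a quick sketch of the arithmetic (the helper function is illustrative, not Cassandra code):

```python
# Over-read arithmetic behind the ticket title and the observed ratios.
OLD_BUFFER = 64 * 1024  # 2.1's standard-io read buffer: 64 KiB
NEW_BUFFER = 4 * 1024   # patched buffer: 4 KiB, matching device block size

def over_read(buffer_size: int, record_size: int) -> float:
    """Bytes read from disk per byte actually needed, assuming one
    buffered read per record (a deliberate simplification)."""
    return buffer_size / record_size

# Shrinking the buffer cuts per-read over-read by 64K / 4K = 16x.
print(OLD_BUFFER // NEW_BUFFER)  # 16
# With ~140-byte stress records, a 64 KiB read pulls in roughly 468x the
# data needed, the same order as the 300:1-500:1 ratios reported above.
print(round(over_read(OLD_BUFFER, 140)))  # 468
```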
[jira] [Commented] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729820#comment-14729820 ]

Albert P Tobey commented on CASSANDRA-10249:
--------------------------------------------
I'm running benchmarks now and will create a new patch that accepts a property and defaults to the original value of 64K.
[jira] [Updated] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-10249:
---------------------------------------
    Attachment: yourkit-screenshot.png
[jira] [Updated] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-10249:
---------------------------------------
    Attachment: stock-2.1.9-dstat-lvn10.png
[jira] [Updated] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-10249:
---------------------------------------
    Attachment: patched-2.1.9-dstat-lvn10.png
[jira] [Updated] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-10249:
---------------------------------------
    Description: (edited)
[jira] [Created] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
Albert P Tobey created CASSANDRA-10249:
---------------------------------------

             Summary: Reduce over-read for standard disk io by 16x
                 Key: CASSANDRA-10249
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10249
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Albert P Tobey
             Fix For: 2.1.x
[jira] [Commented] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726544#comment-14726544 ]

Albert P Tobey commented on CASSANDRA-8894:
-------------------------------------------
Sorry I'm late to the thread, but the recently added patch seems very overcomplicated for little benefit.

I noticed the massive over-read in 2.1 and tracked it down independently by profiling with disk_access_mode: standard. I then ran a build with DEFAULT_BUFFER_SIZE set to 4K and saw an instant 4x increase in TXN/s on a simple -stress test, with a 60% reduction in wasted disk IO.

This over-read is causing performance problems on every Cassandra 2.1 cluster that isn't 100% writes. It doesn't always show up because of the massive amount of RAM a lot of folks are running, but under low-memory conditions it is killing even very fast SSDs.

Patch for 2.1: https://gist.github.com/tobert/10c307cf3709a585a7cf

In reading through the history, I think this is being overthought. If anything, the readahead and buffering in the read path should be *removed*, issuing precise reads wherever possible. For now, the change to a 4K buffer size should go into 2.1 to significantly speed up read workloads.

> Our default buffer size for (uncompressed) buffered reads should be smaller,
> and based on the expected record size
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: benedict-to-commit
>             Fix For: 3.0 alpha 1
>
>         Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml
>
> A large contributor to buffered reads being slower than mmapped is likely
> that we read a full 64Kb at once, when average record sizes may be as low as
> 140 bytes in our stress tests. The TLB has only 128 entries on a modern
> core, and each read will touch 32 of these, meaning we will almost never hit
> the TLB and will incur at least 30 unnecessary misses each time (as well as
> the other costs of larger-than-necessary accesses). When working with an SSD
> there is little to no benefit in reading more than 4Kb at once, and in
> either case reading more data than we need is wasteful. So, I propose
> selecting a buffer size that is the next larger power of 2 than our average
> record size (with a minimum of 4Kb), so that we expect to read in one
> operation. I also propose that we create a pool of these buffers up-front,
> and that we ensure they are all exactly aligned to a virtual page, so that
> the source and target operations each touch exactly one virtual page per
> 4Kb of expected record size.
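The sizing rule proposed in the description is easy to sketch; a minimal illustration (the function name and structure are ours, not Cassandra's):

```python
# Next power of two at or above the average record size, floored at 4 KiB,
# per the proposal in the ticket description.
MIN_BUFFER = 4 * 1024

def buffer_size_for(avg_record_size: int) -> int:
    size = MIN_BUFFER
    while size < avg_record_size:
        size *= 2
    return size

print(buffer_size_for(140))      # 4096   (the ~140-byte stress records)
print(buffer_size_for(5000))     # 8192
print(buffer_size_for(100_000))  # 131072
```

With ~140-byte records, the rule collapses to the 4 KiB floor, so a record is expected to land in a single page-aligned read.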
[jira] [Updated] (CASSANDRA-10249) Reduce over-read for standard disk io by 16x
[ https://issues.apache.org/jira/browse/CASSANDRA-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Albert P Tobey updated CASSANDRA-10249:
---------------------------------------
    Description: (edited)
[jira] [Comment Edited] (CASSANDRA-9946) use ioprio_set on compaction threads by default instead of manually throttling
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651488#comment-14651488 ]

Albert P Tobey edited comment on CASSANDRA-9946 at 8/3/15 6:22 AM:
-------------------------------------------------------------------
Here's a script for pinning compaction to cores and setting ionice in one go: https://gist.github.com/tobert/97c52f80fdff2ba79ee9

Comment out the 'taskset' line to test ionice in isolation.

I agree with Ariel and usually advise people to always use the deadline IO scheduler, but FWIW I think it's possible to tune CFQ to be acceptable. There isn't a lot of existing advice on the internet about how to do it, but it's doable. I've seen some references in various Red Hat low-latency guides but have yet to try them out.

Even if many users choose deadline/noop for peak throughput, others may prefer CFQ if the tradeoff pays back in more predictable, smoother performance. That's not to mention the large number of setups that never tweak the disk scheduler at all. Setting compaction IO to the idle class will benefit some folks and doesn't hurt those on noop/deadline.

use ioprio_set on compaction threads by default instead of manually throttling
------------------------------------------------------------------------------

                Key: CASSANDRA-9946
                URL: https://issues.apache.org/jira/browse/CASSANDRA-9946
            Project: Cassandra
         Issue Type: New Feature
         Components: Core
           Reporter: Jonathan Ellis
           Assignee: Ariel Weisberg
            Fix For: 3.x

Compaction throttling works as designed, but it has two drawbacks:
* it requires manual tuning to choose the right value for a given machine
* it does not allow compaction to burst above its limit if there is additional i/o capacity available while there are fewer application requests to serve

Using ioprio_set instead solves both of these problems.
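The gist itself isn't inlined in the thread; as a rough sketch of the two knobs it combines, using only the standard library (Linux-only; shelling out to `ionice` stands in for a raw `ioprio_set` syscall, and targeting the current process is a demonstration, not how the real script finds its threads):

```python
import os
import subprocess

def pin_to_cores(pid: int, cores: set) -> None:
    """Pin a process/thread to a CPU set (the taskset half of the script)."""
    os.sched_setaffinity(pid, cores)

def set_idle_io_class(pid: int) -> None:
    """Move a process to the idle I/O class (the ionice half).
    Only CFQ honors the idle class; noop/deadline ignore it."""
    subprocess.run(["ionice", "-c", "3", "-p", str(pid)], check=True)

# Demonstration on the current process (pid 0 = self); a real script would
# walk /proc/<pid>/task to find CompactionExecutor thread ids instead.
pin_to_cores(0, {0})
print(os.sched_getaffinity(0))
```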
[jira] [Commented] (CASSANDRA-9946) use ioprio_set on compaction threads by default instead of manually throttling
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651488#comment-14651488 ]

Albert P Tobey commented on CASSANDRA-9946:
-------------------------------------------
Here's a script for pinning compaction to cores and setting ionice in one go: https://gist.github.com/tobert/97c52f80fdff2ba79ee9

Comment out the 'taskset' line to test ionice in isolation.
[jira] [Comment Edited] (CASSANDRA-9946) use ioprio_set on compaction threads by default instead of manually throttling
[ https://issues.apache.org/jira/browse/CASSANDRA-9946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651488#comment-14651488 ]

Albert P Tobey edited comment on CASSANDRA-9946 at 8/3/15 6:23 AM:
-------------------------------------------------------------------
[jira] [Commented] (CASSANDRA-9274) Changing memtable_flush_writes per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small
[ https://issues.apache.org/jira/browse/CASSANDRA-9274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581497#comment-14581497 ] Albert P Tobey commented on CASSANDRA-9274: --- I've been messing with these values lately and have observed some poor behavior around them. The comment in cassandra.yaml is misleading. I agree we shouldn't set the threshold by default, but I would like to see a comment added to memtable_flush_writers indicating that if you add a large number of flush writers, the default memtable_cleanup_threshold is going to end up small, which leads to small flushes and more frequent compaction. It makes some sense to set the memtable_cleanup_threshold based on the expected number of tables rather than cores or disks. I might be wrong, but that seems more relevant than the hardware or even the number of flush writer threads. Changing memtable_flush_writes per recommendations in cassandra.yaml causes memtable_cleanup_threshold to be too small --- Key: CASSANDRA-9274 URL: https://issues.apache.org/jira/browse/CASSANDRA-9274 Project: Cassandra Issue Type: Improvement Reporter: Donald Smith Priority: Minor It says in cassandra.yaml: {noformat} # If your data directories are backed by SSD, you should increase this # to the number of cores. #memtable_flush_writers: 8 {noformat} so we raised it to 24. Much later we noticed a warning in the logs: {noformat} WARN [main] 2015-04-22 15:32:58,619 DatabaseDescriptor.java:539 - memtable_cleanup_threshold is set very low, which may cause performance degradation {noformat} Looking at cassandra.yaml again I see: {noformat} # memtable_cleanup_threshold defaults to 1 / (memtable_flush_writers + 1) # memtable_cleanup_threshold: 0.11 #memtable_cleanup_threshold: 0.11 {noformat} So, I uncommented that last line (figuring that 0.11 is a reasonable value). Cassandra.yaml should give better guidance or the code should *prevent* the value from going outside a reasonable range. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
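The default formula quoted in the cassandra.yaml comment above, memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1), can be illustrated for a few writer counts. This is just a sketch of the arithmetic behind the warning, not code from the ticket:

```shell
# Illustrating memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
# from the cassandra.yaml comment quoted above. 8 writers gives the 0.11
# default; raising writers to 24 drives the threshold down to ~0.04,
# which triggers the "set very low" warning and causes tiny flushes.
for writers in 2 8 24; do
  awk -v w="$writers" 'BEGIN {
    printf "memtable_flush_writers=%d -> memtable_cleanup_threshold=%.2f\n", w, 1 / (w + 1)
  }'
done
```

With 24 writers the threshold lands at 0.04, well below the 0.11 that 8 writers produces, which is exactly the situation the reporter hit.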
[jira] [Commented] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575206#comment-14575206 ] Albert P Tobey commented on CASSANDRA-8150: --- I did some testing on EC2 with Cassandra 2.0 and G1GC and found the following settings to work well. Make sure to comment out the -Xmn line as shown. {code:sh} MAX_HEAP_SIZE=16G HEAP_NEWSIZE="2G" # placeholder, ignored # setting -Xmn breaks G1GC, don't do it #JVM_OPTS=$JVM_OPTS -Xmn${HEAP_NEWSIZE} # G1GC support ato...@datastax.com 2015-04-03 JVM_OPTS=$JVM_OPTS -XX:+UseG1GC # Cassandra does not benefit from biased locking JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking # lowering the pause target will lower throughput # 200ms is the default and lowest viable setting for G1GC # 1000ms seems to provide good balance of throughput and latency JVM_OPTS=$JVM_OPTS -XX:MaxGCPauseMillis=1000 # auto-optimize thread local allocation block size # https://blogs.oracle.com/jonthecollector/entry/the_real_thi JVM_OPTS=$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB {code} Revaluate Default JVM tuning parameters --- Key: CASSANDRA-8150 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150 Project: Cassandra Issue Type: Improvement Components: Config Reporter: Matt Stump Assignee: Ryan McGuire Attachments: upload.png It's been found that the old twitter recommendations of 100m per core up to 800m are harmful and should no longer be used. Instead the formula used should be 1/3 or 1/4 max heap with a max of 2G. 1/3 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess 1/3 is probably better for releases greater than 2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
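The ticket description proposes replacing the old per-core HEAP_NEWSIZE rule with "1/3 or 1/4 max heap with a max of 2G". Assuming the 1/4 divisor (still open for debate in the ticket), the rule might look like this in cassandra-env.sh terms — a sketch, not the actual patch:

```shell
# Sketch of the proposed CMS young-gen sizing from the ticket description:
# HEAP_NEWSIZE = max heap / 4, capped at 2G. The divisor (4 vs 3) is an
# assumption; the ticket leaves it open.
max_heap_mb=16384                     # MAX_HEAP_SIZE=16G expressed in MB
new_size_mb=$(( max_heap_mb / 4 ))
if [ "$new_size_mb" -gt 2048 ]; then
  new_size_mb=2048                    # never larger than 2G
fi
echo "HEAP_NEWSIZE=${new_size_mb}M"   # prints HEAP_NEWSIZE=2048M for a 16G heap
```

Note this only matters for CMS; as the comment above stresses, -Xmn must not be set at all when running G1.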
[jira] [Comment Edited] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575206#comment-14575206 ] Albert P Tobey edited comment on CASSANDRA-8150 at 6/5/15 8:56 PM: --- I did some testing on EC2 with Cassandra 2.0 and G1GC and found the following settings to work well. Make sure to comment out the -Xmn line as shown. {code} MAX_HEAP_SIZE=16G HEAP_NEWSIZE="2G" # placeholder, ignored # setting -Xmn breaks G1GC, don't do it #JVM_OPTS=$JVM_OPTS -Xmn${HEAP_NEWSIZE} # G1GC support ato...@datastax.com 2015-04-03 JVM_OPTS=$JVM_OPTS -XX:+UseG1GC # Cassandra does not benefit from biased locking JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking # lowering the pause target will lower throughput # 200ms is the default and lowest viable setting for G1GC # 1000ms seems to provide good balance of throughput and latency JVM_OPTS=$JVM_OPTS -XX:MaxGCPauseMillis=1000 # auto-optimize thread local allocation block size # https://blogs.oracle.com/jonthecollector/entry/the_real_thi JVM_OPTS=$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB {code} was (Author: ato...@datastax.com): I did some testing on EC2 with Cassandra 2.0 and G1GC and found the following settings to work well. Make sure to comment out the -Xmn line as shown.
{code:sh} MAX_HEAP_SIZE=16G HEAP_NEWSIZE="2G" # placeholder, ignored # setting -Xmn breaks G1GC, don't do it #JVM_OPTS=$JVM_OPTS -Xmn${HEAP_NEWSIZE} # G1GC support ato...@datastax.com 2015-04-03 JVM_OPTS=$JVM_OPTS -XX:+UseG1GC # Cassandra does not benefit from biased locking JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking # lowering the pause target will lower throughput # 200ms is the default and lowest viable setting for G1GC # 1000ms seems to provide good balance of throughput and latency JVM_OPTS=$JVM_OPTS -XX:MaxGCPauseMillis=1000 # auto-optimize thread local allocation block size # https://blogs.oracle.com/jonthecollector/entry/the_real_thi JVM_OPTS=$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB {code} Revaluate Default JVM tuning parameters --- Key: CASSANDRA-8150 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150 Project: Cassandra Issue Type: Improvement Components: Config Reporter: Matt Stump Assignee: Ryan McGuire Attachments: upload.png It's been found that the old twitter recommendations of 100m per core up to 800m are harmful and should no longer be used. Instead the formula used should be 1/3 or 1/4 max heap with a max of 2G. 1/3 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess 1/3 is probably better for releases greater than 2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9517) Switch to DTCS for hint storage
[ https://issues.apache.org/jira/browse/CASSANDRA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565433#comment-14565433 ] Albert P Tobey commented on CASSANDRA-9517: --- The big question is if we can get away without the major compaction and it sounds like that's not feasible at this point. A new, simple compaction strategy that does no merging and only tombstone cleanup might be the best bet in the short term. Switch to DTCS for hint storage --- Key: CASSANDRA-9517 URL: https://issues.apache.org/jira/browse/CASSANDRA-9517 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jeremy Hanna Fix For: 2.1.6 The DateTieredCompactionStrategy is a good choice for HintedHandoff so that we reduce the compaction load we incur when users build up hints. [~ato...@datastax.com] and others have tried the following patch in various setups and have seen significantly less load from hint compaction. https://gist.github.com/tobert/c069af27e3f8840d137d Setting the time window to 10 minutes has shown additional improvement. [~krummas] do you have any feedback about this idea and/or settings? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9517) Switch to DTCS for hint storage
[ https://issues.apache.org/jira/browse/CASSANDRA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565469#comment-14565469 ] Albert P Tobey commented on CASSANDRA-9517: --- My original theory was that we could use DTCS for system.hints since it has a timeseries-like table definition and let it delete whole tables when the TTLs expire. That was before I understood exactly how tombstones are used in hints. The patch seemed to help a little in testing, but I did not figure out why it seemed that way. The forced major compaction is most of the problem when hints build up, so that's the thing that needs to be removed if at all possible. Under 100% write workload on very fast machines I was seeing system.hints compactions in excess of 100GB, which has all kinds of negative side-effects. If there's a way we can convince any of the compaction strategies to split the wide rows across sstables (split by time window) while only merging tombstones along with subsequent cleanup, that could make hints tolerable until 3.0 takes over the world. Switch to DTCS for hint storage --- Key: CASSANDRA-9517 URL: https://issues.apache.org/jira/browse/CASSANDRA-9517 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jeremy Hanna Fix For: 2.1.6 The DateTieredCompactionStrategy is a good choice for HintedHandoff so that we reduce the compaction load we incur when users build up hints. [~ato...@datastax.com] and others have tried the following patch in various setups and have seen significantly less load from hint compaction. https://gist.github.com/tobert/c069af27e3f8840d137d Setting the time window to 10 minutes has shown additional improvement. [~krummas] do you have any feedback about this idea and/or settings? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563219#comment-14563219 ] Albert P Tobey commented on CASSANDRA-7486: --- I tested a number of different pause targets on a wide variety of machines. While the 200ms default is often fine on big machines with real CPUs, in Ghz-constrained environments like EC2 PVM or LV Xeons, throughput dropped considerably so that the GC could hit the pause target. I initially tested at 1000ms and 2000ms but settled on 500ms because it provides most of the benefit of a more generous pause target while being far enough below the current read/write timeouts in cassandra.yaml to make sure that pauses never/rarely hit those limits. Migrate to G1GC by default -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: New Feature Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
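The reasoning above — keep MaxGCPauseMillis far enough below the cassandra.yaml request timeouts that pauses rarely turn into client-visible timeouts — can be written as a quick sanity check. The 500ms target comes from the comment; the timeout values below are only the stock 2.1 defaults assumed for illustration:

```shell
# Sanity check: the GC pause target should sit well below the request
# timeouts, so a worst-case pause rarely pushes a request past its limit.
pause_target_ms=500        # value settled on in the comment above
read_timeout_ms=5000       # stock read_request_timeout_in_ms (assumed)
write_timeout_ms=2000      # stock write_request_timeout_in_ms (assumed)
for t in "$read_timeout_ms" "$write_timeout_ms"; do
  if [ "$pause_target_ms" -ge $(( t / 2 )) ]; then
    echo "WARNING: pause target ${pause_target_ms}ms is close to timeout ${t}ms"
  fi
done
echo "checked pause target ${pause_target_ms}ms against request timeouts"
```

With the assumed defaults, 500ms clears even half of the tightest timeout, so no warning fires.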
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561432#comment-14561432 ] Albert P Tobey commented on CASSANDRA-7486: --- Updated patches with spelling and whitespace fixes: https://github.com/tobert/cassandra/commits/g1gc-2 https://github.com/tobert/cassandra/commit/419d39814985a6ef165fdbafee5f1b84bf2f197b https://github.com/tobert/cassandra/commit/89d40af978eaeb02185726a63257d979111ad317 https://github.com/tobert/cassandra/commit/0f70469985d62aeadc20b41dc9cdc9d72a035c64 Migrate to G1GC by default -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: New Feature Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert P Tobey updated CASSANDRA-7486: -- Comment: was deleted (was: Yeah. I started on the Powershell scripts but figured I should talk to someone more knowledgeable on Windows before making the change. If you want a straight port I can throw that together and do a quick test on my local Windows machine.) Migrate to G1GC by default -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: New Feature Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14560200#comment-14560200 ] Albert P Tobey commented on CASSANDRA-7486: --- https://github.com/tobert/cassandra/commit/0759be3b2a2a8ded0098622dcb95c0eb47d79fd3 Migrate to G1GC by default -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: New Feature Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559669#comment-14559669 ] Albert P Tobey commented on CASSANDRA-7486: --- https://github.com/tobert/cassandra/tree/g1gc https://github.com/tobert/cassandra/commit/33bf6719e0c8e84672c3633f8ecce602affc3071 https://github.com/tobert/cassandra/commit/cafee86c3c5798e423689a26b43d05ed9312adc5 Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14559851#comment-14559851 ] Albert P Tobey commented on CASSANDRA-7486: --- Yeah. I started on the Powershell scripts but figured I should talk to someone more knowledgeable on Windows before making the change. If you want a straight port I can throw that together and do a quick test on my local Windows machine. Migrate to G1GC by default -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: New Feature Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553067#comment-14553067 ] Albert P Tobey commented on CASSANDRA-7486: --- I'll attach a patch ASAP. Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Albert P Tobey Fix For: 3.0 beta 1 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8651) Add support for running on Apache Mesos
[ https://issues.apache.org/jira/browse/CASSANDRA-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553408#comment-14553408 ] Albert P Tobey commented on CASSANDRA-8651: --- The code is here: https://github.com/mesosphere/cassandra-mesos Add support for running on Apache Mesos --- Key: CASSANDRA-8651 URL: https://issues.apache.org/jira/browse/CASSANDRA-8651 Project: Cassandra Issue Type: Task Reporter: Ben Whitehead Priority: Minor Fix For: 3.x As a user of Apache Mesos I would like to be able to run Cassandra on my Mesos cluster. This would entail integration of Cassandra on Mesos through the creation of a production level Mesos framework. This would enable me to avoid static partitioning and inefficiencies and run Cassandra as part of my data center infrastructure. http://mesos.apache.org/documentation/latest/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550741#comment-14550741 ] Albert P Tobey commented on CASSANDRA-7486: --- Did you run into evacuation failures? How big was your heap? I haven't seen any evac failures with 2.1 and Java 8. This is one of the things that was worked on for Hotspot 1.8. Then again maybe it's Solr that needs the help. I suspect you can remove a lot of these settings on Java 8, but have also discovered that setting the GC threads is necessary on many machines. Try adding the below line for a nice decrease in p99 latencies. JVM_OPTS=$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5 Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.x See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
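In cassandra-env.sh terms, the suggested flag is just another append to JVM_OPTS. The explicit GC thread count below is an illustrative assumption added to show what "setting the GC threads" might look like, not a value from the ticket:

```shell
# Flags in the style of cassandra-env.sh. G1RSetUpdatingPauseTimePercent=5
# is the flag suggested in the comment above; the ParallelGCThreads value
# is an illustrative assumption sized from the core count.
JVM_OPTS=""
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
cores=$(getconf _NPROCESSORS_ONLN)        # size GC threads from core count
JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=${cores}"
echo "$JVM_OPTS"
```

The echo at the end just shows the assembled option string; in the real script JVM_OPTS is consumed by the launcher.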
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523593#comment-14523593 ] Albert P Tobey commented on CASSANDRA-7486: --- if I am reading correctly there was pretty much never an old generation collection under the workload I looked at. The old gen was growing but never reached the point it needed to do an old gen GC. ^ G1 doesn't work that way. Another behavior to consider is worst case pause time when there is fragmentation. ^ G1 performs compaction. It's fairly easy to trigger and observe in gc.log with Cassandra 2.0. It takes more work with 2.1 since it seems to be easier on the GC. I'll see if I can find some time to generate graphs to make all this more convincing, but time is short because I'm spending all of my time tuning users' clusters where the #1 issue every time is getting CMS to behave. Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.x See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523724#comment-14523724 ] Albert P Tobey commented on CASSANDRA-7486: --- I only started messing with G1 this year, so I only know the old behavior by lore I've read and heard. I have not observed significant problems with it in the ~20-40 hours I've spent tuning clusters with G1 recently. I don't recommend anyone try G1 on anything older than JDK 7u75 or JDK 8u40 (although it's probably OK down to u20 according to the docs I've read). I did some testing on JDK7u75 and it was stable but didn't spend much time on it since JDK8u40 gave a nice bump in performance (5-10% on a customer cluster) by just switching JDKs and nothing else. From what I've read about the reference clearing issues, there is a new-ish setting to enable parallel reference collection, -XX:+ParallelRefProcEnabled. The advice in the docs is to only turn it on if a significant amount of time is spent on RefProc collection, e.g. [Ref Proc: 5.2 ms]. I pulled that from a log I had handy and that is high enough that we might want to consider enabling the flag, but in most of my observations it hovers around 0.1ms under saturation load. Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.x See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
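One way to apply the advice above — only enable -XX:+ParallelRefProcEnabled when significant time shows up under [Ref Proc: …] — is to scan gc.log for entries above roughly 1 ms. The log lines below are fabricated samples for illustration, mirroring the 5.2 ms and 0.1 ms figures quoted in the comment:

```shell
# Fabricated gc.log excerpt: one high and one typical Ref Proc entry,
# matching the values quoted in the comment above.
printf '   [Ref Proc: 5.2 ms]\n   [Ref Proc: 0.1 ms]\n' > /tmp/gc-refproc-sample.log

# Print entries where reference processing took more than 1 ms --
# candidates for enabling -XX:+ParallelRefProcEnabled.
awk -F'Ref Proc: ' '/Ref Proc:/ {
  split($2, f, " ")                 # f[1] is the millisecond value
  if (f[1] + 0 > 1.0) print "high ref proc:", f[1], "ms"
}' /tmp/gc-refproc-sample.log
```

On the sample above only the 5.2 ms entry is flagged; a log where nothing is flagged suggests the flag isn't needed.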
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519791#comment-14519791 ] Albert P Tobey commented on CASSANDRA-7486: --- [~aweisberg] comparing promotions between G1 and CMS doesn't really make sense IMO. G1 promotions simply mark memory where it is without copying. After a threshold it will compact surviving cards into a single region. What I've observed is that compaction is rarely necessary with a big enough G1 heap. With a saturation write workload on Cassandra 2.1 only ~100-200MB seems to stick around for the long haul with almost all the rest getting cycled every few minutes (in an 8GB heap). [~yangzhe1991] I would keep the default heap at 8GB. I have tested with G1 at 16GB on a 30GB m3.2xlarge on EC2 and it generally gets better throughput and latency because there's more space for G1 to waste (that's what they call it). Intel tested up to 100GB with HBase at 200ms pause target and said nice things about it. I don't see much need for C* to hit that size but it's certainly doable with G1. The main problem is smaller heaps where G1 starts to struggle a little, but I found that it still works OK down to 512MB, even if a bit less efficient than CMS since G1 targets ~10% CPU time for GC while the others target 1% by default. Throughput / latency is always a tradeoff and in the case of G1 with non-aggressive latency targets (-XX:MaxGCPauseMillis=2000) the throughput is darn close to CMS with considerably improved standard deviation on latency. IMO that's a great tradeoff as most of the users I talk to in the wild mostly struggle with getting reliable latency rather than throughput. IMO consistent performance should always take precedence over maximum performance/throughput. G1 provides a much more consistent experience with fewer knobs to mess with (especially tuning eden size, which is still a black art that nearly every installation I've looked at gets wrong).
Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.x See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
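The "~10% CPU time for GC versus 1%" comparison above follows from the collectors' GCTimeRatio defaults (9 for G1 and 99 for the throughput collectors in JDK 8 era HotSpot — an assumption worth verifying for a given JDK): the target GC share works out to 1/(1+ratio).

```shell
# GC time share implied by GCTimeRatio: share = 1 / (1 + ratio).
# G1's default of 9 budgets ~10% of CPU time for GC; the throughput
# collectors' default of 99 budgets ~1%, matching the comment above.
for ratio in 9 99; do
  awk -v r="$ratio" 'BEGIN {
    printf "GCTimeRatio=%d -> ~%.0f%% of CPU time budgeted for GC\n", r, 100 / (1 + r)
  }'
done
```

This is why G1 on a tiny heap can look less efficient than CMS even when both are healthy: it is simply willing to spend a larger slice of CPU on collection.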
[jira] [Commented] (CASSANDRA-9193) Facility to write dynamic code to selectively trigger trace or log for queries
[ https://issues.apache.org/jira/browse/CASSANDRA-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503111#comment-14503111 ] Albert P Tobey commented on CASSANDRA-9193: --- Maybe just steal this? https://github.com/datastax/nodejs-driver/blob/master/lib/tokenizer.js Facility to write dynamic code to selectively trigger trace or log for queries -- Key: CASSANDRA-9193 URL: https://issues.apache.org/jira/browse/CASSANDRA-9193 Project: Cassandra Issue Type: New Feature Reporter: Matt Stump I want the equivalent of dtrace for Cassandra. I want the ability to intercept a query with a dynamic script (assume JS) and based on logic in that script trigger the statement for trace or logging. Examples - Trace only INSERT statements to a particular CF. - Trace statements for a particular partition or consistency level. - Log statements that fail to reach the desired consistency for read or write. - Log If the request size for read or write exceeds some threshold At some point in the future it would be helpful to also do things such as log partitions greater than X bytes or Z cells when performing compaction. Essentially be able to inject custom code dynamically without a reboot to the different stages of C*. The code should be executed synchronously as part of the monitored task, but we should provide the ability to log or execute CQL asynchronously from the provided API. Further down the line we could use this functionality to modify/rewrite requests or tasks dynamically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497603#comment-14497603 ] Albert P Tobey edited comment on CASSANDRA-7486 at 4/16/15 6:05 AM: My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet. The CPUs are fairly slow at 1.3Ghz i5-4250U. Cassandra 2.1.4 / Oracle JDK 8u40 / CoreOS 647.0.0 / Linux 3.19.3 (bare metal - no container). The tests were automated with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time. The big win with G1 IMO is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power. cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the gc log regularly and did not see many pauses for stress itself above 1ms here and there, with most pauses in the ~300usec range. The three stress nodes I had available are all quad-cores: i7-2600/3.4Ghz/8GB, Xeon-E31270/3.4Ghz/16GB, i5-4250U/1.3Ghz/16GB. The final output of the stress is available here: https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here: http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB) was (Author: ato...@datastax.com): My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet. The CPUs are fairly slow at 1.4Ghz. Cassandra 2.1.4 / Oracle JDK 8u40 / CoreOS 647.0.0 / Linux 3.19.3 (bare metal - no container). The tests were automated with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time.
The big win with G1 IMO is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power. cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the gc log regularly and did not see many pauses for stress itself above 1ms here there, with most pauses in the ~300usec range. The final output of the stress is available here: https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here: http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB) Compare CMS and G1 pause times -- Key: CASSANDRA-7486 URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 Project: Cassandra Issue Type: Test Components: Config Reporter: Jonathan Ellis Assignee: Shawn Kumar Fix For: 2.1.5 See http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning and https://twitter.com/rbranson/status/482113561431265281 May want to default 2.1 to G1. 2.1 is a different animal from 2.0 after moving most of memtables off heap. Suspect this will help G1 even more than CMS. (NB this is off by default but needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497603#comment-14497603 ] Albert P Tobey edited comment on CASSANDRA-7486 at 4/16/15 6:11 AM:

My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet. The CPUs are fairly slow 1.3GHz i5-4250Us. Cassandra 2.1.4 / Oracle JDK 8u40 / CoreOS 647.0.0 / Linux 3.19.3 (bare metal, no container). The tests were automated, with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time.

The big win with G1, in my opinion, is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power. cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the GC log regularly and only saw the occasional pause for stress itself above 1ms here and there, with most pauses in the ~300usec range. The three stress nodes I had available are all quad-cores: i7-2600/3.4GHz/8GB, Xeon E3-1270/3.4GHz/16GB, i5-4250U/1.3GHz/16GB.

These were saturation tests. In all but the G1 @ 256MB test the stress runs were stable and the systems' CPUs were at 100% pretty much the whole time. The numbers smooth out a lot for all of the combinations of GC settings at more pedestrian throughput. I will kick that off when I get a chance, which will be ~2 weeks from now.

The final output of the stress runs is available here:
https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv

The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here:
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB)
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497603#comment-14497603 ] Albert P Tobey commented on CASSANDRA-7486:
---

My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet, running Cassandra 2.1.4. The CPUs are fairly slow at 1.4GHz. The tests were automated, with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time.

The big win with G1, in my opinion, is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power. cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the GC log regularly and only saw the occasional pause for stress itself above 1ms here and there, with most pauses in the ~300usec range.

The final output of the stress runs is available here:
https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv

The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here:
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB)
[jira] [Comment Edited] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497603#comment-14497603 ] Albert P Tobey edited comment on CASSANDRA-7486 at 4/16/15 5:58 AM:

My benchmarks completed. These were run on 6 quad-core Intel NUCs with 16GB RAM / 240GB SSD / gigabit ethernet. The CPUs are fairly slow at 1.4GHz. Cassandra 2.1.4 / Oracle JDK 8u40 / CoreOS 647.0.0 / Linux 3.19.3 (bare metal, no container). The tests were automated, with a complete cluster rebuild between tests and caches dropped before starting Cassandra each time.

The big win with G1, in my opinion, is that it is auto-tuning. I've been running it on a few other kinds of machines and it generally does much better with more CPU power. cassandra-stress was run with an increased heap but is otherwise unmodified from Cassandra 2.1.4. I checked the GC log regularly and only saw the occasional pause for stress itself above 1ms here and there, with most pauses in the ~300usec range.

The final output of the stress runs is available here:
https://docs.google.com/a/datastax.com/spreadsheets/d/19Eb7HGkd5rFUD_C0ZALbK6-R4fPF9vJRr8BrvxBwo38/edit?usp=sharing
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1.csv

The stress commands, system.log, GC logs, conf directory from all the servers, and full stress logs are available on my webserver here:
http://tobert.org/downloads/cassandra-2.1-cms-vs-g1-data.tar.gz (35MB)
[jira] [Commented] (CASSANDRA-9193) Facility to write dynamic code to selectively trigger trace or log for queries
[ https://issues.apache.org/jira/browse/CASSANDRA-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495157#comment-14495157 ] Albert P Tobey commented on CASSANDRA-9193:
---

JavaScript makes sense since Nashorn ships with Java 8: no dependencies to add to C*. It looks like you can get at a lot of the stuff we'd need as soon as a REPL or some way to run scripts is available: http://moduscreate.com/javascript-and-the-jvm/

Facility to write dynamic code to selectively trigger trace or log for queries
--
Key: CASSANDRA-9193
URL: https://issues.apache.org/jira/browse/CASSANDRA-9193
Project: Cassandra
Issue Type: New Feature
Reporter: Matt Stump

I want the equivalent of dtrace for Cassandra: the ability to intercept a query with a dynamic script (assume JS) and, based on logic in that script, trigger the statement for trace or logging. Examples:
- Trace only INSERT statements to a particular CF.
- Trace statements for a particular partition or consistency level.
- Log statements that fail to reach the desired consistency for read or write.
- Log if the request size for read or write exceeds some threshold.

At some point in the future it would be helpful to also do things such as log partitions greater than X bytes or Z cells when performing compaction. Essentially, be able to inject custom code dynamically, without a reboot, into the different stages of C*. The code should be executed synchronously as part of the monitored task, but we should provide the ability to log or execute CQL asynchronously from the provided API. Further down the line we could use this functionality to modify/rewrite requests or tasks dynamically.
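The filter idea above can be sketched as plain predicates. This is a hypothetical illustration: QueryEvent and the method names are made up for the sketch, not Cassandra APIs, and in the real proposal the predicate body would come from a user-supplied Nashorn (JavaScript) script rather than compiled Java.

```java
public class TraceHookSketch {
    // Minimal stand-in for the data a filter script would see (hypothetical).
    record QueryEvent(String type, String columnFamily, String consistency, int requestBytes) {}

    // "Trace only INSERT statements to a particular CF" as a predicate.
    static boolean traceInsertsToUsers(QueryEvent e) {
        return e.type().equals("INSERT") && e.columnFamily().equals("users");
    }

    // "Log if the request size exceeds some threshold" (threshold is made up).
    static boolean logLargeRequest(QueryEvent e) {
        return e.requestBytes() > 1_000_000;
    }

    public static void main(String[] args) {
        QueryEvent ev = new QueryEvent("INSERT", "users", "QUORUM", 512);
        System.out.println(traceInsertsToUsers(ev)); // true
        System.out.println(logLargeRequest(ev));     // false
    }
}
```

Evaluating such predicates synchronously per statement keeps the hook cheap; only the matching events would be handed off to the (asynchronous) trace or log path.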
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485579#comment-14485579 ] Albert P Tobey commented on CASSANDRA-7486:
---

This is with 2.0 / OpenJDK 8 since that's what I had running. Same everything each run except for heap size. cassandra-stress 2.1.4, read workload, 800 threads. I'll re-run with 2.1 / Oracle JDK 8 and some mixed load.

-XX:+UseG1GC

Also: -XX:+UseTLAB -XX:+ResizeTLAB -XX:-UseBiasedLocking -XX:+AlwaysPreTouch (but maybe those should go in a different ticket).

8GB:
  op rate                   : 139805
  partition rate            : 139805
  row rate                  : 139805
  latency mean              : 5.7
  latency median            : 4.2
  latency 95th percentile   : 13.2
  latency 99th percentile   : 18.5
  latency 99.9th percentile : 21.1
  latency max               : 303.8

512MB:
  op rate                   : 114214
  partition rate            : 114214
  row rate                  : 114214
  latency mean              : 7.0
  latency median            : 3.7
  latency 95th percentile   : 12.4
  latency 99th percentile   : 14.7
  latency 99.9th percentile : 15.3
  latency max               : 307.1

256MB:
  op rate                   : 60028
  partition rate            : 60028
  row rate                  : 60028
  latency mean              : 13.3
  latency median            : 4.0
  latency 95th percentile   : 44.7
  latency 99th percentile   : 73.5
  latency 99.9th percentile : 79.6
  latency max               : 1105.4

Same everything with mostly stock CMS settings for 2.0. I added the -XX:+UseTLAB -XX:+ResizeTLAB -XX:-UseBiasedLocking -XX:+AlwaysPreTouch settings to keep the numbers comparable to all of my other data.

8GB/1GB:
  op rate                   : 119155
  partition rate            : 119155
  row rate                  : 119155
  latency mean              : 6.7
  latency median            : 4.1
  latency 95th percentile   : 11.8
  latency 99th percentile   : 15.5
  latency 99.9th percentile : 17.3
  latency max               : 520.2

512MB (-XX:+UseAdaptiveSizePolicy):
  op rate                   : 82375
  partition rate            : 82375
  row rate                  : 82375
  latency mean              : 9.7
  latency median            : 4.3
  latency 95th percentile   : 28.2
  latency 99th percentile   : 49.4
  latency 99.9th percentile : 54.8
  latency max               : 2642.6

256MB (-XX:+UseAdaptiveSizePolicy):
  op rate                   : 77705
  partition rate            : 77705
  row rate                  : 77705
  latency mean              : 10.3
  latency median            : 4.8
  latency 95th percentile   : 33.6
  latency 99th percentile   : 45.3
  latency 99.9th percentile : 49.1
  latency max               : 1990.0
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484318#comment-14484318 ] Albert P Tobey commented on CASSANDRA-7486:
---

So far my testing of read workloads matches my experience with writes: an 8GB heap with generic G1GC settings is good for more workloads out of the box than haphazardly tuned CMS is. I've been testing on a mix of Oracle/OpenJDK and JDK7/8 and the results are fairly consistent across the board, with the exception that performance is a tad higher (~5%) on JDK8 than JDK7 (with G1GC; I have not tested CMS much on JDK8).

These parameters get better throughput than CMS out of the box, with significantly improved consistency in the max and p99.9 latency:

-Xmx8G -Xms8G -XX:+UseG1GC

If throughput is more critical than latency, the following will get a few percent more throughput at the cost of potentially higher max pause times:

-Xmx8G -Xms8G -XX:+UseG1GC -XX:MaxGCPauseMillis=2000 -XX:InitiatingHeapOccupancyPercent=75

My recommendation is to document the last two options in cassandra-env.sh but leave them disabled/commented out for end users to fiddle with. Other knobs for G1 didn't make a statistically measurable difference in my observations.

G1 scales particularly well with heap size on huge machines. 8 to 16GB doesn't seem to make a big difference, matching what [~rbranson] saw. At 24GB I started seeing about an 8-10% throughput increase with little variance in pause times.

IMO the simple G1 configuration should be the default for large heaps. It's simple and provides consistent latency. Because it uses heuristics to determine the eden size and scanning schedule, it adapts well to diverse environments without tweaking. Heap sizes under 8GB should continue to use CMS, or even experiment with serial collectors (e.g. Raspberry Pi, t2.micro, vagrant).

If there is interest, I will write up a patch for cassandra-env.sh to make the auto-detection code pick G1GC at >= 6GB heap and CMS below 6GB.
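The proposed auto-detection rule boils down to a single threshold check. A minimal sketch of that decision logic follows; the real patch would live in cassandra-env.sh as shell, the class and method names here are made up, and the CMS flag list is abbreviated for illustration. Only the 6GB threshold and the flag names come from the comment above.

```java
public class GcAutoSelect {
    // Threshold from the proposal above: >= 6GB heap picks G1, below picks CMS.
    static final long G1_THRESHOLD_MB = 6 * 1024;

    static String flagsFor(long maxHeapMb) {
        if (maxHeapMb >= G1_THRESHOLD_MB) {
            return "-XX:+UseG1GC";
        }
        // Stock CMS flags, heavily abbreviated for the sketch.
        return "-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled";
    }

    public static void main(String[] args) {
        System.out.println(flagsFor(8 * 1024)); // 8GB heap -> -XX:+UseG1GC
        System.out.println(flagsFor(4 * 1024)); // 4GB heap -> CMS flags
    }
}
```

Keeping the rule to one comparison is what makes it attractive: no HEAP_NEWSIZE guessing for G1, and small heaps keep the existing CMS defaults.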
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484657#comment-14484657 ] Albert P Tobey commented on CASSANDRA-7486:
---

I'll kick off some tests and find out. All of the Oracle docs say not to bother below 6GB, but yeah, I agree: if it's basically not bad we should go with simple.
[jira] [Commented] (CASSANDRA-8150) Revaluate Default JVM tuning parameters
[ https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395572#comment-14395572 ] Albert P Tobey commented on CASSANDRA-8150:
---

It appears that -XX:+UseGCTaskAffinity is a no-op in HotSpot. https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications

Revaluate Default JVM tuning parameters
---
Key: CASSANDRA-8150
URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
Project: Cassandra
Issue Type: Improvement
Components: Config
Reporter: Matt Stump
Assignee: Ryan McGuire
Attachments: upload.png

It's been found that the old Twitter recommendation of 100m per core up to 800m is harmful and should no longer be used. Instead the formula used should be 1/3 or 1/4 of max heap, with a max of 2G. 1/3 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess, 1/3 is probably better for releases greater than 2.1.
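The two sizing formulas in the ticket description can be compared with trivial arithmetic. This is an illustrative sketch only: the class and method names are invented, and the 1/4 variant with the 2GB cap is just one of the two options the description leaves open.

```java
public class NewGenSizing {
    // Old guidance (called harmful above): 100MB per core, capped at 800MB.
    static long oldFormulaMb(int cores) {
        return Math.min(100L * cores, 800L);
    }

    // Proposed: 1/4 (or 1/3) of max heap, capped at 2GB.
    static long proposedFormulaMb(long maxHeapMb) {
        return Math.min(maxHeapMb / 4, 2048L);
    }

    public static void main(String[] args) {
        // On a 16-core box with an 8GB heap, the old rule hits its 800MB cap
        // while the proposed rule allows the full 2GB new generation.
        System.out.println(oldFormulaMb(16));        // 800
        System.out.println(proposedFormulaMb(8192)); // 2048
    }
}
```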
[jira] [Commented] (CASSANDRA-7486) Compare CMS and G1 pause times
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389850#comment-14389850 ] Albert P Tobey commented on CASSANDRA-7486:
---

I managed to get G1 (Java 8) to beat CMS on both latency and throughput on my NUC cluster. Preliminary results: https://gist.github.com/tobert/ea9328e4873441c7fc34
[jira] [Created] (CASSANDRA-8873) Add PropertySeedProvider
Albert P Tobey created CASSANDRA-8873:
-
Summary: Add PropertySeedProvider
Key: CASSANDRA-8873
URL: https://issues.apache.org/jira/browse/CASSANDRA-8873
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Albert P Tobey
Priority: Minor
Attachments: PropertySeedProvider.java

Add a PropertySeedProvider that allows administrators to set a seed on the command line with -Dcassandra.seeds=127.0.0.1,127.0.0.2 instead of rewriting cassandra.yaml. It looks like the yaml parser expects there to always be a parameters: option on seeds, so unless we change it to be optional, there needs to be a dummy map or the yaml will not parse, e.g.:

seed_provider:
    - class_name: org.apache.cassandra.locator.PropertySeedProvider
      parameters:
          - stub: this is required for the yaml parser and is ignored
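The attached PropertySeedProvider.java is not included in this thread, so the following is a hedged sketch of the parsing step such a provider would need: splitting the comma-separated cassandra.seeds system property into hosts. The class name, the localhost fallback, and the trimming behavior are all assumptions for illustration; a real SeedProvider would additionally resolve each host to an InetAddress.

```java
import java.util.Arrays;
import java.util.List;

public class SeedPropertyParser {
    // Split -Dcassandra.seeds=host1,host2 into host strings, tolerating
    // whitespace around commas; fall back to localhost when unset.
    static List<String> parseSeeds(String property) {
        if (property == null || property.isBlank()) {
            return List.of("127.0.0.1"); // assumed fallback, mirrors the default yaml seed
        }
        return Arrays.stream(property.split(","))
                     .map(String::trim)
                     .filter(s -> !s.isEmpty())
                     .toList();
    }

    public static void main(String[] args) {
        String prop = System.getProperty("cassandra.seeds", "127.0.0.1,127.0.0.2");
        System.out.println(parseSeeds(prop)); // [127.0.0.1, 127.0.0.2]
    }
}
```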
[jira] [Assigned] (CASSANDRA-8651) Add support for running on Apache Mesos
[ https://issues.apache.org/jira/browse/CASSANDRA-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert P Tobey reassigned CASSANDRA-8651:
-
Assignee: Albert P Tobey

Add support for running on Apache Mesos
---
Key: CASSANDRA-8651
URL: https://issues.apache.org/jira/browse/CASSANDRA-8651
Project: Cassandra
Issue Type: Task
Reporter: Ben Whitehead
Assignee: Albert P Tobey
Priority: Minor
Fix For: 3.0

As a user of Apache Mesos I would like to be able to run Cassandra on my Mesos cluster. This would entail integration of Cassandra on Mesos through the creation of a production-level Mesos framework. This would enable me to avoid static partitioning and inefficiencies and run Cassandra as part of my data center infrastructure. http://mesos.apache.org/documentation/latest/
[jira] [Commented] (CASSANDRA-8651) Add support for running on Apache Mesos
[ https://issues.apache.org/jira/browse/CASSANDRA-8651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286450#comment-14286450 ] Albert P Tobey commented on CASSANDRA-8651:
---

The folks at Mesosphere are working on building an executor for Mesos and are hoping to upstream any components that make sense to live in the Cassandra tree. It sounds like there could be a custom MesosSeedProvider. There's also a question of whether or not it makes sense to have the executor code live in the Cassandra tree; I think that will be easier to answer once it exists. For now, I don't think they need anything from the Cassandra developers. This ticket exists to make the work visible to the community.
[jira] [Commented] (CASSANDRA-8494) incremental bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249103#comment-14249103 ] Albert P Tobey commented on CASSANDRA-8494:
---

Neat idea. I think this would make a lot of sense to operators and provide visibility into the rebuild process that's easy to understand (how many tokens are complete?). Many of the customers I've talked to in the last few months will be very excited about this. In one case, they want to attach ~70TB of very fast SSD; I explained everything to them, and they're still going to try. Another client has more than 100 remote sites that store time-series data. They want to store 10-15TB per node on 15K SAS RAID10. It's the gear they can get, and they have limited ability to control power drops etc. in the remote sites, so density is really important to them. My former employer was trying to run 8 x 3TB SATA; no matter how hard we fought for the right drives, the incentives from the HW vendors etc. drove them to buy the big SATA drives.

I think ops folks will like this, and there's an opportunity to use this feature to improve the UX of bootstrap (by using token ranges to improve feedback to ops).

incremental bootstrap
-
Key: CASSANDRA-8494
URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jon Haddad
Priority: Minor
Labels: density

Current bootstrapping involves (to my knowledge) picking tokens and streaming data before the node is available for requests. This can be problematic with fat nodes, since it may require 20TB of data to be streamed over before the machine can be useful. This can result in a massive window of time before the machine can do anything useful. As a potential approach to mitigate the huge window of time before a node is available, I suggest modifying the bootstrap process to only acquire a single initial token before being marked UP. This would likely be a configuration parameter, incremental_bootstrap or something similar. After the node is bootstrapped with this one token, it could go into the UP state, and could then acquire additional tokens (one or a handful at a time), which would be streamed over while the node is active and serving requests. The benefit here is that with the default 256 tokens a node could become an active part of the cluster with less than 1% of its final data streamed over.
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14150847#comment-14150847 ] Albert P Tobey commented on CASSANDRA-6246:
---

For backwards compatibility, if it's possible to run both protocols, make it a configuration option in the yaml. Another rolling restart to disable hybrid/dual mode isn't so bad if it removes a lot of complexity from runtime. It would also make it easy for conservative users to stick with the old Paxos.

EPaxos
--
Key: CASSANDRA-6246
URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Jonathan Ellis
Assignee: Blake Eggleston
Priority: Minor

One reason we haven't optimized our Paxos implementation with Multi-Paxos is that Multi-Paxos requires leader election and hence a period of unavailability when the leader dies. EPaxos is a Paxos variant that (1) requires fewer messages than Multi-Paxos, (2) is particularly useful across multiple datacenters, and (3) allows any node to act as coordinator: http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf However, there is substantial additional complexity involved if we choose to implement it.
[jira] [Commented] (CASSANDRA-7136) Change default paths to ~ instead of /var
[ https://issues.apache.org/jira/browse/CASSANDRA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018288#comment-14018288 ] Albert P Tobey commented on CASSANDRA-7136:
---

[~thobbs] probably not. For some reason I had it in my head that this was for 3.0, so it was at the bottom of my queue.

Change default paths to ~ instead of /var
-
Key: CASSANDRA-7136
URL: https://issues.apache.org/jira/browse/CASSANDRA-7136
Project: Cassandra
Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Albert P Tobey
Fix For: 2.1.0

Defaulting to /var makes it more difficult for both multi-user systems and people unfamiliar with the command line.
[jira] [Commented] (CASSANDRA-7306) Support edge dcs with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010205#comment-14010205 ] Albert P Tobey commented on CASSANDRA-7306:
---

One real use case is branch locations with local clusters that get replicated to a central datacenter for analytics. The central cluster has no authority to open ports or create VPNs in the plants, but it can open ports on the inbound side. In this situation, the easiest thing to do is to open the inbound ports to the central cluster and use TLS. The spokes obviously cannot communicate with each other, but they can push data to the hub. This kind of scenario is common in retail and manufacturing. Basically, it's useful anywhere there is a hub-and-spoke topology where bidirectional communication is impossible or intermittent.

Another common problem is NAT traversal where VPN is not available. If there is no requirement for bi-directional replication, it gets a lot easier to deal with NAT, since the spoke/leaf clusters can connect outbound through NAT into a centralized cluster. Generating all the firewall rules for such a setup is a lot of work and prone to error. If only one side needs to modify firewall policy, it's a lot easier to get right and troubleshoot.

Support edge dcs with more flexible gossip
-
Key: CASSANDRA-7306
URL: https://issues.apache.org/jira/browse/CASSANDRA-7306
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Tupshin Harper
Labels: ponies

As Cassandra clusters get bigger and bigger, and their topology becomes more complex, there is more and more need for a notion of hub and spoke datacenters. One of the big obstacles to supporting hundreds (or thousands) of remote dcs is the assumption that all dcs need to talk to each other (and be connected all the time). This ticket is a vague placeholder with the goals of achieving: 1) better behavioral support for occasionally disconnected datacenters, and 2) explicit support for custom dc-to-dc routing. A simple approach would be an optional per-dc annotation of which other DCs that DC could gossip with.
[jira] [Commented] (CASSANDRA-7306) Support edge dcs with more flexible gossip
[ https://issues.apache.org/jira/browse/CASSANDRA-7306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010210#comment-14010210 ] Albert P Tobey commented on CASSANDRA-7306: --- Another angle: small edge clusters for low-latency writes in many regions that push to a central warehouse for analytics. Think GSLB -> HTTP -> Cassandra all over the world, with regional clusters replicated to us-west-2 where the data is crunched with Spark or Hadoop. This is basically the MySQL read-replica model flipped on its head, with zero-read write-replicas feeding into a read-heavy warehouse. The central cluster could be hundreds of nodes while the edge clusters are in the 5-10 node range.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CASSANDRA-7136) Change default paths to ~ instead of /var
[ https://issues.apache.org/jira/browse/CASSANDRA-7136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988153#comment-13988153 ] Albert P Tobey commented on CASSANDRA-7136: --- Agreed on $CASSANDRA_HOME/data. Having slept on it, I don't think the defaults in cassandra.yaml should change. It should always reflect sane defaults for *production* use. What we're talking about here is non-production, so it gets the short straw and will do the yaml mangling. I'm leaning towards adding a separate launcher script as well, something like ./run_ephemeral.sh or some other name with obvious meaning. Various installations and testing packages have come to expect consistent behavior from the current setup, and there's no good reason to change that if we can simply add another script.
> Change default paths to ~ instead of /var
> --
>
> Key: CASSANDRA-7136
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7136
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jonathan Ellis
> Assignee: Albert P Tobey
> Fix For: 2.1.0
>
> Defaulting to /var makes it more difficult for both multi-user systems and people unfamiliar with the command line.
-- This message was sent by Atlassian JIRA (v6.2#6252)
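The "yaml mangling" a launcher like ./run_ephemeral.sh would do could be as simple as rewriting the production paths to per-user locations before starting the server. A rough Python sketch under that assumption; the path mapping and the `.cassandra-ephemeral` directory name are illustrative only, not the ticket's actual design:

```python
import os

# Illustrative mapping from production defaults (as shipped in cassandra.yaml)
# to per-user ephemeral paths under the home directory.
def ephemeral_paths(home=None):
    home = home or os.path.expanduser("~")
    base = os.path.join(home, ".cassandra-ephemeral")
    return {
        "/var/lib/cassandra/data": os.path.join(base, "data"),
        "/var/lib/cassandra/commitlog": os.path.join(base, "commitlog"),
        "/var/log/cassandra": os.path.join(base, "logs"),
    }

def mangle_yaml(text, home=None):
    """Rewrite production paths in a cassandra.yaml body to ephemeral ones."""
    for prod, eph in ephemeral_paths(home).items():
        text = text.replace(prod, eph)
    return text
```

Keeping this rewrite in a separate script, rather than changing cassandra.yaml itself, is exactly the point of the comment: production defaults stay untouched, and only the ephemeral launcher pays the cost.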
[jira] [Commented] (CASSANDRA-6487) Log WARN on large batch sizes
[ https://issues.apache.org/jira/browse/CASSANDRA-6487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13848099#comment-13848099 ] Albert P Tobey commented on CASSANDRA-6487: --- If it's not too much trouble, it would help to include the keyspace and column family, and maybe the session ID/info.
> Log WARN on large batch sizes
> --
>
> Key: CASSANDRA-6487
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6487
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Patrick McFadin
> Priority: Minor
>
> Large batches on a coordinator can cause a lot of node stress. I propose adding a WARN log entry if batch sizes go beyond a configurable size. This will give more visibility to operators on something that can happen on the developer side. New yaml setting with 5k default.
> # Log WARN on any batch size exceeding this value. 5k by default.
> # Caution should be taken on increasing the size of this threshold as it can lead to node instability.
> batch_size_warn_threshold: 5k
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
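Putting the ticket's threshold together with the comment's request, the check could look roughly like this. A Python sketch only (Cassandra's real implementation is Java); the function name, the log format, and interpreting "5k" as 5 * 1024 bytes are assumptions:

```python
# Assumed interpretation of the yaml's "5k" default as bytes.
BATCH_SIZE_WARN_THRESHOLD = 5 * 1024

def check_batch_size(keyspace, table, mutation_sizes,
                     threshold=BATCH_SIZE_WARN_THRESHOLD):
    """Return a WARN message naming the keyspace and table if the batch
    exceeds the threshold, else None (hypothetical helper, not Cassandra API)."""
    total = sum(mutation_sizes)
    if total > threshold:
        return ("WARN: batch for %s.%s is %d bytes, exceeding "
                "batch_size_warn_threshold (%d bytes)"
                % (keyspace, table, total, threshold))
    return None
```

Including the keyspace and table in the message, as the comment asks, is what lets an operator trace an oversized batch back to the application issuing it.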