[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611985#comment-14611985 ] Joshua McKenzie commented on CASSANDRA-9658:
---------------------------------------------

Pushed an update to the [branch|https://github.com/apache/cassandra/compare/cassandra-2.2...josh-mckenzie:9658]. Added a WindowsFailedSnapshotTracker that, when on Windows, writes a .toDelete file in $CASSANDRA_HOME with one line per failed snapshot directory, checks that file on startup, and recursively deletes any folders listed there. I left the deleteRecursiveOnExit logic in place as well, since a) it's lightweight and simple and b) it provides another avenue to confirm we delete snapshots on Windows in the rare case they fail.

The only other thing I can think of here would be a periodic task that attempts to delete all the snapshot files listed in .toDelete while the node is running, so that old snapshots would be deleted as readers are closed and files are compacted. That smells way too much like SSTableDeletingTask for my taste; I'm pretty content with the current setup given it's a temporary holdover.

CI running: [testall|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-9658-testall/3/] - [dtest|http://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-9658-dtest/3/].

Re-enable memory-mapped index file reads on Windows
---------------------------------------------------

                 Key: CASSANDRA-9658
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9658
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Joshua McKenzie
            Assignee: Joshua McKenzie
              Labels: Windows, performance
             Fix For: 2.2.x

It appears that the impact of buffered vs. memory-mapped index file reads has changed dramatically since I last tested. [Here are some results on various platforms we pulled together yesterday w/2.2-HEAD|https://docs.google.com/spreadsheets/d/1JaO2x7NsK4SSg_ZBqlfH0AwspGgIgFZ9wZ12fC4VZb0/edit#gid=0].

TL;DR: On Linux we see a 40% performance hit on reads, from 108k ops/sec to 64.8k ops/sec. While surprising in itself, the really unexpected result (to me) is on Windows - with standard access we're getting 16.8k ops/sec on our bare-metal perf boxes vs. 184.7k ops/sec with memory-mapped index files, an over 10-fold increase in throughput. While testing w/standard access, CPUs on the stress machine and C* node are both sitting at 4%, the network doesn't appear bottlenecked, resource monitor doesn't show anything interesting, and performance counters in the kernel show very little. Changes in thread count simply serve to increase median latency w/out impacting any other visible metric that we're measuring, so I'm at a loss as to why the disparity is so huge on the platform.

The combination of my changes to get the 2.1 branch to behave on Windows, along with [~benedict] and [~Stefania]'s changes in lifecycle and cleanup patterns on 2.2, should hopefully have us in a state where transitioning back to memory-mapped I/O on Windows will only cause trouble on snapshot deletion. Fairly simple runs of stress w/compaction aren't popping up any obvious errors on file access or renaming - I'm going to do some much heavier testing (ccm multi-node clusters, long stress w/repair and compaction, etc.) and see if there are any outstanding issues that need to be stamped out before we can call mmap'ed index files on Windows safe. The one thing we'll never be able to support is deletion of snapshots while a node is running and sstables are mapped, but for a 10x throughput increase I think users would be willing to make that sacrifice. The combination of the powercfg profile change, the kernel timer resolution, and memory-mapped index files is giving some pretty interesting performance numbers on EC2.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
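The .toDelete mechanism described in the comment above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual WindowsFailedSnapshotTracker code: class and method names here are hypothetical, and it only shows the two halves of the idea - append one line per failed snapshot directory, then on startup walk each listed directory deepest-first and delete it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of the failed-snapshot tracker idea: a .toDelete file
// in the Cassandra home directory records directories that could not be
// removed (e.g. because sstables were still memory-mapped on Windows), and
// the next startup deletes them.
public class FailedSnapshotTrackerSketch
{
    private final Path toDeleteFile;

    public FailedSnapshotTrackerSketch(Path cassandraHome) throws IOException
    {
        this.toDeleteFile = cassandraHome.resolve(".toDelete");
        if (!Files.exists(toDeleteFile))
            Files.createFile(toDeleteFile);
    }

    // Record one failed snapshot directory, one absolute path per line.
    public void recordFailedSnapshot(Path snapshotDir) throws IOException
    {
        byte[] line = (snapshotDir.toAbsolutePath() + System.lineSeparator())
                          .getBytes(StandardCharsets.UTF_8);
        Files.write(toDeleteFile, line, StandardOpenOption.APPEND);
    }

    // On startup: recursively delete every listed directory, then truncate the file.
    public void cleanUpOnStartup() throws IOException
    {
        for (String line : Files.readAllLines(toDeleteFile, StandardCharsets.UTF_8))
        {
            if (line.trim().isEmpty())
                continue;
            Path dir = Paths.get(line.trim());
            if (!Files.isDirectory(dir))
                continue;
            // Reverse lexicographic order visits children before parents,
            // so directories are empty by the time we delete them.
            try (Stream<Path> walk = Files.walk(dir))
            {
                List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                       .collect(Collectors.toList());
                for (Path p : paths)
                    Files.deleteIfExists(p);
            }
        }
        Files.write(toDeleteFile, new byte[0]); // truncate after cleanup
    }
}
```

Keeping the tracking in a single append-only file means the failure path at snapshot-clear time stays cheap; the expensive recursive walk only happens once at startup, when no readers can be holding the mappings.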
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612087#comment-14612087 ] Joshua McKenzie commented on CASSANDRA-9658:
---------------------------------------------

After a bit of discussion offline, pushed an update that protects against deletion of any non-temp, non-data subdirectories on startup: anything silly (or important) added to the .toDelete file will be skipped. Updated the unit test for this functionality as well.
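The startup guard described above can be sketched as a simple path check. This is a hypothetical illustration (the class and method names are not the real Cassandra API): an entry in .toDelete is only eligible for deletion if it resolves to a strict subdirectory of a configured data directory, so a stray `/` or `/etc` line is skipped rather than recursively removed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical sketch of the safety check: validate each .toDelete entry
// against the configured data directories before deleting anything.
public class ToDeleteGuardSketch
{
    private final List<Path> dataDirectories;

    public ToDeleteGuardSketch(List<Path> dataDirectories)
    {
        this.dataDirectories = dataDirectories;
    }

    // True only if the candidate is a strict subdirectory of a data directory.
    public boolean isSafeToDelete(String rawLine)
    {
        Path candidate = Paths.get(rawLine.trim()).toAbsolutePath().normalize();
        for (Path dataDir : dataDirectories)
        {
            Path root = dataDir.toAbsolutePath().normalize();
            // startsWith is a path-component check, so /data-evil does not
            // match /data; equality is excluded to protect the root itself.
            if (candidate.startsWith(root) && !candidate.equals(root))
                return true;
        }
        return false;
    }
}
```

Note that `Path.startsWith` compares whole path components rather than raw string prefixes, which is what makes this check safe against sibling directories with similar names.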
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612126#comment-14612126 ] T Jake Luciani commented on CASSANDRA-9658:
---------------------------------------------

There seems to be one file-rename-related test failure, though I'm not sure it's related to this patch. +1 if the next unit test run looks good.
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610817#comment-14610817 ] T Jake Luciani commented on CASSANDRA-9658:
---------------------------------------------

Review: Overall looks good, but I'd like to see your branch run on cassci to make sure it's a-ok.

Is there anything we need to do if the delete fails, e.g. a kill -9 or a stack overflow in your recursive delete? Should we have a .deleteme file added that can be checked on start? Or at least a log message when a major exception is thrown, so users know to expect to clean up themselves.

Some minor things...
- DatabaseDescriptor.java: the log statement is repeated 3 times; you can just put it once at the end of the if/else conditions
- SystemKeyspaceTest.java: use the FBUtilities.getProtectedField method vs. doing it yourself?
- FileUtils.java: should you log every file at info? Should it be debug?
- SSTableRewriterTest: bad import org.hsqldb.Database;
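The `getProtectedField` helper mentioned in the review above is the usual reflection shortcut for tests. This is a hedged sketch of what such a helper does; the actual FBUtilities signature in Cassandra may differ in details:

```java
import java.lang.reflect.Field;

// Sketch of a getProtectedField-style helper: fetch a non-public field and
// make it accessible, so tests can inspect internal state without repeating
// the getDeclaredField/setAccessible boilerplate at every call site.
public class ProtectedFieldSketch
{
    public static Field getProtectedField(Class<?> klass, String fieldName)
    {
        try
        {
            Field field = klass.getDeclaredField(fieldName);
            field.setAccessible(true); // lift the private/protected access check
            return field;
        }
        catch (Exception e)
        {
            // Tests want a hard failure, not a checked exception to propagate.
            throw new AssertionError(e);
        }
    }
}
```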
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610890#comment-14610890 ] Joshua McKenzie commented on CASSANDRA-9658:
---------------------------------------------

Thanks for the quick turnaround!

bq. The log statement is repeated 3 times in code you can just put it at the end of the if/else conditions

I think that was originally in there to differentiate auto from the other modes in the log, which requires either logging the way we do now or building a log statement afterward depending on conf.disk_access_mode. Both seem equally distasteful to me, so I figured leave well enough alone.

bq. Use the FBUtilities.getProtectedField method vs doing it yourself?

Didn't know we had that - updated.

bq. Should you log every file at info? should it be debug?

Yes. Mistake / left over from testing. Fixed.

bq. Bad import org.hsqldb.Database;

IDE got over-eager and I missed it on final pass. Removed.

bq. Is there anything we need to do if the delete fails? like kill -9? or stack overflow on your recursive delete? Should we have a .deleteme file added that can be checked on start? Or at least a log message if a major exception is thrown that users should expect to clean up themselves

I thought about that in passing but originally landed on the side of: if a node hard-dies and an operator sees large disk usage on it, they should turn up the snapshots pretty quickly and can correlate them with the log message. In retrospect, the log message should be clearer that deletion is best effort on JVM shutdown, with certain caveats (crashing, stack overflow, OOM), if we go that route. Given the work taking place in CASSANDRA-7066, I think I'll peruse Stefania's work there and follow her lead with an implementation that deletes old snapshots on startup. It's really the most robust solution, and it may be a long road for us to get buffered and memory-mapped reads into parity. Pulling this out of Patch Available for now while I get that implemented.

CI currently running: [testall|https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-9658-testall/1/] [dtest|https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-9658-dtest/1/]
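The deleteOnExit approach being debated above hinges on a registration-order subtlety. This is an illustrative sketch of a deleteRecursiveOnExit-style helper, not the exact Cassandra implementation: `File.deleteOnExit` removes files in the reverse order of registration, so each directory must be registered before its children; at JVM shutdown the children are then deleted first and the (now empty) directory last. As the comment notes, this is best effort only - nothing runs if the JVM is killed or crashes.

```java
import java.io.File;

// Sketch of deferred recursive deletion via File.deleteOnExit.
// deleteOnExit processes registrations LIFO, so registering a directory
// before recursing into it guarantees its contents are gone by the time
// the directory itself is deleted at shutdown.
public class DeleteOnExitSketch
{
    public static void deleteRecursiveOnExit(File dir)
    {
        dir.deleteOnExit(); // registered first => deleted last
        File[] children = dir.listFiles();
        if (children == null)
            return; // plain file, or an unreadable directory
        for (File child : children)
        {
            if (child.isDirectory())
                deleteRecursiveOnExit(child);
            else
                child.deleteOnExit();
        }
    }
}
```

Because deletion happens inside the exiting JVM, any mappings held by that JVM are released by then, which is exactly why this works around the Windows snapshot-deletion problem.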
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610695#comment-14610695 ] Jonathan Ellis commented on CASSANDRA-9658:
---------------------------------------------

([~tjake] to review)
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608934#comment-14608934 ] Joshua McKenzie commented on CASSANDRA-9658:
---------------------------------------------

Set up a 3-node ccm cluster, disk_access_mode: auto, patched to allow memory-mapping of index files on Windows. Almost everything worked without issue - the only things that gave any trouble were:
# Sequential repair
** Throws a file access violation error working w/snapshots
# Clearing snapshots
** Didn't reproduce any problems with them during regular runtime, but SystemKeyspaceTest illustrates the problem: if we have open mmap'ed readers, snapshot deletion fails.

Ran about 250M records through over the course of the day on 2 different ccm clusters, running constant repairs and major compactions, creating and deleting snapshots, and creating and dropping keyspaces and tables. In the interest of tightening up mmap support on Windows, I've put together a patch that does the following:
# Reverts to the disk_access_mode param from the yaml to determine access modes on Windows, defaulting to auto if none is found
# Flips parallelism to RepairParallelism.PARALLEL in RepairOption and logs a warning when on Windows and either idx or data files are non-standard
# Adds a new FileUtils.deleteRecursiveOnExit, using File.deleteOnExit to defer deletion of specific files
# Modifies Directories.clearSnapshot to first attempt a regular deletion and, upon failure when on Windows, schedule a deferred deletion on JVM shutdown for the snapshot in question. Logs a warning that gives the name of the folder and indicates that users can attempt to manually delete that folder if they see fit. The upgrade process on Windows should include either a) a bounce of the node after upgrade if this error appears in the log or b) advice to manually attempt deletion of those files later. Or both.
# Forces SSTableRewriterTest to standard data and idx access mode when on Windows. SSTRW is incompatible with memory-mapped I/O on Windows in its current incarnation; we'd have to postpone mapping until the rewrite has completed, which we can pursue on another ticket. Since SSTRW is disabled on Windows anyway, I'm ok w/hard-coding the test to that disk access mode on the platform for now.
# Updates SystemKeyspaceTest to confirm that the deleteOnExit approach is working as expected
# Cleans up RepairOptionsTest w/respect to the new changes.

Branch available [here|https://github.com/apache/cassandra/compare/trunk...josh-mckenzie:9658]. Unit tests pass locally - dtests on Windows are still inconsistent enough that I can't rely heavily on them to test this change, but I'll probably kick off a dtest run against this branch since I've gotten it down to 57 errors at this point.
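Item 4 of the patch description above - try an ordinary delete, fall back to a deferred one on Windows - can be sketched as follows. This is a hypothetical illustration, not the actual Directories.clearSnapshot code: try an immediate recursive delete; if that fails while on Windows (typically because the sstables are still mapped), warn the operator and schedule deletion at JVM shutdown instead.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the clearSnapshot fallback: immediate deletion first,
// deferred deletion on Windows as a last resort.
public class ClearSnapshotSketch
{
    private static final boolean IS_WINDOWS =
        System.getProperty("os.name").toLowerCase().contains("windows");

    // Returns true if the snapshot was removed immediately,
    // false if deletion was deferred to JVM shutdown.
    public static boolean clearSnapshot(Path snapshotDir) throws IOException
    {
        try
        {
            try (Stream<Path> walk = Files.walk(snapshotDir))
            {
                // Deepest-first, so directories are empty when deleted.
                List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                       .collect(Collectors.toList());
                for (Path p : paths)
                    Files.delete(p);
            }
            return true;
        }
        catch (IOException e)
        {
            if (!IS_WINDOWS)
                throw e; // on other platforms a failed delete is a real error
            System.err.println("WARN: unable to delete snapshot " + snapshotDir
                               + "; deletion deferred to JVM shutdown (best effort)."
                               + " It is safe to remove this folder manually.");
            deleteRecursiveOnExit(snapshotDir.toFile());
            return false;
        }
    }

    // Register parent before children: deleteOnExit runs LIFO at shutdown.
    private static void deleteRecursiveOnExit(File dir)
    {
        dir.deleteOnExit();
        File[] children = dir.listFiles();
        if (children == null)
            return;
        for (File child : children)
        {
            if (child.isDirectory())
                deleteRecursiveOnExit(child);
            else
                child.deleteOnExit();
        }
    }
}
```

On non-Windows platforms the failure is rethrown rather than deferred, matching the point in the thread that mapped-file deletion is only blocked by the OS on Windows.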
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605763#comment-14605763 ] Joshua McKenzie commented on CASSANDRA-9658:
---------------------------------------------

[~iamaleksey]: My initial assumption is that getting buffered reads close to parity w/mmap on Windows is going to be both much more programmer-hour intensive and much more invasive than getting mmap stabilized on Windows in time for 2.2.x to stabilize. I agree on the long-term goal of standardizing on a single read path; I'll do some stress-testing today to get an initial read on how much pain enabling mmap'ed I/O on Windows might cause us.

[~stefania_alborghetti]: I don't think 7066 will actually be necessary for us after CASSANDRA-8535 and then CASSANDRA-8984; however, I'll need to stress test the paths today to get a better feel for it post-8984. Let's sit tight for these test results w/mmap on Windows before taking any further steps to get buffered reads closer to parity on account of this ticket.
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606722#comment-14606722 ] Stefania commented on CASSANDRA-9658:
---------------------------------------------

SGTM
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605085#comment-14605085 ] Stefania commented on CASSANDRA-9658: - I've run an additional test on cperf and here are [the results|http://cstar.datastax.com/graph?stats=9de58a92-1e0a-11e5-bede42010af0688fmetric=op_rateoperation=2_readsmoothing=1show_aggregates=truexmin=0xmax=198.11ymin=0ymax=270914.6] on [blade_11_b|http://cstar.datastax.com/cluster/specs]. The difference between standard and mmap on trunk is about 55k (229,213 vs 175,327) confirming what's already observed in the previous tests. However 8894 reduces the difference somewhat (230,555 vs 207,208). What was the difference when you last tested? The 8894 branch is based on trunk but has the latest page alignment optimizations, CASSANDRA-8894, which are dependent on the page aligned buffers, CASSANDRA-8897, already on trunk but not in 2.2. I'm happy to spend more time to see if there are further optimization to reduce this difference or fix any regressions that contributed to increasing it in the first place. The cleanup ticket that removes temporary descriptors, CASSANDRA-7066, is actually targeted to trunk only, not 2.2. Is this the ticket we need to re-enable mmap on windows (I seem to recall this is the case from a comment posted there) or are CASSANDRA-8893 and CASSANDRA-8984 sufficient? Re-enable memory-mapped index file reads on Windows --- Key: CASSANDRA-9658 URL: https://issues.apache.org/jira/browse/CASSANDRA-9658 Project: Cassandra Issue Type: Improvement Reporter: Joshua McKenzie Assignee: Joshua McKenzie Labels: Windows, performance Fix For: 2.2.x It appears that the impact of buffered vs. memory-mapped index file reads has changed dramatically since last I tested. [Here's some results on various platforms we pulled together yesterday w/2.2-HEAD|https://docs.google.com/spreadsheets/d/1JaO2x7NsK4SSg_ZBqlfH0AwspGgIgFZ9wZ12fC4VZb0/edit#gid=0]. 
TL;DR: On Linux we see a 40% performance hit on reads, from 108k ops/sec down to 64.8k ops/sec. While surprising in itself, the really unexpected result (to me) is on Windows: with standard access we're getting 16.8k ops/sec on our bare-metal perf boxes vs. 184.7k ops/sec with memory-mapped index files, an over 10-fold increase in throughput. While testing w/standard access, CPUs on the stress machine and C* node are both sitting at 4%, the network doesn't appear bottlenecked, resource monitor doesn't show anything interesting, and performance counters in the kernel show very little. Changes in thread count simply serve to increase median latency w/out impacting any other visible metric we're measuring, so I'm at a loss as to why the disparity is so huge on this platform.

The combination of my changes to get the 2.1 branch to behave on Windows, along with [~benedict] and [~Stefania]'s changes to lifecycle and cleanup patterns in 2.2, should hopefully have us in a state where transitioning back to memory-mapped I/O on Windows will only cause trouble on snapshot deletion. Fairly simple runs of stress w/compaction aren't turning up any obvious errors on file access or renaming - I'm going to do some much heavier testing (ccm multi-node clusters, long stress w/repair and compaction, etc.) and see if there are any outstanding issues that need to be stamped out before we can call mmap'ed index files on Windows safe.

The one thing we'll never be able to support is deletion of snapshots while a node is running and sstables are mapped, but for a 10x throughput increase I think users would be willing to make that sacrifice. The combination of the powercfg profile change, the kernel timer resolution, and memory-mapped index files is giving some pretty interesting performance numbers on EC2.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
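The standard-vs-mmap comparison above boils down to two read mechanisms. As an illustrative sketch only (this is not Cassandra's actual RandomAccessReader/mmap code; the file name, offsets, and class name are made up), the difference between an explicit per-read syscall and reading through a memory mapping looks like this in Java:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadPaths
{
    // "Standard" (buffered) access: an explicit read() syscall into a
    // freshly allocated buffer for every request.
    static ByteBuffer bufferedRead(Path file, long position, int length) throws IOException
    {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            ByteBuffer buffer = ByteBuffer.allocate(length);
            channel.read(buffer, position);
            buffer.flip();
            return buffer;
        }
    }

    // mmap access: the file is mapped once, and a read is a plain memory
    // access served from the page cache, with no per-read syscall.
    static ByteBuffer mappedRead(Path file, int position, int length) throws IOException
    {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            map.position(position);
            map.limit(position + length);
            return map.slice();
        }
    }

    public static void main(String[] args) throws IOException
    {
        Path file = Files.createTempFile("index", ".db");
        Files.write(file, "0123456789".getBytes(StandardCharsets.US_ASCII));

        byte[] standard = new byte[4];
        byte[] mapped = new byte[4];
        bufferedRead(file, 2, 4).get(standard);
        mappedRead(file, 2, 4).get(mapped);

        // Both paths return the same bytes; only the I/O mechanism differs.
        System.out.println(new String(standard, StandardCharsets.US_ASCII)
                           + " " + new String(mapped, StandardCharsets.US_ASCII));
        Files.delete(file);
    }
}
```

The throughput gap discussed in this ticket comes from the per-read overhead of the buffered path versus page-cache-backed memory access on the mapped path; which wins, and by how much, is clearly platform-dependent per the numbers above.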
[jira] [Commented] (CASSANDRA-9658) Re-enable memory-mapped index file reads on Windows
[ https://issues.apache.org/jira/browse/CASSANDRA-9658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603768#comment-14603768 ]

Aleksey Yeschenko commented on CASSANDRA-9658:
----------------------------------------------

I would still strongly prefer us to get to the core of the issue and fix it, without re-enabling the mmap'ed path. We should absolutely standardize on a single read path for everything. If we can pinpoint the issue quickly, then even delaying 2.2 would be worth it (if we are otherwise going to make this change for 2.2.0). If we cannot do it quickly, then I guess temporarily switching the mmap read path back on is unavoidable here, which makes me a little sad.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
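The snapshot-deletion caveat in the description stems from Windows file-handle semantics: a file with a live memory mapping cannot be unlinked, whereas on Linux the unlink succeeds and the mapping stays valid until released. A hedged sketch of that failure mode and the "defer the delete" workaround (hypothetical demo code, not Cassandra's snapshot machinery; the file name is made up):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SnapshotDeleteDemo
{
    public static void main(String[] args) throws IOException
    {
        // Hypothetical stand-in for a snapshotted sstable index file.
        Path snapshot = Files.createTempFile("snapshot-index", ".db");
        Files.write(snapshot, new byte[4096]);

        try (FileChannel channel = FileChannel.open(snapshot, StandardOpenOption.READ))
        {
            MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
            map.get(0); // the mapping is live and in use

            try
            {
                // On Linux/macOS the unlink succeeds while the mapping stays
                // valid. On Windows this throws, because the mapping holds an
                // open handle - so the delete must be deferred until the
                // mapping is released, e.g. recorded and retried later.
                Files.delete(snapshot);
                System.out.println("deleted while mapped");
            }
            catch (IOException e)
            {
                System.out.println("delete deferred: " + e.getClass().getSimpleName());
            }
        }

        Files.deleteIfExists(snapshot); // no-op if the delete already succeeded
    }
}
```

On a POSIX platform this prints the first branch; on Windows it takes the catch branch, which is exactly why re-enabling mmap there means snapshot deletion can only happen once the mappings are gone.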