Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
It clearly isn't. There is definitely work involved for the CPU also when doing mmap. It is just that you move it from context switching and small I/O buffer copying to memory management.

*All* memory access a process does is subject to the rules of the memory management unit of the CPU, so that cost is not specific to mmap():ed files (once a given page is in core, that is). (But again, I'm not arguing the point in Cassandra's case; just generally.)

-- / Peter Schuller (@scode on twitter)
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
On Fri, Jul 29, 2011 at 4:35 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: What is the origin of the "mmap is substantially faster" claim?

The origin is the performance testing we did when adding mmap'd I/O. I believe Chris Goffinet also found a double-digit percentage performance improvement at Digg and/or Twitter, but I don't remember the details.

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
This is not advisable in general, since non-mmap'd I/O is substantially slower. The OP is correct that it is best to disable swap entirely, and second-best to enable JNA for mlockall.

On Thu, Jul 28, 2011 at 7:05 AM, Adi adi.pan...@gmail.com wrote: If you are having trouble preventing the swapping, the other parameter that can help is disk_access_mode. We are using mmap_index_only and that has prevented swapping for now. auto will try to use mmap for all disk access, mmap will always use mmap, and standard will not use mmap. Search for swapping on the users list and go through the email discussions and jira issues related to swapping, and that will give you an idea of what can work for you. -Adi

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
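For anyone trying Adi's suggestion, the setting lives in conf/cassandra.yaml as a top-level property. A minimal sketch of what that looks like (option names are the ones quoted in this thread; exact behaviour and availability depend on your Cassandra version):

    # cassandra.yaml (0.7.x era); valid values: auto, mmap, mmap_index_only, standard
    disk_access_mode: mmap_index_only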
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
I don't think there's ever been a memory_locking_policy variable. Cassandra will call mlockall if JNA is present, no further steps required.

On Thu, Jul 28, 2011 at 5:17 AM, Stephen Henderson stephen.hender...@cognitivematch.com wrote:

Hi, We’ve started having problems with cassandra and memory swapping on linux, which seems to be a fairly common issue (in our particular case, after about a week all swap space will have been used up and we have to restart the process). It sounds like the general consensus is to just disable swap completely, but the recently released “Cassandra High Performance Cookbook” from Packt has instructions for “Stopping cassandra from using swap without disabling it system wide”. We’ve tried following the instructions but it refers to a “memory_locking_policy” variable in cassandra.yaml which throws an “unknown property” error on startup, and I can’t find any reference to it in any of the cassandra docs. I’ve copied the summarised instructions below; does anyone know if this is something that ever worked, or is there a different variable to set which does the same thing? (We’re using 0.7.4 at present and it looks like the book was written for 0.7.0-beta-1.10, so it might have been something which was abandoned during beta?)

---
Disabling Swap Memory system-wide may not always be desirable. For example, if the system is not dedicated to running Cassandra, other processes on the system may benefit from Swap Memory. This recipe shows how to install the Java Native Access (JNA) library, which allows Java to lock itself in memory, making it unevictable.
1. Place the jna.jar and platform.jar in the $CASSANDRA_HOME/lib directory.
2. Enable memory_locking_policy in $CASSANDRA_HOME/conf/cassandra.yaml: “memory_locking_policy: required”
3. Restart your Cassandra instance.
4. Confirm this configuration has taken effect by checking to see if a large portion of memory is Unevictable:
   $ grep Unevictable /proc/meminfo
   Unevictable: 1024 kB
---

Thanks, Stephen

Stephen Henderson – Lead Developer (Onsite), Cognitive Match stephen.hender...@cognitivematch.com | http://www.cognitivematch.com T: +44 (0) 203 205 0004 | F: +44 (0) 207 526 2226

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
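For reference, here is a minimal, illustrative Java sketch of the kind of mlockall call Cassandra makes through JNA when jna.jar is on the classpath. This is not Cassandra's actual CLibrary code; the class name is made up for the example and the flag values are the Linux ones:

    import com.sun.jna.LastErrorException;
    import com.sun.jna.Native;

    // Illustrative sketch: lock the JVM's pages in RAM via libc's mlockall(2).
    public class MlockallSketch {
        // Linux flag values: lock pages mapped now (MCL_CURRENT) and later (MCL_FUTURE).
        private static final int MCL_CURRENT = 1;
        private static final int MCL_FUTURE = 2;

        static {
            Native.register("c"); // bind the native method below to libc
        }

        private static native int mlockall(int flags) throws LastErrorException;

        public static void main(String[] args) {
            try {
                mlockall(MCL_CURRENT | MCL_FUTURE);
                System.out.println("memory locked; check Unevictable in /proc/meminfo");
            } catch (LastErrorException e) {
                // ENOMEM usually means RLIMIT_MEMLOCK is too low or CAP_IPC_LOCK is missing
                System.err.println("mlockall failed, errno " + e.getErrorCode());
            }
        }
    }

If the call fails you get the same kind of ENOMEM warning quoted later in this thread, and raising the memlock ulimit (or running as root) is the usual fix.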
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote: This is not advisable in general, since non-mmap'd I/O is substantially slower.

I see this again and again as a claim here, but it is actually close to 10 years since I saw mmap'd I/O have any substantial performance benefit in any real-life use I have needed. We have done a lot of testing of this with cassandra as well and I don't see anything conclusive. We have done as many tests where normal I/O was faster than mmap, and the differences may very well be within statistical variance given the complexity and number of factors involved in something like a distributed cassandra working at quorum.

mmap made a difference in 2000, when memory throughput was still measured in hundreds of megabytes/sec and CPU caches were a few kilobytes, but today you have megabytes of CPU cache with 100GB/sec bandwidth and even memory bandwidth is in the tens of GB/sec. I/O buffers, however, are generally quite small, and copying an I/O buffer from kernel to user space inside a cache with 100GB/sec bandwidth is really a non-issue given the I/O throughput cassandra generates.

By 2005 or so, CPUs had already reached a point where I saw mmap perform worse than regular I/O in a large number of use cases. Hard to say exactly why, but back then I saw one theory from a FreeBSD core developer speculating that the extra MMU work involved in some I/O loads may actually be slower than the cache-internal memcpy of tiny I/O buffers (they are pretty small after all). I don't have a personal theory here. I just know that, especially with large numbers of smaller I/O operations, regular I/O was typically faster than mmap, which could back up that theory.

So, I wonder how people came to this conclusion, as I am unable, under any real-life use case with cassandra, to reproduce anything resembling a significant difference, and we have been benchmarking on nodes with ssd setups which can churn out 1GB/sec+ read speeds. That is way more I/O throughput than most people have at hand, and still I cannot get mmap to give me better performance. I do, although subjectively, feel that things just seem to work better with regular I/O for us. We currently have very nice and stable heap sizes regardless of I/O load, and we have an easier system to operate as we can actually monitor how much memory the darned thing uses.

My recommendation? Stay away from mmap. I would love to understand how people got to this conclusion, however, and try to find out why we seem to see differences!

The OP is correct that it is best to disable swap entirely, and second-best to enable JNA for mlockall.

Be a bit careful with removing swap completely. Linux is not always happy when it gets short on memory.

Terje
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
If you're actually hitting disk for most or even many of your reads then mmap doesn't matter since the extra copy to a Java buffer is negligible compared to the i/o itself (even on ssds).

On Jul 28, 2011 9:04 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: I see this again and again as a claim here, but it is actually close to 10 years since I saw mmap'd I/O have any substantial performance benefits on any real life use I have needed. ...
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
I would love to understand how people got to this conclusion however and try to find out why we seem to see differences!

I won't make any claims with Cassandra because I have never bothered benchmarking the difference in CPU usage, since all my use-cases have been more focused on I/O efficiency. But I will say, without having benchmarked that either, that *generally*, if you're doing small reads of data that is in page cache using mmap() - something would have to be seriously wrong for that not to be significantly faster than regular I/O. There's just *no way* there is no performance penalty involved in making the context switch to kernel space, validating syscall parameters etc (not to mention the indirect effects on e.g. process scheduling) - compared to simply *touching some virtual memory*. It's easy to benchmark the maximum number of syscalls you can do per second, and I'll eat my left foot if you're able to do more of that than touching a piece of memory ;)

Obviously this does *not* mean that mmap():ed I/O will actually be faster in some particular application. But I do want to make the point that the idea that mmap():ed I/O is good for performance (in terms of CPU) is definitely not arbitrary and unfounded.

Now, HERE is the kicker: with all the hoopla over mmap():ed I/O and benchmarks you see, as usual there are lies, damned lies and benchmarks. It's pretty easy to come up with I/O patterns where mmap() will be significantly slower (certainly on platters, I'm guessing even with modern SSD:s) than regular I/O, because the method used to communicate with the operating system (touching a page of memory) is vastly different.

In the most obvious and simple case, consider an application that needs to read 50 MB of data exactly, and knows it. Suppose the data is not in page cache. Submitting a read() of exactly those 50 MB clearly has at least the potential to be significantly more efficient (assuming nothing is outright wrong) than touching pages in a sequential fashion and (1) taking multiple, potentially quite a few, page faults in the kernel, and (2) being reliant on read-ahead/pre-fetching which will never have enough knowledge to predict your 50 MB read, so you'll invariably take more seeks (at least potentially with concurrent I/O) and probably read more than necessary (since pre-fetching algorithms won't know when you'll be done) than if you simply state to the kernel your exact intent of reading exactly 50*1024*1024 bytes at a particular position in a file/device.

To some extent issues like these may affect Cassandra, but it's difficult to measure. For example, if you're I/O bound and doing a lot of range slices that are bigger than a single page - perhaps the default 64kb read size with standard I/O is eliminating unnecessary seeks that you're otherwise taking when doing I/O by paging? It's a hypothesis that is certainly plausible under some circumstances, but difficult to validate or falsify. One can probably construct a benchmark where there's no difference, yet see a significant difference in a real-world scenario when your benchmarked I/O is intermixed with other I/O. Not to mention subtle differences in behaviors of kernels, RAID controllers, disk drive controllers, etc...

-- / Peter Schuller (@scode on twitter)
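To make the two code paths in this discussion concrete, here is a small illustrative Java sketch of an explicit read() versus touching a memory-mapped region. It is not a benchmark and takes no side in the debate; it only shows where the syscall-plus-copy happens versus where the page fault happens (the file path comes from the command line, and the mapping assumes a file under 2GB):

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ReadPaths {
        // Standard I/O: one context switch into the kernel plus a copy of the
        // bytes out of the page cache into our own buffer.
        static ByteBuffer readWithSyscall(FileChannel ch, long pos, int len) throws Exception {
            ByteBuffer buf = ByteBuffer.allocate(len);
            ch.read(buf, pos);
            buf.flip();
            return buf;
        }

        // mmap'd I/O: a plain memory access; no syscall per read, but a page
        // fault (and possibly real disk I/O) if the page is not resident.
        static byte readWithMmap(MappedByteBuffer map, int pos) {
            return map.get(pos);
        }

        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println(readWithSyscall(ch, 0, 1).get(0) == readWithMmap(map, 0));
            }
        }
    }

Which path wins for a given workload is exactly what this thread is debating; the sketch only shows why the per-read overhead differs.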
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
Benchmarks were done with up to 96GB of memory, much more caching than most people will ever have.

The point anyway is that you are talking I/O in the tens, or at best a few hundred, MB/sec before cassandra will eat all your CPU (with dual 6-core CPUs in our case). The memcpy involved here, deep inside the kernel, will not be very high on the list of expensive operations.

The assumption also seems to be that mmap is free cpu wise. It clearly isn't. There is definitely work involved for the CPU also when doing mmap. It is just that you move it from context switching and small I/O buffer copying to memory management.

Terje

On Jul 29, 2011, at 5:16 AM, Jonathan Ellis wrote: If you're actually hitting disk for most or even many of your reads then mmap doesn't matter since the extra copy to a Java buffer is negligible compared to the i/o itself (even on ssds).
Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?
Hi, yes I was looking for this config as well. This is really simple to achieve. Put the following line into /etc/security/limits.conf:

cassandra - memlock 32

Then start Cassandra as the user cassandra, not as root (note there is never a need to run Cassandra as root; all functionality can be achieved from a normal user by changing the right configs). In the log you will see:

[2011-07-29 13:06:46,491] WARN: Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root. (main CLibrary.java:118)

Done. If you want to enable mlockall again, simply change the above 32 (kB) value to the size of your RAM (e.g. 17825792 for a 16GB machine).

We have also turned off mmap altogether and have never had memory issues again. Swap is happily enabled. We currently prefer stability over performance.

Cheers, T.

On 28/07/11 22:17, Stephen Henderson wrote: Hi, We’ve started having problems with cassandra and memory swapping on linux which seems to be a fairly common issue ...
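If you go the RLIMIT_MEMLOCK route, here is a quick way to check what limit the cassandra user actually sees and how much the running JVM has locked. These are standard Linux commands; the pgrep pattern is an assumption that the JVM command line contains CassandraDaemon:

    # in a fresh login shell as the cassandra user (limits.conf is applied at login)
    ulimit -l

    # locked memory of the running JVM in kB; should match what mlockall managed to lock
    grep VmLck /proc/$(pgrep -f CassandraDaemon)/status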