Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-29 Thread Peter Schuller
 It clearly isn't. There is definitely work involved for the CPU also when
 doing mmap. It is just that you move it from context switching and small I/O
 buffer copying to memory management.

*All* memory access a process does is subject to the rules of the
memory management unit of the CPU, so that cost is not specific to
mmap():ed files (once a given page is in core, that is).

(But again, I'm not arguing the point in Cassandra's case; just generally.)

-- 
/ Peter Schuller (@scode on twitter)


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-29 Thread Jonathan Ellis
On Fri, Jul 29, 2011 at 4:35 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
 What is the origin of the "mmap
 is substantially faster" claim?

The origin is the performance testing we did when adding mmap'd I/O.

I believe Chris Goffinet also found a double-digit percentage
performance improvement at Digg and/or Twitter, but I don't remember
the details.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Jonathan Ellis
This is not advisable in general, since non-mmap'd I/O is substantially slower.

The OP is correct that it is best to disable swap entirely, and
second-best to enable JNA for mlockall.
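
For reference, on a typical Linux box, disabling swap usually comes down to
something like the following (a sketch only; exact steps vary by distribution,
and the /etc/fstab entry should be checked by hand):

$ sudo swapoff -a                 # release all swap devices until the next reboot
$ sudo vi /etc/fstab              # comment out the swap line(s) so it stays off
$ sudo sysctl -w vm.swappiness=1  # softer alternative: strongly discourage swapping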

On Thu, Jul 28, 2011 at 7:05 AM, Adi adi.pan...@gmail.com wrote:
 Hi,



 We’ve started having problems with cassandra and memory swapping on linux
 which seems to be a fairly common issue (in our particular case after about
 a week all swap space will have been used up and we have to restart the
 process).



 It sounds like the general consensus is to just disable swap completely,
 but the recently released “Cassandra High Performance Cookbook” from Packt
 has instructions for “Stopping cassandra from using swap without disabling
 it system wide”. We’ve tried following the instructions but it refers to a
 “memory_locking_policy” variable in cassandra.yaml which throws an “unknown
 property” error on startup and I can’t find any reference to it in any of
 the cassandra docs.



 I’ve copied the summarised instructions below, does anyone know if this is
 something that ever worked or is there a different variable to set which
 does the same thing?

 If you are having trouble preventing the swapping, the other parameter that
 can help is disk_access_mode.
 We are using mmap_index_only and that has prevented swapping for now.
 auto will try to use mmap for all disk access,
 mmap will always use mmap,
 standard will not use mmap at all.
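
A minimal cassandra.yaml sketch of what Adi describes (assuming the 0.7.x-era
disk_access_mode parameter; check your own cassandra.yaml for the exact name and
accepted values):

  # cassandra.yaml (0.7.x-era sketch): mmap index files only, use standard
  # buffered I/O for data files.
  disk_access_mode: mmap_index_only
  # other values mentioned above: auto, mmap, standard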

 Search for swapping on the users list and go through the email discussions
 and JIRA issues related to it; that should give you an idea of what can
 work for you.
 -Adi



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Jonathan Ellis
I don't think there's ever been a memory_locking_policy variable.
Cassandra will call mlockall if JNA is present, no further steps
required.
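
In rough terms, the effect of having JNA on the classpath is a single mlockall()
call made through JNA. A minimal standalone sketch of the same idea (not
Cassandra's actual CLibrary code; the constants assume Linux):

  import com.sun.jna.Native;

  public class MlockallSketch {
      // Linux values from <sys/mman.h>: lock currently mapped pages and future ones.
      private static final int MCL_CURRENT = 1;
      private static final int MCL_FUTURE  = 2;

      static {
          Native.register("c");   // bind the native method below against libc
      }

      private static native int mlockall(int flags);

      public static void main(String[] args) {
          int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
          if (rc != 0) {
              // Typically fails with ENOMEM when RLIMIT_MEMLOCK is too low for a non-root user.
              System.err.println("mlockall failed; JVM memory can still be swapped out");
          } else {
              System.out.println("JVM memory locked");
          }
      }
  }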

On Thu, Jul 28, 2011 at 5:17 AM, Stephen Henderson
stephen.hender...@cognitivematch.com wrote:
 Hi,



 We’ve started having problems with cassandra and memory swapping on linux
 which seems to be a fairly common issue (in our particular case after about
 a week all swap space will have been used up and we have to restart the
 process).



 It sounds like the general consensus is to just disable swap completely, but
 the recently released “Cassandra High Performance Cookbook” from Packt has
 instructions for “Stopping cassandra from using swap without disabling it
 system wide”. We’ve tried following the instructions but it refers to a
 “memory_locking_policy” variable in cassandra.yaml which throws an “unknown
 property” error on startup and I can’t find any reference to it in any of
 the cassandra docs.



 I’ve copied the summarised instructions below, does anyone know if this is
 something that ever worked or is there a different variable to set which
 does the same thing? (we’re using 0.7.4 at present and it looks like the
 book was written for 0.7.0-beta-1.10 so it might have been something which
 was abandoned during beta?)

 ---

 Disabling Swap Memory system-wide may not always be desirable. For example,
 if the system is not dedicated to running Cassandra, other processes on the
 system may benefit from Swap Memory. This recipe shows how to install the
 Java Native Access (JNA) library, which allows Java to lock itself in memory,
 making it unevictable.



 1. Place the jna.jar and platform.jar in the $CASSANDRA_HOME/lib directory:

 2. Enable memory_locking_policy in $CASSANDRA_HOME/conf/cassandra.yaml:
 “memory_locking_policy: required”

 3. Restart your Cassandra instance.

 4. Confirm this configuration has taken effect by checking to see if a large
 portion of memory is Unevictable:

 $ grep Unevictable /proc/meminfo

 Unevictable:    1024 kB

 ---





 Thanks,

 Stephen



 Stephen Henderson – Lead Developer (Onsite), Cognitive Match

 stephen.hender...@cognitivematch.com | http://www.cognitivematch.com

 T: +44 (0) 203 205 0004 | F: +44 (0) 207 526 2226





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Terje Marthinussen

On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:

 This is not advisable in general, since non-mmap'd I/O is substantially 
 slower.

I see this again and again as a claim here, but it is actually close to 10 
years since I last saw mmap'd I/O have any substantial performance benefit in 
any real-life use case I have needed.

We have done a lot of testing of this with cassandra as well, and I don't see 
anything conclusive. We have done just as many tests where normal I/O has been 
faster than mmap, and the differences may very well be within statistical 
variance given the complexity and number of factors involved in something like 
a distributed cassandra working at quorum.

mmap made a difference in 2000, when memory throughput was still measured in 
hundreds of megabytes/sec and CPU caches were a few kilobytes, but today you 
get megabytes of CPU cache with 100GB/sec bandwidth, and even memory 
bandwidth is in the tens of GB/sec.

However, I/O buffers are generally quite small, and copying an I/O buffer from 
kernel to user space inside a cache with 100GB/sec bandwidth is really a 
non-issue given the I/O throughput cassandra generates.

In 2005 or so, CPUs had already reached a point where I saw mmap perform 
worse than regular I/O in a large number of use cases.

It is hard to say exactly why, but I saw one theory from a FreeBSD core 
developer speculating back then that the extra MMU work involved in some I/O 
loads may actually be slower than the cache-internal memcpy of tiny I/O 
buffers (they are pretty small, after all).

I don't have a personal theory here. I just know that, especially with large 
amounts of smaller I/O operations, regular I/O was typically faster than mmap, 
which would back up that theory.

So I wonder how people came to this conclusion, as I am unable, in any 
real-life use case with cassandra, to reproduce anything resembling a 
significant difference, and we have been benchmarking on nodes with SSD setups 
which can churn out 1GB/sec+ read speeds.

That is way more I/O throughput than most people have at hand, and still I 
cannot get mmap to give me better performance.

I do, although subjectively, feel that things just seem to work better with 
regular I/O for us. We currently have very nice and stable heap sizes 
regardless of I/O load, and we have an easier system to operate as we can 
actually monitor how much memory the darned thing uses.

My recommendation? Stay away from mmap.

I would love to understand how people got to this conclusion, however, and try 
to find out why we seem to see differences!

 The OP is correct that it is best to disable swap entirely, and
 second-best to enable JNA for mlockall.

Be a bit careful with removing swap completely. Linux is not always happy when 
it gets short on memory.

Terje

Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Jonathan Ellis
If you're actually hitting disk for most or even many of your reads then
mmap doesn't matter since the extra copy to a Java buffer is negligible
compared to the i/o itself (even on ssds).


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Peter Schuller
 I would love to understand how people got to this conclusion, however, and try 
 to find out why we seem to see differences!

I won't make any claims about Cassandra because I have never bothered
benchmarking the difference in CPU usage, since all my use cases have
been more focused on I/O efficiency. But I will say, without having
benchmarked that either, that *generally*, if you're doing small reads
of data that is in page cache using mmap(), something would have to
be seriously wrong for that not to be significantly faster than
regular I/O.

There's just *no way* there is no performance penalty involved in
making the context switch to kernel space, validating syscall
parameters, etc. (not to mention the indirect effects on e.g. process
scheduling), compared to simply *touching some virtual memory*.
It's easy to benchmark the maximum number of syscalls you can do per
second, and I'll eat my left foot if you're able to do more of those
than touches of a piece of memory ;)
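
A crude way to see the effect Peter describes (a sketch only, not a rigorous
benchmark; the file name is a placeholder and the file is assumed small enough
to fit entirely in page cache):

  import java.io.RandomAccessFile;
  import java.nio.ByteBuffer;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;
  import java.util.Random;

  public class SmallReadSketch {
      public static void main(String[] args) throws Exception {
          // Assumes an existing test file comfortably smaller than RAM (e.g. 256 MB).
          try (RandomAccessFile raf = new RandomAccessFile("testdata.bin", "r");
               FileChannel ch = raf.getChannel()) {
              int size = (int) ch.size();
              MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
              map.load();                         // fault all pages into page cache up front

              int reads = 1000000;
              int[] offsets = new int[reads];
              Random rnd = new Random(42);
              for (int i = 0; i < reads; i++)
                  offsets[i] = rnd.nextInt(size);

              long sum = 0;
              long t0 = System.nanoTime();
              for (int off : offsets)             // mmap path: plain memory accesses, no syscalls
                  sum += map.get(off);
              long tMmap = System.nanoTime() - t0;

              ByteBuffer buf = ByteBuffer.allocate(1);
              t0 = System.nanoTime();
              for (int off : offsets) {           // standard path: one pread() syscall per access
                  buf.clear();
                  ch.read(buf, off);
                  sum += buf.get(0);
              }
              long tRead = System.nanoTime() - t0;

              System.out.printf("mmap: %d ms, read(): %d ms (checksum %d)%n",
                                tMmap / 1000000, tRead / 1000000, sum);
          }
      }
  }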

Obviously this does *not* mean that mmap():ed I/O will actually be
faster in some particular application. But I do want to make the point
that the idea that mmap():ed I/O is good for performance (in terms of
CPU) is definitely not arbitrary and unfounded.

Now, HERE is the kicker: with all the hoopla over mmap():ed I/O
and the benchmarks you see, as usual there are lies, damned lies and
benchmarks. It's pretty easy to come up with I/O patterns where mmap()
will be significantly slower (certainly on platters; I'm guessing even
with modern SSDs) than regular I/O, because the method used to
communicate with the operating system (touching a page of memory) is
vastly different.

In the most obvious and simple case, consider an application that
needs to read exactly 50 MB of data, and knows it. Suppose the data is
not in page cache. Submitting a read() of exactly those 50 MB clearly
has at least the potential to be significantly more efficient
(assuming nothing is outright wrong) than touching pages in a
sequential fashion, (1) taking multiple, potentially quite a few,
page faults in the kernel, and (2) being reliant on
read-ahead/pre-fetching, which will never have enough knowledge to
predict your 50 MB read, so you'll invariably take more seeks (at least
potentially, with concurrent I/O) and probably read more than necessary
(since pre-fetching algorithms won't know when you'll be done) than
if you simply state to the kernel your exact intent of reading exactly
50*1024*1024 bytes at a particular position in a file/device.
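
In Java terms, the "state your exact intent" variant might look roughly like
this (a sketch; the file name and offset are placeholders):

  import java.io.RandomAccessFile;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;

  public class ExactReadSketch {
      public static void main(String[] args) throws Exception {
          long position = 0;                         // placeholder offset of the 50 MB region
          int length = 50 * 1024 * 1024;
          ByteBuffer dst = ByteBuffer.allocateDirect(length);
          try (RandomAccessFile raf = new RandomAccessFile("data.db", "r");
               FileChannel ch = raf.getChannel()) {
              // Ask for exactly the bytes we need with explicit positional reads; the kernel
              // sees large, well-defined requests instead of guessing via read-ahead.
              while (dst.hasRemaining()) {
                  int n = ch.read(dst, position + dst.position());
                  if (n < 0) break;                  // EOF before the full 50 MB
              }
          }
          dst.flip();
          // ... process the region in dst ...
      }
  }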

To some extent issues like these may affect Cassandra, but it's
difficult to measure. For example, if you're I/O bound and doing a lot
of range slices that are bigger than a single page, perhaps the
default 64 KB read size with standard I/O is eliminating unnecessary
seeks that you would otherwise be taking when doing I/O by paging?
It's a hypothesis that is certainly plausible under some
circumstances, but difficult to validate or falsify. One can probably
construct a benchmark where there's no difference, yet see a
significant difference in a real-world scenario when your benchmarked
I/O is intermixed with other I/O. Not to mention subtle differences in
the behavior of kernels, RAID controllers, disk drive controllers, etc.

-- 
/ Peter Schuller (@scode on twitter)


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Terje Marthinussen
Benchmarks were done with up to 96GB of memory, much more caching than most 
people will ever have.

The point anyway is that you are talking I/O in the tens or, at best, a few 
hundred MB/sec before cassandra will eat all your CPU (with dual 6-core CPUs 
in our case).

The memcopy involved here deep inside the kernel will not be very high on the 
list of expensive operations.

The assumption also seems to be that mmap is free CPU-wise.
It clearly isn't. There is definitely work involved for the CPU also when doing 
mmap. It is just that you move it from context switching and small I/O buffer 
copying to memory management.

Terje

On Jul 29, 2011, at 5:16 AM, Jonathan Ellis wrote:

 If you're actually hitting disk for most or even many of your reads then mmap 
 doesn't matter since the extra copy to a Java buffer is negligible compared 
 to the i/o itself (even on ssds). 


Re: memory_locking_policy parameter in cassandra.yaml for disabling swap - has this variable been renamed?

2011-07-28 Thread Teijo Holzer

Hi,

yes I was looking for this config as well.

This is really simple to achieve:

Put the following line into /etc/security/limits.conf

cassandra   -   memlock   32

Then, start Cassandra as the user cassandra, not as root (note there is never a 
need to run Cassandra as root; all functionality can be achieved as a normal 
user by changing the right configs).


In the log you will see:

[2011-07-29 13:06:46,491] WARN: Unable to lock JVM memory (ENOMEM). This can 
result in part of the JVM being swapped out, especially with mmapped I/O 
enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root. (main CLibrary.java:118)


Done.

If you want to enable mlockall again, simply change the above 32(k) value to the 
size of your RAM (e.g. 17825792 for a 16GB machine).
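
One quick sanity check (assuming the cassandra user has a login shell, so that
pam_limits applies) is to look at the effective memlock limit before and after
editing limits.conf:

$ su - cassandra -c 'ulimit -l'
32        (with the 32 KB line above in place, mlockall() will fail with ENOMEM)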


We have also turned off mmap altogether and never had any memory issues again. 
Swap is happily enabled. We currently prefer stability over performance.


Cheers,

T.

On 28/07/11 22:17, Stephen Henderson wrote:

Hi,

We’ve started having problems with cassandra and memory swapping on linux which
seems to be a fairly common issue (in our particular case after about a week
all swap space will have been used up and we have to restart the process).

It sounds like the general consensus is to just disable swap completely, but
the recently released “Cassandra High Performance Cookbook” from Packt has
instructions for “Stopping cassandra from using swap without disabling it
system wide”. We’ve tried following the instructions but it refers to a
“memory_locking_policy” variable in cassandra.yaml which throws an “unknown
property” error on startup and I can’t find any reference to it in any of the
cassandra docs.

I’ve copied the summarised instructions below, does anyone know if this is
something that ever worked or is there a different variable to set which does
the same thing? (we’re using 0.7.4 at present and it looks like the book was
written for 0.7.0-beta-1.10 so it might have been something which was abandoned
during beta?)

---

Disabling Swap Memory system-wide may not always be desirable. For example, if
the system is not dedicated to running Cassandra, other processes on the system
may benefit from Swap Memory. This recipe shows how to install the Java Native
Access (JNA) library, which allows Java to lock itself in memory, making it unevictable.

1. Place the jna.jar and platform.jar in the $CASSANDRA_HOME/lib directory:

2. Enable memory_locking_policy in $CASSANDRA_HOME/conf/cassandra.yaml:
“memory_locking_policy: required”

3. Restart your Cassandra instance.

4. Confirm this configuration has taken effect by checking to see if a large
portion of memory is Unevictable:

$ grep Unevictable /proc/meminfo

Unevictable: 1024 kB

---

Thanks,

Stephen

Stephen Henderson – Lead Developer (Onsite), Cognitive Match

stephen.hender...@cognitivematch.com | http://www.cognitivematch.com

T: +44 (0) 203 205 0004 | F: +44 (0) 207 526 2226