Benchmarks were done with up to 96GB of memory, which is far more caching than 
most people will ever have.

The point anyway is that you are talking I/O in the tens, or at best a few 
hundred, MB/sec before Cassandra eats all your CPU (dual 6-core CPUs in our 
case).

The memcopy involved here deep inside the kernel will not be very high on the 
list of expensive operations.

The assumption also seems to be that mmap is "free" CPU-wise. 
It clearly isn't. There is definitely CPU work involved with mmap as well; you 
just move it from context switching and small I/O buffer copies to memory 
management.
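
To make the tradeoff concrete, the two code paths being compared look roughly 
like this in Java NIO terms (a simplified sketch, not Cassandra's actual reader 
code; the file name and sizes are made up):

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class ReadPaths {
        // "standard" path: an explicit read() per request; the kernel copies
        // page-cache data into our buffer, so the CPU cost shows up as
        // syscalls/context switches plus small memcopies.
        static ByteBuffer buffered(FileChannel ch, long pos, int len) throws Exception {
            ByteBuffer buf = ByteBuffer.allocate(len);
            ch.read(buf, pos);
            buf.flip();
            return buf;
        }

        // mmap path: no copy into a Java buffer, but the CPU still pays via
        // page faults, TLB pressure and other memory-management work.
        static ByteBuffer mapped(FileChannel ch, long pos, int len) throws Exception {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
            return buf;
        }

        public static void main(String[] args) throws Exception {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            FileChannel ch = raf.getChannel();
            System.out.println(buffered(ch, 0, 4096).remaining());
            System.out.println(mapped(ch, 0, 4096).remaining());
            raf.close();
        }
    }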

Terje

On Jul 29, 2011, at 5:16 AM, Jonathan Ellis wrote:

> If you're actually hitting disk for most or even many of your reads then mmap 
> doesn't matter since the extra copy to a Java buffer is negligible compared 
> to the i/o itself (even on ssds). 
> On Jul 28, 2011 9:04 AM, "Terje Marthinussen" <tmarthinus...@gmail.com> wrote:
> > 
> > On Jul 28, 2011, at 9:52 PM, Jonathan Ellis wrote:
> > 
> >> This is not advisable in general, since non-mmap'd I/O is substantially 
> >> slower.
> > 
> > I see this claim again and again here, but it is actually close to 10 years 
> > since I last saw mmap'd I/O give any substantial performance benefit in any 
> > real-life use case I have needed.
> > 
> > We have done a lot of testing of this with Cassandra as well, and I don't see 
> > anything conclusive. We have done just as many tests where normal I/O has been 
> > faster than mmap, and the differences may very well be within statistical 
> > variance given the complexity and the number of factors involved in something 
> > like a distributed Cassandra cluster working at quorum.
> > 
> > mmap made a difference in 2000, when memory throughput was still measured in 
> > hundreds of megabytes/sec and CPU caches were a few kilobytes, but today you 
> > have megabytes of CPU cache with 100GB/sec of bandwidth, and even memory 
> > bandwidth is in the tens of GB/sec.
> > 
> > However, I/O buffers are generally quite small, and copying an I/O buffer 
> > from kernel to user space inside a cache with 100GB/sec of bandwidth is really 
> > a non-issue given the I/O throughput Cassandra generates.
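> > 
> > To put rough numbers on that (assuming a 64KB read buffer, which is on the 
> > generous side):
> > 
> >     copy: 64KB at 100GB/sec  ~ 0.6 microseconds
> >     read: 64KB off an SSD    ~ 100+ microseconds (milliseconds off spinning disk)
> > 
> > so the extra copy sits a couple of orders of magnitude below the I/O itself.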
> > 
> > By 2005 or so, CPUs had already reached a point where I saw mmap perform 
> > worse than regular I/O in a large number of use cases.
> > 
> > Hard to say exactly why, but I saw a theory from a FreeBSD core developer 
> > back then speculating that the extra MMU work involved in some I/O loads 
> > may actually be slower than the in-cache memcopy of tiny I/O buffers 
> > (they are pretty small, after all).
> > 
> > I don't have a personal theory here. I just know that, especially with large 
> > numbers of smaller I/O operations, regular I/O was typically faster than 
> > mmap, which would back up that theory.
> > 
> > So I wonder how people came to this conclusion, because under no real-life 
> > use case with Cassandra am I able to reproduce anything resembling a 
> > significant difference, and we have been benchmarking on nodes with SSD 
> > setups that can churn out 1GB/sec+ read speeds.
> > 
> > That is way more I/O throughput than most people have at hand, and still I 
> > cannot get mmap to give better performance.
> > 
> > I do, although subjectively, feel that things just seem to work better with 
> > regular I/O for us. We currently have very nice and stable heap sizes 
> > regardless of I/O load, and we have an easier system to operate since we 
> > can actually monitor how much memory the darned thing uses.
> > 
> > My recommendation? Stay away from mmap.
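> > 
> > If anyone wants to try the same thing, the knob should be disk_access_mode in 
> > cassandra.yaml (assuming the option name has not changed recently):
> > 
> >     # cassandra.yaml -- force buffered (non-mmap) reads for data and index files
> >     disk_access_mode: standard
> > 
> > The other values are auto (the default), mmap and mmap_index_only, if I 
> > remember them right.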
> > 
> > I would love to understand how people got to this conclusion, however, and to 
> > try to find out why we seem to see different results!
> > 
> >> The OP is correct that it is best to disable swap entirely, and
> >> second-best to enable JNA for mlockall.
> > 
> > Be a bit careful with removing swap completely. Linux is not always happy 
> > when it gets short on memory.
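> > 
> > For reference, the JNA route boils down to calling mlockall(2) from Java, 
> > roughly like this (a minimal sketch with hard-coded Linux constants, not the 
> > actual Cassandra code):
> > 
> >     import com.sun.jna.Native;
> > 
> >     public class LockMemory {
> >         private static final int MCL_CURRENT = 1; // lock pages mapped now (Linux value)
> >         private static final int MCL_FUTURE  = 2; // lock pages mapped later (Linux value)
> > 
> >         static { Native.register("c"); }  // direct-map the native method below to libc
> > 
> >         private static native int mlockall(int flags);
> > 
> >         public static void main(String[] args) {
> >             int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
> >             System.out.println(rc == 0
> >                 ? "mlockall ok, JVM memory pinned"
> >                 : "mlockall failed (check ulimit -l / CAP_IPC_LOCK)");
> >         }
> >     }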
> > 
> > Terje
