Aaron Werman wrote:
> pg to my mind is unique in not trying to avoid OS buffering. Other
> dbmses spend a substantial effort to create a virtual OS (task
> management, I/O drivers, etc.) both in code and support. Choosing mmap
> seems such a limiting option - it adds OS dependency and limits
> kernel developer options (2G limits, global mlock serializations,
> porting problems, inability to schedule or parallelize I/O, still
> having to coordinate writers and readers).

I'm not sure I entirely agree with this.  Whether you access a file
via mmap() or via read(), the end result is that you still have to
access it, and since PG has significant chunks of system-dependent
code that it heavily relies on as it is (e.g., locking mechanisms,
shared memory), writing the I/O subsystem in a similar way doesn't
seem to me to be that much of a stretch (especially since PG already
has the storage manager), though it might involve quite a bit of work.

As for parallelization of I/O, the use of mmap() for reads should
significantly improve parallelization -- now instead of issuing read()
system calls, possibly for the same set of blocks, all the backends
would essentially be examining the same data directly.  The
performance improvements as a result of accessing the kernel's cache
pages directly instead of having it do buffer copies to process-local
memory should increase as concurrency goes up.  But see below.

> More to the point, I think it is very hard to effectively coordinate
> multithreaded I/O, and mmap seems used mostly to manage relatively
> simple scenarios. 

PG already manages and coordinates multithreaded I/O.  The mechanisms
used to coordinate writes needn't change at all.  But the way reads
are done relative to writes might have to be rethought, since an
mmap()ed buffer always reflects what's actually in kernel space at the
time the buffer is accessed, while a buffer retrieved via read()
reflects the state of the file at the time of the read().  If it's
necessary for the state of the buffers to be fixed at examination
time, then mmap() will be at best a draw, not a win.
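
To make the difference concrete, here is a minimal sketch of the
semantics described above. It is not PG code: the function name, the
8192-byte page size, and the temp file are all assumptions made for
illustration, and it assumes a Unix-like system with a unified page
cache (e.g. Linux), where a write() to a file shows up in an existing
shared mapping of that file.

```python
import mmap
import os
import tempfile

PAGE = 8192  # assumed page size, chosen only for illustration


def snapshot_vs_mapping():
    """Return (read_copy, mapped_view) of the first 3 bytes after a later write."""
    fd, path = tempfile.mkstemp()
    try:
        # Create one page of data starting with b"old".
        os.write(fd, b"old" + b"\0" * (PAGE - 3))

        # read()-style access: the kernel copies the page into a
        # process-local buffer, which is frozen at the time of the call.
        private_copy = os.pread(fd, PAGE, 0)

        # mmap()-style access: a shared, read-only view of the file's pages.
        mapped = mmap.mmap(fd, PAGE, access=mmap.ACCESS_READ)

        # A writer (here the same process, for simplicity) changes the page.
        os.pwrite(fd, b"new", 0)

        # The private copy still holds the old bytes; the mapping is
        # expected to show the new ones on a unified-page-cache system.
        result = (private_copy[:3], mapped[:3])
        mapped.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(snapshot_vs_mapping())
```

The buffer obtained through the read path keeps its contents no matter
what happens to the file afterwards, while the mapped view changes
underneath the reader. A backend that needs a stable image of the page
would therefore have to copy it out of the mapping (or hold off
writers), which gives back much of what mmap() was supposed to save.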

> mmap doesn't look that promising.

This ultimately depends on two things: how much time is spent copying
buffers around in kernel memory, and how much advantage can be gained
by freeing up the memory used by the backends to store the
backend-local copies of the disk pages they use (and thus making that
memory available to the kernel to use for additional disk buffering).
The gains from the former are likely small.  The gains from the latter
are probably also small, but harder to estimate.

The use of mmap() is probably one of those optimizations that should
be done when there's little else left to optimize, because the
potential gains are probably relatively small and the amount of work
involved may be quite large.

So I agree -- compared with other, much lower-hanging fruit, mmap()
doesn't look promising.

Kevin Brown                                           [EMAIL PROTECTED]
