Aaron Werman wrote:
> pg to my mind is unique in not trying to avoid OS buffering. Other
> dbmses spend a substantial effort to create a virtual OS (task
> management, I/O drivers, etc.) both in code and support. Choosing mmap
> seems such a limiting an option - it adds OS dependency and limits
> kernel developer options (2G limits, global mlock serializations,
> porting problems, inability to schedule or parallelize I/O, still
> having to coordinate writers and readers).

I'm not sure I entirely agree with this. Whether you access a file via mmap() or via read(), the end result is that you still have to access it, and since PG has significant chunks of system-dependent code that it heavily relies on as it is (e.g., locking mechanisms, shared memory), writing the I/O subsystem in a similar way doesn't seem to me to be that much of a stretch (especially since PG already has the storage manager), though it might involve quite a bit of work.

As for parallelization of I/O, the use of mmap() for reads should significantly improve parallelization -- now instead of issuing read() system calls, possibly for the same set of blocks, all the backends would essentially be examining the same data directly. The performance improvements as a result of accessing the kernel's cache pages directly instead of having it do buffer copies to process-local memory should increase as concurrency goes up. But see below.

> More to the point, I think it is very hard to effectively coordinate
> multithreaded I/O, and mmap seems used mostly to manage relatively
> simple scenarios.

PG already manages and coordinates multithreaded I/O. The mechanisms used to coordinate writes needn't change at all. But the way reads are done relative to writes might have to be rethought, since an mmap()ed buffer always reflects what's actually in kernel space at the time the buffer is accessed, while a buffer retrieved via read() reflects the state of the file at the time of the read(). If it's necessary for the state of the buffers to be fixed at examination time, then mmap() will be at best a draw, not a win.

> mmap doesn't look that promising.

This ultimately depends on two things: how much time is spent copying buffers around in kernel memory, and how much advantage can be gained by freeing up the memory used by the backends to store the backend-local copies of the disk pages they use (and thus making that memory available to the kernel to use for additional disk buffering).
The gains from the former are likely small. The gains from the latter are probably also small, but harder to estimate.

The use of mmap() is probably one of those optimizations that should be done when there's little else left to optimize, because the potential gains are possibly (if not probably) relatively small and the amount of work involved may be quite large.

So I agree -- compared with other, much lower-hanging fruit, mmap() doesn't look promising.

--
Kevin Brown [EMAIL PROTECTED]
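
The read()-versus-mmap() distinction drawn in this message (a read() buffer is a snapshot, an mmap()ed buffer tracks the kernel's copy) can be illustrated with a minimal sketch. This is not PostgreSQL code: the 8 kB page size, the temporary file, and the function name compare_read_and_mmap are all assumptions made for illustration, and the "other writer" is simulated within a single process.

```python
import mmap
import os
import tempfile

# Hypothetical 8 kB "disk page"; the size is an assumption for illustration.
PAGE = b"A" * 8192


def compare_read_and_mmap():
    """Return (private_copy, mapped_view) prefixes after the file changes."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, PAGE)

        # read()-style access: the kernel copies the page into
        # process-local memory, so this is a snapshot.
        private_copy = os.pread(fd, len(PAGE), 0)

        # mmap()-style access: a shared, read-only view of the
        # kernel's cached page, with no process-local copy.
        view = mmap.mmap(fd, len(PAGE), access=mmap.ACCESS_READ)
        try:
            # A writer (simulated here in the same process) overwrites
            # the page after both readers have "fetched" it.
            os.pwrite(fd, b"B" * len(PAGE), 0)
            mapped_bytes = view[:16]
        finally:
            view.close()
        return private_copy[:16], mapped_bytes
    finally:
        os.close(fd)
        os.unlink(path)


snapshot, live = compare_read_and_mmap()
print("read() copy :", snapshot)
print("mmap() view :", live)
```

On a system with a unified page cache, such as Linux, the private copy should still hold the old bytes while the mapped view shows the new ones. That is the behaviour that would force reads to be rethought relative to writes if buffers must stay fixed while they are being examined.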