Sorry for the late reply - I've been away, and I've had problems posting too
Merlin, I'd like to come back with a few more points!
>That's the whole point: memory is a limited resource. If pg is
>crawling, then the problem is simple: you need more memory.
My posting only relates to the scenario where RAM is not a limiting factor,
a scenario which shall become increasingly common over the next few years,
as 64 bit processors and OSs allow the exploitation of ever larger, ever
cheaper RAM. Incidentally, If PG is crawling, memory might be the
problem...but not necessarily - could be disk bound on writes.
> The question is: is it postgresql's responsibility to manage that
I think you are confusing the issue of RAM and address space.
Any application can acquire a piece of address space for its own use. It is
the responsibility of the application to do what it needs with that address
space. I'm interested in how PG could do something better in its address
space when it knows that it can fit all the data it operates on within that
Though the OS is responsible for determining whether that address space is
RAM resident or not, in my scenario, this is irrelevant, because there
*will* be enough RAM for everything, and the OS will, in that scenario,
allow all the address space to become RAM resident.
I am not advocating undermining the OS in any way. It would be stupid to
make PGSQL take over the running of the hardware. I've learned the hard way
that bypassing the OS is just a big pain up the backside!
>Pg is a data management tool, not a memory management tool.
I'm not criticising PG. PG is actually a 'DISK/MEMORY' data management tool.
It manages data which lives on disks, but it can only operate on that data
in memory, and goes to some lengths to try to fit bits of disk data in a
defined piece of memory, and push them back out again.
At the moment, this model assumes that RAM is a scarce resource.
The model still 'sort of' works when RAM is actually not scarce, because the
OS effectively uses that extra RAM to make IO *appear* to be quicker, and
indeed, I've found that a hint has been added to PG to tell it how much the
OS is likely to be caching.
But the question is this:
"If you wrote a DB from scratch with the assumption that *all* the data
could fit in the address space allocated by the postmaster, and you were
confident that the OS had enough RAM so that you never suffered vmem page
misses, couldn't you make things go much faster?"
A more pertinent question is:
"Could PG be extended to have a flag, which when enabled, told it to operate
with the assumption that it could fit all the disk data in RAM, and
implement the data organisation optimisations that rely on the persistence
of data in address space?"
>'let's manage everything' argument also frequently gets brought up wrt
>file i/o, because people assume the o/s sucks at file management.
Well, I'm not saying this.
I have substantial experience with high performance file IO through a
But if you are interested in high speed IO, naive 'let the OS do everything'
approach isn't often good enough. You, the application, have to be aware
that the order and timing of IO requests, along with the size of IO block
you cause to trigger, have a dramatic impact on the speed with which the
data reaches your app, OS or no OS. Most high speed storage still relies on
spinning things containing data that can only be accessed in a certain way,
and data movement is page boundary sensitive. The OS may hide these details
from you, but you, the app writer, have to have an understanding of the
underlying reality if you want to optimise performance.
I want to stress that at no point am I advocating *not* using the OS. PG
should do ALL IO and memory allocation through the OS, otherwise you end up
with a platform specific product that is of little use.
That given, there is still the opportunity for PG to be able to operate far
more efficiently in my high memory scenario.
Wouldn't your backend processes like to have the entire database sitting
ready in address space (ram resident, of course!), indexes all fully built?
No tuple more than a few machine instructions away?
Imagine the postmaster isn't having to frantically decide which bits of data
to kick out of the workspace in order to keep the backends happy. Imagine
the postmaster isn't having to build structures to keep track of the newly
read in blocks of data from 'disk' (or OS cache). Imagine that everything
was just there...
Is this not a compelling scenario?
>At some point, hard disks will be replaced by solid state memory
This is irrelevant to my scenario. The optimisations I crave are to do with
getting the entire database in a query-optimised form near to the CPUS -
i.e. in fast RAM. (I'd expect solid state disk ram to be much slower than
the RAM that sits nearer the CPU).
The speed of the persistent storage system (whether spinning platters or
some sort of persistent solid state memory) isn't really of direct relevance
to my argument. Solid state disks would allow write speeds to improve, which
would add to the gains which I am fishing for. So when they come, I'll be
Still. Solid state disks aren't really an option right now. Big RAM is.
>do you really want to recode your memory manager when
>this happens because all your old assumptions are no longer correct?
My scenario assumes nothing about how the data is stored, but you are right
to flag the problems that arise when original assumptions about hardware
For example, PG assumes that RAM is a rare resource, and it assumes the
postmaster cannot fit the entire database in a single address space.
*These* assumptions are now not correct, following the 64bit address space
The availability of 64 bit addressing and huge banks of RAM is of enormous
significance to databases, and it is the whole reason for my post.
Over the next 5-10 years, an increasing proportion of databases will fit
comfortably in RAM resident address space on commodity equipment. My
particular nationwide application will fit into RAM now. Hence my interest!
So, the question for the people involved in PG is: *how* can PG be improved
to make use of this, and reap the very substantial speed gains in this
scenario, without breaking the existing usage scenarios of PG in the
traditional 'DB > RAM' scenario?
The answer isn't "undermine the OS". The answer might be"make the postmaster
to build and operate with persistent, query optimised representations of the
Yes, I guess that might be a lot of work.. But the DB that delivers this
performance will be very well placed in the next 5 years, don't you think?
Anyway - I look forward to further feedback, and thanks for your comments so
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend