On 9/30/05, Ron Peacetree <[EMAIL PROTECTED]> wrote:
> 4= I'm sure we are paying all sorts of nasty overhead for essentially
> emulating the pg "filesystem" inside another filesystem. That means
> ~2x as much overhead to access a particular piece of data.
> The simplest solution is for us to implement a new VFS compatible
> filesystem tuned to exactly our needs: pgfs.
> We may be able to avoid that by some amount of hacking or
> modifying of the current FSs we use, but I suspect it would be more
> work for less ROI.
On this point, the reiser4 filesystem already implements a number of
things which would be desirable for PostgreSQL. For example, write()s
to reiser4 filesystems are atomic, so there is no risk of torn pages
(this is possible because reiser4 uses WAFL-like logging, where data
is not overwritten but rather relocated). The filesystem is modular and
extensible so it should be easy to add whatever additional semantics
are needed. I would imagine that all that would be needed is some
additional atomic operations (single writes are already atomic, but
I'm sure it would be useful to batch many writes into a transaction),
some layout and packing controls, and some flush controls. A step
further would be to integrate multiversioning directly into the FS
(the wandering-log system provides the write side of multiversioning;
a little read-side work would be required). More importantly, the
filesystem was intended to be extensible for this sort of application.
It might make a good 'summer of code' project for someone next year;
presumably reiser4 will have made it into the mainline kernel by
then. :)
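
For what it's worth, the relocate-instead-of-overwrite idea can be
approximated in user space. The sketch below is only an illustration
and is not reiser4 code: atomic_write_page and PAGE_SIZE are names made
up for the example, and it assumes a POSIX-style filesystem where a
rename within one directory is atomic. The new page image is written to
a fresh file and the name is then switched over, so a reader sees either
the old page or the new one, never a torn mix.

```python
import os
import tempfile

# Assumed page size for illustration (8 kB is PostgreSQL's default block size).
PAGE_SIZE = 8192


def atomic_write_page(path, data):
    """Replace the file at `path` with `data` without overwriting in place.

    Hypothetical helper: it mimics "relocate, then switch" in user space.
    """
    if len(data) != PAGE_SIZE:
        raise ValueError("expected exactly one page of data")

    dir_name = os.path.dirname(os.path.abspath(path))
    # Write the new version to a fresh location in the same directory,
    # so the final rename does not cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the new copy to disk before switching
        # Atomically point the name at the new copy.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure the old file is untouched; just drop the temporary copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this only covers a single page per file and does not fsync
the containing directory, so it says nothing about batching several
writes into one transaction, which is the part that would need
filesystem support.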