On 9/30/05, Ron Peacetree <[EMAIL PROTECTED]> wrote:
> 4= I'm sure we are paying all sorts of nasty overhead for essentially
> emulating the pg "filesystem" inside another filesystem. That means
> ~2x as much overhead to access a particular piece of data.
>
> The simplest solution is for us to implement a new VFS compatible
> filesystem tuned to exactly our needs: pgfs.
>
> We may be able to avoid that by some amount of hacking or
> modifying of the current FSs we use, but I suspect it would be more
> work for less ROI.

On this point, Reiser4 already implements a number of things that would be desirable for PostgreSQL. For example, write()s to reiser4 filesystems are atomic, so there is no risk of torn pages. (This is possible because reiser4 uses WAFL-like logging, where data is not overwritten but rather relocated.) The filesystem is modular and extensible, so it should be easy to add whatever additional semantics are needed.

I would imagine that all that would be needed is some more atomicity operations (single writes are already atomic, but I'm sure it would be useful to batch many writes into a transaction), some layout and packing controls, and some flush controls. A step further would perhaps be to integrate multiversioning directly into the FS: the wandering logging system provides the write side of multiversioning, so only a little read-side work would be required.

More importantly, the filesystem was intended to be extensible for this sort of application. It might make a good 'summer of code' project for someone next year; presumably reiser4 will have made it into the mainline kernel by then. :)
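
To make the "relocated rather than overwritten" idea concrete, here is a rough user-space analogy in Python. It is not reiser4 code and says nothing about how reiser4 is implemented internally; the function name write_relocated is made up for illustration. It writes the new version to a fresh file, syncs it, and then switches over with a rename, so a reader sees either the old contents or the new ones, never a mix of the two.

```python
import os
import tempfile


def write_relocated(path: str, data: bytes) -> None:
    """Replace the contents of `path` without overwriting the old bytes in place.

    Hypothetical helper for illustration only; assumes a POSIX system.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a fresh file in the same directory, so the
    # final rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # new copy is on disk before it becomes visible
        os.replace(tmp_path, path)  # atomic switch from old version to new
    except BaseException:
        # On failure the old version is untouched; discard the partial copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling write_relocated("page", new_bytes) twice in a row leaves only the second version behind. The obvious cost is a full copy per update, which is one reason it would be attractive to have the filesystem do the equivalent at a finer granularity.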