On Wed, 24 Aug 2005, PFC wrote:

>
> > Josh Berkus has already mentioned this as conventional wisdom as written
> > by Oracle. This may also be legacy wisdom. Oracle/Sybase/etc has been
> > around for a long time; it was probably a clear performance win way back
> > when. Nowadays with how far open-source OS's have advanced, I'd take it
> > with a grain of salt and do my own performance analysis. I suspect the
> > big vendors wouldn't change their stance even if they knew it was no
> > longer true due to the support hassles.
>
> Reinvent a filesystem... that would be suicidal.
>
> Now, Hans Reiser has expressed interest on the ReiserFS list in tweaking
> his Reiser4 especially for Postgres. In his own words, he wants a "Killer
> app for reiser4". Reiser4 will offer transactional semantics via a
> special reiser4 syscall, so it might be possible, with a minimum of
> changes to postgres (ie maybe just another sync mode besides fsync,
> fdatasync et al) to use this. Other interesting details were exposed on
> the reiser list, too (ie. a transactional filesystem can give ACID
> guarantees to postgres without the need for fsync()).
>
> Very interesting.

Ummm... I don't see anything here which will be a win for Postgres. The
transactional semantics we're interested in are fairly complex:

1) Modifications to multiple objects can become visible to the system
   atomically
2) On error, a series of modifications which had been grouped together
   within a transaction can be rolled back
3) Using object version information, determine which version of which
   object is visible to a given session
4) Using version information and locking, detect and resolve read/write
   and write/write conflicts

Now, I can see a file system offering (1) and (2). But for a file system
to let us do (3) and (4), we would have to make *major* modifications to
how PostgreSQL is implemented. Moreover, it would be for no gain, since
we've already written a system which can do it.

A filesystem could, in theory, help us by providing an API which lets us
tell it one of three things: how we'd like it to read ahead, that we don't
want it to read ahead, or how we'd like it to cache (or not cache) data.
The thing is, most OSes provide interfaces to do this already and we make
little use of them (I'm thinking of madvise() with MADV_SEQUENTIAL and
MADV_RANDOM, posix_fadvise(), and the various flags to open() which AIX,
HP-UX and Solaris provide).

Gavin
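
For illustration only, here is a minimal sketch of the kind of read-ahead
hint mentioned above, using Python's os.posix_fadvise() wrapper around the
POSIX call. This is not how PostgreSQL does anything; the helper names
open_with_hint() and read_all() are hypothetical, and the hint is treated
as best-effort because the call is missing on some platforms and the
kernel is free to ignore it.

```python
import os


def open_with_hint(path, sequential):
    """Open path read-only and advise the kernel on the access pattern.

    Returns (fd, hinted). hinted is False when os.posix_fadvise is not
    available on this platform or the kernel rejects the advice.
    """
    fd = os.open(path, os.O_RDONLY)
    hinted = False
    if hasattr(os, "posix_fadvise"):
        if sequential:
            advice = os.POSIX_FADV_SEQUENTIAL  # ask for more read-ahead
        else:
            advice = os.POSIX_FADV_RANDOM      # ask for no read-ahead
        try:
            # offset=0 and len=0 cover the file from start to end.
            os.posix_fadvise(fd, 0, 0, advice)
            hinted = True
        except OSError:
            # The advice is only a hint; reads still work without it.
            pass
    return fd, hinted


def read_all(fd, block_size=8192):
    """Read the file in fixed-size blocks and return the total byte count."""
    total = 0
    while True:
        chunk = os.read(fd, block_size)
        if not chunk:
            break
        total += len(chunk)
    return total
```

A sequential scan would call open_with_hint(path, sequential=True) before
reading, while an index-style random access pattern would pass
sequential=False. Either way the data returned is identical; only the
kernel's read-ahead behaviour may change.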