On 10/14/03 8:26 PM, "Greg Stark" <[EMAIL PROTECTED]> wrote:
>
> All the more reason Postgres's view of the world should maybe be represented
> there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and
> seems more interested in building a better kernel interface to control caching
> and i/o scheduling. Something that fits better with postgres's design than
> Oracle's.

This would certainly help Postgres as currently written, but it won't have
the theoretical performance headroom of what Oracle wants. A practical
kernel API is too narrow to be fully aware of and exploit database state.
And then there is the portability issue...

The way you want these kinds of things implemented in an operating system
kernel is somewhat orthogonal to how you want them implemented from the
perspective of a database kernel. Typical resource use cases for an
operating system and a database engine make pretty different assumptions,
and the best you'll get is a compromise that doesn't optimize either.

Making additional optimizations to the OS kernel works great for Postgres
(on Linux, at least) because currently very little is optimized in this
regard. Basically Linus is doing some design optimization work for us. An
improvement, but kind of a mediocre one in the big scheme of things and not
terribly portable. If we suddenly wanted to optimize Postgres for
performance the way Oracle does, we would be a lot more keen on the O_DIRECT
approach.

> Actually I think it would be useful for the WAL. As I understand it there's no
> point caching the WAL and every write is going to get synced anyways so
> there's no point in buffering it either. The sooner the process can find out
> it's been synced the better. But I'm not really 100% up on the way the WAL is
> used so I could be wrong.

Aye, I think you may be correct.

> Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It
> has to reimplement whatever I/O scheduling or other strategies it wants.
> Rather than being the escape from the "lowest common denominator" it is in
> fact precisely the cause of it.

You appear to have completely missed the point. The point of the
abstraction layer is so they can optimize the hell out of the database for
every single platform they support without having to rewrite a bunch of the
database every time.

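To make the WAL exchange above a little more concrete, here is a minimal
sketch of a log append that only returns once the record has been pushed to
stable storage. It is an illustration under stated assumptions, not how
Postgres writes its WAL: the names open_wal and append_record and the
4-byte length prefix are hypothetical, and it uses O_DSYNC (with an fsync()
fallback where that flag is missing) rather than O_DIRECT, because O_DIRECT
additionally requires aligned buffers and still goes through
platform-specific rules.

```python
import os

# O_DSYNC is not defined on every platform; 0 means "flag unavailable".
_DSYNC = getattr(os, "O_DSYNC", 0)


def open_wal(path: str) -> int:
    """Open a log file for appending, requesting synchronous writes if possible."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | _DSYNC
    return os.open(path, flags, 0o600)


def append_record(fd: int, payload: bytes) -> int:
    """Append one length-prefixed record; return after it has been synced."""
    record = len(payload).to_bytes(4, "big") + payload
    written = os.write(fd, record)
    if written != len(record):
        raise OSError("short write to log")
    if not _DSYNC:
        # No O_DSYNC on this platform: force the data out explicitly.
        os.fsync(fd)
    return written
```

The point of the sketch is only that the caller learns about durability as
part of the write itself, which is the property being discussed; whether
that is faster than buffered writes plus a later fsync() depends on the
platform.
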
The database kernel API is BETTER AND MORE OPTIMAL than the operating
system API. It allows them to use whatever memory management scheme, I/O
scheme, etc. is the best for every single platform. If "the best" happens
to be going to the native OS service, then that is what they do, but most
of the code doesn't need to know this if the abstraction layer is
well-designed. Most of the code in a DBMS does not care where memory comes
from, how it is managed, what the file system actually looks like, or how
I/O is done. As long as the behavior is the same from the database kernel
API it is writing to, it is all good.

What this means from a practical standpoint is that you don't *have* to use
SysV IPC on every platform, or POSIX, or mmap, or whatever. You can use
whatever that particular platform likes as long as it can be mapped into
the database kernel API, which tends to be at a high enough level that just
about *any* reasonable implementation of an OS API can be mapped into it
with quite a bit of optimization.

> You describe Postgres as if abstraction is a foreign concept to it. Much
> better to have well designed minimal abstractions for each of the resources
> needed, rather than trying to turn every OS you meet into the first one you
> met.

You have a serious misconception of what a database kernel is and looks
like. A database kernel doesn't look like the OS kernel that is mapped to
it. You write a database kernel API that is idealized for database usage
and provides services specifically designed for the needs of a database. It
is a high-level API, not a mirror copy of standard OS APIs; if you did
that, you wouldn't have any room to do the database kernel implementation.
You then build an implementation of the API on the local system using
whatever operating system interfaces suit your fancy. The API is simple
enough and small enough that this isn't particularly difficult to do in a
typical case.

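As a rough illustration of the layering described above, here is a small
sketch of a "database kernel" interface with two interchangeable
implementations. Everything in it is an assumption made for the example:
the names PageStore, FileStore, MmapStore, stamp_page and the 8192-byte
PAGE_SIZE are hypothetical and do not come from Postgres or Oracle. The
interface talks about pages, not file descriptors, and the calling code
never learns which OS mechanism sits underneath.

```python
import mmap
import os
from abc import ABC, abstractmethod

PAGE_SIZE = 8192  # assumed page size for this sketch


def _check(data: bytes) -> None:
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")


class PageStore(ABC):
    """Hypothetical database-kernel interface: pages in, pages out."""

    @abstractmethod
    def read_page(self, page_no: int) -> bytes: ...

    @abstractmethod
    def write_page(self, page_no: int, data: bytes) -> None: ...

    @abstractmethod
    def sync(self) -> None: ...

    @abstractmethod
    def close(self) -> None: ...


class FileStore(PageStore):
    """Portable backend: ordinary seek/read/write plus fsync."""

    def __init__(self, path: str, n_pages: int) -> None:
        self._f = open(path, "w+b")
        self._f.truncate(n_pages * PAGE_SIZE)  # zero-filled pages

    def read_page(self, page_no: int) -> bytes:
        self._f.seek(page_no * PAGE_SIZE)
        return self._f.read(PAGE_SIZE)

    def write_page(self, page_no: int, data: bytes) -> None:
        _check(data)
        self._f.seek(page_no * PAGE_SIZE)
        self._f.write(data)

    def sync(self) -> None:
        self._f.flush()
        os.fsync(self._f.fileno())

    def close(self) -> None:
        self._f.close()


class MmapStore(PageStore):
    """Alternative backend: the same interface on top of a memory mapping."""

    def __init__(self, path: str, n_pages: int) -> None:
        self._f = open(path, "w+b")
        self._f.truncate(n_pages * PAGE_SIZE)
        self._map = mmap.mmap(self._f.fileno(), n_pages * PAGE_SIZE)

    def read_page(self, page_no: int) -> bytes:
        off = page_no * PAGE_SIZE
        return bytes(self._map[off:off + PAGE_SIZE])

    def write_page(self, page_no: int, data: bytes) -> None:
        _check(data)
        off = page_no * PAGE_SIZE
        self._map[off:off + PAGE_SIZE] = data

    def sync(self) -> None:
        self._map.flush()

    def close(self) -> None:
        self._map.close()
        self._f.close()


def stamp_page(store: PageStore, page_no: int, marker: bytes) -> bytes:
    """'Database' code: written once against PageStore, backend-agnostic."""
    store.write_page(page_no, marker.ljust(PAGE_SIZE, b"\0"))
    store.sync()
    return store.read_page(page_no)
```

A per-platform build would pick whichever backend suits that platform, and
stamp_page (standing in for the bulk of the database code) would not
change.
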
And you can write a default kernel that is portable "as is" to most
operating systems.

There is some abstraction in Postgres and the database is well-written, but
it isn't written in a manner that makes it easy to swap out operating
system or API models. It is written to be portable at all levels. A
database kernel isn't necessarily required to be portable at the very
lowest level, but it is vastly more optimizable because you aren't forced
into a narrow set of choices for interfacing with the operating system.

Operating system APIs are not particularly well-suited for databases, and
if you force a database to adhere to operating system APIs directly, you
end up with a suboptimal situation almost every single time. You end up
with implementations that you never would have done if you were targeting
the database for only that platform. Using a database kernel lets you make
platform-specific optimizations and API selections without forcing most of
the database code to be aware of it.

Perhaps more to the point, who gives a damn what optimizations Linus puts
in the Linux kernel? What good does that do Postgres users on FreeBSD, or
OSX, or Windows? Abstracting a database engine to a set of operating system
APIs is never going to give stellar, or even consistent, results across all
platforms because the operating system APIs usually aren't written so that
you could write your database optimally. Theoretically, it is the
difference between middling performance in the typical case and highly
optimal performance in just about every case. A database kernel lets you
use an operating system in the way it likes to be used rather than using an
API that you just happen to support.

Cheers,

-James Rogers
 [EMAIL PROTECTED]