Re: [HACKERS] Database Kernels and O_DIRECT
Tom Lane wrote:
> James Rogers writes:
>> If we suddenly wanted to optimize Postgres for performance the way Oracle does, we would be a lot more keen on the O_DIRECT approach.
>
> This isn't ever going to happen, for the simple reason that we don't have Oracle's manpower. You are blithely throwing around the phrase "database kernel" like it would be a small simple project. In reality you are talking about (at least) implementing our own complete filesystem, and then doing it over again on every platform we want to support, and then after that, optimizing it to the point of actually being enough better than the native facilities to have been worth the effort.
>
> I cannot conceive of that happening in a Postgres project that even remotely resembles the present reality, because we just don't have the manpower; and what manpower we do have is better spent on other tasks. We have other things to do than re-invent the operating system wheel. Improving the planner, for example.

One question is what a database kernel would look like. Would it basically mean just taking our existing portability code, such as for shared memory, and moving it into a separate library with its own API? Don't we almost have that already? I am just confused about what would be different.

I think the only major difference I have heard is to bypass the OS file system and memory management. We already bypass most of the memory management by using palloc.

--
Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] Database Kernels and O_DIRECT
Andrew Dunstan writes:
> Tom Lane wrote:
>> James Rogers writes:
>>> If we suddenly wanted to optimize Postgres for performance the way Oracle does, we would be a lot more keen on the O_DIRECT approach.
>>
>> This isn't ever going to happen, for the simple reason that we don't have Oracle's manpower.
>
> [snip - long and sensible elaboration of above statement]
>
> I have wondered (somewhat fruitlessly) for several years about the possibilities of special purpose lightweight file systems that could relax some of the assumptions and checks used in general purpose file systems. Such a thing might provide most of the benefits of a database kernel without imposing anything extra on the database application layer.
>
> Just a thought - I have no resources to make any attack on such a project.

There is an exactly relevant project for this, namely Hans Reiser's ReiserFS, on Linux.

http://www.namesys.com/whitepaper.html

In Version 4, they will be exporting an API that allows userspace applications to control the use of transactional filesystem updates.

If someone were to build a database directly on top of this, one might wind up with some sort of ReiserSQL, which would be relatively analogous to the database kernel approach. Of course, the task would be large, and it would likely take _years_ for it to stabilize to the point of being much more than a neat hack.

The other neat approach that would be more relevant to PostgreSQL would be to create a filesystem that stored data in pure blocks, with pretty large block sizes, and low overhead for saving directory metadata. There isn't too terribly much interest in {a,o,m}time...

--
Christopher Browne
http://dev6.int.libertyrms.com/
Re: [HACKERS] Database Kernels and O_DIRECT
On 10/14/03 8:26 PM, Greg Stark wrote:
> All the more reason Postgres's view of the world should maybe be represented there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and seems more interested in building a better kernel interface to control caching and i/o scheduling. Something that fits better with postgres's design than Oracle's.

This would certainly help Postgres as currently written, but it won't have the theoretical performance headroom of what Oracle wants. A practical kernel API is too narrow to be fully aware of and exploit database state. And then there is the portability issue...

The way you want these kinds of things implemented in an operating system kernel is somewhat orthogonal to how you want them implemented from the perspective of a database kernel. Typical resource use cases for an operating system and a database engine make pretty different assumptions, and the best you'll get is a compromise that doesn't optimize either.

Making additional optimizations to the OS kernel works great for Postgres (on Linux, at least) because currently very little is optimized in this regard. Basically Linus is doing some design optimization work for us. An improvement, but kind of a mediocre one in the big scheme of things and not terribly portable.

>> If we suddenly wanted to optimize Postgres for performance the way Oracle does, we would be a lot more keen on the O_DIRECT approach.
>
> Actually I think it would be useful for the WAL. As I understand it there's no point caching the WAL and every write is going to get synced anyways so there's no point in buffering it either. The sooner the process can find out it's been synced the better. But I'm not really 100% up on the way the WAL is used so I could be wrong.

Aye, I think you may be correct.

> Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It has to reimplement whatever I/O scheduling or other strategies it wants. Rather than being the escape from the lowest common denominator it is in fact precisely the cause of it.

You appear to have completely missed the point.

The point of the abstraction layer is so they can optimize the hell out of the database for every single platform they support without having to rewrite a bunch of the database every time. The database kernel API is BETTER AND MORE OPTIMAL than the operating system API. It allows them to use whatever memory management scheme, I/O scheme, etc. is the best for every single platform. If the best happens to be going to the native OS service, then that is what they do, but most of the code doesn't need to know this if the abstraction layer is well-designed.

Most of the code in a DBMS does not care where memory comes from, how it's managed, what the file system actually looks like, or how I/O is done. As long as the behavior is the same from the database kernel API it is writing to, it is all good. What this means from a practical standpoint is that you don't *have* to use SysV IPC on every platform, or POSIX, or mmap, or whatever. You can use whatever that particular platform likes as long as it can be mapped into the database kernel API, which tends to be at a high enough level that just about *any* reasonable implementation of an OS API can be mapped into it with quite a bit of optimization.

> You describe Postgres as if abstraction is a foreign concept to it. Much better to have well designed minimal abstractions for each of the resources needed, rather than trying to turn every OS you meet into the first one you met.

You have a serious misconception of what a database kernel is and looks like. A database kernel doesn't look like the OS kernel that is mapped to it. You write a database kernel API that is idealized for database usage and provides services specifically designed for the needs of a database. It is a high-level API, not a mirror copy of standard OS APIs; if you did that, you wouldn't have any room to do the database kernel implementation. You then build an implementation of the API on the local system using whatever operating system interfaces suit your fancy. The API is simple enough and small enough that this isn't particularly difficult to do in a typical case. And you can write a default kernel that is portable as is to most operating systems.

There is some abstraction in Postgres and the database is well-written, but it isn't written in a manner that makes it easy to swap out operating system or API models. It is written to be portable at all levels. A database kernel isn't necessarily required to be portable at the very lowest level, but it is vastly more optimizable because you aren't forced into a narrow set of choices for interfacing with the operating system.

Operating system APIs are not particularly well-suited for databases, and if you force a database to adhere to operating system APIs directly, you end up with a suboptimal situation almost every single time.
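
The layering described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions, not Oracle's or Postgres's actual design: the names `StorageKernel`, `PortableFileKernel`, `copy_block`, and the 8192-byte `BLOCK_SIZE` are all hypothetical. The abstract class stands in for the "database kernel API", and the file-based class stands in for the "default kernel that is portable as is".

```python
import abc
import os

BLOCK_SIZE = 8192  # hypothetical page size chosen for this sketch


class StorageKernel(abc.ABC):
    """Hypothetical database-kernel storage API: block-oriented, not file-oriented."""

    @abc.abstractmethod
    def read_block(self, relation, block_no):
        """Return the BLOCK_SIZE bytes stored at block_no of relation."""

    @abc.abstractmethod
    def write_block(self, relation, block_no, data):
        """Store exactly BLOCK_SIZE bytes at block_no of relation."""

    @abc.abstractmethod
    def sync(self, relation):
        """Return only once earlier writes to relation are on stable storage."""


class PortableFileKernel(StorageKernel):
    """Default implementation that uses only ordinary buffered file I/O."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, relation):
        return os.path.join(self.directory, relation)

    def read_block(self, relation, block_no):
        with open(self._path(relation), "rb") as f:
            f.seek(block_no * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)

    def write_block(self, relation, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        path = self._path(relation)
        # "r+b" updates in place; "w+b" creates the file on first use.
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(data)

    def sync(self, relation):
        fd = os.open(self._path(relation), os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)


def copy_block(kernel, src, dst, block_no):
    """Upper-layer code: written against the API, unaware of how I/O is done."""
    kernel.write_block(dst, block_no, kernel.read_block(src, block_no))
    kernel.sync(dst)
```

A platform-specific kernel (raw devices, direct I/O, a different IPC scheme) would be another subclass; `copy_block` and the rest of the upper layers would not change.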
Re: [HACKERS] Database Kernels and O_DIRECT
On 10/14/03 11:31 PM, James Rogers wrote:
> There is some abstraction in Postgres and the database is well-written, but it isn't written in a manner that makes it easy to swap out operating system or API models. It is written to be portable at all levels. A database kernel isn't necessarily required to be portable at the very lowest level, but it is vastly more optimizable because you aren't forced into a narrow set of choices for interfacing with the operating system.

Just to clarify, my post wasn't really to say that we should run out and make Postgres use a database-kernel-type internal model tomorrow. The point of all that was that Oracle does things that way for a very good reason and that there can be benefits that may not be immediately obvious.

It is really one of those emergent needs when a database engine gets to a certain level of sophistication. For smaller and simpler databases, you don't really need it and the effort isn't justified. At some point, you cross a threshold where not only does it become justified, but it becomes a wise idea, or not having it will start to punish you in a number of different ways. I personally think that Postgres is sitting on the cusp of "it's a wise idea", and that it is something worth thinking about in the future.

Cheers,

-James Rogers
Re: [HACKERS] Database Kernels and O_DIRECT
Greg Stark wrote:
> James Rogers writes:
>>> Someone from Oracle is on there explaining what Oracle's needs are. Perhaps someone more knowledgeable than myself could explain what would most help postgres in this area.
>>
>> There is an important difference between Oracle and Postgres that makes discussions of this complicated because the assumptions are different.
>
> All the more reason Postgres's view of the world should maybe be represented there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and seems more interested in building a better kernel interface to control caching and i/o scheduling. Something that fits better with postgres's design than Oracle's.

Of course, the big question is why Oracle is even there talking to Linus, and Linus isn't asking to get PostgreSQL involved. If you are running an open-source project, you would think you would give favor to other open-source projects. Same with MySQL favoritism --- if you are writing an open-source tool, why favor a database developed/controlled by a single company?

--
Bruce Momjian | http://candle.pha.pa.us
Re: [HACKERS] Database Kernels and O_DIRECT
> Of course, the big question is why Oracle is even there talking to Linus, and Linus isn't asking to get PostgreSQL involved. If you are running an open-source project, you would think you would give favor to other open-source projects. Same with MySQL favoritism --- if you are writing an open-source tool, why favor a database developed/controlled by a single company?

It's the unix style: no message, no error... If Postgres developers do not send any message to Linus, he will think Linux is doing just fine for them.

It seems that Oracle cares about improving their Linux port, so they asked Linus for some features. I doubt Linus ran to Oracle asking, "Please, how could I help you improve your closed software project?" Kernel folks seem to be very busy people.

IMHO if we see any window for improvement in any OS, we should go to Linus (or Peter or Bill Gates) and ask for it, as was written in the original post.

Regards,
--
Paulo Scardine
Re: [HACKERS] Database Kernels and O_DIRECT
James Rogers writes:
> If we suddenly wanted to optimize Postgres for performance the way Oracle does, we would be a lot more keen on the O_DIRECT approach.

This isn't ever going to happen, for the simple reason that we don't have Oracle's manpower.

You are blithely throwing around the phrase "database kernel" like it would be a small simple project. In reality you are talking about (at least) implementing our own complete filesystem, and then doing it over again on every platform we want to support, and then after that, optimizing it to the point of actually being enough better than the native facilities to have been worth the effort.

I cannot conceive of that happening in a Postgres project that even remotely resembles the present reality, because we just don't have the manpower; and what manpower we do have is better spent on other tasks. We have other things to do than re-invent the operating system wheel. Improving the planner, for example.

One of the first concepts I learned in CS grad school was that of optimizing a system at multiple levels. If the hardware guys can build a 2X faster CPU, and the operating system guys can find a 2X improvement in (say) filesystem performance, and then the application guys can find a 2X improvement in their algorithms, you've got 8X total speedup, which might have been impossible or at least vastly harder to get by working at only one level of the system.

The lesson for Postgres is that we should not be trying to beat the operating system guys at their own game. It's unclear that we can anyway, and we can certainly get more bang for our optimization buck by working at system levels that don't correspond to operating-system concerns.

I tend to agree with the opinion that Oracle's architecture is based on twenty-year-old assumptions. Back then it was reasonable to assume that database-specific algorithms could outperform a general-purpose operating system. In today's environment that assumption is not a given.

regards, tom lane
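
The 8X figure in this message is the product of the three per-level factors. A small worked check, assuming (as the message implicitly does) that each improvement applies to the whole run time; the helper name `combined_speedup` is made up for this illustration:

```python
import math


def combined_speedup(factors):
    """Overall speedup when every factor applies to the whole run time."""
    return math.prod(factors)


levels = {"hardware": 2, "operating system": 2, "application": 2}
total = combined_speedup(levels.values())
print(total)  # 8
```

Getting the same result from a single level would require an 8X improvement at that level alone, which is the harder problem the message alludes to.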
Re: [HACKERS] Database Kernels and O_DIRECT
Tom Lane wrote:
> James Rogers writes:
>> If we suddenly wanted to optimize Postgres for performance the way Oracle does, we would be a lot more keen on the O_DIRECT approach.
>
> This isn't ever going to happen, for the simple reason that we don't have Oracle's manpower.

[snip - long and sensible elaboration of above statement]

I have wondered (somewhat fruitlessly) for several years about the possibilities of special purpose lightweight file systems that could relax some of the assumptions and checks used in general purpose file systems. Such a thing might provide most of the benefits of a database kernel without imposing anything extra on the database application layer.

Just a thought - I have no resources to make any attack on such a project.

cheers

andrew
Re: [HACKERS] Database Kernels and O_DIRECT
James Rogers wrote on Wed, 15.10.2003 at 11:26:
> On 10/14/03 11:31 PM, James Rogers wrote:
>> There is some abstraction in Postgres and the database is well-written, but it isn't written in a manner that makes it easy to swap out operating system or API models. It is written to be portable at all levels. A database kernel isn't necessarily required to be portable at the very lowest level, but it is vastly more optimizable because you aren't forced into a narrow set of choices for interfacing with the operating system.
>
> Just to clarify, my post wasn't really to say that we should run out and make Postgres use a database kernel type internal model tomorrow. The point of all that was that Oracle does things that way for a very good reason and that there can be benefits that may not be immediately obvious.

OTOH, what may be a perfectly good reason for Oracle may not be one for PostgreSQL.

For me the beauty of open-source software has always been the possibility to fix problems at the right level (kernel, library, language), and not to just make workarounds at another level (your application).

So getting some APIs into the kernel for optimizing cache usage or writeback strategies would be much better than using raw writes and rewriting the whole thing ourselves.

The newer Linux kernels have several schedulers to choose from, so why not push for choice in other areas as well?

The ultimate database kernel could thus be a custom-tuned Linux kernel ;)

> It is really one of those emergent needs when a database engine gets to a certain level of sophistication. For smaller and simpler databases, you don't really need it and the effort isn't justified. At some point, you cross a threshold where not only does it become justified but it becomes a wise idea or not having it will start to punish you in a number of different ways. I personally think that Postgres is sitting on the cusp of "it's a wise idea", and that it is something worth thinking about in the future.

This thread reminds me of the Linus/Tanenbaum monolithic vs. microkernel argument: while microkernels are theoretically better, Linux could outperform them by having the required modularity at the source level, and for an open-source project this was enough. It also beat the Mach kernel by being there, whereas the microkernel-based Mach was too hard to develop and debug and thus has taken far longer to mature.

--
Hannu
Re: [HACKERS] Database Kernels and O_DIRECT
>>>>> "Tom" == Tom Lane writes:

    Tom> I tend to agree with the opinion that Oracle's architecture
    Tom> is based on twenty-year-old assumptions. Back then it was
    Tom> reasonable to assume that database-specific algorithms could
    Tom> outperform a general-purpose operating system. In today's
    Tom> environment that assumption is not a given.

In fact:

Michael Stonebraker: Operating System Support for Database Management. CACM 24(7): 412-418 (1981)

Abstract: Several operating system services are examined with a view toward their applicability to support of database management functions. These services include buffer pool management; the file system; scheduling, process management, and interprocess communication; and consistency control.

--
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh
Re: [HACKERS] Database Kernels and O_DIRECT
Andrew Dunstan wrote:
> I have wondered (somewhat fruitlessly) for several years about the possibilities of special purpose lightweight file systems that could relax some of the assumptions and checks used in general purpose file systems. Such a thing might provide most of the benefits of a database kernel without imposing anything extra on the database application layer.

CPU is usually cheap compared to disk I/O.

There are two things that might be worth looking into:

Oracle released their cluster filesystem (ocfs) as a GPL driver for Linux. It might be interesting to check how it performs if used for postgres, but I fear that it implicitly assumes that the bulk of the caching is performed by the database in user space.

And using O_DIRECT for the WAL logs - the logs are never read.

--
Manfred
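
The O_DIRECT-for-WAL idea above could look roughly like the sketch below. It is a minimal illustration, not how Postgres writes its WAL: the function names, the 4096-byte `BLOCK`, and the one-record-per-block layout are assumptions made for the example. O_DIRECT generally requires the buffer address, length, and file offset to be aligned, so the record is zero-padded into a page-aligned anonymous mapping, and the code falls back to ordinary buffered writes where the platform or filesystem refuses the flag.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment unit; a real device may require another size
O_DIRECT = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag


def _write_block(path, payload, extra_flags):
    """Append one zero-padded block to path and fsync it; return bytes written."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping: page-aligned, zero-filled
    try:
        buf[:len(payload)] = payload
        flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | extra_flags
        fd = os.open(path, flags, 0o600)
        try:
            written = os.write(fd, buf)
            os.fsync(fd)  # return only after the kernel reports the data synced
        finally:
            os.close(fd)
        return written
    finally:
        buf.close()


def append_record(path, payload):
    """Append payload as one block, bypassing the page cache where possible.

    Returns (bytes_written, used_direct_io).
    """
    if len(payload) > BLOCK:
        raise ValueError("record does not fit in one block")
    if O_DIRECT:
        try:
            return _write_block(path, payload, O_DIRECT), True
        except OSError as exc:
            if exc.errno != errno.EINVAL:  # EINVAL: O_DIRECT refused here
                raise
    return _write_block(path, payload, 0), False
```

Whether bypassing the cache actually helps depends on the workload and the storage underneath; the sketch only shows the mechanics.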
Re: [HACKERS] Database Kernels and O_DIRECT
James Rogers writes:
>> Someone from Oracle is on there explaining what Oracle's needs are. Perhaps someone more knowledgeable than myself could explain what would most help postgres in this area.
>
> There is an important difference between Oracle and Postgres that makes discussions of this complicated because the assumptions are different.

All the more reason Postgres's view of the world should maybe be represented there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and seems more interested in building a better kernel interface to control caching and i/o scheduling. Something that fits better with postgres's design than Oracle's.

> [...] the former case, it is very useful and conducive to better performance to have O_DIRECT and direct control of the I/O in general -- the more, the better. In the latter case (e.g. Postgres), it is more of a nuisance and difficult to exploit well.

Actually I think it would be useful for the WAL. As I understand it there's no point caching the WAL and every write is going to get synced anyways so there's no point in buffering it either. The sooner the process can find out it's been synced the better. But I'm not really 100% up on the way the WAL is used so I could be wrong.

> The point of having a database kernel underneath the DBMS is two-fold. First, it improves portability by acting as an operating system abstraction layer, replacing OS kernel services with its own equivalents [...]

Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It has to reimplement whatever I/O scheduling or other strategies it wants. Rather than being the escape from the lowest common denominator it is in fact precisely the cause of it.

You describe Postgres as if abstraction is a foreign concept to it. Much better to have well designed minimal abstractions for each of the resources needed, rather than trying to turn every OS you meet into the first one you met.

--
greg
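
One existing interface in the spirit of "a better kernel interface to control caching" is POSIX `posix_fadvise`; the thread does not name it, so treat it here as an illustrative assumption rather than what Linus or the posters had in mind. The sketch below (the function names are made up) scans a file through the normal page cache, hints that access is sequential, and afterwards advises the kernel that the cached pages are no longer needed. The advice is only a hint, so failures are ignored.

```python
import os


def _advise(fd, advice_name):
    """Pass a caching hint to the kernel; do nothing where it is unsupported."""
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return
    try:
        os.posix_fadvise(fd, 0, 0, advice)  # offset 0, length 0: whole file
    except OSError:
        pass  # advice is optional; the read itself is unaffected


def scan_without_polluting_cache(path, chunk=1 << 16):
    """Read path sequentially, then tell the kernel its pages can be dropped.

    Returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        _advise(fd, "POSIX_FADV_SEQUENTIAL")
        while True:
            data = os.read(fd, chunk)
            if not data:
                break
            total += len(data)
        _advise(fd, "POSIX_FADV_DONTNEED")
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT, this keeps the kernel's buffering and I/O scheduling in play and only steers them, which is closer to the design preference described in this message.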