Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-25 Thread Bruce Momjian
Tom Lane wrote:
 James Rogers [EMAIL PROTECTED] writes:
  If we suddenly wanted to optimize Postgres for performance the way
  Oracle does, we would be a lot more keen on the O_DIRECT approach.
 
 This isn't ever going to happen, for the simple reason that we don't
 have Oracle's manpower.  You are blithely throwing around the phrase
 database kernel like it would be a small simple project.  In reality
 you are talking about (at least) implementing our own complete
 filesystem, and then doing it over again on every platform we want to
 support, and then after that, optimizing it to the point of actually
 being enough better than the native facilities to have been worth the
 effort.  I cannot conceive of that happening in a Postgres project that
 even remotely resembles the present reality, because we just don't have
 the manpower; and what manpower we do have is better spent on other
 tasks.  We have other things to do than re-invent the operating system
 wheel.  Improving the planner, for example.

One question is: what would a database kernel look like?  Would it
basically mean just taking our existing portability code, such as for
shared memory, and moving it into a separate library with its own API?
Don't we almost have that already?

I am just confused about what would be different.  I think the only major
difference I have heard is to bypass the OS file system and memory
management.  We already bypass most of the memory management by using
palloc.
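To make the palloc point concrete: palloc hands out memory from memory contexts that are released in bulk rather than freed piece by piece. A minimal sketch of that idea follows, assuming a single fixed-size block. The `Arena` type and the `arena_*` functions are hypothetical illustrations, not PostgreSQL's actual memory-context code (which grows blocks on demand and supports nested contexts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump-pointer arena in the spirit of a palloc memory context:
 * many small allocations, one bulk release. Not PostgreSQL's AllocSet code. */
typedef struct Arena {
    char  *base;   /* one block obtained from malloc() */
    size_t size;   /* capacity in bytes */
    size_t used;   /* bytes handed out so far */
} Arena;

static Arena *arena_create(size_t size) {
    Arena *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->base = malloc(size);
    if (a->base == NULL) { free(a); return NULL; }
    a->size = size;
    a->used = 0;
    return a;
}

/* Return n bytes aligned for any fundamental type, or NULL if the arena is full. */
static void *arena_alloc(Arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    assert((align & (align - 1)) == 0);          /* alignment is a power of two */
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Release every allocation at once, like resetting a memory context. */
static void arena_reset(Arena *a) { a->used = 0; }

static void arena_destroy(Arena *a) {
    free(a->base);
    free(a);
}
```

Throwing away a whole query's worth of allocations is then a single `arena_reset` call, which is roughly the property that makes per-query contexts cheap and leak-resistant.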

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-16 Thread Christopher Browne
[EMAIL PROTECTED] (Andrew Dunstan) writes:
 Tom Lane wrote:
James Rogers [EMAIL PROTECTED] writes:
If we suddenly wanted to optimize Postgres for performance the way
Oracle does, we would be a lot more keen on the O_DIRECT approach.
This isn't ever going to happen, for the simple reason that we don't
 have Oracle's manpower.

 [snip - long and sensible elaboration of above statement]

 I have wondered (somewhat fruitlessly) for several years about the
 possibilities of special purpose lightweight file systems that could
 relax some of the assumptions and checks used in general purpose file
 systems. Such a thing might provide most of the benefits of a
 database kernel without imposing anything extra on the database
 application layer.

 Just a thought - I have no resources to make any attack on such a project.

There is an exactly relevant project for this, namely Hans Reiser's
ReiserFS, on Linux.

http://www.namesys.com/whitepaper.html

In Version 4, they will be exporting an API that allows userspace
applications to control the use of transactional filesystem updates.

If someone were to directly build a database on top of this, one might
wind up with some sort of ReiserSQL, which would be relatively
analogous to the database kernel approach.

Of course, the task would be large, and it would likely take _years_
for it to stabilize to the point of being much more than a neat
hack.

The other neat approach that would be more relevant to PostgreSQL
would be to create a filesystem that stored data in pure blocks, with
pretty large block sizes, and low overhead for saving directory
metadata.  There isn't too terribly much interest in {a,c,m}time...
-- 
output = reverse(ofni.smrytrebil @ enworbbc)
http://dev6.int.libertyrms.com/
Christopher Browne
(416) 646 3304 x124 (land)



Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread James Rogers
On 10/14/03 8:26 PM, Greg Stark [EMAIL PROTECTED] wrote:
 
 All the more reason Postgres's view of the world should maybe be represented
 there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and
 seems more interested in building a better kernel interface to control caching
 and i/o scheduling. Something that fits better with postgres's design than
 Oracle's.


This would certainly help Postgres as currently written, but it won't have
the theoretical performance headroom of what Oracle wants.  A practical
kernel API is too narrow to be fully aware of and exploit database state.
And then there is the portability issue...

The way you want these kinds of things implemented in an operating system
kernel is somewhat orthogonal to how you want them implemented from the
perspective of a database kernel.  Typical resource use cases for an
operating system and a database engine make pretty different assumptions and
the best you'll get is a compromise that doesn't optimize either.

Making additional optimizations to the OS kernel works great for Postgres
(on Linux, at least) because currently very little is optimized in this
regard.  Basically Linus is doing some design optimization work for us.  An
improvement, but kind of a mediocre one in the big scheme of things and not
terribly portable.  If we suddenly wanted to optimize Postgres for
performance the way Oracle does, we would be a lot more keen on the O_DIRECT
approach.
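For readers unfamiliar with O_DIRECT: it asks the kernel to transfer data between the user buffer and the device while trying to bypass the OS page cache, which is what lets a database do all of its own caching. A minimal Linux-oriented sketch follows. It assumes a 4096-byte alignment (the real requirement depends on the device and filesystem) and falls back to buffered I/O when the filesystem rejects O_DIRECT with EINVAL; `open_maybe_direct` and `write_aligned_block` are hypothetical helpers.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define IO_ALIGN 4096  /* assumed alignment; the real value depends on device and fs */

/* Open path for writing, trying O_DIRECT first. Some filesystems reject
 * O_DIRECT with EINVAL; then fall back to ordinary buffered I/O.
 * *direct reports which mode was obtained. */
static int open_maybe_direct(const char *path, int *direct) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0600);
    if (fd >= 0) { *direct = 1; return fd; }
    if (errno != EINVAL) return -1;
    *direct = 0;
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
}

/* Write one IO_ALIGN-sized block at offset 0 from an IO_ALIGN-aligned buffer:
 * O_DIRECT generally requires the buffer address, the length, and the file
 * offset to be multiples of the logical block size. */
static ssize_t write_aligned_block(int fd, char fill) {
    void *buf = NULL;
    if (posix_memalign(&buf, IO_ALIGN, IO_ALIGN) != 0) return -1;
    memset(buf, fill, IO_ALIGN);
    ssize_t n = pwrite(fd, buf, IO_ALIGN, 0);
    free(buf);
    return n;
}
```

The alignment bookkeeping is part of the cost of this approach: every layer above it has to work in whole, aligned blocks.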

 
 Actually I think it would be useful for the WAL. As I understand it there's no
 point caching the WAL and every write is going to get synced anyways so
 there's no point in buffering it either. The sooner the process can find out
 it's been synced the better. But I'm not really 100% up on the way the WAL is
 used so I could be wrong.


Aye, I think you may be correct.

 
 Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It
 has to reimplement whatever I/O scheduling or other strategies it wants.
 Rather than being the escape from the lowest common denominator it is in
 fact precisely the cause of it.


You appear to have completely missed the point.

The point of the abstraction layer is so they can optimize the hell out of
the database for every single platform they support without having to
rewrite a bunch of the database every time.  The database kernel API is
BETTER AND MORE OPTIMAL than the operating system API. It allows them to use
whatever memory management scheme, I/O scheme, etc is the best for every
single platform.  If the best option happens to be the native OS service,
then that is what they do, but most of the code doesn't need to know this if
the abstraction layer is well-designed.

Most of the code in a DBMS does not care where memory comes from, how it's
managed, what the file system actually looks like, or how I/O is done.  As
long as the behavior is the same from the database kernel API it is writing
to, it is all good.  What this means from a practical standpoint is that you
don't *have* to use SysV IPC on every platform, or POSIX, or mmap, or
whatever.  You can use whatever that particular platform likes as long as it
can be mapped into the database kernel API, which tends to be at a high
enough level that just about *any* reasonable implementation of an OS API
can be mapped into it with quite a bit of optimization.


 You describe Postgres as if abstraction is a foreign concept to it. Much
 better to have well designed minimal abstractions for each of the resources
 needed, rather than trying to turn every OS you meet into the first one you
 met.
 

You have a serious misconception of what a database kernel is and looks
like.

A database kernel doesn't look like the OS kernel that is mapped to it.  You
write a database kernel API that is idealized for database usage and
provides services specifically designed for the needs of a database.  It is
a high-level API, not a mirror copy of standard OS APIs; if you did that,
you wouldn't have any room to do the database kernel implementation.  You
then build an implementation of the API on the local system using whatever
operating system interfaces suit your fancy.  The API is simple enough and
small enough that this isn't particularly difficult to do in a typical case.
And you can write a default kernel that is portable as is to most
operating systems.
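The layering described above can be sketched as a small table of function pointers: upper layers call a database-shaped API, and each platform supplies its own implementation behind it. Everything here (`DbKernelIO`, `PosixBackend`, the 8192-byte `DBK_BLOCK_SIZE`, `copy_block`) is a hypothetical illustration of the idea, not Oracle's or PostgreSQL's actual interface. The default backend uses plain POSIX `pread`/`pwrite`/`fsync`; a platform-specific backend would fill the same slots with, say, raw-device or O_DIRECT I/O.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DBK_BLOCK_SIZE 8192   /* block size assumed for the sketch */

/* Hypothetical "database kernel" I/O interface: whole-block reads/writes and
 * a durability barrier, rather than a mirror copy of the POSIX file API. */
typedef struct DbKernelIO {
    int (*read_block)(void *self, long blkno, char *buf);
    int (*write_block)(void *self, long blkno, const char *buf);
    int (*sync)(void *self);
    void *self;
} DbKernelIO;

/* Default portable implementation on top of plain POSIX file I/O. */
typedef struct { int fd; } PosixBackend;

static int posix_read(void *self, long blkno, char *buf) {
    int fd = ((PosixBackend *)self)->fd;
    return pread(fd, buf, DBK_BLOCK_SIZE, (off_t)blkno * DBK_BLOCK_SIZE) == DBK_BLOCK_SIZE ? 0 : -1;
}

static int posix_write(void *self, long blkno, const char *buf) {
    int fd = ((PosixBackend *)self)->fd;
    return pwrite(fd, buf, DBK_BLOCK_SIZE, (off_t)blkno * DBK_BLOCK_SIZE) == DBK_BLOCK_SIZE ? 0 : -1;
}

static int posix_sync(void *self) {
    return fsync(((PosixBackend *)self)->fd);
}

static DbKernelIO posix_kernel(PosixBackend *b) {
    DbKernelIO k = { posix_read, posix_write, posix_sync, b };
    return k;
}

/* Upper layers only ever see DbKernelIO, never the OS calls behind it. */
static int copy_block(DbKernelIO *k, long from, long to) {
    char buf[DBK_BLOCK_SIZE];
    if (k->read_block(k->self, from, buf) != 0) return -1;
    return k->write_block(k->self, to, buf);
}
```

The point of the design is that `copy_block` and everything like it stay unchanged when a port swaps in a different backend.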

There is some abstraction in Postgres and the database is well-written, but
it isn't written in a manner that makes it easy to swap out operating system
or API models.  It is written to be portable at all levels.  A database
kernel isn't necessarily required to be portable at the very lowest level,
but it is vastly more optimizable because you aren't forced into a narrow
set of choices for interfacing with the operating system.

Operating system APIs are not particularly well-suited for databases, and if
you force a database to adhere to operating system APIs directly, you end up
with a suboptimal situation almost every single time. 

Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread James Rogers
On 10/14/03 11:31 PM, James Rogers [EMAIL PROTECTED] wrote:
 
 There is some abstraction in Postgres and the database is well-written, but
 it isn't written in a manner that makes it easy to swap out operating system
 or API models.  It is written to be portable at all levels.  A database
 kernel isn't necessarily required to be portable at the very lowest level,
 but it is vastly more optimizable because you aren't forced into a narrow
 set of choices for interfacing with the operating system.


Just to clarify, my post wasn't really to say that we should run out and
make Postgres use a database kernel type internal model tomorrow.  The point
of all that was that Oracle does things that way for a very good reason and
that there can be benefits that may not be immediately obvious.

It is really one of those emergent needs when a database engine gets to a
certain level of sophistication.  For smaller and simpler databases, you
don't really need it and the effort isn't justified.  At some point, you
cross a threshold where not only does it become justified but it becomes a
wise idea, or not having it will start to punish you in a number of different
ways.  I personally think that Postgres is sitting on the cusp of "it's a
wise idea", and that it is something worth thinking about in the future.

Cheers,

-James Rogers
 [EMAIL PROTECTED]




Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Bruce Momjian
Greg Stark wrote:
 
 James Rogers [EMAIL PROTECTED] writes:
  
   Someone from Oracle is on there explaining what Oracle's needs are. Perhaps
   someone more knowledgeable than myself could explain what would most help
   postgres in this area.
  
  
  There is an important difference between Oracle and Postgres that makes
  discussions of this complicated because the assumptions are different.
 
 All the more reason Postgres's view of the world should maybe be represented
 there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and
 seems more interested in building a better kernel interface to control caching
 and i/o scheduling. Something that fits better with postgres's design than
 Oracle's.

Of course, the big question is why Oracle is even there talking to
Linus, and Linus isn't asking to get PostgreSQL involved.  If you are
running an open-source project, you would think you would give favor to
other open-source projects.  Same with MySQL favoritism --- if you are
writing an open-source tool, why favor a database developed/controlled
by a single company?




Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Paulo Scardine
 Of course, the big question is why Oracle is even there talking to
 Linus, and Linus isn't asking to get PostgreSQL involved.  If you are
 running an open-source project, you would think you would give favor to
 other open-source projects.  Same with MySQL favoritism --- if you are
 writing an open-source tool, why favor a database developed/controlled
 by a single company?

It's the Unix style: no message, no error... If Postgres developers do not
send any message to Linus, he will think Linux is doing just fine for them.

It seems that Oracle cares about improving their Linux port, so they asked Linus
for some features. I doubt Linus ran to Oracle asking "please, how could I help
you improve your closed software project?". Kernel folks seem to be very
busy people.

IMHO if we see any window for improvement in any OS, we should go to Linus
(or Peter or Bill Gates) and ask for it, as was written in the original post.

Regards,
--
Paulo Scardine





Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Tom Lane
James Rogers [EMAIL PROTECTED] writes:
 If we suddenly wanted to optimize Postgres for performance the way
 Oracle does, we would be a lot more keen on the O_DIRECT approach.

This isn't ever going to happen, for the simple reason that we don't
have Oracle's manpower.  You are blithely throwing around the phrase
database kernel like it would be a small simple project.  In reality
you are talking about (at least) implementing our own complete
filesystem, and then doing it over again on every platform we want to
support, and then after that, optimizing it to the point of actually
being enough better than the native facilities to have been worth the
effort.  I cannot conceive of that happening in a Postgres project that
even remotely resembles the present reality, because we just don't have
the manpower; and what manpower we do have is better spent on other
tasks.  We have other things to do than re-invent the operating system
wheel.  Improving the planner, for example.

One of the first concepts I learned in CS grad school was that of
optimizing a system at multiple levels.  If the hardware guys can build
a 2X faster CPU, and the operating system guys can find a 2X improvement
in (say) filesystem performance, and then the application guys can find
a 2X improvement in their algorithms, you've got 8X total speedup, which
might have been impossible or at least vastly harder to get by working
at only one level of the system.  The lesson for Postgres is that we
should not be trying to beat the operating system guys at their own
game.  It's unclear that we can anyway, and we can certainly get more
bang for our optimization buck by working at system levels that don't
correspond to operating-system concerns.
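Tom's arithmetic can be written out explicitly. Assuming each level's improvement applies to the whole workload, so that the speedups compose multiplicatively (an idealization: a speedup that only touches part of the runtime contributes less), the total is the product of the per-level factors. The subscripts are illustrative notation for the hardware, operating-system, and application levels.

```latex
S_{\text{total}} = S_{\text{hw}} \cdot S_{\text{os}} \cdot S_{\text{app}} = 2 \times 2 \times 2 = 8
```

Because the factors multiply, the database project captures the OS term for free without spending any of its own effort on it, which is the crux of the argument against re-implementing that level.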

I tend to agree with the opinion that Oracle's architecture is based on
twenty-year-old assumptions.  Back then it was reasonable to assume that
database-specific algorithms could outperform a general-purpose
operating system.  In today's environment that assumption is not a given.

regards, tom lane



Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Andrew Dunstan
Tom Lane wrote:

James Rogers [EMAIL PROTECTED] writes:
 

If we suddenly wanted to optimize Postgres for performance the way
Oracle does, we would be a lot more keen on the O_DIRECT approach.
   

This isn't ever going to happen, for the simple reason that we don't
have Oracle's manpower.  

[snip - long and sensible elaboration of above statement]

I have wondered (somewhat fruitlessly) for several years about the 
possibilities of special purpose lightweight file systems that could 
relax some of the assumptions and checks used in general purpose file 
systems. Such a thing might provide most of the benefits of a database 
kernel without imposing anything extra on the database application layer.

Just a thought - I have no resources to make any attack on such a project.

cheers

andrew



Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Hannu Krosing
James Rogers wrote on Wed, 15.10.2003 at 11:26:
 On 10/14/03 11:31 PM, James Rogers [EMAIL PROTECTED] wrote:
  
  There is some abstraction in Postgres and the database is well-written, but
  it isn't written in a manner that makes it easy to swap out operating system
  or API models.  It is written to be portable at all levels.  A database
  kernel isn't necessarily required to be portable at the very lowest level,
  but it is vastly more optimizable because you aren't forced into a narrow
  set of choices for interfacing with the operating system.
 
 
 Just to clarify, my post wasn't really to say that we should run out and
 make Postgres use a database kernel type internal model tomorrow.  The point
 of all that was that Oracle does things that way for a very good reason and
 that there can be benefits that may not be immediately obvious.

OTOH, what may be a perfectly good reason for Oracle may not be one for
PostgreSQL.

For me the beauty of open-source software has always been the possibility to fix
problems at the right level (kernel, library, language), and not to
just make workarounds at another level (your application).

So getting some APIs into the kernel for optimizing cache usage or
writeback strategies would be much better than using raw writes and
rewriting the whole thing ourselves.

The newer Linux kernels have several I/O schedulers to choose from; why not
push for choice in other areas as well?
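One existing example of the kind of kernel interface described here is POSIX `posix_fadvise()`, which lets an application tell the kernel how it intends to use a file instead of bypassing the page cache entirely. A minimal sketch follows; the helper names are hypothetical, the kernel is free to ignore the advice, and the remark about dirty pages describes Linux behaviour.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Advise the kernel that the whole file will be read sequentially, so it
 * can read ahead more aggressively. posix_fadvise() returns 0 or an error
 * number; it does not set errno. offset 0, len 0 means "the whole file". */
static int hint_sequential_scan(int fd) {
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* After a bulk write, suggest dropping the file's pages so they do not push
 * hotter data out of the OS cache. On Linux, pages that are still dirty or
 * under writeback may not be dropped, so flush them first.
 * Returns 0 or an error number, matching posix_fadvise(). */
static int hint_done_with(int fd) {
    if (fdatasync(fd) != 0) return errno;
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

Hints like these leave the caching policy in the kernel while giving the database a say in it, which is the middle ground between the current buffered I/O and a full O_DIRECT rewrite.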

The ultimate database kernel could thus be a custom-tuned Linux kernel
;)

 It is really one of those emergent needs when a database engine gets to a
 certain level of sophistication.  For smaller and simpler databases, you
 don't really need it and the effort isn't justified.  At some point, you
 cross a threshold where not only does it become justified but it becomes a
 wise idea, or not having it will start to punish you in a number of different
 ways.  I personally think that Postgres is sitting on the cusp of "it's a
 wise idea", and that it is something worth thinking about in the future.

This thread reminds me of the Linus/Tanenbaum monolithic vs. microkernel
argument: while microkernels are theoretically better, Linux could
outperform them by having the required modularity at the source level, and
for an open-source project this was enough. It also beat the Mach
kernel simply by being there, whereas the microkernel-based Mach was too hard
to develop and debug and thus took much longer to mature.

--
Hannu




Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Sailesh Krishnamurthy
 Tom == Tom Lane [EMAIL PROTECTED] writes:

Tom I tend to agree with the opinion that Oracle's architecture
Tom is based on twenty-year-old assumptions.  Back then it was
Tom reasonable to assume that database-specific algorithms could
Tom outperform a general-purpose operating system.  In today's
Tom environment that assumption is not a given.


In fact: 

   Michael Stonebraker: Operating System Support for Database Management. 
   CACM 24(7): 412-418 (1981)

   Abstract: 

 Several operating system services are examined with a
 view toward their applicability to support of database
 management functions. These services include buffer pool
 management; the file system; scheduling, process
 management, and interprocess communication; and
 consistency control.

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh





Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-15 Thread Manfred Spraul
Andrew Dunstan wrote:

I have wondered (somewhat fruitlessly) for several years about the 
possibilities of special purpose lightweight file systems that could 
relax some of the assumptions and checks used in general purpose file 
systems. Such a thing might provide most of the benefits of a 
database kernel without imposing anything extra on the database 
application layer.
CPU is usually cheap compared to disk I/O.

There are two things that might be worth looking into:
Oracle released their cluster filesystem (ocfs) as a GPL driver for 
Linux. It might be interesting to check how it performs if used for 
postgres, but I fear that it implicitly assumes that the bulk of the
caching is performed by the database in user space.
And using O_DIRECT for the WAL logs - the logs are never read.
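If the WAL were written with O_DIRECT as suggested here, every write would have to satisfy the alignment rules, so a short record must be padded out to a whole block. A minimal sketch, assuming an 8192-byte WAL page that is also a multiple of the device's logical block size; `wal_padded_len` and `wal_write_padded` are hypothetical helpers. In real use the fd would be opened with `O_DIRECT | O_DSYNC` (on Linux) so that each `pwrite()` returns only once the block is on stable storage.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>        /* O_DSYNC (and O_DIRECT on Linux) for the caller's open() */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define WAL_BLOCK 8192   /* assumed WAL page size, a power of two */

/* Round len up to the next multiple of WAL_BLOCK. */
static size_t wal_padded_len(size_t len) {
    return (len + WAL_BLOCK - 1) & ~((size_t)WAL_BLOCK - 1);
}

/* Copy a record into a WAL_BLOCK-aligned, zero-padded buffer and write it
 * at block number blkno, so address, length, and offset are all aligned
 * as O_DIRECT requires. Returns the number of bytes written, or -1. */
static ssize_t wal_write_padded(int fd, off_t blkno, const void *rec, size_t len) {
    size_t padded = wal_padded_len(len);
    void *buf = NULL;
    if (posix_memalign(&buf, WAL_BLOCK, padded) != 0) return -1;
    memset(buf, 0, padded);
    memcpy(buf, rec, len);
    ssize_t n = pwrite(fd, buf, padded, blkno * WAL_BLOCK);
    free(buf);
    return n;
}
```

A real log would keep filling the same partially used page and rewrite it on each flush instead of spending a whole block per record; the sketch ignores that to keep the alignment logic visible.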

--
   Manfred




Re: [HACKERS] Database Kernels and O_DIRECT

2003-10-14 Thread Greg Stark

James Rogers [EMAIL PROTECTED] writes:
 
  Someone from Oracle is on there explaining what Oracle's needs are. Perhaps
  someone more knowledgeable than myself could explain what would most help
  postgres in this area.
 
 
 There is an important difference between Oracle and Postgres that makes
 discussions of this complicated because the assumptions are different.

All the more reason Postgres's view of the world should maybe be represented
there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and
seems more interested in building a better kernel interface to control caching
and i/o scheduling. Something that fits better with postgres's design than
Oracle's.

 the former case, it is very useful and conducive to better performance
 to have O_DIRECT and direct control of the I/O in general -- the more,
 the better.  In the latter case (e.g. Postgres), it is more of a
 nuisance and difficult to exploit well.

Actually I think it would be useful for the WAL. As I understand it there's no
point caching the WAL and every write is going to get synced anyways so
there's no point in buffering it either. The sooner the process can find out
it's been synced the better. But I'm not really 100% up on the way the WAL is
used so I could be wrong.
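The WAL pattern described above (write the record, force it to disk, let the waiting backend know as soon as possible) can be sketched as an append that returns only once `fdatasync()` succeeds. This is a hypothetical helper, not PostgreSQL's actual WAL code, and it assumes the log file descriptor was opened with `O_APPEND`; opening with `O_DSYNC` instead would make each `write()` synchronous without the separate call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical WAL append: write one record at the end of the log and return
 * 0 only after fdatasync() reports that the data, plus the metadata needed
 * to read it back, is on stable storage. A commit could be acknowledged to
 * the client once this returns 0. */
static int wal_append(int fd, const void *rec, size_t len) {
    const char *p = rec;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                             /* handle short writes */
        len -= (size_t)n;
    }
    return fdatasync(fd);
}
```

Since every append ends in a synchronous flush, pages written this way gain little from sitting in the OS cache, which is why the WAL is the natural first candidate for O_DIRECT.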

 The point of having a database kernel underneath the DBMS is two-fold.  
 
 First, it improves portability by acting as an operating system
 abstraction layer, replacing OS kernel services with its own equivalents

Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It
has to reimplement whatever I/O scheduling or other strategies it wants.
Rather than being the escape from the lowest common denominator it is in
fact precisely the cause of it.

You describe Postgres as if abstraction is a foreign concept to it. Much
better to have well designed minimal abstractions for each of the resources
needed, rather than trying to turn every OS you meet into the first one you
met.


-- 
greg

