On Tue, Oct 04, 2005 at 11:06:54PM -0400, Ron Peacetree wrote:
> Unfortunately, no matter what I say or do, I'm not going to please
> or convince anyone who has already have made their minds up
> to the extent that they post comments like Mr Trainor's below.
> His response style pretty much proves my earlier point that this
> is presently a religious issue within the pg community.
Religious for some. Conservative for others.
Sometimes people need to see the way before they are willing to
accept it merely on the say-so of another person. In some circles,
that is called the scientific method... :-)
Also, there is a cost to complicated, domain-specific optimizations.
They can be a real maintenance and portability headache. What is the
value ratio of performance to maintenance and portability?
> The absolute best proof would be to build a version of pg that does
> what Oracle and DB2 have done and implement it's own DB
> specific memory manager and then compare the performance
> between the two versions on the same HW, OS, and schema.
Not necessarily. Even if a version of PostgreSQL were written to
function in this new model, there would be no guarantee that it was
written in the most efficient manner possible. A benchmark could show
PostgreSQL using its own caching and disk space management
implementation and still performing poorly. The only true and accurate
way would be to implement it, and then have those most competent
invest the time to test and optimize the implementation. At that
point, it would become a moving target, as those who believe otherwise
would be free to pursue more efficient file systems, or modifications
to the operating system to better co-operate with PostgreSQL.
I don't think there can be a true answer to this one. The more
innovative and clever people will always be able to make their
solution work better. If the difference in performance were really so
obvious, there wouldn't be doubters on either side. It would be clear
to all. The fact is, there is reason to doubt. Perhaps not doubt that
the final solution would be more efficient, but rather doubt that the
difference in efficiency would be significant.
> The second best proof would be to set up either DB2 or Oracle so
> that they _don't_ use their memory managers and compare their
> performance to a set up that _does_ use said memory managers
> on the same HW, OS, and schema.
Same as above. If Oracle was designed to work with this functionality,
then disabling it wouldn't prove that an efficient design would
perform equally poorly, or even poorly at all. I think it would be
obvious that Oracle would have invested most of their dollars into the
common execution paths, with the expected functionality enabled.
> I don't currently have the resources for either experiment.
This is the real problem. :-)
> Some might even argue that IBM (where Codd and Date worked)
> and Oracle just _might_ have had justification for the huge effort
> they put into developing such infrastructure.
Or not. They might just have more money to throw at the problem,
and be entrenched in their solution to the point that they need
to innovate to ensure that their solution appears to be the best.
> Then there's the large library of research on caching strategies
> in just about every HW and SW domain, including DB theory,
> that points out that the more context dependent, ie application
> or domain specific awareness, caching strategies are the better
> they are.
A lot of this is theory. It may be good theory, but there is no
guarantee that the variables listed in these theories match, or
properly estimate, the issues that would be found in a real
implementation.
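That said, the general claim in the caching literature is easy to illustrate with a toy simulation (everything here is hypothetical: the workload, the cache sizes, and the simplistic "scan-resistant" policy are made up for illustration, not taken from PostgreSQL or any real database). A cache that knows a sequential scan is a one-off can refuse to let it flush the hot working set, which a generic LRU cannot do:

```python
import random
from collections import OrderedDict

def lru_hits(trace, capacity):
    """Generic LRU: every touched page becomes most-recently-used."""
    cache = OrderedDict()
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least-recently-used
            cache[page] = True
    return hits

def scan_resistant_hits(trace, capacity, is_scan_page):
    """Domain-aware policy: pages known to belong to a one-off
    sequential scan are never cached, so they cannot evict the
    hot working set."""
    cache = OrderedDict()
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        elif not is_scan_page(page):
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[page] = True
    return hits

# Hypothetical workload: a small hot set accessed repeatedly,
# interrupted by one large sequential scan.
random.seed(0)
hot = list(range(100))           # hot working set: pages 0-99
scan = list(range(1000, 2000))   # one-off scan: pages 1000-1999
trace = ([random.choice(hot) for _ in range(500)]
         + scan
         + [random.choice(hot) for _ in range(500)])

cap = 128
print("LRU hits:           ", lru_hits(trace, cap))
print("scan-resistant hits:", scan_resistant_hits(trace, cap,
                                                  lambda p: p >= 1000))
```

Under this workload the LRU cache is flushed by the scan and must re-warm afterwards, while the scan-aware policy keeps serving the hot set. Whether that kind of gap shows up at a significant magnitude in a real system is exactly the open question in this thread.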
> Maybe after we do all we can about physical IO and sorting
> performance I'll take on the religious fanatics on this one.
> One problem set at a time.
In any case, I'm on your side - in theory. Intuitively, I don't
understand how anybody could claim that a general solution could ever
be faster than a specific solution. Anybody who claimed this would
go down in my books as a fool. It should be obvious to these people
that, as an extreme, the entire operating system caching layer and
the entire file system layer could be inlined into PostgreSQL,
avoiding many of the expenses involved in switching back and forth
between user space and system space, leaving a more efficient,
although significantly more complicated, solution.
Whether by luck or by the experience of those involved, I haven't seen
any senior PostgreSQL developers actually state that it couldn't be
faster. Instead, I've seen it claimed that the PostgreSQL developers
don't have the resources to attack this problem, as there are other
far more pressing features, product defects, and more obviously
beneficial optimization opportunities to work on. Memory management,
or disk management, is "good enough" as provided by decent operating
systems, and the itch just isn't bad enough to scratch yet. They
remain unconvinced that the gain in performance would be worth the
cost of maintaining this extra complexity in the code base.
If you believe the case can be made, it is up to you to make it.
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...