Re: [HACKERS] Socket communication for contrib

2004-04-16 Thread Andrew Sullivan
On Wed, Apr 07, 2004 at 09:39:15AM -0400, Jan Wieck wrote:

 the reason why dblink, dbmirror and the rserv prototype are under 
 contrib/ while projects like erserver and slony1 live on gborg - just to 
 take this class of projects as an example.

Since the rserv prototype actually does not work any more (at least
according to the reports I've seen), it should probably be removed
anyway.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Socket communication for contrib

2004-04-16 Thread Andrew Sullivan
On Mon, Apr 05, 2004 at 06:33:51PM +0200, Hans-Jürgen Schönig wrote:
 
 I have learned (please correct me if I am wrong) that people tend to 
 look in contrib before they look at gborg.

This may be true, but if so, perhaps it's a reason to add a
contrib/gborg directory with just a README that says "For lots of
additional software which is designed to be part of your PostgreSQL
installation, go to http://gborg.postgresql.org."

A


-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



Re: [HACKERS] Socket communication for contrib

2004-04-16 Thread Steve Atkins
On Fri, Apr 16, 2004 at 08:10:20AM -0400, Andrew Sullivan wrote:
 On Mon, Apr 05, 2004 at 06:33:51PM +0200, Hans-Jürgen Schönig wrote:
  
  I have learned (please correct me if I am wrong) that people tend to 
  look in contrib before they look at gborg.
 
 This may be true, but if so, perhaps it's a reason to add a
 contrib/gborg directory with just a README that says "For lots of
 additional software which is designed to be part of your PostgreSQL
 installation, go to http://gborg.postgresql.org."

Or even a text dump of http://gborg.postgresql.org/project/projdisplaylist.php
(which would be more useful if all gborg projects had useful descriptions -
 but that would directly benefit users of gborg too)

Cheers,
  Steve



Re: [HACKERS] Socket communication for contrib

2004-04-16 Thread Josh Berkus
Hans, Andrew,

  I have learned (please correct me if I am wrong) that people tend to 
  look in contrib before they look at gborg.

pgFoundry/project.postgresql.org will be up by next week, I promise.
Working on it now.

This should increase the visibility of non-core components.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




[HACKERS] [Fwd: DBT3-pgsql large performance improvement 2.6.6-rc1]

2004-04-16 Thread Mary Edie Meredith
FYI for those of you tracking Linux development
kernel performance on PostgreSQL.  

-Forwarded Message-
From: Mary Edie Meredith [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: DBT3-pgsql large performance improvement 2.6.6-rc1
Date: Fri, 16 Apr 2004 09:51:47 -0700

Performance in DBT-3 (using PostgreSQL) has vastly
improved for both the power portion (single
process/query) and the throughput portion of
the test (when the test is running multiple processes)
on our 4-way (4GB) and 8-way (8GB) STP systems as
compared to 2.6.5 kernel results.

Using the default DBT-3 options (i.e. using LVM, ext2,
PostgreSQL version 7.4.1):

Note: Bigger numbers are better.

Kernel     Runid   CPUs  Power   %incP  Thruput  %incT
2.6.5      291308  4      97.08  base   120.46   base
2.6.6-rc1  291876  4     146.11  50.5%  222.94   85.1%

Kernel     Runid   CPUs  Power   %incP  Thruput  %incT
2.6.5      291346  8     101.08  base   138.95   base
2.6.6-rc1  291915  8     151.69  50.1%  273.69   97.0%

So the improvement is between 50% and 97%!

Profile 2.6.5 8way throughput phase:
http://khack.osdl.org/stp/291346/profile/after_throughput_test_1-tick.sort
Profile 2.6.6-r1 8way throughput phase:
http://khack.osdl.org/stp/291915/profile/after_throughput_test_1-tick.sort

What I notice is that radix_tree_lookup is in 
the top 20 in the 2.6.5 profile, but not in 
2.6.6-rc1.  Could the radix tree changes be 
responsible for this?

DBT-3 is a read-mostly DSS workload, and the throughput
phase is where we run multiple query streams (as
many as we have CPUs).  In this workload, the database 
is stored on a file system, but it is small relative 
to the amount of memory (4GB and 8GB).  It almost 
completely caches in page cache early on.   So there 
is some physical IO in the first few minutes, but very 
little to none in the remainder. 


 
-- 
Mary Edie Meredith 
[EMAIL PROTECTED]
503-626-2455 x42
Open Source Development Labs




[HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread David Blasby
I just tried an experiment that I thought would improve the performance 
of the PostGIS spatial index.

The old way used a BOX (bounding box of 4 doubles) as the key in the index.

The new way uses a bounding box of 4 single-precision floats (it's only 
16 bytes long - a BOX is 32).  I thought that it would significantly 
reduce the size of the index (it did) and that the indexing would be 
faster since there are fewer disk pages to fiddle through and these pages 
would be more likely to fit in the disk cache.

It turns out this is not the case - it's significantly slower.  I think 
it has to do with more keys fitting in the leaf tuples.

As far as I can tell, the GiST index looks through the internal nodes, 
then when it hits a leaf node, it searches through each of the keys in the 
leaf.

I save time searching through the internal nodes (because there are fewer 
of them) - this is an O(log n) saving.

I lose time searching through the leaves (because there are more keys in a 
leaf) - this is an O(n) cost.

For a query that does a nested loop of 10,000 index scans (on the same 
table), the old method takes about 2 seconds; the new faster method 
takes about 20 seconds.

I'm still trying to verify that this is the actual reason why it's slower 
- anyone have any other ideas?  Anyone know a good way to profile this?

dave



Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread David Blasby
Couple of observations:

 1. Are you sure your data is handled as 32 bit all the way through?  Run
 time casting will offset performance gains on 32 bit floats.  Is your
 comparison routine casting to double?

I thought this might be the case - but I thought the effect would be small.
The only place it might be doing hidden casts would be in statements like
query->xmin >= key->xmax.


 2. Math CPUs usually crunch at 80 bits, can't save much by using 32 bit
 floats, although cache coherency will be better.

 3. 64 bit cpu will probably run better on 64 bit floats.

 4. Is your dataset congested enough that you now have duplicate values
 by loss of precision?   This would of course impact performance.  How
 big is your dataset?  How big is your avg. result set?

My test datasets are quite spatially separate, so I don't expect there to 
be any accidental overlaps.  There are about 12 million rows (in total) 
in the datasets - I've only noticed about 2 of these overlaps.  My test 
datasets are 1000 rows, 10,000 rows, 10,000,000 rows, and a few 
different ones in the 200,000 row range.

I'm testing queries that return anywhere from 1 geometry to about 10,000.

The actual index search is only a few milliseconds longer using the 
float32 bounding box for a mid-sized table returning a handful of rows.
When the result sets get bigger (or you start doing nested queries), 
the performance differences become greater.

dave



Re: [HACKERS] 7.5 beta version]

2004-04-16 Thread Gaetano Mendola
Jürgen Cappel wrote:

Point 1 I completely agree on: byte order, alignment, padding, etc.
is different for each platform and data cannot directly be exchanged.
Point 2: who really needs C++?
We do: a multi-path TCP router written in C++, with a PostgreSQL
database behind it...
Regards
Gaetano Mendola




Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread Tom Lane
David Blasby [EMAIL PROTECTED] writes:
 The old way used a BOX (bounding box of 4 doubles) as the key in the index.
 The new way uses a bounding box of 4 single-precision floats (it's only 
 16 bytes long - a BOX is 32).

 It turns out this is not the case - it's significantly slower.  I think 
 it has to do with more keys fitting in the leaf tuples.

Hm.  With twice as many keys per page, you'd think that it could be no
more than 2x slower, if the problem is one of O(n) search through the
keys on the page.  That doesn't explain a 10x slowdown.  Could it be
that some part of the GIST code is O(n^2), or worse, in the number of
keys per page?  If so, perhaps it's fixable.

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.

regards, tom lane

PS: actually, allowing for the 12-byte index tuple overhead, you
couldn't have even twice as many keys per page.  So there's something
mighty odd here.  Keep us posted.



Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread David Blasby
Tom Lane wrote:

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.
I've been trying to use gprof to do some profiling, but I'm having 
troubles.  What's the best way to profile?


PS: actually, allowing for the 12-byte index tuple overhead, you
couldn't have even twice as many keys per page.  So there's something
mighty odd here.  Keep us posted.
Using the old system, I'd get about 7 internal node hits and about 110 
leaf hits.  Under the new one, I get about 4 internal node hits and 
about 160 leaf hits.

I'm just in the process of changing the key to:

typedef struct
{
float xmin;
float ymin;
float xmax;
float ymax;
char  junk[16]; // make the 16 byte type into 32!
} BOX2DFLOAT4;
to see what happens.

dave



Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread Tom Lane
David Blasby [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'd suggest profiling the backend with both key types to get an idea of
 where the time is going.

 I've been trying to use gprof to do some profiling, but I'm having 
 troubles.  What's the best way to profile?

It's not hard; the only real gotcha is that on Linux systems you need to
compile with -DLINUX_PROFILE so that the postmaster works around some
Linux brain damage with dropping the profile status at fork().
I usually do (in an already configured and built source tree)

cd src/backend
make clean
make PROFILE="-pg -DLINUX_PROFILE" all
make install-bin

Don't forget that the backends will drop their gmon.out files in
$PGDATA/data/DBOID/gmon.out.

regards, tom lane



Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread David Blasby
Humm -- strange results here:

typedef struct
{
float xmin;
float ymin;
float xmax;
float ymax;
} BOX2DFLOAT4;
This takes about 18,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
float xmin;
float ymin;
float xmax;
float ymax;
char  junk[16];
} BOX2DFLOAT4;
This takes about 15,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
double xmin;
double ymin;
double xmax;
double ymax;
} BOX2DFLOAT4;
This takes about 1500 ms to do a nested query with 10,000 iterations.
Yes - that's almost 14 seconds faster!
This doesn't make a lot of sense, since the only part of the GiST index 
that's being called that actually looks at the bounding boxes is this:

 retval = (((key->xmax >= query->xmax) &&
            (key->xmin <= query->xmax)) ||
           ((query->xmax >= key->xmax) &&
            (query->xmin <= key->xmax)))
          &&
          (((key->ymax >= query->ymax) &&
            (key->ymin <= query->ymax)) ||
           ((query->ymax >= key->ymax) &&
            (query->ymin <= key->ymax)));
dave



Re: [HACKERS] PostgreSQL configuration

2004-04-16 Thread Mark Kirkwood
Joe Conway wrote:

Tom Lane wrote:

Personally I rely quite a lot on setting PGDATA to keep straight which
installation I'm currently working with, so I'm not going to be happy
with a redesign that eliminates that variable without providing an
adequate substitute :-(


I'll second that.


I'll third (or whatever) it too :-)



Re: [HACKERS] GiST -- making my index faster makes it slower

2004-04-16 Thread Tom Lane
David Blasby [EMAIL PROTECTED] writes:
 Humm -- strange results here:
 typedef struct
 {
   float xmin;
   float ymin;
   float xmax;
   float ymax;
 } BOX2DFLOAT4;

 This takes about 18,000 ms to do a nested query with 10,000 iterations.

 typedef struct
 {
   float xmin;
   float ymin;
   float xmax;
   float ymax;
   char  junk[16];
 } BOX2DFLOAT4;

 This takes about 15,000 ms to do a nested query with 10,000 iterations.

That is strange --- I'm still suspecting an O(n^2) bit of logic
somewhere.  But the size of the discrepancy is a little bit more
reasonable.  Could you profile these two cases and see where the extra
time goes?

 typedef struct
 {
   double xmin;
   double ymin;
   double xmax;
   double ymax;
 } BOX2DFLOAT4;

 This takes about 1500 ms to do a nested query with 10,000 iterations.
 Yes - that's almost 14 seconds faster!

AFAICS there are only two possible explanations:

1. float-vs-float comparison is a lot slower than double-vs-double on
your hardware.  Actually, the comparisons might be float-vs-double
(is the query struct the same as the key??).  The compiler could
well be promoting one or both floats to double and then doing double
comparison, but even so, that's a heck of a large overhead.

2. The float bounding boxes are presumably slightly inaccurate.  If they
are a tad too large then perhaps you are visiting more leaf pages than
you need to.  (A worrisome possibility is that a float box could be
slightly too small, causing you to fail to visit an entry you should
:-()

regards, tom lane



Re: [HACKERS] PostgreSQL configuration

2004-04-16 Thread Kevin Brown
Tom Lane wrote:
 Kevin Brown [EMAIL PROTECTED] writes:
  The goal here is simply to make it obvious to a system administrator where
  the PG data directory that a given postmaster is using resides.
 
 Why would it not be sufficient to add a read-only GUC variable that
 tells that?  Connect to the postmaster and do "show datadir" and you're
 done.  (Without this, it's not clear you've made any particular gain
 anyway, since a given postmaster would typically mean the one I can
 connect to at this port, no?)

That would probably be sufficient for most cases.  It wouldn't take care
of the case where there's a strict separation of powers between the
system administrator and the DBA, but that only matters if the system is
managed badly (i.e., the SA and the DBA don't talk to each other very
well).  That's probably something we shouldn't concern ourselves with.

 In any case I don't see how removing PGDATA would make this more
 obvious.  You yourself just pointed out that the command-line arguments
 of a postmaster aren't necessarily visible through ps; if they're not,
 what have you gained in transparency by forbidding PGDATA?

I think you misunderstood what I was saying (which means I didn't say it
right).

There are ways within a program to change what 'ps' shows as the
command line.  We use those methods to make it possible to see what
a given backend is doing by looking at the 'ps' output.  It would be
possible to have the postmaster use those ways in order to show which data
directory it is using even if it wasn't specified on the command line.
But in my experience, those methods don't work reliably on all systems.
On systems where they don't work, what 'ps' shows is the
original command line that was used.  So clearly, the only way 'ps'
will show the data directory in that case is if it was actually
specified on the command line.

  In any case, I'm not at all opposed to having the backend stuff know
  about PGDATA during development, but for production you should have to
  explicitly specify the data directory on the command line.
 
 If you wish to do things that way, you can; but that doesn't mean that
 everyone else should have to do it that way too.  If there were a
 security or reliability hazard involved, I might agree with taking the
 fascist approach, but I see no such hazard here ...

Fair enough.  The PGDATA issue isn't a big enough one that I'm terribly
concerned about it, especially if a read-only GUC variable is available
to give that information (something that, I think, should be there
anyway).


-- 
Kevin Brown   [EMAIL PROTECTED]



Re: [HACKERS] Remove MySQL Tools from Source?

2004-04-16 Thread scott.marlowe
On Fri, 16 Apr 2004, Christopher Kings-Lynne wrote:

  I always ran one of the 2 scripts (can't remember which one) and after that 
  started checking the dump file, because there were things that didn't get 
  changed correctly[1].
  
  [1]: I always remember the first conversion I did. I found out that MySQL 
  accepted dates like 30/2/2000 or 0-0-.
  Very odd.
 
 Yes, MySQL has always accepted those as perfectly valid dates.  It's 
 quite broken.

The sad thing is that while MySQL implemented a --ansi switch that 
supposedly turns on ANSI compliance, it only fixes about half of all the 
non-compliance issues.  Yet another half-implemented feature... :)




Re: [HACKERS] [GENERAL] Accessing RelOptInfo structure from the executor module

2004-04-16 Thread Tom Lane
Shalu Gupta [EMAIL PROTECTED] writes:
  I want to access the RelOptInfo data structure from the executor module.

You can't, because it doesn't exist any more.  RelOptInfo is a planner
internal data structure, and is probably pfree'd before the executor starts.

Perhaps you could explain your problem at a higher level?

regards, tom lane
