Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman

On 17 Aug 2006 at 10:00, Mario Weilguni wrote:

 not really sure if this is right without any testdata, but isn't that what 
 you 
 want?
 
 CREATE index foo on sheep_flock (flock_no);
 
 SELECT DISTINCT on (f1.transfer_date) f1.regn_no, f1.transfer_date as date_in
 FROM SHEEP_FLOCK f1
 WHERE f1.flock_no = '1359'
 order by f1.transfer_date desc;
 
 best regards, 
 mario weilguni
 
 
Mario, Thanks for the suggestion, but this query produces the wrong answer - 
but 
then I provided no data, nor properly explained what the data would be.
Each sheep will have multiple records, starting with one for when it's first 
registered, then one for each flock it's in (eg sold into) then one for when it 
dies 
and goes to the 'big flock in the sky'.

 So first I need to find the most recent record for each sheep and then select 
the 
sheep who's most recent record matches the flock in question.

Your query finds all the sheep that have been in the flock in question, then 
selects 
the first one from each set of records with the same date. So it collects data 
on 
dead sheep, and only selects one sheep if several were bought or registered on 
the same day.

Forgive me for being verbose - I want to make sure I understand it propely 
myself!

regards, 
 -- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman
On 16 Aug 2006 at 18:51, Tom Lane wrote:

 Peter Hardman [EMAIL PROTECTED] writes:
  I'm in the process of migrating a Paradox 7/BDE 5.01 database from 
  single-user 
snip

Arjen van der Meijden has proposed a very elegant query in another post. 

 What I find interesting though is that it sounds like both MSSQL and
 Paradox know something we don't about how to optimize it.  PG doesn't
 have any idea how to do the above query without forming the full output
 of the sub-select, but I suspect that the commercial DBs know a
 shortcut; perhaps they are able to automatically derive a restriction
 in the subquery similar to what you did by hand.  Does Paradox have
 anything comparable to EXPLAIN that would give a hint about the query
 plan they are using?

Sadly, no. In fact the ability to use SQL from Paradox at all is not well known 
and 
not very visible in the the documentation. 

I wonder whether Paradox and MySQL are just not doing the sort (this seems to 
be what eats up the time), since the output of the subquery is in fact already 
in the 
proper order.

 
 Also, just as in the other thread, I'm thinking that a seqscan+hash
 aggregate would be a better idea than this bit:
 
 -  GroupAggregate  (cost=0.00..3924.91 rows=33676 
  width=13) (actual time=0.324..473.131 rows=38815 loops=1)
   -  Index Scan using sheep_flock_pkey on 
  sheep_flock f (cost=0.00..3094.95 rows=81802 width=13) (actual 
  time=0.295..232.156)
 
 Possibly you need to raise work_mem to get it to consider the hash
 aggregation method.
 
 BTW, are you *sure* you are testing PG 8.1?  The Subquery Scan f2 plan
 node looks unnecessary to me, and I'd have expected 8.1 to drop it out.
 8.0 and before would have left it in the plan though.  This doesn't make
 all that much difference performance-wise in itself, but it does make me
 wonder what you are testing.

Yes, the executables all say version 8.1.3.6044
 
Regards,-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman
On 17 Aug 2006 at 12:11, Markus Schaber wrote:

 Hi, Peter,
 
 Peter Hardman wrote:
 
  BTW, are you *sure* you are testing PG 8.1?  The Subquery Scan f2 plan
  node looks unnecessary to me, and I'd have expected 8.1 to drop it out.
  8.0 and before would have left it in the plan though.  This doesn't make
  all that much difference performance-wise in itself, but it does make me
  wonder what you are testing.
  
  Yes, the executables all say version 8.1.3.6044
 
 Would you mind to look at the output of select version();, too?
 
 I ask this because I stumbled over it myself, that I had installed the
 correct postgresql and psql versions, but accidentally connected to a
 different database installation due to strange environment and script
 settings...
select version() returns

PostgreSQL 8.1.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 
(mingw-special)

Cheers,-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman


On 16 Aug 2006 at 17:48, Peter Hardman wrote:

 I'm in the process of migrating a Paradox 7/BDE 5.01 database from 
 single-user 
 Paradox to a web based interface to either MySQL or PostgreSQL.
snip 

I've uploaded my data to www.shetland-sheep.org.uk/pgdata/sheep-flock.zip

The flock SSBXXX is the 'big flock in the sky' and thus there should never be 
any 
date for a sheep greater than this. 

Yes, the primary key is regn_no + flock_no + transfer_date.

Thanks again for all the help and advice.

Regards,-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman
On 17 Aug 2006 at 14:33, Tom Lane wrote:

 I wrote:
  Anywy, your point about the sort being redundant is a good one, and
  offhand I'd have expected PG to catch that; I'll have to look into
  why it didn't.  But that's not going to explain a 10x speed
  difference, because the sort isn't 90% of the runtime.
 
 I dug into this using some made-up test data, and was able to reproduce
 the plan you got after changing the order of the pkey index columns
 to (regn_no, transfer_date, flock_no) ... are you sure you quoted that
 accurately before?

Yes. Maybe the data I've uploaded to www.shetland-
sheep.org.uk/pgdata/sheep_flock.zip will help reproduce the plan.

snip 
 I found a couple of minor planner problems, which I've repaired in CVS
 HEAD.  You might consider using TEXT columns instead of VARCHAR(n),
 because the only bug that actually seemed to change the chosen plan
 involved the planner getting confused by the difference between
 varchar_var and varchar_var::text (which is what gets generated for
 sorting purposes because varchar doesn't have a separate sort operator).

As someone else suggested, these fields ought really to be CHAR no VARCHAR. 
I chose VARCHAR because the data mostly is shorter than the maximum lengths 
(although probably not enough to matter). I'd not really got into the 
subtleties of 
different behaviour of CHAR and VARCHAR.
 
snip 

Regards,-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Peter Hardman



On 17 Aug 2006 at 20:58, Peter Hardman wrote:


 
 
 On 16 Aug 2006 at 17:48, Peter Hardman wrote:
 
  I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user 
  Paradox to a web based interface to either MySQL or PostgreSQL.
 snip 
 
 I've uploaded my data to www.shetland-sheep.org.uk/pgdata/sheep-flock.zip


Sorry - that should be www.shetland-sheep.org.uk/pgdata/sheep_flock.zip
 
 The flock SSBXXX is the 'big flock in the sky' and thus there should never be any 
 date for a sheep greater than this. 
 
 Yes, the primary key is regn_no + flock_no + transfer_date.
 
 Thanks again for all the help and advice.
 
 Regards,-- 
 Peter Hardman
 Acre Cottage, Horsebridge
 King's Somborne
 Stockbridge
 SO20 6PT
 
 == Breeder of Shetland Cattle and Shetland Sheep ==
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 


-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT


== Breeder of Shetland Cattle and Shetland Sheep ==





[PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-16 Thread Peter Hardman
I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user 
Paradox to a web based interface to either MySQL or PostgreSQL.
The database is a pedigree sheep breed society database recording sheep and 
flocks (amongst other things).

My current problem is with one table and an associated query which takes 10 
times longer to execute on PostgreSQL than BDE, which in turn takes 10 times 
longer than MySQL. The table links sheep to flocks and is created as follows:

CREATE TABLE SHEEP_FLOCK
(
  regn_no varchar(7) NOT NULL,
  flock_no varchar(6) NOT NULL,
  transfer_date date NOT NULL,
  last_changed date NOT NULL,
  CONSTRAINT SHEEP_FLOCK_pkey PRIMARY KEY (regn_no, flock_no, 
transfer_date)
) 
WITHOUT OIDS;
ALTER TABLE SHEEP_FLOCK OWNER TO postgres;

I then populate the table with 

COPY SHEEP_FLOCK
FROM 'e:/ssbg/devt/devt/export_data/sheep_flock.txt'
WITH CSV HEADER

The table then has about 82000 records

The query I run is:

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN 
/* The last transfer date for each sheep */
(SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
FROM  SHEEP_FLOCK f
GROUP BY f.regn_no) f2 
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

The sub-select on it's own returns about 32000 rows.

Using identically structured tables and the same primary key, if I run this on 
Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms, 
and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same 
Windows XP Pro machine with 512MB ram of which nearly half is free.  

The query plan shows most of the time is spent sorting the 3+ rows from the 
subquery, so I added a further
subquery as follows: 

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN 
/* The last transfer date for each sheep */
(SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
FROM  SHEEP_FLOCK f
WHERE f.regn_no IN 
/* Limit the rows extracted by the outer sub-query to those relevant to 
the 
subject flock */
/* This typically reduces the time from 1297ms to 47ms - from 35000 
rows 
to 127 rows */
(SELECT s.regn_no FROM SHEEP_FLOCK s where s.flock_no = '1359')
GROUP BY f.regn_no) f2 
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

then as the comment suggests I get a considerable improvement, but it's still 
an 
order of magnitude slower than MySQL.

Can anyone suggest why PostgreSQL performs the original query so much slower 
than even BDE?
 -- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-16 Thread Peter Hardman
On 16 Aug 2006 at 20:02, Arjen van der Meijden wrote:

 On 16-8-2006 18:48, Peter Hardman wrote:
  Using identically structured tables and the same primary key, if I run this 
  on 
  Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 
  3ms, 
  and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same 
  Windows XP Pro machine with 512MB ram of which nearly half is free.  
 
 Is that with or without query caching? I.e. can you test it with SELECT 
 SQL_NO_CACHE ... ?
 In a read-only environment it will still beat PostgreSQL, but as soon as 
 you'd get a read-write environment, MySQL's query cache is of less use. 
 So you should compare both the cached and non-cached version, if applicable.
It seems to make no difference - not surprising really as I'm just running the 
query 
from the command line interface.
 
 Besides that, most advices on this list are impossible without the 
 result of 'explain analyze', so you should probably get that as well.
Here is the output of EXPLAIN ANALYZE for the slow query:

Unique  (cost=7201.65..8487.81 rows=1 width=13) (actual 
time=1649.733..1811.684 rows=32 loops=1)
  -  Merge Join  (cost=7201.65..8487.80 rows=1 width=13) (actual 
time=1649.726..1811.528 rows=32 loops=1)
Merge Cond: (((outer.regn_no)::text = inner.?column3?) AND 
(outer.transfer_date = inner.last_xfer_date))
-  Index Scan using sheep_flock_pkey on sheep_flock f1  
(cost=0.00..1033.19 rows=77 width=13) (actual time=15.357..64.237 rows=127 
loops=1)
  Index Cond: ((flock_no)::text = '1359'::text)
-  Sort  (cost=7201.65..7285.84 rows=33676 width=15) (actual 
time=1580.198..1653.502 rows=38277 loops=1)
  Sort Key: (f2.regn_no)::text, f2.last_xfer_date
  -  Subquery Scan f2  (cost=0.00..4261.67 rows=33676 width=15) 
(actual 
time=0.331..598.246 rows=38815 loops=1)
-  GroupAggregate  (cost=0.00..3924.91 rows=33676 
width=13) 
(actual time=0.324..473.131 rows=38815 loops=1)
  -  Index Scan using sheep_flock_pkey on sheep_flock 
f  
(cost=0.00..3094.95 rows=81802 width=13) (actual time=0.295..232.156 
rows=81802 loops=1)
Total runtime: 1812.737 ms


 
 I'm not sure whether this is the same query, but you might want to try:
 SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
 FROM SHEEP_FLOCK f1
 WHERE
 f1.flock_no = '1359'
 AND f1.transfer_date = (SELECT MAX(f.transfer_date) FROM SHEEP_FLOCK f 
 WHERE regn_no = f1.regn_no)
 
That's neat - I didn't know you could make a reference from a subselect to the 
outer select. Your query has the same performance as my very complex one on 
both MySQL and PostgreSQL. However I'm not entirely sure about the times for 
MySQL - every interface gives a different answer so I'll have to try them from 
a 
script so I know whats going on.
Interestingly BDE takes 7 seconds to run your query. Just as well I didn't 
start 
from there... 
 And you might need an index on (regn_no, transfer_date) and/or one 
 combined with that flock_no.
Explain says it only uses the primary key, so it seems there' no need for a 
separate index

Thanks for the help
-- 
Peter Hardman
Acre Cottage, Horsebridge
King's Somborne
Stockbridge
SO20 6PT

== Breeder of Shetland Cattle and Shetland Sheep ==


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq