Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 17 Aug 2006 at 10:00, Mario Weilguni wrote: not really sure if this is right without any testdata, but isn't that what you want? CREATE index foo on sheep_flock (flock_no); SELECT DISTINCT on (f1.transfer_date) f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 WHERE f1.flock_no = '1359' order by f1.transfer_date desc; best regards, mario weilguni Mario, Thanks for the suggestion, but this query produces the wrong answer - but then I provided no data, nor properly explained what the data would be. Each sheep will have multiple records, starting with one for when it's first registered, then one for each flock it's in (eg sold into) then one for when it dies and goes to the 'big flock in the sky'. So first I need to find the most recent record for each sheep and then select the sheep who's most recent record matches the flock in question. Your query finds all the sheep that have been in the flock in question, then selects the first one from each set of records with the same date. So it collects data on dead sheep, and only selects one sheep if several were bought or registered on the same day. Forgive me for being verbose - I want to make sure I understand it propely myself! regards, -- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 16 Aug 2006 at 18:51, Tom Lane wrote: Peter Hardman [EMAIL PROTECTED] writes: I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user snip Arjen van der Meijden has proposed a very elegant query in another post. What I find interesting though is that it sounds like both MSSQL and Paradox know something we don't about how to optimize it. PG doesn't have any idea how to do the above query without forming the full output of the sub-select, but I suspect that the commercial DBs know a shortcut; perhaps they are able to automatically derive a restriction in the subquery similar to what you did by hand. Does Paradox have anything comparable to EXPLAIN that would give a hint about the query plan they are using? Sadly, no. In fact the ability to use SQL from Paradox at all is not well known and not very visible in the the documentation. I wonder whether Paradox and MySQL are just not doing the sort (this seems to be what eats up the time), since the output of the subquery is in fact already in the proper order. Also, just as in the other thread, I'm thinking that a seqscan+hash aggregate would be a better idea than this bit: - GroupAggregate (cost=0.00..3924.91 rows=33676 width=13) (actual time=0.324..473.131 rows=38815 loops=1) - Index Scan using sheep_flock_pkey on sheep_flock f (cost=0.00..3094.95 rows=81802 width=13) (actual time=0.295..232.156) Possibly you need to raise work_mem to get it to consider the hash aggregation method. BTW, are you *sure* you are testing PG 8.1? The Subquery Scan f2 plan node looks unnecessary to me, and I'd have expected 8.1 to drop it out. 8.0 and before would have left it in the plan though. This doesn't make all that much difference performance-wise in itself, but it does make me wonder what you are testing. Yes, the executables all say version 8.1.3.6044 Regards,-- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 17 Aug 2006 at 12:11, Markus Schaber wrote: Hi, Peter, Peter Hardman wrote: BTW, are you *sure* you are testing PG 8.1? The Subquery Scan f2 plan node looks unnecessary to me, and I'd have expected 8.1 to drop it out. 8.0 and before would have left it in the plan though. This doesn't make all that much difference performance-wise in itself, but it does make me wonder what you are testing. Yes, the executables all say version 8.1.3.6044 Would you mind to look at the output of select version();, too? I ask this because I stumbled over it myself, that I had installed the correct postgresql and psql versions, but accidentally connected to a different database installation due to strange environment and script settings... select version() returns PostgreSQL 8.1.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special) Cheers,-- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 16 Aug 2006 at 17:48, Peter Hardman wrote: I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user Paradox to a web based interface to either MySQL or PostgreSQL. snip I've uploaded my data to www.shetland-sheep.org.uk/pgdata/sheep-flock.zip The flock SSBXXX is the 'big flock in the sky' and thus there should never be any date for a sheep greater than this. Yes, the primary key is regn_no + flock_no + transfer_date. Thanks again for all the help and advice. Regards,-- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 17 Aug 2006 at 14:33, Tom Lane wrote: I wrote: Anywy, your point about the sort being redundant is a good one, and offhand I'd have expected PG to catch that; I'll have to look into why it didn't. But that's not going to explain a 10x speed difference, because the sort isn't 90% of the runtime. I dug into this using some made-up test data, and was able to reproduce the plan you got after changing the order of the pkey index columns to (regn_no, transfer_date, flock_no) ... are you sure you quoted that accurately before? Yes. Maybe the data I've uploaded to www.shetland- sheep.org.uk/pgdata/sheep_flock.zip will help reproduce the plan. snip I found a couple of minor planner problems, which I've repaired in CVS HEAD. You might consider using TEXT columns instead of VARCHAR(n), because the only bug that actually seemed to change the chosen plan involved the planner getting confused by the difference between varchar_var and varchar_var::text (which is what gets generated for sorting purposes because varchar doesn't have a separate sort operator). As someone else suggested, these fields ought really to be CHAR no VARCHAR. I chose VARCHAR because the data mostly is shorter than the maximum lengths (although probably not enough to matter). I'd not really got into the subtleties of different behaviour of CHAR and VARCHAR. snip Regards,-- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 17 Aug 2006 at 20:58, Peter Hardman wrote: On 16 Aug 2006 at 17:48, Peter Hardman wrote: I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user Paradox to a web based interface to either MySQL or PostgreSQL. snip I've uploaded my data to www.shetland-sheep.org.uk/pgdata/sheep-flock.zip Sorry - that should be www.shetland-sheep.org.uk/pgdata/sheep_flock.zip The flock SSBXXX is the 'big flock in the sky' and thus there should never be any date for a sheep greater than this. Yes, the primary key is regn_no + flock_no + transfer_date. Thanks again for all the help and advice. Regards,-- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep ==
[PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user Paradox to a web based interface to either MySQL or PostgreSQL. The database is a pedigree sheep breed society database recording sheep and flocks (amongst other things). My current problem is with one table and an associated query which takes 10 times longer to execute on PostgreSQL than BDE, which in turn takes 10 times longer than MySQL. The table links sheep to flocks and is created as follows: CREATE TABLE SHEEP_FLOCK ( regn_no varchar(7) NOT NULL, flock_no varchar(6) NOT NULL, transfer_date date NOT NULL, last_changed date NOT NULL, CONSTRAINT SHEEP_FLOCK_pkey PRIMARY KEY (regn_no, flock_no, transfer_date) ) WITHOUT OIDS; ALTER TABLE SHEEP_FLOCK OWNER TO postgres; I then populate the table with COPY SHEEP_FLOCK FROM 'e:/ssbg/devt/devt/export_data/sheep_flock.txt' WITH CSV HEADER The table then has about 82000 records The query I run is: /* Select all sheep who's most recent transfer was into the subject flock */ SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 JOIN /* The last transfer date for each sheep */ (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date FROM SHEEP_FLOCK f GROUP BY f.regn_no) f2 ON f1.regn_no = f2.regn_no WHERE f1.flock_no = '1359' AND f1.transfer_date = f2.last_xfer_date The sub-select on it's own returns about 32000 rows. Using identically structured tables and the same primary key, if I run this on Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms, and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same Windows XP Pro machine with 512MB ram of which nearly half is free. The query plan shows most of the time is spent sorting the 3+ rows from the subquery, so I added a further subquery as follows: /* Select all sheep who's most recent transfer was into the subject flock */ SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 JOIN /* The last transfer date for each sheep */ (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date FROM SHEEP_FLOCK f WHERE f.regn_no IN /* Limit the rows extracted by the outer sub-query to those relevant to the subject flock */ /* This typically reduces the time from 1297ms to 47ms - from 35000 rows to 127 rows */ (SELECT s.regn_no FROM SHEEP_FLOCK s where s.flock_no = '1359') GROUP BY f.regn_no) f2 ON f1.regn_no = f2.regn_no WHERE f1.flock_no = '1359' AND f1.transfer_date = f2.last_xfer_date then as the comment suggests I get a considerable improvement, but it's still an order of magnitude slower than MySQL. Can anyone suggest why PostgreSQL performs the original query so much slower than even BDE? -- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL
On 16 Aug 2006 at 20:02, Arjen van der Meijden wrote: On 16-8-2006 18:48, Peter Hardman wrote: Using identically structured tables and the same primary key, if I run this on Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms, and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same Windows XP Pro machine with 512MB ram of which nearly half is free. Is that with or without query caching? I.e. can you test it with SELECT SQL_NO_CACHE ... ? In a read-only environment it will still beat PostgreSQL, but as soon as you'd get a read-write environment, MySQL's query cache is of less use. So you should compare both the cached and non-cached version, if applicable. It seems to make no difference - not surprising really as I'm just running the query from the command line interface. Besides that, most advices on this list are impossible without the result of 'explain analyze', so you should probably get that as well. Here is the output of EXPLAIN ANALYZE for the slow query: Unique (cost=7201.65..8487.81 rows=1 width=13) (actual time=1649.733..1811.684 rows=32 loops=1) - Merge Join (cost=7201.65..8487.80 rows=1 width=13) (actual time=1649.726..1811.528 rows=32 loops=1) Merge Cond: (((outer.regn_no)::text = inner.?column3?) AND (outer.transfer_date = inner.last_xfer_date)) - Index Scan using sheep_flock_pkey on sheep_flock f1 (cost=0.00..1033.19 rows=77 width=13) (actual time=15.357..64.237 rows=127 loops=1) Index Cond: ((flock_no)::text = '1359'::text) - Sort (cost=7201.65..7285.84 rows=33676 width=15) (actual time=1580.198..1653.502 rows=38277 loops=1) Sort Key: (f2.regn_no)::text, f2.last_xfer_date - Subquery Scan f2 (cost=0.00..4261.67 rows=33676 width=15) (actual time=0.331..598.246 rows=38815 loops=1) - GroupAggregate (cost=0.00..3924.91 rows=33676 width=13) (actual time=0.324..473.131 rows=38815 loops=1) - Index Scan using sheep_flock_pkey on sheep_flock f (cost=0.00..3094.95 rows=81802 width=13) (actual time=0.295..232.156 rows=81802 loops=1) Total runtime: 1812.737 ms I'm not sure whether this is the same query, but you might want to try: SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in FROM SHEEP_FLOCK f1 WHERE f1.flock_no = '1359' AND f1.transfer_date = (SELECT MAX(f.transfer_date) FROM SHEEP_FLOCK f WHERE regn_no = f1.regn_no) That's neat - I didn't know you could make a reference from a subselect to the outer select. Your query has the same performance as my very complex one on both MySQL and PostgreSQL. However I'm not entirely sure about the times for MySQL - every interface gives a different answer so I'll have to try them from a script so I know whats going on. Interestingly BDE takes 7 seconds to run your query. Just as well I didn't start from there... And you might need an index on (regn_no, transfer_date) and/or one combined with that flock_no. Explain says it only uses the primary key, so it seems there' no need for a separate index Thanks for the help -- Peter Hardman Acre Cottage, Horsebridge King's Somborne Stockbridge SO20 6PT == Breeder of Shetland Cattle and Shetland Sheep == ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq