On 8/16/06, Peter Hardman <[EMAIL PROTECTED]> wrote:
I'm in the process of migrating a Paradox 7/BDE 5.01 database from single-user
Paradox to a web based interface to either MySQL or PostgreSQL.
The database is a pedigree sheep breed society database recording sheep and
flocks (amongst other things).

My current problem is with one table and an associated query which takes 10
times longer to execute on PostgreSQL than BDE, which in turn takes 10 times
longer than MySQL. The table links sheep to flocks and is created as follows:

CREATE TABLE SHEEP_FLOCK
(
  regn_no varchar(7) NOT NULL,
  flock_no varchar(6) NOT NULL,
  transfer_date date NOT NULL,
  last_changed date NOT NULL,
  CONSTRAINT SHEEP_FLOCK_pkey PRIMARY KEY (regn_no, flock_no,
transfer_date)
)
WITHOUT OIDS;
ALTER TABLE SHEEP_FLOCK OWNER TO postgres;

I then populate the table with

COPY SHEEP_FLOCK
FROM 'e:/ssbg/devt/devt/export_data/sheep_flock.txt'
WITH CSV HEADER

The table then has about 82000 records

The query I run is:

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN
    /* The last transfer date for each sheep */
    (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
    FROM  SHEEP_FLOCK f
    GROUP BY f.regn_no) f2
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

The sub-select on it's own returns about 32000 rows.

Using identically structured tables and the same primary key, if I run this on
Paradox/BDE it takes about 120ms, on MySQL (5.0.24, local server) about 3ms,
and on PostgresSQL (8.1.3, local server) about 1290ms). All on the same
Windows XP Pro machine with 512MB ram of which nearly half is free.

The query plan shows most of the time is spent sorting the 30000+ rows from the 
subquery, so I added a further
subquery as follows:

/* Select all sheep who's most recent transfer was into the subject flock */
SELECT DISTINCT f1.regn_no, f1.transfer_date as date_in
FROM SHEEP_FLOCK f1 JOIN
    /* The last transfer date for each sheep */
    (SELECT f.regn_no, MAX(f.transfer_date) as last_xfer_date
    FROM  SHEEP_FLOCK f
    WHERE f.regn_no IN
        /* Limit the rows extracted by the outer sub-query to those relevant to 
the
subject flock */
        /* This typically reduces the time from 1297ms to 47ms - from 35000 rows
to 127 rows */
        (SELECT s.regn_no FROM SHEEP_FLOCK s where s.flock_no = '1359')
    GROUP BY f.regn_no) f2
ON f1.regn_no = f2.regn_no
WHERE f1.flock_no = '1359'
AND f1.transfer_date = f2.last_xfer_date

then as the comment suggests I get a considerable improvement, but it's still an
order of magnitude slower than MySQL.

Can anyone suggest why PostgreSQL performs the original query so much slower 
than even BDE?

ANALYZE?

Regards,

Rodrigo

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Reply via email to