Re: [PERFORM] Performance problem with semi-large tables

PFC Sat, 29 Jan 2005 14:12:09 -0800

select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
   and ss.release_code_id = r.id
   and r.filtered_column = '5'
order by ss.date desc
limit 100;

Release code is just a very small table of 8 rows by looking at the production data, hence the 0.125 filter ratio. However, the data distribution is not normal since the filtered column actually pulls out about 54% of the rows in shipment_status when it joins. Postgres seems to be doing a sequencial scan to pull out all of these rows. Next, it joins approx 17550 rows to shipment. Since this query has a limit, it only returns the first 100, which seems like a waste.

Well, postgres does what you asked. It will be slow, because you have a full table join. LIMIT does not change this because the rows have to be sorted first.

The date is in shipment_status so you should first get the shipment_status.id that you need and later join to shipment. This will avoid the big join :


SELECT s.*, ss.* FROM
        (SELECT * FROM shipment_status WHERE release_code_id IN
                (SELECT r.id FROM release_code WHERE r.filtered_column = '5')
        ORDER BY date DESC LIMIT 100
        ) as ss, shipment s
WHERE s.current_status_id = ss.id
ORDER BY date DESC LIMIT 100

        Is this better ?


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to [EMAIL PROTECTED] so that your
     message can get through to the mailing list cleanly

Re: [PERFORM] Performance problem with semi-large tables

Reply via email to