Hi everyone.
I'm new to this forum and was wondering if anyone
would be kind enough to help me out with a pretty severe performance
issue. I believe the problem to be rather generic, so I'll put it in
generic terms. Since I'm at home and not at work (but this is really
bugging me), I can't post any specifics. However, I think my explanation
will suffice.
I have 2 tables that are getting large and
will only get larger with time (exponentially, as more users sign on to the
system). Right now, a table called 'shipment' contains about 16,000
rows and 'shipment_status' contains about 32,500 rows. These aren't
massive tables (I keep reading about tables with millions of rows), but they will
definitely get into 6 digits by next year, and query performance is already quite
poor.
Now, from what I understand about tuning, you
want to specify good filters and provide good indexes on the driving filter as well
as on any referential keys used while joining. This has helped me
solve performance problems many times in the past (for example, cutting a query
from 2 seconds to 21 milliseconds).
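To make that concrete, here is the kind of indexing I mean (the table and column names are taken from the made-up example query below, so treat them as assumptions rather than my real schema):

```sql
-- Hypothetical indexes for the example schema:
-- one on the driving filter column...
create index idx_release_code_filtered on release_code (filtered_column);
-- ...and one on each referential key used while joining
create index idx_shipment_status_release on shipment_status (release_code_id);
create index idx_shipment_current_status on shipment (current_status_id);
```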
However, I am now tuning queries that operate on
these two tables, and the filters aren't very good (the best has a filter ratio of
0.125), while the number of rows returned is very large (not taking limits into
consideration).
For example, consider something like this query, which takes ~1 second to finish:

select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
  and ss.release_code_id = r.id
  and r.filtered_column = '5'
order by ss.date desc
limit 100;
Release code is just a very small table of 8 rows,
judging by the production data, hence the 0.125 filter ratio. However,
the data is not evenly distributed: the filtered column actually pulls out
about 54% of the rows in shipment_status when it joins. Postgres seems to
be doing a sequential scan to pull out all of these rows. Next, it joins
approximately 17,550 rows to shipment. Since this query has a limit, it only
returns the first 100, which seems like a waste.
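I have wondered whether an index matching the order-by might let the planner read shipment_status in date order and stop once the limit is satisfied, instead of joining everything first. Something like this sketch (same assumed column names; whether the planner would actually choose such a plan here, I don't know):

```sql
-- Hypothetical: a btree on the sort column can be scanned backwards
-- to satisfy "order by ss.date desc ... limit 100" incrementally
create index idx_shipment_status_date on shipment_status (date);
```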
Now, for this query, I know I can filter on the
date instead to speed it up. For example, I could probably search for all
the shipments in the last 3 days instead of limiting it to 100. But since
this isn't a real production query, I only wanted to show it as an example;
many times I cannot filter by the date (and the sort may be by date or
something else irrelevant).
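For completeness, the date-filtered variant I have in mind would look something like this (again made up, with the same assumed column names):

```sql
select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
  and ss.release_code_id = r.id
  and r.filtered_column = '5'
  and ss.date > current_date - 3   -- only the last 3 days
order by ss.date desc;
```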
I'm just stressed about how I can make queries like
this more efficient, since all I see is a bunch of hash joins and sequential
scans taking all kinds of time.
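In case it matters, I'm reading these plans from output along these lines (same example query):

```sql
explain analyze
select s.*, ss.*
from shipment s, shipment_status ss, release_code r
where s.current_status_id = ss.id
  and ss.release_code_id = r.id
  and r.filtered_column = '5'
order by ss.date desc
limit 100;
```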
I guess here are my 2 questions:
1. Should I just beg to change the
requirements so that I can make more specific queries, and build more screens to
access those?
2. Can you recommend ways to make Postgres act on
big tables more efficiently? I'm not really interested in this specific
case (I just made it up). I'm more interested in general solutions to this
general problem of big table sizes with bad filters, where join orders don't
seem to help much.
Thank you very much for your help.
Best Regards,
Ken Egervari
Subject: [PERFORM] Performance problem with semi-large tables (Ken Egervari)