Ken Egervari wrote:
First, what version of postgres, and have you run VACUUM ANALYZE recently? Also, please attach the result of running EXPLAIN ANALYZE. (eg, explain analyze select s.* from shipment ...)
I'm using postgres 8.0. I wish I could paste explain analyze, but I won't be at work for a few days. I was hoping some Postgres/SQL experts here would be able to simply look at the query and make recommendations because it's not a very difficult or unique query.
That's the problem. Without explain analyze, it's hard to say why it is performing weird, because it *does* look like a straightforward query.
It's very possible that you don't have up-to-date statistics, which causes postgres to make a bad estimate of what the fastest plan is.
I run VACUUM ANALYZE religiously. I even dumped the production database and used it as my test database after a full vacuum analyze. It's really as fresh as it can be.
Good. Again, this is just the first precaution, as not everyone is as careful as you. And without the explain analyze, you can't tell what the planner estimates are.
I don't know how to make Hibernate do what you want, but if you change the query to using subselects (not all databases support this, so hibernate might not let you), you can see a performance improvement.
Yes, Hibernate supports sub-selects. In fact, I can even drop down to JDBC explicitly, so whatever SQL tricks out there I can use will work on Hibernate. In what way will sub-selects improve this query?
When doing massive joins across multiple tables (as you are doing) it is frequently faster to do a couple of small joins where you only need a couple of rows as input to the rest. Something like:
select * from shipment s where s.carrier_code_id in (select cc.id from carrier_code cc join carrier c on cc.carrier_id = c.id) and s.current_status_id in (select cs.id from shipment_status cs where ...)
Again it's something that you can try. I have found quite a few of my queries performed much better with subselects. I'm guessing it's because with big queries it has a harder time figuring out how to refactor (the decision tree becomes big). But I'm not really sure. I just know it can work.
Also sometimes using explicit joins can be worse than just letting the query manager figure it out. So something like select s.* from shipment s, carrier_code cc, carrier c, ... where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....
I think I can avoid using joins in Hibernate, but it makes the query harder to maintain. How much of a performance benefit are we talking with this change? Since hibernate is an object language, you don't actually have to specify many joins. You can use the "dot" notation.
I'm not saying this *will* improve performance. It is just something to try. It very easily could not be worth the overhead.
Query query = session.createQuery( "select shipment " + "from Shipment shipment " + " inner join shipment.cargoControlNumber.carrierCode.carrier.persons person " + " inner join shipment.currentStatus currentStatus " + " inner join currentStatus.releaseCode releaseCode " + " left join fetch shipment.currentStatus " + "where " + " person.id = :personId and " + " shipment.isPurged = false and " + " releaseCode.number = '9' and " + " currentStatus is not null and " + " currentStatus.date >= current_date - 31 " + "order by currentStatus.date desc" );
query.setParameter( "personId", personId );
query.setFirstResult( firstResult ); query.setMaxResults( maxResults );
As you can see, it's fairly elegant language and maps to SQL quite well.
But again, since this is generated from another program (Hibernate), I really don't know how you tell it how to tune the SQL. Probably the biggest "non-bug" performance improvements are from tuning the SQL.
I agree, but the ones I've tried aren't good enough. I have made these indexes that apply to this query as well as others in my from looking at my SQL scripts. Many of my queries have really sped up to 14 milliseconds from these indexes. But I can't make this query run any faster.
CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person USING btree (person_id); CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person USING btree (carrier_id); CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree (carrier_id); CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree (carrier_code_id); CREATE INDEX current_status_date_idx ON shipment_status USING btree (date); CREATE INDEX shipment_current_status_id_idx ON shipment USING btree (current_status_id); CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING btree (shipment_id);
Thanks for your responses everyone. I'll try and get you that explain analyze. I'm just not at work at the moment but this is a problem that I'm simply puzzled and worried about. I'm getting all of this from CVS on my work server.
There is also the possibility that you are having problems with cross-column correlation, or poor distribution of a column. Postgres doesn't keep cross-column statistics, so if 2 columns are correlated, then it mis-estimates selectivity, and might pick the wrong plan.
In general your query looks decent, we just need to figure out what is going on.
Description: OpenPGP digital signature