Re: [PERFORM] Help with tuning this query

John Arbash Meinel Wed, 02 Mar 2005 09:41:12 -0800

Ken Egervari wrote:

First, what version of postgres, and have you run VACUUM ANALYZE
recently?
Also, please attach the result of running EXPLAIN ANALYZE.
(eg, explain analyze select s.* from shipment ...)

I'm using postgres 8.0.  I wish I could paste explain analyze, but I
won't be at work for a few days.  I was hoping some Postgres/SQL
experts here would be able to simply look at the query and make
recommendations because it's not a very difficult or unique query.

That's the problem. Without explain analyze, it's hard to say why it is
performing weird, because it *does* look like a straightforward query.

It's very possible that you don't have up-to-date statistics, which
causes postgres to make a bad estimate of what the fastest plan is.

I run VACUUM ANALYZE religiously.  I even dumped the production
database and used it as my test database after a full vacuum analyze.
It's really as fresh as it can be.

Good. Again, this is just the first precaution, as not everyone is as
careful as you. And without the explain analyze, you can't tell what the
planner estimates are.

I don't know how to make Hibernate do what you want, but if you change
the query to using subselects (not all databases support this, so
hibernate might not let you), you can see a performance improvement.

Yes, Hibernate supports sub-selects.  In fact, I can even drop down to
JDBC explicitly, so whatever SQL tricks out there I can use will work
on Hibernate.  In what way will sub-selects improve this query?

When doing massive joins across multiple tables (as you are doing) it is
frequently faster to do a couple of small joins where you only need a
couple of rows as input to the rest. Something like:

select * from shipment s
where s.carrier_code_id in
       (select cc.id from carrier_code cc join carrier c on
cc.carrier_id = c.id)
and s.current_status_id in (select cs.id from shipment_status cs where ...)

Again it's something that you can try. I have found quite a few of my
queries performed much better with subselects.
I'm guessing it's because with big queries it has a harder time figuring
out how to refactor (the decision tree becomes big). But I'm not really
sure. I just know it can work.

Also sometimes using explicit joins can be worse than just letting the
query manager figure it out. So something like
select s.* from shipment s, carrier_code cc, carrier c, ...
   where s.carrier_code_id = cc.id and c.id = cc.carrier_id and ....

I think I can avoid using joins in Hibernate, but it makes the query
harder to maintain.  How much of a performance benefit are we talking
with this change?  Since hibernate is an object language, you don't
actually have to specify many joins.  You can use the "dot" notation.

I'm not saying this *will* improve performance. It is just something to
try. It very easily could not be worth the overhead.

    Query query = session.createQuery(
        "select shipment " +
        "from Shipment shipment " +
        "   inner join
shipment.cargoControlNumber.carrierCode.carrier.persons person " +
        "   inner join shipment.currentStatus currentStatus " +
        "   inner join currentStatus.releaseCode releaseCode " +
        "   left join fetch shipment.currentStatus " +
        "where " +
        "   person.id = :personId and " +
        "   shipment.isPurged = false and " +
        "   releaseCode.number = '9' and " +
        "   currentStatus is not null and " +
        "   currentStatus.date >= current_date - 31 " +
        "order by currentStatus.date desc"
    );

    query.setParameter( "personId", personId );

    query.setFirstResult( firstResult );
    query.setMaxResults( maxResults );

    return query.list();

As you can see, it's fairly elegant language and maps to SQL quite well.

But again, since this is generated from another program (Hibernate), I
really don't know how you tell it how to tune the SQL. Probably the
biggest "non-bug" performance improvements are from tuning the SQL.

I agree, but the ones I've tried aren't good enough.  I have made
these indexes that apply to this query as well as others in my from
looking at my SQL scripts.  Many of my queries have really sped up to
14 milliseconds from these indexes.  But I can't make this query run
any faster.

CREATE INDEX carrier_to_person_person_id_idx ON carrier_to_person
USING btree (person_id);
CREATE INDEX carrier_to_person_carrier_id_idx ON carrier_to_person
USING btree (carrier_id);
CREATE INDEX carrier_code_carrier_id_idx ON carrier_code USING btree
(carrier_id);
CREATE INDEX shipment_carrier_code_id_idx ON shipment USING btree
(carrier_code_id);
CREATE INDEX current_status_date_idx ON shipment_status USING btree
(date);
CREATE INDEX shipment_current_status_id_idx ON shipment USING btree
(current_status_id);
CREATE INDEX shipment_status_shipment_id_idx ON shipment_status USING
btree (shipment_id);

Thanks for your responses everyone.  I'll try and get you that explain
analyze.  I'm just not at work at the moment but this is a problem
that I'm simply puzzled and worried about.  I'm getting all of this
from CVS on my work server.

Ken


There is also the possibility that you are having problems with
cross-column correlation, or poor distribution of a column. Postgres
doesn't keep cross-column statistics, so if 2 columns are correlated,
then it mis-estimates selectivity, and might pick the wrong plan.

In general your query looks decent, we just need to figure out what is
going on.

John
=:->

signature.asc
Description: OpenPGP digital signature

Re: [PERFORM] Help with tuning this query

Reply via email to