Ken Egervari wrote:

I took John's advice and tried to work with sub-selects. I tried this variation, which actually seems like it would make a difference conceptually since it drives on the person table quickly. But to my surprise, the query runs at about 375 milliseconds. I think it's because it's going over that shipment table multiple times, which is where the results are coming from.


I also made a version that runs over shipment a single time, but it's exactly 250 milliseconds. I guess the planner does the exact same thing.

Why are you now left joining driver and carrier code, but inner joining shipment_status? I assume this is the *real* query that you are executing.

From the earlier explain analyze, and your statements, the initial person p should be the heavily selective portion.

And what does "driver" get you? It isn't in the return, and it isn't part of a selectivity clause.
You are also double joining against carrier code, once as a left outer join, and once in the inner join.

This query doesn't seem quite right. Are you sure it is generating the rows you are expecting?

select s.*, ss.*

from shipment s
inner join shipment_status ss on s.current_status_id=ss.id
inner join release_code rc on ss.release_code_id=rc.id
left outer join driver d on s.driver_id=d.id
left outer join carrier_code cc on s.carrier_code_id=cc.id
where s.carrier_code_id in (
 select cc.id
 from person p
  inner join carrier_to_person ctp on p.id=ctp.person_id
  inner join carrier c on ctp.carrier_id=c.id
  inner join carrier_code cc on cc.carrier_id = c.id
 where p.id = 355
)
and s.current_status_id is not null
and s.is_purged=false
and(rc.number='9' )
and(ss.date>=current_date-31 )

order by ss.date desc

You might want to post the explain analyze of this query to have a point of reference, but what about something like this:
select s.*, ss.*

from shipment_status ss on s.current_status_id=ss.id
join (select s.* from shipment s
      where s.carrier_code_id in
           (select cc.id
              from person p
             inner join carrier_to_person ctp on p.id=ctp.person_id
             inner join carrier c on ctp.carrier_id=c.id
             inner join carrier_code cc on cc.carrier_id = c.id
            where p.id = 355
           )
        and s.current_status_id is not null
        and s.is_purged=false
) as i -- Just a name for the subselect since it is in a join
inner join release_code rc on ss.release_code_id=rc.id
where (rc.number='9' )
and(ss.date between current_date-31 and current_date())

order by ss.date desc

My idea with this query is to minimize the number of shipment rows that need to be generated before joining with the other rows. My syntax is probably a little bit off, since I can't actually run it against real tables.
But looking at your *original* query, you were getting 15000 rows out of shipment_status, and then 27700 rows out of shipment, which was then being merge-joined down to only 300 rows, and then hash-joined down to 39.

I'm just trying to think of ways to prevent it from blossoming into 27k rows to start with.

Please double check your query, because it seems to be grabbing unnecessary rows with the left joins, and then post another explain analyze with one (or several) different subselect forms.

John
=:->


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to