> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of Milosz Krajewski
> Sent: Monday, July 29, 2002 16:50
> To: [EMAIL PROTECTED]
> Subject: [SQL] How to optimize SQL query ?
>
>
> How to optimize query or just force postgre to do it my way ?
>
> Example:
>   table continets (
>     id numeric, ..., active numeric
>   );
>
>   table countries (
>     id numeric, id_continent numeric, ..., active numeric
>   );
>
>   table cities (
>     id numeric, id_country numeric, ..., active numeric
>   );
>
> relations:
>   cities.id_county are in countries.id
>   countries.id_continent are on continents.id
>
> Query:
>   table temp_cities (
>     id_city numeric;
>   );
>
>
> temp_cities is temp table which holds few (~20) id of cities, to
> show them.
>
> so:
>   select * from cities
>     where cities.id in (select id_city from temp_cities);

AVOID: indexes (which you should have defined on primary keys [implicitely
defined by PostgreSQL] and foreign keys [must be defined explicitely]) are
not used

> or:
>   select * from cities, temp_cities tmp
>     where cities.id = tmp.id_city;

BETTER ;-)

> works fine.
>
> But the problem starts here:
>
>   select * from cities, coutries, continets
>     where
>       (cities.id in (select id_city from temp_cities)) and
>       (cities.id_county = countries.id) and
>       (countries.id_continent = continents.id) and
>       (cities.active = 1) and (coutries.active = 1) and
>       (continents.active = 1)
>
> (active means is row active or archive, many of them are active,
> but I have to check it)
>
> Posgre is planning it like this:
>   joins cities with coutries
>   joins countries with continents
>   selects active
>   filtering with cities.id (with temp_cities)
>
> If I could force it to filter cities.id first
> (I can do this with Oracle by changing
> "select id_city from temp_cities" to
> "select id_city from temp_cities group by id_city")
> it will work much (1000x) faster.
>
> Can I force postgre do it my way ?

Use the explicit JOIN syntax and join each table one after another in the
order you feel is the more adequate for your query. PostgreSQL will respect
this order.

>From one I understand, you should write it this way:
SELECT
        *
FROM
        continents
INNER JOIN
        countries
        ON ( continents.id = country.id_continent )
INNER JOIN
        cities
        ON ( countries.id = cities.id_country )
INNER JOIN
        temp_cities
        ON ( cities.id = temp_cities.id )
WHERE
        ( continents.active = 1 )
        AND ( countries.active = 1 )
        AND ( cities.active = 1 )

The reason to do so are:
1. Joining first on the tables that contain the less rows contributes to
keep the cartesian product between the joins as low as possible
2. Thus if a continent - respectively country - is not active, it will be
casted out from the join immediately and thus reduce the cartesian product
for the next join(s)
3. Joining on 'temp-cities' allows the usage of the hopefully defined index

I achieved ratio from 10000 to 1 respecting this strategy on a scenario
fairly closed to yours ;-)

Ce.D

> --
> [ Milosz "Krashan" Krajewski ][ mail: [EMAIL PROTECTED], UIN: 1319535 ]
> [ inet: Vilge, Vilgefortz    ][ www: http://www.ds2.pg.gda.pl/~krash ]
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
>



---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Reply via email to