Re: [PERFORM] Left Outer Join much faster than non-outer Join?

Ron Mayer Wed, 30 Mar 2005 23:09:01 -0800

Tom Lane wrote:

[EMAIL PROTECTED] writes:

select *
    from streetname_lookup as sl
    join city_lookup as cl on (true)
    left outer join tlid_smaller as ts on (sl.geo_streetname_id = 
ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
    where  str_name='alamo' and  city='san antonio' and state='TX'
;

That's a fairly odd query;


I think it's a very common type of query in data warehousing.

It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.

why don't you have any join condition between
streetname_lookup and city_lookup?


Those two tables shared no data.   They merely get the "id"s
for looking things up in the much larger central table.

Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.


With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?

  streetname_lookup
  (for every street name used in the country)
      streetid  |  name  | type
      ----------+--------+------
      1         |  Main  | St
      2         |  1st   | St

  city_lookup
  (for every city name used in the country)
      cityid  |  name   | state
      --------+---------+------
      1       |  Boston | MA
      2       |  Alameda| CA


  tlid_smaller
  (containing a record for every city block in the country)
   city_id | street_id  | addresses | demographics, etc.
   --------+------------+-----------+----------------------
        1  |    1       | 100 block | [lots of columns]
        1  |    1       | 200 block | [lots of columns]
        1  |    1       | 300 block | [lots of columns]
        1  |    2       | 100 block | [lots of columns]
        1  |    2       | 100 block | [lots of columns]

The planner won't consider Cartesian joins unless forced to, which is
why it fails to consider the join order "((sl join cl) join ts)" unless
you have an outer join in the mix.  I think that's generally a good
heuristic, and am disinclined to remove it ...


IMHO it's a shame it doesn't even consider it when the estimated
results are very small.   I think often joins that merely look up
IDs would be useful to consider for the purpose of making potential
multi-column indexes (as shown in the previous email's explain
analyze result where the cartesian join was 30X faster than the
other approach since it could use the multi-column index on the
very large table).

Ron

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Left Outer Join much faster than non-outer Join?

Reply via email to