Re: [SQL] Self-referencing table question

Sean Davis Thu, 24 Mar 2005 09:35:31 -0800

Thanks. I thought about that a bit and it seems like it is highly likely to be expensive for a single query (though I should probably try it at some point). If I do find myself reformatting results after response to user input (i.e., reusing the query), though, then your solution is likely to be very useful.

Sean

On Mar 24, 2005, at 11:13 AM, Edmund Bacon wrote:

Sometimes using a temp table is a better idea:
e.g.
-- start by creating a temp table 'tids' that hold the to_ids that
-- we are interested in.
 SELECT to_id
   INTO TEMP TABLE tids
   FROM correlation
  WHERE from_id  = 1234
  ORDER BY val DESC limit 100;
-- The following temp table makes use of the primary key on
-- the correlation table, and the stated goal from the original
-- question that:
--         from_id > to_id
--     and from_id in (tids.to_id)
--    and to_id in (tids.to_id)
 SELECT t1.to_id AS from_id, t2.to_id
   INTO TEMP TABLE from_to
   FROM tids t1, tids t2
  WHERE t1.to_id > t2.to_id;
-- Now we can use the from_to table as an index into the correlation
-- table.
 SELECT c.from_id, c.to_id, c.val
   FROM from_to
   JOIN correlation c USING(from_id, to_id)
  WHERE val > 0.5;
The explain analyze for the final select works out to: Nested Loop (cost=0.00..50692.00 rows=8488 width=16) (actual time=0.171..150.095 rows=2427 loops=1) -> Seq Scan on from_to (cost=0.00..79.38 rows=5238 width=8) (actual time=0.006..7.660 rows=4950 loops=1) -> Index Scan using correlation_pkey on correlation c (cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 loops=4950) Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id = c.to_id)) Filter: (val > 0.5::double precision) Total runtime: 152.261 ms
Richard Huxton wrote:
Sean Davis wrote:
I answer my own question, if only for my own records. The following query is about 5-6 times faster than the original. Of course, if anyone else has other ideas, I'd be happy to hear them.
Sean
explain analyze select from_id,to_id,val from exprsdb.correlation where from_id in (select to_id from exprsdb.correlation where from_id=2424 order by val desc limit 100) and to_id in (select to_id from exprsdb.correlation where from_id=2424 order by val desc limit 100) and val>0.6 and to_id<from_id;
Might not be any faster, but you can do this as a self-join with subquery:
SELECT c1.from_id, c1.to_id, c1.val
FROM
  correlation c1,
  (
    SELECT to_id FROM correlation WHERE from_id=2424
    ORDER BY val DESC LIMIT 100
  ) AS c2
  (
    SELECT to_id FROM correlation WHERE from_id=2424
    ORDER BY val DESC LIMIT 100
  ) AS c3
WHERE
  c1.from_id = c2.to_id
  AND c1.to_id = c3.to_id
  AND c1.val > 0.5
  AND c1.to_id < from_id
;
I think PG should be smart enough nowadays to figure out these two queries are basically the same.
--
Edmund Bacon <[EMAIL PROTECTED]>
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] Self-referencing table question

Reply via email to