Thanks. I thought about that a bit and it seems like it is highly
likely to be expensive for a single query (though I should probably try
it at some point). If I do find myself reformatting results after
response to user input (i.e., reusing the query), though, then your
solution is likely to be very useful.
Sean
On Mar 24, 2005, at 11:13 AM, Edmund Bacon wrote:
Sometimes using a temp table is a better idea:
e.g.
-- start by creating a temp table 'tids' that hold the to_ids that
-- we are interested in.
SELECT to_id
INTO TEMP TABLE tids
FROM correlation
WHERE from_id = 1234
ORDER BY val DESC limit 100;
-- The following temp table makes use of the primary key on
-- the correlation table, and the stated goal from the original
-- question that:
-- from_id > to_id
-- and from_id in (tids.to_id)
-- and to_id in (tids.to_id)
SELECT t1.to_id AS from_id, t2.to_id
INTO TEMP TABLE from_to
FROM tids t1, tids t2
WHERE t1.to_id > t2.to_id;
-- Now we can use the from_to table as an index into the correlation
-- table.
SELECT c.from_id, c.to_id, c.val
FROM from_to
JOIN correlation c USING(from_id, to_id)
WHERE val > 0.5;
The explain analyze for the final select works out to:
Nested Loop (cost=0.00..50692.00 rows=8488 width=16) (actual
time=0.171..150.095 rows=2427 loops=1)
-> Seq Scan on from_to (cost=0.00..79.38 rows=5238 width=8)
(actual time=0.006..7.660 rows=4950 loops=1)
-> Index Scan using correlation_pkey on correlation c
(cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0
loops=4950)
Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id
= c.to_id))
Filter: (val > 0.5::double precision)
Total runtime: 152.261 ms
Richard Huxton wrote:
Sean Davis wrote:
I answer my own question, if only for my own records. The following
query is about 5-6 times faster than the original. Of course, if
anyone else has other ideas, I'd be happy to hear them.
Sean
explain analyze select from_id,to_id,val from exprsdb.correlation
where from_id in (select to_id from exprsdb.correlation where
from_id=2424 order by val desc limit 100) and to_id in (select
to_id from exprsdb.correlation where from_id=2424 order by val desc
limit 100) and val>0.6 and to_id<from_id;
Might not be any faster, but you can do this as a self-join with
subquery:
SELECT c1.from_id, c1.to_id, c1.val
FROM
correlation c1,
(
SELECT to_id FROM correlation WHERE from_id=2424
ORDER BY val DESC LIMIT 100
) AS c2
(
SELECT to_id FROM correlation WHERE from_id=2424
ORDER BY val DESC LIMIT 100
) AS c3
WHERE
c1.from_id = c2.to_id
AND c1.to_id = c3.to_id
AND c1.val > 0.5
AND c1.to_id < from_id
;
I think PG should be smart enough nowadays to figure out these two
queries are basically the same.
--
Edmund Bacon <[EMAIL PROTECTED]>
---------------------------(end of
broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])