* Craig James <[EMAIL PROTECTED]> [20070728 22:00]:
> >>SELECT * FROM (
> >> (SELECT * FROM large_table lt
> >> WHERE lt.user_id = 12345
> >> ORDER BY created_at DESC LIMIT 10) AS q1
> >> UNION
> >> (SELECT * FROM large_table lt
> >> WHERE user_id IN (SELECT contact_id FROM relationships WHERE
> >> user_id=12345)
> >> ORDER BY created_at DESC LIMIT 10) AS q2
> >>ORDER BY created_at DESC LIMIT 10;
> >
> >It's not possible to use ORDER BY or LIMIT within unioned queries.
> >
> >http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION
>
> If I'm reading this documentation correctly, it *is* possible, as long as
> they're inside of a sub-select, as in this case.
I completely overlooked that obvious note in the documentation,
sorry. I tried it only with the aliases which fooled me into thinking
that doesn't work at all:
testdb=# (select 1 limit 1) as q1 union (select 2) as q2;
ERROR: syntax error at or near "as"
LINE 1: (select 1 limit 1) as q1 union (select 2) as q2;
^
but this works:
testdb=# (select 1 limit 1) union (select 2);
?column?
----------
1
2
Great - that works!
What I didn't realize in the original post is that the problem
actually seems to be how to retrieve the rows from large_table for the
correlated relationships in an efficient way - the second of the two
queries that could be UNIONed.
Using a subselect is efficient for the user with few relationships and
matched rows at the end of the sorted large_table:
testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE
user_id=12345)
ORDER BY created_at DESC LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=6963.94..6963.96 rows=10 width=621) (actual time=94.598..94.629
rows=4 loops=1)
-> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual
time=94.592..94.602 rows=4 loops=1)
Sort Key: lt.created_at
-> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual
time=85.670..94.547 rows=4 loops=1)
-> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual
time=23.549..23.552 rows=1 loops=1)
-> Bitmap Heap Scan on relationships (cost=4.33..39.49
rows=10 width=4) (actual time=23.526..23.530 rows=1 loops=1)
Recheck Cond: (user_id = 12345)
-> Bitmap Index Scan on
relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0)
(actual time=0.027..0.027 rows=1 loops=1)
Index Cond: (user_id = 12345)
-> Index Scan using large_user_id_started_at_index on
large_table lt (cost=0.00..6834.04 rows=2268 width=621) (actual
time=62.108..70.952 rows=4 loops=1)
Index Cond: (lt.user_id = relationships.contact_id)
Total runtime: 94.875 ms
But the subselect is not fast for the user with many relationships and
matched rows at the beginning of the sorted large_table:
testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
WHERE user_id IN (SELECT contact_id FROM relationships WHERE
user_id=55555)
ORDER BY created_at DESC LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=6963.94..6963.96 rows=10 width=621) (actual
time=53187.349..53187.424 rows=10 loops=1)
-> Sort (cost=6963.94..6966.96 rows=1211 width=621) (actual
time=53187.341..53187.360 rows=10 loops=1)
Sort Key: lt.created_at
-> Nested Loop (cost=39.52..6901.92 rows=1211 width=621) (actual
time=201.728..52673.800 rows=69018 loops=1)
-> HashAggregate (cost=39.52..39.53 rows=1 width=4) (actual
time=178.777..178.966 rows=40 loops=1)
-> Bitmap Heap Scan on relationships (cost=4.33..39.49
rows=10 width=4) (actual time=47.049..178.560 rows=40 loops=1)
Recheck Cond: (user_id = 55555)
-> Bitmap Index Scan on
relationships_user_id_contact_id_index (cost=0.00..4.33 rows=10 width=0)
(actual time=28.721..28.721 rows=40 loops=1)
Index Cond: (user_id = 55555)
-> Index Scan using large_user_id_started_at_index on
large_table lt (cost=0.00..6834.04 rows=2268 width=621) (actual
time=21.994..1301.375 rows=1725 loops=40)
Index Cond: (lt.user_id = relationships.contact_id)
Total runtime: 53188.584 ms
Using a join now the query for matches for the user with little data is slow:
testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
JOIN relationships r ON lt.user_id=r.contact_id WHERE
r.user_id=12345
ORDER BY lt.created_at DESC LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..24116.65 rows=10 width=645) (actual
time=100348.436..145552.633 rows=4 loops=1)
-> Nested Loop (cost=0.00..28751864.52 rows=11922 width=645) (actual
time=100348.429..145552.602 rows=4 loops=1)
-> Index Scan Backward using large_created_at_index on large_table lt
(cost=0.00..824833.09 rows=4384343 width=621) (actual time=28.961..82448.167
rows=4384064 loops=1)
-> Index Scan using relationships_user_id_contact_id_index on
relationships r (cost=0.00..6.36 rows=1 width=24) (actual time=0.009..0.009
rows=0 loops=4384064)
Index Cond: ((r.user_id = 12345) AND (lt.user_id = r.contact_id))
Total runtime: 145552.809 ms
And for the user with much data it's fast:
testdb=# EXPLAIN ANALYZE SELECT * FROM large_table lt
JOIN relationships r ON lt.user_id=r.contact_id WHERE
r.user_id=55555
ORDER BY lt.created_at DESC LIMIT 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..24116.65 rows=10 width=645) (actual time=0.068..0.428
rows=10 loops=1)
-> Nested Loop (cost=0.00..28751864.52 rows=11922 width=645) (actual
time=0.063..0.376 rows=10 loops=1)
-> Index Scan Backward using large_created_at_index on large_table lt
(cost=0.00..824833.09 rows=4384343 width=621) (actual time=0.028..0.064
rows=13 loops=1)
-> Index Scan using relationships_user_id_contact_id_index on
relationships r (cost=0.00..6.36 rows=1 width=24) (actual time=0.010..0.013
rows=1 loops=13)
Index Cond: ((r.user_id = 55555) AND (lt.user_id = r.contact_id))
Total runtime: 0.609 ms
Any ideas?
tia, Til
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org