OK,
So if I change the NOT IN clause the query will run with a MERGE JOIN and
complete in about 20 seconds. Have a look at the logic I am following and see
if it makes sense. Might this just be a case where because there is more data
in one DB compared to another (even though the counts are "close"), that's the
difference in this whole thing?
-K
postgres@> more badquery.sql
EXPLAIN SELECT substring(users.email from '^.*\@') || domains.domain as email,
users.customer_id,
users.password,
p.policy_set_id,
users.user_id,
domains.auth_type
FROM
(
SELECT d.customer_id, d.domain, d.domain_id, d.auth_type,
d.active from mxl_domain d
UNION ALL
SELECT d.customer_id, da.domain, da.domain_id, d.auth_type,
da.active from mxl_domain d, mxl_domain_alias da WHERE da.domain_id =
d.domain_id
) as domains,
(
SELECT u.email, u.customer_id, u.user_id, u.domain_id,
u.password, u.active from mxl_user u
UNION ALL
SELECT ua.email, u.customer_id, u.user_id, u.domain_id,
u.password, ua.active from mxl_user u, mxl_user_alias ua WHERE ua.user_id =
u.user_id
) as users,
wds_policy_set p
WHERE users.customer_id = p.id
AND users.customer_id = domains.customer_id
AND users.domain_id = domains.domain_id
AND p.default_flag = 1
AND p.web_access_flag = 1
AND p.scope = 3
AND domains.active = 1
AND users.active != 0
AND p.active = 1
AND users.user_id NOT IN (SELECT user_id FROM mxl_user_group mug
WHERE mug.user_id = users.user_id);
postgres@ time psql -Upostgres -dmxl -fbadquery.sql
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=3522288.96..3534654.13 rows=245730 width=1564)
Merge Cond: ((p.id = u.customer_id) AND (d.domain_id = u.domain_id))
-> Sort (cost=38160.35..38471.30 rows=124381 width=536)
Sort Key: p.id, d.domain_id
-> Hash Join (cost=578.15..10203.83 rows=124381 width=536)
Hash Cond: (d.customer_id = p.id)
-> Append (cost=0.00..4989.45 rows=77541 width=528)
-> Seq Scan on mxl_domain d (cost=0.00..1810.88
rows=64390 width=28)
Filter: (active = 1)
-> Subquery Scan "*SELECT* 2" (cost=2454.78..3178.57
rows=13151 width=528)
-> Hash Join (cost=2454.78..3047.06 rows=13151
width=33)
Hash Cond: (da.domain_id = d.domain_id)
-> Seq Scan on mxl_domain_alias da
(cost=0.00..296.39 rows=13151 width=25)
Filter: (active = 1)
-> Hash (cost=1649.90..1649.90 rows=64390
width=12)
-> Seq Scan on mxl_domain d
(cost=0.00..1649.90 rows=64390 width=12)
-> Hash (cost=525.74..525.74 rows=4193 width=8)
-> Seq Scan on wds_policy_set p (cost=0.00..525.74
rows=4193 width=8)
Filter: ((default_flag = 1) AND (web_access_flag =
1) AND (active = 1) AND (scope = 3))
-> Materialize (cost=3484128.61..3497039.21 rows=1032848 width=1044)
-> Sort (cost=3484128.61..3486710.73 rows=1032848 width=1044)
Sort Key: u.customer_id, u.domain_id
-> Append (cost=0.00..2826808.61 rows=1032848 width=1044)
-> Index Scan using mxl_user_domain_id_idx on mxl_user u
(cost=0.00..2416377.66 rows=906921 width=52)
Filter: ((active <> 0) AND (NOT (subplan)))
SubPlan
-> Index Scan using mxl_user_group_uid_idx on
mxl_user_group mug (cost=0.00..2.47 rows=1 width=4)
Index Cond: (user_id = $0)
-> Subquery Scan "*SELECT* 2" (cost=8.23..410430.95
rows=125927 width=1044)
Filter: (NOT (subplan))
-> Merge Join (cost=8.23..95536.63 rows=251854
width=55)
Merge Cond: (u.user_id = ua.user_id)
-> Index Scan using mxl_user_pkey on mxl_user
u (cost=0.00..77679.47 rows=1881318 width=28)
-> Index Scan using mxl_user_alias_uid_idx on
mxl_user_alias ua (cost=0.00..10109.21 rows=251854 width=31)
Filter: (ua.active <> 0)
SubPlan
-> Index Scan using mxl_user_group_uid_idx on
mxl_user_group mug (cost=0.00..2.47 rows=1 width=4)
Index Cond: (user_id = $0)
(38 rows)
postgres@ time psql -Upostgres -dmxl -fbadquery.sql -okda.out
real 0m22.645s
user 0m1.565s
sys 0m0.246s
postgres@> wc -l kda.out
285563 kda.out
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general