It does the merge (not-semi) join: production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o ON (o.id = ac.customer_id AND o.group_id = 45);
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=2172.47..19959.82 rows=6 width=80) (actual time=114.578..243898.199 rows=7318 loops=1) Merge Cond: (ac.customer_id = o.id) -> Index Scan using balances_customer_id_index on balances ac (cost=0.00..727.42 rows=16876 width=80) (actual time=0.025..20.972 rows=16876 loops=1) -> Index Scan using customers_pkey on customers o (cost=0.00..64811.57 rows=179 width=8) (actual time=92.174..243813.231 rows=7672 loops=1) Filter: (group_id = 45) Rows Removed by Filter: 212699113 Total runtime: 243901.595 ms (7 rows) Sean On Fri, Oct 9, 2015 at 1:09 PM, Igor Neyman <iney...@perceptron.com> wrote: > Hello, > > > > I'm seeing some inexplicable (to me) behavior in PostgreSQL 9.2. I checked > > the archives, but I still can't explain it. Apologies if I missed > something. > > > > 1. When I join two tables with "WHERE id IN (...)" versus with an explicit > > join, and the join column for the inner table is a primary key, I would > expect > > the same behavior in both cases, but the optimizer is choosing a merge > join in > > one case and a merge semi join in the other. There's at most one customer > > with a given id. Why not do a semi join? > > > > 2. Even though the join methods are different, I would expect about the > same > > performance in either case, but one query takes only a few hundred > > milliseconds while the other takes hundreds of seconds. Ouch! > > > > Can anyone help me explain this behavior? > > > > Some details are below. Let me know if it would be helpful to gather > others. > > > > Sean > > > > > > production=> select version(); > > version > > > ---------------------------------------------------------------------------------------- > > PostgreSQL 9.2.13 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5) > 4.7.2, 32-bit > > (1 row) > > > > production=> \d customers > > Table "public.customers" > > Column | Type | Modifiers > > > ---------------+----------+------------------------------------------------------- > > id | bigint | not null default > nextval('customers_id_seq'::regclass) > > group_id | bigint | > > ... > > Indexes: > > "customers_pkey" PRIMARY KEY, btree (id) > > ... > > > > production=> select count(*) from customers; > > count > > -------- > > 473733 > > (1 row) > > > > production=> \d balances > > Table "public.balances" > > Column | Type | Modifiers > > > -------------------+----------+------------------------------------------------------ > > id | bigint | not null default > nextval('balances_id_seq'::regclass) > > balance | integer | not null default 0 > > tracking_number | integer | not null > > customer_id | bigint | not null > > ... > > Indexes: > > "balances_pkey" PRIMARY KEY, btree (id) > > "balances_customer_tracking_number_index" UNIQUE, btree (customer_id, > tracking_number) > > ... > > > > production=> select count(*) from balances; > > count > > ------- > > 16876 > > (1 row) > > > > production=> analyze verbose customers; > > INFO: analyzing "public.customers" > > INFO: "customers": scanned 14280 of 14280 pages, containing 475288 live > rows and 1949 dead rows; 300000 rows in sample, 475288 estimated total rows > > ANALYZE > > > > production=> analyze verbose balances; > > INFO: analyzing "public.balances" > > INFO: "balances": scanned 202 of 202 pages, containing 16876 live rows > and 0 dead rows; 16876 rows in sample, 16876 estimated total rows > > ANALYZE > > > > production=> explain analyze SELECT * FROM balances where customer_id IN > (SELECT id from customers WHERE group_id = 45); > > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------------------------------------------------------- > > Merge Semi Join (cost=2442.14..19958.30 rows=16876 width=80) (actual > time=119.905..145.126 rows=7318 loops=1) > > Merge Cond: (balances.customer_id = customers.id) > > -> Index Scan using balances_customer_id_index on balances > (cost=0.00..727.79 rows=16876 width=80) (actual time=0.302..9.477 > rows=16876 loops=1) > > -> Index Scan using customers_pkey on customers (cost=0.00..64192.97 > rows=184 width=8) (actual time=103.354..126.459 rows=359 loops=1) > > Filter: (group_id = 45) > > Rows Removed by Filter: 141684 > > Total runtime: 146.659 ms > > (7 rows) > > > > production=> explain analyze SELECT ac.* FROM balances ac join customers o > ON o.id = ac.customer_id WHERE o.group_id = 45; > > > QUERY PLAN > > > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > Merge Join (cost=2214.50..20216.86 rows=30 width=80) (actual > time=185.615..201991.752 rows=7318 loops=1) > > Merge Cond: (ac.customer_id = o.id) > > -> Index Scan using balances_customer_tracking_number_index on > balances ac (cost=0.00..1007.49 rows=16876 width=80) (actual > time=0.068..25.036 rows=16876 loops=1) > > -> Index Scan using customers_pkey on customers o > (cost=0.00..63836.61 rows=836 width=8) (actual time=159.840..201915.765 > rows=7672 loops=1) > > Filter: (group_id = 45) > > Rows Removed by Filter: 212699113 > > Total runtime: 201995.044 ms > > (7 rows) > > > > > > What if you rewrite your second query like this: > > > > SELECT ac.* > > FROM balances ac JOIN customers o ON (o.id = ac.customer_id AND > o.group_id = 45); > > > > Regards, > > Igor Neyman > > > > >