Re: [GENERAL] Merge join vs merge semi join against primary key

Sean Rhea Fri, 09 Oct 2015 13:31:03 -0700

It does the merge (not-semi) join:

production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o
ON (o.id = ac.customer_id AND o.group_id = 45);


  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2172.47..19959.82 rows=6 width=80) (actual
time=114.578..243898.199 rows=7318 loops=1)
   Merge Cond: (ac.customer_id = o.id)
   ->  Index Scan using balances_customer_id_index on balances ac
 (cost=0.00..727.42 rows=16876 width=80) (actual time=0.025..20.972
rows=16876 loops=1)
   ->  Index Scan using customers_pkey on customers o  (cost=0.00..64811.57
rows=179 width=8) (actual time=92.174..243813.231 rows=7672 loops=1)
         Filter: (group_id = 45)
         Rows Removed by Filter: 212699113
 Total runtime: 243901.595 ms
(7 rows)

Sean



On Fri, Oct 9, 2015 at 1:09 PM, Igor Neyman <[email protected]> wrote:

> Hello,
>
>
>
> I'm seeing some inexplicable (to me) behavior in PostgreSQL 9.2. I checked
>
> the archives, but I still can't explain it. Apologies if I missed
> something.
>
>
>
> 1. When I join two tables with "WHERE id IN (...)" versus with an explicit
>
> join, and the join column for the inner table is a primary key, I would
> expect
>
> the same behavior in both cases, but the optimizer is choosing a merge
> join in
>
> one case and a merge semi join in the other. There's at most one customer
>
> with a given id. Why not do a semi join?
>
>
>
> 2. Even though the join methods are different, I would expect about the
> same
>
> performance in either case, but one query takes only a few hundred
>
> milliseconds while the other takes hundreds of seconds. Ouch!
>
>
>
> Can anyone help me explain this behavior?
>
>
>
> Some details are below. Let me know if it would be helpful to gather
> others.
>
>
>
> Sean
>
>
>
>
>
> production=> select version();
>
>                                         version
>
>
> ----------------------------------------------------------------------------------------
>
>  PostgreSQL 9.2.13 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5)
> 4.7.2, 32-bit
>
> (1 row)
>
>
>
> production=> \d customers
>
>                         Table "public.customers"
>
>      Column    |   Type   |                         Modifiers
>
>
> ---------------+----------+-------------------------------------------------------
>
>  id            | bigint   | not null default
> nextval('customers_id_seq'::regclass)
>
>  group_id      | bigint   |
>
> ...
>
> Indexes:
>
>     "customers_pkey" PRIMARY KEY, btree (id)
>
> ...
>
>
>
> production=> select count(*) from customers;
>
>  count
>
> --------
>
>  473733
>
> (1 row)
>
>
>
> production=> \d balances
>
>                        Table "public.balances"
>
>         Column     |   Type   |                         Modifiers
>
>
> -------------------+----------+------------------------------------------------------
>
>  id                | bigint   | not null default
> nextval('balances_id_seq'::regclass)
>
>  balance           | integer  | not null default 0
>
>  tracking_number   | integer  | not null
>
>  customer_id       | bigint   | not null
>
> ...
>
> Indexes:
>
>     "balances_pkey" PRIMARY KEY, btree (id)
>
>     "balances_customer_tracking_number_index" UNIQUE, btree (customer_id,
> tracking_number)
>
> ...
>
>
>
> production=> select count(*) from balances;
>
>  count
>
> -------
>
>  16876
>
> (1 row)
>
>
>
> production=> analyze verbose customers;
>
> INFO:  analyzing "public.customers"
>
> INFO:  "customers": scanned 14280 of 14280 pages, containing 475288 live
> rows and 1949 dead rows; 300000 rows in sample, 475288 estimated total rows
>
> ANALYZE
>
>
>
> production=> analyze verbose balances;
>
> INFO:  analyzing "public.balances"
>
> INFO:  "balances": scanned 202 of 202 pages, containing 16876 live rows
> and 0 dead rows; 16876 rows in sample, 16876 estimated total rows
>
> ANALYZE
>
>
>
> production=> explain analyze SELECT * FROM balances where customer_id IN
> (SELECT id from customers WHERE group_id = 45);
>
>
>   QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
>  Merge Semi Join  (cost=2442.14..19958.30 rows=16876 width=80) (actual
> time=119.905..145.126 rows=7318 loops=1)
>
>    Merge Cond: (balances.customer_id = customers.id)
>
>    ->  Index Scan using balances_customer_id_index on balances
>  (cost=0.00..727.79 rows=16876 width=80) (actual time=0.302..9.477
> rows=16876 loops=1)
>
>    ->  Index Scan using customers_pkey on customers  (cost=0.00..64192.97
> rows=184 width=8) (actual time=103.354..126.459 rows=359 loops=1)
>
>          Filter: (group_id = 45)
>
>          Rows Removed by Filter: 141684
>
>  Total runtime: 146.659 ms
>
> (7 rows)
>
>
>
> production=> explain analyze SELECT ac.* FROM balances ac join customers o
> ON o.id = ac.customer_id WHERE o.group_id = 45;
>
>
>   QUERY PLAN
>
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>  Merge Join  (cost=2214.50..20216.86 rows=30 width=80) (actual
> time=185.615..201991.752 rows=7318 loops=1)
>
>    Merge Cond: (ac.customer_id = o.id)
>
>    ->  Index Scan using balances_customer_tracking_number_index on
> balances ac  (cost=0.00..1007.49 rows=16876 width=80) (actual
> time=0.068..25.036 rows=16876 loops=1)
>
>    ->  Index Scan using customers_pkey on customers o
>  (cost=0.00..63836.61 rows=836 width=8) (actual time=159.840..201915.765
> rows=7672 loops=1)
>
>          Filter: (group_id = 45)
>
>          Rows Removed by Filter: 212699113
>
>  Total runtime: 201995.044 ms
>
> (7 rows)
>
>
>
>
>
> What if you rewrite your second query like this:
>
>
>
> SELECT ac.*
>
> FROM balances ac JOIN customers o ON (o.id = ac.customer_id AND
> o.group_id = 45);
>
>
>
> Regards,
>
> Igor Neyman
>
>
>
>
>

Re: [GENERAL] Merge join vs merge semi join against primary key

Reply via email to