Oh, sorry, this happens on Postgresql 9.6.6. I've checked that on
Postgresql 10.0 query plan from setup (1)
now uses two seq scans, like in setup (2).


2017-12-01 15:03 GMT+02:00 Danylo Hlynskyi <abcz2.upr...@gmail.com>:

> Hello. I want to remove rows from first table, that exist in second
> (equality is done using PK). However I experience seq scan on second table,
> which counters my intuition - I think it should be index-only. Because
> tables are large, performance of query is very bad.
>
> However I got mixed results when trying to reproduce this behavior on
> syntetic tables. Here I'll show 3 different plans, which I got for the same
> query.
>
> 1. Setup is:
> ---------------------------
> create table diff (id uuid constraint diff_pkey primary key);
> create table origin (id uuid constraint origin_pkey primary key);
> ---------------------------
>
> The query generates correct plan, which performs only index scans:
>
> explain delete from origin where exists (select id from diff where
> origin.id = diff.id);
>                                         QUERY PLAN
> ------------------------------------------------------------
> -------------------------------
>  Delete on origin  (cost=0.30..105.56 rows=1850 width=12)
>    ->  Merge Semi Join  (cost=0.30..105.56 rows=1850 width=12)
>          Merge Cond: (origin.id = diff.id)
>          ->  Index Scan using origin_pkey on origin  (cost=0.15..38.90
> rows=1850 width=22)
>          ->  Index Scan using diff_pkey on diff  (cost=0.15..38.90
> rows=1850 width=22)
> (5 rows)
>
> 2. Setup is:
> --------------------------------
> create table origin (id uuid constraint origin_pkey primary key, data
> jsonb);
> create table diff (id uuid constraint diff_pkey primary key, data jsonb);
> --------------------------------
>
> The query generates plan with two seq scans:
>
> explain delete from origin where exists (select id from diff where
> origin.id = diff.id);
>                                 QUERY PLAN
> ------------------------------------------------------------
> ---------------
>  Delete on origin  (cost=34.08..69.49 rows=1070 width=12)
>    ->  Hash Semi Join  (cost=34.08..69.49 rows=1070 width=12)
>          Hash Cond: (origin.id = diff.id)
>          ->  Seq Scan on origin  (cost=0.00..20.70 rows=1070 width=22)
>          ->  Hash  (cost=20.70..20.70 rows=1070 width=22)
>                ->  Seq Scan on diff  (cost=0.00..20.70 rows=1070 width=22)
> (6 rows)
>
> 3. My real `origin` table has 26 fields and 800 billion rows, real `diff`
> table has 12 million rows and the query generates plan with nested loop and
> seq scan on `diff` table:
>
> explain delete from drug_refills origin where exists (select id from
> drug_refills_diff diff where origin.id = diff.id);
>                                                 QUERY PLAN
> ------------------------------------------------------------
> ----------------------------------------------
>  Delete on drug_refills origin  (cost=0.57..22049570.11 rows=11975161
> width=12)
>    ->  Nested Loop  (cost=0.57..22049570.11 rows=11975161 width=12)
>          ->  Seq Scan on drug_refills_diff diff  (cost=0.00..720405.61
> rows=11975161 width=22)
>          ->  Index Scan using drug_refills_pkey on drug_refills origin
> (cost=0.57..1.77 rows=1 width=22)
>                Index Cond: (id = diff.id)
> (5 rows)
>
> I have run ANALYZE on both tables, but it didn't help. Here are column
> types in origin and diff (same schema), if that matters:
>
> uuid
> timestamp with time zone
> timestamp with time zone
> character varying(255)
> character varying(255)
> character varying(1024)
> numeric(10,4)
> integer
> numeric(14,8)
> numeric(14,8)
> numeric(14,8)
> numeric(14,8)
> numeric(14,8)
> character varying(16)
> character varying(16)
> character varying(16)
> character varying(16)
> character varying(16)
> character varying(16)
> date
> jsonb
> text[]
> uuid
> uuid
> uuid
> uuid
>
>

Reply via email to