2017-04-05 10:13 GMT+02:00 Daniel Westermann < daniel.westerm...@dbi-services.com > :
2017-04-05 9:28 GMT+02:00 Daniel Westermann < daniel.westerm...@dbi-services.com > : BQ_BEGIN >>what is result of EXPLAIN statement for slow and fast cases? >> >>regards >> >>Pavel For work_mem=32MB explain (analyze,verbose,buffers) select count(user_id) from users where user_id not in ( select id from ids); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual time=7945.592..7945.593 rows=1 loops=1) Output: count(users.user_id) Buffers: shared read=29425 -> Seq Scan on public.users (cost=16925.01..79425.01 rows=1500000 width=4) (actual time=1928.665..5888.645 rows=2499999 loops=1) Output: users.user_id, users.username Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 500001 Buffers: shared read=29425 SubPlan 1 -> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4) (actual time=11.111..868.382 rows=1000001 loops=1) Output: ids.id Buffers: shared read=4425 Planning time: 187.396 ms Execution time: 7948.108 ms (14 rows) Time: 8244.493 ms For work_mem='16MB' it does not complete with analyze in on hour. For explain only: explain (verbose) select count(user_id) from users where user_id not in ( select id from ids); QUERY PLAN ---------------------------------------------------------------------------------------- Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) Output: count(users.user_id) -> Seq Scan on public.users (cost=0.00..38748088750.00 rows=1500000 width=4) Output: users.user_id, users.username Filter: (NOT (SubPlan 1)) SubPlan 1 -> Materialize (cost=0.00..23332.01 rows=1000001 width=4) Output: ids.id -> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4) Output: ids.id (10 rows) >There is a materialize op more > >do you have a index on ids.id ? Yes: \d ids Table "public.ids" Column | Type | Modifiers --------+---------+----------- id | integer | Indexes: "i_ids" UNIQUE, btree (id) BQ_END >>hmm .. NOT IN is just bad :( >> >>The second is slow becase table ids is stored in temp file. and it is >>repeatedly read from file. In first case, ids table is stored in memory. >> >>SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from ids >>where id = user_id) >> Yes, really bad :) ... and I still do not get it. Even when reading from the tempfile all the time it should at least complete within one hour, shouldn't it? The tables are not so big: select * from pg_size_pretty ( pg_relation_size ('ids' )); pg_size_pretty ---------------- 35 MB (1 row) select * from pg_size_pretty ( pg_relation_size ('users' )); pg_size_pretty ---------------- 195 MB (1 row)