2017-04-05 10:13 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 



2017-04-05 9:28 GMT+02:00 Daniel Westermann < 
daniel.westerm...@dbi-services.com > : 

BQ_BEGIN

>>what is result of EXPLAIN statement for slow and fast cases? 
>> 
>>regards 
>> 
>>Pavel 

For work_mem=32MB 

explain (analyze,verbose,buffers) select count(user_id) from users where 
user_id not in ( select id from ids); 
QUERY PLAN 
--------------------------------------------------------------------------------------------------------------------------------------
 
Aggregate (cost=83175.01..83175.02 rows=1 width=8) (actual 
time=7945.592..7945.593 rows=1 loops=1) 
Output: count(users.user_id) 
Buffers: shared read=29425 
-> Seq Scan on public.users (cost=16925.01..79425.01 rows=1500000 width=4) 
(actual time=1928.665..5888.645 rows=2499999 loops=1) 
Output: users.user_id, users.username 
Filter: (NOT (hashed SubPlan 1)) 
Rows Removed by Filter: 500001 
Buffers: shared read=29425 
SubPlan 1 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4) (actual 
time=11.111..868.382 rows=1000001 loops=1) 
Output: ids.id 
Buffers: shared read=4425 
Planning time: 187.396 ms 
Execution time: 7948.108 ms 
(14 rows) 

Time: 8244.493 ms 

For work_mem='16MB' it does not complete with analyze in on hour. For explain 
only: 

explain (verbose) select count(user_id) from users where user_id not in ( 
select id from ids); 
QUERY PLAN 
----------------------------------------------------------------------------------------
 
Aggregate (cost=38748092500.00..38748092500.01 rows=1 width=8) 
Output: count(users.user_id) 
-> Seq Scan on public.users (cost=0.00..38748088750.00 rows=1500000 width=4) 
Output: users.user_id, users.username 
Filter: (NOT (SubPlan 1)) 
SubPlan 1 
-> Materialize (cost=0.00..23332.01 rows=1000001 width=4) 
Output: ids.id 
-> Seq Scan on public.ids (cost=0.00..14425.01 rows=1000001 width=4) 
Output: ids.id 
(10 rows) 




>There is a materialize op more 
> 
>do you have a index on ids.id ? 

Yes: 

\d ids 
Table "public.ids" 
Column | Type | Modifiers 
--------+---------+----------- 
id | integer | 
Indexes: 
"i_ids" UNIQUE, btree (id) 


BQ_END



>>hmm .. NOT IN is just bad :( 
>> 
>>The second is slow becase table ids is stored in temp file. and it is 
>>repeatedly read from file. In first case, ids table is stored in memory. 
>> 
>>SELECT count(user_id) FROM users WHERE user_id NOT EXISTS(SELECT id from ids 
>>where id = user_id) 
>> 

Yes, really bad :) ... and I still do not get it. Even when reading from the 
tempfile all the time it should at least complete within one hour, shouldn't 
it? The tables are not so big: 

select * from pg_size_pretty ( pg_relation_size ('ids' )); 
pg_size_pretty 
---------------- 
35 MB 
(1 row) 
select * from pg_size_pretty ( pg_relation_size ('users' )); 
pg_size_pretty 
---------------- 
195 MB 
(1 row) 


Reply via email to