Re: [PERFORM] Join between 2 tables always executes a sequential scan on the larger table

Dieter Rehbein Wed, 03 Apr 2013 01:19:51 -0700

Hi Igor,

thanks for the reply.  The sequential scan on user_2_competition wasn't my 
main-problem. What really suprised me was the sequential scan on table user, 
which is a sequential scan over one million rows.


Hash Left Join  (cost=111357.64..126222.29 rows=41396 width=42) (actual 
time=1982.543..2737.331 rows=41333 loops=1)
  Hash Cond: ((uc.user_id)::text = (u.id)::text)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..4705.21 rows=41396 
width=33) (actual time=0.019..89.691 rows=41333 loops=1)
        Filter: ((competition_id)::text = 
'3cc1cb9b3ac132ad013ad01316040001'::text)
        Rows Removed by Filter: 80684
  ->  Hash  (cost=90074.73..90074.73 rows=999673 width=42) (actual 
time=1977.604..1977.604 rows=999673 loops=1)
        Buckets: 2048  Batches: 128  Memory Usage: 589kB
        ->  Seq Scan on "user" u  (cost=0.00..90074.73 rows=999673 width=42) 
(actual time=0.004..1178.827 rows=999673 loops=1)     <-- This sequential scan 
is strange.


IMHO the reason for the sequential scan on user is, that it is faster than an 
index-scan for 41333 rows.  I've tried the same query using a different 
competition id with much less participants (about 1700). That query has a query 
plan as expected:

Nested Loop Left Join  (cost=0.00..21385.59 rows=1684 width=42) (actual 
time=1.317..147.781 rows=1757 loops=1)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..7026.93 rows=1684 
width=33) (actual time=1.262..92.339 rows=1757 loops=1)
        Filter: ((competition_id)::text = 
'3cc1cb963b988f12013bc737b4590001'::text)
  ->  Index Scan using user_pkey on "user" u  (cost=0.00..8.51 rows=1 width=42) 
(actual time=0.030..0.031 rows=1 loops=1757)
        Index Cond: ((id)::text = (uc.user_id)::text)
Total runtime: 148.068 ms


regards
Dieter



Am 02.04.2013 um 16:55 schrieb Igor Neyman <[email protected]>:

From: Dieter Rehbein [mailto:[email protected]] 
Sent: Tuesday, April 02, 2013 4:52 AM
To: [email protected]
Subject: Join between 2 tables always executes a sequential scan on the larger 
table

Hi everybody,

in a project I have a performance problem, which I (and my colleagues) don't 
understand. It's a simple join between 2 of 3 tables:

table-1:   user   (id, user_name, ...).   This table has about 1 million rows 
(999673 rows)
table-2:   competition (57 rows)
table-3:   user_2_competition.  A relation between user and competition. This 
table has about 100.000 rows

The query is a join between table user_2_competition and user and looks like 
this:

select u.id, u.user_name
from user_2_competition uc 
          left join "user" u on u.id = uc.user_id 
where uc.competition_id = '3cc1cb9b3ac132ad013ad01316040001'

The query returns the ID and user_name of all users participating in a 
competition.

What I don't understand: This query executes a sequential scan on user!


The tables have the following indexes:

user_2_competition:  there is an index on user_id and an index on 
competition_id (competition_id is a VARCHAR(32) containing UUIDs)
user:  id is the primary key and has therefore a unique index (the ID is a 
VARCHAR(32), which contains UUIDs).

The database has just been restored from a backup, I've executed ANALYZE for 
both tables.

The output of explain analyze (Postgres 9.2.3):

Hash Left Join  (cost=111357.64..126222.29 rows=41396 width=42) (actual 
time=1982.543..2737.331 rows=41333 loops=1)
  Hash Cond: ((uc.user_id)::text = (u.id)::text)
  ->  Seq Scan on user_2_competition uc  (cost=0.00..4705.21 rows=41396 
width=33) (actual time=0.019..89.691 rows=41333 loops=1)
        Filter: ((competition_id)::text = 
'3cc1cb9b3ac132ad013ad01316040001'::text)
        Rows Removed by Filter: 80684
  ->  Hash  (cost=90074.73..90074.73 rows=999673 width=42) (actual 
time=1977.604..1977.604 rows=999673 loops=1)
        Buckets: 2048  Batches: 128  Memory Usage: 589kB
        ->  Seq Scan on "user" u  (cost=0.00..90074.73 rows=999673 width=42) 
(actual time=0.004..1178.827 rows=999673 loops=1)
Total runtime: 2740.723 ms


I expected to see an index-scan on user_2_competition with a hash join to user, 
not a sequential scan on user.  I've tried this with Postgres 9.1 and 9.2.3).

Any ideas, what's going on here?

With EXPLAIN ANALYZE I can see, which query plan Postgres is using.  Is there 
any way to find out, WHY postgres uses this query plan? 

best regards
Dieter

-----------------------------------------------

Dieter, 
why do you think index-scan on user_2_competition would be better?

Based on huge number of rows returned (41333 out of total ~120000 in the table) 
from this table optimizer decided that Seq Scan is better than index scan.
You don't show QUERY TUNING parameters from Postgresql.conf, are they default?
Playing with optimizer parameters (lowering random_page_cost, lowering 
cpu_index_tuple_cost , increasing effective_cache_size, or just setting 
enable_seqscan = off), you could try to force "optimizer" to use index, and see 
if you are getting better results.

Regards,
Igor Neyman

Happy Skiing!

Dieter Rehbein
Software Architect | [email protected] 

Skiline Media GmbH
Lakeside B03
9020 Klagenfurt, Austria

fon: +43 463 249445-800
fax: +43 463 249445-102

"Erlebe Skifahren neu!"

CONFIDENTIALITY: This e-mail and any attachments are confidential and may also 
be privileged. If you are not the designated recipient, please notify the 
sender immediately by reply e-mail and destroy all copies (digital and paper). 
Any unauthorized disclosure, distribution, copying, storage or use of the 
information contained in this e-mail or any attachments is strictly prohibited 
and may be unlawful.
LEGAL: Skiline Media GmbH - Managing Director: Michael Saringer



-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Join between 2 tables always executes a sequential scan on the larger table

Reply via email to