I have two simple queries that do what I believe to be the exact same
thing. I was surprised to see a reliable, and what I consider to be
significant (although not problematic for my application) difference
in execution time. It hints to me that PostgreSQL may be missing an
optimization opportunity? This is on PostgreSQL 8.1.4.
For a quick summary of the relationships:
I have a 79 row "system" table that describes each ClearCase system.
ClearCase uses uuid to uniquely identify database objects across the
life of the object. For this table, I store uid as a varchar(80), and
have a unique index on it:
eudb=> \d sm_system
Table "public.sm_system"
Column | Type | Modifiers
-------------+------------------------+-----------------------------------------------------------------
system_dbid | integer | not null default
nextval('sm_system_system_dbid_seq'::regclass)
type | character varying(10) | not null
uid | character varying(200) | not null
name | character varying(200) | not null
owner | character varying(80) | not null
Indexes:
"sm_system_pkey" PRIMARY KEY, btree (system_dbid) CLUSTER
"sm_system_type_key" UNIQUE, btree ("type", uid)
Check constraints:
"sm_system_type_check" CHECK ("type"::text = 'NEU'::text OR "type"::text =
'PLS'::text)
I have a 339,586 row "change" table that describes each ClearCase
activity. Each activity has a name that should be unique, but may not
be unique across time. Uniqueness is relative to the system that
contains it.
Table "public.sm_change"
Column | Type |
Modifiers
----------------+--------------------------------+-----------------------------------------------------------------
change_dbid | integer | not null default
nextval('sm_change_change_dbid_seq'::regclass)
system_dbid | integer | not null
stream_dbid | integer | not null
uid | character varying(200) | not null
name | character varying(200) | not null
status | character varying(20) | not null
owner | character varying(80) | not null
target | integer |
creationtime | timestamp(0) without time zone | not null
submissiontime | timestamp(0) without time zone | not null
comments | text |
elements | text |
Indexes:
"sm_change_pkey" PRIMARY KEY, btree (change_dbid) CLUSTER
"sm_change_system_dbid_key" UNIQUE, btree (system_dbid, uid)
"sm_change_name_key" btree (lower(name::text))
"sm_change_stream_dbid_key" btree (stream_dbid)
"sm_change_target_key" btree (target)
Foreign-key constraints:
"sm_change_stream_dbid_fkey" FOREIGN KEY (stream_dbid) REFERENCES
sm_stream(stream_dbid)
"sm_change_system_dbid_fkey" FOREIGN KEY (system_dbid) REFERENCES
sm_system(system_dbid)
"sm_change_target_fkey" FOREIGN KEY (target) REFERENCES
sm_change(change_dbid)
One of the made up queries that I played with was a lookup on the system uuid,
and the
activity name. This is the one that I noticed the timing difference:
neudb=> select uid, name from sm_change where system_dbid = (select system_dbid
from sm_system where uid = '2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and
lower(name) = lower('markm-Q00855572');
uid | name
------------------------------------------+-----------------
ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572
(1 row)
Time: 1.242 ms
The 1.242 ms is pretty stable. 1.226 ms -> 1.248 ms over 5 runs.
Then we have:
neudb=> select sm_change.uid, sm_change.name from sm_change join sm_system
using (system_dbid) where sm_system.uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da' and lower(sm_change.name) =
lower('markm-Q00855572');
uid | name
------------------------------------------+-----------------
ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572
(1 row)
Time: 1.500 ms
This time is less stable - it runs from 1.394 ms -> 1.561 ms over 5 runs.
As I mentioned - for my application, I don't really care. If it took
10 ms or more, I wouldn't care. But the difference in time bothered me.
So, here are the query plans that PostgreSQL selected for me:
neudb=> explain analyze select uid, name from sm_change where system_dbid =
(select system_dbid from sm_system where uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da') and lower(name) =
lower('markm-Q00855572');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using sm_change_name_key on sm_change (cost=2.99..7.82 rows=1
width=80) (actual time=0.322..0.328 rows=1 loops=1)
Index Cond: (lower((name)::text) = 'markm-q00855572'::text)
Filter: (system_dbid = $0)
InitPlan
-> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual
time=0.052..0.106 rows=1 loops=1)
Filter: ((uid)::text =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text)
Total runtime: 0.419 ms
(7 rows)
Time: 16.494 ms
neudb=> explain analyze select sm_change.uid, sm_change.name from sm_change
join sm_system using (system_dbid) where sm_system.uid =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da' and lower(sm_change.name) =
lower('markm-Q00855572');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..7.83 rows=1 width=80) (actual time=0.099..0.159
rows=1 loops=1)
Join Filter: ("outer".system_dbid = "inner".system_dbid)
-> Index Scan using sm_change_name_key on sm_change (cost=0.00..4.83
rows=1 width=84) (actual time=0.053..0.059 rows=1 loops=1)
Index Cond: (lower((name)::text) = 'markm-q00855572'::text)
-> Seq Scan on sm_system (cost=0.00..2.99 rows=1 width=4) (actual
time=0.030..0.077 rows=1 loops=1)
Filter: ((uid)::text =
'2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da'::text)
Total runtime: 0.250 ms
(7 rows)
Time: 1.898 ms
I'm still learning how PostgreSQL works internally. My understanding
is that the above are essentially the same. The first finds the system
row using a sequential scan, then looks for the change row using the
index, filtering by the system value. The second finds the change rows
using the same index, expecting to find one row, and finding only one
row, and matches it up against the system row using a sequential scan.
So why does one reliably run faster than the other?
neudb=> prepare plan1 (varchar(80), varchar(80)) as select uid, name from
sm_change where system_dbid = (select system_dbid from sm_system where uid =
$1) and lower(name) = lower($2);
neudb=> prepare plan2 (varchar(80), varchar(80)) as select sm_change.uid,
sm_change.name from sm_change join sm_system using (system_dbid) where
sm_system.uid = $1 and lower(sm_change.name) = lower($2);
Now:
neudb=> execute plan1 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da',
'markm-q00855572');
uid | name
------------------------------------------+-----------------
ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572
(1 row)
Time: 0.794 ms
neudb=> execute plan2 ('2ff5942c.dd2911d5.ad56.08:00:09:fd:1b:da',
'markm-q00855572');
uid | name
------------------------------------------+-----------------
ff733174.6c7411d8.900c.00:06:5b:b3:db:28 | markm-Q00855572
(1 row)
Time: 0.715 ms
The numbers above don't mean anything. I ran both a few dozen times, and my
conclusion
is that after the plan is prepared (I did explain analyze to ensure that the
prepared
plans were the same as the dynamically generated plans), the times are the
same. Both
ranged from 0.690 ms -> 0.850 ms. Timings at these resolutions are not so
reliable. :-)
I think this means that the planner takes longer to figure out what to do about
the
join, and that my writing the select out as an embedded select reduces the
effort
required by the planner. This makes sense to me, except that I thought
PostgreSQL
would convert back and forth between the two forms automatically. They are the
same
query, are they not? Why wouldn't they both take longer, or both take shorter?
What
if I invented a scenario where the difference in plans made a major difference,
such as making the system table much larger, still without an index? Should they
not both come up with the same plan - the better estimated plan?
Am I expecting too much? :-)
Cheers,
mark
--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
__________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster