Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-24 Thread Christian Brink

On 03/22/2010 03:21 PM, Tom Lane wrote:

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

   


Tom  Peter,

I thought you might like to know the outcome of this. I was able to get 
the 8.0 and the 8.2 planner to correctly run the query. There were 2 
issues. As Tom pointed out the the 'systrings' lookup seems to be the 
main culprit. Which makes sense. How can the planner know how to run the 
query when it doesn't know approximately what it will bracket the until 
the query has started?


The other part of the solution is bit concerning. I had to do a 'dump 
and load' (and vacuum analyze)  to get the planner to work correctly 
even after I rewrote the query. FYI I had run 'VACUUM ANALYZE' (and 
sometimes 'REINDEX TABLE x') between each test.



--
Christian Brink



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Christian Brink
I previously posted 'forcing index scan on query produces 16x faster' 
and it seemed that the consensus was that 8.0.x series had an issue. I 
have upgraded to the highest practical version for our distro. But we 
seem to have the same issue.


If I force the 'enable_seqscan' off our actual time is 9ms where if 
'enable_seqscan' is on the performance is 2200ms ( the good news is the 
Seq Scan query on 8.2 is 1/2 the time of the 8.0 query ).



The paste is below - I reloaded the table from scratch after the 8.2 
upgrade. Then I ran a 'REINDEX DATABASE' and a 'VACUUM ANALYZE' (then 
ran some queries and reran the vac analyze).




postream= SELECT version();
 version
-
 PostgreSQL 8.2.11 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC) 
4.1.2 20070925 (Red Hat 4.1.2-33)

(1 row)

postream= SET enable_seqscan = false;
SET
postream= EXPLAIN ANALYZE
postream- SELECT si.group1_id as name, sum(si.qty) as count, 
sum(si.amt) as amt

postream-FROM salesitems si, sales s, sysstrings
postream-   WHERE si.id = s.id
postream-AND si.group1_id != ''
postream-AND si.group1_id IS NOT NULL
postream-AND NOT si.void
postream-AND NOT s.void
postream-AND NOT s.suspended
postream-AND s.tranzdate = (cast('2010-02-15' as date) + 
cast(sysstrings.data as time))
postream-AND s.tranzdate  ((cast('2010-02-15' as date) + 1) + 
cast(sysstrings.data as time))

postream-AND sysstrings.id='net/Console/Employee/Day End Time'
postream- GROUP BY name;

QUERY PLAN
--
 HashAggregate  (cost=426973.65..426973.86 rows=14 width=35) (actual 
time=9.424..9.438 rows=12 loops=1)
   -  Nested Loop  (cost=0.01..426245.31 rows=97113 width=35) (actual 
time=0.653..6.954 rows=894 loops=1)
 -  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
(actual time=0.595..2.150 rows=225 loops=1)
   -  Index Scan using sysstrings_pkey on sysstrings  
(cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
 Index Cond: (id = 'net/Console/Employee/Day End 
Time'::text)
   -  Index Scan using sales_tranzdate_index on sales s  
(cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
rows=225 loops=1)
 Index Cond: ((s.tranzdate = ('2010-02-15'::date + 
(sysstrings.data)::time without time zone)) AND (s.tranzdate  
('2010-02-16'::date + (sysstrings.data)::time without time zone)))

 Filter: ((NOT void) AND (NOT suspended))
 -  Index Scan using salesitems_pkey on salesitems si  
(cost=0.00..18.54 rows=25 width=39) (actual time=0.007..0.013 rows=4 
loops=225)

   Index Cond: (si.id = s.id)
   Filter: (((group1_id)::text  ''::text) AND (group1_id 
IS NOT NULL) AND (NOT void))

 Total runtime: 9.585 ms
(12 rows)

postream= SET enable_seqscan = true;
SET
postream= EXPLAIN ANALYZE
postream- SELECT si.group1_id as name, sum(si.qty) as count, 
sum(si.amt) as amt

postream-FROM salesitems si, sales s, sysstrings
postream-   WHERE si.id = s.id
postream-AND si.group1_id != ''
postream-AND si.group1_id IS NOT NULL
postream-AND NOT si.void
postream-AND NOT s.void
postream-AND NOT s.suspended
postream-AND s.tranzdate = (cast('2010-02-15' as date) + 
cast(sysstrings.data as time))
postream-AND s.tranzdate  ((cast('2010-02-15' as date) + 1) + 
cast(sysstrings.data as time))

postream-AND sysstrings.id='net/Console/Employee/Day End Time'
postream- GROUP BY name;

   QUERY PLAN

 HashAggregate  (cost=38315.09..38315.30 rows=14 width=35) (actual 
time=2206.531..2206.545 rows=12 loops=1)
   -  Hash Join  (cost=2697.55..37586.74 rows=97113 width=35) (actual 
time=2128.070..2204.048 rows=894 loops=1)

 Hash Cond: (si.id = s.id)
 -  Seq Scan on salesitems si  (cost=0.00..30578.15 
rows=890646 width=39) (actual time=0.047..1487.688 rows=901281 loops=1)
   Filter: (((group1_id)::text  ''::text) AND (group1_id 
IS NOT NULL) AND (NOT void))
 -  Hash  (cost=2416.59..2416.59 rows=22477 width=4) (actual 
time=1.823..1.823 rows=225 loops=1)
   -  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
(actual time=0.477..1.592 rows=225 loops=1)
 -  Index Scan using 

Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Tom Lane
Christian Brink cbr...@r-stream.com writes:
   -  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
 (actual time=0.595..2.150 rows=225 loops=1)
 -  Index Scan using sysstrings_pkey on sysstrings  
 (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
   Index Cond: (id = 'net/Console/Employee/Day End 
 Time'::text)
 -  Index Scan using sales_tranzdate_index on sales s  
 (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
 rows=225 loops=1)
   Index Cond: ((s.tranzdate = ('2010-02-15'::date + 
 (sysstrings.data)::time without time zone)) AND (s.tranzdate  
 ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
   Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Eger, Patrick
Not to beat a dead horse excessively, but I think the below is a pretty
good argument for index hints? I know the general optimizer wants to be
highest priority (I very much agree with this), but I think there are
fully legitimate cases like the below. Asking the user to rewrite the
query in an unnatural way (or to change optimizer params that may work
for 99% of queries) is, IMO not a good thing. Given that the postgres
optimizer can never be perfect (as it will never have the perfect
knowledge necessary for a perfect decision), I would request that index
hints be reconsidered (for 9.0?). I know many users (myself included)
are doing this in a very rudimentary way by disabling particular access
types on a per session basis set enable_seqscan=off; set
enable_hashjoin=off; QUERY set enable_seqscan=on; set
enable_hashjoin=on;... I'd hack up a patch if I had the time at least
=)

Best regards, Patrick

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, March 22, 2010 12:22 PM
To: Christian Brink
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan
on query produces faster 

Christian Brink cbr...@r-stream.com writes:
   -  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
 (actual time=0.595..2.150 rows=225 loops=1)
 -  Index Scan using sysstrings_pkey on sysstrings  
 (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1
loops=1)
   Index Cond: (id = 'net/Console/Employee/Day End 
 Time'::text)
 -  Index Scan using sales_tranzdate_index on sales s

 (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
 rows=225 loops=1)
   Index Cond: ((s.tranzdate = ('2010-02-15'::date
+ 
 (sysstrings.data)::time without time zone)) AND (s.tranzdate  
 ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
   Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

regards, tom lane

-- 
Sent via pgsql-performance mailing list
(pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance