Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Robert Haas
2009/7/20 Віталій Тимчишин tiv...@gmail.com:
 20 липня 2009 р. 11:02 Chris dmag...@gmail.com написав:

 Віталій Тимчишин wrote:


 2009/7/20 Robert James srobertja...@gmail.com
 mailto:srobertja...@gmail.com


    Hi. I notice that when I do a WHERE x, Postgres uses an index, and
    when I do WHERE y, it does so as well, but when I do WHERE x OR y,
    it doesn't. Why is this so?

 It's not clever enough.

 Of course it is.

 For simple cases



 I'm running 8.3.7.

 create table t1(id int primary key);
 insert into t1(id) select a from generate_series(1, 50) as s(a);
 analyze t1;

 explain analyze select * from t1 where
 id  1

 Index Scan using t1_pkey on t1  (cost=0.00..322.51 rows=9612 width=4)
 (actual time=0.030..3.700 rows= loops=1)
   Index Cond: (id  1)
 Total runtime: 4.835 ms

 explain analyze select * from t1 where
 id in (select (random() * 50)::int4 from generate_series(0,10))

 Nested Loop  (cost=32.50..1341.49 rows=200 width=4) (actual
 time=15.353..67.014 rows=11 loops=1)
   -  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual
 time=0.028..0.043 rows=11 loops=1)
     -  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
 width=0) (actual time=0.014..0.020 rows=11 loops=1)
   -  Index Scan using t1_pkey on t1  (cost=0.00..6.52 rows=1 width=4)
 (actual time=6.083..6.084 rows=1 loops=11)
     Index Cond: (t1.id = (((random() * 50::double
 precision))::integer))
 Total runtime: 67.070 ms

 explain analyze select * from t1 where
 id in (select (random() * 50)::int4 from generate_series(0,10))
 or
 id  1

 Seq Scan on t1  (cost=22.50..9735.50 rows=254806 width=4) (actual
 time=0.049..148.947 rows=10010 loops=1)
   Filter: ((hashed subplan) OR (id  1))
   SubPlan
     -  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
 width=0) (actual time=0.014..0.019 rows=11 loops=1)
 Total runtime: 150.123 ms

 explain analyze
 select * from t1 where
 id in (select (random() * 50)::int4 from generate_series(0,10))
 union
 select * from t1 where
 id  1

 Unique  (cost=2412.68..2461.74 rows=9812 width=4) (actual
 time=89.190..95.014 rows=10010 loops=1)
   -  Sort  (cost=2412.68..2437.21 rows=9812 width=4) (actual
 time=89.189..91.167 rows=10010 loops=1)
     Sort Key: public.t1.id
     Sort Method:  quicksort  Memory: 854kB
     -  Append  (cost=32.50..1762.13 rows=9812 width=4) (actual
 time=16.641..76.338 rows=10010 loops=1)
   -  Nested Loop  (cost=32.50..1341.49 rows=200 width=4)
 (actual time=16.641..70.051 rows=11 loops=1)
     -  HashAggregate  (cost=32.50..34.50 rows=200 width=4)
 (actual time=0.033..0.049 rows=11 loops=1)
   -  Function Scan on generate_series
 (cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11
 loops=1)
     -  Index Scan using t1_pkey on t1  (cost=0.00..6.52
 rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)
   Index Cond: (public.t1.id = (((random() *
 50::double precision))::integer))
   -  Index Scan using t1_pkey on t1  (cost=0.00..322.51
 rows=9612 width=4) (actual time=0.023..4.075 rows= loops=1)
     Index Cond: (id  1)
 Total runtime: 112.694 ms

Hmm.  What you're suggesting here is that we could consider
implementing OR conditions by rescanning the inner side for each index
qual and then unique-ifying the results on the index column.  That's
probably possible, but it doesn't sound easy, especially since our
selectivity-estimation code for OR conditions is not very good, so we
might choose to do it this way when that's not actually the best plan.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Віталій Тимчишин
27 липня 2009 р. 13:53 Robert Haas robertmh...@gmail.com написав:


 Hmm.  What you're suggesting here is that we could consider
 implementing OR conditions by rescanning the inner side for each index
 qual and then unique-ifying the results on the index column.  That's
 probably possible, but it doesn't sound easy, especially since our
 selectivity-estimation code for OR conditions is not very good, so we
 might choose to do it this way when that's not actually the best plan.

 ...Robert


Actually what I am talking about is to make OR with UNION (or UNION-like
because it's a little different depending on input rows uniqueness) as an
option. All of OR parts can use/not use different strategies (including
multiple different idexes or hash joins).
In cases when conditions are complex this can drastically increase
performance by winning over sequence scan.

As of selectivity, I'd say this is general problem - sometimes it is
estimated OK, sometimes not, but this should not prevent from trying
different plans. (From my current work: it does wrong estimations of filter
selectivity, introduces HASH join and kills the server with OOM).

Best regards, Vitaliy Tymchyshyn.


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Robert Haas
2009/7/27 Віталій Тимчишин tiv...@gmail.com:


 27 липня 2009 р. 13:53 Robert Haas robertmh...@gmail.com написав:

 Hmm.  What you're suggesting here is that we could consider
 implementing OR conditions by rescanning the inner side for each index
 qual and then unique-ifying the results on the index column.  That's
 probably possible, but it doesn't sound easy, especially since our
 selectivity-estimation code for OR conditions is not very good, so we
 might choose to do it this way when that's not actually the best plan.

 ...Robert

 Actually what I am talking about is to make OR with UNION (or UNION-like
 because it's a little different depending on input rows uniqueness) as an
 option. All of OR parts can use/not use different strategies (including
 multiple different idexes or hash joins).
 In cases when conditions are complex this can drastically increase
 performance by winning over sequence scan.

That's exactly what I was talking about.

 As of selectivity, I'd say this is general problem - sometimes it is
 estimated OK, sometimes not, but this should not prevent from trying
 different plans. (From my current work: it does wrong estimations of filter
 selectivity, introduces HASH join and kills the server with OOM).

Yep.  I think the two things that would help the most with this are
multi-column statistics and some kind of  cache, but AFAIK nobody is
actively developing code for either one ATM.

The problem, though, is that it won't ALWAYS be right to implement OR
using UNION, so you have to have some way of deciding which is better.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Віталій Тимчишин
27 липня 2009 р. 15:02 Robert Haas robertmh...@gmail.com написав:


 The problem, though, is that it won't ALWAYS be right to implement OR
 using UNION, so you have to have some way of deciding which is better.


That's easy - you propose both ways to planner and it's up to it to decide.
Yes, it can decide wrong way, but we are returning to statistics problem. At
least one can tune costs and enable_ settings. Now one have to rewrite query
that may be not possible/too complex.


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Tom Lane
=?KOI8-U?B?96bUwcymyiD0yc3eydvJzg==?= tiv...@gmail.com writes:
 Actually what I am talking about is to make OR with UNION (or UNION-like
 because it's a little different depending on input rows uniqueness) as an
 option. All of OR parts can use/not use different strategies (including
 multiple different idexes or hash joins).

AFAICS you're proposing re-inventing the old implementation of OR'd
indexscans.  We took that out when we added bitmap scans because it
didn't have any performance advantage over BitmapOr.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-27 Thread Віталій Тимчишин
27 липня 2009 р. 17:18 Tom Lane t...@sss.pgh.pa.us написав:

 =?KOI8-U?B?96bUwcymyiD0yc3eydvJzg==?= tiv...@gmail.com writes:
  Actually what I am talking about is to make OR with UNION (or UNION-like
  because it's a little different depending on input rows uniqueness) as an
  option. All of OR parts can use/not use different strategies (including
  multiple different idexes or hash joins).

 AFAICS you're proposing re-inventing the old implementation of OR'd
 indexscans.  We took that out when we added bitmap scans because it
 didn't have any performance advantage over BitmapOr.


It's not tied to indexscans at all. Different parts can do (as in UNION)
totally different strategy - e.g. perform two hash joins or perform merge
join for one part and nested loop for another or ...

As of performance - see above in this thread. UNION now often provides much
better performance when different parts of OR expression involve different
additional tables.


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Віталій Тимчишин
2009/7/20 Robert James srobertja...@gmail.com


 Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
 do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
 is this so?


It's not clever enough.

And how can I shut this off?


Use UNION/UNION ALL if possible in your case.


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Chris

Віталій Тимчишин wrote:



2009/7/20 Robert James srobertja...@gmail.com 
mailto:srobertja...@gmail.com



Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so? 



It's not clever enough.


Of course it is.

I'm running 8.3.7.

create table t1(id int primary key);
insert into t1(id) select a from generate_series(1, 50) as s(a);
analyze t1;

explain analyze select * from t1 where id=5000 or id=25937;
  QUERY PLAN 


--
 Bitmap Heap Scan on t1  (cost=8.60..16.44 rows=2 width=4) (actual 
time=0.077..0.083 rows=2 loops=1)

   Recheck Cond: ((id = 5000) OR (id = 25937))
   -  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual 
time=0.063..0.063 rows=0 loops=1)
 -  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 
width=0) (actual time=0.034..0.034 rows=1 loops=1)

   Index Cond: (id = 5000)
 -  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 
width=0) (actual time=0.021..0.021 rows=1 loops=1)

   Index Cond: (id = 25937)
 Total runtime: 0.153 ms
(8 rows)

What Robert didn't post was his query, see

http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php

which makes it a lot harder to 'optimize' since they aren't straight 
forward conditions.


--
Postgresql  php tutorials
http://www.designmagick.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Robert James
Query is:
select * from dict
where
 word in (select substr('moon', 0, generate_series(3,length('moon' --
this is my X above
 OR word like 'moon%' -- this is my Y above

dict is indexed on word
2009/7/20 Chris dmag...@gmail.com

 2009/7/20 Robert James srobertja...@gmail.com mailto:
 srobertja...@gmail.com


Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so?

 What Robert didn't post was his query, see

 http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php

 which makes it a lot harder to 'optimize' since they aren't straight
 forward conditions.



Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Віталій Тимчишин
20 липня 2009 р. 11:02 Chris dmag...@gmail.com написав:

 Віталій Тимчишин wrote:



 2009/7/20 Robert James srobertja...@gmail.com mailto:
 srobertja...@gmail.com


Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so?

 It's not clever enough.


 Of course it is.


For simple cases



 I'm running 8.3.7.

 create table t1(id int primary key);
 insert into t1(id) select a from generate_series(1, 50) as s(a);
 analyze t1;


explain analyze select * from t1 where
id  1

Index Scan using t1_pkey on t1  (cost=0.00..322.51 rows=9612 width=4)
(actual time=0.030..3.700 rows= loops=1)
  Index Cond: (id  1)
Total runtime: 4.835 ms

explain analyze select * from t1 where
id in (select (random() * 50)::int4 from generate_series(0,10))

Nested Loop  (cost=32.50..1341.49 rows=200 width=4) (actual
time=15.353..67.014 rows=11 loops=1)
  -  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual
time=0.028..0.043 rows=11 loops=1)
-  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
width=0) (actual time=0.014..0.020 rows=11 loops=1)
  -  Index Scan using t1_pkey on t1  (cost=0.00..6.52 rows=1 width=4)
(actual time=6.083..6.084 rows=1 loops=11)
Index Cond: (t1.id = (((random() * 50::double
precision))::integer))
Total runtime: 67.070 ms

explain analyze select * from t1 where
id in (select (random() * 50)::int4 from generate_series(0,10))
or
id  1

Seq Scan on t1  (cost=22.50..9735.50 rows=254806 width=4) (actual
time=0.049..148.947 rows=10010 loops=1)
  Filter: ((hashed subplan) OR (id  1))
  SubPlan
-  Function Scan on generate_series  (cost=0.00..20.00 rows=1000
width=0) (actual time=0.014..0.019 rows=11 loops=1)
Total runtime: 150.123 ms

explain analyze
select * from t1 where
id in (select (random() * 50)::int4 from generate_series(0,10))
union
select * from t1 where
id  1

Unique  (cost=2412.68..2461.74 rows=9812 width=4) (actual
time=89.190..95.014 rows=10010 loops=1)
  -  Sort  (cost=2412.68..2437.21 rows=9812 width=4) (actual
time=89.189..91.167 rows=10010 loops=1)
Sort Key: public.t1.id
Sort Method:  quicksort  Memory: 854kB
-  Append  (cost=32.50..1762.13 rows=9812 width=4) (actual
time=16.641..76.338 rows=10010 loops=1)
  -  Nested Loop  (cost=32.50..1341.49 rows=200 width=4)
(actual time=16.641..70.051 rows=11 loops=1)
-  HashAggregate  (cost=32.50..34.50 rows=200 width=4)
(actual time=0.033..0.049 rows=11 loops=1)
  -  Function Scan on generate_series
(cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11
loops=1)
-  Index Scan using t1_pkey on t1  (cost=0.00..6.52
rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)
  Index Cond: (public.t1.id = (((random() *
50::double precision))::integer))
  -  Index Scan using t1_pkey on t1  (cost=0.00..322.51
rows=9612 width=4) (actual time=0.023..4.075 rows= loops=1)
Index Cond: (id  1)
Total runtime: 112.694 ms

So, if it founds out anything complex, it sadly falls back to Sequence scan.


[PERFORM] Can Postgres use an INDEX over an OR?

2009-07-19 Thread Robert James
Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I do
WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why is
this so? And how can I shut this off?
select * from dict
where
 word in (select substr('moon', 0, generate_series(3,length('moon' --
this is my X above
 OR word like 'moon%' -- this is my Y above

Seq Scan on dict (cost=0.02..2775.66 rows=30422 width=24) (actual
time=16.635..28.580 rows=8 loops=1)
 Filter: ((hashed subplan) OR ((word)::text ~~ 'moon%'::text))
 SubPlan
 - Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.014..0.019 rows=2
loops=1)
Total runtime: 28.658 ms
(Using just X or Y alone uses the index, and completes in 0.150 ms)

Is this a bug?

PS Running PostgreSQL 8.2.1 on i686-pc-mingw32, compiled by GCC gcc.exe
(GCC) 3.4.2 (mingw-special)