Re: [PERFORM] Index usage for tstzrange?

2013-03-22 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 We should do this automatically. Or am I missing something?

Yes.  This is not equality.

 ALTER OPERATOR FAMILY integer_ops USING btree ADD
   OPERATOR 3 @ (int4, int4range),
   FUNCTION 1 btint4rangecmp(int4, int4range);

That will break approximately everything in sight, starting with the
planner's opinion of what equality is.  There is *way* too much stuff
that knows the semantics of btree opclasses for us to start jamming
random operators into them, even if this seemed to work in trivial
testing.  (See the last section of src/backend/access/nbtree/README
to just scratch the surface of the assumptions this breaks.)

It's possible that for constant ranges we could have the planner expand
intcol @ 'x,y'::int4range into intcol between x and y, using
something similar to the index LIKE optimization (ie, the special
operator stuff in indxpath.c).  I'd like to find a way to make that
type of optimization pluggable, though --- the existing approach of
hard-wiring knowledge into indxpath.c has never been anything but
a kluge, and it definitely doesn't scale as-is to anything except
built-in types and operators.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage for tstzrange?

2013-03-21 Thread Heikki Linnakangas

On 21.03.2013 06:07, Vasilis Ventirozos wrote:

On Thu, Mar 21, 2013 at 5:58 AM, Tom Lanet...@sss.pgh.pa.us  wrote:

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts@
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.


It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts@
tstzrange('2013-01-01','2013-04-01 00:10:00');
  QUERY PLAN

  Seq Scan on a  (cost=0.00..23.75 rows=1 width=44)
Filter: (ts@ '[2013-01-01 00:00:00+02,2013-04-01
00:10:00+03)'::tstzrange)


Looks like the range type cost estimation patch broke this, back in 
August already. The case of var @ constant, where constant is a range 
and var is an element, that's broken. The cost estimation function, 
rangesel(), incorrectly assumes that the 'var' is always a range type.


It's a bit worrying that no-one noticed until now. I'll add a test for 
that operator to the rangetypes regression test.


The immediate fix is attached, but this made me realize that rangesel() 
is still missing estimation for the element @ range operator. It 
shouldn't be hard to implement, I'm pretty sure we have all the 
statistics we need for that.


- Heikki
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 76dc913..c450c6a 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -154,8 +154,6 @@ rangesel(PG_FUNCTION_ARGS)
 		}
 	}
 
-	typcache = range_get_typcache(fcinfo, vardata.vartype);
-
 	/*
 	 * OK, there's a Var and a Const we're dealing with here.  We need the
 	 * Const to be of same range type as the column, else we can't do anything
@@ -169,6 +167,8 @@ rangesel(PG_FUNCTION_ARGS)
 	 */
 	if (operator == OID_RANGE_CONTAINS_ELEM_OP)
 	{
+		typcache = range_get_typcache(fcinfo, vardata.vartype);
+
 		if (((Const *) other)-consttype == typcache-rngelemtype-type_id)
 		{
 			RangeBound lower, upper;
@@ -185,6 +185,8 @@ rangesel(PG_FUNCTION_ARGS)
 	}
 	else
 	{
+		typcache = range_get_typcache(fcinfo, ((Const *) other)-consttype);
+
 		if (((Const *) other)-consttype == vardata.vartype)
 			constrange = DatumGetRangeType(((Const *) other)-constvalue);
 	}

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage for tstzrange?

2013-03-21 Thread Josh Berkus

 Well, no.  @ is not a btree-indexable operator.

Yes, but it's equivalent to ( ( a = b1 or b1 is null ) and ( a  b2 or
b2 is null ) ), which *is* btree-indexable and can use an index.  So it
seems like the kind of optimization we could eventually make.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index usage for tstzrange?

2013-03-20 Thread Josh Berkus
Folks,

I just noticed that if I use a tstzrange for convenience, a standard
btree index on a timestamp won't get used for it.  Example:

table a (
id int,
val text,
ts timestamptz
);
index a_ts on a(ts);

SELECT * FROM a WHERE ts @ tstzrange('2013-01-01','2013-01-01 00:10:00')

... will NOT use the index a_ts.  Is this something which could be fixed
for 9.4?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage for tstzrange?

2013-03-20 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I just noticed that if I use a tstzrange for convenience, a standard
 btree index on a timestamp won't get used for it.  Example:

 table a (
   id int,
   val text,
   ts timestamptz
 );
 index a_ts on a(ts);

 SELECT * FROM a WHERE ts @ tstzrange('2013-01-01','2013-01-01 00:10:00')

 ... will NOT use the index a_ts.

Well, no.  @ is not a btree-indexable operator.

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts @ 
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage for min() vs. order by asc limit 1

2011-11-18 Thread MirrorX
can you run an analyze command first and then post here the results of:
select * FROM pg_stats WHERE tablename = 'delayed_jobs'; 
?

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/index-usage-for-min-vs-order-by-asc-limit-1-tp5002928p5004410.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] index usage for min() vs. order by asc limit 1

2011-11-17 Thread Ben Chobot
I have two queries in PG 9.1. One uses an index like I would like, the other 
does not. Is this expected behavior? If so, is there any way around it? 


postgres=# explain analyze select min(id) from delayed_jobs where 
strand='sis_batch:account:15' group by strand;
QUERY PLAN
--
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
time=226.759..226.760 rows=1 loops=1)
   -  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
(actual time=0.014..169.941 rows=72268 loops=1)
 Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
(4 rows)

postgres=# explain analyze select id from delayed_jobs where 
strand='sis_batch:account:15' order by id limit 1;
   QUERY 
PLAN
-
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
loops=1)
   -  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
(cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
loops=1)
 Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
(4 rows)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage for min() vs. order by asc limit 1

2011-11-17 Thread Steve Atkins

On Nov 17, 2011, at 5:12 PM, Ben Chobot wrote:

 I have two queries in PG 9.1. One uses an index like I would like, the other 
 does not. Is this expected behavior? If so, is there any way around it? 

I don't think you want the group by in that first query.

Cheers,
  Steve

 
 
 postgres=# explain analyze select min(id) from delayed_jobs where 
 strand='sis_batch:account:15' group by strand;
QUERY PLAN
 --
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
 time=226.759..226.760 rows=1 loops=1)
   -  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
 (actual time=0.014..169.941 rows=72268 loops=1)
 Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
 (4 rows)
 
 postgres=# explain analyze select id from delayed_jobs where 
 strand='sis_batch:account:15' order by id limit 1;
   QUERY 
 PLAN
 -
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
 loops=1)
   -  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
 (cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
 loops=1)
 Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
 (4 rows)
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage for min() vs. order by asc limit 1

2011-11-17 Thread Ben Chobot
On Nov 17, 2011, at 5:20 PM, Steve Atkins wrote:
 
 I don't think you want the group by in that first query.

Heh, I tried to simply the example, but in reality that = becomes an in clause 
of multiple values. So the group by is needed.


 
 
 postgres=# explain analyze select min(id) from delayed_jobs where 
 strand='sis_batch:account:15' group by strand;
   QUERY PLAN
 --
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
 time=226.759..226.760 rows=1 loops=1)
  -  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
 (actual time=0.014..169.941 rows=72268 loops=1)
Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
 (4 rows)
 
 postgres=# explain analyze select id from delayed_jobs where 
 strand='sis_batch:account:15' order by id limit 1;
  QUERY 
 PLAN
 -
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
 loops=1)
  -  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
 (cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
 loops=1)
Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
 (4 rows)
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] index usage on queries on inherited tables

2011-04-29 Thread Robert Haas
On Apr 27, 2011, at 11:11 PM, Joseph Shraibman j...@selectacast.net wrote:
 On 04/27/2011 04:32 PM, Robert Haas wrote:
 In the first case, PostgreSQL evidently thinks that using the indexes
 will be slower than just ignoring them.  You could find out whether
 it's right by trying it with enable_seqscan=off.
 
 My point is that this is just a problem with inherited tables.  It
 should be obvious to postgres that few rows are being returned, but in
 the inherited tables case it doesn't use indexes.  This was just an
 example.  In a 52 gig table I have a select id from table limit 1 order
 by id desc returns instantly, but as soon as you declare a child table
 it tries to seq scan all the tables.

Oh, sorry, I must have misunderstood. As Greg says, this is fixed in 9.1.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Robert Haas
On Fri, Apr 1, 2011 at 2:41 AM, Joseph Shraibman j...@selectacast.net wrote:
 When I do a query on a table with child tables on certain queries pg
 uses indexes and on others it doesn't. Why does this happen? For example:


 [local]:playpen= explain analyze select * from vis where id  10747 ;
                                                               QUERY
 PLAN
 
  Result  (cost=4.29..115.11 rows=325 width=634) (actual
 time=0.063..0.116 rows=5 loops=1)
   -  Append  (cost=4.29..115.11 rows=325 width=634) (actual
 time=0.053..0.090 rows=5 loops=1)
         -  Bitmap Heap Scan on vis  (cost=4.29..23.11 rows=5
 width=948) (actual time=0.051..0.058 rows=5 loops=1)
               Recheck Cond: (id  10747)
               -  Bitmap Index Scan on vis_pkey  (cost=0.00..4.29
 rows=5 width=0) (actual time=0.037..0.037 rows=5 loops=1)
                     Index Cond: (id  10747)
         -  Seq Scan on vis_for_seg_1_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_4_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_66_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_69_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_79_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_80_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_82_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (id  10747)
         -  Seq Scan on vis_for_seg_87_2011_03 vis  (cost=0.00..11.50
 rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: (id  10747)
  Total runtime: 0.724 ms
 (23 rows)

 Time: 5.804 ms
 [local]:playpen= explain analyze select * from vis where id = 10747 ;

 QUERY
 PLAN

 
  Result  (cost=0.00..74.41 rows=9 width=664) (actual time=0.060..0.503
 rows=1 loops=1)
   -  Append  (cost=0.00..74.41 rows=9 width=664) (actual
 time=0.053..0.493 rows=1 loops=1)
         -  Index Scan using vis_pkey on vis  (cost=0.00..8.27 rows=1
 width=948) (actual time=0.051..0.055 rows=1 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_1_2011_03_pkey on
 vis_for_seg_1_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.122..0.122 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_4_2011_03_pkey on
 vis_for_seg_4_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_66_2011_03_pkey on
 vis_for_seg_66_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_69_2011_03_pkey on
 vis_for_seg_69_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_79_2011_03_pkey on
 vis_for_seg_79_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_80_2011_03_pkey on
 vis_for_seg_80_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.041..0.041 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_82_2011_03_pkey on
 vis_for_seg_82_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.049..0.049 rows=0 loops=1)
               Index Cond: (id = 10747)
         -  Index Scan using vis_for_seg_87_2011_03_pkey on
 vis_for_seg_87_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
 time=0.043..0.043 rows=0 loops=1)
               Index Cond: (id = 10747)
  Total runtime: 1.110 ms
 (21 rows)

 [local]:playpen= select version();

 version
 
  PostgreSQL 9.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
 20080704 (Red Hat 4.1.2-48), 32-bit
 (1 row)

In the first 

Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Joseph Shraibman
On 04/27/2011 04:32 PM, Robert Haas wrote:
 In the first case, PostgreSQL evidently thinks that using the indexes
 will be slower than just ignoring them.  You could find out whether
 it's right by trying it with enable_seqscan=off.

My point is that this is just a problem with inherited tables.  It
should be obvious to postgres that few rows are being returned, but in
the inherited tables case it doesn't use indexes.  This was just an
example.  In a 52 gig table I have a select id from table limit 1 order
by id desc returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Samuel Gendler
On Wed, Apr 27, 2011 at 2:11 PM, Joseph Shraibman j...@selectacast.netwrote:

 On 04/27/2011 04:32 PM, Robert Haas wrote:
  In the first case, PostgreSQL evidently thinks that using the indexes
  will be slower than just ignoring them.  You could find out whether
  it's right by trying it with enable_seqscan=off.

 My point is that this is just a problem with inherited tables.  It
 should be obvious to postgres that few rows are being returned, but in
 the inherited tables case it doesn't use indexes.  This was just an
 example.  In a 52 gig table I have a select id from table limit 1 order
 by id desc returns instantly, but as soon as you declare a child table
 it tries to seq scan all the tables.


If I'm understanding correctly, this kind of obviates the utility of
partitioning if you structure a warehouse in a traditional manner.  Assuming
a fact table partitioned by time, but with foreign keys to a time dimension,
it is now not possible to gain any advantage from the partitioning if
selecting on columns in the time dimension.

select * from fact_table f join time_dimension t on f.time_id = t.time_id
where t.quarter=3 and t.year = 2010 will scan all partitions of the fact
table despite the fact that all of the rows would come from 3 partitions,
assuming a partitioning schema that uses one partition for each month.

I use a time id that is calculable from the from the timestamp so it doesn't
need to be looked up, and partitioning on time_id directly is easy enough to
handle, but if I'm understanding the problem, it sounds like nothing short
of computing the appropriate time ids before issuing the query and then
including a 'where f.time_id between x and y' clause to the query will
result in the partitions being correctly excluded.  Is that what people are
doing to solve this problem?  The alternative is to leave a timestamp column
in the fact table (something I tend to do since it makes typing ad-hoc
queries in psql much easier) and partition on that column and then always
include a where clause for that column that is at least as large as the
requested row range.  Both result in fairly ugly queries, though I can
certainly see how I might structure my code to always build queries which
adhere to this.

I'm just in the process of designing a star schema for a project and was
intending to use exactly the structure I described at the top of the email.
Is there a postgres best-practices for solving this problem? There's no way
I can get away without partitioning.  I'm looking at a worst case table of
100,000 rows being written every 5 minutes, 24x7 - 29 million rows per day,
a billion rows per month - with most queries running over a single month or
comparing same months from differing years and quarters - so a month based
partitioning.  Normal case is closer to 10K rows per 5 minutes.

Suggestions?

--sam


Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Greg Smith

Joseph Shraibman wrote:

In a 52 gig table I have a select id from table limit 1 order
by id desc returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.
  


This is probably the limitation that's fixed in PostgreSQL 9.1 by this 
commit (following a few others leading up to it):  
http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php


There was a good example showing what didn't work as expected before 
(along with an earlier patch that didn't everything the larger 9.1 
improvement does) at 
http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php ; 
ORDER BY x DESC LIMIT 1 returns the same things as MAX(x).


It's a pretty serious issue with the partitioning in earlier versions.  
I know of multiple people, myself included, who have been compelled to 
apply this change to an earlier version of PostgreSQL to make larger 
partitioned databases work correctly.  The other option is to manually 
decompose the queries into ones that target each of the child tables 
individually, then combine the results, which is no fun either.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index usage with functions in where condition

2010-07-09 Thread Jeremy Palmer
I'm having trouble getting the query planner to use indexes. The situation 
occurs when writing a query that uses functions for defining the parameters for 
the conditions on the indexed columns. The system I'm running is Windows Server 
2003, using version 8.4.2 of PostgreSQL.

This is the following table that I'm running my query against:

CREATE TABLE crs_coordinate
(
  id integer NOT NULL,
  nod_id integer NOT NULL,
  value1 numeric(22,12),
  value2 numeric(22,12),
  CONSTRAINT crs_coordinate_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX coo_value1 ON crs_coordinate USING btree (value1);
CREATE INDEX coo_value2 ON crs_coordinate USING btree (value2);

This table has 23 million rows in it and was analysed just before planning my 
queries. 

This is the query that does not use the indexes:

SELECT
  coo.nod_id,
  640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM 
  crs_coordinate coo
WHERE
  coo.value1 between -41.0618-degrees(1200.0/640.0) and 
-41.0618+degrees(1200.0/640.0) and
  coo.value2 between 
175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Seq Scan on crs_coordinate coo  (cost=0.00..1039607.49 rows=592 width=28)
  Filter: (((value1)::double precision = (-41.0725429586587)::double 
precision) AND ((value1)::double precision = (-41.0510570413413)::double 
precision) AND ((value2)::double precision = 175.570362072701::double 
precision) AND ((value2)::double precision = 175.598857927299::double 
precision))

However if I pre-evaluated the parameters for the where condition on the value1 
and value2 columns, the planner chooses to use the indexes:

SELECT
  coo.nod_id,
  640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM 
  crs_coordinate coo
WHERE
  coo.value1 BETWEEN -41.07254296 AND -41.05105704 AND
  coo.value2 BETWEEN 175.57036207 AND 175.59885792;

Bitmap Heap Scan on crs_coordinate coo  (cost=5299.61..6705.41 rows=356 
width=28)
  Recheck Cond: ((value1 = (-41.07254296)) AND (value1 = (-41.05105704)) AND 
(value2 = 175.57036207) AND (value2 = 175.59885792))
  -  BitmapAnd  (cost=5299.61..5299.61 rows=356 width=0)
-  Bitmap Index Scan on coo_value1  (cost=0.00..1401.12 rows=54923 
width=0)
  Index Cond: ((value1 = (-41.07254296)) AND (value1 = 
(-41.05105704)))
-  Bitmap Index Scan on coo_value2  (cost=0.00..3898.06 rows=153417 
width=0)
  Index Cond: ((value2 = 175.57036207) AND (value2 = 
175.59885792))

So why is the first query not using the indexes on the value1 and value2 
columns? I'm assuming that both the COS and RAIDIANS functions are STRICT 
IMMUTABLE, so logically the evaluation of these functions in the where clause 
should be inlined. Looking at the query plan this inlining does seem to be 
happening...

At this stage I have a work around by putting the query into a plpgsql function 
and using dynamic SQL. But it is still frustrating why the planner seems to be 
working in a far from optimal fashion. Can anyone shed some light on this for 
me?

Thanks,
Jeremy

__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage with functions in where condition

2010-07-09 Thread Tom Lane
Jeremy Palmer jpal...@linz.govt.nz writes:
 This is the query that does not use the indexes:

 SELECT
   coo.nod_id,
   640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
 175.58461)*cos(radians(-41.0618)))^2)) as distance
 FROM 
   crs_coordinate coo
 WHERE
   coo.value1 between -41.0618-degrees(1200.0/640.0) and 
 -41.0618+degrees(1200.0/640.0) and
   coo.value2 between 
 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Those expressions yield float8, not numeric, and numeric vs float8 isn't
an indexable operator for reasons we needn't get into here.  You should
probably rethink whether numeric is really the best choice of datatype
for your columns, if this is the sort of value you expect to work with
--- you're paying a considerable price in speed and space for
perhaps-illusory precision gains.  But if you insist on using numeric
then the solution is to cast the expression results to numeric
explicitly.

BTW I wonder whether you ought not be looking into postgis rather than
rolling-your-own coordinate arithmetic ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage with functions in where condition

2010-07-09 Thread Jeremy Palmer
Hi Tom,

Thanks for the help - much appreciated.

Yes I'm using PostGIS, and with a simply join to a relating table I could get 
access to the geometry for these point positions. Is using the GIST r-tree 
index faster than using the 2 b-tree indexes on the lat and long values? I 
guess this is a question for the PostGIS guys and a quick test could tell me 
anyway! My memory is that the GIST r-tree index is slow for points at the 
moment, and that a good implementation of a kd-tree index over GIST is required 
for better speed.

Regards,

Jeremy Palmer
Geodetic Surveyor
National Geodetic Office
 
Land Information New Zealand | Toitu te whenua
160 Lambton Quay | Private Box 5501 | Wellington 6145
 
DDI: 64 (0)4 498 3537 | Fax: 64 (0)4 498 3837 | www.linz.govt.nz

 
-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Saturday, 10 July 2010 11:20 a.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Index usage with functions in where condition 

Jeremy Palmer jpal...@linz.govt.nz writes:
 This is the query that does not use the indexes:

 SELECT
   coo.nod_id,
   640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
 175.58461)*cos(radians(-41.0618)))^2)) as distance
 FROM 
   crs_coordinate coo
 WHERE
   coo.value1 between -41.0618-degrees(1200.0/640.0) and 
 -41.0618+degrees(1200.0/640.0) and
   coo.value2 between 
 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Those expressions yield float8, not numeric, and numeric vs float8 isn't
an indexable operator for reasons we needn't get into here.  You should
probably rethink whether numeric is really the best choice of datatype
for your columns, if this is the sort of value you expect to work with
--- you're paying a considerable price in speed and space for
perhaps-illusory precision gains.  But if you insist on using numeric
then the solution is to cast the expression results to numeric
explicitly.

BTW I wonder whether you ought not be looking into postgis rather than
rolling-your-own coordinate arithmetic ...

regards, tom lane
__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] index usage in not like

2010-02-18 Thread AI Rumman
Not like operation does not use index.

select * from vtiger_contactscf where lower(cf_1253) not like
lower('Former%')

I created index on lower(cf_1253).

How can I ensure index usage in not like operation?
Anyone please help.


Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown
On 18 February 2010 11:55, AI Rumman rumman...@gmail.com wrote:
 Not like operation does not use index.

 select * from vtiger_contactscf where lower(cf_1253) not like
 lower('Former%')

 I created index on lower(cf_1253).

 How can I ensure index usage in not like operation?
 Anyone please help.


How many rows do you have in your table?  If there are relatively few,
it probably guesses it to be cheaper to do a sequential scan and
calculate lower values on-the-fly rather than bother with the index.

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown
 On Thu, Feb 18, 2010 at 6:00 PM, Thom Brown thombr...@gmail.com wrote:

 On 18 February 2010 11:55, AI Rumman rumman...@gmail.com wrote:
  Not like operation does not use index.
 
  select * from vtiger_contactscf where lower(cf_1253) not like
  lower('Former%')
 
  I created index on lower(cf_1253).
 
  How can I ensure index usage in not like operation?
  Anyone please help.
 

 How many rows do you have in your table?  If there are relatively few,
 it probably guesses it to be cheaper to do a sequential scan and
 calculate lower values on-the-fly rather than bother with the index.

 Thom

On 18 February 2010 12:06, AI Rumman rumman...@gmail.com wrote:
 vtigercrm504=# explain analyze select * from vtiger_contactscf where
 lower(cf_1253)  like 'customer';

 QUERY
 PLAN
 
  Index Scan using vtiger_contactscf_cf_1253_idx on vtiger_contactscf
 (cost=0.00..146.54 rows=6093 width=179) (actual time=0.083..29.868 rows=5171
 loops=1)
Index Cond: (lower((cf_1253)::text) ~=~ 'customer'::character varying)
Filter: (lower((cf_1253)::text) ~~ 'customer'::text)
  Total runtime: 34.956 ms
 (4 rows)
 vtigercrm504=# explain analyze select * from vtiger_contactscf where
 lower(cf_1253)  like 'customer';

 QUERY
 PLAN
 
  Index Scan using vtiger_contactscf_cf_1253_idx on vtiger_contactscf
 (cost=0.00..146.54 rows=6093 width=179) (actual time=0.083..29.868 rows=5171
 loops=1)
Index Cond: (lower((cf_1253)::text) ~=~ 'customer'::character varying)
Filter: (lower((cf_1253)::text) ~~ 'customer'::text)
  Total runtime: 34.956 ms
 (4 rows)

Could you do the same again for a not like query?

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage in not like

2010-02-18 Thread A. Kretschmer
In response to Thom Brown :
 On 18 February 2010 11:55, AI Rumman rumman...@gmail.com wrote:
  Not like operation does not use index.
 
  select * from vtiger_contactscf where lower(cf_1253) not like
  lower('Former%')
 
  I created index on lower(cf_1253).
 
  How can I ensure index usage in not like operation?
  Anyone please help.
 
 
 How many rows do you have in your table?  If there are relatively few,
 it probably guesses it to be cheaper to do a sequential scan and
 calculate lower values on-the-fly rather than bother with the index.

That's one reason, an other reason, i think, is, that a btree-index can't
search with an 'not like' - operator.



test=*# insert into words select 'fucking example' from 
generate_series(1,1);
INSERT 0 1
test=*# insert into words select 'abc' from generate_series(1,10);
INSERT 0 10
test=*# explain select * from words where lower(w)  like lower('a%') or 
lower(w)  like lower('b%');
 QUERY PLAN
-
 Bitmap Heap Scan on words  (cost=1538.75..6933.39 rows=55643 width=36)
   Recheck Cond: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
   Filter: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
   -  BitmapOr  (cost=1538.75..1538.75 rows=57432 width=0)
 -  Bitmap Index Scan on idx_words  (cost=0.00..1027.04 rows=39073 
width=0)
   Index Cond: ((lower(w) ~=~ 'a'::text) AND (lower(w) ~~ 
'b'::text))
 -  Bitmap Index Scan on idx_words  (cost=0.00..483.90 rows=18359 
width=0)
   Index Cond: ((lower(w) ~=~ 'b'::text) AND (lower(w) ~~ 
'c'::text))
(8 rows)

test=*# explain select * from words where lower(w) not like lower('a%') or 
lower(w)  like lower('b%');
QUERY PLAN
---
 Seq Scan on words  (cost=0.00..10624.48 rows=282609 width=36)
   Filter: ((lower(w) !~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
(2 rows)


In other words: revert your where-condition from 'not like' to multiple 'like' 
conditions for all letters except 'f%'.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage in not like

2010-02-18 Thread Kenneth Marshall
On Thu, Feb 18, 2010 at 01:18:10PM +0100, A. Kretschmer wrote:
 In response to Thom Brown :
  On 18 February 2010 11:55, AI Rumman rumman...@gmail.com wrote:
   Not like operation does not use index.
  
   select * from vtiger_contactscf where lower(cf_1253) not like
   lower('Former%')
  
   I created index on lower(cf_1253).
  
   How can I ensure index usage in not like operation?
   Anyone please help.
  
  
  How many rows do you have in your table?  If there are relatively few,
  it probably guesses it to be cheaper to do a sequential scan and
  calculate lower values on-the-fly rather than bother with the index.
 
 That's one reason, an other reason, i think, is, that a btree-index can't
 search with an 'not like' - operator.
 
 
 
 test=*# insert into words select 'fucking example' from 
 generate_series(1,1);
 INSERT 0 1
 test=*# insert into words select 'abc' from generate_series(1,10);
 INSERT 0 10
 test=*# explain select * from words where lower(w)  like lower('a%') or 
 lower(w)  like lower('b%');
  QUERY PLAN
 -
  Bitmap Heap Scan on words  (cost=1538.75..6933.39 rows=55643 width=36)
Recheck Cond: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
Filter: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
-  BitmapOr  (cost=1538.75..1538.75 rows=57432 width=0)
  -  Bitmap Index Scan on idx_words  (cost=0.00..1027.04 rows=39073 
 width=0)
Index Cond: ((lower(w) ~=~ 'a'::text) AND (lower(w) ~~ 
 'b'::text))
  -  Bitmap Index Scan on idx_words  (cost=0.00..483.90 rows=18359 
 width=0)
Index Cond: ((lower(w) ~=~ 'b'::text) AND (lower(w) ~~ 
 'c'::text))
 (8 rows)
 
 test=*# explain select * from words where lower(w) not like lower('a%') or 
 lower(w)  like lower('b%');
 QUERY PLAN
 ---
  Seq Scan on words  (cost=0.00..10624.48 rows=282609 width=36)
Filter: ((lower(w) !~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
 (2 rows)
 
 
 In other words: revert your where-condition from 'not like' to multiple 
 'like' conditions for all letters except 'f%'.
 
 
 Andreas

The 'not like' condition is likely to be extremely non-selective
which would cause a sequential scan to be used in any event whether
or not an index could be used.

Cheers,
Ken


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown
On 18 February 2010 12:18, A. Kretschmer
andreas.kretsch...@schollglas.com wrote:
 In response to Thom Brown :
 On 18 February 2010 11:55, AI Rumman rumman...@gmail.com wrote:
  Not like operation does not use index.
 
  select * from vtiger_contactscf where lower(cf_1253) not like
  lower('Former%')
 
  I created index on lower(cf_1253).
 
  How can I ensure index usage in not like operation?
  Anyone please help.
 

 How many rows do you have in your table?  If there are relatively few,
 it probably guesses it to be cheaper to do a sequential scan and
 calculate lower values on-the-fly rather than bother with the index.

 That's one reason, an other reason, i think, is, that a btree-index can't
 search with an 'not like' - operator.


Erm.. yes.  Now that you say it, it's obvious. :S

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] index usage in not like

2010-02-18 Thread A. Kretschmer
In response to Kenneth Marshall :
   How many rows do you have in your table?  If there are relatively few,
   it probably guesses it to be cheaper to do a sequential scan and
   calculate lower values on-the-fly rather than bother with the index.
  
  That's one reason, an other reason, i think, is, that a btree-index can't
  search with an 'not like' - operator.
 
 The 'not like' condition is likely to be extremely non-selective
 which would cause a sequential scan to be used in any event whether
 or not an index could be used.

That's true, but i have an example where the 'not like' condition is
extremely selective:

,[  sql  ]
| test=*# select count(1) from words where lower(w) not like lower('f%');
|  count
| ---
| 10
| (1 row)
|
| test=*# select count(1) from words where lower(w)  like lower('f%');
|  count
| ---
|  1
| (1 row)
`


But the index can't use:

,[  code  ]
| test=*# explain select * from words where lower(w) not like lower('f%');
| QUERY PLAN
| --
|  Seq Scan on words  (cost=0.00..4396.15 rows=10 width=47)
|Filter: (lower(w) !~~ 'f%'::text)
| (2 rows)
`


And i think, the reason is:

,[  quote from docu  ]
| B-trees can handle equality and range queries on data that can be sorted
| into some ordering. In particular, the PostgreSQL query planner will
| consider using a B-tree index whenever an indexed column is involved in
| a comparison using one of these operators:
|
| 
| =
| =
| =
| 
`



Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index usage with sub select or inner joins

2008-11-12 Thread Julien Theulier
Hello,

I am doing some performances testing on Postgres  I discovered the
following behavior, when using 2 different ways of writing selects (but
doing the same aggregations at the end):
1. test case 1, using outer join:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,a.item_id,
coalesce(sum(case when (bid_datepv_timestamp and bid_date=pv_timestamp -
INTERVAL '3 day') then 1 else 0 end)) as recent_sales_3d1,
coalesce(sum(case when (bid_datepv_timestamp and bid_date=pv_timestamp -
INTERVAL '7 day') then 1 else 0 end)) as recent_sales_7d1,
coalesce(sum(case when (bid_datepv_timestamp and bid_date=pv_timestamp -
INTERVAL '14 day') then 1 else 0 end)) as recent_sales_14d1,
coalesce(sum(case when (bid_datepv_timestamp and bid_date=pv_timestamp -
INTERVAL '30 day') then 1 else 0 end)) as recent_sales_30d1,
coalesce(sum(case when (bid_datepv_timestamp and bid_date=pv_timestamp -
INTERVAL '60 day') then 1 else 0 end)) as recent_sales_60d1
from bm_us_views_main_1609 a
left outer join bm_us_bids b on (b.item_id=a.item_id and
b.bid_datea.pv_timestamp and (b.bid_date=a.pv_timestamp - INTERVAL '60
day'))
where a.item_type in (7,9) and qty1
group by soj_session_log_id, pv_timestamp, vi_pv_id, a.item_id;;

This query doesn't use any index according to the explain plan:
HashAggregate  (cost=672109.07..683054.81 rows=182429 width=49)
  -  Merge Left Join  (cost=646489.83..668004.42 rows=182429 width=49)
Merge Cond: (a.item_id = b.item_id)
Join Filter: ((b.bid_date  a.pv_timestamp) AND (b.bid_date =
(a.pv_timestamp - '60 days'::interval)))
-  Sort  (cost=331768.62..332224.69 rows=182429 width=41)
  Sort Key: a.item_id
  -  Seq Scan on bm_us_views_main_1609 a
(cost=0.00..315827.08 rows=182429 width=41)
Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND
(qty  1))
-  Sort  (cost=314669.01..320949.52 rows=2512205 width=19)
  Sort Key: b.item_id
  -  Seq Scan on bm_us_bids b  (cost=0.00..47615.05
rows=2512205 width=19)

2. Test case 2, using sub queries:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,item_id,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_datepv_timestamp and bid_date=pv_timestamp - INTERVAL '3 day' group by
item_id ),0) as recent_sales_3d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_datepv_timestamp and bid_date=pv_timestamp - INTERVAL '7 day' group by
item_id ),0) as recent_sales_7d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_datepv_timestamp and bid_date=pv_timestamp - INTERVAL '14 day' group
by item_id ),0) as recent_sales_14d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_datepv_timestamp and bid_date=pv_timestamp - INTERVAL '30 day' group
by item_id ),0) as recent_sales_30d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_datepv_timestamp and bid_date=pv_timestamp - INTERVAL '60 day' group
by item_id ),0) as recent_sales_60d
from bm_us_views_main_1609 a
where item_type in (7,9) and qty1;

This query uses indexes  according to the explain plan:
Seq Scan on bm_us_views_main_1609 a  (cost=0.00..8720230.77 rows=182429
width=41)
  Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty  1))
  SubPlan
-  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)
  -  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)
Index Cond: ((item_id = $0) AND (bid_date  $1) AND
(bid_date = ($1 - '60 days'::interval)))
-  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)
  -  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)
Index Cond: ((item_id = $0) AND (bid_date  $1) AND
(bid_date = ($1 - '30 days'::interval)))
-  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)
  -  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)
Index Cond: ((item_id = $0) AND (bid_date  $1) AND
(bid_date = ($1 - '14 days'::interval)))
-  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)
  -  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)
Index Cond: ((item_id = $0) AND (bid_date  $1) AND
(bid_date = ($1 - '7 days'::interval)))
-  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)
  -  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)
Index Cond: ((item_id = $0) AND (bid_date  $1) AND
(bid_date = ($1 - '3 days'::interval)))

The index bm_us_bids_item_ix is on columns item_id, bidder_id, bid_date


QUESTION: Why the planner choose seq scan in the first case  indexes scan
in the second case? In a more general way, I observed that the planner has
difficulties to select index scans  does in almost all the cases seq scan,
when doing 

Re: [PERFORM] Index usage with sub select or outer joins

2008-11-12 Thread Julien Theulier
Hello, Joshua,

I did different test cases and here are the results (numbers in seconds),
using (case sub queries) or not (case join) the index:
Rows (main table)   Outer Join  Sub queries
setting
1396163 rows39.219.6
work_mem=256Mb
3347443 rows72.2203.1
work_mem=256Mb
3347443 rows70.331.1
work_mem=1024Mb
4321072 rows115 554.9
work_mem=256Mb
4321072 rows111 583
work_mem=1024Mb
All outer joins where done without index uses

To force the use of the index for the first case (outer join), I have change
the seq_scan cost (from 1 to 2.5), it takes now only 6.1s for the outer join
on 1.4M rows. New explain plan below:
HashAggregate  (cost=457881.84..460248.84 rows=39450 width=49)
  -  Nested Loop Left Join  (cost=0.00..456994.22 rows=39450 width=49)
-  Seq Scan on bm_us_views_main_2608 a  (cost=0.00..223677.45
rows=39450 width=41)
  Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty 
1))
-  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..5.65 rows=13 width=19)
  Index Cond: ((b.item_id = a.item_id) AND (b.bid_date 
a.pv_timestamp) AND (b.bid_date = (a.pv_timestamp - '60 days'::interval)))

Index bm_us_bids_item_ix is on item_id, bidder_id (not used in the
condition)  bid_date

What can be the recommendations on tuning the different costs so it can
better estimate the seq scan  index scans costs? I think the issue is
there. But didn't find any figures helping to choose the correct parameters
according to cpu  disks speed

Regards,
Julien Theulier

-Message d'origine-
De : Joshua Tolley [mailto:[EMAIL PROTECTED] 
Envoyé : mercredi 12 novembre 2008 14:54
À : Julien Theulier
Cc : pgsql-performance@postgresql.org
Objet : Re: [PERFORM] Index usage with sub select or inner joins

On Wed, Nov 12, 2008 at 02:22:47PM +0100, Julien Theulier wrote:
 QUESTION: Why the planner choose seq scan in the first case  indexes 
 scan in the second case? In a more general way, I observed that the 
 planner has difficulties to select index scans  does in almost all 
 the cases seq scan, when doing join queries. After investigations, it 
 looks like when you join table a with table b on a column x and y and 
 you have an index on column x only, the planner is not able to choose 
 the index scan. You have to build the index corresponding exactly to 
 the join statement btw the 2 tables

Short, general answer: index scans aren't always faster than sequential
scans, and the planner is smart enough to know that. Googling Why isn't
postgresql using my index provides more detailed results, but in short, if
it scans an index, it has to read pages from the index, and for all the
tuples it finds in the index, it has to read once again from the heap,
whereas a sequential scan requires reading once from the heap. If your query
will visit most of the rows of the table, pgsql will choose a sequential
scan over an index scan.

- Josh / eggyknap


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage with sub select or inner joins

2008-11-12 Thread Joshua Tolley
On Wed, Nov 12, 2008 at 02:22:47PM +0100, Julien Theulier wrote:
 QUESTION: Why the planner choose seq scan in the first case  indexes scan
 in the second case? In a more general way, I observed that the planner has
 difficulties to select index scans  does in almost all the cases seq scan,
 when doing join queries. After investigations, it looks like when you join
 table a with table b on a column x and y and you have an index on column x
 only, the planner is not able to choose the index scan. You have to build
 the index corresponding exactly to the join statement btw the 2 tables

Short, general answer: index scans aren't always faster than sequential
scans, and the planner is smart enough to know that. Googling Why isn't
postgresql using my index provides more detailed results, but in short,
if it scans an index, it has to read pages from the index, and for all
the tuples it finds in the index, it has to read once again from the
heap, whereas a sequential scan requires reading once from the heap. If
your query will visit most of the rows of the table, pgsql will choose a
sequential scan over an index scan.

- Josh / eggyknap


signature.asc
Description: Digital signature


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Jeff Frost [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Huh.  That does sound like it's a version-to-version difference.
 There's nothing in the CVS log that seems related though.  Are you
 willing to post your test case?
 
 It's a customer DB, so I'll contact them and see if we can boil it down
 to a test case with no sensitive data.

 Well, if there was a change it seems to have been in the right direction
 ;-) so this is mostly just idle curiosity.  Don't jump through hoops to
 get a test case.

Assuming it's not a bug...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Jeff Frost

On Fri, 31 Oct 2008, Gregory Stark wrote:


Tom Lane [EMAIL PROTECTED] writes:


Jeff Frost [EMAIL PROTECTED] writes:

Tom Lane wrote:

Huh.  That does sound like it's a version-to-version difference.
There's nothing in the CVS log that seems related though.  Are you
willing to post your test case?


It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.


Well, if there was a change it seems to have been in the right direction
;-) so this is mostly just idle curiosity.  Don't jump through hoops to
get a test case.


Assuming it's not a bug...


Well, after boiling down my test case to the bare essentials, I was unable to 
reproduce the different behavior between 8.3.3 and 8.3.4.  Now, I've gone back 
to the original script and can't reproduce the behavior I previously saw on 
8.3.4 and my screen session doesn't have enough scrollback to look at what 
happened previously.  I was thinking perhaps I had inadvertently committed the 
transaction, but then the act would have been dropped as it's a temp table 
created with ON COMMIT DROP.  But, I've tested 3 times in a row and every time 
8.3.4 uses the seq scan just like 8.3.3 now, so I must've done something 
differently to get that result as Tom had originally suggested.  I just can't 
think what it might have been.  Perhaps it's time to buy some glasses. :-/


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost
I've run across a strange problem with PG 8.3.3 not using indexes on a
particular table after building the table during a transaction.

You can see a transcript of the issue here:

http://gist.github.com/21154

Interestingly, if I create another temp table 'CREATE TEMP TABLE AS
SELECT * FROM act' as seen on line 107, then add the same indexes to
that table, PG will use the indexes.  While it's not in the gist
transcript, even an extremely simple query like:

SELECT * FROM act WHERE act_usr_id = 1;

will not use the index on the original act table, but the jefftest and
jefftest2 tables both work fine.  As you can probably see in the
transcript, the tables have been ANALYZEd.  I even tried 'enable
seqscan=0;' and that made the cost really high for the seq scan, but the
planner still chose the seq scan.

The issue does not affect 8.2.3 nor does it affect 8.3.4.  I didn't see
any mention of a fix for this sort of thing in 8.3.4's release notes.  I
was wondering if this is a known bug in 8.3.3 (and maybe other 8.3.x
versions) and just didn't make it into the release notes of 8.3.4?

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes:
 I've run across a strange problem with PG 8.3.3 not using indexes on a
 particular table after building the table during a transaction.

This may be a HOT side-effect ... is pg_index.indcheckxmin set for
the index?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 This may be a HOT side-effect ... is pg_index.indcheckxmin set for
 the index?
 
 Yep, sure enough, the 'act' table's indexes have it set and jefftest and
 jefftest2's indexes do not.

Okay.  What that means is that the indexes were created on data that had
already been inserted and updated to some extent, resulting in
HOT-update chains that turned out to be illegal for the new indexes.
The way we deal with this is to mark the indexes as not usable by any
query that can still see the dead HOT-updated tuples.

Your best bet for dodging the problem is probably to break the operation
into two transactions, if that's possible.  INSERT and UPDATE in the
first xact, create the indexes at the start of the second.  (Hmm ...
I'm not sure if that's sufficient if there are other concurrent
transactions; but it's certainly necessary.)  Another possibility is
to create the indexes just after data load, before you start updating
the columns they're on.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost
Tom Lane wrote:
 Okay.  What that means is that the indexes were created on data that had
 already been inserted and updated to some extent, resulting in
 HOT-update chains that turned out to be illegal for the new indexes.
 The way we deal with this is to mark the indexes as not usable by any
 query that can still see the dead HOT-updated tuples.

 Your best bet for dodging the problem is probably to break the operation
 into two transactions, if that's possible.  INSERT and UPDATE in the
 first xact, create the indexes at the start of the second.  (Hmm ...
 I'm not sure if that's sufficient if there are other concurrent
 transactions; but it's certainly necessary.)  Another possibility is
 to create the indexes just after data load, before you start updating
 the columns they're on.

   
Thanks Tom!

Any idea why I don't see it on 8.3.4?

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Okay.  What that means is that the indexes were created on data that had
 already been inserted and updated to some extent, resulting in
 HOT-update chains that turned out to be illegal for the new indexes.
 The way we deal with this is to mark the indexes as not usable by any
 query that can still see the dead HOT-updated tuples.

 Any idea why I don't see it on 8.3.4?

I think it's more likely some small difference in your test conditions
than any real version-to-version difference.  In particular I think the
still see test might be influenced by the ages of transactions running
concurrently.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost

On Thu, 30 Oct 2008, Tom Lane wrote:


Any idea why I don't see it on 8.3.4?


I think it's more likely some small difference in your test conditions
than any real version-to-version difference.  In particular I think the
still see test might be influenced by the ages of transactions running
concurrently.


Interesting.  This is on a test server which has no other concurrent 
transactions and it acts the same way after I stopped 8.3.4 and started up 
8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm not 
sure that makes sense.  So, I did the test with the sql script on 8.3.3, then 
shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that the 
behavior was still the same on 8.3.3.  I wonder what else I might be doing 
differently.


The good news is that making the indexes before the updates seems to make the 
planner happy!


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes:
 On Thu, 30 Oct 2008, Tom Lane wrote:
 Any idea why I don't see it on 8.3.4?
 
 I think it's more likely some small difference in your test conditions
 than any real version-to-version difference.  In particular I think the
 still see test might be influenced by the ages of transactions running
 concurrently.

 Interesting.  This is on a test server which has no other concurrent 
 transactions and it acts the same way after I stopped 8.3.4 and started up 
 8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm not 
 sure that makes sense.  So, I did the test with the sql script on 8.3.3, then 
 shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
 successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that the 
 behavior was still the same on 8.3.3.  I wonder what else I might be doing 
 differently.

Huh.  That does sound like it's a version-to-version difference.
There's nothing in the CVS log that seems related though.  Are you
willing to post your test case?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost


Tom Lane wrote:
 Jeff Frost [EMAIL PROTECTED] writes:
   
 On Thu, 30 Oct 2008, Tom Lane wrote:
 
 Any idea why I don't see it on 8.3.4?
 
 I think it's more likely some small difference in your test conditions
 than any real version-to-version difference.  In particular I think the
 still see test might be influenced by the ages of transactions running
 concurrently.
   

   
 Interesting.  This is on a test server which has no other concurrent 
 transactions and it acts the same way after I stopped 8.3.4 and started up 
 8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm 
 not 
 sure that makes sense.  So, I did the test with the sql script on 8.3.3, 
 then 
 shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
 successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that 
 the 
 behavior was still the same on 8.3.3.  I wonder what else I might be doing 
 differently.
 

 Huh.  That does sound like it's a version-to-version difference.
 There's nothing in the CVS log that seems related though.  Are you
 willing to post your test case?
   
   
It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.

-- 
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032



Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane
Jeff Frost [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Huh.  That does sound like it's a version-to-version difference.
 There's nothing in the CVS log that seems related though.  Are you
 willing to post your test case?
 
 It's a customer DB, so I'll contact them and see if we can boil it down
 to a test case with no sensitive data.

Well, if there was a change it seems to have been in the right direction
;-) so this is mostly just idle curiosity.  Don't jump through hoops to
get a test case.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] index usage on arrays

2008-02-07 Thread andrew klassen
I am using Postgres 8.2.5. 
 
I have a table that has rows containing a variable length array with a known 
maximum. 
I was doing selects on the array elements using an ANY match. The performance 
was not too good as my table got bigger. So I added an index on the array. 
That didn't help since the select was not using it.  I saw a thread in the 
mailing lists stating the index wouldn't be used. 
 
So I created indices on the individual array elements and then do a select
on each element separately and then combine each match using OR. 
This did substantially increase the select performance. However, it may 
be difficult to maintain this approach over time as the maximum array 
size may increase dramatically and forming the query will become tedious. 
 
Is there any alternative to what am I currently doing other than creating a row 
for 
each array element, i.e. stop using an array and use a separate row for each 
array index? The reason I didn't want to take this approach is because there 
are 
other columns in the row that will be duplicated needlessly.
 
Thanks, Andrew


  

Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ 


Re: [PERFORM] index usage on arrays

2008-02-07 Thread Tom Lane
andrew klassen [EMAIL PROTECTED] writes:
 Is there any alternative to what am I currently doing other than creating a 
 row for 
 each array element,

Since (I think) 8.2, you could create a GIN index on the array column
and then array overlap () would be indexable.  GIN has some
performance issues if the table is heavily updated, though.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] index usage on arrays

2008-02-07 Thread Oleg Bartunov

andrew,
what are your queries ? Have you seen contrib/intarray, 
GIN index ?


On Thu, 7 Feb 2008, andrew klassen wrote:


I am using Postgres 8.2.5.

I have a table that has rows containing a variable length array with a known 
maximum.
I was doing selects on the array elements using an ANY match. The performance
was not too good as my table got bigger. So I added an index on the array.
That didn't help since the select was not using it.  I saw a thread in the
mailing lists stating the index wouldn't be used.

So I created indices on the individual array elements and then do a select
on each element separately and then combine each match using OR.
This did substantially increase the select performance. However, it may
be difficult to maintain this approach over time as the maximum array
size may increase dramatically and forming the query will become tedious.

Is there any alternative to what am I currently doing other than creating a row 
for
each array element, i.e. stop using an array and use a separate row for each
array index? The reason I didn't want to take this approach is because there are
other columns in the row that will be duplicated needlessly.

Thanks, Andrew


 

Be a better friend, newshound, and
know-it-all with Yahoo! Mobile.  Try it now.  
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-18 Thread valgog
Hi Tom,

do you think it would be a good idea to ask GIN index team to
implement an int-based bitmap set indexing operator for GIN/GiST based
indexes? Or there will be a possibility to somehow optimally index
arrays of enumerations to implement such bitmap structures in 8.3 or
later postgresql versions?

With best regards,

-- Valentine

On Sep 17, 3:37 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
 Kevin Grittner [EMAIL PROTECTED] writes:
  On Mon, Sep 17, 2007 at  2:49 AM, in message
  [EMAIL PROTECTED], valgog
  [EMAIL PROTECTED] wrote:=20
  Are you sure you understood what was the question?

  Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
  TBL2.CATEGORY  0?
  Yes, given that he stipulated that one and only one bit would be set.

 Really?  In that case, isn't this bit-field just a bad implementation of
 an enum-style field?

 regards, tom lane

 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread valgog
Hi,

I could not find and normal solution for that issue. But I am using
some workarounds for that issue.

The solution, that I am using now is to create an index for every bit
of your bitmap field.

So something like

CREATE INDEX idx_hobbybit_0_limited
  ON versionA.user_fast_index
  USING btree
  (gender, dateofbirth) -- here the gender and dateofbirth fields are
the fields that we usually ORDER BY in the select statements, but you
can play with the needed fields
  WHERE (hobby_bitmap  1)  0;

by creating such an index for every used bit and combining WHERE
(hobby_bitmap  1 )  0 like statements the planner will be choosing
the right index to use.

Another workaround, that will be more applicable in your case I think,
is to create a functional GIN index on your bitmap field using a
static function to create an array of bitmap keys from your bitmap
field.

CREATE OR REPLACE FUNCTION
versionA.bitmap_to_bit_array(source_bitmap integer)
  RETURNS integer[] AS
'select ARRAY( select (1  s.i) from generate_series(0, 32) as s(i)
where ( 1  s.i )  $1  0 )'
  LANGUAGE 'sql' IMMUTABLE STRICT;

And than create a GIN index on the needed field using this stored
procedure. After that, it would be possible to use intarray set
operators on the result of that function. This will also make it
possible to use that GIN index.

Actually it would be much much better if it were possible to build GIN
indexes directly on the bitmap fields. But this is to be implemented
by GIN and GiST index development team. Probably would be not a bad
idea to make a feature request on them.


With best regards,

Valentine Gogichashvili

On Sep 13, 2:30 pm, [EMAIL PROTECTED] (W.Alphonse HAROUNY) wrote:
 Hello,

 My question is about index usage when bitwise operations are invoked.
 Situation Context:
 --

 Lets suppose we have 2 tables TBL1 and TBL2 as the following:
 TBL1 {
   . ;
   integer categoryGroup; // categoryGroup is declared as an index on TABL1
   . ;

 }

 TBL2 {
   . ;
   integer categoryGroup; // categoryGroup is declared as an index on TABL2
   . ;

 }

 By conception, I suppose that:
 - [categoryGroup] may hold a limited number of values, less than 32 values.
 - [categoryGroup] is of type integer = it means 4 bytes = 32 bits
   = 32 places available to hold binary '0' or binary '1' values.
 - [categoryGroup] is the result of an OR bitwise operation among a
 predefined set of variables [variableCategory].
We suppose that [variableCategory] is of type integer (=32 bits)
and each binary value of [variableCategory] may only hold a single binary
 '1'.

 Ex: variableCategory1 = 0010
   variableCategory2 = 0010
   variableCategory3 = 1000

  If [categoryGroup] =  variableCategory1 | variableCategory2 |
 variableCategory3
 =[categoryGroup] = 00101010

 Question:
 --
 I have an SQL request similar to:

 SELECT . FROM TBL1, TBL2 WHERE
  inner join between TBL1 and TBL2 is True AND
  TBL1.CATEGORY  TBL2.CATEGORY  0  //-- where  is the AND bitwise
 operator

 Qst:
 1/ IS the above SQL request will use the INDEX [categoryGroup] defined on
 TBL1 and TBL2 ?
 2/ What should I do or How should I modify my SQL request in order
to force the query engine to use an index ? (the already defined index or
 another useful index)

 Thx a lot



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread valgog

 What about saying?:

   TBL1.CATEGORY = TBL2.CATEGORY


Are you sure you understood what was the question?

Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
TBL2.CATEGORY  0?


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Kevin Grittner
 On Mon, Sep 17, 2007 at  2:49 AM, in message
[EMAIL PROTECTED], valgog
[EMAIL PROTECTED] wrote: 

 What about saying?:

   TBL1.CATEGORY = TBL2.CATEGORY

 
 Are you sure you understood what was the question?
 
 Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
 TBL2.CATEGORY  0?

Yes, given that he stipulated that one and only one bit would be set.
 
-Kevin
 



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 On Mon, Sep 17, 2007 at  2:49 AM, in message
 [EMAIL PROTECTED], valgog
 [EMAIL PROTECTED] wrote:=20
 Are you sure you understood what was the question?
 
 Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
 TBL2.CATEGORY  0?

 Yes, given that he stipulated that one and only one bit would be set.

Really?  In that case, isn't this bit-field just a bad implementation of
an enum-style field?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Kevin Grittner
 On Mon, Sep 17, 2007 at  8:37 AM, in message [EMAIL PROTECTED],
Tom Lane [EMAIL PROTECTED] wrote: 
 Kevin Grittner [EMAIL PROTECTED] writes:
 On Mon, Sep 17, 2007 at  2:49 AM, in message
 [EMAIL PROTECTED], valgog
 [EMAIL PROTECTED] wrote:=20
 Are you sure you understood what was the question?
 
 Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
 TBL2.CATEGORY  0?
 
 Yes, given that he stipulated that one and only one bit would be set.
 
 Really?  In that case, isn't this bit-field just a bad implementation of
 an enum-style field?
 
My bad.  I did misread it.  Sorry, all.
 
-Kevin
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread W.Alphonse HAROUNY
Hi,

A little clarification. Actually,  TBL1.CATEGORY and/or  TBL2.CATEGORY may
hold a binary value having multiple binary(ies) '1'.
Each binary value column represent an business attribute.
If a binary value column is equal to '1', it means that the business
attribute is True,
otherwise it is false.
I adopted this avoid defining a detail table to table TBL1. Idem to TBL2.

If  TBL1.CATEGORY |  TBL2.CATEGORY  0
= it means that we have at least one common business attribute that is TRUE
for TBL1 and TBL2.

Regards
W.Alf


On 9/17/07, Tom Lane [EMAIL PROTECTED] wrote:

 Kevin Grittner [EMAIL PROTECTED] writes:
  On Mon, Sep 17, 2007 at  2:49 AM, in message
  [EMAIL PROTECTED], valgog
  [EMAIL PROTECTED] wrote:=20
  Are you sure you understood what was the question?
 
  Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY 
  TBL2.CATEGORY  0?

  Yes, given that he stipulated that one and only one bit would be set.

 Really?  In that case, isn't this bit-field just a bad implementation of
 an enum-style field?

regards, tom lane

 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly



[PERFORM] Index usage when bitwise operator is used

2007-09-13 Thread W.Alphonse HAROUNY
Hello,

My question is about index usage when bitwise operations are invoked.
Situation Context:
--

Lets suppose we have 2 tables TBL1 and TBL2 as the following:
TBL1 {
  . ;
  integer categoryGroup; // categoryGroup is declared as an index on TABL1
  . ;
}

TBL2 {
  . ;
  integer categoryGroup; // categoryGroup is declared as an index on TABL2
  . ;
}

By conception, I suppose that:
- [categoryGroup] may hold a limited number of values, less than 32 values.
- [categoryGroup] is of type integer = it means 4 bytes = 32 bits
  = 32 places available to hold binary '0' or binary '1' values.
- [categoryGroup] is the result of an OR bitwise operation among a
predefined set of variables [variableCategory].
   We suppose that [variableCategory] is of type integer (=32 bits)
   and each binary value of [variableCategory] may only hold a single binary
'1'.


Ex: variableCategory1 = 0010
  variableCategory2 = 0010
  variableCategory3 = 1000

 If [categoryGroup] =  variableCategory1 | variableCategory2 |
variableCategory3
=[categoryGroup] = 00101010



Question:
--
I have an SQL request similar to:

SELECT . FROM TBL1, TBL2 WHERE
 inner join between TBL1 and TBL2 is True AND
 TBL1.CATEGORY  TBL2.CATEGORY  0  //-- where  is the AND bitwise
operator

Qst:
1/ IS the above SQL request will use the INDEX [categoryGroup] defined on
TBL1 and TBL2 ?
2/ What should I do or How should I modify my SQL request in order
   to force the query engine to use an index ? (the already defined index or
another useful index)



Thx a lot


[PERFORM] index usage

2007-04-23 Thread Arkadiusz Raj
Hi,

I have a table in my database that is updated every minute with new acquired
data. Anyway there is a query to get latest values to be displayed on
screen. I have postgresql 7.4.2 that work very fine. The problem was that
after hdd crash I have rebuild database from the archive and... Execution
time of this query starts to be unacceptable. And I found funny thing. Using
static value in place expression remove this problem. Query started to be
executed fast again.

I did not change any settings in postgresql configuration. Just had to
restart all the services.

Can someone tell me why the optimizer stopped to choose index? I had seqscan
disabled already.

One note about those two outputs below: there are different number of
touples returned due to the fact that in fact the timestamp is chosen
differently. 

Regards,

/Arek

--

explain analyze SELECT distinct on (index) index, status, value FROM _values
WHERE device=1 and timestamp(now()-5*interval '1 min') ORDER by index,
timestamp desc;
   QUERY
PLAN 

-
 Unique  (cost=100117679.93..100117756.29 rows=1 width=24) (actual
time=5279.262..5279.308 rows=10 loops=1)
   -  Sort  (cost=100117679.93..100117718.11 rows=15272 width=24) (actual
time=5279.260..5279.275 rows=21 loops=1)
 Sort Key: index, timestamp
 -  Seq Scan on _values  (cost=1.00..100116618.64
rows=15272 width=24) (actual time=5277.596..5279.184 rows=21 loops=1)
   Filter: ((device = 1) AND ((timestamp)::timestamp with time
zone  (now() - '00:05:00'::interval)))
 Total runtime: 5279.391 ms
(6 rows)

explain analyze SELECT distinct on (index) index, status, value FROM _values
WHERE device=1 and timestamp'2007-04-22 21:20' ORDER by index, timestamp
desc;
    QUERY
PLAN    

---
 Unique  (cost=703.45..703.47 rows=1 width=24) (actual time=4.807..4.867
rows=10 loops=1)
   -  Sort  (cost=703.45..703.46 rows=5 width=24) (actual time=4.804..4.827
rows=31 loops=1)
 Sort Key: index, timestamp
 -  Index Scan using _values_dbidx_idx on _values 
(cost=0.00..703.39 rows=5 width=24) (actual time=0.260..4.728 rows=31
loops=1)
   Index Cond: (timestamp  '2007-04-22 21:20:00'::timestamp
without time zone)
   Filter: (device = 1)
 Total runtime: 4.958 ms
(7 rows)


-- 
List przeskanowano programem ArcaMail, ArcaVir 2007
przeskanowano 2007-04-23 19:20:29, silnik: 2007.01.01 12:00:00, bazy: 
2007.04.15 09:21:20
This message has been scanned by ArcaMail, ArcaVir 2007
scanned 2007-04-23 19:20:29, engine: 2007.01.01 12:00:00, base: 2007.04.15 
09:21:20
http://www.arcabit.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] index usage

2007-04-23 Thread Steinar H. Gunderson
On Mon, Apr 23, 2007 at 07:20:29PM +0200, Arkadiusz Raj wrote:
 I have a table in my database that is updated every minute with new acquired

 data. Anyway there is a query to get latest values to be displayed on

 screen. I have postgresql 7.4.2 that work very fine.

You want _at least_ the latest 7.4 version -- ideally, the latest 8.2
version.

 The problem was that

 after hdd crash I have rebuild database from the archive and... Execution

 time of this query starts to be unacceptable.

Have you re-ANALYZEd after the data load?

Anyhow, the issue with the planner not knowing how to estimate expressions
like now() - interval '5 minutes' correctly is a known 7.4 issue, and it's
fixed in later versions. It might have worked more or less by accident
earlier, although it seems odd that it wouldn't even have considered the
index scan...

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Index usage

2006-08-21 Thread Scott Matseas

We're having a problem with one of our queries being slow. It appears to be due
to the index being used to go from tableA to tableB.

Here are the tables:
CREATE TABLE tableA
(
 table_idA int8 NOT NULL DEFAULT nextval('tableA_id_seq'::regclass),
 CONSTRAINT table_idA_pk PRIMARY KEY (table_idA),
) 
WITHOUT OIDS;


CREATE TABLE tableB
(
 table_idB int8 NOT NULL DEFAULT nextval('tableB_id_seq'::regclass),
 table_idA int8 NOT NULL,
 direction char NOT NULL,
 CONSTRAINT tableB_pk PRIMARY KEY (table_idB),
 CONSTRAINT tableB_unq UNIQUE (table_idA, direction),
) 
WITHOUT OIDS;


CREATE TABLE last_summarized
(
 summary_name varchar(64) NOT NULL,
 summarized_id int8,
 max_session_id int8,
 CONSTRAINT last_summarized_pk PRIMARY KEY (summary_name)
) 
WITHOUT OIDS;


Here is the query:
explain
SELECT * FROM 
  last_summarized ls

  JOIN tableA s ON s.table_idA  ls.summarized_id AND s.table_idA
= ls.max_session_id
  LEFT JOIN tableB sf ON s.table_idA = sf.table_idA AND sf.direction = 
'a'::char
  LEFT JOIN tableB sfb ON s.table_idA = sfb.table_idA AND sfb.direction = 
'b'::char
 WHERE ls.summary_name::text = 'summary'::text 


Size of tables in # of rows
tableA: 9,244,816
tableB: 15,398,497
last_summarized: 1


Explain of the above query:
Hash Left Join  (cost=1811349.31..18546527.89 rows=1029087 width=294)
  Hash Cond: (outer.table_idA = inner.table_idA)
  -  Hash Left Join  (cost=915760.88..7519203.61 rows=1029087 width=219)
Hash Cond: (outer.table_idA = inner.table_idA)
-  Nested Loop  (cost=0.00..126328.57 rows=1029087 width=144)
  -  Index Scan using last_summarized_pk on last_summarized ls  
(cost=0.00..5.98 rows=1 width=82)
Index Cond: ((summary_name)::text = 'summary'::text)
  -  Index Scan using table_idA_pk on tableA s  (cost=0.00..110886.29 
rows=1029087 width=62)
Index Cond: ((s.table_idA  outer.summarized_id) AND (s.table_idA = 
outer.max_session_id))
-  Hash  (cost=784763.16..784763.16 rows=8100289 width=75)
  -  Bitmap Heap Scan on tableB sf  (cost=216418.55..784763.16 
rows=8100289 width=75)
Recheck Cond: (direction = 'a'::char)
-  Bitmap Index Scan on tableB_unq  (cost=0.00..216418.55 
rows=8100289 width=0)
  Index Cond: (direction = 'a'::char) -- 
USING part of Index
  -  Hash  (cost=775968.61..775968.61 rows=7396725 width=75)
-  Bitmap Heap Scan on tableB sfb  (cost=216418.55..775968.61 rows=7396725 
width=75)
  Recheck Cond: (direction = 'b'::char)
  -  Bitmap Index Scan on tableB_unq  (cost=0.00..216418.55 
rows=7396725 width=0)
Index Cond: (direction = 'b'::char) -- USING 
part of Index


From the above explain see inline comment(-- USING part of Index). The 
table_idA column

looks like it is being ignored in the index Cond. If I enable sequential scan 
the Index Cond in
question gets replaced with a Seq scan.

Also if I disable enable_bitmapscan sometimes both columns of the index(tableB_unq) will be 
used.


Does anyone know why we're experiencing this behavior?




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Index usage

2006-08-21 Thread Tom Lane
Scott Matseas [EMAIL PROTECTED] writes:
 If I enable sequential scan the Index Cond in
 question gets replaced with a Seq scan.

What other planner parameters have you been fooling with?

With no data in the tables, I get a reasonably sane-looking plan,
so I'm thinking you've chosen bad values for something or other
(starting with enable_seqscan = off ;-))

explain
 SELECT * FROM 
   last_summarized ls
   JOIN tableA s ON s.table_idA  ls.summarized_id AND s.table_idA
 = ls.max_session_id
   LEFT JOIN tableB sf ON s.table_idA = sf.table_idA AND sf.direction = 
'a'::char
   LEFT JOIN tableB sfb ON s.table_idA = sfb.table_idA AND sfb.direction = 
'b'::char
  WHERE ls.summary_name::text = 'summary'::text ;
   QUERY PLAN   
 
-
 Nested Loop Left Join  (cost=6.16..54.51 rows=216 width=116)
   -  Nested Loop Left Join  (cost=6.16..42.05 rows=216 width=95)
 -  Nested Loop  (cost=6.16..29.58 rows=216 width=74)
   -  Index Scan using last_summarized_pk on last_summarized ls  
(cost=0.00..8.02 rows=1 width=66)
 Index Cond: ((summary_name)::text = 'summary'::text)
   -  Bitmap Heap Scan on tablea s  (cost=6.16..18.32 rows=216 
width=8)
 Recheck Cond: ((s.table_ida  ls.summarized_id) AND 
(s.table_ida = ls.max_session_id))
 -  Bitmap Index Scan on table_ida_pk  (cost=0.00..6.16 
rows=216 width=0)
   Index Cond: ((s.table_ida  ls.summarized_id) AND 
(s.table_ida = ls.max_session_id))
 -  Index Scan using tableb_unq on tableb sfb  (cost=0.00..0.05 rows=1 
width=21)
   Index Cond: ((s.table_ida = sfb.table_ida) AND (sfb.direction = 
'b'::bpchar))
   -  Index Scan using tableb_unq on tableb sf  (cost=0.00..0.05 rows=1 
width=21)
 Index Cond: ((s.table_ida = sf.table_ida) AND (sf.direction = 
'a'::bpchar))
(13 rows)


regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index usage

2006-08-21 Thread Scott Matseas

Tom Lane wrote:

What other planner parameters have you been fooling with?

Hi Tom,
The other parameters that have been changed are:
set join_collapse_limit to 1
set enable_sort to off

We are using version 8.1.3. We've noticed the query plan changing depending
on the amount of data in the tables especially when the query looks at 
more rows

in tableA. The parameter work_mem is set to 262,144.

Thanks,
Scott





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] index usage

2006-07-28 Thread Ben
I have a table with 37000 rows, an integer column, and an index on that 
column. I've got a function that returns an integer. When I do a select 
where I restrict that column to being equal to a static number, explain 
tells me the index will be used. When I do the same thing but use the 
function instead of a static number, explain shows me a full scan on the 
table.


I must be missing something, because my understanding is that the function 
will be evaluated once for the statement and then collapsed into a static 
number for the filtering. But the results of the explain seem to imply 
that's not the case?


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] index usage

2006-07-28 Thread Daniel Caune
 De : [EMAIL PROTECTED] [mailto:pgsql-performance-
 [EMAIL PROTECTED] De la part de Ben
 Envoyé : vendredi, juillet 28, 2006 15:21
 À : pgsql-performance@postgresql.org
 Objet : [PERFORM] index usage
 
 I have a table with 37000 rows, an integer column, and an index on that
 column. I've got a function that returns an integer. When I do a select
 where I restrict that column to being equal to a static number, explain
 tells me the index will be used. When I do the same thing but use the
 function instead of a static number, explain shows me a full scan on the
 table.
 
 I must be missing something, because my understanding is that the function
 will be evaluated once for the statement and then collapsed into a static
 number for the filtering. But the results of the explain seem to imply
 that's not the case?
 

Is your function IMMUTABLE, STABLE or VOLATILE?

--
Daniel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] index usage

2006-07-28 Thread Ben

It's volatile, but it will always return an integer.

On Fri, 28 Jul 2006, Daniel Caune wrote:


De : [EMAIL PROTECTED] [mailto:pgsql-performance-
[EMAIL PROTECTED] De la part de Ben
Envoyé : vendredi, juillet 28, 2006 15:21
À : pgsql-performance@postgresql.org
Objet : [PERFORM] index usage

I have a table with 37000 rows, an integer column, and an index on that
column. I've got a function that returns an integer. When I do a select
where I restrict that column to being equal to a static number, explain
tells me the index will be used. When I do the same thing but use the
function instead of a static number, explain shows me a full scan on the
table.

I must be missing something, because my understanding is that the function
will be evaluated once for the statement and then collapsed into a static
number for the filtering. But the results of the explain seem to imply
that's not the case?



Is your function IMMUTABLE, STABLE or VOLATILE?

--
Daniel


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] index usage

2006-07-28 Thread Mark Lewis
A volatile function has may return a different result for each row;
think of the random() or nextval() functions for example.  You wouldn't
want them to return the same value for each row returned.

-- Mark Lewis

On Fri, 2006-07-28 at 13:59 -0700, Ben wrote:
 It's volatile, but it will always return an integer.
 
 On Fri, 28 Jul 2006, Daniel Caune wrote:
 
  De : [EMAIL PROTECTED] [mailto:pgsql-performance-
  [EMAIL PROTECTED] De la part de Ben
  Envoyé : vendredi, juillet 28, 2006 15:21
  À : pgsql-performance@postgresql.org
  Objet : [PERFORM] index usage
 
  I have a table with 37000 rows, an integer column, and an index on that
  column. I've got a function that returns an integer. When I do a select
  where I restrict that column to being equal to a static number, explain
  tells me the index will be used. When I do the same thing but use the
  function instead of a static number, explain shows me a full scan on the
  table.
 
  I must be missing something, because my understanding is that the function
  will be evaluated once for the statement and then collapsed into a static
  number for the filtering. But the results of the explain seem to imply
  that's not the case?
 
 
  Is your function IMMUTABLE, STABLE or VOLATILE?
 
  --
  Daniel
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] index usage

2006-07-28 Thread Tom Lane
Ben [EMAIL PROTECTED] writes:
 It's volatile, but it will always return an integer.

If it's volatile then it can't be used for an index condition.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Jeffrey W. Baker
On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
 Hi,
 
 I have 2 tables both have an index on ID (both ID columns are an oid).
 
 I want to find only only rows in one and not the other.
 
 Select ID from TableA where ID not IN ( Select ID from Table B)

Have you considered this:

SELECT ID from TableA EXCEPT Select ID from Table B

?

-jwb


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Jeffrey W. Baker
On Wed, 2006-02-01 at 12:22 -0800, Jeffrey W. Baker wrote:
 On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
  Hi,
  
  I have 2 tables both have an index on ID (both ID columns are an oid).
  
  I want to find only only rows in one and not the other.
  
  Select ID from TableA where ID not IN ( Select ID from Table B)
 
 Have you considered this:
 
 SELECT ID from TableA EXCEPT Select ID from Table B

Alternately:

   SELECT a.ID 
 FROM TableA AS a 
LEFT JOIN TableB AS b 
   ON a.ID = b.ID 
WHERE b.ID IS NULL

-jwb

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Tom Lane
Jeffrey W. Baker [EMAIL PROTECTED] writes:
 On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
 Select ID from TableA where ID not IN ( Select ID from Table B)

 Have you considered this:

 SELECT ID from TableA EXCEPT Select ID from Table B

Also, increasing work_mem might persuade the planner to try a hashed
subplan, which'd be a lot better than what you have.  Note that it's
quite unlikely that indexes are going to help for this.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Bruno Wolff III
On Thu, Feb 02, 2006 at 09:12:59 +1300,
  Ralph Mason [EMAIL PROTECTED] wrote:
 Hi,
 
 I have 2 tables both have an index on ID (both ID columns are an oid).
 
 I want to find only only rows in one and not the other.
 
 Select ID from TableA where ID not IN ( Select ID from Table B)
 
 This always generates sequential scans.
 
 Table A has about 250,000 rows.   Table B has about 250,000 Rows.
 
 We should get a Scan on Table B and a Index Lookup on Table A.

I don't think that is going to work if there are NULLs in table B.
I don't know whether or not Postgres has code to special case NULL testing
(either for constraints ruling them out, or doing probes for them in addition
to the key it is trying to match) for doing NOT IN. Just doing a simple
index probe into table A isn't going to tell you all you need to know if
you don't find a match.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Hari Warrier


Select ID from TableA where not exists ( Select ID from Table B where ID 
= TableA.ID)
might give you index scan. Of course, that is only useful is TableA is 
very small table.

Not appropriate for 250k rows

on 2/1/2006 12:12 PM Ralph Mason said the following:

Hi,

I have 2 tables both have an index on ID (both ID columns are an oid).

I want to find only only rows in one and not the other.

Select ID from TableA where ID not IN ( Select ID from Table B)

This always generates sequential scans.

Table A has about 250,000 rows.   Table B has about 250,000 Rows.

We should get a Scan on Table B and a Index Lookup on Table A.

Is there any way to force this?  enable_seqscan off doesn't help at all.

The Plan is

Seq Scan on tablea(cost=1.00..23883423070450.96 rows=119414 
width=4)

 Filter: (NOT (subplan))
 SubPlan- Seq Scan on tableb (cost=1.00..14611.17 
rows=242617 width=4)



Thanks
Ralph




---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index usage for sorted query

2004-11-22 Thread Markus Schaber
Hi, Pierre-Frédéric,

On Sat, 20 Nov 2004 17:12:43 +0100
Pierre-Frédéric Caillaud [EMAIL PROTECTED] wrote:

  WHERE cd='ca' ORDER BY l_postcode;
 
 Write :
 
  WHERE cd='ca' ORDER BY cd, l_postcode;
 
 You have a multicolumn index, so you should specify a multicolumn sort  
 exactly the same as your index, and the planner will get it.

Thanks, that seems to help.

Seems weird to order by a column that is all the same value, but well,
why not :-)

Thanks a lot,
Markus


-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Index usage for sorted query

2004-11-20 Thread Markus Schaber
Hello,

I have the following query plan:

logigis=# explain SELECT geom, ref_in_id as ref, nref_in_id as nref, st_name as 
name, substr(l_postcode,1,2) as postfirst,  func_class as level FROM 
schabi.streets WHERE cd='ca' ORDER BY l_postcode;
QUERY PLAN  
   
---
 Sort  (cost=2950123.42..2952466.07 rows=937059 width=290)
   Sort Key: l_postcode
   -  Index Scan using streets_name_idx on streets  (cost=0.00..2857177.57 
rows=937059 width=290)
 Index Cond: ((cd)::text = 'ca'::text)


And I have, beside others, the following index:
»streets_name_idx« btree (cd, l_postcode)

As the query plan shows, my postgresql 7.4 does fine on using the index
for the WHERE clause.

But as it fetches all the rows through the index, why doesn't it
recognize that, fetching this way, the rows are already sorted by
l_postcode?

As I have a larger set of data, it nearly breaks down our developer
machine every time we do this, as it always creates a temporary copy of
the large amount of data to sort it (setting sort_mem higher makes it
swap, setting it lower makes it thrashing disk directly).

Is Postgresql 8 more intelligend in this case?

Thanks for your hints,
Markus 

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index usage for sorted query

2004-11-20 Thread Pierre-Frdric Caillaud
Instead of :
WHERE cd='ca' ORDER BY l_postcode;
Write :
WHERE cd='ca' ORDER BY cd, l_postcode;
You have a multicolumn index, so you should specify a multicolumn sort  
exactly the same as your index, and the planner will get it.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Index usage for sorted query

2004-11-20 Thread Tom Lane
Markus Schaber [EMAIL PROTECTED] writes:
 But as it fetches all the rows through the index, why doesn't it
 recognize that, fetching this way, the rows are already sorted by
 l_postcode?

Tell it to ORDER BY cd, l_postcode.

 Is Postgresql 8 more intelligend in this case?

No.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] index usage

2004-04-28 Thread scott.marlowe
On Mon, 26 Apr 2004, Stephan Szabo wrote:

 
 On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote:
 
  I have a query which I think should be using an index all of the time but
  postgres only uses the index part of the time.  The index
  (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
  first followed by the selected column (support_person_id).  Wouldn't the
  most efficient plan be to scan the index each time because the only columns
  needed are in the index?  Below is the table, 2 queries showing the
 
 Not necessarily.  The rows in the actual file still need to be checked to
 see if they're visible to the select and if it's expected that the entire
 file (or a reasonable % of the pages anyway) will need to be loaded using
 the index isn't necessarily a win.

While those of us familiar with PostgreSQL are well aware of the fact that 
indexes can't be used directly to garner information, but only as a lookup 
to a tuple in the table, it seems this misconception is quite common among 
those coming to postgreSQL from other databases.

Is there any information that directly reflects this issue in the docs?  
There are tons of hints that it works this way in how they're written, but 
nothing that just comes out and says that with pgsql's mvcc 
implementation, an index scan still has to hit the pages that contain the 
tuples, so often in pgsql a seq scan is a win where in other databases and 
index scan would have been a win?

If not, where would I add it if I were going to write something up for the 
docs?  Just wondering...


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] index usage

2004-04-28 Thread Tom Lane
scott.marlowe [EMAIL PROTECTED] writes:
 There are tons of hints that it works this way in how they're written, but 
 nothing that just comes out and says that with pgsql's mvcc 
 implementation, an index scan still has to hit the pages that contain the 
 tuples, so often in pgsql a seq scan is a win where in other databases and 
 index scan would have been a win?

 If not, where would I add it if I were going to write something up for the 
 docs?  Just wondering...

AFAIR the only place in the docs that mentions seqscan or indexscan at
all is the discussion of EXPLAIN in Performance Tips.  Perhaps a
suitably-enlarged version of that section could cover this.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] index usage

2004-04-26 Thread brad-pgperf
Hi, 

I have a query which I think should be using an index all of the time but 
postgres only uses the index part of the time.  The index 
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed 
first followed by the selected column (support_person_id).  Wouldn't the 
most efficient plan be to scan the index each time because the only columns 
needed are in the index?  Below is the table, 2 queries showing the 
difference in plans, followed by the record distribution of ticket_crm_map.  
I first did a 'vacuum analyze' to update the statistics. 

Thanks,
Brad 

athenapost= \d ticket_crm_map
 Table public.ticket_crm_map
   Column |Type | 
Modifiers
+-+- 
---
tcrm_map_id| integer | not null
ticket_id  | integer | not null
crm_id | integer | not null
support_person_id  | integer | not null
escalated_to_person_id | integer | not null
status | character varying(50)   | not null default 
'Open'::character varying
close_date | timestamp without time zone |
updated_date   | timestamp without time zone |
updated_by | character varying(255)  |
created_date   | timestamp without time zone |
created_by | character varying(255)  |
additional_info| text|
subject| character varying(255)  |
Indexes:
  ticket_crm_map_pkey primary key, btree (tcrm_map_id)
  ticket_crm_map_crm_id_key unique, btree (crm_id, ticket_id)
  ticket_crm_map_crm_id_suppid btree (crm_id, support_person_id)
  ticket_crm_map_status btree (status)
  ticket_crm_map_ticket_id btree (ticket_id)
Foreign-key constraints:
  $1 FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
  $2 FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
  $3 FOREIGN KEY (support_person_id) REFERENCES person(person_id)
  $4 FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
  $5 FOREIGN KEY (status) REFERENCES ticket_status(status) 

athenapost= explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 7;
  
QUERY PLAN
 
 
--
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 
rows=20 loops=1)
 -  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual 
time=15.332..16.605 rows=2275 loops=1)
   Sort Key: support_person_id
   -  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 
loops=1)
 Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows) 

Time: 20.598 ms
athenapost= explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 1;
 QUERY PLAN
 
-
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual 
time=659.102..791.517 rows=24 loops=1)
 -  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual 
time=659.090..713.285 rows=93889 loops=1)
   Sort Key: support_person_id
   -  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
 Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows) 

Time: 817.095 ms
athenapost= select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
---+
2554 | 63
 129 | 25
  17 | 24
 110 | 23
  74 | 22
  69 | 21
   2 | 20
  53 | 82
  10 | 17
  16 | 81
46637 | 16
  14 | 80
   2 | 15
1062 | 79
  87 | 78
  93 | 77
  60 | 44
 363 | 76
 225 | 10
   4 | 74
  83 |  9
  27 | 73
 182 |  8
2275 |  7
  15 | 71
 554 |  6
  44 | 70
 631 |  5
  37 |  4
 190 |  3
 112 |  2
93889 |  1
(32 rows) 

Time: 436.697 ms
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] index usage

2004-04-26 Thread brad-pgperf
When checking an index in postgres the original table has to be checked for 
each result to find if the index entry is still valid?  In which case you 
can't blindly scan the whole index and assume the data is good. I was used 
to Oracle behavior where the index is up to date so it can do the scan 
without hitting the original table. 

Does this sound correct to anyone? 

Thanks,
Brad 

Stephan Szabo writes:
On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote: 

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time.  The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id).  Wouldn't the
most efficient plan be to scan the index each time because the only columns
needed are in the index?  Below is the table, 2 queries showing the
Not necessarily.  The rows in the actual file still need to be checked to
see if they're visible to the select and if it's expected that the entire
file (or a reasonable % of the pages anyway) will need to be loaded using
the index isn't necessarily a win. 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])