subject:"\[PERFORM\] index usage"

Re: [PERFORM] Index usage for tstzrange?

2013-03-22 Thread Tom Lane

Heikki Linnakangas  writes:
> We should do this automatically. Or am I missing something?

Yes.  This is not equality.

> ALTER OPERATOR FAMILY integer_ops USING btree ADD
>   OPERATOR 3 <@ (int4, int4range),
>   FUNCTION 1 btint4rangecmp(int4, int4range);

That will break approximately everything in sight, starting with the
planner's opinion of what equality is.  There is *way* too much stuff
that knows the semantics of btree opclasses for us to start jamming
random operators into them, even if this seemed to work in trivial
testing.  (See the last section of src/backend/access/nbtree/README
to just scratch the surface of the assumptions this breaks.)

It's possible that for constant ranges we could have the planner expand
"intcol <@ 'x,y'::int4range" into "intcol between x and y", using
something similar to the index LIKE optimization (ie, the "special
operator" stuff in indxpath.c).  I'd like to find a way to make that
type of optimization pluggable, though --- the existing approach of
hard-wiring knowledge into indxpath.c has never been anything but
a kluge, and it definitely doesn't scale as-is to anything except
built-in types and operators.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage for tstzrange?

2013-03-22 Thread Josh Berkus


> We should do this automatically. Or am I missing something?

Aside from the need to support @> as well, not that I can see.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage for tstzrange?

2013-03-22 Thread Heikki Linnakangas


On 22.03.2013 02:05, Josh Berkus wrote:

Well, no.<@ is not a btree-indexable operator.


Yes, but it's equivalent to ( ( a>= b1 or b1 is null ) and ( a<  b2 or
b2 is null ) ), which *is* btree-indexable and can use an index.  So it
seems like the kind of optimization we could eventually make.


Yeah. The sort order of <@ is the same as regular b-tree, so it should 
be possible. In fact, nothing stops you from creating the suitable 
operator and b-tree support functions. See attached patch for int4, but 
the same should work for timestamptz.


We should do this automatically. Or am I missing something?

- Heikki
CREATE or replace FUNCTION btint4rangecmp(int4, int4range) RETURNS int4 AS
$$
SELECT CASE WHEN $1 < lower($2) THEN -1
WHEN $1 < upper($2) OR upper($2) IS NULL THEN 0
ELSE 1 END
$$
LANGUAGE SQL IMMUTABLE STRICT;

CREATE FUNCTION int4_contained_by_int4range (int4, int4range) RETURNS bool AS
'elem_contained_by_range' LANGUAGE internal IMMUTABLE STRICT;

CREATE OPERATOR <@ (
  PROCEDURE=int4_contained_by_int4range,
  LEFTARG=int4, RIGHTARG=int4range
);

ALTER OPERATOR FAMILY integer_ops USING btree ADD
  OPERATOR 3 <@ (int4, int4range),
  FUNCTION 1 btint4rangecmp(int4, int4range);


-- Create a test table
create table inttable (i int4 primary key);
insert into inttable select g from generate_series(1,10) g;

-- Query it. Uses index, hooray!
select * from inttable where i <@ int4range(111, 120);

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage for tstzrange?

2013-03-22 Thread Heikki Linnakangas


On 21.03.2013 17:55, Alexander Korotkov wrote:

On Thu, Mar 21, 2013 at 12:52 PM, Heikki Linnakangas<

The immediate fix is attached, but this made me realize that rangesel() is
still missing estimation for the "element<@ range" operator. It shouldn't
be hard to implement, I'm pretty sure we have all the statistics we need
for that.


Probably we could even call existing scalarltsel and scalargtsel for this
case.


I came up with the attached. I didn't quite use scalarltsel, but I used 
the scalarineqsel function, which contains the "guts" of scalarltsel and 
scalargtsel.


One thing I wasn't quite sure of (from the patch):


/*
 * We use the data type's default < operator. This is bogus, if the 
range
 * type's rngsubopc operator class is different. In practice, that ought
 * to be rare. It would also be bogus to use the < operator from the
 * rngsubopc operator class, because the statistics are collected using
 * using the default operator class, anyway.
 *
 * For the same reason, use the default collation. The statistics are
 * collected with the default collation.
 */


Does that make sense? The other option would be to use the < operator 
from the rngsubopc op class, even though the scalar statistics are 
collected with the default b-tree < operator. As long as the two sort 
roughly the same way, you get reasonable results either way. Yet another 
option would be to use histogram_selectivity() instead of 
ineq_histogram_selectivity(), if the range's rngsubopc opclass isn't the 
type's default opclass. histogram_selectivity() works with any operator 
regardless of the sort ordering, basically using the histogram values 
merely as a sample, rather than as a histogram. But I'm reluctant to 
make this any more complicated, as using a non-default opclass for the 
range type is rare.


- Heikki
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index c450c6a..60700b8 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -55,6 +55,8 @@ static double calc_hist_selectivity_contains(TypeCacheEntry *typcache,
 			   RangeBound *lower, RangeBound *upper,
 			   RangeBound *hist_lower, int hist_nvalues,
 			Datum *length_hist_values, int length_hist_nvalues);
+static double calc_elem_contained_by_sel(PlannerInfo *root, TypeCacheEntry *typcache,
+		   VariableStatData *vardata, RangeType *constval);
 
 /*
  * Returns a default selectivity estimate for given operator, when we don't
@@ -155,18 +157,35 @@ rangesel(PG_FUNCTION_ARGS)
 	}
 
 	/*
-	 * OK, there's a Var and a Const we're dealing with here.  We need the
-	 * Const to be of same range type as the column, else we can't do anything
-	 * useful. (Such cases will likely fail at runtime, but here we'd rather
-	 * just return a default estimate.)
-	 *
-	 * If the operator is "range @> element", the constant should be of the
-	 * element type of the range column. Convert it to a range that includes
-	 * only that single point, so that we don't need special handling for
-	 * that in what follows.
+	 * OK, there's a Var and a Const we're dealing with here.  Check that the
+	 * Const is of the right type, else we can't do anything useful. (Such
+	 * cases will likely fail at runtime, but here we'd rather just return a
+	 * default estimate.)
 	 */
-	if (operator == OID_RANGE_CONTAINS_ELEM_OP)
+	if (operator == OID_RANGE_ELEM_CONTAINED_OP)
+	{
+		/*
+		 * "element <@ range" is quite different from the other range
+		 * operators, in that the Var is not a range, but of the element type.
+		 */
+		typcache = range_get_typcache(fcinfo, ((Const *) other)->consttype);
+
+		if (typcache->rngelemtype->type_id == vardata.vartype)
+		{
+			constrange = DatumGetRangeType(((Const *) other)->constvalue);
+			selec = calc_elem_contained_by_sel(root, typcache, &vardata, constrange);
+		}
+		else
+			selec = default_range_selectivity(operator);
+	}
+	else if (operator == OID_RANGE_CONTAINS_ELEM_OP)
 	{
+		/*
+		 * In "range @> element", the constant should be of the element type
+		 * of the range column. Convert it to a range that includes only that
+		 * single point, so that we don't need special handling for that in
+		 * what follows.
+		 */
 		typcache = range_get_typcache(fcinfo, vardata.vartype);
 
 		if (((Const *) other)->consttype == typcache->rngelemtype->type_id)
@@ -181,26 +200,29 @@ rangesel(PG_FUNCTION_ARGS)
 			upper.infinite = false;
 			upper.lower = false;
 			constrange = range_serialize(typcache, &lower, &upper, false);
+
+			selec = calc_rangesel(typcache, &vardata, constrange, operator);
 		}
+		else
+			selec = default_range_selectivity(operator);
 	}
 	else
 	{
-		typcache = range_get_typcache(fcinfo, ((Const *) other)->consttype);
+		/*
+		 * In all other range operators, both operands are ranges, and they
+		 * must be of the same type.
+		 */
+		typcache

Re: [PERFORM] Index usage for tstzrange?

2013-03-21 Thread Josh Berkus


> Well, no.  <@ is not a btree-indexable operator.

Yes, but it's equivalent to ( ( a >= b1 or b1 is null ) and ( a < b2 or
b2 is null ) ), which *is* btree-indexable and can use an index.  So it
seems like the kind of optimization we could eventually make.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage for tstzrange?

2013-03-21 Thread Heikki Linnakangas


On 21.03.2013 06:07, Vasilis Ventirozos wrote:

On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane  wrote:

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts<@
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.


It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts<@
tstzrange('2013-01-01','2013-04-01 00:10:00');
  QUERY PLAN

  Seq Scan on a  (cost=0.00..23.75 rows=1 width=44)
Filter: (ts<@ '["2013-01-01 00:00:00+02","2013-04-01
00:10:00+03")'::tstzrange)


Looks like the range type cost estimation patch broke this, back in 
August already. The case of var <@ constant, where constant is a range 
and var is an element, that's broken. The cost estimation function, 
rangesel(), incorrectly assumes that the 'var' is always a range type.


It's a bit worrying that no-one noticed until now. I'll add a test for 
that operator to the rangetypes regression test.


The immediate fix is attached, but this made me realize that rangesel() 
is still missing estimation for the "element <@ range" operator. It 
shouldn't be hard to implement, I'm pretty sure we have all the 
statistics we need for that.


- Heikki
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 76dc913..c450c6a 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -154,8 +154,6 @@ rangesel(PG_FUNCTION_ARGS)
 		}
 	}
 
-	typcache = range_get_typcache(fcinfo, vardata.vartype);
-
 	/*
 	 * OK, there's a Var and a Const we're dealing with here.  We need the
 	 * Const to be of same range type as the column, else we can't do anything
@@ -169,6 +167,8 @@ rangesel(PG_FUNCTION_ARGS)
 	 */
 	if (operator == OID_RANGE_CONTAINS_ELEM_OP)
 	{
+		typcache = range_get_typcache(fcinfo, vardata.vartype);
+
 		if (((Const *) other)->consttype == typcache->rngelemtype->type_id)
 		{
 			RangeBound lower, upper;
@@ -185,6 +185,8 @@ rangesel(PG_FUNCTION_ARGS)
 	}
 	else
 	{
+		typcache = range_get_typcache(fcinfo, ((Const *) other)->consttype);
+
 		if (((Const *) other)->consttype == vardata.vartype)
 			constrange = DatumGetRangeType(((Const *) other)->constvalue);
 	}

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage for tstzrange?

2013-03-20 Thread Vasilis Ventirozos

On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane  wrote:

> Josh Berkus  writes:
> > I just noticed that if I use a tstzrange for convenience, a standard
> > btree index on a timestamp won't get used for it.  Example:
>
> > table a (
> >   id int,
> >   val text,
> >   ts timestamptz
> > );
> > index a_ts on a(ts);
>
> > SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')
>
> > ... will NOT use the index a_ts.
>
> Well, no.  <@ is not a btree-indexable operator.
>
> What I find more disturbing is that this is what I get from the example
> in HEAD:
>
> regression=# explain SELECT * FROM a WHERE ts <@
> tstzrange('2013-01-01','2013-01-01 00:10:00');
> ERROR:  XX000: type 1184 is not a range type
> LOCATION:  range_get_typcache, rangetypes.c:1451
>
> Haven't traced through it to determine exactly what's happening, but
> isn't this a legitimate usage?  And if it isn't, surely a more
> user-facing error ought to be getting thrown somewhere upstream of here.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts <@
tstzrange('2013-01-01','2013-04-01 00:10:00');
 QUERY PLAN

 Seq Scan on a  (cost=0.00..23.75 rows=1 width=44)
   Filter: (ts <@ '["2013-01-01 00:00:00+02","2013-04-01
00:10:00+03")'::tstzrange)

Re: [PERFORM] Index usage for tstzrange?

2013-03-20 Thread Tom Lane

Josh Berkus  writes:
> I just noticed that if I use a tstzrange for convenience, a standard
> btree index on a timestamp won't get used for it.  Example:

> table a (
>   id int,
>   val text,
>   ts timestamptz
> );
> index a_ts on a(ts);

> SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

> ... will NOT use the index a_ts.

Well, no.  <@ is not a btree-indexable operator.

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts <@ 
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Index usage for tstzrange?

2013-03-20 Thread Josh Berkus

Folks,

I just noticed that if I use a tstzrange for convenience, a standard
btree index on a timestamp won't get used for it.  Example:

table a (
id int,
val text,
ts timestamptz
);
index a_ts on a(ts);

SELECT * FROM a WHERE ts <@ tstzrange('2013-01-01','2013-01-01 00:10:00')

... will NOT use the index a_ts.  Is this something which could be fixed
for 9.4?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage for min() vs. "order by asc limit 1"

2011-11-18 Thread MirrorX

can you run an analyze command first and then post here the results of:
select * FROM pg_stats WHERE tablename = 'delayed_jobs'; 
?

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/index-usage-for-min-vs-order-by-asc-limit-1-tp5002928p5004410.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage for min() vs. "order by asc limit 1"

2011-11-17 Thread Ben Chobot

On Nov 17, 2011, at 5:20 PM, Steve Atkins wrote:
> 
> I don't think you want the group by in that first query.

Heh, I tried to simply the example, but in reality that = becomes an in clause 
of multiple values. So the group by is needed.


>> 
>> 
>> postgres=# explain analyze select min(id) from delayed_jobs where 
>> strand='sis_batch:account:15' group by strand;
>>   QUERY PLAN
>> --
>> GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
>> time=226.759..226.760 rows=1 loops=1)
>>  ->  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
>> (actual time=0.014..169.941 rows=72268 loops=1)
>>Filter: ((strand)::text = 'sis_batch:account:15'::text)
>> Total runtime: 226.817 ms
>> (4 rows)
>> 
>> postgres=# explain analyze select id from delayed_jobs where 
>> strand='sis_batch:account:15' order by id limit 1;
>>  QUERY 
>> PLAN
>> -
>> Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
>> loops=1)
>>  ->  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
>> (cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
>> loops=1)
>>Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
>> Total runtime: 0.129 ms
>> (4 rows)
>> 
>> 
>> -- 
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage for min() vs. "order by asc limit 1"

2011-11-17 Thread Steve Atkins


On Nov 17, 2011, at 5:12 PM, Ben Chobot wrote:

> I have two queries in PG 9.1. One uses an index like I would like, the other 
> does not. Is this expected behavior? If so, is there any way around it? 

I don't think you want the group by in that first query.

Cheers,
  Steve

> 
> 
> postgres=# explain analyze select min(id) from delayed_jobs where 
> strand='sis_batch:account:15' group by strand;
>QUERY PLAN
> --
> GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
> time=226.759..226.760 rows=1 loops=1)
>   ->  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
> (actual time=0.014..169.941 rows=72268 loops=1)
> Filter: ((strand)::text = 'sis_batch:account:15'::text)
> Total runtime: 226.817 ms
> (4 rows)
> 
> postgres=# explain analyze select id from delayed_jobs where 
> strand='sis_batch:account:15' order by id limit 1;
>   QUERY 
> PLAN
> -
> Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
> loops=1)
>   ->  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
> (cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
> loops=1)
> Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
> Total runtime: 0.129 ms
> (4 rows)
> 
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] index usage for min() vs. "order by asc limit 1"

2011-11-17 Thread Ben Chobot

I have two queries in PG 9.1. One uses an index like I would like, the other 
does not. Is this expected behavior? If so, is there any way around it? 


postgres=# explain analyze select min(id) from delayed_jobs where 
strand='sis_batch:account:15' group by strand;
QUERY PLAN
--
 GroupAggregate  (cost=0.00..8918.59 rows=66 width=29) (actual 
time=226.759..226.760 rows=1 loops=1)
   ->  Seq Scan on delayed_jobs  (cost=0.00..8553.30 rows=72927 width=29) 
(actual time=0.014..169.941 rows=72268 loops=1)
 Filter: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 226.817 ms
(4 rows)

postgres=# explain analyze select id from delayed_jobs where 
strand='sis_batch:account:15' order by id limit 1;
   QUERY 
PLAN
-
 Limit  (cost=0.00..0.33 rows=1 width=8) (actual time=0.097..0.098 rows=1 
loops=1)
   ->  Index Scan using index_delayed_jobs_on_strand on delayed_jobs  
(cost=0.00..24181.74 rows=72927 width=8) (actual time=0.095..0.095 rows=1 
loops=1)
 Index Cond: ((strand)::text = 'sis_batch:account:15'::text)
 Total runtime: 0.129 ms
(4 rows)


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage on queries on inherited tables

2011-04-29 Thread Robert Haas

On Apr 27, 2011, at 11:11 PM, Joseph Shraibman  wrote:
> On 04/27/2011 04:32 PM, Robert Haas wrote:
>> In the first case, PostgreSQL evidently thinks that using the indexes
>> will be slower than just ignoring them.  You could find out whether
>> it's right by trying it with enable_seqscan=off.
> 
> My point is that this is just a problem with inherited tables.  It
> should be obvious to postgres that few rows are being returned, but in
> the inherited tables case it doesn't use indexes.  This was just an
> example.  In a 52 gig table I have a "select id from table limit 1 order
> by id desc" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.

Oh, sorry, I must have misunderstood. As Greg says, this is fixed in 9.1.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Greg Smith


Joseph Shraibman wrote:

In a 52 gig table I have a "select id from table limit 1 order
by id desc" returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.
  


This is probably the limitation that's fixed in PostgreSQL 9.1 by this 
commit (following a few others leading up to it):  
http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php


There was a good example showing what didn't work as expected before 
(along with an earlier patch that didn't everything the larger 9.1 
improvement does) at 
http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php ; 
"ORDER BY x DESC LIMIT 1" returns the same things as MAX(x).


It's a pretty serious issue with the partitioning in earlier versions.  
I know of multiple people, myself included, who have been compelled to 
apply this change to an earlier version of PostgreSQL to make larger 
partitioned databases work correctly.  The other option is to manually 
decompose the queries into ones that target each of the child tables 
individually, then combine the results, which is no fun either.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Samuel Gendler

On Wed, Apr 27, 2011 at 2:11 PM, Joseph Shraibman wrote:

> On 04/27/2011 04:32 PM, Robert Haas wrote:
> > In the first case, PostgreSQL evidently thinks that using the indexes
> > will be slower than just ignoring them.  You could find out whether
> > it's right by trying it with enable_seqscan=off.
>
> My point is that this is just a problem with inherited tables.  It
> should be obvious to postgres that few rows are being returned, but in
> the inherited tables case it doesn't use indexes.  This was just an
> example.  In a 52 gig table I have a "select id from table limit 1 order
> by id desc" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.
>
>
If I'm understanding correctly, this kind of obviates the utility of
partitioning if you structure a warehouse in a traditional manner.  Assuming
a fact table partitioned by time, but with foreign keys to a time dimension,
it is now not possible to gain any advantage from the partitioning if
selecting on columns in the time dimension.

"select * from fact_table f join time_dimension t on f.time_id = t.time_id
where t.quarter=3 and t.year = 2010" will scan all partitions of the fact
table despite the fact that all of the rows would come from 3 partitions,
assuming a partitioning schema that uses one partition for each month.

I use a time id that is calculable from the from the timestamp so it doesn't
need to be looked up, and partitioning on time_id directly is easy enough to
handle, but if I'm understanding the problem, it sounds like nothing short
of computing the appropriate time ids before issuing the query and then
including a 'where f.time_id between x and y' clause to the query will
result in the partitions being correctly excluded.  Is that what people are
doing to solve this problem?  The alternative is to leave a timestamp column
in the fact table (something I tend to do since it makes typing ad-hoc
queries in psql much easier) and partition on that column and then always
include a where clause for that column that is at least as large as the
requested row range.  Both result in fairly ugly queries, though I can
certainly see how I might structure my code to always build queries which
adhere to this.

I'm just in the process of designing a star schema for a project and was
intending to use exactly the structure I described at the top of the email.
Is there a postgres best-practices for solving this problem? There's no way
I can get away without partitioning.  I'm looking at a worst case table of
100,000 rows being written every 5 minutes, 24x7 - 29 million rows per day,
a billion rows per month - with most queries running over a single month or
comparing same months from differing years and quarters - so a month based
partitioning.  Normal case is closer to 10K rows per 5 minutes.

Suggestions?

--sam

Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Joseph Shraibman

On 04/27/2011 04:32 PM, Robert Haas wrote:
> In the first case, PostgreSQL evidently thinks that using the indexes
> will be slower than just ignoring them.  You could find out whether
> it's right by trying it with enable_seqscan=off.

My point is that this is just a problem with inherited tables.  It
should be obvious to postgres that few rows are being returned, but in
the inherited tables case it doesn't use indexes.  This was just an
example.  In a 52 gig table I have a "select id from table limit 1 order
by id desc" returns instantly, but as soon as you declare a child table
it tries to seq scan all the tables.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Robert Haas

On Fri, Apr 1, 2011 at 2:41 AM, Joseph Shraibman  wrote:
> When I do a query on a table with child tables on certain queries pg
> uses indexes and on others it doesn't. Why does this happen? For example:
>
>
> [local]:playpen=> explain analyze select * from vis where id > 10747 ;
>                                                               QUERY
> PLAN
> 
>  Result  (cost=4.29..115.11 rows=325 width=634) (actual
> time=0.063..0.116 rows=5 loops=1)
>   ->  Append  (cost=4.29..115.11 rows=325 width=634) (actual
> time=0.053..0.090 rows=5 loops=1)
>         ->  Bitmap Heap Scan on vis  (cost=4.29..23.11 rows=5
> width=948) (actual time=0.051..0.058 rows=5 loops=1)
>               Recheck Cond: (id > 10747)
>               ->  Bitmap Index Scan on vis_pkey  (cost=0.00..4.29
> rows=5 width=0) (actual time=0.037..0.037 rows=5 loops=1)
>                     Index Cond: (id > 10747)
>         ->  Seq Scan on vis_for_seg_1_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_4_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_66_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_69_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_79_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_80_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_82_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
>               Filter: (id > 10747)
>         ->  Seq Scan on vis_for_seg_87_2011_03 vis  (cost=0.00..11.50
> rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
>               Filter: (id > 10747)
>  Total runtime: 0.724 ms
> (23 rows)
>
> Time: 5.804 ms
> [local]:playpen=> explain analyze select * from vis where id = 10747 ;
>
> QUERY
> PLAN
>
> 
>  Result  (cost=0.00..74.41 rows=9 width=664) (actual time=0.060..0.503
> rows=1 loops=1)
>   ->  Append  (cost=0.00..74.41 rows=9 width=664) (actual
> time=0.053..0.493 rows=1 loops=1)
>         ->  Index Scan using vis_pkey on vis  (cost=0.00..8.27 rows=1
> width=948) (actual time=0.051..0.055 rows=1 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_1_2011_03_pkey on
> vis_for_seg_1_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.122..0.122 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_4_2011_03_pkey on
> vis_for_seg_4_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_66_2011_03_pkey on
> vis_for_seg_66_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_69_2011_03_pkey on
> vis_for_seg_69_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_79_2011_03_pkey on
> vis_for_seg_79_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_80_2011_03_pkey on
> vis_for_seg_80_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.041..0.041 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_82_2011_03_pkey on
> vis_for_seg_82_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.049..0.049 rows=0 loops=1)
>               Index Cond: (id = 10747)
>         ->  Index Scan using vis_for_seg_87_2011_03_pkey on
> vis_for_seg_87_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
> time=0.043..0.043 rows=0 loops=1)
>               Index Cond: (id = 10747)
>  Total runtime: 1.110 ms
> (21 rows)
>
> [local]:playpen=> select version();
>
> version
> 
>  PostgreSQL 9.0.3 on i

[PERFORM] index usage on queries on inherited tables

2011-03-31 Thread Joseph Shraibman

When I do a query on a table with child tables on certain queries pg
uses indexes and on others it doesn't. Why does this happen? For example:


[local]:playpen=> explain analyze select * from vis where id > 10747 ;
   QUERY
PLAN  

 Result  (cost=4.29..115.11 rows=325 width=634) (actual
time=0.063..0.116 rows=5 loops=1)
   ->  Append  (cost=4.29..115.11 rows=325 width=634) (actual
time=0.053..0.090 rows=5 loops=1)
 ->  Bitmap Heap Scan on vis  (cost=4.29..23.11 rows=5
width=948) (actual time=0.051..0.058 rows=5 loops=1)
   Recheck Cond: (id > 10747)
   ->  Bitmap Index Scan on vis_pkey  (cost=0.00..4.29
rows=5 width=0) (actual time=0.037..0.037 rows=5 loops=1)
 Index Cond: (id > 10747)
 ->  Seq Scan on vis_for_seg_1_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_4_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_66_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_69_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_79_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_80_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_82_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.001..0.001 rows=0 loops=1)
   Filter: (id > 10747)
 ->  Seq Scan on vis_for_seg_87_2011_03 vis  (cost=0.00..11.50
rows=40 width=629) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (id > 10747)
 Total runtime: 0.724 ms
(23 rows)

Time: 5.804 ms
[local]:playpen=> explain analyze select * from vis where id = 10747 ;

QUERY
PLAN



 Result  (cost=0.00..74.41 rows=9 width=664) (actual time=0.060..0.503
rows=1 loops=1)
   ->  Append  (cost=0.00..74.41 rows=9 width=664) (actual
time=0.053..0.493 rows=1 loops=1)
 ->  Index Scan using vis_pkey on vis  (cost=0.00..8.27 rows=1
width=948) (actual time=0.051..0.055 rows=1 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_1_2011_03_pkey on
vis_for_seg_1_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.122..0.122 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_4_2011_03_pkey on
vis_for_seg_4_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_66_2011_03_pkey on
vis_for_seg_66_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_69_2011_03_pkey on
vis_for_seg_69_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_79_2011_03_pkey on
vis_for_seg_79_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_80_2011_03_pkey on
vis_for_seg_80_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.041..0.041 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_82_2011_03_pkey on
vis_for_seg_82_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.049..0.049 rows=0 loops=1)
   Index Cond: (id = 10747)
 ->  Index Scan using vis_for_seg_87_2011_03_pkey on
vis_for_seg_87_2011_03 vis  (cost=0.00..8.27 rows=1 width=629) (actual
time=0.043..0.043 rows=0 loops=1)
   Index Cond: (id = 10747)
 Total runtime: 1.110 ms
(21 rows)

[local]:playpen=> select version();
 
version

Re: [PERFORM] Index usage with functions in where condition

2010-07-09 Thread Jeremy Palmer

Hi Tom,

Thanks for the help - much appreciated.

Yes I'm using PostGIS, and with a simply join to a relating table I could get 
access to the geometry for these point positions. Is using the GIST r-tree 
index faster than using the 2 b-tree indexes on the lat and long values? I 
guess this is a question for the PostGIS guys and a quick test could tell me 
anyway! My memory is that the GIST r-tree index is slow for points at the 
moment, and that a good implementation of a kd-tree index over GIST is required 
for better speed.

Regards,

Jeremy Palmer
Geodetic Surveyor
National Geodetic Office

Land Information New Zealand | Toitu te whenua
160 Lambton Quay | Private Box 5501 | Wellington 6145

DDI: 64 (0)4 498 3537 | Fax: 64 (0)4 498 3837 | www.linz.govt.nz

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Saturday, 10 July 2010 11:20 a.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Index usage with functions in where condition 

Jeremy Palmer  writes:
> This is the query that does not use the indexes:

> SELECT
>   coo.nod_id,
>   640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
> 175.58461)*cos(radians(-41.0618)))^2)) as distance
> FROM 
>   crs_coordinate coo
> WHERE
>   coo.value1 between -41.0618-degrees(1200.0/640.0) and 
> -41.0618+degrees(1200.0/640.0) and
>   coo.value2 between 
> 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
> 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Those expressions yield float8, not numeric, and numeric vs float8 isn't
an indexable operator for reasons we needn't get into here.  You should
probably rethink whether numeric is really the best choice of datatype
for your columns, if this is the sort of value you expect to work with
--- you're paying a considerable price in speed and space for
perhaps-illusory precision gains.  But if you insist on using numeric
then the solution is to cast the expression results to numeric
explicitly.

BTW I wonder whether you ought not be looking into postgis rather than
rolling-your-own coordinate arithmetic ...

regards, tom lane
__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage with functions in where condition

2010-07-09 Thread Tom Lane

Jeremy Palmer  writes:
> This is the query that does not use the indexes:

> SELECT
>   coo.nod_id,
>   640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
> 175.58461)*cos(radians(-41.0618)))^2)) as distance
> FROM 
>   crs_coordinate coo
> WHERE
>   coo.value1 between -41.0618-degrees(1200.0/640.0) and 
> -41.0618+degrees(1200.0/640.0) and
>   coo.value2 between 
> 175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
> 175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Those expressions yield float8, not numeric, and numeric vs float8 isn't
an indexable operator for reasons we needn't get into here.  You should
probably rethink whether numeric is really the best choice of datatype
for your columns, if this is the sort of value you expect to work with
--- you're paying a considerable price in speed and space for
perhaps-illusory precision gains.  But if you insist on using numeric
then the solution is to cast the expression results to numeric
explicitly.

BTW I wonder whether you ought not be looking into postgis rather than
rolling-your-own coordinate arithmetic ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Index usage with functions in where condition

2010-07-09 Thread Jeremy Palmer

I'm having trouble getting the query planner to use indexes. The situation 
occurs when writing a query that uses functions for defining the parameters for 
the conditions on the indexed columns. The system I'm running is Windows Server 
2003, using version 8.4.2 of PostgreSQL.

This is the following table that I'm running my query against:

CREATE TABLE crs_coordinate
(
  id integer NOT NULL,
  nod_id integer NOT NULL,
  value1 numeric(22,12),
  value2 numeric(22,12),
  CONSTRAINT crs_coordinate_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

CREATE INDEX coo_value1 ON crs_coordinate USING btree (value1);
CREATE INDEX coo_value2 ON crs_coordinate USING btree (value2);

This table has 23 million rows in it and was analysed just before planning my 
queries. 

This is the query that does not use the indexes:

SELECT
  coo.nod_id,
  640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM 
  crs_coordinate coo
WHERE
  coo.value1 between -41.0618-degrees(1200.0/640.0) and 
-41.0618+degrees(1200.0/640.0) and
  coo.value2 between 
175.58461-degrees(1200.0/640.0)/(cos(radians(-41.0618))) and 
175.58461+degrees(1200.0/640.0)/(cos(radians(-41.0618)));

Seq Scan on crs_coordinate coo  (cost=0.00..1039607.49 rows=592 width=28)
  Filter: (((value1)::double precision >= (-41.0725429586587)::double 
precision) AND ((value1)::double precision <= (-41.0510570413413)::double 
precision) AND ((value2)::double precision >= 175.570362072701::double 
precision) AND ((value2)::double precision <= 175.598857927299::double 
precision))

However if I pre-evaluated the parameters for the where condition on the value1 
and value2 columns, the planner chooses to use the indexes:

SELECT
  coo.nod_id,
  640*radians(sqrt((coo.value1 - -41.0618)^2+((coo.value2 - 
175.58461)*cos(radians(-41.0618)))^2)) as distance
FROM 
  crs_coordinate coo
WHERE
  coo.value1 BETWEEN -41.07254296 AND -41.05105704 AND
  coo.value2 BETWEEN 175.57036207 AND 175.59885792;

Bitmap Heap Scan on crs_coordinate coo  (cost=5299.61..6705.41 rows=356 
width=28)
  Recheck Cond: ((value1 >= (-41.07254296)) AND (value1 <= (-41.05105704)) AND 
(value2 >= 175.57036207) AND (value2 <= 175.59885792))
  ->  BitmapAnd  (cost=5299.61..5299.61 rows=356 width=0)
->  Bitmap Index Scan on coo_value1  (cost=0.00..1401.12 rows=54923 
width=0)
  Index Cond: ((value1 >= (-41.07254296)) AND (value1 <= 
(-41.05105704)))
->  Bitmap Index Scan on coo_value2  (cost=0.00..3898.06 rows=153417 
width=0)
  Index Cond: ((value2 >= 175.57036207) AND (value2 <= 
175.59885792))

So why is the first query not using the indexes on the value1 and value2 
columns? I'm assuming that both the COS and RAIDIANS functions are STRICT 
IMMUTABLE, so logically the evaluation of these functions in the where clause 
should be inlined. Looking at the query plan this inlining does seem to be 
happening...

At this stage I have a work around by putting the query into a plpgsql function 
and using dynamic SQL. But it is still frustrating why the planner seems to be 
working in a far from optimal fashion. Can anyone shed some light on this for 
me?

Thanks,
Jeremy

__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread A. Kretschmer

In response to Kenneth Marshall :
> > > How many rows do you have in your table?  If there are relatively few,
> > > it probably guesses it to be cheaper to do a sequential scan and
> > > calculate lower values on-the-fly rather than bother with the index.
> > 
> > That's one reason, an other reason, i think, is, that a btree-index can't
> > search with an 'not like' - operator.
> 
> The 'not like' condition is likely to be extremely non-selective
> which would cause a sequential scan to be used in any event whether
> or not an index could be used.

That's true, but i have an example where the 'not like' condition is
extremely selective:

,[  sql  ]
| test=*# select count(1) from words where lower(w) not like lower('f%');
|  count
| ---
| 10
| (1 row)
|
| test=*# select count(1) from words where lower(w)  like lower('f%');
|  count
| ---
|  1
| (1 row)
`


But the index can't use:

,[  code  ]
| test=*# explain select * from words where lower(w) not like lower('f%');
| QUERY PLAN
| --
|  Seq Scan on words  (cost=0.00..4396.15 rows=10 width=47)
|Filter: (lower(w) !~~ 'f%'::text)
| (2 rows)
`


And i think, the reason is:

,[  quote from docu  ]
| B-trees can handle equality and range queries on data that can be sorted
| into some ordering. In particular, the PostgreSQL query planner will
| consider using a B-tree index whenever an indexed column is involved in
| a comparison using one of these operators:
|
| <
| <=
| =
| >=
| >
`



Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown

On 18 February 2010 12:18, A. Kretschmer
 wrote:
> In response to Thom Brown :
>> On 18 February 2010 11:55, AI Rumman  wrote:
>> > "Not like" operation does not use index.
>> >
>> > select * from vtiger_contactscf where lower(cf_1253) not like
>> > lower('Former%')
>> >
>> > I created index on lower(cf_1253).
>> >
>> > How can I ensure index usage in not like operation?
>> > Anyone please help.
>> >
>>
>> How many rows do you have in your table?  If there are relatively few,
>> it probably guesses it to be cheaper to do a sequential scan and
>> calculate lower values on-the-fly rather than bother with the index.
>
> That's one reason, an other reason, i think, is, that a btree-index can't
> search with an 'not like' - operator.
>

Erm.. yes.  Now that you say it, it's obvious. :S

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread Kenneth Marshall

On Thu, Feb 18, 2010 at 01:18:10PM +0100, A. Kretschmer wrote:
> In response to Thom Brown :
> > On 18 February 2010 11:55, AI Rumman  wrote:
> > > "Not like" operation does not use index.
> > >
> > > select * from vtiger_contactscf where lower(cf_1253) not like
> > > lower('Former%')
> > >
> > > I created index on lower(cf_1253).
> > >
> > > How can I ensure index usage in not like operation?
> > > Anyone please help.
> > >
> > 
> > How many rows do you have in your table?  If there are relatively few,
> > it probably guesses it to be cheaper to do a sequential scan and
> > calculate lower values on-the-fly rather than bother with the index.
> 
> That's one reason, an other reason, i think, is, that a btree-index can't
> search with an 'not like' - operator.
> 
> 
> 
> test=*# insert into words select 'fucking example' from 
> generate_series(1,1);
> INSERT 0 1
> test=*# insert into words select 'abc' from generate_series(1,10);
> INSERT 0 10
> test=*# explain select * from words where lower(w)  like lower('a%') or 
> lower(w)  like lower('b%');
>  QUERY PLAN
> -
>  Bitmap Heap Scan on words  (cost=1538.75..6933.39 rows=55643 width=36)
>Recheck Cond: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
>Filter: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
>->  BitmapOr  (cost=1538.75..1538.75 rows=57432 width=0)
>  ->  Bitmap Index Scan on idx_words  (cost=0.00..1027.04 rows=39073 
> width=0)
>Index Cond: ((lower(w) ~>=~ 'a'::text) AND (lower(w) ~<~ 
> 'b'::text))
>  ->  Bitmap Index Scan on idx_words  (cost=0.00..483.90 rows=18359 
> width=0)
>Index Cond: ((lower(w) ~>=~ 'b'::text) AND (lower(w) ~<~ 
> 'c'::text))
> (8 rows)
> 
> test=*# explain select * from words where lower(w) not like lower('a%') or 
> lower(w)  like lower('b%');
> QUERY PLAN
> ---
>  Seq Scan on words  (cost=0.00..10624.48 rows=282609 width=36)
>Filter: ((lower(w) !~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
> (2 rows)
> 
> 
> In other words: revert your where-condition from 'not like' to multiple 
> 'like' conditions for all letters except 'f%'.
> 
> 
> Andreas

The 'not like' condition is likely to be extremely non-selective
which would cause a sequential scan to be used in any event whether
or not an index could be used.

Cheers,
Ken


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread A. Kretschmer

In response to Thom Brown :
> On 18 February 2010 11:55, AI Rumman  wrote:
> > "Not like" operation does not use index.
> >
> > select * from vtiger_contactscf where lower(cf_1253) not like
> > lower('Former%')
> >
> > I created index on lower(cf_1253).
> >
> > How can I ensure index usage in not like operation?
> > Anyone please help.
> >
> 
> How many rows do you have in your table?  If there are relatively few,
> it probably guesses it to be cheaper to do a sequential scan and
> calculate lower values on-the-fly rather than bother with the index.

That's one reason, an other reason, i think, is, that a btree-index can't
search with an 'not like' - operator.



test=*# insert into words select 'fucking example' from 
generate_series(1,1);
INSERT 0 1
test=*# insert into words select 'abc' from generate_series(1,10);
INSERT 0 10
test=*# explain select * from words where lower(w)  like lower('a%') or 
lower(w)  like lower('b%');
 QUERY PLAN
-
 Bitmap Heap Scan on words  (cost=1538.75..6933.39 rows=55643 width=36)
   Recheck Cond: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
   Filter: ((lower(w) ~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
   ->  BitmapOr  (cost=1538.75..1538.75 rows=57432 width=0)
 ->  Bitmap Index Scan on idx_words  (cost=0.00..1027.04 rows=39073 
width=0)
   Index Cond: ((lower(w) ~>=~ 'a'::text) AND (lower(w) ~<~ 
'b'::text))
 ->  Bitmap Index Scan on idx_words  (cost=0.00..483.90 rows=18359 
width=0)
   Index Cond: ((lower(w) ~>=~ 'b'::text) AND (lower(w) ~<~ 
'c'::text))
(8 rows)

test=*# explain select * from words where lower(w) not like lower('a%') or 
lower(w)  like lower('b%');
QUERY PLAN
---
 Seq Scan on words  (cost=0.00..10624.48 rows=282609 width=36)
   Filter: ((lower(w) !~~ 'a%'::text) OR (lower(w) ~~ 'b%'::text))
(2 rows)


In other words: revert your where-condition from 'not like' to multiple 'like' 
conditions for all letters except 'f%'.


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown

> On Thu, Feb 18, 2010 at 6:00 PM, Thom Brown  wrote:
>>
>> On 18 February 2010 11:55, AI Rumman  wrote:
>> > "Not like" operation does not use index.
>> >
>> > select * from vtiger_contactscf where lower(cf_1253) not like
>> > lower('Former%')
>> >
>> > I created index on lower(cf_1253).
>> >
>> > How can I ensure index usage in not like operation?
>> > Anyone please help.
>> >
>>
>> How many rows do you have in your table?  If there are relatively few,
>> it probably guesses it to be cheaper to do a sequential scan and
>> calculate lower values on-the-fly rather than bother with the index.
>>
>> Thom
>
On 18 February 2010 12:06, AI Rumman  wrote:
> vtigercrm504=# explain analyze select * from vtiger_contactscf where
> lower(cf_1253)  like 'customer';
>
> QUERY
> PLAN
> 
>  Index Scan using vtiger_contactscf_cf_1253_idx on vtiger_contactscf
> (cost=0.00..146.54 rows=6093 width=179) (actual time=0.083..29.868 rows=5171
> loops=1)
>Index Cond: (lower((cf_1253)::text) ~=~ 'customer'::character varying)
>Filter: (lower((cf_1253)::text) ~~ 'customer'::text)
>  Total runtime: 34.956 ms
> (4 rows)
> vtigercrm504=# explain analyze select * from vtiger_contactscf where
> lower(cf_1253)  like 'customer';
>
> QUERY
> PLAN
> 
>  Index Scan using vtiger_contactscf_cf_1253_idx on vtiger_contactscf
> (cost=0.00..146.54 rows=6093 width=179) (actual time=0.083..29.868 rows=5171
> loops=1)
>Index Cond: (lower((cf_1253)::text) ~=~ 'customer'::character varying)
>Filter: (lower((cf_1253)::text) ~~ 'customer'::text)
>  Total runtime: 34.956 ms
> (4 rows)

Could you do the same again for a "not like" query?

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] index usage in not like

2010-02-18 Thread Thom Brown

On 18 February 2010 11:55, AI Rumman  wrote:
> "Not like" operation does not use index.
>
> select * from vtiger_contactscf where lower(cf_1253) not like
> lower('Former%')
>
> I created index on lower(cf_1253).
>
> How can I ensure index usage in not like operation?
> Anyone please help.
>

How many rows do you have in your table?  If there are relatively few,
it probably guesses it to be cheaper to do a sequential scan and
calculate lower values on-the-fly rather than bother with the index.

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] index usage in not like

2010-02-18 Thread AI Rumman

"Not like" operation does not use index.

select * from vtiger_contactscf where lower(cf_1253) not like
lower('Former%')

I created index on lower(cf_1253).

How can I ensure index usage in not like operation?
Anyone please help.

Re: [PERFORM] Index usage with sub select or outer joins

2008-11-12 Thread Julien Theulier

Hello, Joshua,

I did different test cases and here are the results (numbers in seconds),
using (case sub queries) or not (case join) the index:
Rows (main table)   Outer Join  Sub queries
setting
1396163 rows39.219.6
work_mem=256Mb
3347443 rows72.2203.1
work_mem=256Mb
3347443 rows70.331.1
work_mem=1024Mb
4321072 rows115 554.9
work_mem=256Mb
4321072 rows111 583
work_mem=1024Mb
All outer joins where done without index uses

To force the use of the index for the first case (outer join), I have change
the seq_scan cost (from 1 to 2.5), it takes now only 6.1s for the outer join
on 1.4M rows. New explain plan below:
"HashAggregate  (cost=457881.84..460248.84 rows=39450 width=49)"
"  ->  Nested Loop Left Join  (cost=0.00..456994.22 rows=39450 width=49)"
"->  Seq Scan on bm_us_views_main_2608 a  (cost=0.00..223677.45
rows=39450 width=41)"
"  Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty >
1))"
"->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..5.65 rows=13 width=19)"
"  Index Cond: ((b.item_id = a.item_id) AND (b.bid_date <
a.pv_timestamp) AND (b.bid_date >= (a.pv_timestamp - '60 days'::interval)))"

Index bm_us_bids_item_ix is on item_id, bidder_id (not used in the
condition) & bid_date

What can be the recommendations on tuning the different costs so it can
better estimate the seq scan & index scans costs? I think the issue is
there. But didn't find any figures helping to choose the correct parameters
according to cpu & disks speed

Regards,
Julien Theulier

-Message d'origine-
De : Joshua Tolley [mailto:[EMAIL PROTECTED] 
Envoyé : mercredi 12 novembre 2008 14:54
À : Julien Theulier
Cc : pgsql-performance@postgresql.org
Objet : Re: [PERFORM] Index usage with sub select or inner joins

On Wed, Nov 12, 2008 at 02:22:47PM +0100, Julien Theulier wrote:
> QUESTION: Why the planner choose seq scan in the first case & indexes 
> scan in the second case? In a more general way, I observed that the 
> planner has difficulties to select index scans & does in almost all 
> the cases seq scan, when doing join queries. After investigations, it 
> looks like when you join table a with table b on a column x and y and 
> you have an index on column x only, the planner is not able to choose 
> the index scan. You have to build the index corresponding exactly to 
> the join statement btw the 2 tables

Short, general answer: index scans aren't always faster than sequential
scans, and the planner is smart enough to know that. Googling "Why isn't
postgresql using my index" provides more detailed results, but in short, if
it scans an index, it has to read pages from the index, and for all the
tuples it finds in the index, it has to read once again from the heap,
whereas a sequential scan requires reading once from the heap. If your query
will visit most of the rows of the table, pgsql will choose a sequential
scan over an index scan.

- Josh / eggyknap


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage with sub select or inner joins

2008-11-12 Thread Joshua Tolley

On Wed, Nov 12, 2008 at 02:22:47PM +0100, Julien Theulier wrote:
> QUESTION: Why the planner choose seq scan in the first case & indexes scan
> in the second case? In a more general way, I observed that the planner has
> difficulties to select index scans & does in almost all the cases seq scan,
> when doing join queries. After investigations, it looks like when you join
> table a with table b on a column x and y and you have an index on column x
> only, the planner is not able to choose the index scan. You have to build
> the index corresponding exactly to the join statement btw the 2 tables

Short, general answer: index scans aren't always faster than sequential
scans, and the planner is smart enough to know that. Googling "Why isn't
postgresql using my index" provides more detailed results, but in short,
if it scans an index, it has to read pages from the index, and for all
the tuples it finds in the index, it has to read once again from the
heap, whereas a sequential scan requires reading once from the heap. If
your query will visit most of the rows of the table, pgsql will choose a
sequential scan over an index scan.

- Josh / eggyknap

signature.asc
Description: Digital signature

[PERFORM] Index usage with sub select or inner joins

2008-11-12 Thread Julien Theulier

Hello,

I am doing some performances testing on Postgres & I discovered the
following behavior, when using 2 different ways of writing selects (but
doing the same aggregations at the end):
1. test case 1, using outer join:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,a.item_id,
coalesce(sum(case when (bid_date=pv_timestamp -
INTERVAL '3 day') then 1 else 0 end)) as recent_sales_3d1,
coalesce(sum(case when (bid_date=pv_timestamp -
INTERVAL '7 day') then 1 else 0 end)) as recent_sales_7d1,
coalesce(sum(case when (bid_date=pv_timestamp -
INTERVAL '14 day') then 1 else 0 end)) as recent_sales_14d1,
coalesce(sum(case when (bid_date=pv_timestamp -
INTERVAL '30 day') then 1 else 0 end)) as recent_sales_30d1,
coalesce(sum(case when (bid_date=pv_timestamp -
INTERVAL '60 day') then 1 else 0 end)) as recent_sales_60d1
from bm_us_views_main_1609 a
left outer join bm_us_bids b on (b.item_id=a.item_id and
b.bid_date=a.pv_timestamp - INTERVAL '60
day'))
where a.item_type in (7,9) and qty>1
group by soj_session_log_id, pv_timestamp, vi_pv_id, a.item_id;;

This query doesn't use any index according to the explain plan:
"HashAggregate  (cost=672109.07..683054.81 rows=182429 width=49)"
"  ->  Merge Left Join  (cost=646489.83..668004.42 rows=182429 width=49)"
"Merge Cond: (a.item_id = b.item_id)"
"Join Filter: ((b.bid_date < a.pv_timestamp) AND (b.bid_date >=
(a.pv_timestamp - '60 days'::interval)))"
"->  Sort  (cost=331768.62..332224.69 rows=182429 width=41)"
"  Sort Key: a.item_id"
"  ->  Seq Scan on bm_us_views_main_1609 a
(cost=0.00..315827.08 rows=182429 width=41)"
"Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND
(qty > 1))"
"->  Sort  (cost=314669.01..320949.52 rows=2512205 width=19)"
"  Sort Key: b.item_id"
"  ->  Seq Scan on bm_us_bids b  (cost=0.00..47615.05
rows=2512205 width=19)"

2. Test case 2, using sub queries:
create table test2 as 
select
soj_session_log_id, pv_timestamp, vi_pv_id,item_id,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date=pv_timestamp - INTERVAL '3 day' group by
item_id ),0) as recent_sales_3d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date=pv_timestamp - INTERVAL '7 day' group by
item_id ),0) as recent_sales_7d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date=pv_timestamp - INTERVAL '14 day' group
by item_id ),0) as recent_sales_14d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date=pv_timestamp - INTERVAL '30 day' group
by item_id ),0) as recent_sales_30d,
coalesce((select count(*) from bm_us_bids b where b.item_id=a.item_id and
bid_date=pv_timestamp - INTERVAL '60 day' group
by item_id ),0) as recent_sales_60d
from bm_us_views_main_1609 a
where item_type in (7,9) and qty>1;

This query uses indexes  according to the explain plan:
"Seq Scan on bm_us_views_main_1609 a  (cost=0.00..8720230.77 rows=182429
width=41)"
"  Filter: ((item_type = ANY ('{7,9}'::numeric[])) AND (qty > 1))"
"  SubPlan"
"->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"  ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '60 days'::interval)))"
"->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"  ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '30 days'::interval)))"
"->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"  ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '14 days'::interval)))"
"->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"  ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '7 days'::interval)))"
"->  GroupAggregate  (cost=0.00..9.21 rows=1 width=11)"
"  ->  Index Scan using bm_us_bids_item_ix on bm_us_bids b
(cost=0.00..9.20 rows=1 width=11)"
"Index Cond: ((item_id = $0) AND (bid_date < $1) AND
(bid_date >= ($1 - '3 days'::interval)))"

The index bm_us_bids_item_ix is on columns item_id, bidder_id, bid_date


QUESTION: Why the planner choose seq scan in the first case & indexes scan
in the second case? In a more general way, I observed that the planner has
difficulties to select index scans & does in almost all the cases seq scan,
when doing join queries. After investigations, it looks like when you join
table a with table b on a column x and y and you have an index on column x
only, the planner is not able to choose the index sca

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Jeff Frost


On Fri, 31 Oct 2008, Gregory Stark wrote:


Tom Lane <[EMAIL PROTECTED]> writes:


Jeff Frost <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

Huh.  That does sound like it's a version-to-version difference.
There's nothing in the CVS log that seems related though.  Are you
willing to post your test case?


It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.


Well, if there was a change it seems to have been in the right direction
;-) so this is mostly just idle curiosity.  Don't jump through hoops to
get a test case.


Assuming it's not a bug...


Well, after boiling down my test case to the bare essentials, I was unable to 
reproduce the different behavior between 8.3.3 and 8.3.4.  Now, I've gone back 
to the original script and can't reproduce the behavior I previously saw on 
8.3.4 and my screen session doesn't have enough scrollback to look at what 
happened previously.  I was thinking perhaps I had inadvertently committed the 
transaction, but then the act would have been dropped as it's a temp table 
created with ON COMMIT DROP.  But, I've tested 3 times in a row and every time 
8.3.4 uses the seq scan just like 8.3.3 now, so I must've done something 
differently to get that result as Tom had originally suggested.  I just can't 
think what it might have been.  Perhaps it's time to buy some glasses. :-/


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-31 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Jeff Frost <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> Huh.  That does sound like it's a version-to-version difference.
>>> There's nothing in the CVS log that seems related though.  Are you
>>> willing to post your test case?
>>> 
>> It's a customer DB, so I'll contact them and see if we can boil it down
>> to a test case with no sensitive data.
>
> Well, if there was a change it seems to have been in the right direction
> ;-) so this is mostly just idle curiosity.  Don't jump through hoops to
> get a test case.

Assuming it's not a bug...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane

Jeff Frost <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Huh.  That does sound like it's a version-to-version difference.
>> There's nothing in the CVS log that seems related though.  Are you
>> willing to post your test case?
>> 
> It's a customer DB, so I'll contact them and see if we can boil it down
> to a test case with no sensitive data.

Well, if there was a change it seems to have been in the right direction
;-) so this is mostly just idle curiosity.  Don't jump through hoops to
get a test case.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost



Tom Lane wrote:
> Jeff Frost <[EMAIL PROTECTED]> writes:
>   
>> On Thu, 30 Oct 2008, Tom Lane wrote:
>> 
 Any idea why I don't see it on 8.3.4?
 
>>> I think it's more likely some small difference in your test conditions
>>> than any real version-to-version difference.  In particular I think the
>>> "still see" test might be influenced by the ages of transactions running
>>> concurrently.
>>>   
>
>   
>> Interesting.  This is on a test server which has no other concurrent 
>> transactions and it acts the same way after I stopped 8.3.4 and started up 
>> 8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm 
>> not 
>> sure that makes sense.  So, I did the test with the sql script on 8.3.3, 
>> then 
>> shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
>> successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that 
>> the 
>> behavior was still the same on 8.3.3.  I wonder what else I might be doing 
>> differently.
>> 
>
> Huh.  That does sound like it's a version-to-version difference.
> There's nothing in the CVS log that seems related though.  Are you
> willing to post your test case?
>   
>   
It's a customer DB, so I'll contact them and see if we can boil it down
to a test case with no sensitive data.

-- 
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane

Jeff Frost <[EMAIL PROTECTED]> writes:
> On Thu, 30 Oct 2008, Tom Lane wrote:
>>> Any idea why I don't see it on 8.3.4?
>> 
>> I think it's more likely some small difference in your test conditions
>> than any real version-to-version difference.  In particular I think the
>> "still see" test might be influenced by the ages of transactions running
>> concurrently.

> Interesting.  This is on a test server which has no other concurrent 
> transactions and it acts the same way after I stopped 8.3.4 and started up 
> 8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm not 
> sure that makes sense.  So, I did the test with the sql script on 8.3.3, then 
> shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
> successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that the 
> behavior was still the same on 8.3.3.  I wonder what else I might be doing 
> differently.

Huh.  That does sound like it's a version-to-version difference.
There's nothing in the CVS log that seems related though.  Are you
willing to post your test case?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost


On Thu, 30 Oct 2008, Tom Lane wrote:


Any idea why I don't see it on 8.3.4?


I think it's more likely some small difference in your test conditions
than any real version-to-version difference.  In particular I think the
"still see" test might be influenced by the ages of transactions running
concurrently.


Interesting.  This is on a test server which has no other concurrent 
transactions and it acts the same way after I stopped 8.3.4 and started up 
8.3.3 again as it did before stopping 8.3.3 to bring up 8.3.4.  Hrmm..I'm not 
sure that makes sense.  So, I did the test with the sql script on 8.3.3, then 
shut down 8.3.3, started up 8.3.4 on the same data dir, ran the test 
successfully.  Next I shut down 8.3.4 and started 8.3.3 and verified that the 
behavior was still the same on 8.3.3.  I wonder what else I might be doing 
differently.


The good news is that making the indexes before the updates seems to make the 
planner happy!


--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane

Jeff Frost <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Okay.  What that means is that the indexes were created on data that had
>> already been inserted and updated to some extent, resulting in
>> HOT-update chains that turned out to be illegal for the new indexes.
>> The way we deal with this is to mark the indexes as not usable by any
>> query that can still see the dead HOT-updated tuples.

> Any idea why I don't see it on 8.3.4?

I think it's more likely some small difference in your test conditions
than any real version-to-version difference.  In particular I think the
"still see" test might be influenced by the ages of transactions running
concurrently.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost

Tom Lane wrote:
> Okay.  What that means is that the indexes were created on data that had
> already been inserted and updated to some extent, resulting in
> HOT-update chains that turned out to be illegal for the new indexes.
> The way we deal with this is to mark the indexes as not usable by any
> query that can still see the dead HOT-updated tuples.
>
> Your best bet for dodging the problem is probably to break the operation
> into two transactions, if that's possible.  INSERT and UPDATE in the
> first xact, create the indexes at the start of the second.  (Hmm ...
> I'm not sure if that's sufficient if there are other concurrent
> transactions; but it's certainly necessary.)  Another possibility is
> to create the indexes just after data load, before you start updating
> the columns they're on.
>
>   
Thanks Tom!

Any idea why I don't see it on 8.3.4?

-- 
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane

Jeff Frost <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> This may be a HOT side-effect ... is pg_index.indcheckxmin set for
>> the index?
>> 
> Yep, sure enough, the 'act' table's indexes have it set and jefftest and
> jefftest2's indexes do not.

Okay.  What that means is that the indexes were created on data that had
already been inserted and updated to some extent, resulting in
HOT-update chains that turned out to be illegal for the new indexes.
The way we deal with this is to mark the indexes as not usable by any
query that can still see the dead HOT-updated tuples.

Your best bet for dodging the problem is probably to break the operation
into two transactions, if that's possible.  INSERT and UPDATE in the
first xact, create the indexes at the start of the second.  (Hmm ...
I'm not sure if that's sufficient if there are other concurrent
transactions; but it's certainly necessary.)  Another possibility is
to create the indexes just after data load, before you start updating
the columns they're on.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost

Tom Lane wrote:
> Jeff Frost <[EMAIL PROTECTED]> writes:
>   
>> I've run across a strange problem with PG 8.3.3 not using indexes on a
>> particular table after building the table during a transaction.
>> 
>
> This may be a HOT side-effect ... is pg_index.indcheckxmin set for
> the index?
>   
Yep, sure enough, the 'act' table's indexes have it set and jefftest and
jefftest2's indexes do not.

select c.relname,i.indcheckxmin  from pg_class c, pg_index i WHERE
i.indexrelid = c.oid AND c.relname IN ('act_act_usr_id', 'act_arrived',
'act_closing', 'act_place');
relname | indcheckxmin
+--
 act_closing| t
 act_act_usr_id | t
 act_place  | t
 act_arrived| t
(4 rows)


consdb=# select c.relname,i.indcheckxmin  from pg_class c, pg_index i
WHERE i.indexrelid = c.oid AND c.relname IN
('jefftest2_jefftest_usr_id', 'jefftest2_arrived', 'jefftest2_closing',
'jefftest2_place');
  relname  | indcheckxmin
---+--
 jefftest2_jefftest_usr_id | f
 jefftest2_place   | f
 jefftest2_arrived | f
 jefftest2_closing | f
(4 rows)


-- 
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032

Re: [PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Tom Lane

Jeff Frost <[EMAIL PROTECTED]> writes:
> I've run across a strange problem with PG 8.3.3 not using indexes on a
> particular table after building the table during a transaction.

This may be a HOT side-effect ... is pg_index.indcheckxmin set for
the index?

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Index usage problem on 8.3.3

2008-10-30 Thread Jeff Frost

I've run across a strange problem with PG 8.3.3 not using indexes on a
particular table after building the table during a transaction.

You can see a transcript of the issue here:

http://gist.github.com/21154

Interestingly, if I create another temp table 'CREATE TEMP TABLE AS
SELECT * FROM act' as seen on line 107, then add the same indexes to
that table, PG will use the indexes.  While it's not in the gist
transcript, even an extremely simple query like:

SELECT * FROM act WHERE act_usr_id = 1;

will not use the index on the original act table, but the jefftest and
jefftest2 tables both work fine.  As you can probably see in the
transcript, the tables have been ANALYZEd.  I even tried 'enable
seqscan=0;' and that made the cost really high for the seq scan, but the
planner still chose the seq scan.

The issue does not affect 8.2.3 nor does it affect 8.3.4.  I didn't see
any mention of a fix for this sort of thing in 8.3.4's release notes.  I
was wondering if this is a known bug in 8.3.3 (and maybe other 8.3.x
versions) and just didn't make it into the release notes of 8.3.4?

-- 
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 916-647-6411 FAX: 916-405-4032


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] index usage makes problem

2008-03-06 Thread sathiya psql

I had 50 lakh records in my table...

while counting that am using that row in where condition... which makes
problem, cpu is waiting for device...

Debian OS, postresql 7.4, 50 lakh records.

Query is

EXPLAIN ANALYZE select count(call_id) from call_log where call_id > 1;

while seeing the top, cpu is waiting for i/o, and without this call_id
condition if i do
   EXPLAIN ANALYZE select count(oid) from call_log where oid > 1;
it executed in 21 seconds

Re: [PERFORM] index usage on arrays

2008-02-07 Thread Tom Lane

andrew klassen <[EMAIL PROTECTED]> writes:
> Is there any alternative to what am I currently doing other than creating a 
> row for 
> each array element,

Since (I think) 8.2, you could create a GIN index on the array column
and then array overlap (&&) would be indexable.  GIN has some
performance issues if the table is heavily updated, though.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] index usage on arrays

2008-02-07 Thread Oleg Bartunov


andrew,
what are your queries ? Have you seen contrib/intarray, 
GIN index ?


On Thu, 7 Feb 2008, andrew klassen wrote:


I am using Postgres 8.2.5.

I have a table that has rows containing a variable length array with a known 
maximum.
I was doing selects on the array elements using an ANY match. The performance
was not too good as my table got bigger. So I added an index on the array.
That didn't help since the select was not using it.  I saw a thread in the
mailing lists stating the index wouldn't be used.

So I created indices on the individual array elements and then do a select
on each element separately and then combine each match using OR.
This did substantially increase the select performance. However, it may
be difficult to maintain this approach over time as the maximum array
size may increase dramatically and forming the query will become tedious.

Is there any alternative to what am I currently doing other than creating a row 
for
each array element, i.e. stop using an array and use a separate row for each
array index? The reason I didn't want to take this approach is because there are
other columns in the row that will be duplicated needlessly.

Thanks, Andrew


 

Be a better friend, newshound, and
know-it-all with Yahoo! Mobile.  Try it now.  
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org

[PERFORM] index usage on arrays

2008-02-07 Thread andrew klassen

I am using Postgres 8.2.5. 
 
I have a table that has rows containing a variable length array with a known 
maximum. 
I was doing selects on the array elements using an ANY match. The performance 
was not too good as my table got bigger. So I added an index on the array. 
That didn't help since the select was not using it.  I saw a thread in the 
mailing lists stating the index wouldn't be used. 
 
So I created indices on the individual array elements and then do a select
on each element separately and then combine each match using OR. 
This did substantially increase the select performance. However, it may 
be difficult to maintain this approach over time as the maximum array 
size may increase dramatically and forming the query will become tedious. 
 
Is there any alternative to what am I currently doing other than creating a row 
for 
each array element, i.e. stop using an array and use a separate row for each 
array index? The reason I didn't want to take this approach is because there 
are 
other columns in the row that will be duplicated needlessly.
 
Thanks, Andrew


  

Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-18 Thread valgog

Hi Tom,

do you think it would be a good idea to ask GIN index team to
implement an int-based bitmap set indexing operator for GIN/GiST based
indexes? Or there will be a possibility to somehow optimally index
arrays of enumerations to implement such bitmap structures in 8.3 or
later postgresql versions?

With best regards,

-- Valentine

On Sep 17, 3:37 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
> > On Mon, Sep 17, 2007 at  2:49 AM, in message
> > <[EMAIL PROTECTED]>, valgog
> > <[EMAIL PROTECTED]> wrote:=20
> >> Are you sure you understood what was the question?
>
> >> Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
> >> TBL2.CATEGORY > 0?
> > Yes, given that he stipulated that one and only one bit would be set.
>
> Really?  In that case, isn't this bit-field just a bad implementation of
> an enum-style field?
>
> regards, tom lane
>
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread W.Alphonse HAROUNY

Hi,

A little clarification. Actually,  TBL1.CATEGORY and/or  TBL2.CATEGORY may
hold a binary value having multiple binary(ies) '1'.
Each binary value column represent an business attribute.
If a binary value column is equal to '1', it means that the business
attribute is True,
otherwise it is false.
I adopted this avoid defining a detail table to table TBL1. Idem to TBL2.

If  TBL1.CATEGORY |  TBL2.CATEGORY > 0
=> it means that we have at least one common business attribute that is TRUE
for TBL1 and TBL2.

Regards
W.Alf

On 9/17/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
> > On Mon, Sep 17, 2007 at  2:49 AM, in message
> > <[EMAIL PROTECTED]>, valgog
> > <[EMAIL PROTECTED]> wrote:=20
> >> Are you sure you understood what was the question?
> >>
> >> Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
> >> TBL2.CATEGORY > 0?
>
> > Yes, given that he stipulated that one and only one bit would be set.
>
> Really?  In that case, isn't this bit-field just a bad implementation of
> an enum-style field?
>
>regards, tom lane
>
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
>

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Kevin Grittner

>>> On Mon, Sep 17, 2007 at  8:37 AM, in message <[EMAIL PROTECTED]>,
Tom Lane <[EMAIL PROTECTED]> wrote: 
> "Kevin Grittner" <[EMAIL PROTECTED]> writes:
>> On Mon, Sep 17, 2007 at  2:49 AM, in message
>> <[EMAIL PROTECTED]>, valgog
>> <[EMAIL PROTECTED]> wrote:=20
>>> Are you sure you understood what was the question?
>>> 
>>> Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
>>> TBL2.CATEGORY > 0?
> 
>> Yes, given that he stipulated that one and only one bit would be set.
> 
> Really?  In that case, isn't this bit-field just a bad implementation of
> an enum-style field?
 
My bad.  I did misread it.  Sorry, all.
 
-Kevin
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Tom Lane

"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> On Mon, Sep 17, 2007 at  2:49 AM, in message
> <[EMAIL PROTECTED]>, valgog
> <[EMAIL PROTECTED]> wrote:=20
>> Are you sure you understood what was the question?
>> 
>> Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
>> TBL2.CATEGORY > 0?

> Yes, given that he stipulated that one and only one bit would be set.

Really?  In that case, isn't this bit-field just a bad implementation of
an enum-style field?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread Kevin Grittner

>>> On Mon, Sep 17, 2007 at  2:49 AM, in message
<[EMAIL PROTECTED]>, valgog
<[EMAIL PROTECTED]> wrote: 

>> What about saying?:
>>
>>   TBL1.CATEGORY = TBL2.CATEGORY
>>
> 
> Are you sure you understood what was the question?
> 
> Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
> TBL2.CATEGORY > 0?

Yes, given that he stipulated that one and only one bit would be set.
 
-Kevin
 



---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread valgog


> What about saying?:
>
>   TBL1.CATEGORY = TBL2.CATEGORY
>

Are you sure you understood what was the question?

Is the TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY &
TBL2.CATEGORY > 0?


---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-17 Thread valgog

Hi,

I could not find and normal solution for that issue. But I am using
some workarounds for that issue.

The solution, that I am using now is to create an index for every bit
of your bitmap field.

So something like

CREATE INDEX idx_hobbybit_0_limited
  ON "versionA".user_fast_index
  USING btree
  (gender, dateofbirth) -- here the gender and dateofbirth fields are
the fields that we usually ORDER BY in the select statements, but you
can play with the needed fields
  WHERE (hobby_bitmap & 1) > 0;

by creating such an index for every used bit and combining WHERE
(hobby_bitmap & 1 ) > 0 like statements the planner will be choosing
the right index to use.

Another workaround, that will be more applicable in your case I think,
is to create a functional GIN index on your bitmap field using a
static function to create an array of bitmap keys from your bitmap
field.

CREATE OR REPLACE FUNCTION
"versionA".bitmap_to_bit_array(source_bitmap integer)
  RETURNS integer[] AS
'select ARRAY( select (1 << s.i) from generate_series(0, 32) as s(i)
where ( 1 << s.i ) & $1 > 0 )'
  LANGUAGE 'sql' IMMUTABLE STRICT;

And than create a GIN index on the needed field using this stored
procedure. After that, it would be possible to use intarray set
operators on the result of that function. This will also make it
possible to use that GIN index.

Actually it would be much much better if it were possible to build GIN
indexes directly on the bitmap fields. But this is to be implemented
by GIN and GiST index development team. Probably would be not a bad
idea to make a feature request on them.

With best regards,

Valentine Gogichashvili

On Sep 13, 2:30 pm, [EMAIL PROTECTED] ("W.Alphonse HAROUNY") wrote:
> Hello,
>
> My question is about index usage when bitwise operations are invoked.
> Situation Context:
> --
>
> Lets suppose we have 2 tables TBL1 and TBL2 as the following:
> TBL1 {
>   . ;
>   integer categoryGroup; // categoryGroup is declared as an index on TABL1
>   . ;
>
> }
>
> TBL2 {
>   . ;
>   integer categoryGroup; // categoryGroup is declared as an index on TABL2
>   . ;
>
> }
>
> By conception, I suppose that:
> - [categoryGroup] may hold a limited number of values, less than 32 values.
> - [categoryGroup] is of type integer => it means 4 bytes => 32 bits
>   => 32 places available to hold binary '0' or binary '1' values.
> - [categoryGroup] is the result of an "OR bitwise operation" among a
> predefined set of variables [variableCategory].
>We suppose that [variableCategory] is of type integer (=>32 bits)
>and each binary value of [variableCategory] may only hold a single binary
> '1'.
>
> Ex: variableCategory1 = 0010
>   variableCategory2 = 0010
>   variableCategory3 = 1000
>
>  If [categoryGroup] =  variableCategory1 | variableCategory2 |
> variableCategory3
> =>[categoryGroup] = 00101010
>
> Question:
> --
> I have an SQL request similar to:
>
> SELECT . FROM TBL1, TBL2 WHERE
>   AND
>  TBL1.CATEGORY & TBL2.CATEGORY <> 0  //-- where & is the AND bitwise
> operator
>
> Qst:
> 1/ IS the above SQL request will use the INDEX [categoryGroup] defined on
> TBL1 and TBL2 ?
> 2/ What should I do or How should I modify my SQL request in order
>to force the query engine to use an index ? (the already defined index or
> another useful index)
>
> Thx a lot

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-16 Thread Kevin Grittner

>>> On Thu, Sep 13, 2007 at  7:30 AM, in message
<[EMAIL PROTECTED]>, "W.Alphonse
HAROUNY" <[EMAIL PROTECTED]> wrote: 

>and each binary value of [variableCategory] may only hold a single binary
> '1'.

>  TBL1.CATEGORY & TBL2.CATEGORY <> 0  //-- where & is the AND bitwise
> operator

What about saying?:

  TBL1.CATEGORY = TBL2.CATEGORY

If your indexes include this and the other columns which cause the tables
to be related, one or both of them stand a pretty good chance of evaluating
to the lowest-cost method to join the tables.  Forcing a query to use an
index outside of it being the cheapest path is rarely productive.

-Kevin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Index usage when bitwise operator is used

2007-09-16 Thread Valentine Gogichashvili

Hi,

I could not find and normal solution for that issue. But I am using some
workarounds for that issue.

The solution, that I am using now is to create an index for every bit of
your bitmap field.

So something like

CREATE INDEX idx_hobbybit_0_limited
  ON "versionA".user_fast_index
  USING btree
  (gender, dateofbirth) -- here the gender and dateofbirth fields are the
fields that we usually ORDER BY in the select statements, but you can
play with the needed fields
  WHERE (hobby_bitmap & 1) > 0;

by creating such an index for every used bit and combining WHERE
(hobby_bitmap & 1 ) > 0 like statements the planner will be choosing the
right index to use.

Another workaround, that will be more applicable in your case I think, is to
create a functional GIN index on your bitmap field using a static function
to create an array of bitmap keys from your bitmap field.

CREATE OR REPLACE FUNCTION "versionA".bitmap_to_bit_array(source_bitmap
integer)
  RETURNS integer[] AS
'select ARRAY( select (1 << s.i) from generate_series(0, 32) as s(i) where (
1 << s.i ) & $1 > 0 )'
  LANGUAGE 'sql' IMMUTABLE STRICT;

And than create a GIN index on the needed field using this stored procedure.
After that, it would be possible to use intarray set operators on the result
of that function. This will also make it possible to use that GIN index.

Actually it would be much much better if it were possible to build GIN
indexes directly on the bitmap fields. But this is to be implemented by GIN
and GiST index development team. Probably would be not a bad idea to make a
feature request on them.

With best regards,

Valentine Gogichashvili

On 9/13/07, W.Alphonse HAROUNY <[EMAIL PROTECTED]> wrote:
>
>  Hello,
>
> My question is about index usage when bitwise operations are invoked.
> Situation Context:
> --
>
> Lets suppose we have 2 tables TBL1 and TBL2 as the following:
> TBL1 {
>   . ;
>   integer categoryGroup; // categoryGroup is declared as an index on TABL1
>   . ;
> }
>
> TBL2 {
>   . ;
>   integer categoryGroup; // categoryGroup is declared as an index on TABL2
>   . ;
> }
>
> By conception, I suppose that:
> - [categoryGroup] may hold a limited number of values, less than 32
> values.
> - [categoryGroup] is of type integer => it means 4 bytes => 32 bits
>   => 32 places available to hold binary '0' or binary '1' values.
> - [categoryGroup] is the result of an "OR bitwise operation" among a
> predefined set of variables [variableCategory].
>We suppose that [variableCategory] is of type integer (=>32 bits)
>and each binary value of [variableCategory] may only hold a single
> binary '1'.
>
>
> Ex: variableCategory1 = 0010
>   variableCategory2 = 0010
>   variableCategory3 = 1000
>
>  If [categoryGroup] =  variableCategory1 | variableCategory2 |
> variableCategory3
> =>[categoryGroup] = 00101010
>
>
>
> Question:
> --
> I have an SQL request similar to:
>
> SELECT . FROM TBL1, TBL2 WHERE
>   AND
>  TBL1.CATEGORY & TBL2.CATEGORY <> 0  //-- where & is the AND bitwise
> operator
>
> Qst:
> 1/ IS the above SQL request will use the INDEX [categoryGroup] defined on
> TBL1 and TBL2 ?
> 2/ What should I do or How should I modify my SQL request in order
>to force the query engine to use an index ? (the already defined index
> or another useful index)
>
>
>
> Thx a loт
>

[PERFORM] Index usage when bitwise operator is used

2007-09-13 Thread W.Alphonse HAROUNY

Hello,

My question is about index usage when bitwise operations are invoked.
Situation Context:
--

Lets suppose we have 2 tables TBL1 and TBL2 as the following:
TBL1 {
  . ;
  integer categoryGroup; // categoryGroup is declared as an index on TABL1
  . ;
}

TBL2 {
  . ;
  integer categoryGroup; // categoryGroup is declared as an index on TABL2
  . ;
}

By conception, I suppose that:
- [categoryGroup] may hold a limited number of values, less than 32 values.
- [categoryGroup] is of type integer => it means 4 bytes => 32 bits
  => 32 places available to hold binary '0' or binary '1' values.
- [categoryGroup] is the result of an "OR bitwise operation" among a
predefined set of variables [variableCategory].
   We suppose that [variableCategory] is of type integer (=>32 bits)
   and each binary value of [variableCategory] may only hold a single binary
'1'.


Ex: variableCategory1 = 0010
  variableCategory2 = 0010
  variableCategory3 = 1000

 If [categoryGroup] =  variableCategory1 | variableCategory2 |
variableCategory3
=>[categoryGroup] = 00101010



Question:
--
I have an SQL request similar to:

SELECT . FROM TBL1, TBL2 WHERE
  AND
 TBL1.CATEGORY & TBL2.CATEGORY <> 0  //-- where & is the AND bitwise
operator

Qst:
1/ IS the above SQL request will use the INDEX [categoryGroup] defined on
TBL1 and TBL2 ?
2/ What should I do or How should I modify my SQL request in order
   to force the query engine to use an index ? (the already defined index or
another useful index)



Thx a lot

Re: [PERFORM] index usage

2007-04-23 Thread Steinar H. Gunderson

On Mon, Apr 23, 2007 at 07:20:29PM +0200, Arkadiusz Raj wrote:
> I have a table in my database that is updated every minute with new acquired

> data. Anyway there is a query to get latest values to be displayed on

> screen. I have postgresql 7.4.2 that work very fine.

You want _at least_ the latest 7.4 version -- ideally, the latest 8.2
version.

> The problem was that

> after hdd crash I have rebuild database from the archive and... Execution

> time of this query starts to be unacceptable.

Have you re-ANALYZEd after the data load?

Anyhow, the issue with the planner not knowing how to estimate expressions
like "now() - interval '5 minutes'" correctly is a known 7.4 issue, and it's
fixed in later versions. It might have worked more or less by accident
earlier, although it seems odd that it wouldn't even have considered the
index scan...

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

[PERFORM] index usage

2007-04-23 Thread Arkadiusz Raj

Hi,

I have a table in my database that is updated every minute with new acquired
data. Anyway there is a query to get latest values to be displayed on
screen. I have postgresql 7.4.2 that work very fine. The problem was that
after hdd crash I have rebuild database from the archive and... Execution
time of this query starts to be unacceptable. And I found funny thing. Using
static value in place expression remove this problem. Query started to be
executed fast again.

I did not change any settings in postgresql configuration. Just had to
restart all the services.

Can someone tell me why the optimizer stopped to choose index? I had seqscan
disabled already.

One note about those two outputs below: there are different number of
touples returned due to the fact that in fact the timestamp is chosen
differently. 

Regards,

/Arek

--

explain analyze SELECT distinct on (index) index, status, value FROM _values
WHERE device=1 and timestamp>(now()-5*interval '1 min') ORDER by index,
timestamp desc;
   QUERY
PLAN 

-
 Unique  (cost=100117679.93..100117756.29 rows=1 width=24) (actual
time=5279.262..5279.308 rows=10 loops=1)
   ->  Sort  (cost=100117679.93..100117718.11 rows=15272 width=24) (actual
time=5279.260..5279.275 rows=21 loops=1)
 Sort Key: "index", "timestamp"
 ->  Seq Scan on _values  (cost=1.00..100116618.64
rows=15272 width=24) (actual time=5277.596..5279.184 rows=21 loops=1)
   Filter: ((device = 1) AND (("timestamp")::timestamp with time
zone > (now() - '00:05:00'::interval)))
 Total runtime: 5279.391 ms
(6 rows)

explain analyze SELECT distinct on (index) index, status, value FROM _values
WHERE device=1 and timestamp>'2007-04-22 21:20' ORDER by index, timestamp
desc;
    QUERY
PLAN    

---
 Unique  (cost=703.45..703.47 rows=1 width=24) (actual time=4.807..4.867
rows=10 loops=1)
   ->  Sort  (cost=703.45..703.46 rows=5 width=24) (actual time=4.804..4.827
rows=31 loops=1)
 Sort Key: "index", "timestamp"
 ->  Index Scan using _values_dbidx_idx on _values 
(cost=0.00..703.39 rows=5 width=24) (actual time=0.260..4.728 rows=31
loops=1)
   Index Cond: ("timestamp" > '2007-04-22 21:20:00'::timestamp
without time zone)
   Filter: (device = 1)
 Total runtime: 4.958 ms
(7 rows)


-- 
List przeskanowano programem ArcaMail, ArcaVir 2007
przeskanowano 2007-04-23 19:20:29, silnik: 2007.01.01 12:00:00, bazy: 
2007.04.15 09:21:20
This message has been scanned by ArcaMail, ArcaVir 2007
scanned 2007-04-23 19:20:29, engine: 2007.01.01 12:00:00, base: 2007.04.15 
09:21:20
http://www.arcabit.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Index usage

2006-08-21 Thread Scott Matseas


Tom Lane wrote:

What other planner parameters have you been fooling with?

Hi Tom,
The other parameters that have been changed are:
set join_collapse_limit to 1
set enable_sort to off

We are using version 8.1.3. We've noticed the query plan changing depending
on the amount of data in the tables especially when the query looks at 
more rows

in tableA. The parameter work_mem is set to 262,144.

Thanks,
Scott





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match

Re: [PERFORM] Index usage

2006-08-21 Thread Tom Lane

Scott Matseas <[EMAIL PROTECTED]> writes:
> If I enable sequential scan the Index Cond in
> question gets replaced with a Seq scan.

What other planner parameters have you been fooling with?

With no data in the tables, I get a reasonably sane-looking plan,
so I'm thinking you've chosen bad values for something or other
(starting with enable_seqscan = off ;-))

explain
 SELECT * FROM 
   last_summarized ls
   JOIN tableA s ON s.table_idA > ls.summarized_id AND s.table_idA
 <= ls.max_session_id
   LEFT JOIN tableB sf ON s.table_idA = sf.table_idA AND sf.direction = 
'a'::char
   LEFT JOIN tableB sfb ON s.table_idA = sfb.table_idA AND sfb.direction = 
'b'::char
  WHERE ls.summary_name::text = 'summary'::text ;
   QUERY PLAN   
 
-
 Nested Loop Left Join  (cost=6.16..54.51 rows=216 width=116)
   ->  Nested Loop Left Join  (cost=6.16..42.05 rows=216 width=95)
 ->  Nested Loop  (cost=6.16..29.58 rows=216 width=74)
   ->  Index Scan using last_summarized_pk on last_summarized ls  
(cost=0.00..8.02 rows=1 width=66)
 Index Cond: ((summary_name)::text = 'summary'::text)
   ->  Bitmap Heap Scan on tablea s  (cost=6.16..18.32 rows=216 
width=8)
 Recheck Cond: ((s.table_ida > ls.summarized_id) AND 
(s.table_ida <= ls.max_session_id))
 ->  Bitmap Index Scan on table_ida_pk  (cost=0.00..6.16 
rows=216 width=0)
   Index Cond: ((s.table_ida > ls.summarized_id) AND 
(s.table_ida <= ls.max_session_id))
 ->  Index Scan using tableb_unq on tableb sfb  (cost=0.00..0.05 rows=1 
width=21)
   Index Cond: ((s.table_ida = sfb.table_ida) AND (sfb.direction = 
'b'::bpchar))
   ->  Index Scan using tableb_unq on tableb sf  (cost=0.00..0.05 rows=1 
width=21)
 Index Cond: ((s.table_ida = sf.table_ida) AND (sf.direction = 
'a'::bpchar))
(13 rows)


regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org

[PERFORM] Index usage

2006-08-21 Thread Scott Matseas


We're having a problem with one of our queries being slow. It appears to be due
to the index being used to go from tableA to tableB.

Here are the tables:
CREATE TABLE tableA
(
 table_idA int8 NOT NULL DEFAULT nextval('tableA_id_seq'::regclass),
 CONSTRAINT table_idA_pk PRIMARY KEY (table_idA),
) 
WITHOUT OIDS;


CREATE TABLE tableB
(
 table_idB int8 NOT NULL DEFAULT nextval('tableB_id_seq'::regclass),
 table_idA int8 NOT NULL,
 direction char NOT NULL,
 CONSTRAINT tableB_pk PRIMARY KEY (table_idB),
 CONSTRAINT tableB_unq UNIQUE (table_idA, direction),
) 
WITHOUT OIDS;


CREATE TABLE last_summarized
(
 summary_name varchar(64) NOT NULL,
 summarized_id int8,
 max_session_id int8,
 CONSTRAINT last_summarized_pk PRIMARY KEY (summary_name)
) 
WITHOUT OIDS;


Here is the query:
explain
SELECT * FROM 
  last_summarized ls

  JOIN tableA s ON s.table_idA > ls.summarized_id AND s.table_idA
<= ls.max_session_id
  LEFT JOIN tableB sf ON s.table_idA = sf.table_idA AND sf.direction = 
'a'::"char"
  LEFT JOIN tableB sfb ON s.table_idA = sfb.table_idA AND sfb.direction = 
'b'::"char"
 WHERE ls.summary_name::text = 'summary'::text 


Size of tables in # of rows
tableA: 9,244,816
tableB: 15,398,497
last_summarized: 1


Explain of the above query:
"Hash Left Join  (cost=1811349.31..18546527.89 rows=1029087 width=294)"
"  Hash Cond: ("outer".table_idA = "inner".table_idA)"
"  ->  Hash Left Join  (cost=915760.88..7519203.61 rows=1029087 width=219)"
"Hash Cond: ("outer".table_idA = "inner".table_idA)"
"->  Nested Loop  (cost=0.00..126328.57 rows=1029087 width=144)"
"  ->  Index Scan using last_summarized_pk on last_summarized ls  
(cost=0.00..5.98 rows=1 width=82)"
"Index Cond: ((summary_name)::text = 'summary'::text)"
"  ->  Index Scan using table_idA_pk on tableA s  (cost=0.00..110886.29 
rows=1029087 width=62)"
"Index Cond: ((s.table_idA > "outer".summarized_id) AND (s.table_idA <= 
"outer".max_session_id))"
"->  Hash  (cost=784763.16..784763.16 rows=8100289 width=75)"
"  ->  Bitmap Heap Scan on tableB sf  (cost=216418.55..784763.16 
rows=8100289 width=75)"
"Recheck Cond: (direction = 'a'::"char")"
"->  Bitmap Index Scan on tableB_unq  (cost=0.00..216418.55 
rows=8100289 width=0)"
"  Index Cond: (direction = 'a'::"char")" <-- 
USING part of Index
"  ->  Hash  (cost=775968.61..775968.61 rows=7396725 width=75)"
"->  Bitmap Heap Scan on tableB sfb  (cost=216418.55..775968.61 rows=7396725 
width=75)"
"  Recheck Cond: (direction = 'b'::"char")"
"  ->  Bitmap Index Scan on tableB_unq  (cost=0.00..216418.55 
rows=7396725 width=0)"
"Index Cond: (direction = 'b'::"char")" <-- USING 
part of Index


From the above explain see inline comment("<-- USING part of Index"). The 
table_idA column

looks like it is being ignored in the index Cond. If I enable sequential scan 
the Index Cond in
question gets replaced with a Seq scan.

Also if I disable enable_bitmapscan sometimes both columns of the index(tableB_unq) will be 
used.


Does anyone know why we're experiencing this behavior?




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq

Re: [PERFORM] index usage

2006-07-28 Thread Tom Lane

Ben <[EMAIL PROTECTED]> writes:
> It's volatile, but it will always return an integer.

If it's volatile then it can't be used for an index condition.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [PERFORM] index usage

2006-07-28 Thread Mark Lewis

A volatile function has may return a different result for each row;
think of the random() or nextval() functions for example.  You wouldn't
want them to return the same value for each row returned.

-- Mark Lewis

On Fri, 2006-07-28 at 13:59 -0700, Ben wrote:
> It's volatile, but it will always return an integer.
> 
> On Fri, 28 Jul 2006, Daniel Caune wrote:
> 
> >> De : [EMAIL PROTECTED] [mailto:pgsql-performance-
> >> [EMAIL PROTECTED] De la part de Ben
> >> Envoyé : vendredi, juillet 28, 2006 15:21
> >> À : pgsql-performance@postgresql.org
> >> Objet : [PERFORM] index usage
> >>
> >> I have a table with 37000 rows, an integer column, and an index on that
> >> column. I've got a function that returns an integer. When I do a select
> >> where I restrict that column to being equal to a static number, explain
> >> tells me the index will be used. When I do the same thing but use the
> >> function instead of a static number, explain shows me a full scan on the
> >> table.
> >>
> >> I must be missing something, because my understanding is that the function
> >> will be evaluated once for the statement and then collapsed into a static
> >> number for the filtering. But the results of the explain seem to imply
> >> that's not the case?
> >>
> >
> > Is your function IMMUTABLE, STABLE or VOLATILE?
> >
> > --
> > Daniel
> >
> ---(end of broadcast)---
> TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PERFORM] index usage

2006-07-28 Thread Ben


It's volatile, but it will always return an integer.

On Fri, 28 Jul 2006, Daniel Caune wrote:


De : [EMAIL PROTECTED] [mailto:pgsql-performance-
[EMAIL PROTECTED] De la part de Ben
Envoyé : vendredi, juillet 28, 2006 15:21
À : pgsql-performance@postgresql.org
Objet : [PERFORM] index usage

I have a table with 37000 rows, an integer column, and an index on that
column. I've got a function that returns an integer. When I do a select
where I restrict that column to being equal to a static number, explain
tells me the index will be used. When I do the same thing but use the
function instead of a static number, explain shows me a full scan on the
table.

I must be missing something, because my understanding is that the function
will be evaluated once for the statement and then collapsed into a static
number for the filtering. But the results of the explain seem to imply
that's not the case?



Is your function IMMUTABLE, STABLE or VOLATILE?

--
Daniel


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] index usage

2006-07-28 Thread Daniel Caune

> De : [EMAIL PROTECTED] [mailto:pgsql-performance-
> [EMAIL PROTECTED] De la part de Ben
> Envoyé : vendredi, juillet 28, 2006 15:21
> À : pgsql-performance@postgresql.org
> Objet : [PERFORM] index usage
> 
> I have a table with 37000 rows, an integer column, and an index on that
> column. I've got a function that returns an integer. When I do a select
> where I restrict that column to being equal to a static number, explain
> tells me the index will be used. When I do the same thing but use the
> function instead of a static number, explain shows me a full scan on the
> table.
> 
> I must be missing something, because my understanding is that the function
> will be evaluated once for the statement and then collapsed into a static
> number for the filtering. But the results of the explain seem to imply
> that's not the case?
> 

Is your function IMMUTABLE, STABLE or VOLATILE?

--
Daniel

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

[PERFORM] index usage

2006-07-28 Thread Ben

I have a table with 37000 rows, an integer column, and an index on that 
column. I've got a function that returns an integer. When I do a select 
where I restrict that column to being equal to a static number, explain 
tells me the index will be used. When I do the same thing but use the 
function instead of a static number, explain shows me a full scan on the 
table.


I must be missing something, because my understanding is that the function 
will be evaluated once for the statement and then collapsed into a static 
number for the filtering. But the results of the explain seem to imply 
that's not the case?


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Hari Warrier



Select ID from TableA where not exists ( Select ID from Table B where ID 
= TableA.ID)
might give you index scan. Of course, that is only useful is TableA is 
very small table.

Not appropriate for 250k rows

on 2/1/2006 12:12 PM Ralph Mason said the following:

Hi,

I have 2 tables both have an index on ID (both ID columns are an oid).

I want to find only only rows in one and not the other.

Select ID from TableA where ID not IN ( Select ID from Table B)

This always generates sequential scans.

Table A has about 250,000 rows.   Table B has about 250,000 Rows.

We should get a Scan on Table B and a Index Lookup on Table A.

Is there any way to force this?  enable_seqscan off doesn't help at all.

The Plan is

Seq Scan on tablea(cost=1.00..23883423070450.96 rows=119414 
width=4)

 Filter: (NOT (subplan))"
 SubPlan-> Seq Scan on tableb (cost=1.00..14611.17 
rows=242617 width=4)



Thanks
Ralph




---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Bruno Wolff III

On Thu, Feb 02, 2006 at 09:12:59 +1300,
  Ralph Mason <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I have 2 tables both have an index on ID (both ID columns are an oid).
> 
> I want to find only only rows in one and not the other.
> 
> Select ID from TableA where ID not IN ( Select ID from Table B)
> 
> This always generates sequential scans.
> 
> Table A has about 250,000 rows.   Table B has about 250,000 Rows.
> 
> We should get a Scan on Table B and a Index Lookup on Table A.

I don't think that is going to work if there are NULLs in table B.
I don't know whether or not Postgres has code to special case NULL testing
(either for constraints ruling them out, or doing probes for them in addition
to the key it is trying to match) for doing NOT IN. Just doing a simple
index probe into table A isn't going to tell you all you need to know if
you don't find a match.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Tom Lane

"Jeffrey W. Baker" <[EMAIL PROTECTED]> writes:
> On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
>> Select ID from TableA where ID not IN ( Select ID from Table B)

> Have you considered this:

> SELECT ID from TableA EXCEPT Select ID from Table B

Also, increasing work_mem might persuade the planner to try a hashed
subplan, which'd be a lot better than what you have.  Note that it's
quite unlikely that indexes are going to help for this.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Jeffrey W. Baker

On Wed, 2006-02-01 at 12:22 -0800, Jeffrey W. Baker wrote:
> On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
> > Hi,
> > 
> > I have 2 tables both have an index on ID (both ID columns are an oid).
> > 
> > I want to find only only rows in one and not the other.
> > 
> > Select ID from TableA where ID not IN ( Select ID from Table B)
> 
> Have you considered this:
> 
> SELECT ID from TableA EXCEPT Select ID from Table B

Alternately:

   SELECT a.ID 
 FROM TableA AS a 
LEFT JOIN TableB AS b 
   ON a.ID = b.ID 
WHERE b.ID IS NULL

-jwb

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [PERFORM] Index Usage using IN

2006-02-01 Thread Jeffrey W. Baker

On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
> Hi,
> 
> I have 2 tables both have an index on ID (both ID columns are an oid).
> 
> I want to find only only rows in one and not the other.
> 
> Select ID from TableA where ID not IN ( Select ID from Table B)

Have you considered this:

SELECT ID from TableA EXCEPT Select ID from Table B

?

-jwb


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

[PERFORM] Index Usage using IN

2006-02-01 Thread Ralph Mason


Hi,

I have 2 tables both have an index on ID (both ID columns are an oid).

I want to find only only rows in one and not the other.

Select ID from TableA where ID not IN ( Select ID from Table B)

This always generates sequential scans.

Table A has about 250,000 rows.   Table B has about 250,000 Rows.

We should get a Scan on Table B and a Index Lookup on Table A.

Is there any way to force this?  enable_seqscan off doesn't help at all.

The Plan is

Seq Scan on tablea(cost=1.00..23883423070450.96 rows=119414 width=4)
 Filter: (NOT (subplan))"
 SubPlan-> 
Seq Scan on tableb (cost=1.00..14611.17 rows=242617 
width=4)



Thanks
Ralph




---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [PERFORM] Index usage for sorted query

2004-11-22 Thread Markus Schaber

Hi, Pierre-Frédéric,

On Sat, 20 Nov 2004 17:12:43 +0100
Pierre-Frédéric Caillaud <[EMAIL PROTECTED]> wrote:

> > WHERE cd='ca' ORDER BY l_postcode;
> 
> Write :
> 
> > WHERE cd='ca' ORDER BY cd, l_postcode;
> 
> You have a multicolumn index, so you should specify a multicolumn sort  
> exactly the same as your index, and the planner will get it.

Thanks, that seems to help.

Seems weird to order by a column that is all the same value, but well,
why not :-)

Thanks a lot,
Markus


-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [PERFORM] Index usage for sorted query

2004-11-20 Thread Tom Lane

Markus Schaber <[EMAIL PROTECTED]> writes:
> But as it fetches all the rows through the index, why doesn't it
> recognize that, fetching this way, the rows are already sorted by
> l_postcode?

Tell it to "ORDER BY cd, l_postcode".

> Is Postgresql 8 more intelligend in this case?

No.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [PERFORM] Index usage for sorted query

2004-11-20 Thread Pierre-Frédéric Caillaud

Instead of :
WHERE cd='ca' ORDER BY l_postcode;
Write :
WHERE cd='ca' ORDER BY cd, l_postcode;
You have a multicolumn index, so you should specify a multicolumn sort  
exactly the same as your index, and the planner will get it.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html

[PERFORM] Index usage for sorted query

2004-11-20 Thread Markus Schaber

Hello,

I have the following query plan:

logigis=# explain SELECT geom, ref_in_id as ref, nref_in_id as nref, st_name as 
name, substr(l_postcode,1,2) as postfirst,  func_class as level FROM 
schabi.streets WHERE cd='ca' ORDER BY l_postcode;
QUERY PLAN  
   
---
 Sort  (cost=2950123.42..2952466.07 rows=937059 width=290)
   Sort Key: l_postcode
   ->  Index Scan using streets_name_idx on streets  (cost=0.00..2857177.57 
rows=937059 width=290)
 Index Cond: ((cd)::text = 'ca'::text)


And I have, beside others, the following index:
»streets_name_idx« btree (cd, l_postcode)

As the query plan shows, my postgresql 7.4 does fine on using the index
for the WHERE clause.

But as it fetches all the rows through the index, why doesn't it
recognize that, fetching this way, the rows are already sorted by
l_postcode?

As I have a larger set of data, it nearly breaks down our developer
machine every time we do this, as it always creates a temporary copy of
the large amount of data to sort it (setting sort_mem higher makes it
swap, setting it lower makes it thrashing disk directly).

Is Postgresql 8 more intelligend in this case?

Thanks for your hints,
Markus 

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] index usage

2004-04-28 Thread Tom Lane

"scott.marlowe" <[EMAIL PROTECTED]> writes:
> There are tons of hints that it works this way in how they're written, but 
> nothing that just comes out and says that with pgsql's mvcc 
> implementation, an index scan still has to hit the pages that contain the 
> tuples, so often in pgsql a seq scan is a win where in other databases and 
> index scan would have been a win?

> If not, where would I add it if I were going to write something up for the 
> docs?  Just wondering...

AFAIR the only place in the docs that mentions seqscan or indexscan at
all is the discussion of EXPLAIN in "Performance Tips".  Perhaps a
suitably-enlarged version of that section could cover this.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [PERFORM] index usage

2004-04-28 Thread scott.marlowe

On Mon, 26 Apr 2004, Stephan Szabo wrote:

> 
> On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote:
> 
> > I have a query which I think should be using an index all of the time but
> > postgres only uses the index part of the time.  The index
> > (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
> > first followed by the selected column (support_person_id).  Wouldn't the
> > most efficient plan be to scan the index each time because the only columns
> > needed are in the index?  Below is the table, 2 queries showing the
> 
> Not necessarily.  The rows in the actual file still need to be checked to
> see if they're visible to the select and if it's expected that the entire
> file (or a reasonable % of the pages anyway) will need to be loaded using
> the index isn't necessarily a win.

While those of us familiar with PostgreSQL are well aware of the fact that 
indexes can't be used directly to garner information, but only as a lookup 
to a tuple in the table, it seems this misconception is quite common among 
those coming to postgreSQL from other databases.

Is there any information that directly reflects this issue in the docs?  
There are tons of hints that it works this way in how they're written, but 
nothing that just comes out and says that with pgsql's mvcc 
implementation, an index scan still has to hit the pages that contain the 
tuples, so often in pgsql a seq scan is a win where in other databases and 
index scan would have been a win?

If not, where would I add it if I were going to write something up for the 
docs?  Just wondering...

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [PERFORM] index usage

2004-04-26 Thread Stephan Szabo

On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote:

> I have a query which I think should be using an index all of the time but
> postgres only uses the index part of the time.  The index
> (ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
> first followed by the selected column (support_person_id).  Wouldn't the
> most efficient plan be to scan the index each time because the only columns
> needed are in the index?  Below is the table, 2 queries showing the

Not necessarily.  The rows in the actual file still need to be checked to
see if they're visible to the select and if it's expected that the entire
file (or a reasonable % of the pages anyway) will need to be loaded using
the index isn't necessarily a win.

> athenapost=> explain analyze select distinct support_person_id from
> ticket_crm_map where crm_id = 1;
>   QUERY PLAN
>  
>  -
> Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual
> time=659.102..791.517 rows=24 loops=1)
>   ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual
> time=659.090..713.285 rows=93889 loops=1)
> Sort Key: support_person_id
> ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628
> width=4) (actual time=0.027..359.299 rows=93889 loops=1)
>   Filter: (crm_id = 1)
> Total runtime: 814.601 ms

How far off is this from the index scan version in time?  Try doing
set enable_seqscan=off; and then explain analyzing again.
It's possible that you may wish to lower random_page_cost to change the
estimated effect of how much more expensive random reads are compared to
sequential ones.

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [PERFORM] index usage

2004-04-26 Thread brad-pgperf

When checking an index in postgres the original table has to be checked for 
each result to find if the index entry is still valid?  In which case you 
can't blindly scan the whole index and assume the data is good. I was used 
to Oracle behavior where the index is up to date so it can do the scan 
without hitting the original table. 

Does this sound correct to anyone? 

Thanks,
Brad 

Stephan Szabo writes:
On Fri, 23 Apr 2004 [EMAIL PROTECTED] wrote: 

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time.  The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id).  Wouldn't the
most efficient plan be to scan the index each time because the only columns
needed are in the index?  Below is the table, 2 queries showing the
Not necessarily.  The rows in the actual file still need to be checked to
see if they're visible to the select and if it's expected that the entire
file (or a reasonable % of the pages anyway) will need to be loaded using
the index isn't necessarily a win. 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[PERFORM] index usage

2004-04-26 Thread brad-pgperf

Hi, 

I have a query which I think should be using an index all of the time but 
postgres only uses the index part of the time.  The index 
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed 
first followed by the selected column (support_person_id).  Wouldn't the 
most efficient plan be to scan the index each time because the only columns 
needed are in the index?  Below is the table, 2 queries showing the 
difference in plans, followed by the record distribution of ticket_crm_map.  
I first did a 'vacuum analyze' to update the statistics. 

Thanks,
Brad 

athenapost=> \d ticket_crm_map
 Table "public.ticket_crm_map"
   Column |Type | 
Modifiers
+-+- 
---
tcrm_map_id| integer | not null
ticket_id  | integer | not null
crm_id | integer | not null
support_person_id  | integer | not null
escalated_to_person_id | integer | not null
status | character varying(50)   | not null default 
'Open'::character varying
close_date | timestamp without time zone |
updated_date   | timestamp without time zone |
updated_by | character varying(255)  |
created_date   | timestamp without time zone |
created_by | character varying(255)  |
additional_info| text|
subject| character varying(255)  |
Indexes:
  "ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
  "ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
  "ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
  "ticket_crm_map_status" btree (status)
  "ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
  "$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
  "$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
  "$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
  "$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
  "$5" FOREIGN KEY (status) REFERENCES ticket_status(status) 

athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 7;
  
QUERY PLAN
 
 
--
Unique  (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245 
rows=20 loops=1)
 ->  Sort  (cost=1262.99..1264.13 rows=456 width=4) (actual 
time=15.332..16.605 rows=2275 loops=1)
   Sort Key: support_person_id
   ->  Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map  
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275 
loops=1)
 Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows) 

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from 
ticket_crm_map where crm_id = 1;
 QUERY PLAN
 
-
Unique  (cost=10911.12..11349.26 rows=32 width=4) (actual 
time=659.102..791.517 rows=24 loops=1)
 ->  Sort  (cost=10911.12..11130.19 rows=87628 width=4) (actual 
time=659.090..713.285 rows=93889 loops=1)
   Sort Key: support_person_id
   ->  Seq Scan on ticket_crm_map  (cost=0.00..3717.25 rows=87628 
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
 Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows) 

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
---+
2554 | 63
 129 | 25
  17 | 24
 110 | 23
  74 | 22
  69 | 21
   2 | 20
  53 | 82
  10 | 17
  16 | 81
46637 | 16
  14 | 80
   2 | 15
1062 | 79
  87 | 78
  93 | 77
  60 | 44
 363 | 76
 225 | 10
   4 | 74
  83 |  9
  27 | 73
 182 |  8
2275 |  7
  15 | 71
 554 |  6
  44 | 70
 631 |  5
  37 |  4
 190 |  3
 112 |  2
93889 |  1
(32 rows) 

Time: 436.697 ms
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org

83 matches

Mail list logo