Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-04 Thread Jim Nasby

On Apr 2, 2006, at 6:30 PM, Josh Berkus wrote:

But just as a follow up question to your #1 suggestion, I have 8 GB
of ram in my production server. You're saying to set the
effective_cache_size then to 5 GB roughly? Somewhere around 655360?
Currently it is set to 65535. Is that something that's OS dependent?
I'm not sure how much memory my server sets aside for disk caching.


Yes, about.  It's really a judgement call; you're looking for the  
approximate
combined RAM available for disk caching and shared mem.  However,  
this is
just used as a way of estimating the probability that the data you  
want is
cached in memory, so you're just trying to be order-of-magnitude  
accurate,

not to-the-MB accurate.


FWIW, I typically set effective_cache_size to the amount of memory in  
the machine minus 1G for the OS and various other daemons, etc. But  
as Josh said, as long as your somewhere in the ballpark it's probably  
good enough.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-04 Thread Jim Nasby

On Apr 1, 2006, at 12:51 PM, Brendan Duddridge wrote:
from SELECT * FROM pg_stats WHERE tablename='table' AND  
attname='category_id'


I find correlation on category_product for category_id is 0.643703

Would setting the index on category_id to be clustered help with this?


It would absolutely help on the query in question. In my experience,  
a correlation of 0.64 is too low to allow an index scan to be used  
for anything but a tiny number of rows.

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-02 Thread Josh Berkus
Brendan,

 But just as a follow up question to your #1 suggestion, I have 8 GB
 of ram in my production server. You're saying to set the
 effective_cache_size then to 5 GB roughly? Somewhere around 655360?
 Currently it is set to 65535. Is that something that's OS dependent?
 I'm not sure how much memory my server sets aside for disk caching.

Yes, about.  It's really a judgement call; you're looking for the approximate 
combined RAM available for disk caching and shared mem.  However, this is 
just used as a way of estimating the probability that the data you want is 
cached in memory, so you're just trying to be order-of-magnitude accurate, 
not to-the-MB accurate.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-02 Thread Brendan Duddridge

Hi Josh,

Thanks. I've adjusted my effective_cache_size to 5 GB, so we'll see  
how that goes.


I'm also doing some query and de-normalization optimizations so we'll  
see how those go too.



Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 2, 2006, at 4:30 PM, Josh Berkus wrote:


Brendan,


But just as a follow up question to your #1 suggestion, I have 8 GB
of ram in my production server. You're saying to set the
effective_cache_size then to 5 GB roughly? Somewhere around 655360?
Currently it is set to 65535. Is that something that's OS dependent?
I'm not sure how much memory my server sets aside for disk caching.


Yes, about.  It's really a judgement call; you're looking for the  
approximate
combined RAM available for disk caching and shared mem.  However,  
this is
just used as a way of estimating the probability that the data you  
want is
cached in memory, so you're just trying to be order-of-magnitude  
accurate,

not to-the-MB accurate.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org





smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Jim C. Nasby
On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
 On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
  Hi Jim,
 
  I'm not quite sure what you mean by the correlation of category_id?
 
 It means how many distinct values does it have (at least that's my
 understanding of it ;) ).

Your understanding is wrong. :) What you're discussing is n_distinct.

http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html

correlation: Statistical correlation between physical row ordering and
logical ordering of the column values. This ranges from -1 to +1. When
the value is near -1 or +1, an index scan on the column will be
estimated to be cheaper than when it is near zero, due to reduction of
random access to the disk. (This column is NULL if the column data type
does not have a  operator.)

In other words, the following will have a correlation of 1:

1
2
3
...
998
999
1000

And this is -1...

1000
999
...
2
1

While this would have a very low correlation:

1
1000
2
999
...

The lower the correlation, the more expensive an index scan is, because
it's more random. As I mentioned, I believe that the current index scan
cost estimator is flawed though, because it will bias heavily against
correlations that aren't close to 1 or -1.

So, what does

SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id';

show?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Brendan Duddridge

Hi Jim,

from SELECT * FROM pg_stats WHERE tablename='table' AND  
attname='category_id'


I find correlation on category_product for category_id is 0.643703

Would setting the index on category_id to be clustered help with this?

Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote:


On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:

On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?


It means how many distinct values does it have (at least that's my
understanding of it ;) ).


Your understanding is wrong. :) What you're discussing is n_distinct.

http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html

correlation: Statistical correlation between physical row ordering  
and

logical ordering of the column values. This ranges from -1 to +1. When
the value is near -1 or +1, an index scan on the column will be
estimated to be cheaper than when it is near zero, due to reduction of
random access to the disk. (This column is NULL if the column data  
type

does not have a  operator.)

In other words, the following will have a correlation of 1:

1
2
3
...
998
999
1000

And this is -1...

1000
999
...
2
1

While this would have a very low correlation:

1
1000
2
999
...

The lower the correlation, the more expensive an index scan is,  
because
it's more random. As I mentioned, I believe that the current index  
scan

cost estimator is flawed though, because it will bias heavily against
correlations that aren't close to 1 or -1.

So, what does

SELECT * FROM pg_stats WHERE tablename='table' AND  
attname='category_id';


show?
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org





smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread chris smith
On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
  On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
   Hi Jim,
  
   I'm not quite sure what you mean by the correlation of category_id?
 
  It means how many distinct values does it have (at least that's my
  understanding of it ;) ).

 Your understanding is wrong. :) What you're discussing is n_distinct.

Geez, I'm going well this week ;)

Thanks for the detailed info.

--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread chris smith
On 4/2/06, chris smith [EMAIL PROTECTED] wrote:
 On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
  On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
   On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
Hi Jim,
   
I'm not quite sure what you mean by the correlation of category_id?
  
   It means how many distinct values does it have (at least that's my
   understanding of it ;) ).
 
  Your understanding is wrong. :) What you're discussing is n_distinct.

rant
It'd be nice if the database developers agreed on what terms meant.

http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html

The SHOW INDEX statement displays a cardinality value based on N/S,
where N is the number of rows in the table and S is the average value
group size. That ratio yields an approximate number of value groups in
the table.
/rant

A work colleague found that information a few weeks ago so that's
where my misunderstanding came from - if I'm reading that right they
use n_distinct as their cardinality basis.. then again I could be
reading that completely wrong too.

I believe postgres (because it's a lot more standards compliant).. but
sheesh - what a difference!

This week's task - stop reading mysql documentation.

--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Alvaro Herrera
chris smith wrote:

 I believe postgres (because it's a lot more standards compliant).. but
 sheesh - what a difference!
 
 This week's task - stop reading mysql documentation.

You don't _have_ to believe Postgres -- this is stuff taught in any
statistics course.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Mark Kirkwood

chris smith wrote:


rant
It'd be nice if the database developers agreed on what terms meant.

http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html

The SHOW INDEX statement displays a cardinality value based on N/S,
where N is the number of rows in the table and S is the average value
group size. That ratio yields an approximate number of value groups in
the table.
/rant

A work colleague found that information a few weeks ago so that's
where my misunderstanding came from - if I'm reading that right they
use n_distinct as their cardinality basis.. then again I could be
reading that completely wrong too.



Yeah that's right - e.g using the same table in postgres and mysql:

pgsql SELECT attname,n_distinct,correlation
   FROM pg_stats
   WHERE tablename='fact0'
   AND attname LIKE 'd%key';
 attname | n_distinct | correlation
-++-
 d0key   |  1 |  -0.0211169
 d1key   |100 |0.124012
 d2key   | 10 |0.998393
(3 rows)


mysql SHOW INDEX FROM fact0
- ;
+---++-+--+-+---+-+--++--++-+
| Table | Non_unique | Key_name| Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---++-+--+-+---+-+--++--++-+
| fact0 |  1 | fact0_d0key |1 | d0key   | A
|   1 | NULL | NULL   |  | BTREE  | |
| fact0 |  1 | fact0_d1key |1 | d1key   | A
| 100 | NULL | NULL   |  | BTREE  | |
| fact0 |  1 | fact0_d2key |1 | d2key   | A
|  10 | NULL | NULL   |  | BTREE  | |
+---++-+--+-+---+-+--++--++-+
3 rows in set (0.00 sec)


It is a bit confusing - '(distinct) cardinality' might be a better
heading for their 'cardinality' column!

On the correlation business - I don't think Mysql calculates it (or if
it does, its not displayed).



I believe postgres (because it's a lot more standards compliant).. but
sheesh - what a difference!



Well yes - however, to be fair to the Mysql guys, AFAICS the capture and 
display of index stats (and any other optimizer related data) is not 
part of any standard.



Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 It is a bit confusing - '(distinct) cardinality' might be a better
 heading for their 'cardinality' column!

The usual mathematical meaning of cardinality is the number of
members in a set.  That isn't real helpful for the point at hand,
because the mathematical definition of a set disallows duplicate
members, so if you're dealing with non-unique values you could argue it
either way about whether to count duplicates or not.  However, I read in
the SQL99 spec (3.1 Definitions)

 d) cardinality (of a value of a collection type): The number of
elements in that value. Those elements need not necessarily have
distinct values.

so ... as all too often ... the mysql boys have not got a clue about
standards compliance.  They are using this term in the opposite way
from how the SQL committee uses it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Jim C. Nasby
What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
 Hi,
 
 I have a query that is using a sequential scan instead of an index  
 scan. I've turned off sequential scans and it is in fact faster with  
 the index scan.
 
 Here's my before and after.
 
 Before:
 
 ssdev=# SET enable_seqscan TO DEFAULT;
 ssdev=# explain analyze select cp.product_id
   from category_product cp, product_attribute_value pav
   where cp.category_id = 1001082 and cp.product_id = 
   pav.product_id;
 
 
QUERY PLAN
  
  
 --
 Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual  
 time=4.521..2580.520 rows=19695 loops=1)
Hash Cond: (outer.product_id = inner.product_id)
-  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12  
 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1)
-  Hash  (cost=23.10..23.10 rows=970 width=4) (actual  
 time=2.267..2.267 rows=1140 loops=1)
  -  Index Scan using x_category_product__category_id_fk_idx  
 on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual  
 time=0.122..1.395 rows=1140 loops=1)
Index Cond: (category_id = 1001082)
 Total runtime: 2584.221 ms
 (7 rows)
 
 
 After:
 
 ssdev=# SET enable_seqscan TO false;
 ssdev=# explain analyze select cp.product_id
   from category_product cp, product_attribute_value pav
   where cp.category_id = 1001082 and cp.product_id = 
   pav.product_id;
 
 
   QUERY PLAN
  
  
 -
 Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual  
 time=0.373..71.177 rows=19695 loops=1)
-  Index Scan using x_category_product__category_id_fk_idx on  
 category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual  
 time=0.129..1.438 rows=1140 loops=1)
  Index Cond: (category_id = 1001082)
-  Index Scan using product_attribute_value__product_id_fk_idx  
 on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)  
 (actual time=0.016..0.053 rows=17 loops=1140)
  Index Cond: (outer.product_id = pav.product_id)
 Total runtime: 74.747 ms
 (6 rows)
 
 There's quite a big difference in speed there. 2584.221 ms vs. 74.747  
 ms.
 
 Any ideas what I can do to improve this without turning sequential  
 scanning off?
 
 Thanks,
 
 
 Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]
 
 ClickSpace Interactive Inc.
 Suite L100, 239 - 10th Ave. SE
 Calgary, AB  T2G 0V9
 
 http://www.clickspace.com
 



-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Brendan Duddridge

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?
The category_id is part of a compound primary key in the  
category_product

table. The primary key on category_product is (category_id, product_id).

Here's the definitions of the two tables involved in the join:

Table public.category_product
   Column| Type | Modifiers
-+--+---
category_id | integer  | not null
product_id  | integer  | not null
en_name_sort_order  | integer  |
fr_name_sort_order  | integer  |
merchant_sort_order | integer  |
price_sort_order| integer  |
merchant_count  | integer  |
is_active   | character varying(5) |
Indexes:
x_category_product_pk PRIMARY KEY, btree (category_id,  
product_id)

category_product__is_active_idx btree (is_active)
category_product__merchant_sort_order_idx btree  
(merchant_sort_order)
x_category_product__category_id_fk_idx btree (category_id)  
CLUSTER

x_category_product__product_id_fk_idx btree (product_id)
Foreign-key constraints:
x_category_product_category_fk FOREIGN KEY (category_id)  
REFERENCES category(category_id) DEFERRABLE INITIALLY DEFERRED
x_category_product_product_fk FOREIGN KEY (product_id)  
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED




 Table public.product_attribute_value
   Column   | Type  | Modifiers
+---+---
attribute_id   | integer   | not null
attribute_unit_id  | integer   |
attribute_value_id | integer   |
boolean_value  | character varying(5)  |
decimal_value  | numeric(30,10)|
product_attribute_value_id | integer   | not null
product_id | integer   | not null
product_reference_id   | integer   |
status_code| character varying(32) |
Indexes:
product_attribute_value_pk PRIMARY KEY, btree  
(product_attribute_value_id)

product_attribute_value__attribute_id_fk_idx btree (attribute_id)
product_attribute_value__attribute_unit_id_fk_idx btree  
(attribute_unit_id)
product_attribute_value__attribute_value_id_fk_idx btree  
(attribute_value_id)

product_attribute_value__product_id_fk_idx btree (product_id)
product_attribute_value__product_reference_id_fk_idx btree  
(product_reference_id)

Foreign-key constraints:
product_attribute_value_attribute_fk FOREIGN KEY  
(attribute_id) REFERENCES attribute(attribute_id) DEFERRABLE  
INITIALLY DEFERRED
product_attribute_value_attributeunit_fk FOREIGN KEY  
(attribute_unit_id) REFERENCES attribute_unit(attribute_unit_id)  
DEFERRABLE INITIALLY DEFERRED
product_attribute_value_attributevalue_fk FOREIGN KEY  
(attribute_value_id) REFERENCES attribute_value(attribute_value_id)  
DEFERRABLE INITIALLY DEFERRED
product_attribute_value_product_fk FOREIGN KEY (product_id)  
REFERENCES product(product_id) DEFERRABLE INITIALLY DEFERRED
product_attribute_value_productreference_fk FOREIGN KEY  
(product_reference_id) REFERENCES product(product_id) DEFERRABLE  
INITIALLY DEFERRED



Not sure if that helps answer your question, but the query is pretty  
slow. Sometimes it takes 5 - 15 seconds depending on the category_id  
specified.


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:


What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:

Hi,

I have a query that is using a sequential scan instead of an index
scan. I've turned off sequential scans and it is in fact faster with
the index scan.

Here's my before and after.

Before:

ssdev=# SET enable_seqscan TO DEFAULT;
ssdev=# explain analyze select cp.product_id
from category_product cp, product_attribute_value pav
where cp.category_id = 1001082 and cp.product_id =
pav.product_id;


   QUERY PLAN
- 
---
- 
---

--
Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
time=4.521..2580.520 rows=19695 loops=1)
   Hash Cond: (outer.product_id = inner.product_id)
   -  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12
rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846  

Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread chris smith
On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
 Hi Jim,

 I'm not quite sure what you mean by the correlation of category_id?

It means how many distinct values does it have (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by category_id;

will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small amount of
categories) can slow things down because the db can't use the index
effectively.. which might be what you're seeing (hence why it's fast
for some categories, slow for others).


 On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:

  What's the correlation of category_id? The current index scan cost
  estimator places a heavy penalty on anything with a correlation much
  below about 90%.
 
  On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:
  Hi,
 
  I have a query that is using a sequential scan instead of an index
  scan. I've turned off sequential scans and it is in fact faster with
  the index scan.
 
  Here's my before and after.
 
  Before:
 
  ssdev=# SET enable_seqscan TO DEFAULT;
  ssdev=# explain analyze select cp.product_id
   from category_product cp, product_attribute_value pav
   where cp.category_id = 1001082 and cp.product_id =
   pav.product_id;
 
 
 QUERY PLAN
  -
  ---
  -
  ---
  --
  Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
  time=4.521..2580.520 rows=19695 loops=1)
 Hash Cond: (outer.product_id = inner.product_id)
 -  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12
  rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
  loops=1)
 -  Hash  (cost=23.10..23.10 rows=970 width=4) (actual
  time=2.267..2.267 rows=1140 loops=1)
   -  Index Scan using x_category_product__category_id_fk_idx
  on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
  time=0.122..1.395 rows=1140 loops=1)
 Index Cond: (category_id = 1001082)
  Total runtime: 2584.221 ms
  (7 rows)
 
 
  After:
 
  ssdev=# SET enable_seqscan TO false;
  ssdev=# explain analyze select cp.product_id
   from category_product cp, product_attribute_value pav
   where cp.category_id = 1001082 and cp.product_id =
   pav.product_id;
 
 
QUERY PLAN
  -
  ---
  -
  ---
  -
  Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual
  time=0.373..71.177 rows=19695 loops=1)
 -  Index Scan using x_category_product__category_id_fk_idx on
  category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
  time=0.129..1.438 rows=1140 loops=1)
   Index Cond: (category_id = 1001082)
 -  Index Scan using product_attribute_value__product_id_fk_idx
  on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)
  (actual time=0.016..0.053 rows=17 loops=1140)
   Index Cond: (outer.product_id = pav.product_id)
  Total runtime: 74.747 ms
  (6 rows)
 
  There's quite a big difference in speed there. 2584.221 ms vs. 74.747
  ms.
 
  Any ideas what I can do to improve this without turning sequential
  scanning off?
 
  Thanks,
 
  
  Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]
 
  ClickSpace Interactive Inc.
  Suite L100, 239 - 10th Ave. SE
  Calgary, AB  T2G 0V9
 
  http://www.clickspace.com
 
 
 
 
  --
  Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
  Pervasive Software  http://pervasive.comwork: 512-231-6117
  vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
 
  ---(end of
  broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that
  your
 message can get through to the mailing list cleanly
 






--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-31 Thread Brendan Duddridge

Ah I see. Ok, well we have a very wide variety here...

category_id | count
-+---
 1000521 | 31145
 1001211 | 22991
 1001490 | 22019
 1001628 | 12472
 146 | 10480
 187 | 10338
 1001223 | 10020
 1001560 |  9532
 1000954 |  8633
 1001314 |  8191
 1001482 |  8140
 1001556 |  7959
 1001481 |  7850
[snip...]
 1001133 | 1
 1000532 | 1
 1000691 | 1
 1000817 | 1
 1000783 | 1
 1000689 | 1

(1157 rows)

So what's the best kind of query to handle this kind of data to make  
it fast in all cases? I'd like get down to sub-second response times.


currently we have:

select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Mar 31, 2006, at 6:23 PM, chris smith wrote:


On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?


It means how many distinct values does it have (at least that's my
understanding of it ;) ).

select category_id, count(*) from category_product group by  
category_id;


will show you how many category_id's there are and how many products
are in each category.

Having a lot of products in one category (or having a small amount of
categories) can slow things down because the db can't use the index
effectively.. which might be what you're seeing (hence why it's fast
for some categories, slow for others).



On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote:


What's the correlation of category_id? The current index scan cost
estimator places a heavy penalty on anything with a correlation much
below about 90%.

On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote:

Hi,

I have a query that is using a sequential scan instead of an index
scan. I've turned off sequential scans and it is in fact faster  
with

the index scan.

Here's my before and after.

Before:

ssdev=# SET enable_seqscan TO DEFAULT;
ssdev=# explain analyze select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


   QUERY PLAN
--- 
--

---
--- 
--

---
--
Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual
time=4.521..2580.520 rows=19695 loops=1)
   Hash Cond: (outer.product_id = inner.product_id)
   -  Seq Scan on product_attribute_value pav   
(cost=0.00..40127.12

rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846
loops=1)
   -  Hash  (cost=23.10..23.10 rows=970 width=4) (actual
time=2.267..2.267 rows=1140 loops=1)
 -  Index Scan using  
x_category_product__category_id_fk_idx

on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.122..1.395 rows=1140 loops=1)
   Index Cond: (category_id = 1001082)
Total runtime: 2584.221 ms
(7 rows)


After:

ssdev=# SET enable_seqscan TO false;
ssdev=# explain analyze select cp.product_id
 from category_product cp, product_attribute_value pav
 where cp.category_id = 1001082 and cp.product_id =
 pav.product_id;


  QUERY PLAN
--- 
--

---
--- 
--

---
-
Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual
time=0.373..71.177 rows=19695 loops=1)
   -  Index Scan using x_category_product__category_id_fk_idx on
category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual
time=0.129..1.438 rows=1140 loops=1)
 Index Cond: (category_id = 1001082)
   -  Index Scan using product_attribute_value__product_id_fk_idx
on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4)
(actual time=0.016..0.053 rows=17 loops=1140)
 Index Cond: (outer.product_id = pav.product_id)
Total runtime: 74.747 ms
(6 rows)

There's quite a big difference in speed there. 2584.221 ms vs.  
74.747

ms.

Any ideas what I can do to improve this without turning sequential
scanning off?

Thanks,

___ 
_
Brendan Duddridge | CTO | 403-277-5591 x24 |   
[EMAIL PROTECTED]


ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com





--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 

[PERFORM] Query using SeqScan instead of IndexScan

2006-03-29 Thread Brendan Duddridge
Hi,I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan.Here's my before and after.Before:ssdev=# SET enable_seqscan TO DEFAULT;ssdev=# explain analyze select cp.product_id		from category_product cp, product_attribute_value pav		where cp.category_id = 1001082 and cp.product_id = pav.product_id;                                                                                  QUERY PLAN                                                                                  -- Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1)   Hash Cond: ("outer".product_id = "inner".product_id)   -  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1)   -  Hash  (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1)         -  Index Scan using x_category_product__category_id_fk_idx on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1)               Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms(7 rows)After:ssdev=# SET enable_seqscan TO false;ssdev=# explain analyze select cp.product_id		from category_product cp, product_attribute_value pav		where cp.category_id = 1001082 and cp.product_id = pav.product_id;                                                                                     QUERY PLAN                                                                                      - Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1)   -  Index Scan using x_category_product__category_id_fk_idx on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1)         Index Cond: (category_id = 1001082)   -  Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140)         Index Cond: ("outer".product_id = pav.product_id) Total runtime: 74.747 ms(6 rows)There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms.Any ideas what I can do to improve this without turning sequential scanning off?Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]  ClickSpace Interactive Inc.  Suite L100, 239 - 10th Ave. SE  Calgary, AB  T2G 0V9 http://www.clickspace.com   

smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-29 Thread Brendan Duddridge
Oops. I forgot to mention that I was using PostgreSQL 8.1.3 on Mac OS X.Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]  ClickSpace Interactive Inc.  Suite L100, 239 - 10th Ave. SE  Calgary, AB  T2G 0V9 http://www.clickspace.com   On Mar 29, 2006, at 8:12 PM, Brendan Duddridge wrote:Hi,I have a query that is using a sequential scan instead of an index scan. I've turned off sequential scans and it is in fact faster with the index scan.Here's my before and after.Before:ssdev=# SET enable_seqscan TO DEFAULT;ssdev=# explain analyze select cp.product_id		from category_product cp, product_attribute_value pav		where cp.category_id = 1001082 and cp.product_id = pav.product_id;                                                                                  QUERY PLAN                                                                                  -- Hash Join  (cost=25.52..52140.59 rows=5139 width=4) (actual time=4.521..2580.520 rows=19695 loops=1)   Hash Cond: ("outer".product_id = "inner".product_id)   -  Seq Scan on product_attribute_value pav  (cost=0.00..40127.12 rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 loops=1)   -  Hash  (cost=23.10..23.10 rows=970 width=4) (actual time=2.267..2.267 rows=1140 loops=1)         -  Index Scan using x_category_product__category_id_fk_idx on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual time=0.122..1.395 rows=1140 loops=1)               Index Cond: (category_id = 1001082) Total runtime: 2584.221 ms(7 rows)After:ssdev=# SET enable_seqscan TO false;ssdev=# explain analyze select cp.product_id		from category_product cp, product_attribute_value pav		where cp.category_id = 1001082 and cp.product_id = pav.product_id;                                                                                     QUERY PLAN                                                                                      - Nested Loop  (cost=0.00..157425.22 rows=5139 width=4) (actual time=0.373..71.177 rows=19695 loops=1)   -  Index Scan using x_category_product__category_id_fk_idx on category_product cp  (cost=0.00..23.10 rows=970 width=4) (actual time=0.129..1.438 rows=1140 loops=1)         Index Cond: (category_id = 1001082)   -  Index Scan using product_attribute_value__product_id_fk_idx on product_attribute_value pav  (cost=0.00..161.51 rows=61 width=4) (actual time=0.016..0.053 rows=17 loops=1140)         Index Cond: ("outer".product_id = pav.product_id) Total runtime: 74.747 ms(6 rows)There's quite a big difference in speed there. 2584.221 ms vs. 74.747 ms.Any ideas what I can do to improve this without turning sequential scanning off?Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]  ClickSpace Interactive Inc.  Suite L100, 239 - 10th Ave. SE  Calgary, AB  T2G 0V9 http://www.clickspace.com   

smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-03-29 Thread Josh Berkus
Brenden,

 Any ideas what I can do to improve this without turning sequential  
 scanning off?

Hmmm, looks like your row estimates are good.  Which means it's probably your 
postgresql.conf parameters which are off.  Try the following, in the order 
below:

1) Raise effective_cache_size to 2/3 of your RAM (remember that ecs is in 8k 
pages).  Test again.

2) Multiply all of the cpu_* costs by 0.3.  Test again.

3) Lower random_page_cost by steps to 3.5, then 3.0, then 2.5, then 2.0, 
testing each time.

These are all runtime-settable parameters, so you can test them in one query 
window, then set them in the main postgresql.conf if they work.

-- 
Josh Berkus
Sun Microsystems
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match