Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb

2006-04-01 Thread Antoine
On 01/04/06, Magnus Hagander [EMAIL PROTECTED] wrote:
  This is a blatant thread steal... but here we go...
  Do people have any opinions on the pgsql driver?

 It's very nice.
...

Thanks for the tips - i'll try a couple of test apps soon.
Cheers
Antoine




--
This is where I should put some witty comment.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Jim C. Nasby
On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
 On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
  Hi Jim,
 
  I'm not quite sure what you mean by the correlation of category_id?
 
 It means how many distinct values does it have (at least that's my
 understanding of it ;) ).

Your understanding is wrong. :) What you're discussing is n_distinct.

http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html

correlation: Statistical correlation between physical row ordering and
logical ordering of the column values. This ranges from -1 to +1. When
the value is near -1 or +1, an index scan on the column will be
estimated to be cheaper than when it is near zero, due to reduction of
random access to the disk. (This column is NULL if the column data type
does not have a  operator.)

In other words, the following will have a correlation of 1:

1
2
3
...
998
999
1000

And this is -1...

1000
999
...
2
1

While this would have a very low correlation:

1
1000
2
999
...

The lower the correlation, the more expensive an index scan is, because
it's more random. As I mentioned, I believe that the current index scan
cost estimator is flawed though, because it will bias heavily against
correlations that aren't close to 1 or -1.

So, what does

SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id';

show?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Brendan Duddridge

Hi Jim,

from SELECT * FROM pg_stats WHERE tablename='table' AND  
attname='category_id'


I find correlation on category_product for category_id is 0.643703

Would setting the index on category_id to be clustered help with this?

Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote:


On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:

On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:

Hi Jim,

I'm not quite sure what you mean by the correlation of category_id?


It means how many distinct values does it have (at least that's my
understanding of it ;) ).


Your understanding is wrong. :) What you're discussing is n_distinct.

http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html

correlation: Statistical correlation between physical row ordering  
and

logical ordering of the column values. This ranges from -1 to +1. When
the value is near -1 or +1, an index scan on the column will be
estimated to be cheaper than when it is near zero, due to reduction of
random access to the disk. (This column is NULL if the column data  
type

does not have a  operator.)

In other words, the following will have a correlation of 1:

1
2
3
...
998
999
1000

And this is -1...

1000
999
...
2
1

While this would have a very low correlation:

1
1000
2
999
...

The lower the correlation, the more expensive an index scan is,  
because
it's more random. As I mentioned, I believe that the current index  
scan

cost estimator is flawed though, because it will bias heavily against
correlations that aren't close to 1 or -1.

So, what does

SELECT * FROM pg_stats WHERE tablename='table' AND  
attname='category_id';


show?
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of  
broadcast)---

TIP 4: Have you searched our list archives?

   http://archives.postgresql.org





smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb

2006-04-01 Thread Merlin Moncure
On 3/31/06, Magnus Hagander [EMAIL PROTECTED] wrote:
  This is a blatant thread steal... but here we go...
  Do people have any opinions on the pgsql driver?

 I beleive so. I've been using it for a long time with zero problems.
 While I don't use many of the exotic features in it, I doubt most people
 do ;-) Don't get scared by the claim it's in beta - IIRC there's an RC
 out any day now, and it's been stable long before 1.0. But it's always a
 good idea to browse through the list of known bugs and see if one will
 likely hit you...

Up until a few months ago the npgsql driver was missing a few features
that made it easier to work with typed datasets in the IDE...I would
use the odbc driver to create the dataset at design time and work with
it at run time with the npgsql driver.

Lately though, it seems there is no reason not use the npgsql driver.

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread chris smith
On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
 On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
  On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
   Hi Jim,
  
   I'm not quite sure what you mean by the correlation of category_id?
 
  It means how many distinct values does it have (at least that's my
  understanding of it ;) ).

 Your understanding is wrong. :) What you're discussing is n_distinct.

Geez, I'm going well this week ;)

Thanks for the detailed info.

--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread chris smith
On 4/2/06, chris smith [EMAIL PROTECTED] wrote:
 On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote:
  On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote:
   On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote:
Hi Jim,
   
I'm not quite sure what you mean by the correlation of category_id?
  
   It means how many distinct values does it have (at least that's my
   understanding of it ;) ).
 
  Your understanding is wrong. :) What you're discussing is n_distinct.

rant
It'd be nice if the database developers agreed on what terms meant.

http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html

The SHOW INDEX statement displays a cardinality value based on N/S,
where N is the number of rows in the table and S is the average value
group size. That ratio yields an approximate number of value groups in
the table.
/rant

A work colleague found that information a few weeks ago so that's
where my misunderstanding came from - if I'm reading that right they
use n_distinct as their cardinality basis.. then again I could be
reading that completely wrong too.

I believe postgres (because it's a lot more standards compliant).. but
sheesh - what a difference!

This week's task - stop reading mysql documentation.

--
Postgresql  php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Alvaro Herrera
chris smith wrote:

 I believe postgres (because it's a lot more standards compliant).. but
 sheesh - what a difference!
 
 This week's task - stop reading mysql documentation.

You don't _have_ to believe Postgres -- this is stuff taught in any
statistics course.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Mark Kirkwood

chris smith wrote:


rant
It'd be nice if the database developers agreed on what terms meant.

http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html

The SHOW INDEX statement displays a cardinality value based on N/S,
where N is the number of rows in the table and S is the average value
group size. That ratio yields an approximate number of value groups in
the table.
/rant

A work colleague found that information a few weeks ago so that's
where my misunderstanding came from - if I'm reading that right they
use n_distinct as their cardinality basis.. then again I could be
reading that completely wrong too.



Yeah that's right - e.g using the same table in postgres and mysql:

pgsql SELECT attname,n_distinct,correlation
   FROM pg_stats
   WHERE tablename='fact0'
   AND attname LIKE 'd%key';
 attname | n_distinct | correlation
-++-
 d0key   |  1 |  -0.0211169
 d1key   |100 |0.124012
 d2key   | 10 |0.998393
(3 rows)


mysql SHOW INDEX FROM fact0
- ;
+---++-+--+-+---+-+--++--++-+
| Table | Non_unique | Key_name| Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---++-+--+-+---+-+--++--++-+
| fact0 |  1 | fact0_d0key |1 | d0key   | A
|   1 | NULL | NULL   |  | BTREE  | |
| fact0 |  1 | fact0_d1key |1 | d1key   | A
| 100 | NULL | NULL   |  | BTREE  | |
| fact0 |  1 | fact0_d2key |1 | d2key   | A
|  10 | NULL | NULL   |  | BTREE  | |
+---++-+--+-+---+-+--++--++-+
3 rows in set (0.00 sec)


It is a bit confusing - '(distinct) cardinality' might be a better
heading for their 'cardinality' column!

On the correlation business - I don't think Mysql calculates it (or if
it does, its not displayed).



I believe postgres (because it's a lot more standards compliant).. but
sheesh - what a difference!



Well yes - however, to be fair to the Mysql guys, AFAICS the capture and 
display of index stats (and any other optimizer related data) is not 
part of any standard.



Cheers

Mark

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query using SeqScan instead of IndexScan

2006-04-01 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 It is a bit confusing - '(distinct) cardinality' might be a better
 heading for their 'cardinality' column!

The usual mathematical meaning of cardinality is the number of
members in a set.  That isn't real helpful for the point at hand,
because the mathematical definition of a set disallows duplicate
members, so if you're dealing with non-unique values you could argue it
either way about whether to count duplicates or not.  However, I read in
the SQL99 spec (3.1 Definitions)

 d) cardinality (of a value of a collection type): The number of
elements in that value. Those elements need not necessarily have
distinct values.

so ... as all too often ... the mysql boys have not got a clue about
standards compliance.  They are using this term in the opposite way
from how the SQL committee uses it.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly