Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb
On 01/04/06, Magnus Hagander [EMAIL PROTECTED] wrote: This is a blatant thread steal... but here we go... Do people have any opinions on the pgsql driver? It's very nice. ... Thanks for the tips - i'll try a couple of test apps soon. Cheers Antoine -- This is where I should put some witty comment. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query using SeqScan instead of IndexScan
On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html correlation: Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a operator.) In other words, the following will have a correlation of 1: 1 2 3 ... 998 999 1000 And this is -1... 1000 999 ... 2 1 While this would have a very low correlation: 1 1000 2 999 ... The lower the correlation, the more expensive an index scan is, because it's more random. As I mentioned, I believe that the current index scan cost estimator is flawed though, because it will bias heavily against correlations that aren't close to 1 or -1. So, what does SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id'; show? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
Hi Jim, from SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id' I find correlation on category_product for category_id is 0.643703 Would setting the index on category_id to be clustered help with this? Thanks, Brendan Duddridge | CTO | 403-277-5591 x24 | [EMAIL PROTECTED] ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Apr 1, 2006, at 8:32 AM, Jim C. Nasby wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. http://www.postgresql.org/docs/8.1/interactive/view-pg-stats.html correlation: Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a operator.) In other words, the following will have a correlation of 1: 1 2 3 ... 998 999 1000 And this is -1... 1000 999 ... 2 1 While this would have a very low correlation: 1 1000 2 999 ... The lower the correlation, the more expensive an index scan is, because it's more random. As I mentioned, I believe that the current index scan cost estimator is flawed though, because it will bias heavily against correlations that aren't close to 1 or -1. So, what does SELECT * FROM pg_stats WHERE tablename='table' AND attname='category_id'; show? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] [Solved] Slow performance on Windows .NET and OleDb
On 3/31/06, Magnus Hagander [EMAIL PROTECTED] wrote: This is a blatant thread steal... but here we go... Do people have any opinions on the pgsql driver? I beleive so. I've been using it for a long time with zero problems. While I don't use many of the exotic features in it, I doubt most people do ;-) Don't get scared by the claim it's in beta - IIRC there's an RC out any day now, and it's been stable long before 1.0. But it's always a good idea to browse through the list of known bugs and see if one will likely hit you... Up until a few months ago the npgsql driver was missing a few features that made it easier to work with typed datasets in the IDE...I would use the odbc driver to create the dataset at design time and work with it at run time with the npgsql driver. Lately though, it seems there is no reason not use the npgsql driver. Merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. Geez, I'm going well this week ;) Thanks for the detailed info. -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query using SeqScan instead of IndexScan
On 4/2/06, chris smith [EMAIL PROTECTED] wrote: On 4/2/06, Jim C. Nasby [EMAIL PROTECTED] wrote: On Sat, Apr 01, 2006 at 11:23:37AM +1000, chris smith wrote: On 4/1/06, Brendan Duddridge [EMAIL PROTECTED] wrote: Hi Jim, I'm not quite sure what you mean by the correlation of category_id? It means how many distinct values does it have (at least that's my understanding of it ;) ). Your understanding is wrong. :) What you're discussing is n_distinct. rant It'd be nice if the database developers agreed on what terms meant. http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table. /rant A work colleague found that information a few weeks ago so that's where my misunderstanding came from - if I'm reading that right they use n_distinct as their cardinality basis.. then again I could be reading that completely wrong too. I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! This week's task - stop reading mysql documentation. -- Postgresql php tutorials http://www.designmagick.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
chris smith wrote: I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! This week's task - stop reading mysql documentation. You don't _have_ to believe Postgres -- this is stuff taught in any statistics course. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query using SeqScan instead of IndexScan
chris smith wrote: rant It'd be nice if the database developers agreed on what terms meant. http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html The SHOW INDEX statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table. /rant A work colleague found that information a few weeks ago so that's where my misunderstanding came from - if I'm reading that right they use n_distinct as their cardinality basis.. then again I could be reading that completely wrong too. Yeah that's right - e.g using the same table in postgres and mysql: pgsql SELECT attname,n_distinct,correlation FROM pg_stats WHERE tablename='fact0' AND attname LIKE 'd%key'; attname | n_distinct | correlation -++- d0key | 1 | -0.0211169 d1key |100 |0.124012 d2key | 10 |0.998393 (3 rows) mysql SHOW INDEX FROM fact0 - ; +---++-+--+-+---+-+--++--++-+ | Table | Non_unique | Key_name| Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +---++-+--+-+---+-+--++--++-+ | fact0 | 1 | fact0_d0key |1 | d0key | A | 1 | NULL | NULL | | BTREE | | | fact0 | 1 | fact0_d1key |1 | d1key | A | 100 | NULL | NULL | | BTREE | | | fact0 | 1 | fact0_d2key |1 | d2key | A | 10 | NULL | NULL | | BTREE | | +---++-+--+-+---+-+--++--++-+ 3 rows in set (0.00 sec) It is a bit confusing - '(distinct) cardinality' might be a better heading for their 'cardinality' column! On the correlation business - I don't think Mysql calculates it (or if it does, its not displayed). I believe postgres (because it's a lot more standards compliant).. but sheesh - what a difference! Well yes - however, to be fair to the Mysql guys, AFAICS the capture and display of index stats (and any other optimizer related data) is not part of any standard. Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query using SeqScan instead of IndexScan
Mark Kirkwood [EMAIL PROTECTED] writes: It is a bit confusing - '(distinct) cardinality' might be a better heading for their 'cardinality' column! The usual mathematical meaning of cardinality is the number of members in a set. That isn't real helpful for the point at hand, because the mathematical definition of a set disallows duplicate members, so if you're dealing with non-unique values you could argue it either way about whether to count duplicates or not. However, I read in the SQL99 spec (3.1 Definitions) d) cardinality (of a value of a collection type): The number of elements in that value. Those elements need not necessarily have distinct values. so ... as all too often ... the mysql boys have not got a clue about standards compliance. They are using this term in the opposite way from how the SQL committee uses it. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly