Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread Hans-Jürgen Schönig
hello ... a view is already nice but i think it is still too narrow. the problem is: you don't want a view for every potential join. in addition to that - ideally there is not much left of a view when it comes to checking for costs. so, i think, this is not the kind of approach leading to total

Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread Joshua Tolley
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote: hello ... a view is already nice but i think it is still too narrow. the problem is: you don't want a view for every potential join. in addition to that - ideally there is not much left of a view when it comes to

Re: [HACKERS] cross column correlation revisted

2010-07-15 Thread David Fetter
On Thu, Jul 15, 2010 at 12:04:21PM +0200, Hans-Jürgen Schönig wrote: hello ... a view is already nice but i think it is still too narrow. One sure way to fail is to take on a problem in chunks too large. If we get even one of the cross-column issues solved by statistics, we'll be ahead of

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Heikki Linnakangas
On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: hello everybody, we are currently facing some serious issues with cross correlation issue. consider: 10% of all people have breast cancer. we have 2 genders (50:50). if i select all the men with breast cancer, i will get basically

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
On Jul 14, 2010, at 12:40 PM, Heikki Linnakangas wrote: On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: hello everybody, we are currently facing some serious issues with cross correlation issue. consider: 10% of all people have breast cancer. we have 2 genders (50:50). if i

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Yeb Havinga
Heikki Linnakangas wrote: However, the problem is how to represent and store the cross-correlation. For fields with low cardinality, like gender and boolean breast-cancer-or-not you can count the prevalence of all the different combinations, but that doesn't scale. Another often cited example

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 14/07/10 13:12, PostgreSQL - Hans-Jürgen Schönig wrote: maybe somehow like this ... ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4) +1 is my general feeling, it's good if you can tell the system to collect

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 01:21:19PM +0200, Yeb Havinga wrote: Heikki Linnakangas wrote: However, the problem is how to represent and store the cross-correlation. For fields with low cardinality, like gender and boolean breast-cancer-or-not you can count the prevalence of all the

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello tom, i think that having stats on an index is a problem by itself for 2 reasons - for cross column correlation at least: a.) joins cannot be covered by an index on two tables - we would fix inside a table correlation problems but not joins. b.) who says that there is

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Tom Lane
=?iso-8859-1?Q?PostgreSQL_-_Hans-J=FCrgen_Sch=F6nig?= postg...@cybertec.at writes: i think that having stats on an index is a problem by itself for 2 reasons - for cross column correlation at least: a.) joins cannot be covered by an index on two tables - we would fix inside a table

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Andrew Dunstan
Tom Lane wrote: If the combination of columns is actually interesting, there might well be an index in place, or the DBA might be willing to create it. I'm having a hard time imagining an interesting case where that wouldn't be so. For that matter, have you considered the idea of

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread PostgreSQL - Hans-Jürgen Schönig
hello ... look at the syntax i posted in more detail: ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2) it says X and Y ... the selectivity of joins are what i am most interested in. cross correlation of columns within the same table are just a byproduct.

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Joshua Tolley
On Wed, Jul 14, 2010 at 04:41:01PM +0200, PostgreSQL - Hans-Jürgen Schönig wrote: hello ... look at the syntax i posted in more detail: ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2) it says X and Y ... the selectivity of joins are what i am most

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Robert Haas
2010/7/14 Tom Lane t...@sss.pgh.pa.us: If the combination of columns is actually interesting, there might well be an index in place, or the DBA might be willing to create it. Indexes aren't free, though, nor even close to it. Still, I think we should figure out the underlying mechanism first

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread marcin mank
On Wed, Jul 14, 2010 at 5:13 PM, Robert Haas robertmh...@gmail.com wrote: 2010/7/14 Tom Lane t...@sss.pgh.pa.us: If the combination of columns is actually interesting, there might well be an index in place, or the DBA might be willing to create it. Indexes aren't free, though, nor even close

Re: [HACKERS] cross column correlation revisted

2010-07-14 Thread Dimitri Fontaine
Joshua Tolley eggyk...@gmail.com writes: ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id =3D y.id AND x.id= 2 =3D y.id2) =20 it says X and Y ... the selectivity of joins are what i am most interested in. cross correlation of columns within the same table are just a byproduct. the core