Re: [HACKERS] WIP: cross column correlation, 2nd shot

2012-03-14 Thread Robert Haas
On Tue, Mar 13, 2012 at 9:56 AM, Hans-Jürgen Schönig wrote: > Here's the cross-col patch against todays master branch. Please add your patch here, so it doesn't get forgotten: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Rod Taylor
> > But it's not the same as tracking *sections of a table*. > > I dunno. I imagine if you have a "section" of a table in different > storage than other sections, you created a tablespace and moved the > partition holding that section there. Otherwise, how do you prevent the > tuples from moving

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Alvaro Herrera
Excerpts from Robert Haas's message of sáb feb 26 02:24:26 -0300 2011: > On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera > wrote: > > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > > >> How practical would it be for analyze to keep a record of response times > >> for >

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Robert Haas
On Sun, Feb 27, 2011 at 5:17 PM, Josh Berkus wrote: > >> I think there would be value in giving the DBA an easier way to see >> which tables are hot, but I am really leery about the idea of trying >> to feed that directly into the query planner.  I think this is one of >> those cases where we let

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Josh Berkus
> I think there would be value in giving the DBA an easier way to see > which tables are hot, but I am really leery about the idea of trying > to feed that directly into the query planner. I think this is one of > those cases where we let people tune it manually for starters, and > then wait for

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Robert Haas
On Sun, Feb 27, 2011 at 3:03 AM, Bruce Momjian wrote: >> You make it sound as if we know how but are just too lazy to right the >> code.  That is not one of the weaknesses that this community has. > > Well, several automatic idea have been floated, but rejected because > they don't work well for q

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Robert Haas
On Sun, Feb 27, 2011 at 3:01 AM, Bruce Momjian wrote: > Grzegorz Jaskiewicz wrote: >> >> On 25 Feb 2011, at 13:18, Robert Haas wrote: >> >> >  People coming from Oracle are not favorably >> > impressed either by the amount of monitoring data PostgreSQL can >> > gather or by the number of knobs tha

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Robert Haas wrote: > On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian wrote: > > Robert Haas wrote: > >> > Actually, we *do* have some idea which tables are hot. ?Or at least, we > >> > could. ? Currently, pg_stats for tables are "timeless"; they just > >> > accumulate from the last reset, which has

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Grzegorz Jaskiewicz wrote: > > On 25 Feb 2011, at 13:18, Robert Haas wrote: > > > People coming from Oracle are not favorably > > impressed either by the amount of monitoring data PostgreSQL can > > gather or by the number of knobs that are available to fix problems > > when they occur. We don'

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Rod Taylor wrote: > On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera > wrote: > > > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > > > > How practical would it be for analyze to keep a record of response times > > for > > > given sections of a table as it randomly accesses

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
Greg Stark wrote: > Consider the oft-quoted example of a -- or > for Americans. I'm not sure everyone realizes just how complicated this particular issue is. If we can do a good job with U.S. city, state, zip code we will have something which will handle a lot of cases. Consider: (1) M

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
> Grzegorz Jaskiewicz wrote: > I guess that the systems could behave much better, but no one is > going to tweak settings for 50 different installations over 50 > different type of data and 50 different sets of hardware. > If there was even a tiny amount of automation provided in the > postgresq

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz
On 26 Feb 2011, at 14:45, Robert Haas wrote: > On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz >> > > I don't think *anyone* is avoiding that approach. There is almost > universal consensus here that auto-tuning is better than manual > tuning, even to the extent of being unwilling to add

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Martijn van Oosterhout
On Sat, Feb 26, 2011 at 06:44:52PM +, Greg Stark wrote: > 2011/2/26 PostgreSQL - Hans-Jürgen Schönig : > > what we are trying to do is to explicitly store column correlations. so, a > > histogram for (a, b) correlation and so on. > > The problem is that we haven't figured out how to usefully

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Greg Stark
2011/2/26 PostgreSQL - Hans-Jürgen Schönig : > what we are trying to do is to explicitly store column correlations. so, a > histogram for (a, b) correlation and so on. > The problem is that we haven't figured out how to usefully store a histogram for . Consider the oft-quoted example of a -- or

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread PostgreSQL - Hans-Jürgen Schönig
>>> >> >> Still, having more data a user can probe would be nice. >> >> I wonder why everyone avoids Microsoft's approach to the subject. >> Apparently, they go in the 'auto-tune as much as possible' direction. >> And tests we did a while ago, involving asking team from Microsoft and a >> team

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian wrote: > Robert Haas wrote: >> > Actually, we *do* have some idea which tables are hot. ?Or at least, we >> > could. ? Currently, pg_stats for tables are "timeless"; they just >> > accumulate from the last reset, which has always been a problem in >>

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz wrote: > > On 25 Feb 2011, at 13:18, Robert Haas wrote: > >>  People coming from Oracle are not favorably >> impressed either by the amount of monitoring data PostgreSQL can >> gather or by the number of knobs that are available to fix problems

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Rod Taylor
On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera wrote: > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > > > How practical would it be for analyze to keep a record of response times > for > > given sections of a table as it randomly accesses them and generate some > > kind of

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz
On 25 Feb 2011, at 13:18, Robert Haas wrote: > People coming from Oracle are not favorably > impressed either by the amount of monitoring data PostgreSQL can > gather or by the number of knobs that are available to fix problems > when they occur. We don't need to have as many knobs as Oracle an

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Bruce Momjian
Robert Haas wrote: > > Actually, we *do* have some idea which tables are hot. ?Or at least, we > > could. ? Currently, pg_stats for tables are "timeless"; they just > > accumulate from the last reset, which has always been a problem in > > general for monitoring. ?If we could make top-level table a

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus wrote: >> One idea Tom and I kicked around previously is to set an assumed >> caching percentage for each table based on its size relative to >> effective_cache_size - in other words, assume that the smaller a table >> is, the more of it will be cached.

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera wrote: > Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > >> How practical would it be for analyze to keep a record of response times for >> given sections of a table as it randomly accesses them and generate some >> kind of a

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Josh Berkus
> 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? FWIW, for a m

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas : > 2011/2/25 Cédric Villemain : >>> All that having been said, I think that while Josh is thinking fuzzily >>> about the mathematics of his proposal, the basic idea is pretty >>> sensible.  It is not easy - likely not possible - for the system to >>> have a good idea which th

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Alvaro Herrera
Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: > How practical would it be for analyze to keep a record of response times for > given sections of a table as it randomly accesses them and generate some > kind of a map for expected response times for the pieces of data it is >

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
2011/2/25 Cédric Villemain : >> All that having been said, I think that while Josh is thinking fuzzily >> about the mathematics of his proposal, the basic idea is pretty >> sensible.  It is not easy - likely not possible - for the system to >> have a good idea which things will be in some kind of c

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Rod Taylor
> 4. Even if we could accurately estimate the percentage of the table > that is cached, what then? For example, suppose that a user issues a > query which retrieves 1% of a table, and we know that 1% of that table > is cached. How much of the data that the user asked for is cache? > Hard to say,

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas : > On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote: >> Josh Berkus wrote: >>> On 2/23/11 7:10 AM, Robert Haas wrote: >>> > IME, most bad query plans are caused by either incorrect >>> > estimates of selectivity, or wrongheaded notions about what's likely >>> > to be cac

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian wrote: > Josh Berkus wrote: >> On 2/23/11 7:10 AM, Robert Haas wrote: >> > IME, most bad query plans are caused by either incorrect >> > estimates of selectivity, or wrongheaded notions about what's likely >> > to be cached.  If we could find a way, a

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Bruce Momjian
Josh Berkus wrote: > On 2/23/11 7:10 AM, Robert Haas wrote: > > IME, most bad query plans are caused by either incorrect > > estimates of selectivity, or wrongheaded notions about what's likely > > to be cached. If we could find a way, automated or manual, of > > providing the planner some better

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Josh Berkus
On 2/23/11 7:10 AM, Robert Haas wrote: > IME, most bad query plans are caused by either incorrect > estimates of selectivity, or wrongheaded notions about what's likely > to be cached. If we could find a way, automated or manual, of > providing the planner some better information about the facts o

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Bruce Momjian
Robert Haas wrote: > On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian wrote: > > Robert Haas wrote: > >> If you want to take the above as in any way an exhaustive survey of > >> the landscape (which it isn't), C seems like a standout, maybe > >> augmented by the making the planner able to notice th

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote: > >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); >> >> Then, having pr

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian wrote: > Robert Haas wrote: >> If you want to take the above as in any way an exhaustive survey of >> the landscape (which it isn't), C seems like a standout, maybe >> augmented by the making the planner able to notice that A1 = x1 AND A2 >> = x2 is

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
Robert Haas wrote: > If you want to take the above as in any way an exhaustive survey of > the landscape (which it isn't), C seems like a standout, maybe > augmented by the making the planner able to notice that A1 = x1 AND A2 > = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 8:09 PM, Josh Berkus wrote: >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); >> >> Then, having pro

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Josh Berkus
> Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > > Then, having provided a method for the DBA to extinguish the raging > fla

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
Robert Haas wrote: > 2011/2/23 PostgreSQL - Hans-J?rgen Sch?nig : > > i thought there was an agreement that we don't want planner hints? > > Well, I want them. I think some other people do, too. Whether those > people are more numerous than than the people who don't want them, and > how much tha

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
PostgreSQL - Hans-J?rgen Sch?nig wrote: > > Those are real problems, but I still want it. The last time I hit > > this problem I spent two days redesigning my schema and adding > > triggers all over the place to make things work. If I had been > > dealing with a 30TB database instead of a 300MB d

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
2011/2/23 PostgreSQL - Hans-Jürgen Schönig : > i thought there was an agreement that we don't want planner hints? Well, I want them. I think some other people do, too. Whether those people are more numerous than than the people who don't want them, and how much that matters either way, is anothe

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 23, 2011, at 3:46 PM, Robert Haas wrote: > On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley wrote: >>> Personally, I think the first thing we ought to do is add a real, bona >>> fide planner hint to override the selectivity calculation manually, >>> maybe something like this: >>> >>> WHERE

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
>>> >> >> cheapest and easiest solution if you run into this: add "fake" functions >> which the planner cannot estimate properly. >> use OR to artificially prop up estimates or use AND to artificially lower >> them. there is actually no need to redesign the schema to get around it but >> it is

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
2011/2/23 PostgreSQL - Hans-Jürgen Schönig : >> Those are real problems, but I still want it.  The last time I hit >> this problem I spent two days redesigning my schema and adding >> triggers all over the place to make things work.  If I had been >> dealing with a 30TB database instead of a 300MB

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley wrote: >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: >> >> WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > > If you're going

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
> Those are real problems, but I still want it. The last time I hit > this problem I spent two days redesigning my schema and adding > triggers all over the place to make things work. If I had been > dealing with a 30TB database instead of a 300MB database I would have > been royally up a creek.

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 23, 2011, at 2:58 AM, Robert Haas wrote: > 2011/2/22 PostgreSQL - Hans-Jürgen Schönig : >> how does it work? we try to find suitable statistics for an arbitrary length >> list of conditions so that the planner can use it directly rather than >> multiplying all the selectivities. this sho

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Nathan Boley
> Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); > If you're going to go that far, why not just collect statistics on that spec

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Tom Lane
Robert Haas writes: > On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane wrote: >> One of the criteria we've always had for a suitable hint-or-whatever- >> you-call-it design is that it *not* involve decorating the queries. > [ snip ] > To put that another way, it's true that some people can't adjust the

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane wrote: > Robert Haas writes: >> /me prepares to go down in flames. > >> Personally, I think the first thing we ought to do is add a real, bona >> fide planner hint to override the selectivity calculation manually, >> maybe something like this: > >> WHERE

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Tom Lane
Robert Haas writes: > /me prepares to go down in flames. > Personally, I think the first thing we ought to do is add a real, bona > fide planner hint to override the selectivity calculation manually, > maybe something like this: > WHERE (x < 5 AND y = 1) SELECTIVITY (0.1); One of the criteria w

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Robert Haas
2011/2/22 PostgreSQL - Hans-Jürgen Schönig : > how does it work? we try to find suitable statistics for an arbitrary length > list of conditions so that the planner can use it directly rather than > multiplying all the selectivities. this should make estimates a lot more > precise. > the current