Re: [HACKERS] WIP: cross column correlation, 2nd shot

2012-03-14 Thread Robert Haas
On Tue, Mar 13, 2012 at 9:56 AM, Hans-Jürgen Schönig postg...@cybertec.at wrote: Here's the cross-col patch against todays master branch. Please add your patch here, so it doesn't get forgotten: https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB:

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Robert Haas
On Sun, Feb 27, 2011 at 5:17 PM, Josh Berkus j...@agliodbs.com wrote: I think there would be value in giving the DBA an easier way to see which tables are hot, but I am really leery about the idea of trying to feed that directly into the query planner.  I think this is one of those cases

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Alvaro Herrera
Excerpts from Robert Haas's message of sáb feb 26 02:24:26 -0300 2011: On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: How practical would it be for analyze to keep a record of response

Re: [HACKERS] WIP: cross column correlation ...

2011-02-28 Thread Rod Taylor
But it's not the same as tracking *sections of a table*. I dunno. I imagine if you have a section of a table in different storage than other sections, you created a tablespace and moved the partition holding that section there. Otherwise, how do you prevent the tuples from moving to other

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Rod Taylor wrote: On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera alvhe...@commandprompt.comwrote: Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: How practical would it be for analyze to keep a record of response times for given sections of a table as it

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Grzegorz Jaskiewicz wrote: On 25 Feb 2011, at 13:18, Robert Haas wrote: People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to fix problems when they occur. We don't need to

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Bruce Momjian
Robert Haas wrote: On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: Actually, we *do* have some idea which tables are hot. ?Or at least, we could. ? Currently, pg_stats for tables are timeless; they just accumulate from the last reset, which has

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Robert Haas
On Sun, Feb 27, 2011 at 3:01 AM, Bruce Momjian br...@momjian.us wrote: Grzegorz Jaskiewicz wrote: On 25 Feb 2011, at 13:18, Robert Haas wrote:  People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Robert Haas
On Sun, Feb 27, 2011 at 3:03 AM, Bruce Momjian br...@momjian.us wrote: You make it sound as if we know how but are just too lazy to right the code.  That is not one of the weaknesses that this community has. Well, several automatic idea have been floated, but rejected because they don't work

Re: [HACKERS] WIP: cross column correlation ...

2011-02-27 Thread Josh Berkus
I think there would be value in giving the DBA an easier way to see which tables are hot, but I am really leery about the idea of trying to feed that directly into the query planner. I think this is one of those cases where we let people tune it manually for starters, and then wait for

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz
On 25 Feb 2011, at 13:18, Robert Haas wrote: People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to fix problems when they occur. We don't need to have as many knobs as Oracle and

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Rod Taylor
On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera alvhe...@commandprompt.comwrote: Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz g...@pointblue.com.pl wrote: On 25 Feb 2011, at 13:18, Robert Haas wrote:  People coming from Oracle are not favorably impressed either by the amount of monitoring data PostgreSQL can gather or by the number of knobs that are available to

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: Actually, we *do* have some idea which tables are hot. ?Or at least, we could. ? Currently, pg_stats for tables are timeless; they just accumulate from the last reset, which has always been a problem

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread PostgreSQL - Hans-Jürgen Schönig
Still, having more data a user can probe would be nice. I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction. And tests we did a while ago, involving asking team from Microsoft and a team from oracle to

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Greg Stark
2011/2/26 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on. The problem is that we haven't figured out how to usefully store a histogram for a,b. Consider the oft-quoted

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Martijn van Oosterhout
On Sat, Feb 26, 2011 at 06:44:52PM +, Greg Stark wrote: 2011/2/26 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on. The problem is that we haven't figured out

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz
On 26 Feb 2011, at 14:45, Robert Haas wrote: On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz I don't think *anyone* is avoiding that approach. There is almost universal consensus here that auto-tuning is better than manual tuning, even to the extent of being unwilling to add knobs

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
Grzegorz Jaskiewicz wrote: I guess that the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different type of data and 50 different sets of hardware. If there was even a tiny amount of automation provided in the postgresql, I

Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
Greg Stark wrote: Consider the oft-quoted example of a -- or for Americans. I'm not sure everyone realizes just how complicated this particular issue is. If we can do a good job with U.S. city, state, zip code we will have something which will handle a lot of cases. Consider: (1)

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian br...@momjian.us wrote: Josh Berkus wrote: On 2/23/11 7:10 AM, Robert Haas wrote: IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached.  If we could find a

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas robertmh...@gmail.com: On Fri, Feb 25, 2011 at 1:33 AM, Bruce Momjian br...@momjian.us wrote: Josh Berkus wrote: On 2/23/11 7:10 AM, Robert Haas wrote: IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Rod Taylor
4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cache? Hard to say,

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
2011/2/25 Cédric Villemain cedric.villemain.deb...@gmail.com: All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible.  It is not easy - likely not possible - for the system to have a good idea which things

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Alvaro Herrera
Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and generate some kind of a map for expected response times for the pieces of data it is

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Cédric Villemain
2011/2/25 Robert Haas robertmh...@gmail.com: 2011/2/25 Cédric Villemain cedric.villemain.deb...@gmail.com: All that having been said, I think that while Josh is thinking fuzzily about the mathematics of his proposal, the basic idea is pretty sensible.  It is not easy - likely not possible -

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Josh Berkus
4. Even if we could accurately estimate the percentage of the table that is cached, what then? For example, suppose that a user issues a query which retrieves 1% of a table, and we know that 1% of that table is cached. How much of the data that the user asked for is cache? FWIW, for a

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 2:26 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Rod Taylor's message of vie feb 25 14:03:58 -0300 2011: How practical would it be for analyze to keep a record of response times for given sections of a table as it randomly accesses them and

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Robert Haas
On Fri, Feb 25, 2011 at 6:41 PM, Josh Berkus j...@agliodbs.com wrote: One idea Tom and I kicked around previously is to set an assumed caching percentage for each table based on its size relative to effective_cache_size - in other words, assume that the smaller a table is, the more of it will

Re: [HACKERS] WIP: cross column correlation ...

2011-02-25 Thread Bruce Momjian
Robert Haas wrote: Actually, we *do* have some idea which tables are hot. ?Or at least, we could. ? Currently, pg_stats for tables are timeless; they just accumulate from the last reset, which has always been a problem in general for monitoring. ?If we could make top-level table and index

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Bruce Momjian
Robert Haas wrote: On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: If you want to take the above as in any way an exhaustive survey of the landscape (which it isn't), C seems like a standout, maybe augmented by the making the planner able to

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Josh Berkus
On 2/23/11 7:10 AM, Robert Haas wrote: IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached. If we could find a way, automated or manual, of providing the planner some better information about the facts of

Re: [HACKERS] WIP: cross column correlation ...

2011-02-24 Thread Bruce Momjian
Josh Berkus wrote: On 2/23/11 7:10 AM, Robert Haas wrote: IME, most bad query plans are caused by either incorrect estimates of selectivity, or wrongheaded notions about what's likely to be cached. If we could find a way, automated or manual, of providing the planner some better

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 23, 2011, at 2:58 AM, Robert Haas wrote: 2011/2/22 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
Those are real problems, but I still want it. The last time I hit this problem I spent two days redesigning my schema and adding triggers all over the place to make things work. If I had been dealing with a 30TB database instead of a 300MB database I would have been royally up a creek.

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley npbo...@gmail.com wrote: Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); If you're

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
2011/2/23 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: Those are real problems, but I still want it.  The last time I hit this problem I spent two days redesigning my schema and adding triggers all over the place to make things work.  If I had been dealing with a 30TB database

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
cheapest and easiest solution if you run into this: add fake functions which the planner cannot estimate properly. use OR to artificially prop up estimates or use AND to artificially lower them. there is actually no need to redesign the schema to get around it but it is such an ugly

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 23, 2011, at 3:46 PM, Robert Haas wrote: On Wed, Feb 23, 2011 at 12:50 AM, Nathan Boley npbo...@gmail.com wrote: Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this:

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
2011/2/23 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: i thought there was an agreement that we don't want planner hints? Well, I want them. I think some other people do, too. Whether those people are more numerous than than the people who don't want them, and how much that matters

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
PostgreSQL - Hans-J?rgen Sch?nig wrote: Those are real problems, but I still want it. The last time I hit this problem I spent two days redesigning my schema and adding triggers all over the place to make things work. If I had been dealing with a 30TB database instead of a 300MB database

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
Robert Haas wrote: 2011/2/23 PostgreSQL - Hans-J?rgen Sch?nig postg...@cybertec.at: i thought there was an agreement that we don't want planner hints? Well, I want them. I think some other people do, too. Whether those people are more numerous than than the people who don't want them, and

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Josh Berkus
Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); Then, having provided a method for the DBA to extinguish the raging flames of

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 8:09 PM, Josh Berkus j...@agliodbs.com wrote: Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); Then, having

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Bruce Momjian
Robert Haas wrote: If you want to take the above as in any way an exhaustive survey of the landscape (which it isn't), C seems like a standout, maybe augmented by the making the planner able to notice that A1 = x1 AND A2 = x2 is equivalent to (A1,A2) = (x1, x2) so you don't have to rewrite

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread Robert Haas
On Wed, Feb 23, 2011 at 10:30 PM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: If you want to take the above as in any way an exhaustive survey of the landscape (which it isn't), C seems like a standout, maybe augmented by the making the planner able to notice that A1 = x1 AND A2

Re: [HACKERS] WIP: cross column correlation ...

2011-02-23 Thread PostgreSQL - Hans-Jürgen Schönig
On Feb 24, 2011, at 2:09 AM, Josh Berkus wrote: Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); Then, having provided a method

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Robert Haas
2011/2/22 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at: how does it work? we try to find suitable statistics for an arbitrary length list of conditions so that the planner can use it directly rather than multiplying all the selectivities. this should make estimates a lot more

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: /me prepares to go down in flames. Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); One

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: /me prepares to go down in flames. Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Tue, Feb 22, 2011 at 9:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: One of the criteria we've always had for a suitable hint-or-whatever- you-call-it design is that it *not* involve decorating the queries. [ snip ] To put that another way, it's true

Re: [HACKERS] WIP: cross column correlation ...

2011-02-22 Thread Nathan Boley
Personally, I think the first thing we ought to do is add a real, bona fide planner hint to override the selectivity calculation manually, maybe something like this: WHERE (x 5 AND y = 1) SELECTIVITY (0.1); If you're going to go that far, why not just collect statistics on that specific