Re: [HACKERS] multivariate statistics (v19)

2017-02-12 Thread David Fetter
On Sun, Feb 12, 2017 at 10:35:04AM +, Dean Rasheed wrote: > On 11 February 2017 at 01:17, Tomas Vondra > wrote: > > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > > sloppy because the number of attributes in the statistics is currently

Re: [HACKERS] multivariate statistics (v19)

2017-02-12 Thread Dean Rasheed
On 11 February 2017 at 01:17, Tomas Vondra wrote: > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > sloppy because the number of attributes in the statistics is currently > limited to 8, so the overflows are currently not an issue. But it

Re: [HACKERS] multivariate statistics (v19)

2017-02-10 Thread Tomas Vondra
On 02/08/2017 07:40 PM, Dean Rasheed wrote: On 8 February 2017 at 16:09, David Fetter wrote: Combinations are n!/(k! * (n-k)!), so computing those is more along the lines of: unsigned long long choose(unsigned long long n, unsigned long long k) { if (k > n) {

Re: [HACKERS] multivariate statistics (v19)

2017-02-08 Thread Dean Rasheed
On 8 February 2017 at 16:09, David Fetter wrote: > Combinations are n!/(k! * (n-k)!), so computing those is more > along the lines of: > > unsigned long long > choose(unsigned long long n, unsigned long long k) { > if (k > n) { > return 0; > } > unsigned long

Re: [HACKERS] multivariate statistics (v19)

2017-02-08 Thread David Fetter
On Wed, Feb 08, 2017 at 03:23:25PM +, Dean Rasheed wrote: > On 6 February 2017 at 21:26, Alvaro Herrera wrote: > > Tomas Vondra wrote: > >> On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > > > >> > Nearby, some auxiliary functions such as n_choose_k and > >> >

Re: [HACKERS] multivariate statistics (v19)

2017-02-08 Thread Dean Rasheed
On 6 February 2017 at 21:26, Alvaro Herrera wrote: > Tomas Vondra wrote: >> On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > >> > Nearby, some auxiliary functions such as n_choose_k and >> > num_combinations are not documented. What it is that they do? I'd >> > move these

Re: [HACKERS] multivariate statistics (v19)

2017-02-06 Thread Alvaro Herrera
Still about 0003. dependencies.c comment at the top of the file should contain some details about what is it implementing and a general description of the algorithm and data structures. As before, it's best to have the main entry point build_mv_dependencies at the top, the other public

Re: [HACKERS] multivariate statistics (v19)

2017-02-06 Thread Alvaro Herrera
Looking at 0003, I notice that gram.y is changed to add a WITH ( .. ) clause. If it's not specified, an error is raised. If you create stats with (ndistinct) then you can't alter it later to add "dependencies" or whatever; unless I misunderstand, you have to drop the statistics and create

Re: [HACKERS] multivariate statistics (v19)

2017-02-06 Thread Alvaro Herrera
Tomas Vondra wrote: > On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > > Nearby, some auxiliary functions such as n_choose_k and > > num_combinations are not documented. What it is that they do? I'd > > move these at the end of the file, keeping the important entry points > > at the top of the

Re: [HACKERS] multivariate statistics (v19)

2017-02-03 Thread Robert Haas
On Thu, Feb 2, 2017 at 3:59 AM, Tomas Vondra wrote: > There's a subtle difference between pg_node_tree and the data types for > statistics - pg_node_tree stores the value as a string (matching the > nodeToString output), so the _in function is fairly simple. Of

Re: [HACKERS] multivariate statistics (v19)

2017-02-02 Thread Tomas Vondra
On 02/01/2017 11:52 PM, Alvaro Herrera wrote: Still looking at 0002. pg_ndistinct_in disallows input, claiming that pg_node_tree does the same thing. But pg_node_tree does it for security reasons: you could crash the backend if you supplied a malicious value. I don't think that applies to

Re: [HACKERS] multivariate statistics (v19)

2017-02-01 Thread Alvaro Herrera
Still looking at 0002. pg_ndistinct_in disallows input, claiming that pg_node_tree does the same thing. But pg_node_tree does it for security reasons: you could crash the backend if you supplied a malicious value. I don't think that applies to pg_ndistinct_in. Perhaps it will be useful to

Re: [HACKERS] multivariate statistics (v19)

2017-01-31 Thread Tomas Vondra
On 01/31/2017 07:52 AM, Amit Langote wrote: On 2017/01/31 6:57, Tomas Vondra wrote: On 01/30/2017 09:37 PM, Alvaro Herrera wrote: Looks good to me. I don't think we need to keep the names very short -- I would propose "standistinct", "stahistogram", "stadependencies". Yeah, I got annoyed

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Amit Langote
On 2017/01/31 6:57, Tomas Vondra wrote: > On 01/30/2017 09:37 PM, Alvaro Herrera wrote: >> Looks good to me. I don't think we need to keep the names very short -- >> I would propose "standistinct", "stahistogram", "stadependencies". >> > > Yeah, I got annoyed by the short names too. > > This

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Michael Paquier
On Tue, Jan 31, 2017 at 6:57 AM, Tomas Vondra wrote: > This however reminds me that perhaps pg_mv_statistic is not the best name. I > know others proposed pg_statistic_ext (and pg_stats_ext), and while I wasn't > a big fan initially, I think it's a better name.

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra
On 01/30/2017 09:37 PM, Alvaro Herrera wrote: Tomas Vondra wrote: The 'built' flags may be easily replaced with a check if the bytea-like columns are NULL, and the 'enabled' columns may be replaced by the array of char, just like you proposed. That'd give us a single catalog looking like

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Alvaro Herrera
Tomas Vondra wrote: > The 'built' flags may be easily replaced with a check if the bytea-like > columns are NULL, and the 'enabled' columns may be replaced by the array of > char, just like you proposed. > > That'd give us a single catalog looking like this: > > pg_mv_statistics > starelid >

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra
On 01/30/2017 05:12 PM, Alvaro Herrera wrote: > Hmm. So we have a catalog pg_mv_statistics which stores two things: 1. the configuration regarding mvstats that have been requested by user via CREATE/ALTER STATISTICS 2. the actual values captured from the above, via ANALYZE I think this

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra
On 01/30/2017 05:55 PM, Alvaro Herrera wrote: Minor nitpicks: Let me suggest to use get_attnum() in CreateStatistics instead of SearchSysCacheAttName for each column. Also, we use type AttrNumber for attribute numbers rather than int16. Finally in the same function you have an erroneous

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra
Hello, On 01/26/2017 10:03 AM, Ideriha, Takeshi wrote: Though I pointed out these typoes and so on, I believe these feedback are less priority compared to the source code itself. So please work on my feedback if you have time. I think getting the comments (and docs in general) right is

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Tomas Vondra
On 01/26/2017 10:43 AM, Dilip Kumar wrote: histograms -- + if (matches[i] == MVSTATS_MATCH_FULL) + s += mvhist->buckets[i]->ntuples; + else if (matches[i] == MVSTATS_MATCH_PARTIAL) + s += 0.5 * mvhist->buckets[i]->ntuples; Isn't it will be better that take some percentage of the

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Alvaro Herrera
Minor nitpicks: Let me suggest to use get_attnum() in CreateStatistics instead of SearchSysCacheAttName for each column. Also, we use type AttrNumber for attribute numbers rather than int16. Finally in the same function you have an erroneous ERRCODE_UNDEFINED_COLUMN which should be

Re: [HACKERS] multivariate statistics (v19)

2017-01-30 Thread Alvaro Herrera
Tomas Vondra wrote: > On 01/03/2017 05:22 PM, Tomas Vondra wrote: > > On 01/03/2017 02:42 PM, Dilip Kumar wrote: > ... > > > I think it should be easily reproducible, in case it's not I can send > > > call stack or core dump. > > > > > > > Thanks for the report. It was trivial to reproduce and

Re: [HACKERS] multivariate statistics (v19)

2017-01-26 Thread Kyotaro HORIGUCHI
Hello, I'll return on this since this should welcome more eyeballs. At Thu, 26 Jan 2017 09:03:10 +, "Ideriha, Takeshi" wrote in <4E72940DA2BF16479384A86D54D0988A565822A9@G01JPEXMBKW04> > Hi > > When you have time, could you rebase the pathes? > Some

Re: [HACKERS] multivariate statistics (v19)

2017-01-26 Thread Dilip Kumar
On Thu, Jan 5, 2017 at 3:27 AM, Tomas Vondra wrote: > Thanks. Those plans match my experiments with the TPC-H data set, although > I've been playing with the smallest scale (1GB). > > It's not very difficult to make the estimation error arbitrary large, e.g. > by

Re: [HACKERS] multivariate statistics (v19)

2017-01-26 Thread Ideriha, Takeshi
Hi When you have time, could you rebase the pathes? Some patches cannot be applied to the current HEAD. 0001 patch can be applied but the following 0002 patch cannot be. I've just started reading your patch (mainly docs and README, not yet source code.) Though these are minor things, I've

Re: [HACKERS] multivariate statistics (v19)

2017-01-25 Thread Michael Paquier
On Wed, Jan 25, 2017 at 9:56 PM, Alvaro Herrera wrote: > Michael Paquier wrote: >> And nothing has happened since. Are there people willing to review >> this patch and help it proceed? > > I am going to grab this patch as committer. Thanks, that's good to know. --

Re: [HACKERS] multivariate statistics (v19)

2017-01-25 Thread Alvaro Herrera
Michael Paquier wrote: > On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra > wrote: > > Attached is v22 of the patch series, rebased to current master and fixing > > the reported bug. I haven't made any other changes - the issues reported by > > Petr are mostly minor,

Re: [HACKERS] multivariate statistics (v19)

2017-01-24 Thread Michael Paquier
On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra wrote: > On 01/03/2017 05:22 PM, Tomas Vondra wrote: >> >> On 01/03/2017 02:42 PM, Dilip Kumar wrote: > > ... >>> >>> I think it should be easily reproducible, in case it's not I can send >>> call stack or core dump. >>>

Re: [HACKERS] multivariate statistics (v19)

2017-01-04 Thread Tomas Vondra
On 01/04/2017 03:21 PM, Dilip Kumar wrote: On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra wrote: Attached is v22 of the patch series, rebased to current master and fixing the reported bug. I haven't made any other changes - the issues reported by Petr are mostly

Re: [HACKERS] multivariate statistics (v19)

2017-01-04 Thread Dilip Kumar
On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra wrote: > Attached is v22 of the patch series, rebased to current master and fixing > the reported bug. I haven't made any other changes - the issues reported by > Petr are mostly minor, so I've decided to wait a bit more

Re: [HACKERS] multivariate statistics (v19)

2017-01-03 Thread Tomas Vondra
On 12/30/2016 02:12 PM, Petr Jelinek wrote: On 12/12/16 22:50, Tomas Vondra wrote: On 12/12/2016 12:26 PM, Amit Langote wrote: Hi Tomas, On 2016/10/30 4:23, Tomas Vondra wrote: Hi, Attached is v20 of the multivariate statistics patch series, doing mostly the changes outlined in the

Re: [HACKERS] multivariate statistics (v19)

2017-01-03 Thread Tomas Vondra
On 01/03/2017 02:42 PM, Dilip Kumar wrote: On Tue, Dec 13, 2016 at 3:20 AM, Tomas Vondra wrote: attached is v21 of the patch series, rebased to current master (resolving the duplicate OID and a few trivial merge conflicts), and also fixing some of the issues you

Re: [HACKERS] multivariate statistics (v19)

2017-01-03 Thread Dilip Kumar
On Tue, Dec 13, 2016 at 3:20 AM, Tomas Vondra wrote: > attached is v21 of the patch series, rebased to current master (resolving > the duplicate OID and a few trivial merge conflicts), and also fixing some > of the issues you reported. I wanted to test the grouping

Re: [HACKERS] multivariate statistics (v19)

2016-12-30 Thread Petr Jelinek
On 12/12/16 22:50, Tomas Vondra wrote: > On 12/12/2016 12:26 PM, Amit Langote wrote: >> >> Hi Tomas, >> >> On 2016/10/30 4:23, Tomas Vondra wrote: >>> Hi, >>> >>> Attached is v20 of the multivariate statistics patch series, doing >>> mostly >>> the changes outlined in the preceding e-mail from

Re: [HACKERS] multivariate statistics (v19)

2016-12-30 Thread Petr Jelinek
On 12/12/16 22:50, Tomas Vondra wrote: >> + >> +SELECT pg_mv_stats_dependencies_show(stadeps) >> + FROM pg_mv_statistic WHERE staname = 's1'; >> + >> + pg_mv_stats_dependencies_show >> +--- >> + (1) => 2, (2) => 1 >> +(1 row) >> + >> >> Couldn't this somehow show

Re: [HACKERS] multivariate statistics (v19)

2016-12-12 Thread Amit Langote
Hi Tomas, On 2016/10/30 4:23, Tomas Vondra wrote: > Hi, > > Attached is v20 of the multivariate statistics patch series, doing mostly > the changes outlined in the preceding e-mail from October 11. > > The patch series currently has these parts: > > * 0001 : (FIX) teach pull_varno about

Re: [HACKERS] multivariate statistics (v19)

2016-10-10 Thread Tomas Vondra
Hi everyone, thanks for the reviews. Let me sum the feedback so far, and outline my plans for the next patch version that I'd like to submit for CF 2016-11. 1) syntax changes I agree with the changes proposed by Dean, although only a subset of the syntax is going to be supported until we

Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Dean Rasheed
On 4 October 2016 at 09:15, Heikki Linnakangas wrote: > However, for tables and views, each object you store in those views is a > "table" or "view", but with this thing, the object you store is > "statistics". Would you have a catalog table called "pg_scissor"? > No, probably

Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Gavin Flower
On 04/10/16 20:37, Dean Rasheed wrote: On 4 October 2016 at 04:25, Michael Paquier wrote: OK. A second thing was related to the use of schemas in the new system catalogs. As mentioned in [1], those could be removed. [1]:

Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Heikki Linnakangas
On 10/04/2016 10:49 AM, Dean Rasheed wrote: On 30 September 2016 at 12:10, Heikki Linnakangas wrote: I fear that using "statistics" as the name of the new object might get a bit awkward. "statistics" is a plural, but we use it as the name of a single object, like "pants" or

Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Dean Rasheed
On 30 September 2016 at 12:10, Heikki Linnakangas wrote: > I fear that using "statistics" as the name of the new object might get a bit > awkward. "statistics" is a plural, but we use it as the name of a single > object, like "pants" or "scissors". Not sure I have any better

Re: [HACKERS] multivariate statistics (v19)

2016-10-04 Thread Dean Rasheed
On 4 October 2016 at 04:25, Michael Paquier wrote: > OK. A second thing was related to the use of schemas in the new system > catalogs. As mentioned in [1], those could be removed. > [1]: >

Re: [HACKERS] multivariate statistics (v19)

2016-10-03 Thread Michael Paquier
On Mon, Oct 3, 2016 at 8:25 PM, Heikki Linnakangas wrote: > Yeah. The idea was to use something like pg_node_tree to store all the > different kinds of statistics, the histogram, the MCV, and the functional > dependencies, in one datum. Or JSON, maybe. It sounds better than an

Re: [HACKERS] multivariate statistics (v19)

2016-10-03 Thread Heikki Linnakangas
On 10/03/2016 04:46 AM, Michael Paquier wrote: On Fri, Sep 30, 2016 at 8:10 PM, Heikki Linnakangas wrote: This patch set is in pretty good shape, the only problem is that it's so big that no-one seems to have the time or courage to do the final touches and commit it. Did you

Re: [HACKERS] multivariate statistics (v19)

2016-10-02 Thread Michael Paquier
On Fri, Sep 30, 2016 at 8:10 PM, Heikki Linnakangas wrote: > This patch set is in pretty good shape, the only problem is that it's so big > that no-one seems to have the time or courage to do the final touches and > commit it. Did you see my suggestions about simplifying its SQL

Re: [HACKERS] multivariate statistics (v19)

2016-09-30 Thread Heikki Linnakangas
This patch set is in pretty good shape, the only problem is that it's so big that no-one seems to have the time or courage to do the final touches and commit it. If we just focus on the functional dependencies part for now, I think we might get somewhere. I peeked at the MCV and histogram

Re: [HACKERS] multivariate statistics (v19)

2016-09-13 Thread Tomas Vondra
Hi, Thanks for looking into this! On 09/12/2016 04:08 PM, Dean Rasheed wrote: On 3 August 2016 at 02:58, Tomas Vondra wrote: Attached is v19 of the "multivariate stats" patch series Hi, I started looking at this - just at a very high level - I've not read

Re: [HACKERS] multivariate statistics (v19)

2016-09-12 Thread Dean Rasheed
On 3 August 2016 at 02:58, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series Hi, I started looking at this - just at a very high level - I've not read much of the detail yet, but here are some initial review comments. I think the

Re: [HACKERS] multivariate statistics (v19)

2016-08-30 Thread Michael Paquier
On Wed, Aug 24, 2016 at 2:03 AM, Robert Haas wrote: > ISTR that you were going to try to look at this patch set. It seems > from the discussion that it's not really ready for serious > consideration for commit yet, but also that some high-level design > comments from you

Re: [HACKERS] multivariate statistics (v19)

2016-08-23 Thread Robert Haas
On Tue, Aug 2, 2016 at 9:58 PM, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series - essentially v18 > rebased on top of current master. Tom: ISTR that you were going to try to look at this patch set. It seems from the discussion that

Re: [HACKERS] multivariate statistics (v19)

2016-08-15 Thread Tomas Vondra
On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: 1) enriching the query tree with multivariate statistics info Right now all the stuff related to multivariate statistics estimation happens in clausesel.c -

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Michael Paquier
On Thu, Aug 11, 2016 at 3:34 AM, Tomas Vondra wrote: > On 08/10/2016 02:23 PM, Michael Paquier wrote: >> >> On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra >> wrote: >>> The idea is that the syntax should work even for statistics built on

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Tomas Vondra
On 08/10/2016 02:23 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: Patch 0001: there have been comments about that before, and you have put the checks on RestrictInfo in a couple of

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Tomas Vondra
On 08/10/2016 02:24 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Tomas Vondra
On 08/10/2016 03:29 PM, Ants Aasma wrote: On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Ants Aasma
On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: > 2) combining multiple statistics > > I think the ability to combine multivariate statistics (covering different > subsets of conditions) is important and useful, but I'm starting to think > that the current

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Michael Paquier
On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: > On 10/08/16 13:33, Tomas Vondra wrote: >> >> On 08/10/2016 06:41 AM, Michael Paquier wrote: >>> >>> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics I think the

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Michael Paquier
On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: > On 08/10/2016 06:41 AM, Michael Paquier wrote: >> Patch 0001: there have been comments about that before, and you have >> put the checks on RestrictInfo in a couple of variables of >> pull_varnos_walker, so

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Petr Jelinek
On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but

Re: [HACKERS] multivariate statistics (v19)

2016-08-10 Thread Tomas Vondra
On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: ... But more importantly, I think we'll need to show some of the data in EXPLAIN output. With per-column statistics it's fairly straightforward to determine which

Re: [HACKERS] multivariate statistics (v19)

2016-08-09 Thread Michael Paquier
On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: > 1) enriching the query tree with multivariate statistics info > > Right now all the stuff related to multivariate statistics estimation > happens in clausesel.c - matching condition to statistics, selection of >

Re: [HACKERS] multivariate statistics (v19)

2016-08-05 Thread Michael Paquier
On Sat, Aug 6, 2016 at 2:38 AM, Tomas Vondra wrote: > On 08/05/2016 06:24 AM, Michael Paquier wrote: >> >> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra >> wrote: >>> >>> Attached is v19 of the "multivariate stats" patch series -

Re: [HACKERS] multivariate statistics (v19)

2016-08-05 Thread Tomas Vondra
On 08/05/2016 06:24 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: Attached is v19 of the "multivariate stats" patch series - essentially v18 rebased on top of current master. Aside from a few bug fixes, the main improvement is

Re: [HACKERS] multivariate statistics (v19)

2016-08-04 Thread Michael Paquier
On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series - essentially v18 > rebased on top of current master. Aside from a few bug fixes, the main > improvement is addition of SGML docs demonstrating the