### Re: [HACKERS] multivariate statistics (v19)

On Sun, Feb 12, 2017 at 10:35:04AM +, Dean Rasheed wrote: > On 11 February 2017 at 01:17, Tomas Vondra > wrote: > > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > > sloppy because the number of attributes in the statistics is currently

### Re: [HACKERS] multivariate statistics (v19)

On 11 February 2017 at 01:17, Tomas Vondra wrote: > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > sloppy because the number of attributes in the statistics is currently > limited to 8, so the overflows are currently not an issue. But it

### Re: [HACKERS] multivariate statistics (v19)

On 02/08/2017 07:40 PM, Dean Rasheed wrote: On 8 February 2017 at 16:09, David Fetter wrote: Combinations are n!/(k! * (n-k)!), so computing those is more along the lines of: unsigned long long choose(unsigned long long n, unsigned long long k) { if (k > n) {

### Re: [HACKERS] multivariate statistics (v19)

On 8 February 2017 at 16:09, David Fetter wrote: > Combinations are n!/(k! * (n-k)!), so computing those is more > along the lines of: > > unsigned long long > choose(unsigned long long n, unsigned long long k) { > if (k > n) { > return 0; > } > unsigned long

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Feb 08, 2017 at 03:23:25PM +, Dean Rasheed wrote: > On 6 February 2017 at 21:26, Alvaro Herrera wrote: > > Tomas Vondra wrote: > >> On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > > > >> > Nearby, some auxiliary functions such as n_choose_k and > >> >

### Re: [HACKERS] multivariate statistics (v19)

On 6 February 2017 at 21:26, Alvaro Herrera wrote: > Tomas Vondra wrote: >> On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > >> > Nearby, some auxiliary functions such as n_choose_k and >> > num_combinations are not documented. What it is that they do? I'd >> > move these

### Re: [HACKERS] multivariate statistics (v19)

Still about 0003. dependencies.c comment at the top of the file should contain some details about what is it implementing and a general description of the algorithm and data structures. As before, it's best to have the main entry point build_mv_dependencies at the top, the other public

### Re: [HACKERS] multivariate statistics (v19)

Looking at 0003, I notice that gram.y is changed to add a WITH ( .. ) clause. If it's not specified, an error is raised. If you create stats with (ndistinct) then you can't alter it later to add "dependencies" or whatever; unless I misunderstand, you have to drop the statistics and create

### Re: [HACKERS] multivariate statistics (v19)

Tomas Vondra wrote: > On 02/01/2017 11:52 PM, Alvaro Herrera wrote: > > Nearby, some auxiliary functions such as n_choose_k and > > num_combinations are not documented. What it is that they do? I'd > > move these at the end of the file, keeping the important entry points > > at the top of the

### Re: [HACKERS] multivariate statistics (v19)

On Thu, Feb 2, 2017 at 3:59 AM, Tomas Vondra wrote: > There's a subtle difference between pg_node_tree and the data types for > statistics - pg_node_tree stores the value as a string (matching the > nodeToString output), so the _in function is fairly simple. Of

### Re: [HACKERS] multivariate statistics (v19)

On 02/01/2017 11:52 PM, Alvaro Herrera wrote: Still looking at 0002. pg_ndistinct_in disallows input, claiming that pg_node_tree does the same thing. But pg_node_tree does it for security reasons: you could crash the backend if you supplied a malicious value. I don't think that applies to

### Re: [HACKERS] multivariate statistics (v19)

Still looking at 0002. pg_ndistinct_in disallows input, claiming that pg_node_tree does the same thing. But pg_node_tree does it for security reasons: you could crash the backend if you supplied a malicious value. I don't think that applies to pg_ndistinct_in. Perhaps it will be useful to

### Re: [HACKERS] multivariate statistics (v19)

On 01/31/2017 07:52 AM, Amit Langote wrote: On 2017/01/31 6:57, Tomas Vondra wrote: On 01/30/2017 09:37 PM, Alvaro Herrera wrote: Looks good to me. I don't think we need to keep the names very short -- I would propose "standistinct", "stahistogram", "stadependencies". Yeah, I got annoyed

### Re: [HACKERS] multivariate statistics (v19)

On 2017/01/31 6:57, Tomas Vondra wrote: > On 01/30/2017 09:37 PM, Alvaro Herrera wrote: >> Looks good to me. I don't think we need to keep the names very short -- >> I would propose "standistinct", "stahistogram", "stadependencies". >> > > Yeah, I got annoyed by the short names too. > > This

### Re: [HACKERS] multivariate statistics (v19)

On Tue, Jan 31, 2017 at 6:57 AM, Tomas Vondra wrote: > This however reminds me that perhaps pg_mv_statistic is not the best name. I > know others proposed pg_statistic_ext (and pg_stats_ext), and while I wasn't > a big fan initially, I think it's a better name.

### Re: [HACKERS] multivariate statistics (v19)

On 01/30/2017 09:37 PM, Alvaro Herrera wrote: Tomas Vondra wrote: The 'built' flags may be easily replaced with a check if the bytea-like columns are NULL, and the 'enabled' columns may be replaced by the array of char, just like you proposed. That'd give us a single catalog looking like

### Re: [HACKERS] multivariate statistics (v19)

Tomas Vondra wrote: > The 'built' flags may be easily replaced with a check if the bytea-like > columns are NULL, and the 'enabled' columns may be replaced by the array of > char, just like you proposed. > > That'd give us a single catalog looking like this: > > pg_mv_statistics > starelid >

### Re: [HACKERS] multivariate statistics (v19)

On 01/30/2017 05:12 PM, Alvaro Herrera wrote: > Hmm. So we have a catalog pg_mv_statistics which stores two things: 1. the configuration regarding mvstats that have been requested by user via CREATE/ALTER STATISTICS 2. the actual values captured from the above, via ANALYZE I think this

### Re: [HACKERS] multivariate statistics (v19)

On 01/30/2017 05:55 PM, Alvaro Herrera wrote: Minor nitpicks: Let me suggest to use get_attnum() in CreateStatistics instead of SearchSysCacheAttName for each column. Also, we use type AttrNumber for attribute numbers rather than int16. Finally in the same function you have an erroneous

### Re: [HACKERS] multivariate statistics (v19)

Hello, On 01/26/2017 10:03 AM, Ideriha, Takeshi wrote: Though I pointed out these typoes and so on, I believe these feedback are less priority compared to the source code itself. So please work on my feedback if you have time. I think getting the comments (and docs in general) right is

### Re: [HACKERS] multivariate statistics (v19)

On 01/26/2017 10:43 AM, Dilip Kumar wrote: histograms -- + if (matches[i] == MVSTATS_MATCH_FULL) + s += mvhist->buckets[i]->ntuples; + else if (matches[i] == MVSTATS_MATCH_PARTIAL) + s += 0.5 * mvhist->buckets[i]->ntuples; Isn't it will be better that take some percentage of the

### Re: [HACKERS] multivariate statistics (v19)

Minor nitpicks: Let me suggest to use get_attnum() in CreateStatistics instead of SearchSysCacheAttName for each column. Also, we use type AttrNumber for attribute numbers rather than int16. Finally in the same function you have an erroneous ERRCODE_UNDEFINED_COLUMN which should be

### Re: [HACKERS] multivariate statistics (v19)

Tomas Vondra wrote: > On 01/03/2017 05:22 PM, Tomas Vondra wrote: > > On 01/03/2017 02:42 PM, Dilip Kumar wrote: > ... > > > I think it should be easily reproducible, in case it's not I can send > > > call stack or core dump. > > > > > > > Thanks for the report. It was trivial to reproduce and

### Re: [HACKERS] multivariate statistics (v19)

Hello, I'll return on this since this should welcome more eyeballs. At Thu, 26 Jan 2017 09:03:10 +, "Ideriha, Takeshi" wrote in <4E72940DA2BF16479384A86D54D0988A565822A9@G01JPEXMBKW04> > Hi > > When you have time, could you rebase the pathes? > Some

### Re: [HACKERS] multivariate statistics (v19)

On Thu, Jan 5, 2017 at 3:27 AM, Tomas Vondra wrote: > Thanks. Those plans match my experiments with the TPC-H data set, although > I've been playing with the smallest scale (1GB). > > It's not very difficult to make the estimation error arbitrary large, e.g. > by

### Re: [HACKERS] multivariate statistics (v19)

Hi When you have time, could you rebase the pathes? Some patches cannot be applied to the current HEAD. 0001 patch can be applied but the following 0002 patch cannot be. I've just started reading your patch (mainly docs and README, not yet source code.) Though these are minor things, I've

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Jan 25, 2017 at 9:56 PM, Alvaro Herrera wrote: > Michael Paquier wrote: >> And nothing has happened since. Are there people willing to review >> this patch and help it proceed? > > I am going to grab this patch as committer. Thanks, that's good to know. --

### Re: [HACKERS] multivariate statistics (v19)

Michael Paquier wrote: > On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra > wrote: > > Attached is v22 of the patch series, rebased to current master and fixing > > the reported bug. I haven't made any other changes - the issues reported by > > Petr are mostly minor,

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra wrote: > On 01/03/2017 05:22 PM, Tomas Vondra wrote: >> >> On 01/03/2017 02:42 PM, Dilip Kumar wrote: > > ... >>> >>> I think it should be easily reproducible, in case it's not I can send >>> call stack or core dump. >>>

### Re: [HACKERS] multivariate statistics (v19)

On 01/04/2017 03:21 PM, Dilip Kumar wrote: On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra wrote: Attached is v22 of the patch series, rebased to current master and fixing the reported bug. I haven't made any other changes - the issues reported by Petr are mostly

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra wrote: > Attached is v22 of the patch series, rebased to current master and fixing > the reported bug. I haven't made any other changes - the issues reported by > Petr are mostly minor, so I've decided to wait a bit more

### Re: [HACKERS] multivariate statistics (v19)

On 12/30/2016 02:12 PM, Petr Jelinek wrote: On 12/12/16 22:50, Tomas Vondra wrote: On 12/12/2016 12:26 PM, Amit Langote wrote: Hi Tomas, On 2016/10/30 4:23, Tomas Vondra wrote: Hi, Attached is v20 of the multivariate statistics patch series, doing mostly the changes outlined in the

### Re: [HACKERS] multivariate statistics (v19)

On 01/03/2017 02:42 PM, Dilip Kumar wrote: On Tue, Dec 13, 2016 at 3:20 AM, Tomas Vondra wrote: attached is v21 of the patch series, rebased to current master (resolving the duplicate OID and a few trivial merge conflicts), and also fixing some of the issues you

### Re: [HACKERS] multivariate statistics (v19)

On Tue, Dec 13, 2016 at 3:20 AM, Tomas Vondra wrote: > attached is v21 of the patch series, rebased to current master (resolving > the duplicate OID and a few trivial merge conflicts), and also fixing some > of the issues you reported. I wanted to test the grouping

### Re: [HACKERS] multivariate statistics (v19)

On 12/12/16 22:50, Tomas Vondra wrote: > On 12/12/2016 12:26 PM, Amit Langote wrote: >> >> Hi Tomas, >> >> On 2016/10/30 4:23, Tomas Vondra wrote: >>> Hi, >>> >>> Attached is v20 of the multivariate statistics patch series, doing >>> mostly >>> the changes outlined in the preceding e-mail from

### Re: [HACKERS] multivariate statistics (v19)

On 12/12/16 22:50, Tomas Vondra wrote: >> + >> +SELECT pg_mv_stats_dependencies_show(stadeps) >> + FROM pg_mv_statistic WHERE staname = 's1'; >> + >> + pg_mv_stats_dependencies_show >> +--- >> + (1) => 2, (2) => 1 >> +(1 row) >> + >> >> Couldn't this somehow show

### Re: [HACKERS] multivariate statistics (v19)

Hi Tomas, On 2016/10/30 4:23, Tomas Vondra wrote: > Hi, > > Attached is v20 of the multivariate statistics patch series, doing mostly > the changes outlined in the preceding e-mail from October 11. > > The patch series currently has these parts: > > * 0001 : (FIX) teach pull_varno about

### Re: [HACKERS] multivariate statistics (v19)

Hi everyone, thanks for the reviews. Let me sum the feedback so far, and outline my plans for the next patch version that I'd like to submit for CF 2016-11. 1) syntax changes I agree with the changes proposed by Dean, although only a subset of the syntax is going to be supported until we

### Re: [HACKERS] multivariate statistics (v19)

On 4 October 2016 at 09:15, Heikki Linnakangas wrote: > However, for tables and views, each object you store in those views is a > "table" or "view", but with this thing, the object you store is > "statistics". Would you have a catalog table called "pg_scissor"? > No, probably

### Re: [HACKERS] multivariate statistics (v19)

On 04/10/16 20:37, Dean Rasheed wrote: On 4 October 2016 at 04:25, Michael Paquier wrote: OK. A second thing was related to the use of schemas in the new system catalogs. As mentioned in [1], those could be removed. [1]:

### Re: [HACKERS] multivariate statistics (v19)

On 10/04/2016 10:49 AM, Dean Rasheed wrote: On 30 September 2016 at 12:10, Heikki Linnakangas wrote: I fear that using "statistics" as the name of the new object might get a bit awkward. "statistics" is a plural, but we use it as the name of a single object, like "pants" or

### Re: [HACKERS] multivariate statistics (v19)

On 30 September 2016 at 12:10, Heikki Linnakangas wrote: > I fear that using "statistics" as the name of the new object might get a bit > awkward. "statistics" is a plural, but we use it as the name of a single > object, like "pants" or "scissors". Not sure I have any better

### Re: [HACKERS] multivariate statistics (v19)

On 4 October 2016 at 04:25, Michael Paquier wrote: > OK. A second thing was related to the use of schemas in the new system > catalogs. As mentioned in [1], those could be removed. > [1]: >

### Re: [HACKERS] multivariate statistics (v19)

On Mon, Oct 3, 2016 at 8:25 PM, Heikki Linnakangas wrote: > Yeah. The idea was to use something like pg_node_tree to store all the > different kinds of statistics, the histogram, the MCV, and the functional > dependencies, in one datum. Or JSON, maybe. It sounds better than an

### Re: [HACKERS] multivariate statistics (v19)

On 10/03/2016 04:46 AM, Michael Paquier wrote: On Fri, Sep 30, 2016 at 8:10 PM, Heikki Linnakangas wrote: This patch set is in pretty good shape, the only problem is that it's so big that no-one seems to have the time or courage to do the final touches and commit it. Did you

### Re: [HACKERS] multivariate statistics (v19)

On Fri, Sep 30, 2016 at 8:10 PM, Heikki Linnakangas wrote: > This patch set is in pretty good shape, the only problem is that it's so big > that no-one seems to have the time or courage to do the final touches and > commit it. Did you see my suggestions about simplifying its SQL

### Re: [HACKERS] multivariate statistics (v19)

This patch set is in pretty good shape, the only problem is that it's so big that no-one seems to have the time or courage to do the final touches and commit it. If we just focus on the functional dependencies part for now, I think we might get somewhere. I peeked at the MCV and histogram

### Re: [HACKERS] multivariate statistics (v19)

Hi, Thanks for looking into this! On 09/12/2016 04:08 PM, Dean Rasheed wrote: On 3 August 2016 at 02:58, Tomas Vondra wrote: Attached is v19 of the "multivariate stats" patch series Hi, I started looking at this - just at a very high level - I've not read

### Re: [HACKERS] multivariate statistics (v19)

On 3 August 2016 at 02:58, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series Hi, I started looking at this - just at a very high level - I've not read much of the detail yet, but here are some initial review comments. I think the

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 24, 2016 at 2:03 AM, Robert Haas wrote: > ISTR that you were going to try to look at this patch set. It seems > from the discussion that it's not really ready for serious > consideration for commit yet, but also that some high-level design > comments from you

### Re: [HACKERS] multivariate statistics (v19)

On Tue, Aug 2, 2016 at 9:58 PM, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series - essentially v18 > rebased on top of current master. Tom: ISTR that you were going to try to look at this patch set. It seems from the discussion that

### Re: [HACKERS] multivariate statistics (v19)

On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: 1) enriching the query tree with multivariate statistics info Right now all the stuff related to multivariate statistics estimation happens in clausesel.c -

### Re: [HACKERS] multivariate statistics (v19)

On Thu, Aug 11, 2016 at 3:34 AM, Tomas Vondra wrote: > On 08/10/2016 02:23 PM, Michael Paquier wrote: >> >> On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra >> wrote: >>> The idea is that the syntax should work even for statistics built on

### Re: [HACKERS] multivariate statistics (v19)

On 08/10/2016 02:23 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: Patch 0001: there have been comments about that before, and you have put the checks on RestrictInfo in a couple of

### Re: [HACKERS] multivariate statistics (v19)

On 08/10/2016 02:24 PM, Michael Paquier wrote: On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics

### Re: [HACKERS] multivariate statistics (v19)

On 08/10/2016 03:29 PM, Ants Aasma wrote: On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but I'm

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 3, 2016 at 4:58 AM, Tomas Vondra wrote: > 2) combining multiple statistics > > I think the ability to combine multivariate statistics (covering different > subsets of conditions) is important and useful, but I'm starting to think > that the current

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 10, 2016 at 8:50 PM, Petr Jelinek wrote: > On 10/08/16 13:33, Tomas Vondra wrote: >> >> On 08/10/2016 06:41 AM, Michael Paquier wrote: >>> >>> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics I think the

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 10, 2016 at 8:33 PM, Tomas Vondra wrote: > On 08/10/2016 06:41 AM, Michael Paquier wrote: >> Patch 0001: there have been comments about that before, and you have >> put the checks on RestrictInfo in a couple of variables of >> pull_varnos_walker, so

### Re: [HACKERS] multivariate statistics (v19)

On 10/08/16 13:33, Tomas Vondra wrote: On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra 2) combining multiple statistics I think the ability to combine multivariate statistics (covering different subsets of conditions) is important and useful, but

### Re: [HACKERS] multivariate statistics (v19)

On 08/10/2016 06:41 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: ... But more importantly, I think we'll need to show some of the data in EXPLAIN output. With per-column statistics it's fairly straightforward to determine which

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: > 1) enriching the query tree with multivariate statistics info > > Right now all the stuff related to multivariate statistics estimation > happens in clausesel.c - matching condition to statistics, selection of >

### Re: [HACKERS] multivariate statistics (v19)

On Sat, Aug 6, 2016 at 2:38 AM, Tomas Vondra wrote: > On 08/05/2016 06:24 AM, Michael Paquier wrote: >> >> On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra >> wrote: >>> >>> Attached is v19 of the "multivariate stats" patch series -

### Re: [HACKERS] multivariate statistics (v19)

On 08/05/2016 06:24 AM, Michael Paquier wrote: On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: Attached is v19 of the "multivariate stats" patch series - essentially v18 rebased on top of current master. Aside from a few bug fixes, the main improvement is

### Re: [HACKERS] multivariate statistics (v19)

On Wed, Aug 3, 2016 at 10:58 AM, Tomas Vondra wrote: > Attached is v19 of the "multivariate stats" patch series - essentially v18 > rebased on top of current master. Aside from a few bug fixes, the main > improvement is addition of SGML docs demonstrating the