Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Tomas Vondra
On 04/10/2016 10:25 AM, Simon Riggs wrote: On 9 April 2016 at 18:37, Tatsuo Ishii > wrote: > But I still think it wouldn't move the patch any closer to committable > state, because what it really needs is review whether the catalog

Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Tomas Vondra
Hello, On 04/09/2016 07:37 PM, Tatsuo Ishii wrote: But I still think it wouldn't move the patch any closer to committable state, because what it really needs is review whether the catalog definition makes sense, whether it should be more like pg_statistic, and so on. Only then it makes sense to

Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Simon Riggs
On 9 April 2016 at 18:37, Tatsuo Ishii wrote: > > But I still think it wouldn't move the patch any closer to committable > > state, because what it really needs is review whether the catalog > > definition makes sense, whether it should be more like pg_statistic, > > and so

Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Tatsuo Ishii
> But I still think it wouldn't move the patch any closer to committable > state, because what it really needs is review whether the catalog > definition makes sense, whether it should be more like pg_statistic, > and so on. Only then it makes sense to describe the catalog structure > in the SGML

Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Tomas Vondra
Hi, On 04/09/2016 01:21 AM, Tatsuo Ishii wrote: From: Tomas Vondra ... My feedback regarding docs were: - There's no docs for pg_mv_statistic (should be added to "49. System Catalogs") - The word "multivariate statistics" or something like that should

Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Simon Riggs
On 8 April 2016 at 20:13, Tom Lane wrote: > I will make it a high priority for 9.7, though. > That is my plan also. I've already started reviewing the non-planner parts anyway, specifically patch 0002. -- Simon Riggshttp://www.2ndQuadrant.com/

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tatsuo Ishii
From: Tomas Vondra <tomas.von...@2ndquadrant.com> Subject: Re: [HACKERS] multivariate statistics v14 Date: Fri, 8 Apr 2016 20:55:24 +0200 Message-ID: <5d1d62a6-6228-188c-e079-c1be59942...@2ndquadrant.com> > On 04/08/2016 05:55 PM, Robert Haas wrote: >> On Tue, Mar 29, 20

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Fri, Apr 8, 2016 at 3:13 PM, Tom Lane wrote: > Robert Haas writes: >> On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra >> wrote: >>> Well, me to. But my feeling is the patch received entirely insufficient >>> amount of

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tom Lane
Robert Haas writes: > On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra > wrote: >> Well, me to. But my feeling is the patch received entirely insufficient >> amount of thorough code review, considering how important part of the code >> it touches.

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra wrote: > Well, me to. But my feeling is the patch received entirely insufficient > amount of thorough code review, considering how important part of the code > it touches. I agree docs are an important part of a patch, but

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tomas Vondra
On 04/08/2016 05:55 PM, Robert Haas wrote: On Tue, Mar 29, 2016 at 11:18 AM, David Steele wrote: On 3/28/16 4:42 AM, Tomas Vondra wrote: Yes, those are valid omissions. I plan to address them, and I'd also considering adding a section to 65.1 (How the Planner Uses

Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Tue, Mar 29, 2016 at 11:18 AM, David Steele wrote: > On 3/28/16 4:42 AM, Tomas Vondra wrote: >> Yes, those are valid omissions. I plan to address them, and I'd also >> considering adding a section to 65.1 (How the Planner Uses Statistics), >> explaining more thoroughly how

Re: [HACKERS] multivariate statistics v14

2016-03-29 Thread Tatsuo Ishii
>> with statistics without statistics >> case10.980.01 >> case298/01/0 > > The case2 shows that functional dependencies assume that the > conditions used in queries won't be incompatible - that's something > this type of statistics can't fix. It would

Re: [HACKERS] multivariate statistics v14

2016-03-29 Thread David Steele
Hi Tomas, On 3/28/16 4:42 AM, Tomas Vondra wrote: Yes, those are valid omissions. I plan to address them, and I'd also considering adding a section to 65.1 (How the Planner Uses Statistics), explaining more thoroughly how the planner uses multivariate stats. It looks you need post a new

Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Alvaro Herrera
Tomas Vondra wrote: > I'm not sure about the prototypes though. It was a bit weird because > prototypes in the same header file were formatted very differently. Yeah, it is very odd. What happens is that the BSD indent binary does one thing (return type is in one line and function name in

Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra
On 03/26/2016 08:09 PM, Alvaro Herrera wrote: Tomas Vondra wrote: There are a few places where I reverted the pgindent formatting, because it seemed a bit too weird - the first one are the lists of function prototypes in common.h/mvstat.h, the second one are function calls to

Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra
Hi, On 03/26/2016 10:18 AM, Tatsuo Ishii wrote: Fair point. Attached is v18 of the patch, after pgindent cleanup. Here are some feedbacks to v18 patch. 1) regarding examples in create_statistics manual Here are numbers I got. "with statistics" referrers to the case where multivariate

Re: [HACKERS] multivariate statistics v14

2016-03-26 Thread Alvaro Herrera
Tomas Vondra wrote: > There are a few places where I reverted the pgindent formatting, because it > seemed a bit too weird - the first one are the lists of function prototypes > in common.h/mvstat.h, the second one are function calls to > _greedy/_exhaustive methods. Function prototypes being

Re: [HACKERS] multivariate statistics v14

2016-03-26 Thread Tatsuo Ishii
> Fair point. Attached is v18 of the patch, after pgindent cleanup. Here are some feedbacks to v18 patch. 1) regarding examples in create_statistics manual Here are numbers I got. "with statistics" referrers to the case where multivariate statistics are used. "without statistics" referrers to

Re: [HACKERS] multivariate statistics v14

2016-03-25 Thread Tom Lane
Tomas Vondra writes: > I could do that, but isn't that a bit pointless? I thought pgindent is > run regularly on the whole codebase, not for individual patches. Sure, > it'll tweak the formatting on a few places in the patch (including the > code discussed above,

Re: [HACKERS] multivariate statistics v14

2016-03-25 Thread Tomas Vondra
On 03/24/2016 06:45 PM, Alvaro Herrera wrote: Tomas Vondra wrote: +values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data); Why the double space (that's actually in several places in several of the patches). To align the whole block like this:

Re: [HACKERS] multivariate statistics v14

2016-03-24 Thread Alvaro Herrera
Tomas Vondra wrote: > >+values[Anum_pg_mv_statistic_stamcv - 1] = PointerGetDatum(data); > > > >Why the double space (that's actually in several places in several of > >the patches). > > To align the whole block like this: > > nulls[Anum_pg_mv_statistic_stadeps -1] = true; >

Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Petr Jelinek
Hi, I'll add couple of code comments from my first cursory read through (this is huge): 0002: there is some whitespace noise between the varlistentries in alter_statistics.sgml + parentobject.classId = RelationRelationId; + parentobject.objectId =

Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Tomas Vondra
On 03/23/2016 06:20 AM, Tatsuo Ishii wrote: I am now looking into the create statistics doc to see if the example appearing in it is working. I will get back if I find any. I have the ref doc: CREATE STATISTICS There are nice examples how the multivariate statistics gives better row number

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> I am now looking into the create statistics doc to see if the example >> appearing in it is working. I will get back if I find any. I have the ref doc: CREATE STATISTICS There are nice examples how the multivariate statistics gives better row number estimation. So I gave them a try. "Create

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>>> I believe this is because reference.sgml is missing a call to >>> (per report by Alvaro Herrera). >> >> Ok, I will patch reference.sgml. > > Here are some comments on docs. > > - There's no docs for pg_mv_statistic (should be added to "49. System > Catalogs") > > - The word

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> I believe this is because reference.sgml is missing a call to >> (per report by Alvaro Herrera). > > Ok, I will patch reference.sgml. Here are some comments on docs. - There's no docs for pg_mv_statistic (should be added to "49. System Catalogs") - The word "multivariate statistics" or

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Thanks for the explanation. I will look into patch 0001 to 0005 so >> that they could get into 9.6. >> >> In the mean time after applying patch 0001 to 0005 of v16, I get this >> while compiling SGML docs. >> >> openjade:ref/create_statistics.sgml:281:26:X: reference to >> non-existent ID

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra
On 03/23/2016 02:53 AM, Tatsuo Ishii wrote: The users will be able to define statistics with the limitation that only a single one (the one covering the most columns referenced by the clauses) can be used when estimating a query. Which is not perfect, but I think it's a valuable improvement.

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
> The users will be able to define statistics with the limitation that > only a single one (the one covering the most columns referenced by the > clauses) can be used when estimating a query. Which is not perfect, > but I think it's a valuable improvement. > > It might also be possible to split

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra
Hi, On 03/22/2016 01:46 PM, Tatsuo Ishii wrote: ... Sorry, maybe I did not explain clearly. My question is, if put patches only 0002 to 0005 into 9.6, does it still give any visible benefit to users? The users will be able to define statistics with the limitation that only a single one (the

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
> On 03/22/2016 11:41 AM, Tatsuo Ishii wrote: Hum. So without 0006 or beyond, there's not much benefit for the PostgreSQL users, and you are not too confident about 0006 or beyond. Then I would think it is a little bit hard to justify in putting 000[2-5] into 9.6. I really like

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra
Hi, On 03/22/2016 11:41 AM, Tatsuo Ishii wrote: Hum. So without 0006 or beyond, there's not much benefit for the PostgreSQL users, and you are not too confident about 0006 or beyond. Then I would think it is a little bit hard to justify in putting 000[2-5] into 9.6. I really like this feature

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Hum. So without 0006 or beyond, there's not much benefit for the >> PostgreSQL users, and you are not too confident about 0006 or >> beyond. Then I would think it is a little bit hard to justify in >> putting 000[2-5] into 9.6. I really like this feature and would like >> to see in PostgreSQL

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra
Hello, On 03/22/2016 09:13 AM, Tatsuo Ishii wrote: Do you have any other missing parts in this work? I am asking because I wonder if you want to push this into 9.6 or rather 9.7. I think the first few parts of the patch series, namely: * shared infrastructure (0002) * functional

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra
Hi, On 03/22/2016 06:53 AM, Jeff Janes wrote: On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra wrote: D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few remaining whitespace issues. Hi Tomas, I'm trying out v16 against a common problem, where

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Do you have any other missing parts in this work? I am asking >> because I wonder if you want to push this into 9.6 or rather 9.7. > > I think the first few parts of the patch series, namely: > > * shared infrastructure (0002) > * functional dependencies (0003) > * MCV lists (0004) >

Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Jeff Janes
On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra wrote: > > > D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few > remaining whitespace issues. Hi Tomas, I'm trying out v16 against a common problem, where postgresql thinks it is likely top stop

Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Tomas Vondra
On 03/21/2016 04:34 AM, Alvaro Herrera wrote: Another skim on 0002: reference.sgml is missing a call to ObjectProperty[] contains a comment that the ACL is "same as relation", but is that still correct, given that now stats may be related to more than one relation? Do we even know what the

Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Tomas Vondra
Hi, On 03/21/2016 10:34 AM, Robert Haas wrote: On Sun, Mar 20, 2016 at 11:34 PM, Alvaro Herrera wrote: ObjectProperty[] contains a comment that the ACL is "same as relation", but is that still correct, given that now stats may be related to more than one relation?

Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Robert Haas
On Sun, Mar 20, 2016 at 11:34 PM, Alvaro Herrera wrote: > ObjectProperty[] contains a comment that the ACL is "same as relation", > but is that still correct, given that now stats may be related to more > than one relation? Do we even know what the rules for ACLs on >

Re: [HACKERS] multivariate statistics v14

2016-03-20 Thread Alvaro Herrera
Another skim on 0002: reference.sgml is missing a call to ObjectProperty[] contains a comment that the ACL is "same as relation", but is that still correct, given that now stats may be related to more than one relation? Do we even know what the rules for ACLs on cross-relation stats are? One

Re: [HACKERS] multivariate statistics v14

2016-03-20 Thread Tatsuo Ishii
>> Many trailing white spaces found. > > Sorry, haven't noticed that after one of the rebases. Fixed in the > attached v15 of the patch. There are still few of traling spaces. /home/t-ishii/0002-shared-infrastructure-and-functional-dependencies.patch:3792: trailing whitespace.

Re: [HACKERS] multivariate statistics v14

2016-03-18 Thread Tomas Vondra
Hi, On 03/16/2016 03:58 AM, Tatsuo Ishii wrote: I apology if it's already discussed. I am new to this patch. Attached is v15 of the patch series, fixing this and also doing quite a few additional improvements: * added some basic examples into the SGML documentation * addressing the

Re: [HACKERS] multivariate statistics v14

2016-03-16 Thread Kyotaro HORIGUCHI
Hello, I returned to this. At Sun, 13 Mar 2016 22:59:38 +0100, Tomas Vondra wrote in <1457906378.27231.10.ca...@2ndquadrant.com> > Oh, yeah. There was an extra pfree(). > > Attached is v15 of the patch series, fixing this and also doing quite a > few additional

Re: [HACKERS] multivariate statistics v14

2016-03-15 Thread Tatsuo Ishii
I apology if it's already discussed. I am new to this patch. > Attached is v15 of the patch series, fixing this and also doing quite a > few additional improvements: > > * added some basic examples into the SGML documentation > > * addressing the objectaddress omissions, as pointed out by

Re: [HACKERS] multivariate statistics v14

2016-03-15 Thread Tatsuo Ishii
> Instead of simply multiplying the ndistinct estimate with selecticity, > we instead use the formula for the expected number of distinct values > observed in 'k' rows when there are 'd' distinct values in the bin > > d * (1 - ((d - 1) / d)^k) > > This is 'with replacements' which seems

Re: [HACKERS] multivariate statistics v14

2016-03-12 Thread Jeff Janes
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra wrote: > Hi, > > On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote: >> On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra >> wrote: >> > Hi, >> > >> > thanks for the feedback. Attached is v14 of

Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Tomas Vondra
On Wed, 2016-03-09 at 18:21 +0100, Tomas Vondra wrote: > Hi, > > On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote: > > On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra > > wrote: > > > Hi, > > > > > > thanks for the feedback. Attached is v14 of the patch series,

Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Jeff Janes
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra wrote: > Hi, > > On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote: >> On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra >> wrote: >> > Hi, >> > >> > thanks for the feedback. Attached is v14 of

Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Tomas Vondra
Hi, On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote: > On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra > wrote: > > Hi, > > > > thanks for the feedback. Attached is v14 of the patch series, fixing > > most of the points you've raised. > > > Hi Tomas, > > Applied

Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Jeff Janes
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra wrote: > Hi, > > thanks for the feedback. Attached is v14 of the patch series, fixing > most of the points you've raised. Hi Tomas, Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make check if I