Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Tomas Vondra

On 04/10/2016 10:25 AM, Simon Riggs wrote:
> On 9 April 2016 at 18:37, Tatsuo Ishii wrote:
>
>>> But I still think it wouldn't move the patch any closer to committable
>>> state, because what it really needs is review whether the catalog
>>> definition makes sense, whether it should be more like pg_statistic,
>>> and so on. Only then it makes sense to describe the catalog structure
>>> in the SGML docs, I think. That's why I added some basic SGML docs for
>>> CREATE/DROP/ALTER STATISTICS, which I expect to be rather stable, and
>>> not the catalog and other low-level stuff (which is commented heavily
>>> in the code anyway).
>>
>> Without "user-level docs" (now I understand that the term means all
>> SGML docs for you), it is very hard to find a visible
>> characteristics/behavior of the patch. CREATE/DROP/ALTER STATISTICS
>> just defines a user interface, and does not help how it affects to the
>> planning. The READMEs do not help either.
>>
>> In this case reviewing your code is something like reviewing a program
>> which has no specification.
>>
>> That's the reason why I said before below, but it was never seriously
>> considered.
>
> I would likely have said this myself but didn't even get that far.
>
> Your contribution was useful and went further than anybody else's
> review, so thank you.


100% agreed. Thanks for the useful feedback.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Tomas Vondra

Hello,

On 04/09/2016 07:37 PM, Tatsuo Ishii wrote:
>> But I still think it wouldn't move the patch any closer to committable
>> state, because what it really needs is review whether the catalog
>> definition makes sense, whether it should be more like pg_statistic,
>> and so on. Only then it makes sense to describe the catalog structure
>> in the SGML docs, I think. That's why I added some basic SGML docs for
>> CREATE/DROP/ALTER STATISTICS, which I expect to be rather stable, and
>> not the catalog and other low-level stuff (which is commented heavily
>> in the code anyway).
>
> Without "user-level docs" (now I understand that the term means all
> SGML docs for you), it is very hard to find a visible
> characteristics/behavior of the patch. CREATE/DROP/ALTER STATISTICS
> just defines a user interface, and does not help how it affects to
> the planning. The READMEs do not help either.
>
> In this case reviewing your code is something like reviewing a
> program which has no specification.


I certainly agree that reviewing a patch without the context is hard. My 
intent was to provide such context / explanation in the READMEs, but 
perhaps I failed to do so with enough detail.


BTW when you say that READMEs do not help either, does that mean you 
consider READMEs unsuitable for this type of information in general, or 
that the current READMEs lack important information?




> That's the reason why I said before below, but it was never
> seriously considered.

I've considered it, but my plan was to have detailed READMEs, and then 
eventually distill that into something suitable for the SGML (perhaps 
without discussion of some implementation details). Maybe that's not the 
right approach.


FWIW providing the context is why I started working on a "paper" 
explaining both the motivation and implementation, including a bit of 
math and figures (which is what we don't have in READMEs or SGML). I 
haven't updated it recently, and it probably got buried in the thread, 
but perhaps this would be a better way to provide the context?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-04-10 Thread Simon Riggs
On 9 April 2016 at 18:37, Tatsuo Ishii  wrote:

> > But I still think it wouldn't move the patch any closer to committable
> > state, because what it really needs is review whether the catalog
> > definition makes sense, whether it should be more like pg_statistic,
> > and so on. Only then it makes sense to describe the catalog structure
> > in the SGML docs, I think. That's why I added some basic SGML docs for
> > CREATE/DROP/ALTER STATISTICS, which I expect to be rather stable, and
> > not the catalog and other low-level stuff (which is commented heavily
> > in the code anyway).
>
> Without "user-level docs" (now I understand that the term means all
> SGML docs for you), it is very hard to find a visible
> characteristics/behavior of the patch. CREATE/DROP/ALTER STATISTICS
> just defines a user interface, and does not help how it affects to the
> planning. The READMEs do not help either.
>
> In this case reviewing your code is something like reviewing a program
> which has no specification.
>
> That's the reason why I said before below, but it was never seriously
> considered.
>

I would likely have said this myself but didn't even get that far.

Your contribution was useful and went further than anybody else's review,
so thank you.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Tatsuo Ishii
> But I still think it wouldn't move the patch any closer to committable
> state, because what it really needs is review whether the catalog
> definition makes sense, whether it should be more like pg_statistic,
> and so on. Only then it makes sense to describe the catalog structure
> in the SGML docs, I think. That's why I added some basic SGML docs for
> CREATE/DROP/ALTER STATISTICS, which I expect to be rather stable, and
> not the catalog and other low-level stuff (which is commented heavily
> in the code anyway).

Without "user-level docs" (now I understand that the term means all
SGML docs for you), it is very hard to find a visible
characteristics/behavior of the patch. CREATE/DROP/ALTER STATISTICS
just defines a user interface, and does not help how it affects to the
planning. The READMEs do not help either.

In this case reviewing your code is something like reviewing a program
which has no specification.

That's the reason why I said before below, but it was never seriously
considered.

>> - There are some explanation how to deal with multivariate statistics
>>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>>   section.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Tomas Vondra

Hi,

On 04/09/2016 01:21 AM, Tatsuo Ishii wrote:
> From: Tomas Vondra
>
> ...
>
> My feedback regarding docs were:
>
>> - There's no docs for pg_mv_statistic (should be added to "49. System
>>   Catalogs")
>>
>> - The word "multivariate statistics" or something like that should
>>   appear in the index.
>>
>> - There are some explanation how to deal with multivariate statistics
>>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>>   section.
>
> The second and the third point maybe are something like "polishing
> user-level" docs, but I don't think the first one is for "user-level".
> Also I think without the first one the patch will be never
> committable. If someone add a new system catalog, the doc should be
> added to "System Catalogs" section, that's our standard, at least in
> my understanding.


I do apologize if it seemed that I don't value your review, and I do 
agree that those changes need to be done, although I still see them 
rather as a user-level docs (as opposed to READMEs/comments, which I 
think are used by developers much more often).


But I still think it wouldn't move the patch any closer to committable 
state, because what it really needs is review whether the catalog 
definition makes sense, whether it should be more like pg_statistic, and 
so on. Only then it makes sense to describe the catalog structure in the 
SGML docs, I think. That's why I added some basic SGML docs for 
CREATE/DROP/ALTER STATISTICS, which I expect to be rather stable, and 
not the catalog and other low-level stuff (which is commented heavily in 
the code anyway).


Had the patch been a Titanic, fixing the SGML docs a few days before the 
code freeze would be akin to washing the deck instead of looking for 
icebergs on April 15, 1912.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-04-09 Thread Simon Riggs
On 8 April 2016 at 20:13, Tom Lane  wrote:


> I will make it a high priority for 9.7, though.
>

That is my plan also. I've already started reviewing the non-planner parts
anyway, specifically patch 0002.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tatsuo Ishii
From: Tomas Vondra <tomas.von...@2ndquadrant.com>
Subject: Re: [HACKERS] multivariate statistics v14
Date: Fri, 8 Apr 2016 20:55:24 +0200
Message-ID: <5d1d62a6-6228-188c-e079-c1be59942...@2ndquadrant.com>

> On 04/08/2016 05:55 PM, Robert Haas wrote:
>> On Tue, Mar 29, 2016 at 11:18 AM, David Steele <da...@pgmasters.net>
>> wrote:
>>> On 3/28/16 4:42 AM, Tomas Vondra wrote:
>>>> Yes, those are valid omissions. I plan to address them, and I'm also
>>>> considering adding a section to 65.1 (How the Planner Uses
>>>> Statistics),
>>>> explaining more thoroughly how the planner uses multivariate stats.
>>>
>>> It looks like you need to post a new patch so I have marked this
>>> "waiting on author".
>>
>> Since no new version of this patch has been posted in the last 10
>> days, it seems clear that there will not be time for this to
>> reasonably become ready for committer and then get committed in the
>> few hours remaining before the deadline. That is a bummer, since I
>> was hoping we would have this feature in this release, but hopefully
>> we will get it into 9.7. I am marking it Returned with Feedback.
>>
> 
> Well, me too. But my feeling is the patch received entirely
> insufficient amount of thorough code review, considering how important
> part of the code it touches. I agree docs are an important part of a
> patch, but polishing user-level docs would hardly move the patch
> closer to being committable (especially when there's ~50kB of
> READMEs).

My feedback regarding docs were:
> - There's no docs for pg_mv_statistic (should be added to "49. System
>   Catalogs")
>
> - The word "multivariate statistics" or something like that should
>   appear in the index.
> 
> - There are some explanation how to deal with multivariate statistics
>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>   section.

The second and the third point maybe are something like "polishing
user-level" docs, but I don't think the first one is for "user-level".
Also I think without the first one the patch will be never
committable. If someone add a new system catalog, the doc should be
added to "System Catalogs" section, that's our standard, at least in
my understanding.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Fri, Apr 8, 2016 at 3:13 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra wrote:
>>> Well, me too. But my feeling is the patch received entirely insufficient
>>> amount of thorough code review, considering how important part of the code
>>> it touches. I agree docs are an important part of a patch, but polishing
>>> user-level docs would hardly move the patch closer to being committable
>>> (especially when there's ~50kB of READMEs).
>
>> I have to admit that I was really hoping Tom would follow through on
>> his statement that he would look into this one, or that Dean Rasheed
>> would get involved.
>
> I'm sorry I didn't get to it, but it's not like I have been slacking
> during this commitfest.  At some point, you just have to accept that
> not everything we could wish will get into 9.6.

I did not mean to imply otherwise.  I'm just explaining why I didn't
spend time on it - I figured I was not the most qualified person, and
of course I have not been slacking either.  :-)

> I will make it a high priority for 9.7, though.

Woohoo!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tom Lane
Robert Haas writes:
> On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra wrote:
>> Well, me too. But my feeling is the patch received entirely insufficient
>> amount of thorough code review, considering how important part of the code
>> it touches. I agree docs are an important part of a patch, but polishing
>> user-level docs would hardly move the patch closer to being committable
>> (especially when there's ~50kB of READMEs).

> I have to admit that I was really hoping Tom would follow through on
> his statement that he would look into this one, or that Dean Rasheed
> would get involved.

I'm sorry I didn't get to it, but it's not like I have been slacking
during this commitfest.  At some point, you just have to accept that
not everything we could wish will get into 9.6.

I will make it a high priority for 9.7, though.

regards, tom lane




Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Fri, Apr 8, 2016 at 2:55 PM, Tomas Vondra wrote:
> Well, me too. But my feeling is the patch received entirely insufficient
> amount of thorough code review, considering how important part of the code
> it touches. I agree docs are an important part of a patch, but polishing
> user-level docs would hardly move the patch closer to being committable
> (especially when there's ~50kB of READMEs).

I have to admit that I was really hoping Tom would follow through on
his statement that he would look into this one, or that Dean Rasheed
would get involved.  I am sure I could do a good review of this patch
given enough time, but I am also sure that it would take an amount of
time that is at least one if not two orders of magnitude more than I
put into any patch this CommitFest.  I understand statistics at some
basic level, but I am not an expert on them the way some people here
are.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Tomas Vondra

On 04/08/2016 05:55 PM, Robert Haas wrote:
> On Tue, Mar 29, 2016 at 11:18 AM, David Steele wrote:
>> On 3/28/16 4:42 AM, Tomas Vondra wrote:
>>> Yes, those are valid omissions. I plan to address them, and I'm also
>>> considering adding a section to 65.1 (How the Planner Uses Statistics),
>>> explaining more thoroughly how the planner uses multivariate stats.
>>
>> It looks like you need to post a new patch so I have marked this
>> "waiting on author".
>
> Since no new version of this patch has been posted in the last 10
> days, it seems clear that there will not be time for this to
> reasonably become ready for committer and then get committed in the
> few hours remaining before the deadline. That is a bummer, since I
> was hoping we would have this feature in this release, but hopefully
> we will get it into 9.7. I am marking it Returned with Feedback.

Well, me too. But my feeling is the patch received entirely insufficient
amount of thorough code review, considering how important part of the 
code it touches. I agree docs are an important part of a patch, but 
polishing user-level docs would hardly move the patch closer to being 
committable (especially when there's ~50kB of READMEs).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-04-08 Thread Robert Haas
On Tue, Mar 29, 2016 at 11:18 AM, David Steele wrote:
> On 3/28/16 4:42 AM, Tomas Vondra wrote:
>> Yes, those are valid omissions. I plan to address them, and I'm also
>> considering adding a section to 65.1 (How the Planner Uses Statistics),
>> explaining more thoroughly how the planner uses multivariate stats.
>
> It looks like you need to post a new patch so I have marked this
> "waiting on author".

Since no new version of this patch has been posted in the last 10
days, it seems clear that there will not be time for this to
reasonably become ready for committer and then get committed in the
few hours remaining before the deadline.  That is a bummer, since I
was hoping we would have this feature in this release, but hopefully
we will get it into 9.7.  I am marking it Returned with Feedback.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics v14

2016-03-29 Thread Tatsuo Ishii
>>          with statistics   without statistics
>> case1    0.98              0.01
>> case2    98/0              1/0
> 
> The case2 shows that functional dependencies assume that the
> conditions used in queries won't be incompatible - that's something
> this type of statistics can't fix.

It would be nice if that's mentioned in the manual to avoid user's
confusion.

>> case3    1.05              0.01
>> case4    1/0               103/0
>> case5    18.50             18.33
>> case6    23/0              123/0
> 
> The last two lines (case5 + case6) seem a bit suspicious. I believe
> those are for the histogram data, and I do get these numbers:
> 
> case5    0.93 (5517 / 5949)    42.0 (249943 / 5949)
> case6    100/0                 100/0
> 
> Perhaps you've been using the version before the bugfix, with ANALYZE
> on the wrong table?

You are right. I accidentally ran ANALYZE on t2, not t3. Now I get these
numbers:

case5    1.23 (7367 / 5968)    41.7 (249118 / 5981)
case6    117/0                 162092/0

>> 2) following comments by me are not addressed in the v18 patch.
>>
>>> - There's no docs for pg_mv_statistic (should be added to "49. System
>>>   Catalogs")
>>>
>>> - The word "multivariate statistics" or something like that should
>>>   appear in the index.
>>>
>>> - There are some explanation how to deal with multivariate statistics
>>>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>>>   section.
> 
> Yes, those are valid omissions. I plan to address them, and I'm also
> considering adding a section to 65.1 (How the Planner Uses
> Statistics), explaining more thoroughly how the planner uses
> multivariate stats.

Great.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-29 Thread David Steele

Hi Tomas,

On 3/28/16 4:42 AM, Tomas Vondra wrote:
> Yes, those are valid omissions. I plan to address them, and I'm also
> considering adding a section to 65.1 (How the Planner Uses Statistics),
> explaining more thoroughly how the planner uses multivariate stats.

It looks like you need to post a new patch, so I have marked this
"waiting on author".


Thanks,
--
-David
da...@pgmasters.net




Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Alvaro Herrera
Tomas Vondra wrote:

> I'm not sure about the prototypes though. It was a bit weird because
> prototypes in the same header file were formatted very differently.

Yeah, it is very odd.  What happens is that the BSD indent binary does
one thing (return type is in one line and function name in following
line; subsequent argument lines are aligned to opening parens), then the
pgindent perl script changes it (moves function name to same line as
return type, but does not reindent subsequent lines of arguments).

You can imitate the effect by adding an extra newline just before the
function name, reflowing the arguments to align to the (, then deleting
the extra newline.  Rather annoying.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra

On 03/26/2016 08:09 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>> There are a few places where I reverted the pgindent formatting, because it
>> seemed a bit too weird - the first one are the lists of function prototypes
>> in common.h/mvstat.h, the second one are function calls to
>> _greedy/_exhaustive methods.
>
> Function prototypes being weird is something that we've learned to
> accept.  There's no point in undoing pgindent decisions there, because
> the next run will re-apply them anyway.  Best not to fight it.
>
> What you should definitely look into fixing is the formatting of
> comments, if the result is too horrible.  You can prevent it from
> messing those by adding dashes /*- at the beginning of the comment.



Yep, formatting of some of the comments got slightly broken, but it 
wasn't difficult to fix that without the /*--- trick.


I'm not sure about the prototypes though. It was a bit weird because 
prototypes in the same header file were formatted very differently.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-28 Thread Tomas Vondra

Hi,

On 03/26/2016 10:18 AM, Tatsuo Ishii wrote:
>> Fair point. Attached is v18 of the patch, after pgindent cleanup.
>
> Here are some feedbacks to v18 patch.
>
> 1) regarding examples in create_statistics manual
>
> Here are numbers I got. "with statistics" refers to the case where
> multivariate statistics are used.  "without statistics" refers to the
> case where multivariate statistics are not used. The numbers denote
> estimated_rows/actual_rows. Thus closer to 1.0 is better. Some numbers
> are shown as a fraction to avoid 0 division. In my understanding case
> 1, 3, 4 showed that multivariate statistics are superior.
>
>          with statistics   without statistics
> case1    0.98              0.01
> case2    98/0              1/0


The case2 shows that functional dependencies assume that the conditions 
used in queries won't be incompatible - that's something this type of 
statistics can't fix.
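
To make the case2 behaviour concrete, here is a minimal sketch. It is not the estimator from the patch: the function names are made up, and it simply assumes that a functional dependency a -> b lets the clause on b be treated as implied by the clause on a.

```c
/*
 * Hypothetical helpers, not taken from the patch.
 *
 * Without multivariate statistics the per-column selectivities are
 * multiplied, i.e. the two conditions are assumed to be independent.
 */
static double
estimate_independent(double ntuples, double sel_a, double sel_b)
{
	return ntuples * sel_a * sel_b;
}

/*
 * With a functional dependency a -> b, the condition on b is assumed to be
 * implied by the condition on a, so only the selectivity of a is applied.
 * sel_b is accepted only to make the point that it is ignored.
 */
static double
estimate_with_dependency(double ntuples, double sel_a, double sel_b)
{
	(void) sel_b;
	return ntuples * sel_a;
}
```

With 10000 rows and both selectivities at 0.01, the first function gives about 1 row and the second about 100. If the query pairs a value of a with a value of b that never occur together, the real row count is 0, yet the dependency-based estimate stays at 100: the statistic only records that a determines b, not which pairs actually exist.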



> case3    1.05              0.01
> case4    1/0               103/0
> case5    18.50             18.33
> case6    23/0              123/0


The last two lines (case5 + case6) seem a bit suspicious. I believe 
those are for the histogram data, and I do get these numbers:


case5    0.93 (5517 / 5949)    42.0 (249943 / 5949)
case6    100/0                 100/0

Perhaps you've been using the version before the bugfix, with ANALYZE on 
the wrong table?




> 2) following comments by me are not addressed in the v18 patch.
>
>> - There's no docs for pg_mv_statistic (should be added to "49. System
>>   Catalogs")
>>
>> - The word "multivariate statistics" or something like that should
>>   appear in the index.
>>
>> - There are some explanation how to deal with multivariate statistics
>>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>>   section.


Yes, those are valid omissions. I plan to address them, and I'm also
considering adding a section to 65.1 (How the Planner Uses Statistics),
explaining more thoroughly how the planner uses multivariate stats.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-26 Thread Alvaro Herrera
Tomas Vondra wrote:

> There are a few places where I reverted the pgindent formatting, because it
> seemed a bit too weird - the first one are the lists of function prototypes
> in common.h/mvstat.h, the second one are function calls to
> _greedy/_exhaustive methods.

Function prototypes being weird is something that we've learned to
accept.  There's no point in undoing pgindent decisions there, because
the next run will re-apply them anyway.  Best not to fight it.

What you should definitely look into fixing is the formatting of
comments, if the result is too horrible.  You can prevent it from
messing those by adding dashes /*- at the beginning of the comment.
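
As a small illustration of the /*- convention, here is a sketch with a made-up function. The assumption is the usual pgindent behaviour: ordinary block comments get reflowed, while a comment whose opening line starts with /*- (in practice a row of dashes) is left as written, so hand-aligned layout survives.

```c
/*----------
 * The leading dashes tell pgindent to leave this comment alone, so the
 * hand-aligned table below keeps its layout:
 *
 *	estimated	actual	reported as
 *	98		100	ratio (0.98)
 *	98		0	fraction (98/0)
 *----------
 */
static int
has_finite_ratio(long actual_rows)
{
	/* A ratio can only be computed when there is at least one actual row. */
	return actual_rows != 0;
}
```

Both the function name and the table contents are hypothetical; only the comment delimiters matter here.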

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-26 Thread Tatsuo Ishii
> Fair point. Attached is v18 of the patch, after pgindent cleanup.

Here are some feedbacks to v18 patch.

1) regarding examples in create_statistics manual

Here are numbers I got. "with statistics" refers to the case where
multivariate statistics are used.  "without statistics" refers to the
case where multivariate statistics are not used. The numbers denote
estimated_rows/actual_rows. Thus closer to 1.0 is better. Some numbers
are shown as a fraction to avoid 0 division. In my understanding case
1, 3, 4 showed that multivariate statistics are superior.

         with statistics   without statistics
case1    0.98              0.01
case2    98/0              1/0
case3    1.05              0.01
case4    1/0               103/0
case5    18.50             18.33
case6    23/0              123/0
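
The convention used for these numbers can be sketched as below. This is only an illustrative helper with a made-up name, not something from the patch: it formats estimated_rows/actual_rows as a decimal and falls back to the raw fraction when the actual row count is zero.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format the estimate quality as described above: estimated_rows divided by
 * actual_rows, or the fraction "est/0" when actual_rows is zero.
 * format_estimate_ratio is a hypothetical name used for illustration.
 */
static void
format_estimate_ratio(char *buf, size_t buflen,
					  long estimated_rows, long actual_rows)
{
	if (actual_rows == 0)
		snprintf(buf, buflen, "%ld/0", estimated_rows);
	else
		snprintf(buf, buflen, "%.2f",
				 (double) estimated_rows / (double) actual_rows);
}
```

An estimate of 98 rows against 0 actual rows is then reported as 98/0, while 98 estimated against 100 actual rows would be reported as 0.98.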

2) following comments by me are not addressed in the v18 patch.

> - There's no docs for pg_mv_statistic (should be added to "49. System
>   Catalogs")
> 
> - The word "multivariate statistics" or something like that should
>   appear in the index.
> 
> - There are some explanation how to deal with multivariate statistics
>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>   section.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-25 Thread Tom Lane
Tomas Vondra  writes:
> I could do that, but isn't that a bit pointless? I thought pgindent is 
> run regularly on the whole codebase, not for individual patches. Sure, 
> it'll tweak the formatting on a few places in the patch (including the 
> code discussed above, as you pointed out), but there are many other such 
> places coming from other committed patches.

One point of running pgindent for yourself is to make sure you haven't set
up any code in a way that will look horrible after pgindent gets done with
it.

regards, tom lane




Re: [HACKERS] multivariate statistics v14

2016-03-25 Thread Tomas Vondra

On 03/24/2016 06:45 PM, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>
>>> +values[Anum_pg_mv_statistic_stamcv  - 1] = PointerGetDatum(data);
>>>
>>> Why the double space (that's actually in several places in several of
>>> the patches).
>>
>> To align the whole block like this:
>>
>> nulls[Anum_pg_mv_statistic_stadeps  -1] = true;
>> nulls[Anum_pg_mv_statistic_stamcv   -1] = true;
>> nulls[Anum_pg_mv_statistic_stahist  -1] = true;
>> nulls[Anum_pg_mv_statistic_standist -1] = true;
>>
>> But I won't fight for this too hard, if it breaks rules somehow.
>
> Yeah, it will be undone by pgindent.  I suggest you pgindent all the
> patches in the series.  With some clever patch vs. patch -R application,
> you can do it without having to resolve any conflicts when pgindent
> modifies code that a patch further up in the series modifies again.



I could do that, but isn't that a bit pointless? I thought pgindent is 
run regularly on the whole codebase, not for individual patches. Sure, 
it'll tweak the formatting on a few places in the patch (including the 
code discussed above, as you pointed out), but there are many other such 
places coming from other committed patches.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-24 Thread Alvaro Herrera
Tomas Vondra wrote:

> >+values[Anum_pg_mv_statistic_stamcv  - 1] = PointerGetDatum(data);
> >
> >Why the double space (that's actually in several places in several of
> >the patches).
> 
> To align the whole block like this:
> 
> nulls[Anum_pg_mv_statistic_stadeps  -1] = true;
> nulls[Anum_pg_mv_statistic_stamcv   -1] = true;
> nulls[Anum_pg_mv_statistic_stahist  -1] = true;
> nulls[Anum_pg_mv_statistic_standist -1] = true;
> 
> But I won't fight for this too hard, if it breaks rules somehow.

Yeah, it will be undone by pgindent.  I suggest you pgindent all the
patches in the series.  With some clever patch vs. patch -R application,
you can do it without having to resolve any conflicts when pgindent
modifies code that a patch further up in the series modifies again.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Petr Jelinek

Hi,

I'll add couple of code comments from my first cursory read through 
(this is huge):


0002:
there is some whitespace noise between the varlistentries in 
alter_statistics.sgml


+   parentobject.classId = RelationRelationId;
+   parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+   parentobject.objectSubId = 0;
+   childobject.classId = MvStatisticRelationId;
+   childobject.objectId = statoid;
+   childobject.objectSubId = 0;

I wonder if this (there is similar code in several places) would be 
simpler using ObjectAddressSet().


The common.h in backend/utils/mvstat is a slightly weird placement and 
name for a header file.



0004:
+/* used for merging bitmaps - AND (min), OR (max) */
+#define MAX(x, y) (((x) > (y)) ? (x) : (y))
+#define MIN(x, y) (((x) < (y)) ? (x) : (y))

Huh? We have Max and Min macros defined in c.h

+   values[Anum_pg_mv_statistic_stamcv  - 1] = PointerGetDatum(data);

Why the double space? (It's actually in several places in several of 
the patches.)


I don't really understand why 0008 and 0009 are separate patches and 
aren't part of one of the other patches. But otherwise, good job on 
splitting the functionality into a patch set.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-23 Thread Tomas Vondra

On 03/23/2016 06:20 AM, Tatsuo Ishii wrote:

>>> I am now looking into the create statistics doc to see if the example
>>> appearing in it is working. I will get back if I find any.
>
> I have the ref doc: CREATE STATISTICS
>
> There are nice examples how the multivariate statistics gives better
> row number estimation. So I gave them a try.
>
> "Create table t1 with two functionally dependent columns,
>  i.e. knowledge of a value in the first column is sufficient for
>  determining the value in the other column" The example creates table
>  "t1", then populates it using generate_series. After CREATE
>  STATISTICS, ANALYZE and EXPLAIN. I expected the EXPLAIN demonstrates
>  how result rows estimation is enhanced by using the multivariate
>  statistics.
>
> Here is the EXPLAIN output using the multivariate statistics:
>
> EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..19425.00 rows=98 width=8) (actual time=76.876..76.876 rows=0 loops=1)
>    Filter: ((a = 1) AND (b = 1))
>    Rows Removed by Filter: 100
>  Planning time: 0.146 ms
>  Execution time: 76.896 ms
> (5 rows)
>
> Here is the EXPLAIN output without the multivariate statistics:
>
> EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t1  (cost=0.00..19425.00 rows=1 width=8) (actual time=78.867..78.867 rows=0 loops=1)
>    Filter: ((a = 1) AND (b = 1))
>    Rows Removed by Filter: 100
>  Planning time: 0.102 ms
>  Execution time: 78.885 ms
> (5 rows)
>
> It seems the row numbers estimation (98) using the multivariate
> statistics is actually *worse* than the one (1) not using the
> statistics because the actual row number is 0.


Yes, there's a mistake in the first query, because the conditions 
actually are not compatible. I.e. (i/100)=1 and (i/500)=1 have no 
overlapping rows, clearly. It should be


EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);

instead. Will fix.
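
The incompatibility is easy to check outside the database. The following is only a sketch: it assumes t1 is populated as a = i/100 and b = i/500 with integer division (as the expressions above suggest), it uses an arbitrary range for i, and the helper name count_matches is made up for the illustration.

```python
# Sketch: count rows matching (a = A) AND (b = B) when a = i/100, b = i/500.
# Assumptions (not taken from the patch docs): integer division, i in
# 1..n_rows, and n_rows = 10_000 picked arbitrarily for the illustration.

def count_matches(a_value, b_value, n_rows=10_000):
    """Return how many i in 1..n_rows have i//100 == a_value and i//500 == b_value."""
    return sum(
        1
        for i in range(1, n_rows + 1)
        if i // 100 == a_value and i // 500 == b_value
    )

# a = 1 means i in 100..199, b = 1 means i in 500..999: no overlap.
print(count_matches(1, 1))  # 0
# a = 1 (i in 100..199) lies entirely inside b = 0 (i in 1..499).
print(count_matches(1, 0))  # 100
```

With (a = 1) AND (b = 0) the query actually matches rows, so the estimate can be compared against a non-zero actual count.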



> Next example (using table "t2") is much better than the case using t1.
>
> Here is the EXPLAIN output using the multivariate statistics:
>
> EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t2  (cost=0.00..19425.00 rows=9633 width=8) (actual time=0.012..75.350 rows=1 loops=1)
>    Filter: ((a = 1) AND (b = 1))
>    Rows Removed by Filter: 99
>  Planning time: 0.107 ms
>  Execution time: 75.680 ms
> (5 rows)
>
> Here is the EXPLAIN output without the multivariate statistics:
>
> EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
>                          QUERY PLAN
> ------------------------------------------------------------
>  Seq Scan on t2  (cost=0.00..19425.00 rows=91 width=8) (actual time=0.008..76.614 rows=1 loops=1)
>    Filter: ((a = 1) AND (b = 1))
>    Rows Removed by Filter: 99
>  Planning time: 0.067 ms
>  Execution time: 76.935 ms
> (5 rows)
>
> This time it seems the row numbers estimation (9633) using the
> multivariate statistics is much better than the one (91) not using the
> statistics because the actual row number is 1.
>
> The last example (using table "t3") seems to show no effect from multivariate statistics.


Yes. There's a typo in the example - it analyzes the wrong table (t2 
instead of t3). Once I fix that, the estimates are much better.



> In summary, the only case which shows the effect of the multivariate
> statistics is the "t2" case. So I don't see why other examples are
> shown in the manual. Am I missing something?


No, thanks for spotting those mistakes. I'll fix them and submit a new 
version of the patch - either later today or perhaps tomorrow.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> I am now looking into the create statistics doc to see if the example
>> appearing in it is working. I will get back if I find any.

I have the ref doc: CREATE STATISTICS

There are nice examples how the multivariate statistics gives better
row number estimation. So I gave them a try.

"Create table t1 with two functionally dependent columns,
 i.e. knowledge of a value in the first column is sufficient for
 determining the value in the other column" The example creates table
 "t1", then populates it using generate_series. After CREATE
 STATISTICS, ANALYZE and EXPLAIN. I expected the EXPLAIN demonstrates
 how result rows estimation is enhanced by using the multivariate
 statistics.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..19425.00 rows=98 width=8) (actual time=76.876..76.876 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 100
 Planning time: 0.146 ms
 Execution time: 76.896 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 1);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..19425.00 rows=1 width=8) (actual time=78.867..78.867 rows=0 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 100
 Planning time: 0.102 ms
 Execution time: 78.885 ms
(5 rows)

It seems the row numbers estimation (98) using the multivariate
statistics is actually *worse* than the one (1) not using the
statistics because the actual row number is 0.

Next example (using table "t2") is much better than the case using t1.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..19425.00 rows=9633 width=8) (actual time=0.012..75.350 rows=1 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 99
 Planning time: 0.107 ms
 Execution time: 75.680 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t2  (cost=0.00..19425.00 rows=91 width=8) (actual time=0.008..76.614 rows=1 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 99
 Planning time: 0.067 ms
 Execution time: 76.935 ms
(5 rows)

This time it seems the row numbers estimation (9633) using the
multivariate statistics is much better than the one (91) not using the
statistics because the actual row number is 1.

The last example (using table "t3") seems to show no effect from multivariate statistics.

Here is the EXPLAIN output using the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t3  (cost=0.00..20407.65 rows=23 width=16) (actual time=0.154..132.509 rows=6002 loops=1)
   Filter: ((a < '500'::double precision) AND (b > '500'::double precision))
   Rows Removed by Filter: 993998
 Planning time: 0.080 ms
 Execution time: 132.735 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 400) AND (b > 600);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t3  (cost=0.00..20407.65 rows=23 width=16) (actual time=110.518..110.518 rows=0 loops=1)
   Filter: ((a < '400'::double precision) AND (b > '600'::double precision))
   Rows Removed by Filter: 100
 Planning time: 0.052 ms
 Execution time: 110.531 ms
(5 rows)

Here is the EXPLAIN output without the multivariate statistics:

EXPLAIN ANALYZE SELECT * FROM t3 WHERE (a < 500) AND (b > 500);
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on t3  

Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>>> I believe this is because reference.sgml is missing a call to
>>>  (per report by Alvaro Herrera).
>> 
>> Ok, I will patch reference.sgml.
> 
> Here are some comments on docs.
> 
> - There's no docs for pg_mv_statistic (should be added to "49. System
>   Catalogs")
> 
> - The word "multivariate statistics" or something like that should
>   appear in the index.
> 
> - There are some explanation how to deal with multivariate statistics
Oops. Should read "There should be some explanations".

>   in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
>   section.
> 
> I am now looking into the create statistics doc to see if the example
> appearing in it is working. I will get back if I find any.
> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
> 
> 




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> I believe this is because reference.sgml is missing a call to
>>  (per report by Alvaro Herrera).
> 
> Ok, I will patch reference.sgml.

Here are some comments on docs.

- There's no docs for pg_mv_statistic (should be added to "49. System
  Catalogs")

- The word "multivariate statistics" or something like that should
  appear in the index.

- There are some explanation how to deal with multivariate statistics
  in "14.1 Using Explain" and "14.2 Statistics used by the Planner"
  section.

I am now looking into the create statistics doc to see if the example
appearing in it is working. I will get back if I find any.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Thanks for the explanation. I will look into patch 0001 to 0005 so
>> that they could get into 9.6.
>>
>> In the meantime, after applying patch 0001 to 0005 of v16, I get this
>> while compiling SGML docs.
>>
>> openjade:ref/create_statistics.sgml:281:26:X: reference to
>> non-existent ID "SQL-ALTERSTATISTICS"
>> openjade:ref/drop_statistics.sgml:86:26:X: reference to non-existent
>> ID "SQL-ALTERSTATISTICS"
> 
> I believe this is because reference.sgml is missing a call to
>  (per report by Alvaro Herrera).

Ok, I will patch reference.sgml.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra


On 03/23/2016 02:53 AM, Tatsuo Ishii wrote:

>> The users will be able to define statistics with the limitation that
>> only a single one (the one covering the most columns referenced by the
>> clauses) can be used when estimating a query. Which is not perfect,
>> but I think it's a valuable improvement.
>>
>> It might also be possible to split 0006 into smaller pieces, for
>> example implementing the "non-overlapping statistics" case first and
>> then extending it to more complicated cases. That might increase the
>> chance of getting at least some of that into 9.6 ...
>>
>> But considering it's not clear whether the initial chunks are likely
>> to make it into 9.6 - I kinda expect a fair amount of comments from TL
>> about the preceding parts, who mentioned he might look at the patch
>> this week. So I'm not sure splitting 0006 into smaller pieces makes
>> sense at this point.
>
> Thanks for the explanation. I will look into patch 0001 to 0005 so
> that they could get into 9.6.
>
> In the meantime, after applying patch 0001 to 0005 of v16, I get this
> while compiling SGML docs.
>
> openjade:ref/create_statistics.sgml:281:26:X: reference to non-existent ID "SQL-ALTERSTATISTICS"
> openjade:ref/drop_statistics.sgml:86:26:X: reference to non-existent ID "SQL-ALTERSTATISTICS"


I believe this is because reference.sgml is missing a call to 
 (per report by Alvaro Herrera).


thanks

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
> The users will be able to define statistics with the limitation that
> only a single one (the one covering the most columns referenced by the
> clauses) can be used when estimating a query. Which is not perfect,
> but I think it's a valuable improvement.
> 
> It might also be possible to split 0006 into smaller pieces, for
> example implementing the "non-overlapping statistics" case first and
> then extending it to more complicated cases. That might increase the
> chance of getting at least some of that into 9.6 ...
> 
> But considering it's not clear whether the initial chunks are likely
> to make it into 9.6 - I kinda expect a fair amount of comments from TL
> about the preceding parts, who mentioned he might look at the patch
> this week. So I'm not sure splitting 0006 into smaller pieces makes
> sense at this point.

Thanks for the explanation. I will look into patch 0001 to 0005 so
that they could get into 9.6.

In the meantime, after applying patch 0001 to 0005 of v16, I get this
while compiling SGML docs.

openjade:ref/create_statistics.sgml:281:26:X: reference to non-existent ID "SQL-ALTERSTATISTICS"
openjade:ref/drop_statistics.sgml:86:26:X: reference to non-existent ID "SQL-ALTERSTATISTICS"

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra

Hi,

On 03/22/2016 01:46 PM, Tatsuo Ishii wrote:
...

> Sorry, maybe I did not explain clearly. My question is, if we put
> only patches 0002 to 0005 into 9.6, does it still give any visible
> benefit to users?


The users will be able to define statistics with the limitation that 
only a single one (the one covering the most columns referenced by the 
clauses) can be used when estimating a query. Which is not perfect, but 
I think it's a valuable improvement.
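
As a rough illustration of the "single statistic covering the most referenced columns" rule, here is a minimal sketch. It is not code from the patch: the dictionary layout, the function name choose_statistic, and the rule that a statistic must cover at least two referenced columns to be useful are all assumptions made for the example.

```python
# Sketch of "use the single statistic covering the most referenced columns".
# The data layout and the function name are hypothetical; the patch works on
# planner data structures, not on Python sets.

def choose_statistic(stats, clause_columns):
    """Pick the statistic sharing the most columns with the clauses.

    stats: mapping of statistic name -> set of column names it covers.
    clause_columns: set of column names referenced by the WHERE clauses.
    Returns the chosen name, or None when no statistic covers at least two
    of the referenced columns (assumed minimum for a multivariate estimate).
    """
    best_name, best_covered = None, 1
    for name, columns in stats.items():
        covered = len(columns & clause_columns)
        if covered > best_covered:
            best_name, best_covered = name, covered
    return best_name

stats = {"s_ab": {"a", "b"}, "s_bcd": {"b", "c", "d"}}
print(choose_statistic(stats, {"a", "b", "c", "d"}))  # s_bcd
print(choose_statistic(stats, {"a", "b"}))            # s_ab
print(choose_statistic(stats, {"a", "x"}))            # None
```

In the first call the clause on "a" would still have to be estimated from the per-column statistics, which is the imperfection mentioned above.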


It might also be possible to split 0006 into smaller pieces, for example 
implementing the "non-overlapping statistics" case first and then 
extending it to more complicated cases. That might increase the chance 
of getting at least some of that into 9.6 ...


But considering it's not clear whether the initial chunks are likely to 
make it into 9.6 - I kinda expect a fair amount of comments from TL 
about the preceding parts, who mentioned he might look at the patch this 
week. So I'm not sure splitting 0006 into smaller pieces makes sense at 
this point.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
> On 03/22/2016 11:41 AM, Tatsuo Ishii wrote:
>>>> Hum. So without 0006 or beyond, there's not much benefit for the
>>>> PostgreSQL users, and you are not too confident about 0006 or
>>>> beyond. Then I would think it is a little bit hard to justify in
>>>> putting 000[2-5] into 9.6. I really like this feature and would
>>>> like to see in PostgreSQL someday, but I'm not sure if we should
>>>> put the patches (0002-0005) into PostgreSQL now. Please let me
>>>> know if there's some reasons we should put the patches into
>>>> PostgreSQL now.
>>>
>>> I don't think so. While being able to combine multiple statistics
>>> is certainly useful, I'm convinced that the initial patches add
>>> enough
>>
>> Can you please elaborate a little bit more how combining multiple
>> statistics is useful?
> 
> Sure.
> 
> The goal of multivariate statistics is to approximate a probability
> distribution on a group of columns. The larger the number of columns,
> the less accurate the statistics will be (with respect to individual
> columns), assuming fixed size of the sample in ANALYZE, and fixed
> statistics size.
> 
> For example, if you add a column to multivariate histogram, you'll do
> some "bucket splits" by this dimension, thus reducing the accuracy for
> the other columns. You may of course allow larger statistics
> (e.g. histograms with more buckets), but that also requires larger
> samples, and so on.
> 
> Now, let's  assume you have a query like this:
> 
> WHERE (a=1) AND (b=2) AND (c=3) AND (d=4)
> 
> and that "a" and "b" are correlated, and "c" and "d" are correlated,
> but that otherwise the columns are independent. It'd be a bit silly to
> require building statistics on (a,b,c,d), when two statistics on each
> of the column pairs would be cheaper and also more accurate.
> 
> That's of course a trivial case - independent groups of correlated
> columns. But I'd say this is actually a pretty common case, and I do
> believe there's not much controversy that we should support it.
> 
> Another reason to allow multiple statistics is that columns in one
> group may be a good fit for MCV list (which works well for discrete
> values), while the other group may be a good candidate for histogram
> (which works well for continuous values). This can't be solved by
> first building a MCV and then a histogram on the group.
> 
> The question of course is what to do if the groups are not
> independent. The patch does that by assuming the statistics overlap,
> and uses conditions on the columns included in both statistics to
> combine them using conditional probabilities. I do believe this works
> quite well, but this is perhaps the part that needs further
> discussion. There are other ways to combine the statistics, but I do
> expect them to be considerably more expensive.
> 
> Is this a sufficient explanation?
> 
> Of course, there's a fair amount of additional complexity that I have
> not mentioned here (e.g. selecting the right combination of stats).

Sorry, maybe I did not explain clearly. My question is, if we put
only patches 0002 to 0005 into 9.6, does it still give any visible
benefit to users?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra

Hi,

On 03/22/2016 11:41 AM, Tatsuo Ishii wrote:

>>> Hum. So without 0006 or beyond, there's not much benefit for the
>>> PostgreSQL users, and you are not too confident about 0006 or
>>> beyond. Then I would think it is a little bit hard to justify in
>>> putting 000[2-5] into 9.6. I really like this feature and would
>>> like to see in PostgreSQL someday, but I'm not sure if we should
>>> put the patches (0002-0005) into PostgreSQL now. Please let me
>>> know if there's some reasons we should put the patches into
>>> PostgreSQL now.
>>
>> I don't think so. While being able to combine multiple statistics
>> is certainly useful, I'm convinced that the initial patches add
>> enough
>
> Can you please elaborate a little bit more how combining multiple
> statistics is useful?


Sure.

The goal of multivariate statistics is to approximate a probability 
distribution on a group of columns. The larger the number of columns, 
the less accurate the statistics will be (with respect to individual 
columns), assuming fixed size of the sample in ANALYZE, and fixed 
statistics size.


For example, if you add a column to multivariate histogram, you'll do 
some "bucket splits" by this dimension, thus reducing the accuracy for 
the other columns. You may of course allow larger statistics (e.g. 
histograms with more buckets), but that also requires larger samples, 
and so on.
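
The loss of per-column resolution can be put into numbers with a small sketch. It assumes, purely for illustration, a fixed budget of 4096 buckets and a histogram that splits every dimension equally often; a real histogram build need not behave that way, and the function name is made up.

```python
# Sketch: resolution per column when a fixed bucket budget is spread over
# more dimensions. Assumes (for illustration only) that the histogram splits
# every dimension equally often.

def intervals_per_column(total_buckets, n_columns):
    """Approximate number of intervals per column: total_buckets ** (1/n)."""
    return total_buckets ** (1.0 / n_columns)

# 4096 is a hypothetical bucket budget, not a value from the patch.
for n_columns in (1, 2, 3, 4):
    per_column = intervals_per_column(4096, n_columns)
    print(f"{n_columns} column(s): ~{per_column:.0f} intervals per column")
```

Under these assumptions every added column shrinks the number of intervals available to each of the other columns, unless the bucket budget (and with it the sample) grows.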


Now, let's assume you have a query like this:

WHERE (a=1) AND (b=2) AND (c=3) AND (d=4)

and that "a" and "b" are correlated, and "c" and "d" are correlated, but 
that otherwise the columns are independent. It'd be a bit silly to 
require building statistics on (a,b,c,d), when two statistics on each of 
the column pairs would be cheaper and also more accurate.


That's of course a trivial case - independent groups of correlated 
columns. But I'd say this is actually a pretty common case, and I do 
believe there's not much controversy that we should support it.
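
A small numeric sketch of the independent-groups case follows. The data is synthetic and hypothetical (b always equals a, d always equals c, the two pairs are independent), the exact row fractions stand in for statistics, and the constants in the query differ from the WHERE clause above so that the example matches some rows.

```python
# Sketch: two independent groups of correlated columns, (a, b) and (c, d).
# The data is synthetic and hypothetical: b always equals a, d always equals
# c, and the two groups are independent of each other.

rows = [(i % 10, i % 10, (i // 10) % 10, (i // 10) % 10) for i in range(1000)]

def fraction(predicate):
    """Fraction of rows satisfying the predicate (an exact 'statistic')."""
    return sum(1 for row in rows if predicate(row)) / len(rows)

# Per-column estimate: multiply four single-column selectivities.
per_column = (
    fraction(lambda r: r[0] == 1) * fraction(lambda r: r[1] == 1)
    * fraction(lambda r: r[2] == 3) * fraction(lambda r: r[3] == 3)
)

# Two pair statistics: one on (a, b), one on (c, d), multiplied together.
two_pairs = (
    fraction(lambda r: r[0] == 1 and r[1] == 1)
    * fraction(lambda r: r[2] == 3 and r[3] == 3)
)

# Ground truth for (a = 1) AND (b = 1) AND (c = 3) AND (d = 3).
actual = fraction(lambda r: r == (1, 1, 3, 3))

print(f"per-column: {per_column:.4f}")  # 0.0001
print(f"two pairs:  {two_pairs:.4f}")   # 0.0100
print(f"actual:     {actual:.4f}")      # 0.0100
```

In this toy data set the two pair statistics recover the true selectivity without any statistic on all four columns, while the per-column product is off by a factor of 100.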


Another reason to allow multiple statistics is that columns in one group 
may be a good fit for MCV list (which works well for discrete values), 
while the other group may be a good candidate for histogram (which works 
well for continuous values). This can't be solved by first building a 
MCV and then a histogram on the group.


The question of course is what to do if the groups are not independent. 
The patch does that by assuming the statistics overlap, and uses 
conditions on the columns included in both statistics to combine them 
using conditional probabilities. I do believe this works quite well, but 
this is perhaps the part that needs further discussion. There are other 
ways to combine the statistics, but I do expect them to be considerably 
more expensive.
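
One textbook way to read "combine them using conditional probabilities" is sketched below for statistics on (a, b) and (b, c) that overlap on b. This is the standard conditional-independence formula, not a formula or code taken from the patch, and all selectivities are made-up inputs.

```python
# Sketch: combining overlapping statistics on (a, b) and (b, c) through the
# shared column b, using P(a, b, c) ~= P(a, b) * P(c | b).
# Textbook conditional-independence formula, NOT taken from the patch.

def combine_overlapping(sel_ab, sel_bc, sel_b):
    """Estimate P(a AND b AND c) from P(a AND b), P(b AND c) and P(b)."""
    if sel_b <= 0.0:
        return 0.0  # the condition on b matches nothing, so nothing matches
    sel_c_given_b = sel_bc / sel_b   # P(c | b)
    return sel_ab * sel_c_given_b    # P(a, b) * P(c | b)

# Hypothetical inputs: P(a,b) = 0.02, P(b,c) = 0.05, P(b) = 0.10.
estimate = combine_overlapping(0.02, 0.05, 0.10)
print(f"{estimate:.3f}")  # 0.010
```

The estimate is exact only when a and c are independent once b is fixed, which is one way to see why this part needs further discussion.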


Is this a sufficient explanation?

Of course, there's a fair amount of additional complexity that I have 
not mentioned here (e.g. selecting the right combination of stats).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Hum. So without 0006 or beyond, there's not much benefit for the
>> PostgreSQL users, and you are not too confident about 0006 or
>> beyond. Then I would think it is a little bit hard to justify in
>> putting 000[2-5] into 9.6. I really like this feature and would like
>> to see in PostgreSQL someday, but I'm not sure if we should put the
>> patches (0002-0005) into PostgreSQL now. Please let me know if there's
>> some reasons we should put the patches into PostgreSQL now.
> 
> I don't think so. While being able to combine multiple statistics is
> certainly useful, I'm convinced that the initial patches add enough

Can you please elaborate a little bit more how combining multiple
statistics is useful?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra

Hello,

On 03/22/2016 09:13 AM, Tatsuo Ishii wrote:

>>> Do you have any other missing parts in this work? I am asking
>>> because I wonder if you want to push this into 9.6 or rather 9.7.
>>
>> I think the first few parts of the patch series, namely:
>>
>>   * shared infrastructure (0002)
>>   * functional dependencies (0003)
>>   * MCV lists (0004)
>>   * histograms (0005)
>>
>> might make it into 9.6. I believe the code for building and storing
>> the different kinds of stats is reasonably solid. What probably needs
>> more thorough review are the changes in clauselist_selectivity(), but
>> the code in these parts is reasonably simple as it only supports using
>> a single multi-variate statistics per relation.
>>
>> The part (0006) that allows using multiple statistics (i.e. selects
>> which of the available stats to use and in what order) is probably the
>> most complex part of the whole patch, and I myself do have some
>> questions about some aspects of it. I don't think this part might get
>> into 9.6 at this point (although it'd be nice if we managed to do
>> that).
>
> Hum. So without 0006 or beyond, there's not much benefit for the
> PostgreSQL users, and you are not too confident about 0006 or
> beyond. Then I would think it is a little bit hard to justify in
> putting 000[2-5] into 9.6. I really like this feature and would like
> to see in PostgreSQL someday, but I'm not sure if we should put the
> patches (0002-0005) into PostgreSQL now. Please let me know if there's
> some reasons we should put the patches into PostgreSQL now.


I don't think so. While being able to combine multiple statistics is 
certainly useful, I'm convinced that the initial patches add enough 
value on their own, even if the 0006 patch gets committed later.


A lot of queries will be just fine with the "single multivariate 
statistics" limitation, either because they use fewer than 8 columns, 
or because only 8 columns are actually correlated. (FWIW the 8 column 
limit is mostly arbitrary, it may get increased if needed.)


I haven't really mentioned the aspects of 0006 that I think need more 
discussion, but it's mostly about the question whether combining the 
statistics by using the overlapping clauses as "conditions" is the right 
thing to do (or whether a more expensive approach is needed). None of 
that however invalidates the preceding patches.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tomas Vondra

Hi,

On 03/22/2016 06:53 AM, Jeff Janes wrote:

> On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra
>  wrote:
>>
>> D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
>> remaining whitespace issues.
>
> Hi Tomas,
>
> I'm trying out v16 against a common problem, where postgresql thinks
> it is likely to stop early during an "order by (index express) limit
> 1" but it doesn't actually stop early due to cross-column
> correlations.  But the multivariate statistics don't seem to help.  Am
> I doing this wrong, or just expecting too much?


Yes, I think you're expecting too much from the current patch.

I've been thinking about perhaps addressing cases like this in the 
future, but it requires tracking position within the table somehow (e.g. 
by means of including ctid in the table, or something like that), and 
the current patch does not implement that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-22 Thread Tatsuo Ishii
>> Do you have any other missing parts in this work? I am asking
>> because I wonder if you want to push this into 9.6 or rather 9.7.
> 
> I think the first few parts of the patch series, namely:
> 
>   * shared infrastructure (0002)
>   * functional dependencies (0003)
>   * MCV lists (0004)
>   * histograms (0005)
> 
> might make it into 9.6. I believe the code for building and storing
> the different kinds of stats is reasonably solid. What probably needs
> more thorough review are the changes in clauselist_selectivity(), but
> the code in these parts is reasonably simple as it only supports using
> a single multi-variate statistics per relation.
> 
> The part (0006) that allows using multiple statistics (i.e. selects
> which of the available stats to use and in what order) is probably the
> most complex part of the whole patch, and I myself do have some
> questions about some aspects of it. I don't think this part might get
> into 9.6 at this point (although it'd be nice if we managed to do
> that).

Hum. So without 0006 or beyond, there's not much benefit for the
PostgreSQL users, and you are not too confident about 0006 or
beyond. Then I would think it is a little bit hard to justify in
putting 000[2-5] into 9.6. I really like this feature and would like
to see in PostgreSQL someday, but I'm not sure if we should put the
patches (0002-0005) into PostgreSQL now. Please let me know if there's
some reasons we should put the patches into PostgreSQL now.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Jeff Janes
On Sun, Mar 20, 2016 at 4:34 PM, Tomas Vondra
 wrote:
>
>
> D'oh. Thanks for reporting. Attached is v16, hopefully fixing the few
> remaining whitespace issues.

Hi Tomas,

I'm trying out v16 against a common problem, where postgresql thinks
it is likely to stop early during an "order by (index express) limit
1" but it doesn't actually stop early due to cross-column
correlations.  But the multivariate statistics don't seem to help.  Am
I doing this wrong, or just expecting too much?


jjanes=# create table foo as select x, floor(x/(1000/500))::int as y  from generate_series(1,1000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# vacuum analyze ;


jjanes=# explain (analyze, timing off)  select x from foo where y between 478 and 480 order by x limit 1;
                         QUERY PLAN
------------------------------------------------------------
 Limit  (cost=0.43..4.92 rows=1 width=4) (actual rows=1 loops=1)
   ->  Index Only Scan using foo_x_y_idx on foo  (cost=0.43..210156.55 rows=46812 width=4) (actual rows=1 loops=1)
         Index Cond: ((y >= 478) AND (y <= 480))
         Heap Fetches: 0
 Planning time: 0.311 ms
 Execution time: 478.917 ms

Here it walks up the index on x, until it meets the first row meeting
the qualification on y. It thinks it will get to stop early and be
very fast, but it doesn't.
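
The optimistic Limit cost in the plan above can be reproduced with a short sketch. It assumes the Limit node charges a share of the scan's run cost proportional to the fraction of estimated rows it needs, that is, that matching rows are spread evenly through the scan order; this is a simplification, and the function name is made up.

```python
# Sketch: costing a LIMIT on top of a scan when matching rows are assumed to
# be spread evenly through the scan order. The numbers come from the plan
# above; the linear interpolation is a simplified model of the Limit costing.

def limit_total_cost(startup_cost, total_cost, estimated_rows, limit_rows):
    """Charge the fraction of the run cost needed to fetch limit_rows rows."""
    fraction = min(1.0, limit_rows / estimated_rows)
    return startup_cost + (total_cost - startup_cost) * fraction

# Index Only Scan: cost=0.43..210156.55, rows=46812; LIMIT 1 on top of it.
cost = limit_total_cost(0.43, 210156.55, 46812, 1)
print(f"{cost:.2f}")  # 4.92
```

The result matches the 0.43..4.92 shown for the Limit node. The estimate goes wrong because the rows satisfying the condition on y are not spread evenly along x, so the scan has to walk a long stretch of the index before it finds the first match.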

If I add a dummy addition to the ORDER BY, to force it not to walk
the index, I get a plan which uses the other index and is actually
much faster, but is planned to be several hundred times slower:


jjanes=# explain (analyze, timing off)  select x from foo where y between 478 and 480 order by x+0 limit 1;
                         QUERY PLAN
------------------------------------------------------------
 Limit  (cost=1803.77..1803.77 rows=1 width=8) (actual rows=1 loops=1)
   ->  Sort  (cost=1803.77..1920.80 rows=46812 width=8) (actual rows=1 loops=1)
         Sort Key: ((x + 0))
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Index Only Scan using foo_y_x_idx on foo  (cost=0.43..1569.70 rows=46812 width=8) (actual rows=6 loops=1)
               Index Cond: ((y >= 478) AND (y <= 480))
               Heap Fetches: 0
 Planning time: 0.175 ms
 Execution time: 20.264 ms

(I use the "timing off" option, because without it the second plan
spends most of its time calling "gettimeofday")

Cheers,

Jeff




Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Tomas Vondra

On 03/21/2016 04:34 AM, Alvaro Herrera wrote:

Another skim on 0002:

reference.sgml is missing a call to 

ObjectProperty[] contains a comment that the ACL is "same as relation",
but is that still correct, given that now stats may be related to more
than one relation?  Do we even know what the rules for ACLs on
cross-relation stats are?  One very simple way to get around this is to
dictate that all the rels must have the same owner.  Perhaps we're not
considering the multi-relation case yet?


As I wrote in response to Robert's message, I don't think we need ACLs 
for statistics - the user should be able to use them when they can 
access all the underlying relations (in a query). For ALTER STATISTICS 
the (owner || superuser) check should be enough, right?




We have this FIXME comment in do_analyze_rel:

+* FIXME This sample sizing is mostly OK when computing stats for
+*   individual columns, but when computing multi-variate stats
+*   for multivariate stats (histograms, mcv, ...) it's rather
+*   insufficient. For stats on multiple columns / complex stats
+*   we need larger sample sizes, because we need to build more
+*   detailed stats (more MCV items / histogram buckets) to get
+*   good accuracy. Maybe it'd be appropriate to use samples
+*   proportional to the table (say, 0.5% - 1%) instead of a
+*   fixed size might be more appropriate. Also, this should be
+*   bound to the requested statistics size - e.g. number of MCV
+*   items or histogram buckets should require several sample
+*   rows per item/bucket (so the sample should be k*size).

Maybe this merits more discussion.  Right now we have an upper bound on
how much to scan for analyze; if we introduce the idea of scanning a
percentage of the relation, the time to analyze very large relations
could increase significantly.  Do we have an idea of what to do for
this?  For instance, a rule that would make me comfortable would say to
scan a sample 3x the current size when you have a mvstats on 3 columns;
then the size of fraction to scan is still bounded.  But does that
actually work?  From the wording of this comment, I assume you don't
actually know.


Yeah. I think more discussion is needed, because I myself am not sure 
the FIXME is actually correct. For now I think we're OK with using the 
same logic as statistics on a single column (300 * target).
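
To make the two sizing rules concrete, here is a small sketch. The first function is the "300 * target" rule mentioned above; the second is only the bounded rule floated in the review (scale the sample by the number of columns in the statistics), which nobody in the thread has validated. Function names are hypothetical; the default target of 100 is PostgreSQL's default_statistics_target.

```python
def sample_rows(stats_target=100, multiplier=300):
    """Sample size used for per-column statistics: 300 * target."""
    return multiplier * stats_target


def scaled_sample_rows(stats_target, ncolumns):
    """Hypothetical rule from the review: N times the current sample
    for statistics on N columns, so the sample stays bounded."""
    return ncolumns * sample_rows(stats_target)


print(sample_rows())               # 30000 rows with the default target
print(scaled_sample_rows(100, 3))  # 90000 rows for 3-column statistics
```

Both rules are independent of the table size, unlike the percentage-based sampling suggested in the FIXME comment.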




In this block (CreateStatistics)
+   /* look for duplicities */
+   for (i = 0; i < numcols; i++)
+       for (j = 0; j < numcols; j++)
+           if ((i != j) && (attnums[i] == attnums[j]))
+               ereport(ERROR,
+                       (errcode(ERRCODE_UNDEFINED_COLUMN),
+                        errmsg("duplicate column name in statistics definition")));

isn't it easier to have the inner loop go from i+1 to numcols?


It probably is.
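
A minimal sketch of the suggested loop shape, written in Python rather than the patch's C. Starting the inner loop at i + 1 visits each unordered pair once, which makes the (i != j) test unnecessary. The function name is invented for the example.

```python
def find_duplicate(attnums):
    """Return the first pair of positions holding the same attribute
    number, or None if all entries are distinct."""
    n = len(attnums)
    for i in range(n):
        # Only look at later positions: each pair is compared once and
        # an element is never compared with itself.
        for j in range(i + 1, n):
            if attnums[i] == attnums[j]:
                return (i, j)
    return None


print(find_duplicate([1, 2, 3]))  # None
print(find_duplicate([1, 2, 1]))  # (0, 2)
```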



I wonder if this is sensible with multi-relation statistics:
+   /*
+* Store a dependency too, so that statistics are dropped on DROP TABLE
+*/
+   parentobject.classId = RelationRelationId;
+   parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+   parentobject.objectSubId = 0;
+   childobject.classId = MvStatisticRelationId;
+   childobject.objectId = statoid;
+   childobject.objectSubId = 0;

I suppose the idea is to drop the stats if any of the rels they are for
is dropped.


What do you mean by sensible? I mean, we don't support multiple tables 
at this point (except for choosing a syntax that should allow that), but 
the code assumes a single relation in a few places (like this one).




Right after that you create a dependency on the schema.  Is that
necessary?  Since you have the dependency on the relation, the stats
would be dropped by recursion.


Hmm, that's probably right. Also, now that I think about it, it 
probably gets broken after ALTER STATISTICS ... SET SCHEMA, because the 
code does not remove the old dependency (and does not create a new one).




Why are you #include'ing builtins.h everywhere?


Stupidity.



RelationGetMVStatList() needs a comment.


OK.



Please get rid of common.h.  It's totally unlike the way we structure
our header files.  We don't keep headers in src/backend; they're all in
src/include.  One reason is that the latter gets installed as a whole in
include/server, which this file will not be.  This file may be necessary
to build some extensions in the future, for example.


OK, I'll rework that and move it to src/include/.



In mvstats.h, please mark function prototypes as "extern".

Many files need a pgindent pass.


OK.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Tomas Vondra

Hi,

On 03/21/2016 10:34 AM, Robert Haas wrote:

On Sun, Mar 20, 2016 at 11:34 PM, Alvaro Herrera
 wrote:

ObjectProperty[] contains a comment that the ACL is "same as relation",
but is that still correct, given that now stats may be related to more
than one relation?  Do we even know what the rules for ACLs on
cross-relation stats are?  One very simple way to get around this is to
dictate that all the rels must have the same owner.


That's not really all that simple - you'd have to forbid changing
the owner of a relation involved in multi-rel statistics, but that's
horrible. Presumably at the very least you'd then have to find some
way of allowing the owner of everything in the group to be changed
at the same time, but that's a whole new innovation. I think this is
a very messy line of attack.


I agree. I don't think we should / need to impose such additional 
restrictions (e.g. same owner for all tables).


I think for using the statistics (to compute estimates for a query), it 
should be enough that the user can access all the tables it's built on. 
Which happens somehow implicitly, and currently it's trivial as each 
statistics is built on a single table.


I don't have a clear idea what we should do in the future with multiple 
tables (e.g. when the statistics is built on 3 tables, the query is on 2 
of them and the user does not have access to the remaining one).
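
The rule proposed above (statistics are usable when the user can access every table they are built on) can be sketched as a simple subset test. This is only an illustration with hypothetical names; it does not correspond to any code in the patch, and it deliberately takes the strict reading for the open multi-table case.

```python
def can_use_statistics(stat_tables, readable_tables):
    """True if every table the statistics is built on is readable."""
    return set(stat_tables) <= set(readable_tables)


# Single-table statistics: usable whenever the table itself is readable.
print(can_use_statistics(["t1"], ["t1", "t2"]))              # True

# The open case from above: statistics on 3 tables, user lacks one.
print(can_use_statistics(["t1", "t2", "t3"], ["t1", "t2"]))  # False
```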


But maybe we need to support ACLs because of ALTER STATISTICS?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-21 Thread Robert Haas
On Sun, Mar 20, 2016 at 11:34 PM, Alvaro Herrera
 wrote:
> ObjectProperty[] contains a comment that the ACL is "same as relation",
> but is that still correct, given that now stats may be related to more
> than one relation?  Do we even know what the rules for ACLs on
> cross-relation stats are?  One very simple way to get around this is to
> dictate that all the rels must have the same owner.

That's not really all that simple - you'd have to forbid changing the
owner of a relation involved in multi-rel statistics, but that's
horrible.  Presumably at the very least you'd then have to find some
way of allowing the owner of everything in the group to be changed at
the same time, but that's a whole new innovation.  I think this is a
very messy line of attack.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] multivariate statistics v14

2016-03-20 Thread Alvaro Herrera
Another skim on 0002:

reference.sgml is missing a call to 

ObjectProperty[] contains a comment that the ACL is "same as relation",
but is that still correct, given that now stats may be related to more
than one relation?  Do we even know what the rules for ACLs on
cross-relation stats are?  One very simple way to get around this is to
dictate that all the rels must have the same owner.  Perhaps we're not
considering the multi-relation case yet?

We have this FIXME comment in do_analyze_rel:

+* FIXME This sample sizing is mostly OK when computing stats for
+*   individual columns, but when computing multi-variate stats
+*   for multivariate stats (histograms, mcv, ...) it's rather
+*   insufficient. For stats on multiple columns / complex stats
+*   we need larger sample sizes, because we need to build more
+*   detailed stats (more MCV items / histogram buckets) to get
+*   good accuracy. Maybe it'd be appropriate to use samples
+*   proportional to the table (say, 0.5% - 1%) instead of a
+*   fixed size might be more appropriate. Also, this should be
+*   bound to the requested statistics size - e.g. number of MCV
+*   items or histogram buckets should require several sample
+*   rows per item/bucket (so the sample should be k*size).

Maybe this merits more discussion.  Right now we have an upper bound on
how much to scan for analyze; if we introduce the idea of scanning a
percentage of the relation, the time to analyze very large relations
could increase significantly.  Do we have an idea of what to do for
this?  For instance, a rule that would make me comfortable would say to
scan a sample 3x the current size when you have a mvstats on 3 columns;
then the size of fraction to scan is still bounded.  But does that
actually work?  From the wording of this comment, I assume you don't
actually know.

In this block (CreateStatistics)
+   /* look for duplicities */
+   for (i = 0; i < numcols; i++)
+       for (j = 0; j < numcols; j++)
+           if ((i != j) && (attnums[i] == attnums[j]))
+               ereport(ERROR,
+                       (errcode(ERRCODE_UNDEFINED_COLUMN),
+                        errmsg("duplicate column name in statistics definition")));

isn't it easier to have the inner loop go from i+1 to numcols?


I wonder if this is sensible with multi-relation statistics:
+   /*
+* Store a dependency too, so that statistics are dropped on DROP TABLE
+*/
+   parentobject.classId = RelationRelationId;
+   parentobject.objectId = ObjectIdGetDatum(RelationGetRelid(rel));
+   parentobject.objectSubId = 0;
+   childobject.classId = MvStatisticRelationId;
+   childobject.objectId = statoid;
+   childobject.objectSubId = 0;

I suppose the idea is to drop the stats if any of the rels they are for
is dropped.

Right after that you create a dependency on the schema.  Is that
necessary?  Since you have the dependency on the relation, the stats
would be dropped by recursion.

Why are you #include'ing builtins.h everywhere?

RelationGetMVStatList() needs a comment.

Please get rid of common.h.  It's totally unlike the way we structure
our header files.  We don't keep headers in src/backend; they're all in
src/include.  One reason is that the latter gets installed as a whole in
include/server, which this file will not be.  This file may be necessary
to build some extensions in the future, for example.

In mvstats.h, please mark function prototypes as "extern".

Many files need a pgindent pass.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-20 Thread Tatsuo Ishii
>> Many trailing white spaces found.
> 
> Sorry, haven't noticed that after one of the rebases. Fixed in the
> attached v15 of the patch.

There are still a few trailing spaces.

/home/t-ishii/0002-shared-infrastructure-and-functional-dependencies.patch:3792:
 trailing whitespace.
/home/t-ishii/0004-multivariate-MCV-lists.patch:471: trailing whitespace.
/home/t-ishii/0004-multivariate-MCV-lists.patch:656: space before tab in indent.
{
/home/t-ishii/0004-multivariate-MCV-lists.patch:682: space before tab in indent.
}
/home/t-ishii/0004-multivariate-MCV-lists.patch:685: space before tab in indent.
{
/home/t-ishii/0004-multivariate-MCV-lists.patch:715: trailing whitespace.
/home/t-ishii/0006-multi-statistics-estimation.patch:2513: trailing whitespace.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-18 Thread Tomas Vondra

Hi,

On 03/16/2016 03:58 AM, Tatsuo Ishii wrote:

I apologize if this has already been discussed. I am new to this patch.


Attached is v15 of the patch series, fixing this and also doing quite a
few additional improvements:

* added some basic examples into the SGML documentation

* addressing the objectaddress omissions, as pointed out by Alvaro

* support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA

* significant refactoring of MCV and histogram code, particularly
  serialization, deserialization and building

* reworking the functional dependencies to support more complex
  dependencies, with multiple columns as 'conditions'

* the reduction using functional dependencies is also significantly
  simplified (I decided to get rid of computing the transitive closure
  for now - it got too complex after the multi-condition dependencies,
  so I'll leave that for the future)


Do you have any other missing parts in this work? I am asking
because I wonder if you want to push this into 9.6 or rather 9.7.


I think the first few parts of the patch series, namely:

  * shared infrastructure (0002)
  * functional dependencies (0003)
  * MCV lists (0004)
  * histograms (0005)

might make it into 9.6. I believe the code for building and storing the 
different kinds of stats is reasonably solid. What probably needs more 
thorough review are the changes in clauselist_selectivity(), but the 
code in these parts is reasonably simple as it only supports using a 
single multi-variate statistics per relation.


The part (0006) that allows using multiple statistics (i.e. selects 
which of the available stats to use and in what order) is probably the 
most complex part of the whole patch, and I myself do have some 
questions about some aspects of it. I don't think this part will get 
into 9.6 at this point (although it'd be nice if we managed to do that).


I can also imagine moving the ndistinct pieces forward, in front of 0006 
if that helps getting it into 9.6. There's a bit more work on making it 
more flexible, though, to allow handling subsets of columns (currently we 
need a perfect match).



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] multivariate statistics v14

2016-03-16 Thread Kyotaro HORIGUCHI
Hello, I returned to this.

At Sun, 13 Mar 2016 22:59:38 +0100, Tomas Vondra  
wrote in <1457906378.27231.10.ca...@2ndquadrant.com>
> Oh, yeah. There was an extra pfree().
> 
> Attached is v15 of the patch series, fixing this and also doing quite a
> few additional improvements:
> 
> * added some basic examples into the SGML documentation
> 
> * addressing the objectaddress omissions, as pointed out by Alvaro
> 
> * support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
> 
> * significant refactoring of MCV and histogram code, particularly 
>   serialization, deserialization and building
> 
> * reworking the functional dependencies to support more complex 
>   dependencies, with multiple columns as 'conditions'
> 
> * the reduction using functional dependencies is also significantly 
>   simplified (I decided to get rid of computing the transitive closure 
>   for now - it got too complex after the multi-condition dependencies, 
>   so I'll leave that for the future)

Many trailing white spaces found.

0002

+ * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group

 2014 should be 2016? 


 This patch defines a "magic" for many structs, but in PostgreSQL
 magic numbers seem to be used to identify files or buffer pages.
 They wouldn't be needed unless you intend to dig out or identify
 orphaned memory blocks of mvstats.

+   MVDependency    deps[1];    /* XXX why not a pointer? */

MVDependency seems to be a pointer type. 

+   if (numcols >= MVSTATS_MAX_DIMENSIONS)
+   ereport(ERROR,
and
+   Assert((attrs->dim1 >= 2) && (attrs->dim1 <= MVSTATS_MAX_DIMENSIONS));

seem to contradict each other.
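
To spell out the mismatch: the first check raises an error when numcols equals MVSTATS_MAX_DIMENSIONS, while the assertion treats that same value as valid. A small sketch of the two conditions follows; the constant's value here is an assumption picked for illustration, and the function names are invented.

```python
MVSTATS_MAX_DIMENSIONS = 8  # assumed value, for illustration only


def create_rejects(numcols):
    """Mirrors `if (numcols >= MVSTATS_MAX_DIMENSIONS) ereport(ERROR, ...)`."""
    return numcols >= MVSTATS_MAX_DIMENSIONS


def assert_allows(dim1):
    """Mirrors `Assert((dim1 >= 2) && (dim1 <= MVSTATS_MAX_DIMENSIONS))`."""
    return 2 <= dim1 <= MVSTATS_MAX_DIMENSIONS


n = MVSTATS_MAX_DIMENSIONS
print(create_rejects(n), assert_allows(n))  # True True: rejected, yet "valid"
```

Either the check should use ">" or the assertion should use "<", depending on whether the maximum itself is meant to be allowed.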

.. Sorry, time is up..

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] multivariate statistics v14

2016-03-15 Thread Tatsuo Ishii
I apologize if this has already been discussed. I am new to this patch.

> Attached is v15 of the patch series, fixing this and also doing quite a
> few additional improvements:
> 
> * added some basic examples into the SGML documentation
> 
> * addressing the objectaddress omissions, as pointed out by Alvaro
> 
> * support for ALTER STATISTICS ... OWNER TO / RENAME / SET SCHEMA
> 
> * significant refactoring of MCV and histogram code, particularly 
>   serialization, deserialization and building
> 
> * reworking the functional dependencies to support more complex 
>   dependencies, with multiple columns as 'conditions'
> 
> * the reduction using functional dependencies is also significantly 
>   simplified (I decided to get rid of computing the transitive closure 
>   for now - it got too complex after the multi-condition dependencies, 
>   so I'll leave that for the future)

Do you have any other missing parts in this work? I am asking because
I wonder if you want to push this into 9.6 or rather 9.7.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-15 Thread Tatsuo Ishii
> Instead of simply multiplying the ndistinct estimate with selectivity,
> we use the formula for the expected number of distinct values
> observed in 'k' rows when there are 'd' distinct values in the bin
> 
> d * (1 - ((d - 1) / d)^k)
> 
> This is 'with replacements' which seems appropriate for the use, and it
> mostly assumes uniform distribution of the distinct values. So if the
> distribution is not uniform (e.g. there are very frequent groups) this
> may be less accurate than the current algorithm in some cases, giving
> over-estimates. But that's probably better than OOM.
> ---
>  src/backend/utils/adt/selfuncs.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index f8d39aa..6eceedf 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
>   /*
>    * Multiply by restriction selectivity.
>    */
> - reldistinct *= rel->rows / rel->tuples;
> + reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct,rel->rows));

Why do you change "*=" style? I see no reason to change this.

reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);

Looks better to me because it's shorter and cleaner.
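
For reference, the quoted formula d * (1 - ((d - 1) / d)^k) can be tried out with a few lines of Python. In the patch, d corresponds to reldistinct and k to rel->rows; the function name below is invented for this sketch.

```python
def expected_distinct(d, k):
    """Expected number of distinct values seen in k draws (with
    replacement) from d equally likely values."""
    if d <= 0:
        return 0.0
    return d * (1.0 - ((d - 1.0) / d) ** k)


for k in (1, 10, 100, 1000):
    print(k, expected_distinct(100, k))
```

The result grows with k but never exceeds d, which is the property that keeps the group estimate bounded. As the commit message notes, it assumes the distinct values are uniformly distributed.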

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] multivariate statistics v14

2016-03-12 Thread Jeff Janes
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra
 wrote:
> Hi,
>
> On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
>> On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
>>  wrote:
>> > Hi,
>> >
>> > thanks for the feedback. Attached is v14 of the patch series, fixing
>> > most of the points you've raised.
>>
>>
>> Hi Tomas,
>>
>> Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
>> check if I configure without --enable-cassert.
>
> Ah, after disabling asserts I can reproduce it too. And the reason why
> it fails is quite simple - clauselist_selectivity modifies the original
> list of clauses, which then confuses cost_qual_eval.
>
> Can you try if the attached patch fixes the issue? I'll need to rework a
> bit more of the code, but let's see if this fixes the issue on your
> machine too.

That patch on top of v14 did fix the original problem.  But I got
another segfault:

jjanes=# create table foo as select x, floor(x/(1000/500))::int as
y  from generate_series(1,1000) f(x);
jjanes=# create index on foo (x,y);
jjanes=# create index on foo (y,x);
jjanes=# create statistics jjj on foo (x,y) with (dependencies,histogram);
jjanes=# analyze ;
server closed the connection unexpectedly

#0  multi_sort_add_dimension (mss=mss@entry=0x7f45dafc7c88,
sortdim=sortdim@entry=0, dim=dim@entry=0,
vacattrstats=vacattrstats@entry=0x16f0dd0) at common.c:436
#1  0x007d022a in update_bucket_ndistinct (attrs=0x166fdf8,
stats=0x16f0dd0, bucket=) at histogram.c:1384
#2  0x007d09aa in create_initial_mv_bucket (stats=0x16f0dd0,
attrs=0x166fdf8, rows=0x17cda20, numrows=3) at histogram.c:880
#3  build_mv_histogram (numrows=3, rows=rows@entry=0x170ecf0,
attrs=attrs@entry=0x166fdf8, stats=stats@entry=0x16f0dd0,
numrows_total=numrows_total@entry=3)
at histogram.c:156
#4  0x007ced19 in build_mv_stats
(onerel=onerel@entry=0x7f45e797d040, totalrows=985,
numrows=numrows@entry=3, rows=rows@entry=0x170ecf0,
natts=natts@entry=2,
vacattrstats=vacattrstats@entry=0x166efa0) at common.c:106
#5  0x0055ff6b in do_analyze_rel
(onerel=onerel@entry=0x7f45e797d040, options=options@entry=2,
va_cols=va_cols@entry=0x0, acquirefunc=,
relpages=44248,
inh=inh@entry=0 '\000', in_outer_xact=in_outer_xact@entry=0
'\000', elevel=elevel@entry=13, params=0x7ffcbe382a30) at
analyze.c:585
#6  0x00560ced in analyze_rel (relid=relid@entry=16441,
relation=relation@entry=0x16bc9d0, options=options@entry=2,
params=params@entry=0x7ffcbe382a30,
va_cols=va_cols@entry=0x0, in_outer_xact=,
bstrategy=0x16640f0) at analyze.c:262
#7  0x005b70fd in vacuum (options=2, relation=0x16bc9d0,
relid=relid@entry=0, params=params@entry=0x7ffcbe382a30, va_cols=0x0,
bstrategy=,
bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:313
#8  0x005b748e in ExecVacuum (vacstmt=vacstmt@entry=0x16bca20,
isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:121
#9  0x006c90f3 in standard_ProcessUtility
(parsetree=0x16bca20, queryString=0x16bbfc0 "analyze foo ;",
context=, params=0x0, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at utility.c:654
#10 0x7f45e413b1d1 in pgss_ProcessUtility (parsetree=0x16bca20,
queryString=0x16bbfc0 "analyze foo ;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at pg_stat_statements.c:986
#11 0x006c6841 in PortalRunUtility (portal=0x16f7700,
utilityStmt=0x16bca20, isTopLevel=, dest=0x16bcd60,
completionTag=0x7ffcbe382fa0 "") at pquery.c:1175
#12 0x006c73c5 in PortalRunMulti
(portal=portal@entry=0x16f7700, isTopLevel=isTopLevel@entry=1 '\001',
dest=dest@entry=0x16bcd60, altdest=altdest@entry=0x16bcd60,
completionTag=completionTag@entry=0x7ffcbe382fa0 "") at pquery.c:1306
#13 0x006c7dd9 in PortalRun (portal=portal@entry=0x16f7700,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001', dest=dest@entry=0x16bcd60,
altdest=altdest@entry=0x16bcd60,
completionTag=completionTag@entry=0x7ffcbe382fa0 "") at pquery.c:813
#14 0x006c5c98 in exec_simple_query (query_string=0x16bbfc0
"analyze foo ;") at postgres.c:1094
#15 PostgresMain (argc=, argv=argv@entry=0x164baf8,
dbname=0x164b9a8 "jjanes", username=) at
postgres.c:4021
#16 0x0047cb1e in BackendRun (port=0x1669d40) at postmaster.c:4258
#17 BackendStartup (port=0x1669d40) at postmaster.c:3932
#18 ServerLoop () at postmaster.c:1690
#19 0x0066ff27 in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x164aa10) at postmaster.c:1298
#20 0x0047d35e in main (argc=1, argv=0x164aa10) at main.c:228

Cheers,

Jeff




Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Tomas Vondra
On Wed, 2016-03-09 at 18:21 +0100, Tomas Vondra wrote:
> Hi,
> 
> On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
> > On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
> >  wrote:
> > > Hi,
> > >
> > > thanks for the feedback. Attached is v14 of the patch series, fixing
> > > most of the points you've raised.
> > 
> > 
> > Hi Tomas,
> > 
> > Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
> > check if I configure without --enable-cassert.
> 
> Ah, after disabling asserts I can reproduce it too. And the reason why
> it fails is quite simple - clauselist_selectivity modifies the original
> list of clauses, which then confuses cost_qual_eval.

More precisely, it gets confused because the first clause in the list
gets deleted but cost_qual_eval never learns about that, and follows
a stale pointer to the next cell, hence the segfault.
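
The pattern can be shown in miniature. In the sketch below (Python, with made-up function names and a made-up "mv:" prefix marking the clauses the estimator consumes), the in-place variant removes entries from the list object the caller still holds, while the copying variant leaves the caller's list alone. Python will not crash here; in C the caller is left holding a pointer to a freed list cell.

```python
def estimate_in_place(clauses):
    """Buggy variant: consumes the leading 'mv:' clauses by deleting
    them from the very list object the caller passed in."""
    handled = []
    while clauses and clauses[0].startswith("mv:"):
        handled.append(clauses.pop(0))
    return handled


def estimate_on_copy(clauses):
    """Same work on a private shallow copy, similar in spirit to
    calling list_copy() before modifying the list."""
    return estimate_in_place(list(clauses))


original = ["mv:a = 1", "mv:b = 2", "c = 3"]
estimate_on_copy(original)
print(original)   # still all three clauses
estimate_in_place(original)
print(original)   # the caller's list lost its first two entries
```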

> 
> Can you try if the attached patch fixes the issue? I'll need to rework a
> bit more of the code, but let's see if this fixes the issue on your
> machine too.
> 
> > With --enable-cassert, it passes the regression test.
> 
> I wonder how it can work with casserts and fail without them. That's
> kinda exactly the opposite of what I'd expect ...

FWIW it seems to be somehow related to this assert in clausesel.c:

   Assert(count_mv_attnums(list_union(stat_clauses, stat_conditions),   
  relid, MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);

With the assert in place, the code passes without a failure. After
removing the assert (commenting it out), or even just changing it to

Assert(count_mv_attnums(stat_clauses, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST)
 + count_mv_attnums(stat_conditions, relid,
MV_CLAUSE_TYPE_MCV | MV_CLAUSE_TYPE_HIST) >= 2);

i.e. removing the list_union, it fails as expected.

The only thing that I can think of is that list_union happens to place
the right stuff at the right position in memory - pure luck.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Jeff Janes
On Wed, Mar 9, 2016 at 9:21 AM, Tomas Vondra
 wrote:
> Hi,
>
> On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
>> On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
>>  wrote:
>> > Hi,
>> >
>> > thanks for the feedback. Attached is v14 of the patch series, fixing
>> > most of the points you've raised.
>>
>>
>> Hi Tomas,
>>
>> Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
>> check if I configure without --enable-cassert.
>
> Ah, after disabling asserts I can reproduce it too. And the reason why
> it fails is quite simple - clauselist_selectivity modifies the original
> list of clauses, which then confuses cost_qual_eval.
>
> Can you try if the attached patch fixes the issue? I'll need to rework a
> bit more of the code, but let's see if this fixes the issue on your
> machine too.

Yes, that fixes it.


>
>> With --enable-cassert, it passes the regression test.
>
> I wonder how it can work with casserts and fail without them. That's
> kinda exactly the opposite of what I'd expect ...

I too was surprised by that.  Maybe cassert makes a copy of some data
structure which is used in-place without cassert?

Thanks,

Jeff




Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Tomas Vondra
Hi,

On Wed, 2016-03-09 at 08:45 -0800, Jeff Janes wrote:
> On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
>  wrote:
> > Hi,
> >
> > thanks for the feedback. Attached is v14 of the patch series, fixing
> > most of the points you've raised.
> 
> 
> Hi Tomas,
> 
> Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
> check if I configure without --enable-cassert.

Ah, after disabling asserts I can reproduce it too. And the reason why
it fails is quite simple - clauselist_selectivity modifies the original
list of clauses, which then confuses cost_qual_eval.

Can you try if the attached patch fixes the issue? I'll need to rework a
bit more of the code, but let's see if this fixes the issue on your
machine too.

> With --enable-cassert, it passes the regression test.

I wonder how it can work with casserts and fail without them. That's
kinda exactly the opposite of what I'd expect ...

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index 2540da9..ddfdc3b 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -279,6 +279,10 @@ clauselist_selectivity(PlannerInfo *root,
 		List *solution = choose_mv_statistics(root, relid, stats,
 			  clauses, conditions);
 
+		/* FIXME we must not scribble over the original list */
+		if (solution)
+			clauses = list_copy(clauses);
+
 		/*
 		 * We have a good solution, which is merely a list of statistics that
 		 * we need to apply. We'll apply the statistics one by one (in the order



Re: [HACKERS] multivariate statistics v14

2016-03-09 Thread Jeff Janes
On Wed, Mar 9, 2016 at 7:02 AM, Tomas Vondra
 wrote:
> Hi,
>
> thanks for the feedback. Attached is v14 of the patch series, fixing
> most of the points you've raised.


Hi Tomas,

Applied to aa09cd242fa7e3a694a31f, I still get the seg faults in make
check if I configure without --enable-cassert.

With --enable-cassert, it passes the regression test.

I got the core file, configured and compiled with:
CFLAGS="-fno-omit-frame-pointer"  --enable-debug

The first core dump is on this statement:

  -- check explain (expect bitmap index scan, not plain index scan)
  INSERT INTO functional_dependencies
   SELECT i/1, i/2, i/4 FROM generate_series(1,100) s(i);

bt

#0  0x006e1160 in cost_qual_eval (cost=0x2494418, quals=0x2495550, root=0x2541b88) at costsize.c:3181
#1  0x006e1ee5 in set_baserel_size_estimates (root=0x2541b88, rel=0x2494300) at costsize.c:3754
#2  0x006d37e8 in set_plain_rel_size (root=0x2541b88, rel=0x2494300, rte=0x247e660) at allpaths.c:480
#3  0x006d353d in set_rel_size (root=0x2541b88, rel=0x2494300, rti=1, rte=0x247e660) at allpaths.c:350
#4  0x006d338f in set_base_rel_sizes (root=0x2541b88) at allpaths.c:270
#5  0x006d3233 in make_one_rel (root=0x2541b88, joinlist=0x2494628) at allpaths.c:169
#6  0x0070012e in query_planner (root=0x2541b88, tlist=0x2541e58, qp_callback=0x7048d4 , qp_extra=0x7ffefa6474e0) at planmain.c:246
#7  0x00702a33 in grouping_planner (root=0x2541b88, inheritance_update=0 '\000', tuple_fraction=0) at planner.c:1647
#8  0x00701310 in subquery_planner (glob=0x2541af8, parse=0x246a838, parent_root=0x0, hasRecursion=0 '\000', tuple_fraction=0) at planner.c:740
#9  0x0070055b in standard_planner (parse=0x246a838, cursorOptions=256, boundParams=0x0) at planner.c:290
#10 0x0070023f in planner (parse=0x246a838, cursorOptions=256, boundParams=0x0) at planner.c:160
#11 0x007b8bf9 in pg_plan_query (querytree=0x246a838, cursorOptions=256, boundParams=0x0) at postgres.c:798
#12 0x005d1967 in ExplainOneQuery (query=0x246a838, into=0x0, es=0x246a778, queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b = 5;", params=0x0) at explain.c:350
#13 0x005d16a3 in ExplainQuery (stmt=0x2444f90, queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b = 5;", params=0x0, dest=0x246a6e8) at explain.c:244
#14 0x007c0afb in standard_ProcessUtility (parsetree=0x2444f90, queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b = 5;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x246a6e8, completionTag=0x7ffefa647b60 "") at utility.c:659
#15 0x007c0299 in ProcessUtility (parsetree=0x2444f90, queryString=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b = 5;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, dest=0x246a6e8, completionTag=0x7ffefa647b60 "") at utility.c:335
#16 0x007bf47b in PortalRunUtility (portal=0x23ed510, utilityStmt=0x2444f90, isTopLevel=1 '\001', dest=0x246a6e8, completionTag=0x7ffefa647b60 "") at pquery.c:1183
#17 0x007bf1ce in FillPortalStore (portal=0x23ed510, isTopLevel=1 '\001') at pquery.c:1057
#18 0x007beb19 in PortalRun (portal=0x23ed510, count=9223372036854775807, isTopLevel=1 '\001', dest=0x253f6c0, altdest=0x253f6c0, completionTag=0x7ffefa647d40 "") at pquery.c:781
#19 0x007b90ae in exec_simple_query (query_string=0x2443d80 "EXPLAIN (COSTS off)\n SELECT * FROM mcv_list WHERE a = 10 AND b = 5;") at postgres.c:1094
#20 0x007bcfac in PostgresMain (argc=1, argv=0x23d5070, dbname=0x23d4e48 "regression", username=0x23d4e30 "jjanes") at postgres.c:4021
#21 0x00745a62 in BackendRun (port=0x23f4110) at postmaster.c:4258
#22 0x007451d6 in BackendStartup (port=0x23f4110) at postmaster.c:3932
#23 0x00741ab7 in ServerLoop () at postmaster.c:1690
#24 0x007411c0 in PostmasterMain (argc=8, argv=0x23d3f20) at postmaster.c:1298
#25 0x00690026 in main (argc=8, argv=0x23d3f20) at main.c:223

Cheers,

Jeff

