Re: Multivariate MCV list vs. statistics target

2020-09-05 Thread Justin Pryzby
I think the docs are inconsistent with the commit message and the code
(d06215d03) and docs should be corrected, soemthing like so:

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b135c89005..cd10a6a6fc 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7302,7 +7302,8 @@ SCRAM-SHA-256$iteration 
count:
of statistics accumulated for this statistics object by
.
A zero value indicates that no statistics should be collected.
-   A negative value says to use the system default statistics target.
+   A negative value says to use the maximum of the statistics targets of
+   the referenced columns, if set, or the system default statistics target.
Positive values of stxstattarget
determine the target number of most common values
to collect.
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index be4c3f1f05..f2e8a93166 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -101,7 +101,8 @@ ALTER STATISTICS name SET STATISTIC
 The statistic-gathering target for this statistics object for 
subsequent
  operations.
 The target can be set in the range 0 to 1; alternatively, set it
-to -1 to revert to using the system default statistics
+to -1 to revert to using the maximum statistics target of the
+referenced column's, if set, or the system default statistics
 target ().
 For more information on the use of statistics by the
 PostgreSQL query planner, refer to
-- 
2.17.0





Re: Multivariate MCV list vs. statistics target

2020-03-18 Thread Tomas Vondra

On Wed, Mar 18, 2020 at 01:32:19PM +, Shinoda, Noriyoshi (PN Japan
A Delivery) wrote:

Hi,


Thanks for the report. Yes, this is clearly an omission. I think it
would be better to use wording >similar to attstattarget, per the
attached patch. That is, without reference to ALTER STATISTICS and

better explaination of what positive/negative values do. Do you
agree?


Thank you for your comment.  I agree with the text you suggested.



Thank you for the report, I've pushed the reworded fix.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




RE: Multivariate MCV list vs. statistics target

2020-03-18 Thread Shinoda, Noriyoshi (PN Japan A Delivery)
Hi, 

>Thanks for the report. Yes, this is clearly an omission. I think it would be 
>better to use wording >similar to attstattarget, per the attached patch. That 
>is, without reference to ALTER STATISTICS and >better explaination of what 
>positive/negative values do. Do you agree?

Thank you for your comment.
I agree with the text you suggested.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] 
Sent: Wednesday, March 18, 2020 9:36 PM
To: Shinoda, Noriyoshi (PN Japan A Delivery) 
Cc: Alvaro Herrera ; Thomas Munro 
; Jamison, Kirk ; Dean 
Rasheed ; PostgreSQL Hackers 

Subject: Re: Multivariate MCV list vs. statistics target

Hi,

On Wed, Mar 18, 2020 at 04:28:34AM +, Shinoda, Noriyoshi (PN Japan A 
Delivery) wrote:
>Hello,
>
>I found a missing column description in the pg_statistic_ext catalog document 
>for this new feature.
>The attached patch adds a description of the 'stxstattarget' column to 
>pg_statistic_ext catalog's document.
>If there is a better explanation, please correct it.
>

Thanks for the report. Yes, this is clearly an omission. I think it would be 
better to use wording similar to attstattarget, per the attached patch. That 
is, without reference to ALTER STATISTICS and better explaination of what 
positive/negative values do. Do you agree?


regards

>Regards,
>Noriyoshi Shinoda
>
>-Original Message-
>From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
>Sent: Wednesday, September 11, 2019 7:28 AM
>To: Alvaro Herrera 
>Cc: Thomas Munro ; Jamison, Kirk 
>; Dean Rasheed ; 
>PostgreSQL Hackers 
>Subject: Re: Multivariate MCV list vs. statistics target
>
>On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:
>>On 2019-Aug-01, Tomas Vondra wrote:
>>
>>> I'll move it to the next CF. Aside from the issues pointed by 
>>> Kyotaro-san in his review, I still haven't made my mind about 
>>> whether to base the use statistics targets set for the attributes. 
>>> That's what we're doing now, but I'm not sure it's a good idea after adding 
>>> separate statistics target.
>>> I wonder what Dean's opinion on this is, as he added the current logic.
>>
>>Latest patch no longer applies.  Please update.  And since you already 
>>seem to have handled all review comments since it was Ready for 
>>Committer, and you now know Dean's opinion on the remaining question, 
>>please get it pushed.
>>
>
>OK, I've pushed this the patch, after some minor polishing.
>
>
>regards
>
>-- 
>Tomas Vondra  http://www.2ndQuadrant.com 
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>



-- 
Tomas Vondra  http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Multivariate MCV list vs. statistics target

2020-03-18 Thread Tomas Vondra

Hi,

On Wed, Mar 18, 2020 at 04:28:34AM +, Shinoda, Noriyoshi (PN Japan A 
Delivery) wrote:

Hello,

I found a missing column description in the pg_statistic_ext catalog document 
for this new feature.
The attached patch adds a description of the 'stxstattarget' column to 
pg_statistic_ext catalog's document.
If there is a better explanation, please correct it.



Thanks for the report. Yes, this is clearly an omission. I think it
would be better to use wording similar to attstattarget, per the
attached patch. That is, without reference to ALTER STATISTICS and
better explaination of what positive/negative values do. Do you agree?


regards


Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
Sent: Wednesday, September 11, 2019 7:28 AM
To: Alvaro Herrera 
Cc: Thomas Munro ; Jamison, Kirk ; Dean 
Rasheed ; PostgreSQL Hackers 
Subject: Re: Multivariate MCV list vs. statistics target

On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:

On 2019-Aug-01, Tomas Vondra wrote:


I'll move it to the next CF. Aside from the issues pointed by
Kyotaro-san in his review, I still haven't made my mind about whether
to base the use statistics targets set for the attributes. That's
what we're doing now, but I'm not sure it's a good idea after adding separate 
statistics target.
I wonder what Dean's opinion on this is, as he added the current logic.


Latest patch no longer applies.  Please update.  And since you already
seem to have handled all review comments since it was Ready for
Committer, and you now know Dean's opinion on the remaining question,
please get it pushed.



OK, I've pushed this the patch, after some minor polishing.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services







--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c6f95fa688..64614b569c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6452,6 +6452,22 @@ SCRAM-SHA-256$iteration 
count:
   Owner of the statistics object
  
 
+ 
+  stxstattarget
+  int4
+  
+  
+   stxstattarget controls the level of detail
+   of statistics accumulated for this statistics object by
+   .
+   A zero value indicates that no statistics should be collected.
+   A negative value says to use the system default statistics target.
+   Positive values stxstattarget
+   determine the target number of most common values
+   to collect.
+  
+ 
+
  
   stxkeys
   int2vector


RE: Multivariate MCV list vs. statistics target

2020-03-17 Thread Shinoda, Noriyoshi (PN Japan A Delivery)
Hello,

I found a missing column description in the pg_statistic_ext catalog document 
for this new feature.
The attached patch adds a description of the 'stxstattarget' column to 
pg_statistic_ext catalog's document. 
If there is a better explanation, please correct it.

Regards,
Noriyoshi Shinoda

-Original Message-
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] 
Sent: Wednesday, September 11, 2019 7:28 AM
To: Alvaro Herrera 
Cc: Thomas Munro ; Jamison, Kirk 
; Dean Rasheed ; PostgreSQL 
Hackers 
Subject: Re: Multivariate MCV list vs. statistics target

On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:
>On 2019-Aug-01, Tomas Vondra wrote:
>
>> I'll move it to the next CF. Aside from the issues pointed by 
>> Kyotaro-san in his review, I still haven't made my mind about whether 
>> to base the use statistics targets set for the attributes. That's 
>> what we're doing now, but I'm not sure it's a good idea after adding 
>> separate statistics target.
>> I wonder what Dean's opinion on this is, as he added the current logic.
>
>Latest patch no longer applies.  Please update.  And since you already 
>seem to have handled all review comments since it was Ready for 
>Committer, and you now know Dean's opinion on the remaining question, 
>please get it pushed.
>

OK, I've pushed this the patch, after some minor polishing.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com 
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





pg_statistic_ext_doc_v1.diff
Description: pg_statistic_ext_doc_v1.diff


Re: Multivariate MCV list vs. statistics target

2019-09-10 Thread Tomas Vondra

On Tue, Sep 03, 2019 at 02:38:56PM -0400, Alvaro Herrera wrote:

On 2019-Aug-01, Tomas Vondra wrote:


I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san
in his review, I still haven't made my mind about whether to base the use
statistics targets set for the attributes. That's what we're doing now,
but I'm not sure it's a good idea after adding separate statistics target.
I wonder what Dean's opinion on this is, as he added the current logic.


Latest patch no longer applies.  Please update.  And since you already
seem to have handled all review comments since it was Ready for
Committer, and you now know Dean's opinion on the remaining question,
please get it pushed.



OK, I've pushed this the patch, after some minor polishing.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Multivariate MCV list vs. statistics target

2019-09-03 Thread Alvaro Herrera
On 2019-Aug-01, Tomas Vondra wrote:

> I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san
> in his review, I still haven't made my mind about whether to base the use
> statistics targets set for the attributes. That's what we're doing now,
> but I'm not sure it's a good idea after adding separate statistics target.
> I wonder what Dean's opinion on this is, as he added the current logic.

Latest patch no longer applies.  Please update.  And since you already
seem to have handled all review comments since it was Ready for
Committer, and you now know Dean's opinion on the remaining question,
please get it pushed.

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Multivariate MCV list vs. statistics target

2019-08-01 Thread Dean Rasheed
On Thu, 1 Aug 2019 at 11:30, Tomas Vondra  wrote:
>
> I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san
> in his review, I still haven't made my mind about whether to base the use
> statistics targets set for the attributes. That's what we're doing now,
> but I'm not sure it's a good idea after adding separate statistics target.
> I wonder what Dean's opinion on this is, as he added the current logic.
>

If this were being released in the same version as MCV stats first
appeared, I'd say that there's not much point basing the default
multivariate stats target on the per-column targets, when it has its
own knob to control it. However, since this won't be released for a
year, those per-column-based defaults will be in the field for that
long, and so I'd say that we shouldn't change the default when adding
this, otherwise users who don't use this new feature might be
surprised by the change in behaviour.

Regards,
Dean




Re: Multivariate MCV list vs. statistics target

2019-08-01 Thread Tomas Vondra

On Thu, Aug 01, 2019 at 05:25:31PM +1200, Thomas Munro wrote:

On Thu, Aug 1, 2019 at 12:16 PM Jamison, Kirk  wrote:

> On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> > On Fri, Jul 26, 2019 at 07:03:41AM +, Jamison, Kirk wrote:
> > >Overall, the patch is almost already in good shape for commit.
> > >I'll wait for the next update.



> The patch passed the make-world and regression tests as well.
> I've marked this as "ready for committer".

The patch does not apply anymore.


Based on the above, it sounds like this patch is super close and the
only problem is bitrot, so I've set it back to Ready for Committer.
Over to Tomas to rebase and commit, or move to the next CF if that's
more appropriate.



I'll move it to the next CF. Aside from the issues pointed by Kyotaro-san
in his review, I still haven't made my mind about whether to base the use
statistics targets set for the attributes. That's what we're doing now,
but I'm not sure it's a good idea after adding separate statistics target.
I wonder what Dean's opinion on this is, as he added the current logic.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Multivariate MCV list vs. statistics target

2019-08-01 Thread Kyotaro Horiguchi
Hello.

At Thu, 1 Aug 2019 00:15:48 +, "Jamison, Kirk"  
wrote in 
> The patch does not apply anymore.
> In addition to the whitespace detected,
> kindly rebase the patch as there were changes from recent commits
> in the following files:
>src/backend/commands/analyze.c
>src/bin/pg_dump/pg_dump.c
>src/bin/pg_dump/t/002_pg_dump.pl
>src/test/regress/expected/stats_ext.out
>src/test/regress/sql/stats_ext.sql

The patch finally failed only for stats_ext.out, where 14ef15a222
is hitting. (for b2a3d706b8)

I looked through this patch and have some comments.



+++ b/src/backend/commands/statscmds.c
+#include "access/heapam.h"
..
+#include "utils/fmgroids.h"

These don't seem needed.


+Assert(stmt->missing_ok);

Perhaps we shouldn't Assert on this condition. Isn't it better we
just "elog(ERROR" here?


+DeconstructQualifiedName(stmt->defnames, , );

Maybe we don't need detailed analysis that the function emits on
error. Couldn't we use NameListToString() instead? That reduces
the number of ereport()s and considerably simplify the code
around.

+  oldtup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(stxoid));
+
+  /* Must be owner of the existing statistics object */
+  if (!pg_statistics_object_ownercheck(stxoid, GetUserId()))

This repeats the SearchSysCache twice in a quite short
duration. I suppose it'd be better that ACL (and validity) checks
done directly using oldtup.


+  /* replace the stxstattarget column */
+  repl_repl[Anum_pg_statistic_ext_stxstattarget - 1] = true;
+  repl_val[Anum_pg_statistic_ext_stxstattarget - 1] = Int32GetDatum(newtarget)

We usually do this kind of work using SearchSysCacheCopyN(),
which simplifies the code around, too.


+++ b/src/backend/statistics/mcv.c
>* Maximum number of MCV items to store, based on the attribute with the
>* largest stats target (and the number of groups we have available).
>*/
-  nitems = stats[0]->attr->attstattarget;
-  for (i = 1; i < numattrs; i++)
-  {
-if (stats[i]->attr->attstattarget > nitems)
-  nitems = stats[i]->attr->attstattarget;
-  }
+  nitems = stattarget;

Maybe you forgot to modify the comment.


check_xact_readonly() returns false for this command. As the
result it emits a somewhat pointless error message.

=# alter statistics s1 set statistics 0;
ERROR:  cannot assign TransactionIds during recovery


+++ b/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.h
>   i_stxname = PQfnumber(res, "stxname");
>   i_stxnamespace = PQfnumber(res, "stxnamespace");
>   i_rolname = PQfnumber(res, "rolname");
+   i_stattarget = PQfnumber(res, "stxstattarget");

I'm not sure whether it is the convention here, but variable name
is different from column name only for the added line.


+++ b/src/bin/psql/tab-complete.c
-   COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA");
+   COMPLETE_WITH("OWNER TO", "RENAME TO", "SET SCHEMA", "SET 
STATISTICS");

ALTER STATISTICS s2 SET STATISTICS is suggested with
ext-stats names,  but it's the place for target value.


+++ b/src/include/nodes/nodes.h
   T_CallStmt,
+  T_AlterStatsStmt,
 
I think it should be immediately below T_CreateStatsStmt.


+++ b/src/include/nodes/parsenodes.h
+  boolmissing_ok;/* skip error if statistics object is missing */

Should be very trivial, but many bool members especially
missing_ok have a comment having "?" at the end.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Multivariate MCV list vs. statistics target

2019-07-31 Thread Thomas Munro
On Thu, Aug 1, 2019 at 12:16 PM Jamison, Kirk  wrote:
> > On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> > > On Fri, Jul 26, 2019 at 07:03:41AM +, Jamison, Kirk wrote:
> > > >Overall, the patch is almost already in good shape for commit.
> > > >I'll wait for the next update.

> > The patch passed the make-world and regression tests as well.
> > I've marked this as "ready for committer".
>
> The patch does not apply anymore.

Based on the above, it sounds like this patch is super close and the
only problem is bitrot, so I've set it back to Ready for Committer.
Over to Tomas to rebase and commit, or move to the next CF if that's
more appropriate.

-- 
Thomas Munro
https://enterprisedb.com




RE: Multivariate MCV list vs. statistics target

2019-07-31 Thread Jamison, Kirk
Hi, 

> From: Jamison, Kirk [mailto:k.jami...@jp.fujitsu.com]
> Sent: Monday, July 29, 2019 10:53 AM
> To: 'Tomas Vondra' 
> Cc: Dean Rasheed ; PostgreSQL Hackers
> 
> Subject: RE: Multivariate MCV list vs. statistics target
> 
> On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> > On Fri, Jul 26, 2019 at 07:03:41AM +, Jamison, Kirk wrote:
> > >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:
> > >
> > >> >+   /* XXX What if the target is set to 0? Reset the 
> > >> >statistic?
> > >> */
> > >> >
> > >> >This also makes me wonder. I haven't looked deeply into the code,
> > >> >but since 0 is a valid value, I believe it should reset the stats.
> > >>
> > >> I agree resetting the stats after setting the target to 0 seems
> > >> quite reasonable. But that's not what we do for attribute stats,
> > >> because in that case we simply skip the attribute during the future
> > >> ANALYZE runs
> > >> - we don't reset the stats, we keep the existing stats. So I've
> > >> done the same thing here, and I've removed the XXX comment.
> > >>
> > >> If we want to change that, I'd do it in a separate patch for both
> > >> the regular and extended stats.
> > >
> > >Hi, Tomas
> > >
> > >Sorry for my late reply.
> > >You're right. I have no strong opinion whether we'd want to change
> > >that
> > behavior.
> > >I've also confirmed the change in the patch where setting statistics
> > >target 0 skips the statistics.
> > >
> >
> > OK, thanks.
> >
> > >Maybe only some minor nitpicks in the source code comments below:
> > >1. "it's" should be "its":
> > >> + * Compute statistic target, based on what's set for the
> > statistic
> > >> + * object itself, and for it's attributes.
> > >
> > >2. Consistency whether you'd use either "statistic " or "statisticS ".
> > >Ex. statistic target vs statisticS target, statistics object vs
> > >statistic
> > object, etc.
> > >
> > >> Attached is v4 of the patch, with a couple more improvements:
> > >>
> > >> 1) I've renamed the if_not_exists flag to missing_ok, because
> > >> that's more consistent with the "IF EXISTS" clause in the grammar
> > >> (the old flag was kinda the exact opposite), and I've added a NOTICE
> about the skip.
> > >
> > >+  boolmissing_ok;  /* do nothing if statistics does
> > not exist */
> > >
> > >Confirmed. So we ignore if statistic does not exist, and skip the error.
> > >Maybe to make it consistent with other data structures in
> > >parsernodes.h, you can change the comment of missing_ok to:
> > >/* skip error if statistics object does not exist */
> > >
> >
> > Thanks, I've fixed all those places in the attached v5.
> 
> I've confirmed the fix.
> 
> > >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows,
> > >> because that's what the function was doing anyway (computing sample
> size).
> > >>
> > >> 3) I've added a couple of regression tests to stats_ext.sql
> > >>
> > >> Aside from that, I've cleaned up a couple of places and improved a
> > >> bunch of comments. Nothing huge.
> > >
> > >I have a question though regarding ComputeExtStatisticsRows.
> > >I'm just curious with the value 300 when computing sample size.
> > >Where did this value come from?
> > >
> > >+  /* compute sample size based on the statistic target */
> > >+  return (300 * result);
> > >
> > >Overall, the patch is almost already in good shape for commit.
> > >I'll wait for the next update.
> > >
> >
> > That's how we compute number of rows to sample, based on the statistics
> target.
> > See std_typanalyze() in analyze.c, which also cites the paper where
> > this comes from.
> Noted. Found it. Thank you for the reference.
> 
> 
> There's just a small whitespace (extra space) below after running git diff
> --check.
> >src/bin/pg_dump/pg_dump.c:7226: trailing whitespace.
> >+
> It would be better to post an updated patch, but other than that, I've 
> confirmed
> the fixes.
> The patch passed the make-world and regression tests as well.
> I've marked this as "ready for committer".

The patch does not apply anymore.
In addition to the whitespace detected,
kindly rebase the patch as there were changes from recent commits
in the following files:
   src/backend/commands/analyze.c
   src/bin/pg_dump/pg_dump.c
   src/bin/pg_dump/t/002_pg_dump.pl
   src/test/regress/expected/stats_ext.out
   src/test/regress/sql/stats_ext.sql

Regards,
Kirk Jamison




RE: Multivariate MCV list vs. statistics target

2019-07-28 Thread Jamison, Kirk
On Saturday, July 27, 2019 7:06 AM(GMT+9), Tomas Vondra wrote:
> On Fri, Jul 26, 2019 at 07:03:41AM +, Jamison, Kirk wrote:
> >On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:
> >
> >> >+ /* XXX What if the target is set to 0? Reset the statistic?
> >> */
> >> >
> >> >This also makes me wonder. I haven't looked deeply into the code,
> >> >but since 0 is a valid value, I believe it should reset the stats.
> >>
> >> I agree resetting the stats after setting the target to 0 seems quite
> >> reasonable. But that's not what we do for attribute stats, because in
> >> that case we simply skip the attribute during the future ANALYZE runs
> >> - we don't reset the stats, we keep the existing stats. So I've done
> >> the same thing here, and I've removed the XXX comment.
> >>
> >> If we want to change that, I'd do it in a separate patch for both the
> >> regular and extended stats.
> >
> >Hi, Tomas
> >
> >Sorry for my late reply.
> >You're right. I have no strong opinion whether we'd want to change that
> behavior.
> >I've also confirmed the change in the patch where setting statistics
> >target 0 skips the statistics.
> >
> 
> OK, thanks.
> 
> >Maybe only some minor nitpicks in the source code comments below:
> >1. "it's" should be "its":
> >> +   * Compute statistic target, based on what's set for the
> statistic
> >> +   * object itself, and for it's attributes.
> >
> >2. Consistency whether you'd use either "statistic " or "statisticS ".
> >Ex. statistic target vs statisticS target, statistics object vs statistic
> object, etc.
> >
> >> Attached is v4 of the patch, with a couple more improvements:
> >>
> >> 1) I've renamed the if_not_exists flag to missing_ok, because that's
> >> more consistent with the "IF EXISTS" clause in the grammar (the old
> >> flag was kinda the exact opposite), and I've added a NOTICE about the skip.
> >
> >+boolmissing_ok;  /* do nothing if statistics does
> not exist */
> >
> >Confirmed. So we ignore if statistic does not exist, and skip the error.
> >Maybe to make it consistent with other data structures in
> >parsernodes.h, you can change the comment of missing_ok to:
> >/* skip error if statistics object does not exist */
> >
> 
> Thanks, I've fixed all those places in the attached v5.

I've confirmed the fix.

> >> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because
> >> that's what the function was doing anyway (computing sample size).
> >>
> >> 3) I've added a couple of regression tests to stats_ext.sql
> >>
> >> Aside from that, I've cleaned up a couple of places and improved a
> >> bunch of comments. Nothing huge.
> >
> >I have a question though regarding ComputeExtStatisticsRows.
> >I'm just curious with the value 300 when computing sample size.
> >Where did this value come from?
> >
> >+/* compute sample size based on the statistic target */
> >+return (300 * result);
> >
> >Overall, the patch is almost already in good shape for commit.
> >I'll wait for the next update.
> >
> 
> That's how we compute number of rows to sample, based on the statistics 
> target.
> See std_typanalyze() in analyze.c, which also cites the paper where this comes
> from.
Noted. Found it. Thank you for the reference.


There's just a small whitespace (extra space) below after running git diff 
--check.
>src/bin/pg_dump/pg_dump.c:7226: trailing whitespace.
>+ 
It would be better to post an updated patch,
but other than that, I've confirmed the fixes.
The patch passed the make-world and regression tests as well.
I've marked this as "ready for committer".


Regards,
Kirk Jamison




Re: Multivariate MCV list vs. statistics target

2019-07-26 Thread Tomas Vondra

On Fri, Jul 26, 2019 at 07:03:41AM +, Jamison, Kirk wrote:

On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:


>+   /* XXX What if the target is set to 0? Reset the statistic?
*/
>
>This also makes me wonder. I haven't looked deeply into the code, but
>since 0 is a valid value, I believe it should reset the stats.

I agree resetting the stats after setting the target to 0 seems quite
reasonable. But that's not what we do for attribute stats, because in that
case we simply skip the attribute during the future ANALYZE runs - we don't
reset the stats, we keep the existing stats. So I've done the same thing here,
and I've removed the XXX comment.

If we want to change that, I'd do it in a separate patch for both the regular
and extended stats.


Hi, Tomas

Sorry for my late reply.
You're right. I have no strong opinion whether we'd want to change that 
behavior.
I've also confirmed the change in the patch where setting statistics target 0
skips the statistics.



OK, thanks.


Maybe only some minor nitpicks in the source code comments below:
1. "it's" should be "its":

+* Compute statistic target, based on what's set for the 
statistic
+* object itself, and for it's attributes.


2. Consistency whether you'd use either "statistic " or "statisticS ".
Ex. statistic target vs statisticS target, statistics object vs statistic 
object, etc.


Attached is v4 of the patch, with a couple more improvements:

1) I've renamed the if_not_exists flag to missing_ok, because that's more
consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda
the exact opposite), and I've added a NOTICE about the skip.


+   boolmissing_ok;  /* do nothing if statistics does not 
exist */

Confirmed. So we ignore if statistic does not exist, and skip the error.
Maybe to make it consistent with other data structures in parsernodes.h,
you can change the comment of missing_ok to:
/* skip error if statistics object does not exist */



Thanks, I've fixed all those places in the attached v5.


2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's
what the function was doing anyway (computing sample size).

3) I've added a couple of regression tests to stats_ext.sql

Aside from that, I've cleaned up a couple of places and improved a bunch of
comments. Nothing huge.


I have a question though regarding ComputeExtStatisticsRows.
I'm just curious with the value 300 when computing sample size.
Where did this value come from?

+   /* compute sample size based on the statistic target */
+   return (300 * result);

Overall, the patch is almost already in good shape for commit.
I'll wait for the next update.



That's how we compute number of rows to sample, based on the statistics
target. See std_typanalyze() in analyze.c, which also cites the paper
where this comes from.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..be4c3f1f05 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistics object for 
subsequent
+ operations.
+The target can be set in the range 0 to 1; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8d633f2585..c27df32d92 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -307,7 +307,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
VacAttrStats **vacattrstats;
AnlIndexData *indexdata;
int targrows,
-   numrows;
+   numrows,
+   minrows;
double  totalrows,
totaldeadrows;
HeapTuple  *rows;
@@ -491,6 +492,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistics objects too, as those may define custom
+* statistics target. So we may need to sample more rows and then build
+* the statistics with enough detail.
+*/
+

RE: Multivariate MCV list vs. statistics target

2019-07-26 Thread Jamison, Kirk
On Sat, July 20, 2019 8:12 AM (GMT+9), Tomas Vondra wrote:

> >+/* XXX What if the target is set to 0? Reset the statistic?
> */
> >
> >This also makes me wonder. I haven't looked deeply into the code, but
> >since 0 is a valid value, I believe it should reset the stats.
> 
> I agree resetting the stats after setting the target to 0 seems quite
> reasonable. But that's not what we do for attribute stats, because in that
> case we simply skip the attribute during the future ANALYZE runs - we don't
> reset the stats, we keep the existing stats. So I've done the same thing here,
> and I've removed the XXX comment.
> 
> If we want to change that, I'd do it in a separate patch for both the regular
> and extended stats.

Hi, Tomas

Sorry for my late reply.
You're right. I have no strong opinion whether we'd want to change that 
behavior.
I've also confirmed the change in the patch where setting statistics target 0
skips the statistics. 

Maybe only some minor nitpicks in the source code comments below:
1. "it's" should be "its":
> +  * Compute statistic target, based on what's set for the 
> statistic
> +  * object itself, and for it's attributes.

2. Consistency whether you'd use either "statistic " or "statisticS ".
Ex. statistic target vs statisticS target, statistics object vs statistic 
object, etc.

> Attached is v4 of the patch, with a couple more improvements:
>
> 1) I've renamed the if_not_exists flag to missing_ok, because that's more
> consistent with the "IF EXISTS" clause in the grammar (the old flag was kinda
> the exact opposite), and I've added a NOTICE about the skip.

+   boolmissing_ok;  /* do nothing if statistics does not 
exist */

Confirmed. So we ignore if statistic does not exist, and skip the error.
Maybe to make it consistent with other data structures in parsernodes.h,
you can change the comment of missing_ok to: 
/* skip error if statistics object does not exist */

> 2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because that's
> what the function was doing anyway (computing sample size).
>
> 3) I've added a couple of regression tests to stats_ext.sql
> 
> Aside from that, I've cleaned up a couple of places and improved a bunch of
> comments. Nothing huge.

I have a question though regarding ComputeExtStatisticsRows.
I'm just curious with the value 300 when computing sample size.
Where did this value come from?

+   /* compute sample size based on the statistic target */
+   return (300 * result);

Overall, the patch is almost already in good shape for commit.
I'll wait for the next update.

Regards,
Kirk Jamison




Re: Multivariate MCV list vs. statistics target

2019-07-19 Thread Tomas Vondra

On Fri, Jul 19, 2019 at 06:12:20AM +, Jamison, Kirk wrote:

On Tuesday, July 9, 2019, Tomas Vondra wrote:

>apparently v1 of the ALTER STATISTICS patch was a bit confused about
>length of the VacAttrStats array passed to statext_compute_stattarget,
>causing segfaults. Attached v2 patch fixes that, and it also makes sure
>we print warnings about ignored statistics only once.
>

v3 of the patch, adding pg_dump support - it works just like when you tweak
statistics target for a column, for example. When the value stored in the
catalog is not -1, pg_dump emits a separate ALTER STATISTICS command setting
it (for the already created statistics object).



Hi Tomas, I stumbled upon your patch.

According to the CF bot, your patch applies cleanly, builds successfully, and
passes make world. Meaning, the pg_dump tap test passed, but there was no
test for the new SET STATISTICS yet. So you might want to add a regression
test for that and integrate it in the existing alter_generic file.

Upon quick read-through, the syntax and docs are correct because it's similar
to the format of ALTER TABLE/INDEX... SET STATISTICS... :
   ALTER [ COLUMN ] column_name SET STATISTICS integer

+   /* XXX What if the target is set to 0? Reset the statistic? */

This also makes me wonder. I haven't looked deeply into the code, but since 0 is
a valid value, I believe it should reset the stats.


I agree resetting the stats after setting the target to 0 seems quite
reasonable. But that's not what we do for attribute stats, because in
that case we simply skip the attribute during the future ANALYZE runs -
we don't reset the stats, we keep the existing stats. So I've done the
same thing here, and I've removed the XXX comment.

If we want to change that, I'd do it in a separate patch for both the
regular and extended stats.


After lookup though, this is how it's tested in ALTER TABLE:
/test/regress/sql/stats_ext.sql:-- Ensure things work sanely with SET 
STATISTICS 0



Well, yeah. But that tests we skip building the extended statistic
(because we excluded the column from the ANALYZE run).


I've considered making it part of CREATE STATISTICS itself, but it seems a
bit cumbersome and we don't do it for columns either. I'm not against adding
it in the future, but at this point I don't see a need.


I agree. Perhaps that's for another patch should you decide to add it in the 
future.



Right.

Attached is v4 of the patch, with a couple more improvements:

1) I've renamed the if_not_exists flag to missing_ok, because that's
more consistent with the "IF EXISTS" clause in the grammar (the old flag
was kinda the exact opposite), and I've added a NOTICE about the skip.

2) I've renamed ComputeExtStatsTarget to ComputeExtStatsRows, because
that's what the function was doing anyway (computing sample size).

3) I've added a couple of regression tests to stats_ext.sql

Aside from that, I've cleaned up a couple of places and improved a bunch
of comments. Nothing huge.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..987f9dbc6f 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistic object for subsequent
+ operations.
+The target can be set in the range 0 to 1; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8d633f2585..edbd24f5e9 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -307,7 +307,8 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
VacAttrStats **vacattrstats;
AnlIndexData *indexdata;
int targrows,
-   numrows;
+   numrows,
+   minrows;
double  totalrows,
totaldeadrows;
HeapTuple  *rows;
@@ -491,6 +492,16 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistic objects too, as those may define custom
+* statistic target. So we may need to 

RE: Multivariate MCV list vs. statistics target

2019-07-19 Thread Jamison, Kirk
On Tuesday, July 9, 2019, Tomas Vondra wrote:
> >apparently v1 of the ALTER STATISTICS patch was a bit confused about
> >length of the VacAttrStats array passed to statext_compute_stattarget,
> >causing segfaults. Attached v2 patch fixes that, and it also makes sure
> >we print warnings about ignored statistics only once.
> >
> 
> v3 of the patch, adding pg_dump support - it works just like when you tweak
> statistics target for a column, for example. When the value stored in the
> catalog is not -1, pg_dump emits a separate ALTER STATISTICS command setting
> it (for the already created statistics object).
> 

Hi Tomas, I stumbled upon your patch.

According to the CF bot, your patch applies cleanly, builds successfully, and
passes make world. Meaning, the pg_dump tap test passed, but there was no
test for the new SET STATISTICS yet. So you might want to add a regression
test for that and integrate it in the existing alter_generic file.

Upon quick read-through, the syntax and docs are correct because it's similar
to the format of ALTER TABLE/INDEX... SET STATISTICS... :
ALTER [ COLUMN ] column_name SET STATISTICS integer

+   /* XXX What if the target is set to 0? Reset the statistic? */

This also makes me wonder. I haven't looked deeply into the code, but since 0 is
a valid value, I believe it should reset the stats.
After lookup though, this is how it's tested in ALTER TABLE:
/test/regress/sql/stats_ext.sql:-- Ensure things work sanely with SET 
STATISTICS 0

> I've considered making it part of CREATE STATISTICS itself, but it seems a
> bit cumbersome and we don't do it for columns either. I'm not against adding
> it in the future, but at this point I don't see a need.

I agree. Perhaps that's for another patch should you decide to add it in the 
future.

Regards,
Kirk Jamison




Re: Multivariate MCV list vs. statistics target

2019-07-08 Thread Tomas Vondra

On Sun, Jul 07, 2019 at 12:02:38AM +0200, Tomas Vondra wrote:

Hi,

apparently v1 of the ALTER STATISTICS patch was a bit confused about
length of the VacAttrStats array passed to statext_compute_stattarget,
causing segfaults. Attached v2 patch fixes that, and it also makes sure
we print warnings about ignored statistics only once.



v3 of the patch, adding pg_dump support - it works just like when you
tweak statistics target for a column, for example. When the value stored
in the catalog is not -1, pg_dump emits a separate ALTER STATISTICS
command setting it (for the already created statistics object).

I've considered making it part of CREATE STATISTICS itself, but it seems
a bit cumbersome and we don't do it for columns either. I'm not against
adding it in the future, but at this point I don't see a need.

At this point I'm not aware of any missing or broken pieces, so I'd
welcome feedback.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..987f9dbc6f 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistic object for subsequent
+ operations.
+The target can be set in the range 0 to 1; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d7004e5313..caa4fca99d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -490,6 +490,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistic objects too, because those may define 
their
+* own statistic target. So we need to sample enough rows and then build
+* the statistics with enough detail.
+*/
+   {
+   int nrows = ComputeExtStatisticsTarget(onerel,
+   
   attr_cnt, vacattrstats);
+
+   if (targrows < nrows)
+   targrows = nrows;
+   }
+
/*
 * Acquire the sample rows
 */
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index cf406f6f96..356b6096ac 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "access/relation.h"
 #include "access/relscan.h"
 #include "access/table.h"
@@ -21,6 +22,7 @@
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
@@ -29,6 +31,7 @@
 #include "miscadmin.h"
 #include "statistics/statistics.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -336,6 +339,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_statistic_ext_stxrelid - 1] = ObjectIdGetDatum(relid);
values[Anum_pg_statistic_ext_stxname - 1] = NameGetDatum();
values[Anum_pg_statistic_ext_stxnamespace - 1] = 
ObjectIdGetDatum(namespaceId);
+   values[Anum_pg_statistic_ext_stxstattarget - 1] = Int32GetDatum(-1);
values[Anum_pg_statistic_ext_stxowner - 1] = ObjectIdGetDatum(stxowner);
values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
@@ -414,6 +418,85 @@ CreateStatistics(CreateStatsStmt *stmt)
return myself;
 }
 
+
+/*
+ * ALTER STATISTICS
+ */
+ObjectAddress
+AlterStatistics(AlterStatsStmt *stmt)
+{
+   Relationrel;
+   Oid stxoid;
+   HeapTuple   oldtup;
+   HeapTuple   newtup;
+   Datum   repl_val[Natts_pg_statistic_ext];
+   boolrepl_null[Natts_pg_statistic_ext];
+   boolrepl_repl[Natts_pg_statistic_ext];
+   ObjectAddress   address;
+   int newtarget = stmt->stxstattarget;
+
+   /* Limit 

Re: Multivariate MCV list vs. statistics target

2019-07-06 Thread Tomas Vondra

Hi,

apparently v1 of the ALTER STATISTICS patch was a bit confused about
length of the VacAttrStats array passed to statext_compute_stattarget,
causing segfaults. Attached v2 patch fixes that, and it also makes sure
we print warnings about ignored statistics only once.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..987f9dbc6f 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistic object for subsequent
+ operations.
+The target can be set in the range 0 to 1; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d7004e5313..caa4fca99d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -490,6 +490,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistic objects too, because those may define 
their
+* own statistic target. So we need to sample enough rows and then build
+* the statistics with enough detail.
+*/
+   {
+   int nrows = ComputeExtStatisticsTarget(onerel,
+   
   attr_cnt, vacattrstats);
+
+   if (targrows < nrows)
+   targrows = nrows;
+   }
+
/*
 * Acquire the sample rows
 */
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index cf406f6f96..356b6096ac 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "access/relation.h"
 #include "access/relscan.h"
 #include "access/table.h"
@@ -21,6 +22,7 @@
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
@@ -29,6 +31,7 @@
 #include "miscadmin.h"
 #include "statistics/statistics.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -336,6 +339,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_statistic_ext_stxrelid - 1] = ObjectIdGetDatum(relid);
values[Anum_pg_statistic_ext_stxname - 1] = NameGetDatum();
values[Anum_pg_statistic_ext_stxnamespace - 1] = 
ObjectIdGetDatum(namespaceId);
+   values[Anum_pg_statistic_ext_stxstattarget - 1] = Int32GetDatum(-1);
values[Anum_pg_statistic_ext_stxowner - 1] = ObjectIdGetDatum(stxowner);
values[Anum_pg_statistic_ext_stxkeys - 1] = PointerGetDatum(stxkeys);
values[Anum_pg_statistic_ext_stxkind - 1] = PointerGetDatum(stxkind);
@@ -414,6 +418,85 @@ CreateStatistics(CreateStatsStmt *stmt)
return myself;
 }
 
+
+/*
+ * ALTER STATISTICS
+ */
+ObjectAddress
+AlterStatistics(AlterStatsStmt *stmt)
+{
+   Relationrel;
+   Oid stxoid;
+   HeapTuple   oldtup;
+   HeapTuple   newtup;
+   Datum   repl_val[Natts_pg_statistic_ext];
+   boolrepl_null[Natts_pg_statistic_ext];
+   boolrepl_repl[Natts_pg_statistic_ext];
+   ObjectAddress   address;
+   int newtarget = stmt->stxstattarget;
+
+   /* Limit target to a sane range */
+   if (newtarget < -1)
+   {
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("statistics target %d is too low",
+   newtarget)));
+   }
+   else if (newtarget > 1)
+   {
+   newtarget = 1;
+   ereport(WARNING,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("lowering statistics target to %d",
+   

Re: Multivariate MCV list vs. statistics target

2019-06-30 Thread David Rowley
On Fri, 21 Jun 2019 at 19:09, Dean Rasheed  wrote:
> Hmm, maybe. I can certainly understand your dislike of using text[].
> I'm not sure that we can confidently say that multivariate statistics
> won't ever need additional tuning knobs, but I have no idea at the
> moment what they might be, and nothing else has come up so far in all
> the time spent considering MCV lists and histograms, so maybe this is
> OK.

I agree with having the stxstattarget column. Even if something did
come up in the future, then we could consider merging the
stxstattarget column with a new text[] column at that time.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Multivariate MCV list vs. statistics target

2019-06-29 Thread Tomas Vondra

On Fri, Jun 21, 2019 at 08:09:18AM +0100, Dean Rasheed wrote:

On Thu, 20 Jun 2019 at 23:12, Tomas Vondra  wrote:


On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote:
>On Tue, 18 Jun 2019 at 22:34, Tomas Vondra  
wrote:
>>
>> So I'm thinking we should allow tweaking the statistics for extended
>> stats, and serialize it in the pg_statistic_ext catalog. Any opinions
>> why that would be a bad idea?
>
>Seems reasonable to me. This might not be the only option we'll ever
>want to add though, so perhaps a "stxoptions text[]" column along the
>lines of a relation's reloptions would be the way to go.

I don't know - I kinda dislike the idea of stashing stuff like this into
text[] arrays unless there's a clear need for such flexibility (i.e.
vision to have more such options). Which I'm not sure is the case here.
And we kinda have a precedent in pg_attribute.attstattarget, so I'd use
the same approach here.



Hmm, maybe. I can certainly understand your dislike of using text[].
I'm not sure that we can confidently say that multivariate statistics
won't ever need additional tuning knobs, but I have no idea at the
moment what they might be, and nothing else has come up so far in all
the time spent considering MCV lists and histograms, so maybe this is
OK.



OK, attached is a patch implementing this - it adds

   ALTER STATISTICS ... SET STATISTICS ...

modifying a new stxstattarget column in pg_statistic_ext catalog,
following the same logic as pg_attribute.attstattarget.

During analyze, the per-ext-statistic value is determined like this:

1) When pg_statistic_ext.stxstattarget != (-1), we just use this value
and we're done.

2) Otherwise we inspect per-column attstattarget values, and use the
largest value. This is what we do now, so it's backwards-compatible
behavior.

3) If the value is still (-1), we use default_statistics_target.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
diff --git a/doc/src/sgml/ref/alter_statistics.sgml 
b/doc/src/sgml/ref/alter_statistics.sgml
index 58c7ed020d..987f9dbc6f 100644
--- a/doc/src/sgml/ref/alter_statistics.sgml
+++ b/doc/src/sgml/ref/alter_statistics.sgml
@@ -26,6 +26,7 @@ PostgreSQL documentation
 ALTER STATISTICS name OWNER TO { 
new_owner | CURRENT_USER | 
SESSION_USER }
 ALTER STATISTICS name RENAME TO 
new_name
 ALTER STATISTICS name SET SCHEMA 
new_schema
+ALTER STATISTICS name SET 
STATISTICS new_target
 
  
 
@@ -93,6 +94,22 @@ ALTER STATISTICS name SET SCHEMA 
  
 
+ 
+  new_target
+  
+   
+The statistic-gathering target for this statistic object for subsequent
+ operations.
+The target can be set in the range 0 to 1; alternatively, set it
+to -1 to revert to using the system default statistics
+target ().
+For more information on the use of statistics by the
+PostgreSQL query planner, refer to
+.
+   
+  
+ 
+
 

   
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d7004e5313..caa4fca99d 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -490,6 +490,19 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
}
}
 
+   /*
+* Look at extended statistic objects too, because those may define 
their
+* own statistic target. So we need to sample enough rows and then build
+* the statistics with enough detail.
+*/
+   {
+   int nrows = ComputeExtStatisticsTarget(onerel,
+   
   attr_cnt, vacattrstats);
+
+   if (targrows < nrows)
+   targrows = nrows;
+   }
+
/*
 * Acquire the sample rows
 */
diff --git a/src/backend/commands/statscmds.c b/src/backend/commands/statscmds.c
index cf406f6f96..356b6096ac 100644
--- a/src/backend/commands/statscmds.c
+++ b/src/backend/commands/statscmds.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/heapam.h"
 #include "access/relation.h"
 #include "access/relscan.h"
 #include "access/table.h"
@@ -21,6 +22,7 @@
 #include "catalog/dependency.h"
 #include "catalog/indexing.h"
 #include "catalog/namespace.h"
+#include "catalog/objectaccess.h"
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
@@ -29,6 +31,7 @@
 #include "miscadmin.h"
 #include "statistics/statistics.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
@@ -336,6 +339,7 @@ CreateStatistics(CreateStatsStmt *stmt)
values[Anum_pg_statistic_ext_stxrelid - 1] = ObjectIdGetDatum(relid);
values[Anum_pg_statistic_ext_stxname - 1] = NameGetDatum();
values[Anum_pg_statistic_ext_stxnamespace - 1] = 

Re: Multivariate MCV list vs. statistics target

2019-06-21 Thread Dean Rasheed
On Thu, 20 Jun 2019 at 23:12, Tomas Vondra  wrote:
>
> On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote:
> >On Tue, 18 Jun 2019 at 22:34, Tomas Vondra  
> >wrote:
> >>
> >> So I'm thinking we should allow tweaking the statistics for extended
> >> stats, and serialize it in the pg_statistic_ext catalog. Any opinions
> >> why that would be a bad idea?
> >
> >Seems reasonable to me. This might not be the only option we'll ever
> >want to add though, so perhaps a "stxoptions text[]" column along the
> >lines of a relation's reloptions would be the way to go.
>
> I don't know - I kinda dislike the idea of stashing stuff like this into
> text[] arrays unless there's a clear need for such flexibility (i.e.
> vision to have more such options). Which I'm not sure is the case here.
> And we kinda have a precedent in pg_attribute.attstattarget, so I'd use
> the same approach here.
>

Hmm, maybe. I can certainly understand your dislike of using text[].
I'm not sure that we can confidently say that multivariate statistics
won't ever need additional tuning knobs, but I have no idea at the
moment what they might be, and nothing else has come up so far in all
the time spent considering MCV lists and histograms, so maybe this is
OK.

Regards,
Dean




Re: Multivariate MCV list vs. statistics target

2019-06-20 Thread Tomas Vondra

On Thu, Jun 20, 2019 at 08:08:44AM +0100, Dean Rasheed wrote:

On Tue, 18 Jun 2019 at 22:34, Tomas Vondra  wrote:


One slightly inconvenient thing I realized while playing with the
address data set is that it's somewhat difficult to set the desired size
of the multi-column MCV list.

At the moment, we simply use the maximum statistic target for attributes
the MCV list is built on. But that does not allow keeping default size
for per-column stats, and only increase size of multi-column MCV lists.

So I'm thinking we should allow tweaking the statistics for extended
stats, and serialize it in the pg_statistic_ext catalog. Any opinions
why that would be a bad idea?



Seems reasonable to me. This might not be the only option we'll ever
want to add though, so perhaps a "stxoptions text[]" column along the
lines of a relation's reloptions would be the way to go.



I don't know - I kinda dislike the idea of stashing stuff like this into
text[] arrays unless there's a clear need for such flexibility (i.e.
vision to have more such options). Which I'm not sure is the case here.
And we kinda have a precedent in pg_attribute.attstattarget, so I'd use
the same approach here.


I suppose it should be part of the CREATE STATISTICS command, but I'm
not sure what'd be the best syntax. We might also have something more
similar to ALTER COLUMNT, but perhaps

 ALTER STATISTICS s SET STATISTICS 1000;

looks a bit too weird.



Yes it does look a bit weird, but that's the natural generalisation of
what we have for per-column statistics, so it's probably preferable to
do that rather than invent some other syntax that wouldn't be so
consistent.



Yeah, I agree.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services