Re: PoC/WIP: Extended statistics on expressions

2021-09-19 Thread Tomas Vondra
On 9/3/21 5:56 AM, Justin Pryzby wrote: On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote: However while polishing the second patch, I realized we're allowing statistics on expressions referencing system attributes. So this fails; CREATE STATISTICS s ON ctid, x FROM t; but this

Re: PoC/WIP: Extended statistics on expressions

2021-09-02 Thread Justin Pryzby
On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote: > However while polishing the second patch, I realized we're allowing > statistics on expressions referencing system attributes. So this fails; > > CREATE STATISTICS s ON ctid, x FROM t; > > but this passes: > > CREATE STATISTICS s

Re: PoC/WIP: Extended statistics on expressions

2021-09-01 Thread Tomas Vondra
On 9/1/21 9:38 PM, Justin Pryzby wrote: On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote: Patch 0001 fixes the "double parens" issue discussed elsewhere in this thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple column reference. 0002 refuses to create

Re: PoC/WIP: Extended statistics on expressions

2021-09-01 Thread Justin Pryzby
On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote: > > > Patch 0001 fixes the "double parens" issue discussed elsewhere in this > > > thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple > > > column reference. > > > > 0002 refuses to create expressional stats on

Re: PoC/WIP: Extended statistics on expressions

2021-09-01 Thread Tomas Vondra
On 8/24/21 3:13 PM, Justin Pryzby wrote: On Mon, Aug 16, 2021 at 05:41:57PM +0200, Tomas Vondra wrote: This still seems odd: postgres=# CREATE STATISTICS asf ON i FROM t; ERROR: extended statistics require at least 2 columns postgres=# CREATE STATISTICS asf ON (i) FROM t; CREATE STATISTICS

Re: PoC/WIP: Extended statistics on expressions

2021-08-24 Thread Justin Pryzby
On Mon, Aug 16, 2021 at 05:41:57PM +0200, Tomas Vondra wrote: > > This still seems odd: > > > > postgres=# CREATE STATISTICS asf ON i FROM t; > > ERROR: extended statistics require at least 2 columns > > postgres=# CREATE STATISTICS asf ON (i) FROM t; > > CREATE STATISTICS > > > > It seems

Re: PoC/WIP: Extended statistics on expressions

2021-08-18 Thread Tomas Vondra
On 8/18/21 5:07 AM, Justin Pryzby wrote: Patch 0001 fixes the "double parens" issue discussed elsewhere in this thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple column reference. From: Tomas Vondra Date: Mon, 16 Aug 2021 17:19:33 +0200 Subject: [PATCH 2/2] fix:

Re: PoC/WIP: Extended statistics on expressions

2021-08-17 Thread Justin Pryzby
> Patch 0001 fixes the "double parens" issue discussed elsewhere in this > thread, and patch 0002 tweaks CREATE STATISTICS to treat "(a)" as a simple > column reference. > From: Tomas Vondra > Date: Mon, 16 Aug 2021 17:19:33 +0200 > Subject: [PATCH 2/2] fix: identify single-attribute references

Re: PoC/WIP: Extended statistics on expressions

2021-08-16 Thread Tomas Vondra
On 8/16/21 3:32 AM, Justin Pryzby wrote: On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote: Looking at the current behaviour, there are a couple of things that seem a little odd, even though they are understandable. For example, the fact that CREATE STATISTICS s (expressions)

Re: PoC/WIP: Extended statistics on expressions

2021-08-16 Thread Tomas Vondra
On 8/16/21 3:31 AM, Justin Pryzby wrote: On 1/22/21 5:01 AM, Justin Pryzby wrote: In any case, why are there so many parentheses ? On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote: That's a bug in pg_get_statisticsobj_worker, probably. It shouldn't be adding extra parentheses,

Re: PoC/WIP: Extended statistics on expressions

2021-08-15 Thread Justin Pryzby
On Mon, Dec 07, 2020 at 03:15:17PM +0100, Tomas Vondra wrote: > > Looking at the current behaviour, there are a couple of things that > > seem a little odd, even though they are understandable. For example, > > the fact that > > > > CREATE STATISTICS s (expressions) ON (expr), col FROM tbl; > >

Re: PoC/WIP: Extended statistics on expressions

2021-08-15 Thread Justin Pryzby
On 1/22/21 5:01 AM, Justin Pryzby wrote: > > In any case, why are there so many parentheses ? On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote: > That's a bug in pg_get_statisticsobj_worker, probably. It shouldn't be > adding extra parentheses, on top of what deparse_expression_pretty

Re: PoC/WIP: Extended statistics on expressions

2021-06-11 Thread Tomas Vondra
On 6/11/21 6:55 AM, Noah Misch wrote: On Sun, Jun 06, 2021 at 09:13:17PM +0200, Tomas Vondra wrote: On 6/6/21 7:37 AM, Noah Misch wrote: On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote: OK, pushed after a bit more polishing and testing. This added a "transformed" field to

Re: PoC/WIP: Extended statistics on expressions

2021-06-10 Thread Noah Misch
On Sun, Jun 06, 2021 at 09:13:17PM +0200, Tomas Vondra wrote: > > On 6/6/21 7:37 AM, Noah Misch wrote: > > On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote: > >> OK, pushed after a bit more polishing and testing. > > > > This added a "transformed" field to CreateStatsStmt, but it

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tom Lane
Tomas Vondra writes: > On 6/6/21 9:17 PM, Tom Lane wrote: >> I'm curious about how come the buildfarm didn't notice this. The >> animals using COPY_PARSE_PLAN_TREES should have failed. The fact >> that they didn't implies that there's no test case that makes use >> of a nonzero value for this

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tomas Vondra
On 6/6/21 9:17 PM, Tom Lane wrote: > Tomas Vondra writes: >> On 6/6/21 7:37 AM, Noah Misch wrote: >>> This added a "transformed" field to CreateStatsStmt, but it didn't mention >>> that field in src/backend/nodes. Should those functions handle the field? > >> Yup, that's a mistake - it

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tom Lane
Tomas Vondra writes: > On 6/6/21 7:37 AM, Noah Misch wrote: >> This added a "transformed" field to CreateStatsStmt, but it didn't mention >> that field in src/backend/nodes. Should those functions handle the field? > Yup, that's a mistake - it should do whatever CREATE INDEX is doing. Not >

Re: PoC/WIP: Extended statistics on expressions

2021-06-06 Thread Tomas Vondra
On 6/6/21 7:37 AM, Noah Misch wrote: > On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote: >> OK, pushed after a bit more polishing and testing. > > This added a "transformed" field to CreateStatsStmt, but it didn't mention > that field in src/backend/nodes. Should those functions

Re: PoC/WIP: Extended statistics on expressions

2021-06-05 Thread Noah Misch
On Sat, Mar 27, 2021 at 01:17:14AM +0100, Tomas Vondra wrote: > OK, pushed after a bit more polishing and testing. This added a "transformed" field to CreateStatsStmt, but it didn't mention that field in src/backend/nodes. Should those functions handle the field?

Re: PoC/WIP: Extended statistics on expressions (\d in old client)

2021-06-02 Thread Justin Pryzby
On Fri, Jan 22, 2021 at 02:09:04PM +0100, Tomas Vondra wrote: > On 1/22/21 5:01 AM, Justin Pryzby wrote: > > On Fri, Jan 22, 2021 at 04:49:51AM +0100, Tomas Vondra wrote: > > > > > | Statistics objects: > > > > > | "public"."s2" (ndistinct, dependencies, mcv) ON FROM t > > > > > > Umm, for

Re: PoC/WIP: Extended statistics on expressions

2021-04-22 Thread Justin Pryzby
I suggest to add some kind of reference to stats expressions here. --- a/doc/src/sgml/maintenance.sgml +++ b/doc/src/sgml/maintenance.sgml Updating Planner Statistics statistics of the planner [...] @@ -330,10 +330,12 @@ Also, by default there is limited

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra
On 3/27/21 1:17 AM, Tomas Vondra wrote: > On 3/26/21 1:54 PM, Tomas Vondra wrote: >> >> >> On 3/26/21 12:37 PM, Dean Rasheed wrote: >>> On Thu, 25 Mar 2021 at 19:59, Tomas Vondra >>> wrote: Attached is an updated patch series, with all the changes discussed here. I've cleaned up

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra
On 3/26/21 1:54 PM, Tomas Vondra wrote: > > > On 3/26/21 12:37 PM, Dean Rasheed wrote: >> On Thu, 25 Mar 2021 at 19:59, Tomas Vondra >> wrote: >>> >>> Attached is an updated patch series, with all the changes discussed >>> here. I've cleaned up the ndistinct stuff a bit more (essentially >>>

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Tomas Vondra
On 3/26/21 12:37 PM, Dean Rasheed wrote: > On Thu, 25 Mar 2021 at 19:59, Tomas Vondra > wrote: >> >> Attached is an updated patch series, with all the changes discussed >> here. I've cleaned up the ndistinct stuff a bit more (essentially >> reverting back from GroupExprInfo to GroupVarInfo

Re: PoC/WIP: Extended statistics on expressions

2021-03-26 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 19:59, Tomas Vondra wrote: > > Attached is an updated patch series, with all the changes discussed > here. I've cleaned up the ndistinct stuff a bit more (essentially > reverting back from GroupExprInfo to GroupVarInfo name), and got rid of > the

Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > here's an updated patch. 0001 The change to the way that CreateStatistics() records dependencies isn't quite right -- recordDependencyOnSingleRelExpr() will not create any dependencies if the expression uses only a whole-row Var. However,

Re: PoC/WIP: Extended statistics on expressions

2021-03-25 Thread Dean Rasheed
On Thu, 25 Mar 2021 at 00:05, Tomas Vondra wrote: > > Actually, I think we need that block at all - there's no point in > keeping the exact expression, because if there was a statistics matching > it it'd be matched by the examine_variable. So if we get here, we have > to just split it into the

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Tomas Vondra
On 3/25/21 1:05 AM, Tomas Vondra wrote: > ... > > 0002 is an attempt to fix an issue I noticed today - we need to handle > type changes. Until now we did not have problems with that, because we > only had attnums - so we just reset the statistics (with the exception > of functional dependencies,

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
On Thu, Mar 25, 2021 at 01:05:37AM +0100, Tomas Vondra wrote: > here's an updated patch. 0001 should address most of the today's review > items regarding comments etc. This is still an issue: postgres=# CREATE STATISTICS xt ON a FROM t JOIN t ON true; ERROR: schema "i" does not exist --

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Alvaro Herrera
On 2021-Mar-24, Justin Pryzby wrote: > On Wed, Mar 24, 2021 at 05:15:46PM +, Dean Rasheed wrote: > > Hmm, I think "univariate" and "multivariate" are pretty ubiquitous, > > when used to describe statistics. You could use "single-column" and > > "multi-column", but then "column" isn't really

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 05:15:46PM +, Dean Rasheed wrote: > On Wed, 24 Mar 2021 at 16:48, Tomas Vondra > wrote: > > > > As for the changes proposed in the create_statistics, do we really want > > to use univariate / multivariate there? Yes, the terms are correct, but > > I'm not sure how many

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 16:48, Tomas Vondra wrote: > > As for the changes proposed in the create_statistics, do we really want > to use univariate / multivariate there? Yes, the terms are correct, but > I'm not sure how many people looking at CREATE STATISTICS will > understand them. > Hmm, I

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Tomas Vondra
On 3/24/21 5:28 PM, Dean Rasheed wrote: > On Wed, 24 Mar 2021 at 14:48, Tomas Vondra > wrote: >> >> AFAIK the primary issue here is that the two places disagree. While >> estimate_num_groups does this >> >> varnos = pull_varnos(root, (Node *) varshere); >> if (bms_membership(varnos) ==

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 01:24:46AM -0500, Justin Pryzby wrote: > It seems like you're preferring to use pluralized "statistics" in a lot of > places that sound wrong to me. For example: > > Currently the first statistics wins, which seems silly. > I can write more separately, but I think this is

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Tomas Vondra
On 3/24/21 7:24 AM, Justin Pryzby wrote: > Most importantly, it looks like this forgets to update catalog documentation > for stxexprs and stxkind='e' > Good catch. > It seems like you're preferring to use pluralized "statistics" in a lot of > places that sound wrong to me. For example: >>

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 14:48, Tomas Vondra wrote: > > AFAIK the primary issue here is that the two places disagree. While > estimate_num_groups does this > > varnos = pull_varnos(root, (Node *) varshere); > if (bms_membership(varnos) == BMS_SINGLETON) > { ... } > > the

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Tomas Vondra
On 3/24/21 2:36 PM, Dean Rasheed wrote: > On Wed, 24 Mar 2021 at 10:22, Tomas Vondra > wrote: >> >> Thanks, it seems to be some thinko in handling in PlaceHolderVars, which >> seem to break the code's assumptions about varnos. This fixes it for me, >> but I need to look at it more closely. >> >

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Dean Rasheed
On Wed, 24 Mar 2021 at 10:22, Tomas Vondra wrote: > > Thanks, it seems to be some thinko in handling in PlaceHolderVars, which > seem to break the code's assumptions about varnos. This fixes it for me, > but I need to look at it more closely. > I think that makes sense. Reviewing the docs, I

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
On Wed, Mar 24, 2021 at 09:54:22AM +0100, Tomas Vondra wrote: > Hi Justin, > > Unfortunately the query is incomplete, so I can't quite determine what > went wrong. Can you extract the full query causing the crash, either > from the server log or from a core file? Oh, shoot, I didn't realize it

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Tomas Vondra
Hi Justin, Unfortunately the query is incomplete, so I can't quite determine what went wrong. Can you extract the full query causing the crash, either from the server log or from a core file? thanks On 3/24/21 9:14 AM, Justin Pryzby wrote: > I got this crash running sqlsmith: > > #1

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
I got this crash running sqlsmith: #1 0x7f907574b801 in __GI_abort () at abort.c:79 #2 0x5646b95a35f8 in ExceptionalCondition (conditionName=conditionName@entry=0x5646b97411db "bms_num_members(varnos) == 1", errorType=errorType@entry=0x5646b95fa00b "FailedAssertion",

Re: PoC/WIP: Extended statistics on expressions

2021-03-24 Thread Justin Pryzby
Most importantly, it looks like this forgets to update catalog documentation for stxexprs and stxkind='e' It seems like you're preferring to use pluralized "statistics" in a lot of places that sound wrong to me. For example: > Currently the first statistics wins, which seems silly. I can write

Re: PoC/WIP: Extended statistics on expressions

2021-03-18 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 21:31, Tomas Vondra wrote: > > I agree applying at least the [(a+b),c] stats is probably the right > approach, as it means we're considering at least the available > information about dependence between the columns. > > I think to improve this, we'll need to teach the code

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Tomas Vondra
On 3/17/21 9:58 PM, Dean Rasheed wrote: > On Wed, 17 Mar 2021 at 20:48, Dean Rasheed wrote: >> >> For reference, here is the test case I was using (which isn't really very >> good for >> catching dependence between columns): >> > > And here's a test case with much more dependence between the

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 20:48, Dean Rasheed wrote: > > For reference, here is the test case I was using (which isn't really very > good for > catching dependence between columns): > And here's a test case with much more dependence between the columns: DROP TABLE IF EXISTS foo; CREATE TABLE foo

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 19:07, Tomas Vondra wrote: > > On 3/17/21 7:54 PM, Dean Rasheed wrote: > > > > it might have been better to estimate the first case as > > > > ndistinct((a+b)) * ndistinct(c) * ndistinct(d) > > > > and the second case as > > > > ndistinct((a+b)) * ndistinct((c+d))

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Tomas Vondra
On 3/17/21 7:54 PM, Dean Rasheed wrote: > On Wed, 17 Mar 2021 at 17:26, Tomas Vondra > wrote: >> >> My concern is that the current behavior (where we prefer expression >> stats over multi-column stats to some extent) works fine as long as the >> parts are independent, but once there's

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Wed, 17 Mar 2021 at 17:26, Tomas Vondra wrote: > > My concern is that the current behavior (where we prefer expression > stats over multi-column stats to some extent) works fine as long as the > parts are independent, but once there's dependency it's probably more > likely to produce

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Tomas Vondra
Hi, On 3/17/21 4:55 PM, Dean Rasheed wrote: > On Sun, 7 Mar 2021 at 21:10, Tomas Vondra > wrote: >> >> 2) ndistinct >> >> There's one thing that's bugging me, in how we handle "partial" matches. >> For each expression we track both the original expression and the Vars >> we extract from it. If

Re: PoC/WIP: Extended statistics on expressions

2021-03-17 Thread Dean Rasheed
On Sun, 7 Mar 2021 at 21:10, Tomas Vondra wrote: > > 2) ndistinct > > There's one thing that's bugging me, in how we handle "partial" matches. > For each expression we track both the original expression and the Vars > we extract from it. If we can't find a statistics matching the whole >

Re: PoC/WIP: Extended statistics on expressions

2021-03-05 Thread Dean Rasheed
On Thu, 4 Mar 2021 at 22:16, Tomas Vondra wrote: > > Attached is a slightly improved version of the patch series, addressing > most of the issues raised in the previous message. Cool. Sorry for the delay replying. > 0003-Extended-statistics-on-expressions-20210304.patch > > Mostly unchanged,

Re: PoC/WIP: Extended statistics on expressions

2021-03-04 Thread Tomas Vondra
On 3/5/21 1:43 AM, Justin Pryzby wrote: > On Mon, Jan 04, 2021 at 09:45:24AM -0600, Justin Pryzby wrote: >> On Mon, Jan 04, 2021 at 03:34:08PM +, Dean Rasheed wrote: >>> * I'm not sure I understand the need for 0001. Wasn't there an earlier >>> version of this patch that just did it by

Re: PoC/WIP: Extended statistics on expressions

2021-03-04 Thread Justin Pryzby
On Mon, Jan 04, 2021 at 09:45:24AM -0600, Justin Pryzby wrote: > On Mon, Jan 04, 2021 at 03:34:08PM +, Dean Rasheed wrote: > > * I'm not sure I understand the need for 0001. Wasn't there an earlier > > version of this patch that just did it by re-populating the type > > array, but which still

Re: PoC/WIP: Extended statistics on expressions

2021-02-16 Thread Tomas Vondra
On 1/27/21 12:02 PM, Dean Rasheed wrote: > On Fri, 22 Jan 2021 at 03:49, Tomas Vondra > wrote: >> >> Whooops. A fixed version attached. >> > > The change to pg_stats_ext_exprs isn't quite right, because now it > cross joins expressions and their stats, which leads to too many rows, > with the

Re: PoC/WIP: Extended statistics on expressions

2021-01-22 Thread Tomas Vondra
On 1/22/21 5:01 AM, Justin Pryzby wrote: On Fri, Jan 22, 2021 at 04:49:51AM +0100, Tomas Vondra wrote: | Statistics objects: | "public"."s2" (ndistinct, dependencies, mcv) ON FROM t Umm, for me that prints: "public"."s2" ON ((i + 1)), (((i + 1) + 0)) FROM t which I think is

Re: PoC/WIP: Extended statistics on expressions

2021-01-22 Thread Tomas Vondra
On 1/22/21 10:00 AM, Dean Rasheed wrote: On Fri, 22 Jan 2021 at 04:46, Justin Pryzby wrote: I think you'll maybe have to do something better - this seems a bit too weird: | postgres=# CREATE STATISTICS s2 ON (i+1) ,i FROM t; | postgres=# \d t | ... | "public"."s2" (ndistinct,

Re: PoC/WIP: Extended statistics on expressions

2021-01-22 Thread Dean Rasheed
On Fri, 22 Jan 2021 at 04:46, Justin Pryzby wrote: > > I think you'll maybe have to do something better - this seems a bit too weird: > > | postgres=# CREATE STATISTICS s2 ON (i+1) ,i FROM t; > | postgres=# \d t > | ... > | "public"."s2" (ndistinct, dependencies, mcv) ON i FROM t > I guess

Re: PoC/WIP: Extended statistics on expressions

2021-01-21 Thread Justin Pryzby
On Thu, Jan 21, 2021 at 10:01:01PM -0600, Justin Pryzby wrote: > On Fri, Jan 22, 2021 at 04:49:51AM +0100, Tomas Vondra wrote: > > > > | Statistics objects: > > > > | "public"."s2" (ndistinct, dependencies, mcv) ON FROM t > > > > Umm, for me that prints: > > > "public"."s2" ON ((i +

Re: PoC/WIP: Extended statistics on expressions

2021-01-21 Thread Justin Pryzby
On Fri, Jan 22, 2021 at 04:49:51AM +0100, Tomas Vondra wrote: > > > | Statistics objects: > > > | "public"."s2" (ndistinct, dependencies, mcv) ON FROM t > > Umm, for me that prints: > "public"."s2" ON ((i + 1)), (((i + 1) + 0)) FROM t > > which I think is OK. But maybe there's

Re: PoC/WIP: Extended statistics on expressions

2021-01-21 Thread Justin Pryzby
This already needs to be rebased on 55dc86eca. And needs to update rules.out. And doesn't address this one: On Sun, Jan 17, 2021 at 10:53:31PM -0600, Justin Pryzby wrote: > | postgres=# CREATE TABLE t(i int); > | postgres=# CREATE STATISTICS s2 ON (i+1) ,(i+1+0) FROM t; > | postgres=# \d t > |

Re: PoC/WIP: Extended statistics on expressions

2021-01-21 Thread Dean Rasheed
On Tue, 19 Jan 2021 at 01:57, Tomas Vondra wrote: > > > A slightly bigger issue that I don't like is the way it assigns > > attribute numbers for expressions starting from > > MaxHeapAttributeNumber+1, so the first expression has an attnum of > > 1601. That leads to pretty inefficient use of

Re: PoC/WIP: Extended statistics on expressions

2021-01-18 Thread Tomas Vondra
On 1/18/21 4:48 PM, Dean Rasheed wrote: Looking through extended_stats.c, I found a corner case that can lead to a seg-fault: CREATE TABLE foo(); CREATE STATISTICS s ON (1) FROM foo; ANALYSE foo; This crashes in lookup_var_attr_stats(), because it isn't expecting nvacatts to be 0. I can't

Re: PoC/WIP: Extended statistics on expressions

2021-01-18 Thread Dean Rasheed
Looking through extended_stats.c, I found a corner case that can lead to a seg-fault: CREATE TABLE foo(); CREATE STATISTICS s ON (1) FROM foo; ANALYSE foo; This crashes in lookup_var_attr_stats(), because it isn't expecting nvacatts to be 0. I can't think of any case where building stats on a

Re: PoC/WIP: Extended statistics on expressions

2021-01-17 Thread Justin Pryzby
On Sun, Jan 17, 2021 at 01:23:39AM +0100, Tomas Vondra wrote: > > CREATE TABLE t AS SELECT generate_series(1,9) AS i; > > CREATE STATISTICS s ON (i+1) ,(i+1+0) FROM t; > > ANALYZE t; > > SELECT i+1 FROM t GROUP BY 1; > > ERROR: corrupt MVNDistinct entry > > Thanks. There was a thinko in

Re: PoC/WIP: Extended statistics on expressions

2021-01-17 Thread Tomas Vondra
On 1/17/21 6:29 AM, Justin Pryzby wrote: On Sun, Jan 17, 2021 at 01:23:39AM +0100, Tomas Vondra wrote: diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml index 4363be50c3..5b8eb8d248 100644 --- a/doc/src/sgml/ref/create_statistics.sgml +++

Re: PoC/WIP: Extended statistics on expressions

2021-01-17 Thread Tomas Vondra
On 1/17/21 3:55 AM, Zhihong Yu wrote: Hi, +    * Check that only the base rel is mentioned.  (This should be dead code +    * now that add_missing_from is history.) +    */ +   if (list_length(pstate->p_rtable) != 1) If it is dead code, it can be removed, right ? Maybe. The question is

Re: PoC/WIP: Extended statistics on expressions

2021-01-16 Thread Justin Pryzby
On Sun, Jan 17, 2021 at 01:23:39AM +0100, Tomas Vondra wrote: > diff --git a/doc/src/sgml/ref/create_statistics.sgml > b/doc/src/sgml/ref/create_statistics.sgml > index 4363be50c3..5b8eb8d248 100644 > --- a/doc/src/sgml/ref/create_statistics.sgml > +++ b/doc/src/sgml/ref/create_statistics.sgml >

Re: PoC/WIP: Extended statistics on expressions

2021-01-16 Thread Zhihong Yu
Hi, +* Check that only the base rel is mentioned. (This should be dead code +* now that add_missing_from is history.) +*/ + if (list_length(pstate->p_rtable) != 1) If it is dead code, it can be removed, right ? For statext_mcv_clauselist_selectivity: + //

Re: PoC/WIP: Extended statistics on expressions

2021-01-16 Thread Justin Pryzby
On Sat, Jan 16, 2021 at 05:48:43PM +0100, Tomas Vondra wrote: > + > + expr text > + > + > + Expression the extended statistics is defined on > + Expression the extended statistics ARE defined on Or maybe say "on which the extended statistics are defined" > +

Re: PoC/WIP: Extended statistics on expressions

2021-01-08 Thread Tomas Vondra
On 1/8/21 3:35 AM, Justin Pryzby wrote: > On Fri, Jan 08, 2021 at 01:57:29AM +0100, Tomas Vondra wrote: >> Attached is a patch fixing most of the issues. There are a couple >> exceptions: > > In the docs: > > ... > Thanks! Checking the docs and comments is a tedious work, I appreciate you going

Re: PoC/WIP: Extended statistics on expressions

2021-01-07 Thread Justin Pryzby
On Fri, Jan 08, 2021 at 01:57:29AM +0100, Tomas Vondra wrote: > Attached is a patch fixing most of the issues. There are a couple > exceptions: In the docs: +at the cost that its schema must be extended whenever the structure

Re: PoC/WIP: Extended statistics on expressions

2021-01-07 Thread Dean Rasheed
Starting to look at the planner code, I found an oversight in the way expression stats are read at the start of planning -- it is necessary to call ChangeVarNodes() on any expressions if the relid isn't 1, otherwise the stats expressions may contain Var nodes referring to the wrong relation.

Re: PoC/WIP: Extended statistics on expressions

2021-01-06 Thread Dean Rasheed
Looking over the statscmds.c changes, there are a few XXX's and FIXME's that need resolving, and I had a couple of other minor comments: + /* +* An expression using mutable functions is probably wrong, +* since if you aren't going to get the same result for the +

Re: PoC/WIP: Extended statistics on expressions

2021-01-05 Thread Tomas Vondra
On 1/5/21 3:10 PM, Dean Rasheed wrote: On Tue, 5 Jan 2021 at 00:45, Tomas Vondra wrote: On 1/4/21 4:34 PM, Dean Rasheed wrote: * In src/bin/psql/describe.c, I think the \d output should also exclude the "expressions" stats kind and just list the other kinds (or have no kinds list at all,

Re: PoC/WIP: Extended statistics on expressions

2021-01-05 Thread Dean Rasheed
On Tue, 5 Jan 2021 at 00:45, Tomas Vondra wrote: > > On 1/4/21 4:34 PM, Dean Rasheed wrote: > > > > * In src/bin/psql/describe.c, I think the \d output should also > > exclude the "expressions" stats kind and just list the other kinds (or > > have no kinds list at all, if there are no other

Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Tomas Vondra
On 1/4/21 4:34 PM, Dean Rasheed wrote: ... Some other comments: * I'm not sure I understand the need for 0001. Wasn't there an earlier version of this patch that just did it by re-populating the type array, but which still had it as an array rather than turning it into a list? Making it a

Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Justin Pryzby
On Mon, Jan 04, 2021 at 03:34:08PM +, Dean Rasheed wrote: > * I'm not sure I understand the need for 0001. Wasn't there an earlier > version of this patch that just did it by re-populating the type > array, but which still had it as an array rather than turning it into > a list? Making it a

Re: PoC/WIP: Extended statistics on expressions

2021-01-04 Thread Dean Rasheed
On Fri, 11 Dec 2020 at 20:17, Tomas Vondra wrote: > > OK. Attached is an updated version, reworking it this way. Cool. I think this is an exciting development, so I hope it makes it into the next release. I have started looking at it. So far I have only looked at the catalog, parser and client

Re: PoC/WIP: Extended statistics on expressions

2020-12-11 Thread Dean Rasheed
On Tue, 8 Dec 2020 at 12:44, Tomas Vondra wrote: > > Possibly. But I don't think it's worth the extra complexity. I don't > expect people to have a lot of overlapping stats, so the amount of > wasted space and CPU time is expected to be fairly limited. > > So I don't think it's worth spending too

Re: PoC/WIP: Extended statistics on expressions

2020-12-08 Thread Tomas Vondra
On 12/7/20 5:02 PM, Dean Rasheed wrote: > On Mon, 7 Dec 2020 at 14:15, Tomas Vondra > wrote: >> >> On 12/7/20 10:56 AM, Dean Rasheed wrote: >>> it might actually be >>> neater to have separate documented syntaxes for single- and >>> multi-column statistics: >>> >>> CREATE STATISTICS [ IF

Re: PoC/WIP: Extended statistics on expressions

2020-12-07 Thread Dean Rasheed
On Mon, 7 Dec 2020 at 14:15, Tomas Vondra wrote: > > On 12/7/20 10:56 AM, Dean Rasheed wrote: > > it might actually be > > neater to have separate documented syntaxes for single- and > > multi-column statistics: > > > > CREATE STATISTICS [ IF NOT EXISTS ] statistics_name > > ON (expression)

Re: PoC/WIP: Extended statistics on expressions

2020-12-07 Thread Tomas Vondra
On 12/7/20 10:56 AM, Dean Rasheed wrote: > On Thu, 3 Dec 2020 at 15:23, Tomas Vondra > wrote: >> >> Attached is a patch series rebased on top of 25a9e54d2d. > > After reading this thread and [1], I think I prefer the name > "standard" rather than "expressions", because it is meant to

Re: PoC/WIP: Extended statistics on expressions

2020-12-07 Thread Dean Rasheed
On Thu, 3 Dec 2020 at 15:23, Tomas Vondra wrote: > > Attached is a patch series rebased on top of 25a9e54d2d. After reading this thread and [1], I think I prefer the name "standard" rather than "expressions", because it is meant to describe the kind of statistics being built rather than what

Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Tomas Vondra
On 11/24/20 5:23 PM, Justin Pryzby wrote: > On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote: >> 0004 - Seems fine. IMHO not really "silly errors" but OK. > > This is one of the same issues you pointed out - shadowing a variable. > Could be backpatched. > > On Mon, Nov 23, 2020 at

Re: PoC/WIP: Extended statistics on expressions

2020-11-24 Thread Justin Pryzby
On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote: > 0004 - Seems fine. IMHO not really "silly errors" but OK. This is one of the same issues you pointed out - shadowing a variable. Could be backpatched. On Mon, Nov 23, 2020 at 04:30:26AM +0100, Tomas Vondra wrote: > > +

Re: PoC/WIP: Extended statistics on expressions

2020-11-22 Thread Tomas Vondra
On 11/23/20 3:26 AM, Justin Pryzby wrote: > On Sun, Nov 22, 2020 at 08:03:51PM +0100, Tomas Vondra wrote: >> attached is a significantly improved version of the patch, allowing >> defining extended statistics on expressions. This fixes most of the >> problems in the previous WIP version and

Re: PoC/WIP: Extended statistics on expressions

2020-11-16 Thread Tomas Vondra
On 11/16/20 2:49 PM, Tomas Vondra wrote: > Hi, > > ... > > 4) apply the statistics > >This is the hard part, really, and the exact state of the support >depends on type of statistics. > >For ndistinct coefficients, it generally works. I'm sure there may be >bugs in