Re: using extended statistics to improve join estimates

2024-05-22 Thread Andrei Lepikhov
On 5/23/24 09:04, Andy Fan wrote: Andrei Lepikhov writes: * c) No extended stats with MCV. If there are multiple join clauses, * we can try using ndistinct coefficients and do what eqjoinsel does. OK, I didn't pay enough attention to this comment before. and yes, I get the same conclusion as
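For reference, here is a minimal sketch of the ndistinct-based formula that eqjoinsel falls back to when neither side has MCV lists. It ignores the clamping and other details of the real code. NULL rows never match, and the remaining rows are assumed to match 1/max(nd1, nd2) of the other side. Applying this to multiple join clauses would mean feeding in ndistinct estimates for the whole column combination, for example from an extended ndistinct statistic. That use, and the function name below, are assumptions for illustration, not code from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: selectivity of an equality join clause estimated
 * from ndistinct only (no MCV lists on either side), in the spirit of the
 * eqjoinsel fallback. For a multi-column join, nd1/nd2 would be assumed
 * ndistinct estimates for the whole column combination.
 */
double
ndistinct_join_selectivity(double nd1, double nullfrac1,
                           double nd2, double nullfrac2)
{
    double sel;

    assert(nd1 >= 1.0 && nd2 >= 1.0);

    /* only non-NULL values can satisfy an equality join clause */
    sel = (1.0 - nullfrac1) * (1.0 - nullfrac2);

    /*
     * assume the smaller set of distinct values is contained in the
     * larger one, so divide by the larger ndistinct
     */
    sel /= (nd1 > nd2) ? nd1 : nd2;

    return sel;
}
```

For example, with 4 and 2 distinct values and no NULLs the estimate is 1/4.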

Re: using extended statistics to improve join estimates

2024-05-22 Thread Andy Fan
Andrei Lepikhov writes: > On 20/5/2024 15:52, Andy Fan wrote: >> Hi Andrei, >> >>> On 4/3/24 01:22, Tomas Vondra wrote: Cool! There's obviously no chance to get this into v18, and I have stuff to do in this CF. But I'll take a look after that. >>> I'm looking at your patch now - an

Re: using extended statistics to improve join estimates

2024-05-21 Thread Andrei Lepikhov
On 5/20/24 16:40, Andrei Lepikhov wrote: On 20/5/2024 15:52, Andy Fan wrote: +    if (clauselist_selectivity_hook) +    *return* clauselist_selectivity_hook(root, clauses, ..) Of course - library may estimate not all the clauses - it is a reason, why I added input/output parameter
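The input/output parameter idea can be sketched under a simplified model in which every clause already has its own selectivity. The hook estimates whatever subset of clauses it can and marks those in a bitmap. The core code then multiplies in the remaining clauses under the independence assumption. The hook type, the bitmap and all functions here are hypothetical. They do not reflect the actual clauselist_selectivity_hook signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hook: returns the combined selectivity of the clauses it
 * handled and sets their bits in *estimated (input/output bitmap, one bit
 * per clause index; bits already set are left alone).
 */
typedef double (*selectivity_hook_type) (const double *clause_sel,
                                         int nclauses,
                                         uint64_t *estimated);

selectivity_hook_type selectivity_hook = NULL;

/* Example hook: treats clauses 0 and 1 as fully correlated. */
double
example_hook(const double *clause_sel, int nclauses, uint64_t *estimated)
{
    if (nclauses < 2)
        return 1.0;             /* nothing handled, no bits set */

    *estimated |= 3;            /* mark clauses 0 and 1 as estimated */
    return clause_sel[0] < clause_sel[1] ? clause_sel[0] : clause_sel[1];
}

double
combined_selectivity(const double *clause_sel, int nclauses)
{
    uint64_t estimated = 0;
    double   sel = 1.0;

    assert(nclauses >= 0 && nclauses <= 64);

    /* let the extension estimate whatever it can */
    if (selectivity_hook)
        sel = selectivity_hook(clause_sel, nclauses, &estimated);

    /* independence fallback for the clauses the hook did not mark */
    for (int i = 0; i < nclauses; i++)
        if (!(estimated & ((uint64_t) 1 << i)))
            sel *= clause_sel[i];

    return sel;
}
```

In this model, returning the hook's result directly (as in the quoted diff) would skip the fallback loop. Any clause the hook ignores would then silently get selectivity 1.0.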

Re: using extended statistics to improve join estimates

2024-05-20 Thread Andrei Lepikhov
On 20/5/2024 15:52, Andy Fan wrote: Hi Andrei, On 4/3/24 01:22, Tomas Vondra wrote: Cool! There's obviously no chance to get this into v18, and I have stuff to do in this CF. But I'll take a look after that. I'm looking at your patch now - an excellent start to an eagerly awaited feature! A

Re: using extended statistics to improve join estimates

2024-05-20 Thread Andy Fan
Hi Andrei, > On 4/3/24 01:22, Tomas Vondra wrote: >> Cool! There's obviously no chance to get this into v18, and I have stuff >> to do in this CF. But I'll take a look after that. > I'm looking at your patch now - an excellent start to an eagerly awaited > feature! > A couple of questions: > 1.

Re: using extended statistics to improve join estimates

2024-05-20 Thread Andrei Lepikhov
On 4/3/24 01:22, Tomas Vondra wrote: Cool! There's obviously no chance to get this into v18, and I have stuff to do in this CF. But I'll take a look after that. I'm looking at your patch now - an excellent start to an eagerly awaited feature! A couple of questions: 1. I didn't find the

Re: using extended statistics to improve join estimates

2024-04-30 Thread Andy Fan
Hello Justin, Thanks for showing interest on this! > On Sun, Apr 28, 2024 at 10:07:01AM +0800, Andy Fan wrote: >> 's/estimiatedcluases/estimatedclauses/' typo error in the >> commit message is not fixed since I have to regenerate all the commits > > Maybe you know this, but some of these

Re: using extended statistics to improve join estimates

2024-04-29 Thread Justin Pryzby
On Sun, Apr 28, 2024 at 10:07:01AM +0800, Andy Fan wrote: > 's/estimiatedcluases/estimatedclauses/' typo error in the > commit message is not fixed since I have to regenerate all the commits Maybe you know this, but some of these patches need to be squashed. Regenerating the patches to address

Re: using extended statistics to improve join estimates

2024-04-27 Thread Andy Fan
Hello Justin! Justin Pryzby writes: > |../src/backend/statistics/extended_stats.c:3151:36: warning: ‘relid’ may be used uninitialized [-Wmaybe-uninitialized] > | 3151 | if (var->varno != relid)

Re: using extended statistics to improve join estimates

2024-04-12 Thread Justin Pryzby
On Tue, Apr 02, 2024 at 04:23:45PM +0800, Andy Fan wrote: > > 0001 is your patch, I just rebase them against the current master. 0006 > is not much relevant with current patch, and I think it can be committed > individually if you are OK with that. Your 002 should also remove listidx to avoid

Re: using extended statistics to improve join estimates

2024-04-02 Thread Andy Fan
Tomas Vondra writes: > On 4/2/24 10:23, Andy Fan wrote: >> >>> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote: Rebased over 269b532ae and muted compiler warnings. >> >> Thank you Justin for the rebase! >> >> Hello Tomas, >> >> Thanks for the patch! Before I review the

Re: using extended statistics to improve join estimates

2024-04-02 Thread Tomas Vondra
On 4/2/24 10:23, Andy Fan wrote: > >> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote: >>> Rebased over 269b532ae and muted compiler warnings. > > Thank you Justin for the rebase! > > Hello Tomas, > > Thanks for the patch! Before I review the path at the code level, I want > to

Re: using extended statistics to improve join estimates

2024-04-02 Thread Andy Fan
> On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote: >> Rebased over 269b532ae and muted compiler warnings. Thank you Justin for the rebase! Hello Tomas, Thanks for the patch! Before I review the path at the code level, I want to explain my understanding about this patch first.

Re: using extended statistics to improve join estimates

2022-03-02 Thread Justin Pryzby
On Wed, Mar 02, 2022 at 11:38:21AM -0600, Justin Pryzby wrote: > Rebased over 269b532ae and muted compiler warnings. And attached. From 587a5e9fe87c26cdcd9602fc349f092da95cc580 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 13 Dec 2021 14:05:17 +0100 Subject: [PATCH] Estimate joins

Re: using extended statistics to improve join estimates

2022-03-02 Thread Justin Pryzby
On Wed, Jan 19, 2022 at 06:18:09PM +0800, Julien Rouhaud wrote: > On Tue, Jan 04, 2022 at 03:55:50PM -0800, Andres Freund wrote: > > On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote: > > > Here's an updated patch, rebased and fixing a couple typos reported by > > > Justin Pryzby directly. > > > >

Re: using extended statistics to improve join estimates

2022-01-19 Thread Julien Rouhaud
Hi, On Tue, Jan 04, 2022 at 03:55:50PM -0800, Andres Freund wrote: > On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote: > > Here's an updated patch, rebased and fixing a couple typos reported by > > Justin Pryzby directly. > > FWIW, cfbot reports a few compiler warnings: Also the patch doesn't

Re: using extended statistics to improve join estimates

2022-01-04 Thread Andres Freund
On 2022-01-01 18:21:06 +0100, Tomas Vondra wrote: > Here's an updated patch, rebased and fixing a couple typos reported by > Justin Pryzby directly. FWIW, cfbot reports a few compiler warnings: https://cirrus-ci.com/task/6067262669979648?logs=gcc_warning#L505 [18:52:15.132] time make -s

Re: using extended statistics to improve join estimates

2022-01-01 Thread Tomas Vondra
Hi, Here's an updated patch, rebased and fixing a couple typos reported by Justin Pryzby directly. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company From 15d0fa5b565d9ae8b4f333c1d54745397964110d Mon Sep 17 00:00:00 2001 From: Tomas Vondra

Re: using extended statistics to improve join estimates

2021-12-13 Thread Tomas Vondra
On 11/6/21 11:03, Andy Fan wrote: Hi Tomas: This is the exact patch I want, thanks for the patch! Good to hear. On Thu, Oct 7, 2021 at 3:33 AM Tomas Vondra wrote: 3) estimation by join pairs At the moment, the estimates are calculated for pairs of relations, so for example given a

Re: using extended statistics to improve join estimates

2021-12-13 Thread Tomas Vondra
On 11/22/21 02:23, Justin Pryzby wrote: Your regression tests include two errors, which appear to be accidental, and fixing the error shows that this case is being estimated poorly. +-- try combining with single-column (and single-expression) statistics +DROP STATISTICS join_test_2; +ERROR:

Re: using extended statistics to improve join estimates

2021-11-21 Thread Justin Pryzby
Your regression tests include two errors, which appear to be accidental, and fixing the error shows that this case is being estimated poorly. +-- try combining with single-column (and single-expression) statistics +DROP STATISTICS join_test_2; +ERROR: statistics object "join_test_2" does not

Re: using extended statistics to improve join estimates

2021-11-06 Thread Andy Fan
Hi Tomas: This is the exact patch I want, thanks for the patch! On Thu, Oct 7, 2021 at 3:33 AM Tomas Vondra wrote: > 3) estimation by join pairs > > At the moment, the estimates are calculated for pairs of relations, so > for example given a query > > explain analyze > select * from t1

Re: using extended statistics to improve join estimates

2021-10-06 Thread Tomas Vondra
On 10/6/21 23:03, Zhihong Yu wrote: Hi, +       conditions2 = statext_determine_join_restrictions(root, rel, mcv); + +       /* if the new statistics covers more conditions, use it */ +       if (list_length(conditions2) > list_length(conditions1)) +       { +           mcv = stat; It seems
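Here is a minimal sketch of the selection logic in the quoted hunk, under simplified assumptions. Each candidate statistics object carries a precomputed count of the conditions it covers. This count stands in for list_length() of the list returned by statext_determine_join_restrictions. The candidate covering the most conditions wins. The struct and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical candidate: a statistics object and how many conditions it covers. */
typedef struct
{
    const char *name;
    int         ncovered;   /* stands in for list_length(conditions) */
} StatsCandidate;

/*
 * Pick the candidate covering the most conditions; on ties the earlier
 * one is kept. Returns NULL when there are no candidates.
 */
const StatsCandidate *
choose_stats(const StatsCandidate *cands, int ncands)
{
    const StatsCandidate *best = NULL;

    assert(ncands >= 0);

    for (int i = 0; i < ncands; i++)
    {
        /* if the new statistics covers more conditions, use it */
        if (best == NULL || cands[i].ncovered > best->ncovered)
            best = &cands[i];
    }
    return best;
}
```

Each candidate is compared against the current best rather than against a count computed once. This keeps the chosen statistics and the count it is judged by in sync.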

Re: using extended statistics to improve join estimates

2021-10-06 Thread Zhihong Yu
On Wed, Oct 6, 2021 at 12:33 PM Tomas Vondra wrote: > Hi, > > attached is an improved version of this patch, addressing some of the > points mentioned in my last message: > > 1) Adds a couple regression tests, testing various join cases with > expressions, additional conditions, etc. > > 2) Adds

Re: using extended statistics to improve join estimates

2021-10-06 Thread Tomas Vondra
Hi, attached is an improved version of this patch, addressing some of the points mentioned in my last message: 1) Adds a couple regression tests, testing various join cases with expressions, additional conditions, etc. 2) Adds support for expressions, so the join clauses don't need to reference

Re: using extended statistics to improve join estimates

2021-06-14 Thread Tomas Vondra
Hi, Here's a slightly improved / cleaned up version of the PoC patch, removing a bunch of XXX and FIXMEs, adding comments, etc. The approach is sound in principle, I think, although there's still a bunch of things to address: 1) statext_compare_mcvs only really deals with equijoins / inner
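The basic idea of comparing MCV lists for an inner equijoin can be sketched with two-column MCV items holding integer values (an assumption for illustration; NULLs are ignored). Every pair of items whose values match on all join columns contributes freq1 * freq2 to the join selectivity. This illustrates the technique only; it is not statext_compare_mcvs itself. It covers just the rows represented in both MCV lists, and the rest of the data would need a separate estimate.

```c
#include <assert.h>
#include <stdbool.h>

#define NCOLS 2                 /* hypothetical: two join columns */

/* Hypothetical, simplified multi-column MCV item. */
typedef struct
{
    int    values[NCOLS];
    double freq;                /* fraction of the table's rows */
} McvItem;

/* Items match only if every join column is equal. */
static bool
items_match(const McvItem *a, const McvItem *b)
{
    for (int k = 0; k < NCOLS; k++)
        if (a->values[k] != b->values[k])
            return false;
    return true;
}

/* Sum freq1 * freq2 over all matching pairs of MCV items. */
double
mcv_equijoin_selectivity(const McvItem *mcv1, int n1,
                         const McvItem *mcv2, int n2)
{
    double sel = 0.0;

    assert(n1 >= 0 && n2 >= 0);

    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            if (items_match(&mcv1[i], &mcv2[j]))
                sel += mcv1[i].freq * mcv2[j].freq;

    return sel;
}
```

For example, if (2,2) has frequency 0.25 on one side and 0.5 on the other, and no other items match, the MCV part of the estimate is 0.125.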

Re: using extended statistics to improve join estimates

2021-03-31 Thread Zhihong Yu
Hi, + * has_matching_mcv + * Check whether the list contains statistic of a given kind The method name is find_matching_mcv(). It seems the method initially returned bool but later the return type was changed. + StatisticExtInfo *found = NULL; found normally is associated with bool

using extended statistics to improve join estimates

2021-03-31 Thread Tomas Vondra
Hi, So far the extended statistics are applied only at scan level, i.e. when estimating selectivity for individual tables. Which is great, but joins are a known challenge, so let's try doing something about it ... Konstantin Knizhnik posted a patch [1] using functional dependencies to improve