Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2023-04-12 Thread Daniel Gustafsson
> On 12 Apr 2023, at 14:14, Tom Lane wrote:
> 
> Daniel Gustafsson writes:
>> I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
>> elsewhere on the same page so keeping it consistent seems better.  The
>> attached does this as well as adding mcf/mcv as acronyms as previously
>> mentioned (since they are both tagged as <acronym>).
> 
> mcv_freqs looks good.  I'd write the glossary entries as singular
> (Most Common Frequency, Most Common Value) since our typical usage
> is to pluralize them at the point of use ("MCVs").  Also, just
> expanding the acronym doesn't seem that helpful.  Maybe more like

Pushed with your suggested changes, thanks!

--
Daniel Gustafsson





Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2023-04-12 Thread Tom Lane
Daniel Gustafsson writes:
> I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
> elsewhere on the same page so keeping it consistent seems better.  The
> attached does this as well as adding mcf/mcv as acronyms as previously
> mentioned (since they are both tagged as <acronym>).

mcv_freqs looks good.  I'd write the glossary entries as singular
(Most Common Frequency, Most Common Value) since our typical usage
is to pluralize them at the point of use ("MCVs").  Also, just
expanding the acronym doesn't seem that helpful.  Maybe more like

MCF

Most Common Frequency, that is the frequency associated
with some Most Common Value

MCV

Most Common Value, one of the values appearing most often
within a particular table column

regards, tom lane




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2023-04-12 Thread Daniel Gustafsson
> On 22 Aug 2022, at 14:58, Tom Lane wrote:
> 
> Julien Rouhaud writes:
>> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
>>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>>> of the formula, however I am not sure what "mvf" is referring to
>>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>>> explanatory sentence is saying?
> 
>> It should be mcf, ie. Most Common Frequencies.  It looks like a very old typo
>> that survived until now.
> 
> I don't think it's a typo exactly, but an odd abbreviation for "Most
> common Values' Frequencies".  (Summing the MCVs themselves isn't
> sensible; they might not even be numeric.)
> 
> I'd vote for replacing mvf in both places with something a bit more
> spelled-out, perhaps "mcv_freqs".

I was inclined to spell it out as mcv_frequencies but we use xxx_freqs
elsewhere on the same page so keeping it consistent seems better.  The attached
does this as well as adding mcf/mcv as acronyms as previously mentioned (since
they are both tagged as <acronym>).

--
Daniel Gustafsson



mcf_mcv.diff
Description: Binary data


Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Tom Lane
Julien Rouhaud writes:
> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?

> It should be mcf, ie. Most Common Frequencies.  It looks like a very old typo
> that survived until now.

I don't think it's a typo exactly, but an odd abbreviation for "Most
common Values' Frequencies".  (Summing the MCVs themselves isn't
sensible; they might not even be numeric.)

I'd vote for replacing mvf in both places with something a bit more
spelled-out, perhaps "mcv_freqs".

regards, tom lane




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Daniel Gustafsson
> On 22 Aug 2022, at 12:08, Julien Rouhaud wrote:

> That was actually introduced 2 years before in 234d50812c8 by Bruce.

Yes, I was unclear; I meant that the second use was by Tom (whom I also forgot
to CC as I said I would, so doing that now).

--
Daniel Gustafsson   https://vmware.com/





Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Julien Rouhaud
Hi,

On Mon, Aug 22, 2022 at 11:13:38AM +0200, Daniel Gustafsson wrote:
> > On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> > On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
> 
> >> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> >> of the formula, however I am not sure what "mvf" is referring to
> >> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> >> explanatory sentence is saying?
> > 
> > It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> > that survived until now.
> 
> That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
> CC:ing Tom for a trip down memory lane.

That was actually introduced 2 years before in 234d50812c8 by Bruce.

> Looking at this I noticed that we mark up MCV and MCF as acronyms but they
> aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
> between markup and content, so we should probably do that as per the attached?

Agreed, although MCF is only used in planstats.sgml and the acronym is defined
locally.




Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Daniel Gustafsson
> On 22 Aug 2022, at 09:48, Julien Rouhaud wrote:
> On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:

>> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
>> of the formula, however I am not sure what "mvf" is referring to
>> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
>> explanatory sentence is saying?
> 
> It should be mcf, ie. Most Common Frequencies. It looks like a very old typo
> that survived until now.

That seems plausible, but it does seem introduced on purpose in f5678e8e075 so
CC:ing Tom for a trip down memory lane.

Looking at this I noticed that we mark up MCV and MCF as acronyms but they
aren't defined in acronyms.sgml.  ISTM it's a good idea to keep a 1:1 mapping
between markup and content, so we should probably do that as per the attached?

--
Daniel Gustafsson   https://vmware.com/



mcx_acronyms.diff
Description: Binary data


Re: Should 'sum(mvf)' read 'sum(mcv)'...?

2022-08-22 Thread Julien Rouhaud
Hi,

On Sun, Aug 21, 2022 at 11:02:04PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/14/row-estimation-examples.html
> Description:
> 
> About halfway down this page
> https://www.postgresql.org/docs/current/row-estimation-examples.html we see
> the following formula for calculating selectivity:
> 
> > selectivity = (1 - sum(mvf))/(num_distinct - num_mcv)
> 
> And just below the formula we see the explanatory sentence saying:
> 
> > That is, add up all the frequencies for the MCVs and subtract them from
> > one, ...
> 
> It appears the above sentence is referring to the "(1 - sum(mvf))" portion
> of the formula, however I am not sure what "mvf" is referring to
> there...shouldn't it be "(1 - sum(mcv))" in order to match what the
> explanatory sentence is saying?

It should be mcf, ie. Most Common Frequencies.  It looks like a very old typo
that survived until now.
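
To make the formula concrete, here is a minimal Python sketch of
selectivity = (1 - sum(mcv_freqs))/(num_distinct - num_mcv), that is, the
estimate for a value that is not in the MCV list.  The function name
non_mcv_selectivity and the sample statistics are made up for illustration;
this is not the planner's actual code.

```python
from typing import Sequence


def non_mcv_selectivity(mcv_freqs: Sequence[float], num_distinct: int) -> float:
    """Estimate selectivity of `col = value` for a value NOT in the MCV list.

    mcv_freqs holds the frequency of each most common value; the MCVs
    themselves are never summed, so they need not be numeric.
    """
    num_mcv = len(mcv_freqs)
    if num_distinct <= num_mcv:
        raise ValueError("num_distinct must exceed the number of MCVs")
    # Fraction of rows not covered by any MCV ...
    remaining = 1.0 - sum(mcv_freqs)
    # ... spread evenly over the distinct values that are not MCVs.
    return remaining / (num_distinct - num_mcv)


# Hypothetical statistics: three MCVs covering 40% of the rows,
# thirteen distinct values in total.
mcv_vals = ["apple", "pear", "plum"]  # non-numeric: summing these is meaningless
mcv_freqs = [0.2, 0.1, 0.1]           # one frequency per entry in mcv_vals
selectivity = non_mcv_selectivity(mcv_freqs, num_distinct=13)
print(f"{selectivity:.4f}")
```

The sample data shows why the sum has to run over the frequencies: the values
in mcv_vals are strings, while mcv_freqs is always a list of fractions.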