On Mon, 13 Oct 2025 at 14:28, David G. Johnston
<[email protected]> wrote:
> How about:
>
> "n_disinct is used for normal tables while n_distinct_inherited is used for 
> partitioned tables.  Both are usable (selected via the ONLY modifier) for an 
> inheritance parent table since it has both storage and children."

I specifically was trying my best to avoid using the word "normal"
here as it's very indescriptive.  Rhetorical question: would you
expect a setting which applies to "normal" tables to be applied to a
materialized view? Instead, I described the cases where
n_distinct_inherited applies, and left it to the reader to assume that
n_distinct applies in all other cases, which IMO is correct. I don't
see the need to talk about ONLY. I'm only discussing stored statistics
in this part, i.e, what's in pg_statistic.

> The use of both "Ordinarily" and "overwritten" is bothering me here.  And it 
> implies that n_distinct doesn't work for inheritance parent tables or, 
> conversely, that n_distinct does work for partitioned tables.

I don't know why you think that implies that n_distinct works for
partitioned tables. I've specifically said that "n_distinct_inherited
allows the n_distinct estimate to be overwritten for partitioned
tables". Why would the reader then assume n_distinct is for doing
that?

As for "overwritten", this is exactly what changing this setting does
to the pg_statistic.stadistinct when ANALYZE runs for the table, so as
far as I'm concerned, it's a good and accurate way to describe what
the settings do.

I'm open to finding something better for "Ordinarily
<literal>n_distinct</literal> is used."

>> I also fixed what I thought was some misleading text about ANALYZE
>> using this value to calculate things.
>
>
> > values in the column is linear with the estimated number
> >      of rows in the table; the exact count is to be computed by multiplying 
> > the estimated
> >      rows in the table by the absolute value of the given number.
>
> "...is proportional to the estimated number of rows in the table at planning 
> time."
>
> (The final "at planning time" substitutes for the sentence you pondered 
> removing.)

I've adjusted the "proportional" part, but the "add planning time" at
the end is surplus because we've already mentioned that this is
happening in "the query planner". I doubt anyone needs the additional
reassurance that things that the query planner does happen at planning
time.

> (The rest, including the examples, seem a bit self-explanatory given the 
> definition, though I do get reader inexperience with the terminology.  But 
> proportional implies a linear relationship, and the positive/negative 
> bifurcation seems straight-forward here.)

I don't feel the need to remove the examples. I think they're good to
reinforce how the negative values are utilised during planning.

> I'm thinking everything else below is better incorporated into 14.2 which 
> should be linking back to this section.  That way the crux of the usage is 
> defined in syntax while the details about setting a specific value are 
> located in the section covering the overall topic.
>
>  the exact count is to be computed by multiplying the estimated
> +      rows in the table by the absolute value of the given number.  For 
> example,
>        a value of -1 implies that all values in the column are distinct, while
> -      a value of -0.5 implies that each value appears twice on the average.
> +      a value of -0.5 implies that each value appears twice on average.
>        This can be useful when the size of the table changes over time, since
>        the multiplication by the number of rows in the table is not performed
>        until query planning time.
>
> (Leave: Specify a value of 0 to revert to estimating...)

Do you mean remove that part?  I did consider that as I didn't think
the reader needed to know that as they could use RESET (n_distinct);

> That said, this rework would be OK as-is.
>
> Also, looking at stadistinct, the multiplier stored there accounts for the 
> presence of null.  This attribute-option does not.  Is that difference worth 
> noting?

It's already becoming more painful to get this changed than I would
have anticipated. Feel free to suggest something on a new thread if
you think it's worthwhile to mention it.

David


Reply via email to