Hi Tom,

Thanks for your reply.

Am I correct in my understanding that any row modified by an
as-yet-uncommitted UPDATE (i.e. its new version) is in state
HEAPTUPLE_INSERT_IN_PROGRESS, and so will not be included in the sample?
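
To check my reading of it, I put together a tiny standalone sketch of the
sampling decision as I understand it today. This is not the real analyze.c
code: the HEAPTUPLE_* names mirror HeapTupleSatisfiesVacuum's result codes,
but the function and scaffolding around them are just mine for illustration.

/*
 * Standalone sketch, not PostgreSQL source.  An uncommitted UPDATE leaves
 * two versions of the row behind; under the current logic neither version
 * makes it into the ANALYZE sample.
 */
#include <stdbool.h>
#include <stdio.h>

typedef enum
{
    HEAPTUPLE_LIVE,                 /* committed and visible */
    HEAPTUPLE_INSERT_IN_PROGRESS,   /* inserting transaction still open */
    HEAPTUPLE_DELETE_IN_PROGRESS    /* deleting transaction still open */
} HTSV_Result;

/* Sampling decision as I understand the current (pre-patch) behaviour. */
static bool
sampled(HTSV_Result state)
{
    switch (state)
    {
        case HEAPTUPLE_LIVE:
            return true;
        case HEAPTUPLE_INSERT_IN_PROGRESS:
            return false;           /* not counted, not sampled */
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            return false;           /* counted as live, but not sampled */
    }
    return false;
}

int
main(void)
{
    /* The two versions a concurrent ANALYZE sees while my UPDATE is open. */
    printf("old version sampled: %s\n",
           sampled(HEAPTUPLE_DELETE_IN_PROGRESS) ? "yes" : "no");
    printf("new version sampled: %s\n",
           sampled(HEAPTUPLE_INSERT_IN_PROGRESS) ? "yes" : "no");
    return 0;
}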

I'm going to rework the application so there is less time between the
DELETE and the COMMIT; then I will only see the problem if ANALYZE runs
during that smaller window. It looks like your patch will help if that
happens.
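
If I am reading the diff in your message below correctly, the
delete-in-progress case would then be sampled whenever the deleting
transaction is not our own. Continuing the sketch above (again a paraphrase,
not the real code; the deleted_by_me flag stands in for the
TransactionIdIsCurrentTransactionId() test):

/*
 * Sampling decision with the proposed change applied, reusing the
 * HTSV_Result enum from the sketch above.
 */
static bool
sampled_with_patch(HTSV_Result state, bool deleted_by_me)
{
    switch (state)
    {
        case HEAPTUPLE_LIVE:
            return true;
        case HEAPTUPLE_INSERT_IN_PROGRESS:
            return false;               /* still excluded, as before */
        case HEAPTUPLE_DELETE_IN_PROGRESS:
            /* our own deletes count as dead; another xact's are now sampled */
            return !deleted_by_me;
    }
    return false;
}

If that is right, a concurrent ANALYZE would keep sampling the old rows while
my long transaction is still deleting and reloading, instead of sampling
nothing from them.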

Then again, it also seems that no one has had a problem with the current
behaviour (for 11 years!).

Thanks,

Mark

On Wed, 2 Jan 2019 at 16:11 Tom Lane <t...@sss.pgh.pa.us> wrote:

> Mark <mwchamb...@gmail.com> writes:
> > I have a long running job which deletes all of the common_student table
> > and then repopulates it. It takes a long time to load all the other data
> > and commit the transaction. I didn't think the delete inside the
> > transaction would have any effect until it is committed or rolled back.
> > I will have to rewrite the application so it updates the existing rows
> > rather than deleting all and then inserting.
>
> Hmm ... I'm not sure if that will actually make things better.  The root
> of the issue is what analyze.c does with DELETE_IN_PROGRESS tuples:
>
>                      * We count delete-in-progress rows as still live, using
>                      * the same reasoning given above; but we don't bother to
>                      * include them in the sample.
>
> The "reasoning given above" is a reference to what happens for
> INSERT_IN_PROGRESS tuples:
>
>                      * Insert-in-progress rows are not counted.  We assume
>                      * that when the inserting transaction commits or aborts,
>                      * it will send a stats message to increment the proper
>                      * count.  This works right only if that transaction ends
>                      * after we finish analyzing the table; if things happen
>                      * in the other order, its stats update will be
>                      * overwritten by ours.  However, the error will be large
>                      * only if the other transaction runs long enough to
>                      * insert many tuples, so assuming it will finish after us
>                      * is the safer option.
>
> Now the problem with this, from your perspective, is that *neither*
> INSERT_IN_PROGRESS nor DELETE_IN_PROGRESS tuples are included in
> ANALYZE's data sample.  So a long-running update transaction will
> cause all the rows it changes to be excluded from the sample until
> commit.  This seems like a bad thing, and it definitely means that
> adjusting your app as you're contemplating won't help.
>
> In order to handle the bulk-update scenario sanely, it seems like
> we ought to allow one of INSERT_IN_PROGRESS and DELETE_IN_PROGRESS tuples
> to be included.  And it looks like, for consistency with the row-counting
> logic, the one that's included needs to be DELETE_IN_PROGRESS.  This
> is a slightly annoying conclusion, because it means we're accumulating
> stats that we know are likely to soon be obsolete, but I think sampling
> INSERT_IN_PROGRESS tuples instead would lead to very strange results
> in some cases.
>
> In short, I think we want to do this to the DELETE_IN_PROGRESS logic:
>
>                     if (TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(targtuple.t_data)))
>                         deadrows += 1;
>                     else
> +                   {
> +                       sample_it = true;
>                         liverows += 1;
> +                   }
>
> with suitable adjustment of the adjacent comment.
>
> Thoughts?
>
>             regards, tom lane
>
