On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> I do share your concerns about unpredictable behavior - that's
> particularly worrying for pg_restore, which may be used for time-
> sensitive use cases (DR, migrations between versions), so unpredictable
> changes in behavior / duration are unwelcome.


> But isn't this more a deficiency in pg_restore, than in CREATE INDEX?
> The issue seems to be that the reltuples value may or may not get
> updated, so maybe forcing ANALYZE (even very low statistics_target
> values would do the trick, I think) would be more appropriate solution?
> Or maybe it's time to add at least some rudimentary statistics into the
> dumps (the reltuples field seems like a good candidate).
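As a sketch of the workaround Tomas describes (the table name my_table is
illustrative, not from the thread): even the lowest statistics target makes
ANALYZE cheap while still populating the pg_class fields in question.

```sql
-- Illustrative only: cheaply populate reltuples/relpages after a restore.
-- A statistics target of 1 samples very few rows, but ANALYZE still
-- updates pg_class.reltuples and pg_class.relpages as a side effect.
SET default_statistics_target = 1;
ANALYZE my_table;

-- Confirm the planner-visible estimates were updated:
SELECT relpages, reltuples
FROM pg_class
WHERE oid = 'my_table'::regclass;
```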

I think there are a number of reasonable ways of looking at it. It
might also be worthwhile to have a minimal ANALYZE performed by CREATE
INDEX directly, iff there are no preexisting statistics (there is
definitely going to be something pg_restore-like that we cannot fix --
some ETL tool, for example). Perhaps, as an additional condition to
proceeding with such an ANALYZE, it should also only happen when there
is any chance at all of parallelism being used (but then you get into
having to establish the relation size reliably in the absence of any
pg_class.relpages, which isn't very appealing when there are many tiny
tables).
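The "no preexisting statistics" precondition could be approximated by
checking pg_class directly (a sketch, again using an illustrative table
name; a table that has never been VACUUMed or ANALYZEd shows zeroes here):

```sql
-- Sketch: detect the "never analyzed" state that the hypothetical
-- CREATE INDEX behavior would key on.
SELECT relpages = 0 AND reltuples = 0 AS stats_missing
FROM pg_class
WHERE oid = 'my_table'::regclass;
```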
In summary, I would really like it if a consensus emerged on how
parallel CREATE INDEX should handle the ecosystem of tools like
pg_restore, reindexdb, and so on. Personally, I'm neutral on which
general approach should be taken. Proposals from other hackers about
what to do here are particularly welcome.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)