Re: Bogosities in pg_dump's extended statistics support

2018-02-12 Thread Robert Haas
On Mon, Feb 12, 2018 at 10:07 AM, Alvaro Herrera
 wrote:
> The followup question is what to do for multi-table stats objects.  I
> think the options are that it should be dumped if:
>
> 1. any of its tables are dumped
> 2. *all* its tables are dumped
> 3. the schema containing the stats object is dumped
>
> I think #2 is a loser: the object goes missing if you dump the tables
> individually.
>
> #3 has a similar problem: if you put the stats object in a schema that's
> not that of any of the involved tables (admittedly a somewhat odd
> choice), then again the stats object goes missing, and there is no
> indication that the dump is incomplete except that the application
> becomes slower -- a situation difficult to detect (unlike missing
> regular objects, because then queries break in obvious ways).
>
> My preference is for #1.  It has the small problem that if you have one
> dump for each table, the stats object appears more than once (so
> restoring both causes an error during restore of the second one), but I
> don't think this is terrible.

It's not the worst thing in the world, but it's a long way from being
the best thing.  My personal experience has been that it sucks when
you get errors during a restore -- it's often not clear exactly what
object was affected and whether the error is something that's going to
break your application.  Of course, it's even worse if you are
restoring in single-transaction or on-error-stop mode.

I believe I previously argued that statistics objects should be
considered first-class objects in their own right, not just table
properties, and that's still my position.  In short, I agree with Tom.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Bogosities in pg_dump's extended statistics support

2018-02-12 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I propose that we should just treat extended stats objects like other
>> generic database objects --- which, in practice, means "dump if the
>> containing schema is getting dumped".  We don't exclude views or matviews
>> on the basis of whether their underlying tables are going to be dumped,
>> so why should stats objects operate differently from those?

> I claim that views and matviews are indeed separate objects rather than
> sub-objects belonging to a table; but a stats object is, in some way,
> just part of the table(s) definition(s), so the analogy doesn't hold
> necessarily.  In particular the decision to dump the stats object should
> be connected to the table(s) rather than the stat object as standing
> alone.

Meh.  The thing that is going to make this painful is when we allow
extensions to contain stats objects, something which doesn't seem to be
allowed today (at least, ALTER EXTENSION ADD STATISTICS doesn't work)
but which is surely a rather bad oversight.  Consider two cases for an
extension E containing a table T:

* There is a stats object S on T, and S is also a member of E.

* There is a stats object S on T, and S is NOT a member of E
(perhaps somebody made it after the fact).

I argue that if you try to tie S's dumpability strictly to T's, you are
going to break one or the other of these cases, either treating S as if
it were a member of E when you shouldn't or vice versa.  Getting the
binary-upgrade case right will make it even more complicated.

(And I haven't even mentioned, say, stats objects on two tables belonging
to different extensions.)

I think we've made our choice and it's that stats objects are independent
objects.  Trying to preserve some part of the they're-just-attributes-
of-a-table view is going to create more problems than it solves.

regards, tom lane



Re: Bogosities in pg_dump's extended statistics support

2018-02-12 Thread Alvaro Herrera
Thanks for patching.  Your points #1, #2 and #4 are OK with me.  I'm
thinking about point #3, concerning when-to-dump policy for stats
objects:

Tom Lane wrote:

> 3. pg_dump decides whether to dump a stats object on the basis of whether
> its underlying table is going to be dumped.  While that's not totally
> silly today, it likewise isn't going to scale to multi-table stats.
> I propose that we should just treat extended stats objects like other
> generic database objects --- which, in practice, means "dump if the
> containing schema is getting dumped".  We don't exclude views or matviews
> on the basis of whether their underlying tables are going to be dumped,
> so why should stats objects operate differently from those?

I claim that views and matviews are indeed separate objects rather than
sub-objects belonging to a table; but a stats object is, in some way,
just part of the table(s) definition(s), so the analogy doesn't hold
necessarily.  In particular the decision to dump the stats object should
be connected to the table(s) rather than the stat object as standing
alone.

In other words, I think the original policy of dumping a single-table
stats object whenever the table is dumped should be retained.  I think
this should be a simple patch on top of what you committed.

The followup question is what to do for multi-table stats objects.  I
think the options are that it should be dumped if:

1. any of its tables are dumped
2. *all* its tables are dumped
3. the schema containing the stats object is dumped

I think #2 is a loser: the object goes missing if you dump the tables
individually.

#3 has a similar problem: if you put the stats object in a schema that's
not that of any of the involved tables (admittedly a somewhat odd
choice), then again the stats object goes missing, and there is no
indication that the dump is incomplete except that the application
becomes slower -- a situation difficult to detect (unlike missing
regular objects, because then queries break in obvious ways).

My preference is for #1.  It has the small problem that if you have one
dump for each table, the stats object appears more than once (so
restoring both causes an error during restore of the second one), but I
don't think this is terrible.

I think you could go even further and say (combination of #1 and #3):

4. dump the stats object if any of the involved tables are dumped, *or*
   the schema containing the stats object is dumped.

with the rationale that if you explicitly dump a schema in which you've
specifically put a number of stat objects, then they should appear in
the dump, even if the tables are not dumped.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services