Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-23 Thread Tom Lane
Bruce Momjian  writes:
> Oh, I see what you mean.  I was just worried that some code might expect
> template1 to always have an oid of 1, but we can just call that code
> broken.

Ever since we invented template0, it's been possible to drop and recreate
template1, so I'd say any expectation that it must have OID 1 has been
wrong for a long time.  This change will just make the situation more
common.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-23 Thread Bruce Momjian
On Sun, Jan 21, 2018 at 11:02:29AM -0500, Tom Lane wrote:
> Bruce Momjian  writes:
> > On Fri, Jan 19, 2018 at 02:54:25PM -0500, Tom Lane wrote:
> >> Command was: DROP DATABASE "template1";
> 
> > Uh, the oid of the template1 database is 1, and I assume we would want
> > to preserve that too.
> 
> I don't feel any huge attachment to that.  In the first place, under
> this proposal recreating template1 is something you would only need to do
> if you weren't satisfied with its default properties as set by initdb.
> Which ought to be a minority of users.  In the second place, if you are
> changing those properties from the way initdb set it up, it's not really
> virgin template1 anymore, so why shouldn't it have a new OID?

Oh, I see what you mean.  I was just worried that some code might expect
template1 to always have an oid of 1, but we can just call that code
broken.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-21 Thread Tom Lane
I've been hacking away at this patch, and attached is what I've got
so far.  I think this is committable, but if anyone wants to do
further review and testing, that'd be fine.

Per discussion, I got rid of the separate --set-db-properties switch:
additional database properties are now applied if and only if --create is
specified.  But the DATABASE TOC entry itself still contains only CREATE
DATABASE; the additional commands are carried in a "DATABASE PROPERTIES"
TOC entry so that they will be issued after reconnecting to the new DB.
This dodges the "ALTER DATABASE SET default_transaction_read_only"
problem.  (Furthermore, it turns out that we have to do it like that
because pg_restore issues any one TOC entry's contents as a single PQexec.
If you try to cram anything else in with the CREATE DATABASE, then the
server spits up because CREATE DATABASE becomes part of an implicit
transaction block.)

I also fixed it so that a database's comment, security label, and ACL are
restored only when saying --create.  This is different from the previous
behavior for DB comments and security labels, but it seems a great deal
more useful and consistent.  In no other case would pg_dump/pg_restore
attempt to restore a comment, security label, or ACL for an object it
hadn't created, so why should it act differently for databases?

Worth noting here is that if we accept that behavior, the problem for
which "COMMENT ON CURRENT_DATABASE" was proposed goes away, because
there's no longer a risk of trying to apply COMMENT ON DATABASE to the
wrong database.  We might still want that patch as a standalone feature,
but pg_dump no longer needs it.

Another point worth commenting on (this change was also in the v13 patch)
is that pg_dumpall formerly took some pains to omit the encoding and
locale specifications in its CREATE DATABASE commands, for databases whose
settings matched the cluster default.  This new implementation won't do
that.  We could change it to do so, but then a standalone "pg_dump
--create" would also act that way, which is not really easy to defend.
IIRC, the argument for pg_dumpall doing it like that was to make it
less painful to migrate a cluster to a new machine that might not have the
identical set of locales, or if one wanted to migrate everything to a new
encoding.  But those are not mainstream use-cases, and if you really need
to do that you can still get there by dumping databases individually
without using --create.  On the other hand, there are obvious gotchas
involved in letting a dump/reload silently change to another locale or
encoding.  So I think this is an improvement overall, but it bears
noting as a behavioral change.

Another point to note is that I dithered about whether to bump the
pg_dump archive version number, which would have the effect of preventing
pre-v11 versions of pg_restore from processing dumps made by this version.
The argument for breaking archive compatibility is that older pg_restore
versions will not know that it'd be a good idea to skip DATABASE
PROPERTIES TOC entries and database ACL entries if not using --create.
However, in default cases there won't be a DATABASE PROPERTIES entry to
skip.  Moreover, applying these entries unconditionally isn't that much
different conceptually from applying database comments or sec labels
unconditionally, as such older pg_restore versions would do anyway.
It also seems like if your back were against the wall, being able to
read a newer archive file with an older pg_restore is better than being
locked out of it completely.  So I'm leaning to no version bump, but
it could still be discussed.

One thing we could possibly use here is more regression test cases.
The only existing regression test that's affected by this patch is
002_pg_dump.pl's expectations about which cases will print a database
comment, so it seems like we're missing some behavioral coverage.
Not sure what that should look like though.

This patch has to be applied over the patches proposed in
https://www.postgresql.org/message-id/21714.1516553459%40sss.pgh.pa.us
Aside from touching some of the same code, this is dependent on
the changes made there to make comment, seclabel, and ACL entries
reliably identifiable.

As far as notable code changes go, I got rid of the previous patch's
move of executeQuery() into dumputils.c.  That had some undesirable
knock-on effects in terms of creating even more coupling between
different modules (through the g_verbose global), and it was basically
misguided anyway.  pg_dump executes queries via ExecuteSqlQuery; we
do not need a few of its functions to be using low-level routines with
behavior different from that.

If anyone wants to do further review on this, let me know; otherwise
I'll push it in a day or so.

regards, tom lane

diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 08cad68..11582dd 100644
*** a/doc/src/sgml/ref/pg_dump.sgml
--- b/doc/src/sgml/ref/pg_dump.sgml

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-21 Thread Tom Lane
Bruce Momjian  writes:
> On Fri, Jan 19, 2018 at 02:54:25PM -0500, Tom Lane wrote:
>> Command was: DROP DATABASE "template1";

> Uh, the oid of the template1 database is 1, and I assume we would want
> to preserve that too.

I don't feel any huge attachment to that.  In the first place, under
this proposal recreating template1 is something you would only need to do
if you weren't satisfied with its default properties as set by initdb.
Which ought to be a minority of users.  In the second place, if you are
changing those properties from the way initdb set it up, it's not really
virgin template1 anymore, so why shouldn't it have a new OID?

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-21 Thread Bruce Momjian
On Fri, Jan 19, 2018 at 02:54:25PM -0500, Tom Lane wrote:
> Hmm ... so there's a small problem with this idea of dropping and
> recreating template1:
> 
> pg_restore: connecting to database for restore
> pg_restore: dropping DATABASE template1
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 3024; 1262 1 DATABASE 
> template1
>  postgres
> pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop a 
> templ
> ate database
> Command was: DROP DATABASE "template1";

Uh, the oid of the template1 database is 1, and I assume we would want
to preserve that too.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 2:54 PM, Tom Lane  wrote:
>> What do people think of just removing this DROP DATABASE restriction?
>> Arguably, superusers should know better than to drop template1 anyway.
>> Maybe we should replace it with a hard-wired check against dropping
>> template0 (matched by name) just to stave off the worst-case scenario.

> I think it's a little scary.  The tail might be wagging the dog at this point.

True, and having to make assumptions about which server version we're
restoring to is not very nice either.

After further reflection I've found a way to do this on the pg_dump
end that's not as ugly as I feared -- no uglier than most of the other
hacks there, anyway.  So nevermind ...

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 2:54 PM, Tom Lane  wrote:
> Hmm ... so there's a small problem with this idea of dropping and
> recreating template1:
>
> pg_restore: connecting to database for restore
> pg_restore: dropping DATABASE template1
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 3024; 1262 1 DATABASE 
> template1
>  postgres
> pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop a 
> templ
> ate database
> Command was: DROP DATABASE "template1";
>
> Now in principle we could hack around that by issuing "ALTER DATABASE
> ... IS_TEMPLATE false" first, but it turns out to be harder than you
> might think to wedge that into the pg_dump infrastructure.  (The
> natural way to do it, trying to add this into the dropCmd for the
> database TOC entry, fails because (a) DROP DATABASE then ends up as
> one part of an implicit transaction block, and (b) it confuses the heck
> out of pg_restore's --if-exists kluge.)
>
> You can actually exhibit this in current releases if you try "pg_dump
> --clean --create" on a user-created template database, so it's not
> solely the fault of this patch.
>
> What do people think of just removing this DROP DATABASE restriction?
> Arguably, superusers should know better than to drop template1 anyway.
> Maybe we should replace it with a hard-wired check against dropping
> template0 (matched by name) just to stave off the worst-case scenario.

I think it's a little scary.  The tail might be wagging the dog at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Hmm ... so there's a small problem with this idea of dropping and
recreating template1:

pg_restore: connecting to database for restore
pg_restore: dropping DATABASE template1
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3024; 1262 1 DATABASE template1
 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop a templ
ate database
Command was: DROP DATABASE "template1";

Now in principle we could hack around that by issuing "ALTER DATABASE
... IS_TEMPLATE false" first, but it turns out to be harder than you
might think to wedge that into the pg_dump infrastructure.  (The
natural way to do it, trying to add this into the dropCmd for the
database TOC entry, fails because (a) DROP DATABASE then ends up as
one part of an implicit transaction block, and (b) it confuses the heck
out of pg_restore's --if-exists kluge.)

You can actually exhibit this in current releases if you try "pg_dump
--clean --create" on a user-created template database, so it's not
solely the fault of this patch.

What do people think of just removing this DROP DATABASE restriction?
Arguably, superusers should know better than to drop template1 anyway.
Maybe we should replace it with a hard-wired check against dropping
template0 (matched by name) just to stave off the worst-case scenario.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 10:00 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
 Well, we could say that the properties of template1 and postgres
 are only restored if you use --clean.

>>> True.  Would that be a POLA violation, do you think?

>> It seems a bit non-orthogonal.  Also, while your point that people
>> expect "merge" behavior from pg_dump is certainly true, I'm not
>> convinced that anybody would be relying on that for pg_dumpall.

> [ assorted examples ]
> Still, it's worth thinking over these kinds of
> scenarios, I think.  People do a lot of ugly things in the real world
> that we as developers would never do, mostly to work around the
> problems we fail to foresee.

Unless someone has a better idea, I'll go with the semantics stated
above: DB-level properties of the two standard databases are only
transferred to pg_dumpall's target cluster if you authorize dropping
their old contents by saying --clean.  (pg_upgrade, of course, will
do exactly that.)

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 10:00 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
>>> Well, we could say that the properties of template1 and postgres
>>> are only restored if you use --clean.
>
>> True.  Would that be a POLA violation, do you think?
>
> It seems a bit non-orthogonal.  Also, while your point that people
> expect "merge" behavior from pg_dump is certainly true, I'm not
> convinced that anybody would be relying on that for pg_dumpall.

What I expect typically happens is that someone restores into an
existing cluster and then realizes their mistake and goes and removes
all of the extra stuff that got created.  This is a bit like when you
run tar -zxf expecting it to create a subdirectory, but it turns out
to extract in the current directory instead.  You then curse your fate
and remove all of the files it created.  It's annoying, but not that
bad.  If it nuked the contents of the currently directory first,
though, you'd be much sadder.  If somebody changed tar to have that
behavior, people would probably *eventually* become aware of the risk
and adjust their behavior to avoid catastrophe, but it would probably
take 1-2 disasters per individual before they got used to the new way,
and that's a lot of disasters considering how many people use tar.
Or, maybe people wouldn't get used to it and they'd just go after
whoever made the change with pitchforks.  Anyway, that kind of thing
seems like a danger here.

The other possibility that comes to mind is that somebody might be
doing this is working around a failure of something like CREATE
LANGUAGE.  Say the name of the .so has changed, so CREATE LANGUAGE
will fail or do the wrong thing.  Instead of editing the dump, they
pre-install the language with the correct definition and then restore
over it, ignoring errors.  I guess that would only work with pg_dump,
not pg_dumpall, unless the database that had the problem was one of
the pre-created ones.  Still, it's worth thinking over these kinds of
scenarios, I think.  People do a lot of ugly things in the real world
that we as developers would never do, mostly to work around the
problems we fail to foresee.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
>> Well, we could say that the properties of template1 and postgres
>> are only restored if you use --clean.

> True.  Would that be a POLA violation, do you think?

It seems a bit non-orthogonal.  Also, while your point that people
expect "merge" behavior from pg_dump is certainly true, I'm not
convinced that anybody would be relying on that for pg_dumpall.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 18, 2018 at 6:35 PM, Tom Lane  wrote:
>> If we did it like that, the rationale for an actual --set-db-properties
>> switch would vanish, at least so far as pg_dumpall is concerned -- we
>> could just make all that behavior an integral part of --create.  And
>> this wouldn't need to be conditional on getting ALTER DATABASE
>> CURRENT_DATABASE done.

> Unfortunately, I have a feeling that --set-db-properties might not be
> the only thing that would vanish.  I think users are accustomed by now
> to the idea that if you restore into an existing database, the
> existing contents are preserved and the new stuff from the dump is
> added (possibly with some errors and messiness).  With this design,
> the existing database contents will instead vanish, and that is
> probably going to make somebody unhappy.

Well, we could say that the properties of template1 and postgres
are only restored if you use --clean.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Thu, Jan 18, 2018 at 6:35 PM, Tom Lane  wrote:
> If we did it like that, the rationale for an actual --set-db-properties
> switch would vanish, at least so far as pg_dumpall is concerned -- we
> could just make all that behavior an integral part of --create.  And
> this wouldn't need to be conditional on getting ALTER DATABASE
> CURRENT_DATABASE done.

Unfortunately, I have a feeling that --set-db-properties might not be
the only thing that would vanish.  I think users are accustomed by now
to the idea that if you restore into an existing database, the
existing contents are preserved and the new stuff from the dump is
added (possibly with some errors and messiness).  With this design,
the existing database contents will instead vanish, and that is
probably going to make somebody unhappy.

I agree with you that making ALTER DATABASE accept CURRENT_DATABASE is
a good idea.  I don't have a great idea what to do about the SET
TABLESPACE problem.  It's always seemed to me to be sort of weird that
you have to have a database in order to create, drop, etc. another
database, or even get a list of databases that exist.  As a new user,
I remember being quite frustrated that connecting to a database that
didn't exist gave no hint of how to find out what databases did exist.
Eventually I discovered psql -l and all was well, but I had no idea
how it worked under the hood.  Even though I now understand the
architectural considerations that have gotten us to where we are, I
still think it would be more intuitive to users if there were a
command-shell unrelated to any database that would let you perform
operations on databases.  I realize, however, that this patch isn't
going to create such a thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-18 Thread Haribabu Kommi
On Fri, Jan 19, 2018 at 10:35 AM, Tom Lane  wrote:

> I wrote:
> > What I think we should do for the time being is to have pg_dump treat
> > database tablespace as a property it can't adjust after creation, just
> > as it can't adjust locale or encoding.  That's a loss of functionality
> > for pg_dumpall/pg_upgrade compared to where we are today, in that if
> > you've set up the template1 or postgres DBs with nondefault tablespace
> > then that won't propagate to the new cluster.  But the same can already
> > be said about their locale and encoding, and I find it hard to believe
> > that many people are trying to give those two DBs tablespace settings
> > different from the cluster default, anyway.
>
> Hm ... actually, there is more than one way to skin this cat.
>
> Let me offer a modest proposal: pg_dumpall/pg_upgrade should simply DROP
> postgres and template1 in the target cluster, and then re-create them
> (from template0 of course).  With that, we'd not only cope with preserving
> their tablespace settings, but we'd gain the ability to preserve their
> locale and encoding, even if the target cluster had been initialized with
> some other default.
>

Yes, I agree that this may be simple change to handle this problem.
Already pg_upgrade doesn't work if there is any encoding difference
between source and target databases. Most probably User will create
with same encoding.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-18 Thread Tom Lane
I wrote:
> What I think we should do for the time being is to have pg_dump treat
> database tablespace as a property it can't adjust after creation, just
> as it can't adjust locale or encoding.  That's a loss of functionality
> for pg_dumpall/pg_upgrade compared to where we are today, in that if
> you've set up the template1 or postgres DBs with nondefault tablespace
> then that won't propagate to the new cluster.  But the same can already
> be said about their locale and encoding, and I find it hard to believe
> that many people are trying to give those two DBs tablespace settings
> different from the cluster default, anyway.

Hm ... actually, there is more than one way to skin this cat.

Let me offer a modest proposal: pg_dumpall/pg_upgrade should simply DROP
postgres and template1 in the target cluster, and then re-create them
(from template0 of course).  With that, we'd not only cope with preserving
their tablespace settings, but we'd gain the ability to preserve their
locale and encoding, even if the target cluster had been initialized with
some other default.

If we did it like that, the rationale for an actual --set-db-properties
switch would vanish, at least so far as pg_dumpall is concerned -- we
could just make all that behavior an integral part of --create.  And
this wouldn't need to be conditional on getting ALTER DATABASE
CURRENT_DATABASE done.

Comments?

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-18 Thread Tom Lane
... okay, so the reason why --set-db-properties isn't as general-purpose
a switch as you might think finally penetrated my thick skull :-(

The problem is that, as the patch is currently constituted, that will
result in emitting a lot of "ALTER DATABASE foo" type commands.  If
somebody tries to load the pg_dump output into a database not named foo,
mayhem ensues.  This is exactly the same problem people have already
noted with respect to COMMENT ON DATABASE, but now we're propagating it to
other operations that have (much) higher downsides for getting it wrong.

What we need, therefore, is ALTER DATABASE CURRENT_DATABASE, which
I see is part of the pending patch for fixing the COMMENT ON DATABASE
problem.  That one is stuck in Waiting on Author, but I'm inclined to
go see if I can push it across the finish line, and then come back
to this one.

Even with that, there's a small problem: the backend cannot reasonably
support ALTER DATABASE CURRENT_DATABASE SET TABLESPACE, except perhaps
in the no-op case where the specified tablespace is already the right
one.  So this puts a serious hole in my thoughts about allowing the
tablespace to be adjusted during --set-db-properties without --create.

What I think we should do for the time being is to have pg_dump treat
database tablespace as a property it can't adjust after creation, just
as it can't adjust locale or encoding.  That's a loss of functionality
for pg_dumpall/pg_upgrade compared to where we are today, in that if
you've set up the template1 or postgres DBs with nondefault tablespace
then that won't propagate to the new cluster.  But the same can already
be said about their locale and encoding, and I find it hard to believe
that many people are trying to give those two DBs tablespace settings
different from the cluster default, anyway.

The only way around that that I can see is to give pg_dump some
additional switch along the lines of --i-promise-i-wont-restore-
this-into-a-database-with-a-different-name, and then it could emit
the same sort of
\connect template1
alter database postgres set tablespace ...;
\connect postgres
dance that pg_dumpall is using today.  But that sort of ugliness
is exactly what we said we didn't want in this patch.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-18 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 17, 2018 at 6:50 PM, Tom Lane  wrote:
>> * As the patch stands, --set-db-properties is implicit when you specify
>> -C, and in fact the patch goes to the trouble of throwing an error if you
>> try to specify both switches.  I'm inclined to think this might be a bad
>> idea.  What about saying that -C enables emitting CREATE DATABASE and
>> reconnecting to that DB, and independently of that, --set-db-properties
>> enables emitting the additional per-database properties?  This seems
>> simpler to understand, more flexible, and less of a change from the
>> previous behavior of -C.  On the other hand you could argue that people
>> would always want --set-db-properties with -C and so we're merely
>> promoting carpal tunnel (and errors of omission) if we do it like that.

> I would vigorously agree with that latter argument.  I think the
> chances of errors of omission would be very high even if the carpal
> tunnel dangers were ameliorated by giving --set-db-properties a short
> option name.

Fair enough.  We'll keep the behavioral change then.

>> If so, maybe we could say "-C implies --set-db-properties by default, but
>> if you really don't want that, you can say -C --no-set-db-properties".

> That seems like a better idea.

What I think I'll do for the moment is make them independent options so
far as the implementation is concerned, but have the command line switch
processing do

/* --create implies --set-db-properties, for now anyway */
if (dopt.outputCreateDB)
dopt.set_db_properties = 1;

If somebody actually asks for --no-set-db-properties, we can add that
later.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-18 Thread Robert Haas
On Wed, Jan 17, 2018 at 6:50 PM, Tom Lane  wrote:
> * As the patch stands, --set-db-properties is implicit when you specify
> -C, and in fact the patch goes to the trouble of throwing an error if you
> try to specify both switches.  I'm inclined to think this might be a bad
> idea.  What about saying that -C enables emitting CREATE DATABASE and
> reconnecting to that DB, and independently of that, --set-db-properties
> enables emitting the additional per-database properties?  This seems
> simpler to understand, more flexible, and less of a change from the
> previous behavior of -C.  On the other hand you could argue that people
> would always want --set-db-properties with -C and so we're merely
> promoting carpal tunnel (and errors of omission) if we do it like that.

I would vigorously agree with that latter argument.  I think the
chances of errors of omission would be very high even if the carpal
tunnel dangers were ameliorated by giving --set-db-properties a short
option name.

> If so, maybe we could say "-C implies --set-db-properties by default, but
> if you really don't want that, you can say -C --no-set-db-properties".

That seems like a better idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-17 Thread Tom Lane
Haribabu Kommi  writes:
> [ pg_dump-and-pg_dumpall-database-handling-refactoring_v12.patch ]

I've gone through this in a once-over-lightly fashion.  Since there was
quite a bit of debate upthread about how things should work, I'd like
to start by summarizing the decisions this patch has made, in case
anyone still wants to object to them:

* pg_dump will now be responsible for dumping all per-database properties,
including "ALTER ROLE IN DATABASE SET" settings.  To get that behavior,
pg_dumpall will invoke pg_dump using -C, or using the new switch
--set-db-properties when working on "template1" or "postgres" databases,
since those should already exist in the target installation.

* pg_dumpall still won't dump template0 (or any other not-datallowconn
database).  This means that it will no longer propagate any non-default
per-database properties for such databases.  I think this is fine for
template0: if you've messed with that at all, you are fooling with non
user-serviceable parts.  It's a bit less good for other DBs maybe, but on
the whole it seems more consistent to simply treat nonconnectable DBs
as not existing.  (That could stand to be documented though.)

* "pg_dumpall -g" will now produce only role- and tablespace-related output.

* There isn't any direct way to ask pg_dump for "just the DB properties
and nothing else" (although I suppose you can mostly fake it by adding
"--schema nosuchschema" or some such).  This makes it inconvenient to
exactly duplicate the old behavior of "pg_dumpall -g", if someone wanted
to do that to fit into their existing backup procedures.  I'm not sure
how important that is.  Personally I'm content to lose it, but if there's
enough pushback maybe we could invent a "--db-properties-only" switch?

There are some other points that haven't been debated:

* As the patch stands, --set-db-properties is implicit when you specify
-C, and in fact the patch goes to the trouble of throwing an error if you
try to specify both switches.  I'm inclined to think this might be a bad
idea.  What about saying that -C enables emitting CREATE DATABASE and
reconnecting to that DB, and independently of that, --set-db-properties
enables emitting the additional per-database properties?  This seems
simpler to understand, more flexible, and less of a change from the
previous behavior of -C.  On the other hand you could argue that people
would always want --set-db-properties with -C and so we're merely
promoting carpal tunnel (and errors of omission) if we do it like that.
If so, maybe we could say "-C implies --set-db-properties by default, but
if you really don't want that, you can say -C --no-set-db-properties".
Perhaps the only application of this is to reproduce pg_dump's historical
behavior, but that's probably of some value.

* The patch fails to make any provision for deciding at pg_restore time
whether to emit DB properties.  Considering that you can use -C at restore
time, I think it's pretty awful that you can't use --set-db-properties
then.  Moreover, not only has the patch not added a separate "DATABASE
PROPERTIES" TOC entry type as was originally proposed, what it's actually
doing is emitting two identically labeled "DATABASE" TOC entries, one
with the CREATE and the other with the rest.  That's simply horrid.
I think we need to do this properly and emit two distinguishable TOC
entries, and control which of them get printed on the restore side of the
logic, not the TOC entry construction side.  (I'm not sure at this point
if we need an archive version bump to do that, but perhaps not --- in the
past we've added new TOC types without a version bump.)

* Some years ago, commit 4bd371f6f caused pg_dumpall to emit "SET
default_transaction_read_only = off;" in its scripts, so that it could
successfully handle DBs that have "SET default_transaction_read_only = on"
as a database property.  This was intentionally NOT done to pg_dump,
arguing that doing so would greatly weaken such a setting as a defense
against stupid errors (i.e. restoring into your production database).
The patch currently ignores that reasoning and moves the setting into
pg_dump anyway.  Haribabu mentioned this point upthread and got basically
no response, but I'm concerned about it.

I think however that we might be able to dodge that issue as a byproduct
of fixing the previous problem.  If the CREATE DATABASE is done as one
TOC entry, and we reconnect to the target DB after processing that entry,
and then we process the DATABASE PROPERTIES entry, then any DB property
settings we've installed will not apply in our existing session.  So I
think we can just leave out the "SET default_transaction_read_only = on"
altogether.  Obviously this puts a premium on not reconnecting, but
reconnecting would break other more important features like
--single-transaction, so I am not worried about that.

* An issue in connection with that is that because you can't issue
ALTER DATABASE SET TABLESPACE against the 

Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-17 Thread Haribabu Kommi
On Thu, Jan 18, 2018 at 8:25 AM, Tom Lane  wrote:

> Haribabu Kommi  writes:
> > [ pg_dump-and-pg_dumpall-database-handling-refactoring_v12.patch ]
>
> I started to look through this, and almost immediately found that the
> diff in t/002_pg_dump.pl doesn't seem necessary --- the test passes
> for me without applying that hunk.  Is that a leftover from a previous
> patch iteration, or is there some platform dependency in the test?
> If it's not necessary, I'd be inclined to leave it as it was.
>

Thanks for the review.

Yes, it is a left over from previous patch iteration. The test used to fail
with this
patch earlier. Now there is no problem in test even after removing the hunk
from
my side also. There is no platform dependency.

Attached is an updated patch after removing the test changes.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v13.patch
Description: Binary data


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-17 Thread Tom Lane
Haribabu Kommi  writes:
> [ pg_dump-and-pg_dumpall-database-handling-refactoring_v12.patch ]

I started to look through this, and almost immediately found that the
diff in t/002_pg_dump.pl doesn't seem necessary --- the test passes
for me without applying that hunk.  Is that a leftover from a previous
patch iteration, or is there some platform dependency in the test?
If it's not necessary, I'd be inclined to leave it as it was.

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-04 Thread Vaishnavi Prabakaran
On Fri, Jan 5, 2018 at 4:32 PM, Haribabu Kommi 
wrote:
>
>
> Corrected.
> Updated patch attached.
>
> Moved this CF item to "Ready for committer"

Regards,
Vaishnavi,
Fujitsu Australia.


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-12-12 Thread Haribabu Kommi
On Wed, Nov 29, 2017 at 4:25 PM, Michael Paquier 
wrote:

> On Wed, Nov 8, 2017 at 8:50 AM, Haribabu Kommi 
> wrote:
> > Ok. Removed the documentation changes that it cannot be used for normal
> > scenarios, and also added a Note section explaining the problem of using
> > the dump with pg_restore command with --clean and --create options.
>
> Hari, the documentation portion of the patch does not apply. Could you
> rebase? For now I am moving it to next CF as this did not get any
> reviews, and the status is switched to "waiting on author".
>

Rebased patch attached that fixes the documentation build problem.

Regards,
Hari Babu
Fujitsu Australia


pg_dump-and-pg_dumpall-database-handling-refactoring_v11.patch
Description: Binary data


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2017-11-28 Thread Michael Paquier
On Wed, Nov 8, 2017 at 8:50 AM, Haribabu Kommi  wrote:
> Ok. Removed the documentation changes that it cannot be used for normal
> scenarios, and also added a Note section explaining the problem of using
> the dump with pg_restore command with --clean and --create options.

Hari, the documentation portion of the patch does not apply. Could you
rebase? For now I am moving it to next CF as this did not get any
reviews, and the status is switched to "waiting on author".
-- 
Michael