Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-10 Thread Tom Lane
I wrote:
 Now, back to the original subject of this thread: both HEAD and 9.1 are
 now operating as designed, in that they will dump the (user-provided
 portion of the) contents of an extension config table whenever that
 extension is dumped, even if --schema is specified.

Or so I thought, anyway.  Further experimentation with despez's example
shows that in HEAD, --schema is still able to block dumping of extension
config table contents, and the reason appears to be commit
a4cd6abcc901c1a8009c62a27f78696717bb8fe1, which added yet another set
of filtering conditions in a poorly chosen place; or possibly I should
say it made arbitrary changes in the definition of the --schema switch.
That patch needs some rethinking too, though I'm not sure what yet.

I also note that his example shows that if you have a selective dump
(say, with a -t switch), config table contents will be dumped even when
the owning extension is not.  This seems like a pretty clear bug:
getExtensionMembership should not be creating TableDataInfo objects for
extension config tables if the owning extension is not to be dumped.
Barring objections, I'll go fix and back-patch that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-08 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 OK, in this version we simply suppress creation of the TableDataInfo 
 object if it's not wanted.

I applied this with some further adjustments.

 I actually removed the code from 
 makeTableDataInfo - there are only two places it gets called and doing 
 it in those two spots seemed a bit cleaner.

After study it seemed to me that this was the wrong direction to take.
It was already the case that the config-table path was skipping the
filters in getTableData(), none of which seem to be good for it to skip
other than the one on the table's schema-dump flag.  So I've refactored
the code to put all those filters into makeTableDataInfo where they
will be applied to both the normal and config-table cases.

Now, back to the original subject of this thread: both HEAD and 9.1 are
now operating as designed, in that they will dump the (user-provided
portion of the) contents of an extension config table whenever that
extension is dumped, even if --schema is specified.  Do we want to
change that?  I'm not convinced that it's something to mess with
lightly.  I could possibly support a definition that says that we omit
such data in --schema mode, except that I'm not sure what is sensible
for rows that were modified (not inserted) by user actions.  Also, we
probably ought to get some input from the postgis people, because after
all this entire feature was designed for their schema auxiliary tables.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-07 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/31/2012 11:10 PM, Andrew Dunstan wrote:
 Here's a possible patch for the exclude-table-data problem along the 
 lines you suggest.

 Should I apply this?

I'm not happy with this yet.  My core complaint is that pg_dump used to
consider that creation of a TableDataInfo object for a table happens
if and only if we're going to dump the table's data.  And the comments
(eg in pg_dump.h) still say that.  But the previous patch left us in a
halfway zone where sometimes we'd create a TableDataInfo object and then
choose not to dump the data, and this patch doesn't get us out of that.
I think we should either revert to the previous definition, or go over
to a design wherein we always create TableDataInfo objects for all
tables (but probably still excluding data-less relations such as views)
and the whether-to-dump decision is expressed only by setting or not
setting the object's dump flag.

I worked a little bit on a patch to do the latter but found that it was
more invasive than I'd hoped.  Given the lack of any immediate payoff
I think it'd probably make more sense to do the former.  We could still
centralize the decision making into makeTableDataInfo a bit more than
now, but it should take the form of not creating the object at all,
rather than creating it and then clearing its dump flag.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-07 Thread Andrew Dunstan



On 02/07/2012 02:56 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 01/31/2012 11:10 PM, Andrew Dunstan wrote:

Here's a possible patch for the exclude-table-data problem along the
lines you suggest.

Should I apply this?

I'm not happy with this yet.  My core complaint is that pg_dump used to
consider that creation of a TableDataInfo object for a table happens
if and only if we're going to dump the table's data.  And the comments
(eg in pg_dump.h) still say that.  But the previous patch left us in a
halfway zone where sometimes we'd create a TableDataInfo object and then
choose not to dump the data, and this patch doesn't get us out of that.
I think we should either revert to the previous definition, or go over
to a design wherein we always create TableDataInfo objects for all
tables (but probably still excluding data-less relations such as views)
and the whether-to-dump decision is expressed only by setting or not
setting the object's dump flag.

I worked a little bit on a patch to do the latter but found that it was
more invasive than I'd hoped.  Given the lack of any immediate payoff
I think it'd probably make more sense to do the former.  We could still
centralize the decision making into makeTableDataInfo a bit more than
now, but it should take the form of not creating the object at all,
rather than creating it and then clearing its dump flag.





OK, in this version we simply suppress creation of the TableDataInfo 
object if it's not wanted. I actually removed the code from 
makeTableDataInfo - there are only two places it gets called and doing 
it in those two spots seemed a bit cleaner.


cheers

andrew
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***
*** 1141,1154  selectDumpableTable(TableInfo *tbinfo)
  			   tbinfo-dobj.catId.oid))
  		tbinfo-dobj.dump = false;
  
- 	/* If table is to be dumped, check that the data is not excluded */
- 	if (tbinfo-dobj.dump  !
- 		simple_oid_list_member(tabledata_exclude_oids,
- 			   tbinfo-dobj.catId.oid))
- 		tbinfo-dobj.dumpdata = true;
- 	else
- 		tbinfo-dobj.dumpdata = false;
- 
  }
  
  /*
--- 1141,1146 
***
*** 1599,1608  dumpTableData(Archive *fout, TableDataInfo *tdinfo)
  	DataDumperPtr dumpFn;
  	char	   *copyStmt;
  
- 	/* don't do anything if the data isn't wanted */
- 	if (!tbinfo-dobj.dumpdata)
- 		return;
- 
  	if (!dump_inserts)
  	{
  		/* Dump/restore using COPY */
--- 1591,1596 
***
*** 1658,1663  getTableData(TableInfo *tblinfo, int numTables, bool oids)
--- 1646,1656 
  			 no_unlogged_table_data)
  			continue;
  
+ 		/* Check that the data is not explicitly excluded */
+ 		if (simple_oid_list_member(tabledata_exclude_oids,
+    tblinfo[i].dobj.catId.oid))
+ 			continue;
+ 
  		if (tblinfo[i].dobj.dump  tblinfo[i].dataObj == NULL)
  			makeTableDataInfo((tblinfo[i]), oids);
  	}
***
*** 14127,14133  getExtensionMembership(Archive *fout, ExtensionInfo extinfo[],
  TableInfo  *configtbl;
  
  configtbl = findTableByOid(atooid(extconfigarray[j]));
! if (configtbl  configtbl-dataObj == NULL)
  {
  	/*
  	 * Note: config tables are dumped without OIDs regardless
--- 14120,14132 
  TableInfo  *configtbl;
  
  configtbl = findTableByOid(atooid(extconfigarray[j]));
! 
! if (configtbl == NULL || 
! 	simple_oid_list_member(tabledata_exclude_oids,
! 		   configtbl-dobj.catId.oid))
! 	continue;
! 
! if (configtbl-dataObj == NULL)
  {
  	/*
  	 * Note: config tables are dumped without OIDs regardless
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***
*** 129,135  typedef struct _dumpableObject
  	char	   *name;			/* object name (should never be NULL) */
  	struct _namespaceInfo *namespace;	/* containing namespace, or NULL */
  	bool		dump;			/* true if we want to dump this object */
- 	booldumpdata;   /* true if we want data for this object */
  	bool		ext_member;		/* true if object is member of extension */
  	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
  	int			nDeps;			/* number of valid dependencies */
--- 129,134 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-06 Thread Andrew Dunstan



On 01/31/2012 11:10 PM, Andrew Dunstan wrote:



Here's a possible patch for the exclude-table-data problem along the 
lines you suggest.



Should I apply this?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-02 Thread hubert depesz lubaczewski
On Wed, Feb 01, 2012 at 10:02:14PM +0100, Dimitri Fontaine wrote:
 The case for a table that is partly user data and partly extension data
 is very thin, I think that if I had this need I would use inheritance
 and a CHECK(user_data is true/false) constraint to filter the data.

definitely agree. i.e. i don't really see a case when we'd have data
from both extension, and normal usage, in the same table.
and the overhead of tracking source of data seems to be excessive.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-02-01 Thread Dimitri Fontaine
Hi,

Sorry to be late in the thread, I'm too busy right now.  Cédric called
it to my immediate attention though.

Martijn van Oosterhout klep...@svana.org writes:
 Perhaps a better way of dealing with this is providing a way of dumping
 extensions explicitly. Then you could say:

 pg_dump --extension=postgis -s

That's something I'm working on in this commit fest under the “inline
extensions” topic, and we should have that facility in 9.2 baring major
obstacles (consensus is made).

Tom Lane t...@sss.pgh.pa.us writes:
 On Mon, Jan 30, 2012 at 11:18 PM, Tom Lanet...@sss.pgh.pa.us  wrote:
 What's not apparent to me is whether there's an argument for doing more
 than that.  It strikes me that the current design is not very friendly
 towards the idea of an extension that creates a table that's meant
 solely to hold user data --- you'd have to mark it as config which
 seems a bit unfortunate terminology for that case.  Is it important to
 do something about that, and if so what?

 My thought exactly --- maybe it's only a minor cosmetic issue that will
 affect few people, or maybe this will someday be a major use-case.
 I don't know.  I was hoping Dimitri had an opinion.

So, being able to stuff data into an extension has been made possible to
address two use cases:

 - postgis
 - (sql only) data extensions

The former is very specific and as we didn't hear back from them I guess
we addressed it well enough, the latter is still WIP. It's about being
able to ship data as an extension (think timezone updates, geo ip, bank
cards database, exchange rates, etc). You need to be able to easily ship
those (CSV isn't the best we can do here, as generally it doesn't
include the schema nor the COPY recipe that can be non-trivial) and to
easily update those.

The case for a table that is partly user data and partly extension data
is very thin, I think that if I had this need I would use inheritance
and a CHECK(user_data is true/false) constraint to filter the data.

So I sure would appreciate being able to call that data rather than
config, and to mark any table at once. If that doesn't need any pg_dump
stretching I think providing that in 9.2 would be great.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread hubert depesz lubaczewski
On Mon, Jan 30, 2012 at 11:18:31PM -0500, Tom Lane wrote:
 I don't recall that we thought very hard about what should happen when
 pg_dump switches are used to produce a selective dump, but ISTM
 reasonable that if it's user data then it should be dumped only if
 data in a regular user table would be.  So I agree it's pretty broken
 that pg_dump -t foo will dump data belonging to a config table not
 selected by the -t switch.  I think this should be changed in both HEAD
 and 9.1 (note that HEAD will presumably return to the 9.1 behavior once
 that --exclude-table-data patch gets fixed).
 
 What's not apparent to me is whether there's an argument for doing more
 than that.  It strikes me that the current design is not very friendly
 towards the idea of an extension that creates a table that's meant
 solely to hold user data --- you'd have to mark it as config which
 seems a bit unfortunate terminology for that case.  Is it important to
 do something about that, and if so what?

Currently we are migrating away from using extensions. But - recently
on planet.postgresql.org there were some (more than 2) posts about
schema versioning.
EXTENSIONS, with their versions, upgrades, dependency tracking, would be
*perfect* for storing application structures, if:
1. we could use them from arbitrary location, and not only
   install-root/share/postgresql/extension/ which usually shouldn't be
   writtable by users
2. they do not interfere with pg_dump

2nd point means that I still need to be able to get:
1. full database schema dump - which can use create extension
2. single table schema dump - which, in my opinion, should use create
   table, and only schema of requested table(s) should be shown, no
   schema or data for other tables should be dumped.
3. full database data dump
4. single table data dump - in which case neither structure, nor data of
   other tables (than requested) should be emitted.

personally, I think that the marking of extension tables should be
reversed - by default they should normally dump data - just like any
other table. Just in case of some specific tables you'd mark them with
do not dump data by default which would exclude their data from normal
database wide pg_dump.

that's how I envision working extensions, and that's how I'd like them
to work. of course your needs/opinions can be different - especially in
case when we consider extensions to be only a tool to simplify
dump/restore of contrib modules.

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Robert Haas
On Mon, Jan 30, 2012 at 11:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I don't recall that we thought very hard about what should happen when
 pg_dump switches are used to produce a selective dump, but ISTM
 reasonable that if it's user data then it should be dumped only if
 data in a regular user table would be.

Yep.

 What's not apparent to me is whether there's an argument for doing more
 than that.  It strikes me that the current design is not very friendly
 towards the idea of an extension that creates a table that's meant
 solely to hold user data --- you'd have to mark it as config which
 seems a bit unfortunate terminology for that case.  Is it important to
 do something about that, and if so what?

Is this anything more than a naming problem?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Andrew Dunstan



On 01/30/2012 11:18 PM, Tom Lane wrote:

[ example showing pg_dump's odd behavior for extension config tables ]
[ traces through that with gdb... ]

As I suspected, the behavioral change from 9.1 to HEAD is not
intentional.  It is an artifact of commit
7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not
quite, entirely broken.  I won't enumerate its shortcomings here,
because they're not really relevant, but it does seem appropriate to
discuss exactly what we think *should* happen for tables created inside
extensions.

I'm perplexed about what you thing the patch does wrong or how it affects this. 
If I've broken something I'd like to know how, exactly, so I have a chance to 
fix it.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Martijn van Oosterhout
On Mon, Jan 30, 2012 at 11:18:31PM -0500, Tom Lane wrote:
 I don't recall that we thought very hard about what should happen when
 pg_dump switches are used to produce a selective dump, but ISTM
 reasonable that if it's user data then it should be dumped only if
 data in a regular user table would be.  So I agree it's pretty broken
 that pg_dump -t foo will dump data belonging to a config table not
 selected by the -t switch.  I think this should be changed in both HEAD
 and 9.1 (note that HEAD will presumably return to the 9.1 behavior once
 that --exclude-table-data patch gets fixed).

Perhaps a better way of dealing with this is providing a way of dumping
extensions explicitly. Then you could say:

pg_dump --extension=postgis -s

to get the data. And you can use all the normal pg_dump options for
controlling the output. The flag currently used to seperate the table
schema from the table content could then interact logically. Another
way perhaps:

pg_dump --extension-postgis=data-only
pg_dump --extension-postgis=schema
pg_dump --extension-postgis=all
pg_dump --extension-postgis=none

The last being the default.

Just throwing out some completely different ideas.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Andrew Dunstan



On 01/31/2012 03:02 PM, Martijn van Oosterhout wrote:

On Mon, Jan 30, 2012 at 11:18:31PM -0500, Tom Lane wrote:

I don't recall that we thought very hard about what should happen when
pg_dump switches are used to produce a selective dump, but ISTM
reasonable that if it's user data then it should be dumped only if
data in a regular user table would be.  So I agree it's pretty broken
that pg_dump -t foo will dump data belonging to a config table not
selected by the -t switch.  I think this should be changed in both HEAD
and 9.1 (note that HEAD will presumably return to the 9.1 behavior once
that --exclude-table-data patch gets fixed).

Perhaps a better way of dealing with this is providing a way of dumping
extensions explicitly. Then you could say:

pg_dump --extension=postgis -s

to get the data. And you can use all the normal pg_dump options for
controlling the output. The flag currently used to seperate the table
schema from the table content could then interact logically. Another
way perhaps:

pg_dump --extension-postgis=data-only
pg_dump --extension-postgis=schema
pg_dump --extension-postgis=all
pg_dump --extension-postgis=none



This one won't fly. We use option processing that requires that the 
option names to be known at compile time, so you can't embed an 
arbitrary extension name in there.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Adrian Klaver

On 01/31/2012 04:36 AM, Robert Haas wrote:

On Mon, Jan 30, 2012 at 11:18 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

I don't recall that we thought very hard about what should happen when
pg_dump switches are used to produce a selective dump, but ISTM
reasonable that if it's user data then it should be dumped only if
data in a regular user table would be.


Yep.


What's not apparent to me is whether there's an argument for doing more
than that.  It strikes me that the current design is not very friendly
towards the idea of an extension that creates a table that's meant
solely to hold user data --- you'd have to mark it as config which
seems a bit unfortunate terminology for that case.  Is it important to
do something about that, and if so what?


Is this anything more than a naming problem?


Seems to me that would be dependent on what the future plans are for the 
extension mechanism. There is also the issue of backward compatibility 
for those people that are using configuration tables in their extensions 
and would like to maintain that separation. I could see adding another 
function that is similar and would be used to identify strictly user 
data tables.




--
Adrian Klaver
adrian.kla...@gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 01/30/2012 11:18 PM, Tom Lane wrote:
 As I suspected, the behavioral change from 9.1 to HEAD is not
 intentional.  It is an artifact of commit
 7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not
 quite, entirely broken.  I won't enumerate its shortcomings here,

 I'm perplexed about what you thing the patch does wrong or how it affects 
 this. If I've broken something I'd like to know how, exactly, so I have a 
 chance to fix it.

Well, it adds a new field to all instances of DumpableObject and then
leaves it uninitialized in most cases, which is bad style (and unlike in
the backend, there is no forced zeroing to ensure a consistent value);
but the proximate cause of the bug is that you put the filtering in the
wrong place.  The way this is supposed to work, or at least used to
work, is that dump-the-data-or-not is determined by whether
a TableDataInfo DumpableObject gets created --- see the callers of
makeTableDataInfo.  You didn't follow that convention but instead
inserted an extra filter test in dumpTableData.  The reason that
depesz's example is not dumping the unwanted config table is that the
code path in which getExtensionMembership calls makeTableDataInfo isn't
ever setting the dumpdata flag.  Unfortunately that means that config
table data won't get dumped when it *is* wanted, either.  Or worse,
it means that the data might or might not get dumped depending on
whether the pg_malloc in makeTableDataInfo is allocating new or recycled
memory and what happens to be in that memory in the latter case.

Now I'm not entirely sure that we ought to preserve the old code
structure as-is; it might be more future-proof if we always made
TableDataInfo objects and then used their dump flags to control whether
to dump them.  (The main potential advantage of that is that we'd deal
more sanely with dependency chains linking through TableDataInfo
objects; but I'm not sure there are any such at the moment.)  But what
we've got at the moment is a mess.  In any case I think we'd be better
off without the separate dumpdata field: if we need a flag we should use
the dump flag of the TableDataInfo object.

As far as depesz's actual complaint is concerned, I think the core of
the problem is that getExtensionMembership is unconditionally asking for
a config table's data to be dumped, without any consideration of whether
pg_dump switches ought to filter that.  I'm unsure whether we should
just add more logic there, or instead put the policy logic into
makeTableDataInfo.  The latter might result in less duplicative code,
but would require more rethinking of getTableData() --- which conditions
tested there ought to move into makeTableDataInfo?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Tom Lane
Adrian Klaver adrian.kla...@gmail.com writes:
 On 01/31/2012 04:36 AM, Robert Haas wrote:
 On Mon, Jan 30, 2012 at 11:18 PM, Tom Lanet...@sss.pgh.pa.us  wrote:
 What's not apparent to me is whether there's an argument for doing more
 than that.  It strikes me that the current design is not very friendly
 towards the idea of an extension that creates a table that's meant
 solely to hold user data --- you'd have to mark it as config which
 seems a bit unfortunate terminology for that case.  Is it important to
 do something about that, and if so what?

 Is this anything more than a naming problem?

 Seems to me that would be dependent on what the future plans are for the 
 extension mechanism.

My thought exactly --- maybe it's only a minor cosmetic issue that will
affect few people, or maybe this will someday be a major use-case.
I don't know.  I was hoping Dimitri had an opinion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-31 Thread Andrew Dunstan



On 01/31/2012 05:48 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 01/30/2012 11:18 PM, Tom Lane wrote:

As I suspected, the behavioral change from 9.1 to HEAD is not
intentional.  It is an artifact of commit
7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not
quite, entirely broken.  I won't enumerate its shortcomings here,

I'm perplexed about what you thing the patch does wrong or how it affects this. 
If I've broken something I'd like to know how, exactly, so I have a chance to 
fix it.

Well, it adds a new field to all instances of DumpableObject and then
leaves it uninitialized in most cases, which is bad style (and unlike in
the backend, there is no forced zeroing to ensure a consistent value);
but the proximate cause of the bug is that you put the filtering in the
wrong place.  The way this is supposed to work, or at least used to
work, is that dump-the-data-or-not is determined by whether
a TableDataInfo DumpableObject gets created --- see the callers of
makeTableDataInfo.  You didn't follow that convention but instead
inserted an extra filter test in dumpTableData.  The reason that
depesz's example is not dumping the unwanted config table is that the
code path in which getExtensionMembership calls makeTableDataInfo isn't
ever setting the dumpdata flag.  Unfortunately that means that config
table data won't get dumped when it *is* wanted, either.  Or worse,
it means that the data might or might not get dumped depending on
whether the pg_malloc in makeTableDataInfo is allocating new or recycled
memory and what happens to be in that memory in the latter case.

Now I'm not entirely sure that we ought to preserve the old code
structure as-is; it might be more future-proof if we always made
TableDataInfo objects and then used their dump flags to control whether
to dump them.  (The main potential advantage of that is that we'd deal
more sanely with dependency chains linking through TableDataInfo
objects; but I'm not sure there are any such at the moment.)  But what
we've got at the moment is a mess.  In any case I think we'd be better
off without the separate dumpdata field: if we need a flag we should use
the dump flag of the TableDataInfo object.

As far as depesz's actual complaint is concerned, I think the core of
the problem is that getExtensionMembership is unconditionally asking for
a config table's data to be dumped, without any consideration of whether
pg_dump switches ought to filter that.  I'm unsure whether we should
just add more logic there, or instead put the policy logic into
makeTableDataInfo.  The latter might result in less duplicative code,
but would require more rethinking of getTableData() --- which conditions
tested there ought to move into makeTableDataInfo?





Here's a possible patch for the exclude-table-data problem along the 
lines you suggest.


cheers

andrew


*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***
*** 1133,1146  selectDumpableTable(TableInfo *tbinfo)
  			   tbinfo-dobj.catId.oid))
  		tbinfo-dobj.dump = false;
  
- 	/* If table is to be dumped, check that the data is not excluded */
- 	if (tbinfo-dobj.dump  !
- 		simple_oid_list_member(tabledata_exclude_oids,
- 			   tbinfo-dobj.catId.oid))
- 		tbinfo-dobj.dumpdata = true;
- 	else
- 		tbinfo-dobj.dumpdata = false;
- 
  }
  
  /*
--- 1133,1138 
***
*** 1581,1587  dumpTableData(Archive *fout, TableDataInfo *tdinfo)
  	char	   *copyStmt;
  
  	/* don't do anything if the data isn't wanted */
! 	if (!tbinfo-dobj.dumpdata)
  		return;
  
  	if (!dump_inserts)
--- 1573,1579 
  	char	   *copyStmt;
  
  	/* don't do anything if the data isn't wanted */
! 	if (!tdinfo-dobj.dump)
  		return;
  
  	if (!dump_inserts)
***
*** 1670,1675  makeTableDataInfo(TableInfo *tbinfo, bool oids)
--- 1662,1674 
  	tdinfo-filtercond = NULL;	/* might get set later */
  	addObjectDependency(tdinfo-dobj, tbinfo-dobj.dumpId);
  
+ 	/* Check that the data is not explicitly excluded */
+ 	if (! simple_oid_list_member(tabledata_exclude_oids,
+ 			   tbinfo-dobj.catId.oid))
+ 		tdinfo-dobj.dump = true;
+ 	else
+ 		tdinfo-dobj.dump = false;
+ 
  	tbinfo-dataObj = tdinfo;
  }
  
*** a/src/bin/pg_dump/pg_dump.h
--- b/src/bin/pg_dump/pg_dump.h
***
*** 129,135  typedef struct _dumpableObject
  	char	   *name;			/* object name (should never be NULL) */
  	struct _namespaceInfo *namespace;	/* containing namespace, or NULL */
  	bool		dump;			/* true if we want to dump this object */
- 	booldumpdata;   /* true if we want data for this object */
  	bool		ext_member;		/* true if object is member of extension */
  	DumpId	   *dependencies;	/* dumpIds of objects this one depends on */
  	int			nDeps;			/* number of valid dependencies */
--- 129,134 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] [GENERAL] pg_dump -s dumps data?!

2012-01-30 Thread Tom Lane
[ Note: please follow-up to pgsql-hackers not pgsql-general; I think
  this discussion needs to move there ]

hubert depesz lubaczewski dep...@depesz.com writes:
 On Mon, Jan 30, 2012 at 11:30:51AM -0500, Tom Lane wrote:
 That is way too vague for my taste, as you have not shown the pg_dump
 options you're using, for example.

 i tried to explain that the options don't matter, but here we go. full
 example:
 [ example showing pg_dump's odd behavior for extension config tables ]

[ traces through that with gdb... ]

As I suspected, the behavioral change from 9.1 to HEAD is not
intentional.  It is an artifact of commit
7b070e896ca835318c90b02c830a5c4844413b64, which is almost, but not
quite, entirely broken.  I won't enumerate its shortcomings here,
because they're not really relevant, but it does seem appropriate to
discuss exactly what we think *should* happen for tables created inside
extensions.

The original design intention for non-config tables was, per the manual:

Ordinarily, if a table is part of an extension, neither the
table's definition nor its content will be dumped by pg_dump.

the assumption being that both the definition and the content would be
re-loaded by executing the extension's SQL script.  The purpose of
pg_extension_config_dump() is to allow part or all of the data in the
table to be treated as user data and thus dumped; this is assumed to
be data not supplied by the extension script but by subsequent user
insertions.

I don't recall that we thought very hard about what should happen when
pg_dump switches are used to produce a selective dump, but ISTM
reasonable that if it's user data then it should be dumped only if
data in a regular user table would be.  So I agree it's pretty broken
that pg_dump -t foo will dump data belonging to a config table not
selected by the -t switch.  I think this should be changed in both HEAD
and 9.1 (note that HEAD will presumably return to the 9.1 behavior once
that --exclude-table-data patch gets fixed).

What's not apparent to me is whether there's an argument for doing more
than that.  It strikes me that the current design is not very friendly
towards the idea of an extension that creates a table that's meant
solely to hold user data --- you'd have to mark it as config which
seems a bit unfortunate terminology for that case.  Is it important to
do something about that, and if so what?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers