Re: some pg_dump query code simplification

2018-08-30 Thread Stephen Frost
Greetings,

* Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> On 08/28/2018 06:05 PM, Tom Lane wrote:
> >Dunno about the idea of running the pg_dump TAP tests against back
> >branches.  I find that code sufficiently unreadable that maintaining
> >several more copies of it doesn't sound like fun at all.
> 
> Agreed. The code could do with a lot of comments. I recently looked at
> adding something to it and decided I had more pressing things to do.

I'm happy to add more comments to it..  There's a pretty good block that
tries to describe how the tests work above where the tests are actually
defined.  What would help?  Should I include an example in that code
block?  Or are you looking for more comments in the actual test
definitions about what they're covering or why they're included?  Or are
you interested in comments about the actual code down at the bottom
which runs the tests..?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some pg_dump query code simplification

2018-08-30 Thread Stephen Frost
Greetings,

* Andrew Dunstan (andrew.duns...@2ndquadrant.com) wrote:
> On 08/28/2018 06:10 PM, Stephen Frost wrote:
> >>Andrew has a buildfarm module that does precisely that, although
> >>I'm not sure what its test dataset is --- probably the regression
> >>database from each branch.  I also have a habit of doing such testing
> >>manually whenever I touch version-sensitive parts of pg_dump.
> >I've gotten better about doing that back-branch testing myself and
> >certainly prefer it, but I think we should also have buildfarm coverage.
> >I don't think we have the full matrix covered, or, really, anything
> >anywhere near it, so I'm looking for other options to at least get that
> >code exercised.
> 
> If by matrix you mean the version matrix, then the module in question covers
> every live branch as the source and every same or later live branch as the
> destination.

That's certainly better than I had thought it did.

> It could conceivably cover older branches as well - the branch names aren't
> actually hardcoded - I'd just have to be able to build them and do a
> standard buildfarm run once.

Having it cover older branches really is pretty important given how much
code there is for those older branches in pg_dump.  I've had pretty good
success compiling back down to about 7.2, as I recall, using patches
that Greg Stark made available.  Older is probably possible too.
Getting everything we claim to support covered would be fantastic (which
I believe is basically 9.6/9.5/9.4/9.3 against 7.0 and up, and
master/11/10 against 8.0 and up).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some pg_dump query code simplification

2018-08-29 Thread Tom Lane
Peter Eisentraut  writes:
> OK, updated patch attached.  If the updated style is acceptable, I'll
> start running more extensive tests against the older branches.

Looks sane to me.

regards, tom lane



Re: some pg_dump query code simplification

2018-08-29 Thread Peter Eisentraut
On 28/08/2018 23:43, Tom Lane wrote:
> I think I had this discussion already with somebody, but ... I do not
> like this style at all:
> 
> tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, 
> j, i_attidentity)) : '\0');

OK, updated patch attached.  If the updated style is acceptable, I'll
start running more extensive tests against the older branches.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 1a51258aa317f8068c18f034906c3536d570b354 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 29 Aug 2018 16:45:32 +0200
Subject: [PATCH v2] pg_dump: Reorganize getTableAttrs()

Instead of repeating the almost same large query in each version branch,
use one query and add a few columns to the SELECT list depending on the
version.  This saves a lot of duplication.
---
 src/bin/pg_dump/pg_dump.c | 198 --
 1 file changed, 62 insertions(+), 136 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d833c41147..f0ea83e6a9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -8162,150 +8162,77 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int 
numTables)
 
resetPQExpBuffer(q);
 
+   appendPQExpBuffer(q,
+ "SELECT\n"
+ "a.attnum,\n"
+ "a.attname,\n"
+ "a.atttypmod,\n"
+ "a.attstattarget,\n"
+ "a.attstorage,\n"
+ "t.typstorage,\n"
+ "a.attnotnull,\n"
+ "a.atthasdef,\n"
+ "a.attisdropped,\n"
+ "a.attlen,\n"
+ "a.attalign,\n"
+ "a.attislocal,\n"
+ 
"pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,\n");
+
if (fout->remoteVersion >= 11)
-   {
-   /* atthasmissing and attmissingval are new in 11 */
-   appendPQExpBuffer(q, "SELECT a.attnum, a.attname, 
a.atttypmod, "
- "a.attstattarget, 
a.attstorage, t.typstorage, "
- "a.attnotnull, 
a.atthasdef, a.attisdropped, "
- "a.attlen, 
a.attalign, a.attislocal, "
- 
"pg_catalog.format_type(t.oid,a.atttypmod) AS atttypname, "
- 
"array_to_string(a.attoptions, ', ') AS attoptions, "
- "CASE WHEN 
a.attcollation <> t.typcollation "
- "THEN a.attcollation 
ELSE 0 END AS attcollation, "
- "a.attidentity, "
- 
"pg_catalog.array_to_string(ARRAY("
- "SELECT 
pg_catalog.quote_ident(option_name) || "
- "' ' || 
pg_catalog.quote_literal(option_value) "
- "FROM 
pg_catalog.pg_options_to_table(attfdwoptions) "
- "ORDER BY option_name"
- "), E',\n') AS 
attfdwoptions ,"
+   appendPQExpBuffer(q,
  "CASE WHEN 
a.atthasmissing AND NOT a.attisdropped "
- "THEN a.attmissingval 
ELSE null END AS attmissingval "
- "FROM 
pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t "
- "ON a.atttypid = 
t.oid "
- "WHERE a.attrelid = 
'%u'::pg_catalog.oid "
- "AND a.attnum > 
0::pg_catalog.int2 "
- "ORDER BY a.attnum",
- 
tbinfo->dobj.catId.oid);
-   }
-   else if (fout->remoteVersion >= 10)
-   {
-   /*
-* 

Re: some pg_dump query code simplification

2018-08-29 Thread Andrew Dunstan




On 08/28/2018 06:10 PM, Stephen Frost wrote:



Andrew has a buildfarm module that does precisely that, although
I'm not sure what its test dataset is --- probably the regression
database from each branch.  I also have a habit of doing such testing
manually whenever I touch version-sensitive parts of pg_dump.

I've gotten better about doing that back-branch testing myself and
certainly prefer it, but I think we should also have buildfarm coverage.
I don't think we have the full matrix covered, or, really, anything
anywhere near it, so I'm looking for other options to at least get that
code exercised.





If by matrix you mean the version matrix, then the module in question 
covers every live branch as the source and every same or later live 
branch as the destination.


It could conceivably cover older branches as well - the branch names 
aren't actually hardcoded - I'd just have to be able to build them and 
do a standard buildfarm run once.


cheers

andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: some pg_dump query code simplification

2018-08-29 Thread Andrew Dunstan




On 08/28/2018 06:05 PM, Tom Lane wrote:

Stephen Frost  writes:

I wonder- what if we had an option to pg_dump to explicitly tell it what
the server's version is and then have TAP tests to run with different
versions?

Uh ... telling it what the version is doesn't make that true, so I'd
have no confidence in a test^H^H^H^Hkluge done that way.  The way
to test is to point it at an *actual* back-branch server.

Andrew has a buildfarm module that does precisely that, although
I'm not sure what its test dataset is --- probably the regression
database from each branch.  I also have a habit of doing such testing
manually whenever I touch version-sensitive parts of pg_dump.



It's all the databases from a buildfarm run apart from TAP tests. Since 
it uses USE_MODULE_DB=1 it covers most of the contrib modules plus the 
standard regression db, as well as isolation test and pl_regression 
(which should be taught to do separate DBs for each PL if requested).



There is no reason it couldn't test more.



Dunno about the idea of running the pg_dump TAP tests against back
branches.  I find that code sufficiently unreadable that maintaining
several more copies of it doesn't sound like fun at all.





Agreed. The code could do with a lot of comments. I recently looked at 
adding something to it and decided I had more pressing things to do.


cheers

andrew

--
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: some pg_dump query code simplification

2018-08-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> Stephen Frost  writes:
> >>> I wonder- what if we had an option to pg_dump to explicitly tell it what
> >>> the server's version is and then have TAP tests to run with different
> >>> versions?
> 
> >> Uh ... telling it what the version is doesn't make that true, so I'd
> >> have no confidence in a test^H^H^H^Hkluge done that way.  The way
> >> to test is to point it at an *actual* back-branch server.
> 
> > I certainly agree that this would be ideal, but nonetheless, I've seen
> > multiple cases where just trying to run the query, even against a
> > current version, would have shown that it's malformed or has some issue
> > which needs fixing and today we haven't even got that.
> 
> Yeah, but cases where we need to touch column C in one version and column
> D in another can't be made to pass when pg_dump is operating under a false
> assumption about the server version.  So this seems like a nonstarter.

Yes, there are cases that wouldn't work.

I'd be much happier with a complete solution, were that forthcoming.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some pg_dump query code simplification

2018-08-28 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost  writes:
>>> I wonder- what if we had an option to pg_dump to explicitly tell it what
>>> the server's version is and then have TAP tests to run with different
>>> versions?

>> Uh ... telling it what the version is doesn't make that true, so I'd
>> have no confidence in a test^H^H^H^Hkluge done that way.  The way
>> to test is to point it at an *actual* back-branch server.

> I certainly agree that this would be ideal, but nonetheless, I've seen
> multiple cases where just trying to run the query, even against a
> current version, would have shown that it's malformed or has some issue
> which needs fixing and today we haven't even got that.

Yeah, but cases where we need to touch column C in one version and column
D in another can't be made to pass when pg_dump is operating under a false
assumption about the server version.  So this seems like a nonstarter.

regards, tom lane



Re: some pg_dump query code simplification

2018-08-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I wonder- what if we had an option to pg_dump to explicitly tell it what
> > the server's version is and then have TAP tests to run with different
> > versions?
> 
> Uh ... telling it what the version is doesn't make that true, so I'd
> have no confidence in a test^H^H^H^Hkluge done that way.  The way
> to test is to point it at an *actual* back-branch server.

I certainly agree that this would be ideal, but nonetheless, I've seen
multiple cases where just trying to run the query, even against a
current version, would have shown that it's malformed or has some issue
which needs fixing and today we haven't even got that.

> Andrew has a buildfarm module that does precisely that, although
> I'm not sure what its test dataset is --- probably the regression
> database from each branch.  I also have a habit of doing such testing
> manually whenever I touch version-sensitive parts of pg_dump.

I've gotten better about doing that back-branch testing myself and
certainly prefer it, but I think we should also have buildfarm coverage.
I don't think we have the full matrix covered, or, really, anything
anywhere near it, so I'm looking for other options to at least get that
code exercised.

> Dunno about the idea of running the pg_dump TAP tests against back
> branches.  I find that code sufficiently unreadable that maintaining
> several more copies of it doesn't sound like fun at all.

I hadn't been thinking we'd need more copies of it, simply a few more
runs which have different version values specified, though even just
doing a pg_dump of the regression suite for each combination would be
something.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some pg_dump query code simplification

2018-08-28 Thread Tom Lane
Stephen Frost  writes:
> I wonder- what if we had an option to pg_dump to explicitly tell it what
> the server's version is and then have TAP tests to run with different
> versions?

Uh ... telling it what the version is doesn't make that true, so I'd
have no confidence in a test^H^H^H^Hkluge done that way.  The way
to test is to point it at an *actual* back-branch server.

Andrew has a buildfarm module that does precisely that, although
I'm not sure what its test dataset is --- probably the regression
database from each branch.  I also have a habit of doing such testing
manually whenever I touch version-sensitive parts of pg_dump.

Dunno about the idea of running the pg_dump TAP tests against back
branches.  I find that code sufficiently unreadable that maintaining
several more copies of it doesn't sound like fun at all.

regards, tom lane



Re: some pg_dump query code simplification

2018-08-28 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Peter Eisentraut  writes:
> > There is a lot of version dependent code in pg_dump now, especially
> > per-version queries.  The way they were originally built was that we
> > just have an entirely separate query per version.  But as the number of
> > versions and the size of the query grows, this becomes unwieldy.
> 
> Agreed, it's becoming a bit of a mess.
> 
> > So I tried, as an example, to write the queries in getTableAttrs()
> > differently.  Instead of repeating the almost same large query in each
> > version branch, use one query and add a few columns to the SELECT list
> > depending on the version.  This saves a lot of duplication and is easier
> > to deal with.
> 
> I think I had this discussion already with somebody, but ... I do not
> like this style at all:
> 
> tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, 
> j, i_attidentity)) : '\0');
> 
> It's basically throwing away all the cross-checking that exists now to
> ensure that you didn't forget to deal with a column, misspell the column's
> AS name in one place or the other, etc etc.  The code could be completely
> wrong and it'd still fail silently, producing a default result that might
> well be the right answer, making it hard to debug.  I think the way to do
> this is to continue to require the query to produce the same set of
> columns regardless of server version.  So the version-dependent bits would
> look like, eg,
> 
> if (fout->remoteVersion >= 11)
> appendPQExpBufferStr(q,
>  "CASE WHEN a.atthasmissing AND NOT 
> a.attisdropped "
>  "THEN a.attmissingval ELSE null END AS 
> attmissingval,\n");
> else
> appendPQExpBufferStr(q,
>  "null AS attmissingval,\n");
> 
> and the data extraction code would remain the same as now.
> 
> Other than that nit, I agree that this shows a lot of promise.

+1 to Tom's thoughts on this- seems like a much better approach.

Overall, I definitely agree that it'd be nice to reduce the amount of
duplication happening today.  Working out some way to actually test all
of those queries would be awful nice too.

I wonder- what if we had an option to pg_dump to explicitly tell it what
the server's version is and then have TAP tests to run with different
versions?  There may be some cases where that doesn't work, of course,
and we'd need to address that, but having those tests might reduce the
number of times we end up with something committed which doesn't work at
all against some older version of PG because it wasn't tested...

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: some pg_dump query code simplification

2018-08-28 Thread Tom Lane
Peter Eisentraut  writes:
> There is a lot of version dependent code in pg_dump now, especially
> per-version queries.  The way they were originally built was that we
> just have an entirely separate query per version.  But as the number of
> versions and the size of the query grows, this becomes unwieldy.

Agreed, it's becoming a bit of a mess.

> So I tried, as an example, to write the queries in getTableAttrs()
> differently.  Instead of repeating the almost same large query in each
> version branch, use one query and add a few columns to the SELECT list
> depending on the version.  This saves a lot of duplication and is easier
> to deal with.

I think I had this discussion already with somebody, but ... I do not
like this style at all:

tbinfo->attidentity[j] = (i_attidentity >= 0 ? *(PQgetvalue(res, j, 
i_attidentity)) : '\0');

It's basically throwing away all the cross-checking that exists now to
ensure that you didn't forget to deal with a column, misspell the column's
AS name in one place or the other, etc etc.  The code could be completely
wrong and it'd still fail silently, producing a default result that might
well be the right answer, making it hard to debug.  I think the way to do
this is to continue to require the query to produce the same set of
columns regardless of server version.  So the version-dependent bits would
look like, eg,

if (fout->remoteVersion >= 11)
appendPQExpBufferStr(q,
 "CASE WHEN a.atthasmissing AND NOT 
a.attisdropped "
 "THEN a.attmissingval ELSE null END AS 
attmissingval,\n");
else
appendPQExpBufferStr(q,
 "null AS attmissingval,\n");

and the data extraction code would remain the same as now.

Other than that nit, I agree that this shows a lot of promise.

regards, tom lane