On 09/05/2012 03:50 PM, Andrew Dunstan wrote:

On 09/05/2012 03:40 PM, Bruce Momjian wrote:
On Wed, Sep  5, 2012 at 03:17:40PM -0400, Andrew Dunstan wrote:
The PG_BINARY_W change has only been verified on a non-buildfarm
setup on my laptop (Mingw).

Note that while it does look like there's a bug either in
pg_upgrade or pg_dumpall, it's probably mostly harmless (adding
some spurious CRs to function code bodies on Windows). I'd feel
happier if it didn't, and happier still if I knew for sure the
ultimate origin. Your pg_dumpall discovery above is interesting. I
might have time later on today to delve into all this. I'm out of
contact for the next few hours.

OK, I now have a complete handle on what's going on here, and
withdraw my earlier statement that I am confused on this issue :-)

First, one lot of CRs is produced because the pg_upgrade test script
calls pg_dumpall without -f and redirects that to a file, which
Windows kindly opens in text mode. The solution to that is to change
the test script to use pg_dumpall -f instead.

The second lot of CRs (seen in the second dump file in the diff I
previously sent) is produced by pg_upgrade writing its output in
text mode, which turns LF into CRLF. The solution to that is the
patch to dump.c I posted, which, as Bruce observed, does the same
thing that pg_dumpall does. Arguably, it should also open the input
file in binary, so that if there really is a CRLF in the dump it
won't be eaten.
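
To make the text-mode effect concrete, here is a minimal Python sketch. It is not the pg_upgrade C code. It assumes that Windows text-mode output can be approximated on any platform by passing newline="\r\n" to open(); the helper names and the sample function body are hypothetical.

```python
import os
import tempfile


def write_text_mode(path, data):
    # Approximates a Windows text-mode stream: each LF is written as CRLF.
    with open(path, "w", encoding="utf-8", newline="\r\n") as f:
        f.write(data)


def write_binary_mode(path, data):
    # Binary mode: the bytes reach the file exactly as given.
    with open(path, "wb") as f:
        f.write(data.encode("utf-8"))


def read_raw(path):
    # Read back the raw bytes so no newline translation hides the result.
    with open(path, "rb") as f:
        return f.read()


def demo():
    # Hypothetical function body with plain LF line endings.
    body = "BEGIN\n  RETURN 1;\nEND\n"
    with tempfile.TemporaryDirectory() as d:
        text_path = os.path.join(d, "text.sql")
        bin_path = os.path.join(d, "binary.sql")
        write_text_mode(text_path, body)
        write_binary_mode(bin_path, body)
        return read_raw(text_path), read_raw(bin_path)
```

Calling demo() returns the two files' raw bytes: the text-mode copy has a CR before every LF, while the binary copy matches the input.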

So, right now we are only adding \r to function bodies, which is mostly
harmless, but what if a function body has strings with embedded
newlines?  What about creating a table with newlines in its identifiers:

CREATE TABLE "a
b" ("c
d" int);

If \r is added in there, it would be a data corruption problem. Can you
test that?

These are among the reasons why I am suggesting opening the file in binary mode. You're right, that would be data corruption.

I can set up a check, but it will take a bit of time.


As expected, we get a difference in field names. Here's the extract from the dumps diff (* again represents CR):


   ***************
   *** 5220,5228 ****
      --

      CREATE TABLE hasnewline (
   !     "x
      y" integer,
   !     "a
      b" text
      );

   --- 5220,5228 ----
      --

      CREATE TABLE hasnewline (
   !     "x*
      y" integer,
   !     "a*
      b" text
      );

If we open the input and output files in binary mode in pg_upgrade's dump.c this disappears.
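
The same effect can be reproduced away from Windows with a rough Python sketch. It assumes the text-mode path can be imitated with open()'s default newline handling on read and newline="\r\n" on write; copy_dump and roundtrip are hypothetical helpers, not anything in pg_upgrade.

```python
import os
import tempfile

# The table definition from the diff above, with LF inside the identifiers.
DDL = 'CREATE TABLE hasnewline (\n    "x\ny" integer,\n    "a\nb" text\n);\n'


def copy_dump(src, dst, binary):
    if binary:
        # Binary mode on both sides: bytes are copied untouched.
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            fout.write(fin.read())
    else:
        # Imitated Windows text mode: LF is rewritten as CRLF on output.
        with open(src, "r", encoding="utf-8") as fin, \
                open(dst, "w", encoding="utf-8", newline="\r\n") as fout:
            fout.write(fin.read())


def roundtrip(binary):
    # Write the DDL as raw bytes, copy it, and return the copy's raw bytes.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "in.sql")
        dst = os.path.join(d, "out.sql")
        with open(src, "wb") as f:
            f.write(DDL.encode("utf-8"))
        copy_dump(src, dst, binary)
        with open(dst, "rb") as f:
            return f.read()
```

roundtrip(False) yields a copy with a CR inside each quoted identifier, while roundtrip(True) returns the original bytes unchanged.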

Given this, I think we have no choice but to apply the patch, all the way back to 9.0 in fact.

cheers

andrew



