Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-02-24 Thread Peter Eisentraut
On 2/2/15 8:58 AM, Robert Haas wrote:
 I think we should commit this, where by this I mean your patch to
 error-check the length of filenames and symlinks instead of truncating
 them.

done



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-02-24 Thread Peter Eisentraut
On 1/23/15 3:26 AM, Abhijit Menon-Sen wrote:
 At 2014-12-24 08:10:46 -0500, pete...@gmx.net wrote:

 As a demo for how this might look, attached is a wildly incomplete
 patch to produce GNU long-link headers.
 
 Hi Peter.
 
 In what way exactly is this patch wildly incomplete? (I ask because it's
 been added to the current CF).

This patch is not in the commit fest.  It's just the most recently
posted patch-like attachment in this thread, which confuses the new CP app.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-02-02 Thread Robert Haas
On Fri, Nov 7, 2014 at 9:03 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 11/4/14 3:52 PM, Peter Eisentraut wrote:
 Here are patches to address that.  First, it reports errors when
 attempting to create a tar header that would truncate file or symlink
 names.  Second, it works around the problem in the tests by creating a
 symlink from the short-name tempdir that we had set up for the
 Unix-socket directory case.

 I ended up splitting this up differently.  I applied to part of the
 second patch that works around the length issue in tablespaces.  So the
 tests now pass in 9.4 and up even in working directories with long
 names.  This clears up the regression in 9.4.

 The remaining, not applied patch is attached.  It errors when the file
 name is too long and adds tests for that.  This could be applied to 9.5
 and backpatched, if we so choose.  It might become obsolete if
 https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
  If that patch doesn't get accepted, I might add my patch to a future
 commit fest.

I think we should commit this, where by this I mean your patch to
error-check the length of filenames and symlinks instead of truncating
them.  I don't know what will become of Amit's patch, but I think this
is a good idea anyway.  We should perhaps even consider back-patching
it, because silently eating people's data is generally not cool.  It's
possible that there are people out there who know that their filenames
and links are being truncated and don't care, and those people would
be unhappy to see this back-patched.  However, it's also possible that
there are people who don't know that this is happening and do care,
and those people would be happy about a back-patch.  I don't know
which group is larger.  At the least, I think we should apply it to
master; because whatever we end up doing about Amit's patch, adding
error checks for conditions where we're chewing up somebody's
filenames and spitting out what's left over has got to be a good
thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-23 Thread Abhijit Menon-Sen
At 2014-12-24 08:10:46 -0500, pete...@gmx.net wrote:

 As a demo for how this might look, attached is a wildly incomplete
 patch to produce GNU long-link headers.

Hi Peter.

In what way exactly is this patch wildly incomplete? (I ask because it's
been added to the current CF).

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-14 Thread Robert Haas
On Tue, Jan 13, 2015 at 4:41 PM, Peter Eisentraut pete...@gmx.net wrote:
 I think the key point I'm approaching is that the information should
 only ever be in one place, all the time.  This is not dissimilar from
 why we took the tablespace location out of the system catalogs.  Users
 might have all kinds of workflows for how they back up, restore, and
 move their tablespaces.  This works pretty well right now, because the
 authoritative configuration information is always in plain view.  The
 proposal is essentially that we add another location for this
 information, because the existing location is incompatible with some
 operating system tools.  And, when considered by a user, that second
 location might or might not collide with or overwrite the first location
 at some mysterious times.

 So I think the preferable fix is not to add a second location, but to
 make the first location compatible with said operating system tools,
 possibly in the way I propose above.

I see.  I'm a little concerned that following symlinks may be cheaper
than whatever system we would come up with for caching the
tablespace-name-to-file-name mappings.  But that concern might be
unfounded, and apart from it I have no reason to oppose your proposal,
if you want to do the work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-13 Thread Peter Eisentraut
On 1/7/15 3:19 PM, Robert Haas wrote:
 On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 Currently, when you unpack a tarred basebackup with tablespaces, the
 symlinks will tell you whether you have unpacked the tablespace tars at
 the right place.  Otherwise, how do you know?  Secondly, you also have
 the option of putting the tablespaces somewhere else by changing the
 symlinks.
 
 That's a good argument for making the tablespace-map file
 human-readable and human-editable, but I don't think it's an argument
 for duplicating its contents inaccurately in the filesystem.
 
 One way to address this would be to do away with the symlinks altogether
 and have pg_tblspc/12345 be a text file that contains the tablespace
 location.  Kind of symlinks implemented in user space.
 
 Well, that's just spreading the tablespace-map file out into several
 files, and maybe keeping it around after we've restored from backup.

I think the key point I'm approaching is that the information should
only ever be in one place, all the time.  This is not dissimilar from
why we took the tablespace location out of the system catalogs.  Users
might have all kinds of workflows for how they back up, restore, and
move their tablespaces.  This works pretty well right now, because the
authoritative configuration information is always in plain view.  The
proposal is essentially that we add another location for this
information, because the existing location is incompatible with some
operating system tools.  And, when considered by a user, that second
location might or might not collide with or overwrite the first location
at some mysterious times.

So I think the preferable fix is not to add a second location, but to
make the first location compatible with said operating system tools,
possibly in the way I propose above.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-07 Thread Robert Haas
On Tue, Jan 6, 2015 at 4:33 PM, Peter Eisentraut pete...@gmx.net wrote:
 Currently, when you unpack a tarred basebackup with tablespaces, the
 symlinks will tell you whether you have unpacked the tablespace tars at
 the right place.  Otherwise, how do you know?  Secondly, you also have
 the option of putting the tablespaces somewhere else by changing the
 symlinks.

That's a good argument for making the tablespace-map file
human-readable and human-editable, but I don't think it's an argument
for duplicating its contents inaccurately in the filesystem.

 One way to address this would be to do away with the symlinks altogether
 and have pg_tblspc/12345 be a text file that contains the tablespace
 location.  Kind of symlinks implemented in user space.

Well, that's just spreading the tablespace-map file out into several
files, and maybe keeping it around after we've restored from backup.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-06 Thread Peter Eisentraut
On 12/27/14 8:02 PM, Robert Haas wrote:
 On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut pete...@gmx.net wrote:
 On 12/22/14 10:00 PM, Amit Kapila wrote:
 There is already a patch [1] in this CF which will handle both cases, so
 I am
 not sure if it is very good idea to go with a new tar format to handle this
 issue.

 I think it would still make sense to have proper symlinks in the
 basebackup if possible, for clarity.
 
 I guess I would have assumed it would be more clear to omit the
 symlinks if we're expecting the server to put them in.  Otherwise, the
 server has to remove the existing symlinks and create new ones, which
 introduces various possibilities for failure and confusion.

Currently, when you unpack a tarred basebackup with tablespaces, the
symlinks will tell you whether you have unpacked the tablespace tars at
the right place.  Otherwise, how do you know?  Secondly, you also have
the option of putting the tablespaces somewhere else by changing the
symlinks.  Under the new scheme, the existing symlinks would be
overwritten (or not?).  If that is actually correct, then the proposed
fix doesn't really replicate the required functionality on Windows.

One way to address this would be to do away with the symlinks altogether
and have pg_tblspc/12345 be a text file that contains the tablespace
location.  Kind of symlinks implemented in user space.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2015-01-06 Thread Amit Kapila
On Wed, Jan 7, 2015 at 3:03 AM, Peter Eisentraut pete...@gmx.net wrote:

 On 12/27/14 8:02 PM, Robert Haas wrote:
  On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut pete...@gmx.net
wrote:
  On 12/22/14 10:00 PM, Amit Kapila wrote:
  There is already a patch [1] in this CF which will handle both cases,
so
  I am
  not sure if it is very good idea to go with a new tar format to
handle this
  issue.
 
  I think it would still make sense to have proper symlinks in the
  basebackup if possible, for clarity.
 
  I guess I would have assumed it would be more clear to omit the
  symlinks if we're expecting the server to put them in.  Otherwise, the
  server has to remove the existing symlinks and create new ones, which
  introduces various possibilities for failure and confusion.

 Currently, when you unpack a tarred basebackup with tablespaces, the
 symlinks will tell you whether you have unpacked the tablespace tars at
 the right place.  Otherwise, how do you know?

via some kind of tablespace map file which will tell us the exact
location where symlink need to be pointed and the same will be used
to create a symlink.  So after you unpack a tarred basebackup with
tablespaces, there will be no symlinks; when you start the server
(archive recovery) using base backup, it will create the appropriate
symlinks.

 Secondly, you also have
 the option of putting the tablespaces somewhere else by changing the
 symlinks.  Under the new scheme, the existing symlinks would be
 overwritten (or not?).  If that is actually correct, then the proposed
 fix doesn't really replicate the required functionality on Windows.

 One way to address this would be to do away with the symlinks altogether
 and have pg_tblspc/12345 be a text file that contains the tablespace
 location.  Kind of symlinks implemented in user space.


I think this is somewhat similar to what existing patch [1] does with
the different that there is just one file for all the tablespace locations
rather than individual file in each tablespace directory.


[1] : https://commitfest.postgresql.org/action/patch_view?id=1512
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-27 Thread Robert Haas
On Wed, Dec 24, 2014 at 8:12 AM, Peter Eisentraut pete...@gmx.net wrote:
 On 12/22/14 10:00 PM, Amit Kapila wrote:
 There is already a patch [1] in this CF which will handle both cases, so
 I am
 not sure if it is very good idea to go with a new tar format to handle this
 issue.

 I think it would still make sense to have proper symlinks in the
 basebackup if possible, for clarity.

I guess I would have assumed it would be more clear to omit the
symlinks if we're expecting the server to put them in.  Otherwise, the
server has to remove the existing symlinks and create new ones, which
introduces various possibilities for failure and confusion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-24 Thread Peter Eisentraut
On 12/22/14 5:40 PM, Oskari Saarenmaa wrote:
 I think we should just use the UStar tar format
 (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
 allow long file names; all actively used tar implementations should be
 able to handle them.  I'll try to write a patch for that soonish.

UStar doesn't handle long link targets, only long file names (and then
only up to 255 characters, which doesn't seem satisfactory).

AFAICT, to allow long link targets, the available solutions are either
pax extended headers or GNU-specific long-link extra headers.

When I create a symlink with a long target and call tar on it, GNU tar
by default creates the GNU long-link header and BSD tar by default
creates a pax header.  But they are both able to extract either one.

As a demo for how this might look, attached is a wildly incomplete patch
to produce GNU long-link headers.
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..f0cffcf 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1233,13 +1233,15 @@ static void
 _tarWriteHeader(const char *filename, const char *linktarget,
 struct stat * statbuf)
 {
-	char		h[512];
+	char		*h;
 
-	tarCreateHeader(h, filename, linktarget, statbuf-st_size,
+	h = tarCreateHeader(filename, linktarget, statbuf-st_size,
 	statbuf-st_mode, statbuf-st_uid, statbuf-st_gid,
 	statbuf-st_mtime);
 
 	pq_putmessage('d', h, 512);
+
+	free(h);
 }
 
 /*
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index e7c2939..490dd98 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -911,10 +911,10 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 
 			if (basetablespace  writerecoveryconf)
 			{
-char		header[512];
+char		*header;
 int			padding;
 
-tarCreateHeader(header, recovery.conf, NULL,
+header = tarCreateHeader(recovery.conf, NULL,
 recoveryconfcontents-len,
 0600, 04000, 02000,
 time(NULL));
@@ -925,6 +925,8 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 WRITE_TAR_DATA(recoveryconfcontents-data, recoveryconfcontents-len);
 if (padding)
 	WRITE_TAR_DATA(zerobuf, padding);
+
+free(header);
 			}
 
 			/* 2 * 512 bytes empty data at end of file */
diff --git a/src/bin/pg_dump/pg_backup_tar.c b/src/bin/pg_dump/pg_backup_tar.c
index f48d369..60ae013 100644
--- a/src/bin/pg_dump/pg_backup_tar.c
+++ b/src/bin/pg_dump/pg_backup_tar.c
@@ -1300,11 +1300,13 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 static void
 _tarWriteHeader(TAR_MEMBER *th)
 {
-	char		h[512];
+	char		*h;
 
-	tarCreateHeader(h, th-targetFile, NULL, th-fileLen, 0600, 04000, 02000, time(NULL));
+	h = tarCreateHeader(th-targetFile, NULL, th-fileLen, 0600, 04000, 02000, time(NULL));
 
 	/* Now write the completed header. */
 	if (fwrite(h, 1, 512, th-tarFH) != 512)
 		WRITE_ERROR_EXIT;
+
+	free(h);
 }
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 2dd6f6b..c3e0eb5 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,5 +11,5 @@
  *
  *-
  */
-extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern char * tarCreateHeader(const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
 extern int	tarChecksum(char *header);
diff --git a/src/port/tar.c b/src/port/tar.c
index 8ef4f9c..d8183d7 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -49,10 +49,26 @@ tarChecksum(char *header)
  * must always have space for 512 characters, which is a requirement by
  * the tar format.
  */
-void
-tarCreateHeader(char *h, const char *filename, const char *linktarget,
+char *
+tarCreateHeader(const char *filename, const char *linktarget,
 size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime)
 {
+	char *h;
+	char *ext_header = NULL;
+	char *ext_content = NULL;
+
+	if (0  linktarget  strlen(linktarget)  99)
+	{
+		ext_header = calloc(512, 1);
+		print_val(ext_header[124], 512, 8, 11);
+		sprintf(ext_header[156], K);
+		sprintf(ext_header[148], %06o , tarChecksum(ext_header));
+
+		ext_content = calloc(512, 1);
+		strcpy(ext_content, linktarget);
+	}
+	h = malloc(512);
+
 	/*
 	 * Note: most of the fields in a tar header are not supposed to be
 	 * null-terminated.  We use sprintf, which will write a null after the
@@ -141,4 +157,18 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget,
 	 * 6 digits, a space, and a null, which is legal per POSIX.
 	 */
 	sprintf(h[148], %06o , tarChecksum(h));
+
+	if (ext_header || ext_content)
+	{
+		char *ret = malloc(3 * 512);
+		memcpy(ret, ext_header, 512);
+		memcpy(ret + 512, ext_content, 512);
+		memcpy(ret + 1024, h, 512);
+		free(ext_header);
+		

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-24 Thread Peter Eisentraut
On 12/22/14 10:00 PM, Amit Kapila wrote:
 There is already a patch [1] in this CF which will handle both cases, so
 I am
 not sure if it is very good idea to go with a new tar format to handle this
 issue.   

I think it would still make sense to have proper symlinks in the
basebackup if possible, for clarity.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-23 Thread Oskari Saarenmaa
23.12.2014, 05:00, Amit Kapila kirjoitti:
 On Tue, Dec 23, 2014 at 4:10 AM, Oskari Saarenmaa wrote:
 08.11.2014, 04:03, Peter Eisentraut kirjoitti:
  It errors when the file
  name is too long and adds tests for that.  This could be applied to 9.5
  and backpatched, if we so choose.  It might become obsolete if
  https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
   If that patch doesn't get accepted, I might add my patch to a future
  commit fest.

 I think we should just use the UStar tar format
 (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
 allow long file names; all actively used tar implementations should be
 able to handle them.  I'll try to write a patch for that soonish.

 
 I think even using UStar format won't make it work for Windows where
 the standard utilities are not able to understand the symlinks in tar.
 There is already a patch [1] in this CF which will handle both cases, so
 I am
 not sure if it is very good idea to go with a new tar format to handle this
 issue.   
 
 [1] : https://commitfest.postgresql.org/action/patch_view?id=1512 

That patch makes sense for 9.5, but I don't think it's going to be
backpatched to previous releases?  I think we should also apply Peter's
patch to master and backbranches to avoid creating invalid tar files
anywhere.  And optionally implement and backpatch long filename support
in tar even if 9.5 no longer creates tar files with long names.

/ Oskari



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-22 Thread Oskari Saarenmaa
08.11.2014, 04:03, Peter Eisentraut kirjoitti:
 On 11/4/14 3:52 PM, Peter Eisentraut wrote:
  Here are patches to address that.  First, it reports errors when
  attempting to create a tar header that would truncate file or symlink
  names.  Second, it works around the problem in the tests by creating a
  symlink from the short-name tempdir that we had set up for the
  Unix-socket directory case.
 I ended up splitting this up differently.  I applied to part of the
 second patch that works around the length issue in tablespaces.  So the
 tests now pass in 9.4 and up even in working directories with long
 names.  This clears up the regression in 9.4.
 
 The remaining, not applied patch is attached.  It errors when the file
 name is too long and adds tests for that.  This could be applied to 9.5
 and backpatched, if we so choose.  It might become obsolete if
 https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
  If that patch doesn't get accepted, I might add my patch to a future
 commit fest.

I think we should just use the UStar tar format
(http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
allow long file names; all actively used tar implementations should be
able to handle them.  I'll try to write a patch for that soonish.

Until UStar format is used we should raise an error if a filename is
being truncated by tar instead of creating invalid archives.  Also note
that Posix tar format allows 100 byte file names as the name doesn't
have to be zero terminated, but we may want to stick to 99 bytes in old
type tar anyway as using 100 byte filenames has shown bugs in other tar
implementations, for example
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=689582 - and
truncating at 100 bytes instead of 99 doesn't help us too much anyway.

/ Oskari



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-12-22 Thread Amit Kapila
On Tue, Dec 23, 2014 at 4:10 AM, Oskari Saarenmaa o...@ohmu.fi wrote:

 08.11.2014, 04:03, Peter Eisentraut kirjoitti:
  On 11/4/14 3:52 PM, Peter Eisentraut wrote:
   Here are patches to address that.  First, it reports errors when
   attempting to create a tar header that would truncate file or symlink
   names.  Second, it works around the problem in the tests by creating
a
   symlink from the short-name tempdir that we had set up for the
   Unix-socket directory case.
  I ended up splitting this up differently.  I applied to part of the
  second patch that works around the length issue in tablespaces.  So the
  tests now pass in 9.4 and up even in working directories with long
  names.  This clears up the regression in 9.4.
 
  The remaining, not applied patch is attached.  It errors when the file
  name is too long and adds tests for that.  This could be applied to 9.5
  and backpatched, if we so choose.  It might become obsolete if
  https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
   If that patch doesn't get accepted, I might add my patch to a future
  commit fest.

 I think we should just use the UStar tar format
 (http://en.wikipedia.org/wiki/Tar_%28computing%29#UStar_format) and
 allow long file names; all actively used tar implementations should be
 able to handle them.  I'll try to write a patch for that soonish.


I think even using UStar format won't make it work for Windows where
the standard utilities are not able to understand the symlinks in tar.
There is already a patch [1] in this CF which will handle both cases, so I
am
not sure if it is very good idea to go with a new tar format to handle this
issue.

[1] : https://commitfest.postgresql.org/action/patch_view?id=1512

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-11-07 Thread Peter Eisentraut
On 11/4/14 3:52 PM, Peter Eisentraut wrote:
 Here are patches to address that.  First, it reports errors when
 attempting to create a tar header that would truncate file or symlink
 names.  Second, it works around the problem in the tests by creating a
 symlink from the short-name tempdir that we had set up for the
 Unix-socket directory case.

I ended up splitting this up differently.  I applied to part of the
second patch that works around the length issue in tablespaces.  So the
tests now pass in 9.4 and up even in working directories with long
names.  This clears up the regression in 9.4.

The remaining, not applied patch is attached.  It errors when the file
name is too long and adds tests for that.  This could be applied to 9.5
and backpatched, if we so choose.  It might become obsolete if
https://commitfest.postgresql.org/action/patch_view?id=1512 is accepted.
 If that patch doesn't get accepted, I might add my patch to a future
commit fest.

From bd15c32714c9292a93763c7e74508c3643a7b6cd Mon Sep 17 00:00:00 2001
From: Peter Eisentraut peter_e@gmx.net
Date: Tue, 4 Nov 2014 15:19:18 -0500
Subject: [PATCH] Error when creating names too long for tar format

The tar format (at least the version we are using), does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.

Add tests that fail when a too-long file name or symlink is attempted to
be backed up.
---
 src/backend/replication/basebackup.c | 21 -
 src/bin/pg_basebackup/t/010_pg_basebackup.pl | 15 ++-
 src/include/pgtar.h  | 10 +-
 src/port/tar.c   | 10 +-
 4 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..5835725 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget,
 struct stat * statbuf)
 {
 	char		h[512];
+	enum tarError rc;
 
-	tarCreateHeader(h, filename, linktarget, statbuf-st_size,
+	rc = tarCreateHeader(h, filename, linktarget, statbuf-st_size,
 	statbuf-st_mode, statbuf-st_uid, statbuf-st_gid,
 	statbuf-st_mtime);
 
+	switch (rc)
+	{
+		case TAR_OK:
+			break;
+		case TAR_NAME_TOO_LONG:
+			ereport(ERROR,
+	(errmsg(file name too long for tar format: \%s\,
+			filename)));
+			break;
+		case TAR_SYMLINK_TOO_LONG:
+			ereport(ERROR,
+	(errmsg(symbolic link target too long for tar format: file name \%s\, target \%s\,
+			filename, linktarget)));
+			break;
+		default:
+			elog(ERROR, unrecognized tar error: %d, rc);
+	}
+
 	pq_putmessage('d', h, 512);
 }
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index c966de0..7e9a776 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,7 +2,7 @@
 use warnings;
 use Cwd;
 use TestLib;
-use Test::More tests = 33;
+use Test::More tests = 35;
 
 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -49,6 +49,13 @@
 	'tar format');
 ok(-f $tempdir/tarbackup/base.tar, 'backup tar was created');
 
+my $superlongname = superlongname_ . (xx100);
+
+system_or_bail 'touch', $tempdir/pgdata/$superlongname;
+command_fails([ 'pg_basebackup', '-D', $tempdir/tarbackup_l1, '-Ft' ],
+			  'pg_basebackup tar with long name fails');
+unlink $tempdir/pgdata/$superlongname;
+
 # Create a temporary directory in the system location and symlink it
 # to our physical temp location.  That way we can use shorter names
 # for the tablespace directories, which hopefully won't run afoul of
@@ -117,3 +124,9 @@
 command_fails(
 	[ 'pg_basebackup', '-D', $tempdir/backup_foo, '-Fp', -Tfoo ],
 	'-T with invalid format fails');
+
+mkdir $tempdir/$superlongname;
+psql 'postgres', CREATE TABLESPACE tblspc3 LOCATION '$tempdir/$superlongname';;
+command_fails([ 'pg_basebackup', '-D', $tempdir/tarbackup_l3, '-Ft' ],
+			  'pg_basebackup tar with long symlink target fails');
+psql 'postgres', DROP TABLESPACE tblspc3;;
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 2dd6f6b..6b6c1e1 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,5 +11,13 @@
  *
  

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-11-04 Thread Peter Eisentraut
On 10/20/14 4:51 PM, Peter Eisentraut wrote:
 On 10/20/14 2:59 PM, Tom Lane wrote:
 What do we want to do about this?  I think a minimum expectation would be
 for pg_basebackup to notice and complain when it's trying to create an
 unworkably long symlink entry, but it would be far better if we found a
 way to cope instead.
 
 Isn't it the backend that should error out before sending truncated
 files names?
 
 src/port/tar.c:
 
 /* Name 100 */
 sprintf(h[0], %.99s, filename);

Here are patches to address that.  First, it reports errors when
attempting to create a tar header that would truncate file or symlink
names.  Second, it works around the problem in the tests by creating a
symlink from the short-name tempdir that we had set up for the
Unix-socket directory case.

The first patch can be backpatched to 9.3.  The tar code before that is
different and would need manual adjustments.

If someone has a too-long tablespace path, I think they can work around
that after this patch by creating a shorter symlink and updating the
pg_tblspc symlinks to point there.
From 16d5eb9514c89391e6c2361212363a7ac2f16e0b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut peter_e@gmx.net
Date: Tue, 4 Nov 2014 15:19:18 -0500
Subject: [PATCH 1/2] Error when creating names too long for tar format

The tar format (at least the version we are using), does not support
file names or symlink targets longer than 99 bytes.  Until now, the tar
creation code would silently truncate any names that are too long.  (Its
original application was pg_dump, where this never happens.)  This
creates problems when running base backups over the replication
protocol.

The most important problem is when a tablespace path is longer than 99
bytes, which will result in a truncated tablespace path being backed up.
Less importantly, the basebackup protocol also promises to back up any
other files it happens to find in the data directory, which would also
lead to file name truncation if someone put a file with a long name in
there.

Now both of these cases result in an error during the backup.
---
 src/backend/replication/basebackup.c | 21 -
 src/include/pgtar.h  | 10 +-
 src/port/tar.c   | 10 +-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index fbcecbb..5835725 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1234,11 +1234,30 @@ _tarWriteHeader(const char *filename, const char *linktarget,
 struct stat * statbuf)
 {
 	char		h[512];
+	enum tarError rc;
 
-	tarCreateHeader(h, filename, linktarget, statbuf-st_size,
+	rc = tarCreateHeader(h, filename, linktarget, statbuf-st_size,
 	statbuf-st_mode, statbuf-st_uid, statbuf-st_gid,
 	statbuf-st_mtime);
 
+	switch (rc)
+	{
+		case TAR_OK:
+			break;
+		case TAR_NAME_TOO_LONG:
+			ereport(ERROR,
+	(errmsg(file name too long for tar format: \%s\,
+			filename)));
+			break;
+		case TAR_SYMLINK_TOO_LONG:
+			ereport(ERROR,
+	(errmsg(symbolic link target too long for tar format: file name \%s\, target \%s\,
+			filename, linktarget)));
+			break;
+		default:
+			elog(ERROR, unrecognized tar error: %d, rc);
+	}
+
 	pq_putmessage('d', h, 512);
 }
 
diff --git a/src/include/pgtar.h b/src/include/pgtar.h
index 2dd6f6b..6b6c1e1 100644
--- a/src/include/pgtar.h
+++ b/src/include/pgtar.h
@@ -11,5 +11,13 @@
  *
  *-
  */
-extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+
+enum tarError
+{
+	TAR_OK = 0,
+	TAR_NAME_TOO_LONG,
+	TAR_SYMLINK_TOO_LONG
+};
+
+extern enum tarError tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
 extern int	tarChecksum(char *header);
diff --git a/src/port/tar.c b/src/port/tar.c
index 09fd6c1..74168e8 100644
--- a/src/port/tar.c
+++ b/src/port/tar.c
@@ -49,10 +49,16 @@ tarChecksum(char *header)
  * must always have space for 512 characters, which is a requirement by
  * the tar format.
  */
-void
+enum tarError
 tarCreateHeader(char *h, const char *filename, const char *linktarget,
 size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime)
 {
+	if (strlen(filename)  99)
+		return TAR_NAME_TOO_LONG;
+
+	if (linktarget  strlen(linktarget)  99)
+		return TAR_SYMLINK_TOO_LONG;
+
 	/*
 	 * Note: most of the fields in a tar header are not supposed to be
 	 * null-terminated.  We use sprintf, which will write a null after the
@@ -141,4 +147,6 @@ tarCreateHeader(char *h, const char *filename, const char *linktarget,
 	 * 6 digits, a space, and a null, which is legal per POSIX.
 	 */
 	sprintf(h[148], %06o , tarChecksum(h));
+
+	return TAR_OK;
 }
-- 
2.1.3

From 25f6daded3a4be8182af2d185f4826708f7a1b76 Mon 

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-30 Thread Peter Eisentraut
On 10/29/14 10:48 AM, Robert Haas wrote:
 On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 10/20/14 2:59 PM, Tom Lane wrote:
 My Salesforce colleague Thomas Fanghaenel observed that the TAP tests
 for pg_basebackup fail when run in a sufficiently deeply-nested directory
 tree.

 As for the test, we can do something like the attached to mark the test
 as TODO.
 
 What does this actually do?  It doesn't appear that it's just
 disabling the test.

It still runs the tests, but doesn't count the results in whether the
suite passes.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-29 Thread Robert Haas
On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 10/20/14 2:59 PM, Tom Lane wrote:
 My Salesforce colleague Thomas Fanghaenel observed that the TAP tests
 for pg_basebackup fail when run in a sufficiently deeply-nested directory
 tree.

 As for the test, we can do something like the attached to mark the test
 as TODO.

What does this actually do?  It doesn't appear that it's just
disabling the test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-28 Thread Peter Eisentraut
On 10/20/14 2:59 PM, Tom Lane wrote:
 My Salesforce colleague Thomas Fanghaenel observed that the TAP tests
 for pg_basebackup fail when run in a sufficiently deeply-nested directory
 tree.

As for the test, we can do something like the attached to mark the test
as TODO.

diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 597fb60..695fd98 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -68,6 +68,8 @@
 		-T$tempdir/tblspc1=$tempdir/tbackup/tblspc1 ],
 	'plain format with tablespaces succeeds with tablespace mapping');
 ok(-d $tempdir/tbackup/tblspc1, 'tablespace was relocated');
+TODO: {
+	local $TODO = 'symlinks 99 chars not supported';
 opendir(my $dh, $tempdir/pgdata/pg_tblspc) or die;
 ok( (   grep
 		{
@@ -77,6 +79,7 @@
 		  } readdir($dh)),
 	tablespace symlink was updated);
 closedir $dh;
+}
 
 mkdir $tempdir/tbl=spc2;
 psql 'postgres', DROP TABLE test1;;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-20 Thread Peter Eisentraut
On 10/20/14 2:59 PM, Tom Lane wrote:
 What do we want to do about this?  I think a minimum expectation would be
 for pg_basebackup to notice and complain when it's trying to create an
 unworkably long symlink entry, but it would be far better if we found a
 way to cope instead.

Isn't it the backend that should error out before sending truncated
files names?

src/port/tar.c:

/* Name 100 */
sprintf(h[0], %.99s, filename);

And then do we need to prevent the creation of tablespaces that can't be
backed up?

 One thing we could possibly do without reinventing tar is to avoid 
using
 absolute path names if a PGDATA-relative one would do.

Maybe we could hack up the tar format to store the symlink target as the
file body, like cpio does.  Of course then we'd lose the property of
this actually being tar.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-20 Thread Amit Kapila
On Tue, Oct 21, 2014 at 12:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 My Salesforce colleague Thomas Fanghaenel observed that the TAP tests
 for pg_basebackup fail when run in a sufficiently deeply-nested directory
 tree.  The cause appears to be that we rely on standard tar format
 to represent the symlink for a tablespace, and POSIX tar format has a
 hard-wired restriction of 99 bytes in a symlink's expansion.

 What do we want to do about this?  I think a minimum expectation would be
 for pg_basebackup to notice and complain when it's trying to create an
 unworkably long symlink entry, but it would be far better if we found a
 way to cope instead.

One way to cope with such a situation could be that during backup we create
a backup symlink file which contains listing of symlinks and then archive
recovery recreates it.  Basically this is the solution (patch), I have
proposed
for Windows [1].


[1] - https://commitfest.postgresql.org/action/patch_view?id=1512

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com