Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2014-02-12 Thread Bruce Momjian
On Fri, Nov 15, 2013 at 10:40:20AM +0200, Heikki Linnakangas wrote:
>> The BTRFS_IOC_CLONE ioctl operates on file level and can be used to
>> clone files anywhere in a btrfs filesystem.
>
> Hmm, you can also do
>
> cp --reflog -r data92 data-tmp

I think you mean --reflink here.

> pg_upgrade --link --old-datadir=data92-copy --new-datadir=data-tmp
> rm -rf data-tmp
>
> That BTRFS_IOC_CLONE ioctl seems so hacky that I'd rather not get
> that in our source tree. cp --reflog is much more likely to get that
> magic incantation right, since it gets a lot more attention and
> testing than pg_upgrade.
>
> I'm not in favor of adding filesystem-specific tricks into
> pg_upgrade. It would be nice to list these tricks in the docs,
> though.

I have applied the attached patch which suggests the use of file system
snapshots and copy-on-write file copies.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index 3d529b2..bb3b6a0
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*** psql --username postgres --file script.s
*** 569,575 ****
  the old server and run <command>rsync</> again to update the copy with any
  changes to make it consistent.  You might want to exclude some
  files, e.g. <filename>postmaster.pid</>, as documented in <xref
! linkend="backup-lowlevel-base-backup">.
  </para>
  
  <refsect2>
--- 569,578 ----
  the old server and run <command>rsync</> again to update the copy with any
  changes to make it consistent.  You might want to exclude some
  files, e.g. <filename>postmaster.pid</>, as documented in <xref
! linkend="backup-lowlevel-base-backup">.  If your file system supports
! file system snapshots or copy-on-write file copying, you can use that
! to make a backup of the old cluster, though the snapshot and copies
! must be created simultaneously or while the database server is down.
  </para>
  
  <refsect2>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-11-15 Thread Heikki Linnakangas

On 05.10.2013 16:57, Oskari Saarenmaa wrote:
> 05.10.2013 16:38, Bruce Momjian wrote:
>> On Fri, Oct  4, 2013 at 10:42:46PM +0300, Oskari Saarenmaa wrote:
>>> Thanks for the offers, but it looks like ZFS doesn't actually implement
>>> a similar file level clone operation.  See
>>> https://github.com/zfsonlinux/zfs/issues/405 for discussion on a feature
>>> request for it.
>>>
>>> ZFS does support cloning entire datasets which seem to be similar to
>>> btrfs subvolume snapshots and could be used to set up a new data
>>> directory for a new $PGDATA.   This would require the original $PGDATA
>>> to be a dataset/subvolume of its own and quite a bit different logic
>>> (than just another file copy method in pg_upgrade) to initialize the new
>>> version's $PGDATA as a snapshot/clone of the original.  The way this
>>> would work is that the original $PGDATA dataset/subvolume gets cloned to
>>> a new location after which we move the files out of the way of the new
>>> PG installation and run pg_upgrade in link mode.  I'm not sure if
>>> there's a good way to integrate this into pg_upgrade or if it's just
>>> something that could be documented as a fast way to run pg_upgrade
>>> without touching original files.
>>>
>>> With btrfs tooling the sequence would be something like this:
>>>
>>>    btrfs subvolume snapshot /srv/pg92 /srv/pg93
>>>    mv /srv/pg93/data /srv/pg93/data92
>>>    initdb /data/pg93/data
>>>    pg_upgrade --link \
>>>    --old-datadir=/data/pg93/data92 \
>>>    --new-datadir=/data/pg93/data
>>
>> Does btrfs support file system snapshots?  If so, shouldn't people just
>> take a snapshot of the old data directory before the upgrade, rather
>> than using cloning?
>
> Yeah, it's possible to clone an existing subvolume, but this requires
> that $PGDATA is a subvolume of its own and would be a bit difficult to
> integrate into existing pg_upgrade scripts.
>
> The BTRFS_IOC_CLONE ioctl operates on file level and can be used to
> clone files anywhere in a btrfs filesystem.


Hmm, you can also do

cp --reflog -r data92 data-tmp
pg_upgrade --link --old-datadir=data92-copy --new-datadir=data-tmp
rm -rf data-tmp

That BTRFS_IOC_CLONE ioctl seems so hacky that I'd rather not get that 
in our source tree. cp --reflog is much more likely to get that magic 
incantation right, since it gets a lot more attention and testing than 
pg_upgrade.


I'm not in favor of adding filesystem-specific tricks into pg_upgrade. 
It would be nice to list these tricks in the docs, though.


- Heikki




Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-05 Thread Bruce Momjian
On Fri, Oct  4, 2013 at 10:42:46PM +0300, Oskari Saarenmaa wrote:
> Thanks for the offers, but it looks like ZFS doesn't actually implement
> a similar file level clone operation.  See
> https://github.com/zfsonlinux/zfs/issues/405 for discussion on a feature
> request for it.
>
> ZFS does support cloning entire datasets which seem to be similar to
> btrfs subvolume snapshots and could be used to set up a new data
> directory for a new $PGDATA.   This would require the original $PGDATA
> to be a dataset/subvolume of its own and quite a bit different logic
> (than just another file copy method in pg_upgrade) to initialize the new
> version's $PGDATA as a snapshot/clone of the original.  The way this
> would work is that the original $PGDATA dataset/subvolume gets cloned to
> a new location after which we move the files out of the way of the new
> PG installation and run pg_upgrade in link mode.  I'm not sure if
> there's a good way to integrate this into pg_upgrade or if it's just
> something that could be documented as a fast way to run pg_upgrade
> without touching original files.
>
> With btrfs tooling the sequence would be something like this:
>
>    btrfs subvolume snapshot /srv/pg92 /srv/pg93
>    mv /srv/pg93/data /srv/pg93/data92
>    initdb /data/pg93/data
>    pg_upgrade --link \
>    --old-datadir=/data/pg93/data92 \
>    --new-datadir=/data/pg93/data

Does btrfs support file system snapshots?  If so, shouldn't people just
take a snapshot of the old data directory before the upgrade, rather
than using cloning?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-05 Thread Oskari Saarenmaa
05.10.2013 16:38, Bruce Momjian wrote:
> On Fri, Oct  4, 2013 at 10:42:46PM +0300, Oskari Saarenmaa wrote:
>> Thanks for the offers, but it looks like ZFS doesn't actually implement
>> a similar file level clone operation.  See
>> https://github.com/zfsonlinux/zfs/issues/405 for discussion on a feature
>> request for it.
>>
>> ZFS does support cloning entire datasets which seem to be similar to
>> btrfs subvolume snapshots and could be used to set up a new data
>> directory for a new $PGDATA.   This would require the original $PGDATA
>> to be a dataset/subvolume of its own and quite a bit different logic
>> (than just another file copy method in pg_upgrade) to initialize the new
>> version's $PGDATA as a snapshot/clone of the original.  The way this
>> would work is that the original $PGDATA dataset/subvolume gets cloned to
>> a new location after which we move the files out of the way of the new
>> PG installation and run pg_upgrade in link mode.  I'm not sure if
>> there's a good way to integrate this into pg_upgrade or if it's just
>> something that could be documented as a fast way to run pg_upgrade
>> without touching original files.
>>
>> With btrfs tooling the sequence would be something like this:
>>
>>    btrfs subvolume snapshot /srv/pg92 /srv/pg93
>>    mv /srv/pg93/data /srv/pg93/data92
>>    initdb /data/pg93/data
>>    pg_upgrade --link \
>>    --old-datadir=/data/pg93/data92 \
>>    --new-datadir=/data/pg93/data
>
> Does btrfs support file system snapshots?  If so, shouldn't people just
> take a snapshot of the old data directory before the upgrade, rather
> than using cloning?

Yeah, it's possible to clone an existing subvolume, but this requires
that $PGDATA is a subvolume of its own and would be a bit difficult to
integrate into existing pg_upgrade scripts.

The BTRFS_IOC_CLONE ioctl operates on file level and can be used to
clone files anywhere in a btrfs filesystem.
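
As a rough illustration of the file-level clone described above, here is a minimal sketch in C. It assumes a Linux system whose headers provide linux/btrfs.h; clone_file() is a hypothetical helper name used only for this example and is not taken from the patch.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>		/* BTRFS_IOC_CLONE */

/*
 * Clone src into a newly created dst so that both files share data blocks.
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 * clone_file() is a hypothetical name, not a function from the patch.
 */
int
clone_file(const char *src, const char *dst)
{
	int			src_fd;
	int			dst_fd;
	int			rc;
	int			saved_errno;

	src_fd = open(src, O_RDONLY);
	if (src_fd < 0)
		return -1;

	/* O_EXCL: refuse to overwrite an existing destination file */
	dst_fd = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (dst_fd < 0)
	{
		saved_errno = errno;
		close(src_fd);
		errno = saved_errno;
		return -1;
	}

	/* The ioctl is issued on the destination; the argument is the source fd */
	rc = ioctl(dst_fd, BTRFS_IOC_CLONE, src_fd);
	saved_errno = errno;

	close(src_fd);
	close(dst_fd);

	if (rc < 0)
	{
		/* Do not leave an empty destination behind when cloning fails */
		unlink(dst);
		errno = saved_errno;
		return -1;
	}
	return 0;
}
```

On a filesystem without clone support the ioctl fails and the helper reports the error rather than falling back to a plain copy, which matches the "must be requested explicitly" behaviour described for the patch.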

/ Oskari





Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-04 Thread Oskari Saarenmaa
03.10.2013 01:35, Larry Rosenman wrote:
> On 2013-10-02 17:32, Josh Berkus wrote:
>>> No fundamental reason; I'm hoping ZFS will be supported in addition to
>>> btrfs, but I don't have any systems with ZFS filesystems at the moment
>>> so I haven't been able to test it or find out the mechanisms ZFS uses
>>> for cloning.  On btrfs cloning is implemented with a custom
>>> btrfs-specific ioctl, ZFS probably has something similar which would be
>>> pretty easy to add on top of this patch.
>>
>> Would you like a VM with ZFS on it?  I'm pretty sure I can supply one.
>
> I can also supply SSH access to a FreeBSD 10 system that is totally ZFS.

Thanks for the offers, but it looks like ZFS doesn't actually implement
a similar file level clone operation.  See
https://github.com/zfsonlinux/zfs/issues/405 for discussion on a feature
request for it.

ZFS does support cloning entire datasets which seem to be similar to
btrfs subvolume snapshots and could be used to set up a new data
directory for a new $PGDATA.   This would require the original $PGDATA
to be a dataset/subvolume of its own and quite a bit different logic
(than just another file copy method in pg_upgrade) to initialize the new
version's $PGDATA as a snapshot/clone of the original.  The way this
would work is that the original $PGDATA dataset/subvolume gets cloned to
a new location after which we move the files out of the way of the new
PG installation and run pg_upgrade in link mode.  I'm not sure if
there's a good way to integrate this into pg_upgrade or if it's just
something that could be documented as a fast way to run pg_upgrade
without touching original files.

With btrfs tooling the sequence would be something like this:

  btrfs subvolume snapshot /srv/pg92 /srv/pg93
  mv /srv/pg93/data /srv/pg93/data92
  initdb /data/pg93/data
  pg_upgrade --link \
  --old-datadir=/data/pg93/data92 \
  --new-datadir=/data/pg93/data


/ Oskari




Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Kevin Grittner
Oskari Saarenmaa <o...@ohmu.fi> wrote:

> Add file cloning as an alternative data transfer method to pg_upgrade.
>
> Currently only btrfs is supported, but copy-on-write cloning is also
> available on at least ZFS.  Cloning must be requested explicitly and if
> it isn't supported by the operating system or filesystem a fatal error
> is thrown.
>
> This provides upgrade performance similar to link mode while allowing
> the old cluster to be used even after the new one has been started.

Please add the patch here to make sure it gets reviewed:

https://commitfest.postgresql.org/action/commitfest_view/open

For more information on the process, see:

http://wiki.postgresql.org/wiki/CommitFest


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Oskari Saarenmaa

On 02/10/13 17:18, Andrew Dunstan wrote:
> On 10/01/2013 06:31 PM, Oskari Saarenmaa wrote:
>> Add file cloning as an alternative data transfer method to pg_upgrade.
>> Currently only btrfs is supported, but copy-on-write cloning is also
>> available on at least ZFS.  Cloning must be requested explicitly and if
>> it isn't supported by the operating system or filesystem a fatal error
>> is thrown.
>
> So, just curious, why isn't ZFS supported? It's what I am more
> interested in, at least.


No fundamental reason; I'm hoping ZFS will be supported in addition to 
btrfs, but I don't have any systems with ZFS filesystems at the moment 
so I haven't been able to test it or find out the mechanisms ZFS uses 
for cloning.  On btrfs cloning is implemented with a custom 
btrfs-specific ioctl, ZFS probably has something similar which would be 
pretty easy to add on top of this patch.


Added this patch to commitfest as suggested, 
https://commitfest.postgresql.org/action/patch_view?id=1251


/ Oskari





Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Andrew Dunstan


On 10/01/2013 06:31 PM, Oskari Saarenmaa wrote:
> Add file cloning as an alternative data transfer method to pg_upgrade.
> Currently only btrfs is supported, but copy-on-write cloning is also
> available on at least ZFS.  Cloning must be requested explicitly and if
> it isn't supported by the operating system or filesystem a fatal error
> is thrown.



So, just curious, why isn't ZFS supported? It's what I am more 
interested in, at least.


cheers

andrew






Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Bruce Momjian
On Wed, Oct  2, 2013 at 05:23:31PM +0300, Oskari Saarenmaa wrote:
> On 02/10/13 17:18, Andrew Dunstan wrote:
>>
>> On 10/01/2013 06:31 PM, Oskari Saarenmaa wrote:
>>> Add file cloning as an alternative data transfer method to pg_upgrade.
>>> Currently only btrfs is supported, but copy-on-write cloning is also
>>> available on at least ZFS.  Cloning must be requested explicitly and if
>>> it isn't supported by the operating system or filesystem a fatal error
>>> is thrown.
>>
>> So, just curious, why isn't ZFS supported? It's what I am more
>> interested in, at least.
>
> No fundamental reason; I'm hoping ZFS will be supported in addition
> to btrfs, but I don't have any systems with ZFS filesystems at the
> moment so I haven't been able to test it or find out the mechanisms
> ZFS uses for cloning.  On btrfs cloning is implemented with a custom
> btrfs-specific ioctl, ZFS probably has something similar which would
> be pretty easy to add on top of this patch.
>
> Added this patch to commitfest as suggested,
> https://commitfest.postgresql.org/action/patch_view?id=1251

What is the performance overhead of using a cloned data directory for a
cluster?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Josh Berkus

> No fundamental reason; I'm hoping ZFS will be supported in addition to
> btrfs, but I don't have any systems with ZFS filesystems at the moment
> so I haven't been able to test it or find out the mechanisms ZFS uses
> for cloning.  On btrfs cloning is implemented with a custom
> btrfs-specific ioctl, ZFS probably has something similar which would be
> pretty easy to add on top of this patch.

Would you like a VM with ZFS on it?  I'm pretty sure I can supply one.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-02 Thread Larry Rosenman

On 2013-10-02 17:32, Josh Berkus wrote:
>> No fundamental reason; I'm hoping ZFS will be supported in addition to
>> btrfs, but I don't have any systems with ZFS filesystems at the moment
>> so I haven't been able to test it or find out the mechanisms ZFS uses
>> for cloning.  On btrfs cloning is implemented with a custom
>> btrfs-specific ioctl, ZFS probably has something similar which would be
>> pretty easy to add on top of this patch.
>
> Would you like a VM with ZFS on it?  I'm pretty sure I can supply one.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com

I can also supply SSH access to a FreeBSD 10 system that is totally ZFS.


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688




[HACKERS] [PATCH] pg_upgrade: support for btrfs copy-on-write clones

2013-10-01 Thread Oskari Saarenmaa
Add file cloning as an alternative data transfer method to pg_upgrade.
Currently only btrfs is supported, but copy-on-write cloning is also
available on at least ZFS.  Cloning must be requested explicitly and if
it isn't supported by the operating system or filesystem a fatal error
is thrown.

This provides upgrade performance similar to link mode while allowing
the old cluster to be used even after the new one has been started.

Signed-off-by: Oskari Saarenmaa <o...@ohmu.fi>
---
 configure                        |   5 +-
 configure.in                     |   7 ++-
 contrib/pg_upgrade/check.c       |   3 +
 contrib/pg_upgrade/file.c        | 125 +--
 contrib/pg_upgrade/option.c      |   7 +++
 contrib/pg_upgrade/pg_upgrade.h  |  13 ++--
 contrib/pg_upgrade/relfilenode.c |  31 --
 doc/src/sgml/pgupgrade.sgml      |   7 +++
 src/include/pg_config.h.in       |   3 +
 9 files changed, 141 insertions(+), 60 deletions(-)

diff --git a/configure b/configure
index c685ca3..5087463 100755
--- a/configure
+++ b/configure
@@ -10351,7 +10351,10 @@ done
 
 
 
-for ac_header in crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
+for ac_header in crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h \
+linux/btrfs.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h \
+sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h \
+sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h
 do
 as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh`
 if { as_var=$as_ac_Header; eval "test \"\${$as_var+set}\" = set"; }; then
diff --git a/configure.in b/configure.in
index 82771bd..609aa73 100644
--- a/configure.in
+++ b/configure.in
@@ -982,7 +982,12 @@ AC_SUBST(OSSP_UUID_LIBS)
 ##
 
 dnl sys/socket.h is required by AC_FUNC_ACCEPT_ARGTYPES
-AC_CHECK_HEADERS([crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h langinfo.h poll.h pwd.h sys/ioctl.h sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/socket.h sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h ucred.h utime.h wchar.h wctype.h])
+AC_CHECK_HEADERS([crypt.h dld.h fp_class.h getopt.h ieeefp.h ifaddrs.h \
+  langinfo.h linux/btrfs.h poll.h pwd.h sys/ioctl.h \
+  sys/ipc.h sys/poll.h sys/pstat.h sys/resource.h \
+  sys/select.h sys/sem.h sys/shm.h sys/socket.h \
+  sys/sockio.h sys/tas.h sys/time.h sys/un.h termios.h \
+  ucred.h utime.h wchar.h wctype.h])
 
 # On BSD, test for net/if.h will fail unless sys/socket.h
 # is included first.
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
index 0376fcb..2a52dd8 100644
--- a/contrib/pg_upgrade/check.c
+++ b/contrib/pg_upgrade/check.c
@@ -151,6 +151,9 @@ check_new_cluster(void)
 	if (user_opts.transfer_mode == TRANSFER_MODE_LINK)
 		check_hard_link();
 
+	if (user_opts.transfer_mode == TRANSFER_MODE_CLONE)
+		check_clone_file();
+
 	check_is_super_user(new_cluster);
 
 	/*
diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
index dfeb79f..fc935b7 100644
--- a/contrib/pg_upgrade/file.c
+++ b/contrib/pg_upgrade/file.c
@@ -8,11 +8,16 @@
  */
 
 #include "postgres_fe.h"
+#include "pg_config.h"
 
 #include "pg_upgrade.h"
 
 #include <fcntl.h>
 
+#ifdef HAVE_LINUX_BTRFS_H
+# include <sys/ioctl.h>
+# include <linux/btrfs.h>
+#endif
 
 
 #ifndef WIN32
@@ -23,21 +28,42 @@ static int  win32_pghardlink(const char *src, const char *dst);
 
 
 /*
- * copyAndUpdateFile()
+ * upgradeFile()
  *
- * Copies a relation file from src to dst.  If pageConverter is non-NULL, this function
- * uses that pageConverter to do a page-by-page conversion.
+ * Transfer a relation file from src to dst using one of the supported
+ * methods.  If the on-disk format of the new cluster is bit-for-bit
+ * compatible with the on-disk format of the old cluster we can simply link
+ * each relation to perform a true in-place upgrade.  Otherwise we must copy
+ * (either block-by-block or using a copy-on-write clone) the data from old
+ * cluster to new cluster and then perform the conversion.
  */
 const char *
-copyAndUpdateFile(pageCnvCtx *pageConverter,
-				  const char *src, const char *dst, bool force)
+upgradeFile(transferMode transfer_mode, const char *src,
+			const char *dst, pageCnvCtx *pageConverter)
 {
 	if (pageConverter == NULL)
 	{
-		if (pg_copy_file(src, dst, force) == -1)
-			return getErrorText(errno);
-		else
-			return NULL;
+		int			rc = -1;
+
+		switch (transfer_mode)
+		{
+