Hello,

IMHO, there is a serious issue in the script that cleans up the old data
directory after running pg_upgrade in link mode.

In short: when symbolic links are involved, the first step in
delete_old_cluster.sh deletes the old $PGDATA folder, which may contain
tablespaces used by the new instance.

In detail, our use case:

Our Postgres data directories are organized as follows:

1) they are all registered in a root location, i.e. /opt/app,
   but can be located elsewhere using symbolic links:

   ll /opt/app/
   ...
   postgresql-data-1 -> /pgdata/postgresql-data-1

2) we use fixed names for the root locations of tablespaces within $PGDATA.
   These can be real folders or, again, symbolic links to other places:

   ll /pgdata/postgresql-data-1
   ...
   tblspc_data
   tblspc_idx -> /datarep/pg1/tblspc_idx

   (Additionally, each schema has its own tablespaces in these locations, but
   this is not relevant here.)

3) we also keep some custom content within $PGDATA, e.g. an extra log folder
   used by our deployment script.
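The layout described above can be recreated in a scratch directory, which makes it possible to try the cleanup without touching real data. This is a minimal sketch, assuming a POSIX shell; the function name make_layout and the custom_log folder name are hypothetical, and only the relative paths mirror the ones above.

```shell
#!/bin/sh
# Hypothetical helper: recreate the directory layout described above under
# a scratch root, so the pg_upgrade cleanup can be tried without real data.
make_layout() {
    root=$1

    # 2) tablespace roots: tblspc_data is a real folder, tblspc_idx a symlink
    mkdir -p "$root/pgdata/postgresql-data-1/tblspc_data" \
             "$root/datarep/pg1/tblspc_idx"
    ln -s "$root/datarep/pg1/tblspc_idx" "$root/pgdata/postgresql-data-1/tblspc_idx"

    # 3) custom content kept inside $PGDATA (folder name is an assumption)
    mkdir -p "$root/pgdata/postgresql-data-1/custom_log"

    # 1) the registered location is a symlink to the real data directory
    mkdir -p "$root/opt/app"
    ln -s "$root/pgdata/postgresql-data-1" "$root/opt/app/postgresql-data-1"
}
```

For example, `make_layout "$(mktemp -d)"` builds the tree in a fresh temporary directory.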

After running pg_upgrade, I checked the tablespace locations within the NEW
instance:

ll pg_tblspc

 16428 -> /opt/app/postgresql-data-1/tblspc_data/foo
 16429 -> /opt/app/postgresql-data-1/tblspc_idx/foo

which, after resolving the symbolic links, is equivalent to:

  /pgdata/postgresql-data-1/tblspc_data/foo (x)
  /datarep/pg1/tblspc_idx/foo               (y)
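The same resolution can be done for every tablespace of a data directory at once. This is a minimal sketch, assuming a POSIX shell and that each pg_tblspc entry is a symbolic link; the function name resolve_tblspc is hypothetical. It relies on `cd -P` following every symlink on the way and `pwd -P` printing the physical path.

```shell
#!/bin/sh
# Hypothetical helper: print each tablespace OID of a data directory
# together with the physical directory its pg_tblspc link resolves to.
resolve_tblspc() {
    datadir=$1
    for link in "$datadir"/pg_tblspc/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        oid=${link##*/}                     # entry name = tablespace OID
        # runs in a subshell, so the caller's working directory is unchanged
        if target=$(cd -P "$link" 2>/dev/null && pwd -P); then
            printf '%s -> %s\n' "$oid" "$target"
        else
            printf '%s -> (dangling)\n' "$oid"
        fi
    done
}
```

Run against the new data directory (e.g. `resolve_tblspc /pgdata/postgresql_93-data-1`), it would print the physical locations (x) and (y), and would report a link as dangling once its target has been removed.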

I ran pg_upgrade with the real paths (no symbolic links):

 ./pg_upgrade \
  --link \
  --check \
  --old-datadir "/pgdata/postgresql-data-1" \
  --new-datadir "/pgdata/postgresql_93-data-1"

Now, this is what the cleanup script would do:

cat delete_old_cluster.sh
#!/bin/sh

(a) rm -rf /pgdata/postgresql-data-1
(b) rm -rf /opt/app/postgresql-data-1/tblspc_data/foo/PG_9.1_201105231
(c) rm -rf /opt/app/postgresql-data-1/tblspc_idx/foo/PG_9.1_201105231

a: will delete the folder (x), which contains data of the NEW Postgres
   instance!
b: already removed by (a)
c: the old tablespace data still exists in /datarep/pg1/tblspc_idx/foo, but
   can no longer be found, as the symbolic link tblspc_idx in
   /pgdata/postgresql-data-1 was already deleted by (a)

Moreover, our custom content in $OLD_PGDATA would be gone too.
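The failure in (a) can be reproduced without PostgreSQL at all, since it only depends on how rm -rf and symbolic links interact. This is a minimal sketch in a throw-away directory, assuming a POSIX shell; the function name reproduce_step_a and the folder PG_9.3_new (standing in for the new cluster's version-specific tablespace folder) are hypothetical.

```shell
#!/bin/sh
# Hypothetical reproduction of step (a) of delete_old_cluster.sh.
reproduce_step_a() {
    root=$(mktemp -d) || return 1

    # Old data directory holding a real tablespace root folder
    mkdir -p "$root/pgdata/old/tblspc_data/foo/PG_9.3_new"
    # The "registered" location is a symlink to the real data directory
    mkdir -p "$root/opt/app"
    ln -s "$root/pgdata/old" "$root/opt/app/old"

    # The new cluster's tablespace link goes through the symlinked path
    mkdir -p "$root/new/pg_tblspc"
    ln -s "$root/opt/app/old/tblspc_data/foo" "$root/new/pg_tblspc/16428"

    # Step (a), using the real path of the old data directory
    rm -rf "$root/pgdata/old"

    # Check whether the new cluster's tablespace folder is still reachable
    if [ -e "$root/new/pg_tblspc/16428/PG_9.3_new" ]; then
        echo "tablespace survived"
    else
        echo "tablespace of the NEW cluster deleted"
    fi
    rm -rf "$root"
}
```

Calling `reproduce_step_a` prints "tablespace of the NEW cluster deleted": once the real folder is gone, the link in the new pg_tblspc dangles.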

It seems that all these issues could be avoided by first removing only the
expected content of $OLD_PGDATA and then removing $OLD_PGDATA itself only
when it is empty (or at least by adding a note to the output of pg_upgrade):

Replace

rm -rf /pgdata/postgresql-data-1

with

cd /pgdata/postgresql-data-1 || exit 1
rm -rf base
rm -rf global
rm -rf pg_clog
rm -f  pg_hba.conf        # (*)
rm -f  pg_ident.conf      # (*)
rm -rf pg_log
rm -rf pg_multixact
rm -rf pg_notify
rm -rf pg_serial
rm -rf pg_stat_tmp
rm -rf pg_subtrans
rm -rf pg_tblspc
rm -rf pg_twophase
rm -f  PG_VERSION         # (*)
rm -rf pg_xlog
rm -f  postgresql.conf    # (*)
rm -f  postmaster.log
rm -f  postmaster.opts    # (*)
cd / && rmdir /pgdata/postgresql-data-1   # fails if anything else is left

(*): these files could be worth keeping as a reference.
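The proposed order (remove only the known entries, then the directory itself only if it is empty) could be wrapped in a small function. This is a minimal sketch, assuming a POSIX shell; the function name cleanup_old_datadir and the KEEP_REFERENCE_FILES switch are hypothetical, and the entry list is the 9.1 list above.

```shell
#!/bin/sh
# Hypothetical safer replacement for step (a) of delete_old_cluster.sh:
# remove only the entries PostgreSQL itself created in the old data
# directory, then remove the directory only if nothing else is left.
cleanup_old_datadir() {
    olddir=$1
    [ -d "$olddir" ] || { echo "no such directory: $olddir" >&2; return 1; }

    # Entries created by the 9.1 server (list from above)
    entries="base global pg_clog pg_log pg_multixact pg_notify pg_serial
             pg_stat_tmp pg_subtrans pg_tblspc pg_twophase pg_xlog postmaster.log"
    # The (*) files are kept as a reference unless explicitly requested
    if [ "${KEEP_REFERENCE_FILES:-yes}" = no ]; then
        entries="$entries pg_hba.conf pg_ident.conf PG_VERSION postgresql.conf postmaster.opts"
    fi

    for e in $entries; do
        # rm -rf removes the symlinks in pg_tblspc, not their targets
        rm -rf "$olddir/$e"
    done

    # rmdir refuses to remove a non-empty directory, so tablespace roots
    # and custom folders keep the old data directory alive
    if rmdir "$olddir" 2>/dev/null; then
        echo "removed $olddir"
    else
        echo "kept $olddir, remaining entries:"
        ls -A "$olddir"
    fi
}
```

With the layout above, `cleanup_old_datadir /pgdata/postgresql-data-1` would leave tblspc_data, the tblspc_idx link and the custom log folder in place and report them, instead of silently deleting the new cluster's tablespace.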

best regards,

Marc Mamin
