Thank you for reviewing!
Attached updated patch.

On Thu, Mar 10, 2016 at 3:37 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 9, 2016 at 9:09 AM, Masahiko Sawada
> <sawada.m...@gmail.com> wrote: Attached latest 2 patches.
>> * 000 patch : Incorporated the review comments and made rewriting
>> logic more clearly.
>
> That's better, thanks.  But your comments don't survive pgindent.
> After running pgindent, I get this:
>
> +               /*
> +                * These old_* variables point to old visibility map page.
> +                *
> +                * cur_old        : Points to current position on old
> page. blkend_old :
> +                * Points to end of old block. break_old  : Points to
> old page break
> +                * position for rewriting a new page. After wrote a
> new page, old_end
> +                * proceeds rewriteVmBytesPerPgae bytes.
> +                */
>
> You need to either surround this sort of thing with dashes to make
> pgindent ignore it, or, probably better, rewrite it using complete
> sentences that together form a paragraph.

Fixed.

>
> +       Oid                     pg_database_oid;        /* OID of
> pg_database relation */
>
> Not used anywhere?

Fixed.

> Instead of vm_need_rewrite, how about vm_must_add_frozenbit?

Fixed.

> Can you explain the changes to test.sh?

Current regression test scenario is,
1. Do 'make check' on pre-upgrade cluster
2. Dump relallvisible values of all relation in pre-upgrade cluster to
vm_test1.txt
3. Do pg_upgrade
4. Do analyze (not vacuum), dump relallvisibile values of all relation
in post-upgrade cluster to vm_test2.txt
5. Compare between vm_test1.txt and vm_test2.txt

That is, regression test compares between relallvisible values in
pre-upgrade cluster and post-upgrade cluster.
But because test.sh always uses pre/post clusters with same catalog
version, I realized that we cannot ensure that visibility map
rewriting is processed successfully on test.sh framework.
Rewriting visibility map never be executed.
We might need to have another framework for rewriting visibility map page..

Regards,

--
Masahiko Sawada
diff --git a/src/bin/pg_upgrade/file.c b/src/bin/pg_upgrade/file.c
index 2a99a28..7c5bfa6 100644
--- a/src/bin/pg_upgrade/file.c
+++ b/src/bin/pg_upgrade/file.c
@@ -9,7 +9,11 @@
 
 #include "postgres_fe.h"
 
+#include "access/visibilitymap.h"
 #include "pg_upgrade.h"
+#include "storage/bufpage.h"
+#include "storage/checksum.h"
+#include "storage/checksum_impl.h"
 
 #include <fcntl.h>
 
@@ -138,6 +142,130 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 #endif
 
 
+/*
+ * rewriteVisibilityMap()
+ *
+ * In versions of PostgreSQL prior to catversion 201602181, PostgreSQL's
+ * visibility map included one bit per heap page; it now includes two.
+ * When upgrading a cluster from before that time to a current PostgreSQL
+ * version, we could refuse to copy visibility maps from the old cluster
+ * to the new cluster; the next VACUUM would recreate them, but at the
+ * price of scanning the entire table.  So, instead, we rewrite the old
+ * visibility maps in the new format.  That way, the all-visible bit
+ * remains set for the pages for which it was set previously.  The
+ * all-frozen bit is never set by this conversion; we leave that to
+ * VACUUM.
+ */
+const char *
+rewriteVisibilityMap(const char *fromfile, const char *tofile, bool force)
+{
+#define BITS_PER_HEAPBLOCK_OLD 1
+
+	int			src_fd = 0;
+	int			dst_fd = 0;
+	char		buffer[BLCKSZ];
+	ssize_t 	bytesRead;
+	int			rewriteVmBytesPerPage;
+	BlockNumber	blkno = 0;
+
+	/* Compute we need how many old page bytes to rewrite a new page */
+	rewriteVmBytesPerPage = (BLCKSZ - SizeOfPageHeaderData) / 2;
+
+	/* Reset errno */
+	errno = 0;
+
+	if ((fromfile == NULL) || (tofile == NULL))
+		return getErrorText();
+
+	if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)
+		goto err;
+
+	if ((dst_fd = open(tofile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL), S_IRUSR | S_IWUSR)) < 0)
+		goto err;
+
+	/*
+	 * Turn each visibility map page into 2 pages one by one.
+	 * Rewritten 2 pages have same page header as old page had.
+	 */
+	while ((bytesRead = read(src_fd, buffer, BLCKSZ)) == BLCKSZ)
+	{
+		char	*cur_old, *break_old, *blkend_old;
+		PageHeaderData	pageheader;
+
+		/* Save the page header data */
+		memcpy(&pageheader, buffer, SizeOfPageHeaderData);
+
+		/*
+		 * These old_* variables point to old visibility map page.
+		 * cur_old points to current potision on old page. blend_old
+		 * points to end of old block. break_old points to old page
+		 * break position for rewritin a new page. After wrote a new
+		 * page, break_old proceeds rewriteVmBytesPerPage bytes.
+		 */
+		cur_old = buffer + SizeOfPageHeaderData;
+		blkend_old = buffer + bytesRead;
+		break_old = cur_old + rewriteVmBytesPerPage;
+
+		while (blkend_old >= break_old)
+		{
+			char	vmbuf[BLCKSZ];
+			char	*cur_new = vmbuf;
+
+			/* Copy page header in advance */
+			memcpy(vmbuf, &pageheader, SizeOfPageHeaderData);
+
+			cur_new += SizeOfPageHeaderData;
+
+			/*
+			 * Process old page bytes one by one, and turn it
+			 * into new page.
+			 */
+			while (break_old > cur_old)
+			{
+				uint16 new_vmbits;
+				int i;
+
+				/* Generate new format bits while keeping old information */
+				for (i = 0; i < BITS_PER_BYTE; i++)
+				{
+					if ((((uint8) *cur_old) & (1 << i)) != 0)
+						new_vmbits |= 1 << (BITS_PER_HEAPBLOCK * i);
+				}
+
+				/* Copy new visibility map bit to new format page */
+				memcpy(cur_new, &new_vmbits, BITS_PER_HEAPBLOCK);
+
+				cur_old += BITS_PER_HEAPBLOCK_OLD;
+				cur_new += BITS_PER_HEAPBLOCK;
+			}
+
+			/* Set new checksum for a visibility map page, If enabled */
+			if (old_cluster.controldata.data_checksum_version != 0 &&
+				new_cluster.controldata.data_checksum_version != 0)
+				((PageHeader) vmbuf)->pd_checksum = pg_checksum_page(vmbuf, blkno);
+
+			if (write(dst_fd, vmbuf, BLCKSZ) != BLCKSZ)
+			{
+					if (errno == 0)
+						errno = ENOSPC;
+					goto err;
+			}
+
+			break_old += rewriteVmBytesPerPage;
+			blkno++;
+		}
+	}
+
+err:
+	if (src_fd != 0)
+		close(src_fd);
+
+	if (dst_fd != 0)
+		close(dst_fd);
+
+	return (errno == 0) ? NULL : getErrorText();
+}
+
 void
 check_hard_link(void)
 {
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 6122878..876633b 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -110,6 +110,10 @@ extern char *output_files[];
 #define VISIBILITY_MAP_CRASHSAFE_CAT_VER 201107031
 
 /*
+ * The format of visibility map is changed with this 9.6 commit,
+ */
+#define VISIBILITY_MAP_FROZEN_BIT_CAT_VER 201602181
+/*
  * pg_multixact format changed in 9.3 commit 0ac5ad5134f2769ccbaefec73844f85,
  * ("Improve concurrency of foreign key locking") which also updated catalog
  * version to this value.  pg_upgrade behavior depends on whether old and new
@@ -365,6 +369,8 @@ bool		pid_lock_file_exists(const char *datadir);
 
 const char *copyFile(const char *src, const char *dst, bool force);
 const char *linkFile(const char *src, const char *dst);
+const char *rewriteVisibilityMap(const char *fromfile, const char *tofile,
+								 bool force);
 
 void		check_hard_link(void);
 FILE	   *fopen_priv(const char *path, const char *mode);
diff --git a/src/bin/pg_upgrade/relfilenode.c b/src/bin/pg_upgrade/relfilenode.c
index b20f073..9daef0b 100644
--- a/src/bin/pg_upgrade/relfilenode.c
+++ b/src/bin/pg_upgrade/relfilenode.c
@@ -16,7 +16,7 @@
 
 
 static void transfer_single_new_db(FileNameMap *maps, int size, char *old_tablespace);
-static void transfer_relfile(FileNameMap *map, const char *suffix);
+static void transfer_relfile(FileNameMap *map, const char *suffix, bool vm_must_add_frozenbit);
 
 
 /*
@@ -132,6 +132,7 @@ transfer_single_new_db(FileNameMap *maps, int size, char *old_tablespace)
 {
 	int			mapnum;
 	bool		vm_crashsafe_match = true;
+	bool		vm_must_add_frozenbit = false;
 
 	/*
 	 * Do the old and new cluster disagree on the crash-safetiness of the vm
@@ -141,13 +142,20 @@ transfer_single_new_db(FileNameMap *maps, int size, char *old_tablespace)
 		new_cluster.controldata.cat_ver >= VISIBILITY_MAP_CRASHSAFE_CAT_VER)
 		vm_crashsafe_match = false;
 
+	/*
+	 * Do we need to rewrite visibilitymap?
+	 */
+	if (old_cluster.controldata.cat_ver < VISIBILITY_MAP_FROZEN_BIT_CAT_VER &&
+		new_cluster.controldata.cat_ver >= VISIBILITY_MAP_FROZEN_BIT_CAT_VER)
+		vm_must_add_frozenbit = true;
+
 	for (mapnum = 0; mapnum < size; mapnum++)
 	{
 		if (old_tablespace == NULL ||
 			strcmp(maps[mapnum].old_tablespace, old_tablespace) == 0)
 		{
 			/* transfer primary file */
-			transfer_relfile(&maps[mapnum], "");
+			transfer_relfile(&maps[mapnum], "", vm_must_add_frozenbit);
 
 			/* fsm/vm files added in PG 8.4 */
 			if (GET_MAJOR_VERSION(old_cluster.major_version) >= 804)
@@ -155,9 +163,9 @@ transfer_single_new_db(FileNameMap *maps, int size, char *old_tablespace)
 				/*
 				 * Copy/link any fsm and vm files, if they exist
 				 */
-				transfer_relfile(&maps[mapnum], "_fsm");
+				transfer_relfile(&maps[mapnum], "_fsm",  vm_must_add_frozenbit);
 				if (vm_crashsafe_match)
-					transfer_relfile(&maps[mapnum], "_vm");
+					transfer_relfile(&maps[mapnum], "_vm",  vm_must_add_frozenbit);
 			}
 		}
 	}
@@ -168,9 +176,11 @@ transfer_single_new_db(FileNameMap *maps, int size, char *old_tablespace)
  * transfer_relfile()
  *
  * Copy or link file from old cluster to new one.
+ * if vm_must_add_frozenbti is true, each visibility map pages are written while
+ * adding frozen bit, even link mode.
  */
 static void
-transfer_relfile(FileNameMap *map, const char *type_suffix)
+transfer_relfile(FileNameMap *map, const char *type_suffix, bool vm_must_add_frozenbit)
 {
 	const char *msg;
 	char		old_file[MAXPGPATH];
@@ -232,7 +242,13 @@ transfer_relfile(FileNameMap *map, const char *type_suffix)
 		{
 			pg_log(PG_VERBOSE, "copying \"%s\" to \"%s\"\n", old_file, new_file);
 
-			if ((msg = copyFile(old_file, new_file, true)) != NULL)
+			/* Rewrite visibility map if needed */
+			if (vm_must_add_frozenbit && (strcmp(type_suffix, "_vm") == 0))
+				msg = rewriteVisibilityMap(old_file, new_file, true);
+			else
+				msg = copyFile(old_file, new_file, true);
+
+			if (msg)
 				pg_fatal("error while copying relation \"%s.%s\" (\"%s\" to \"%s\"): %s\n",
 						 map->nspname, map->relname, old_file, new_file, msg);
 		}
@@ -240,7 +256,13 @@ transfer_relfile(FileNameMap *map, const char *type_suffix)
 		{
 			pg_log(PG_VERBOSE, "linking \"%s\" to \"%s\"\n", old_file, new_file);
 
-			if ((msg = linkFile(old_file, new_file)) != NULL)
+			/* Rewrite visibility map if needed */
+			if (vm_must_add_frozenbit && (strcmp(type_suffix, "_vm") == 0))
+				msg = rewriteVisibilityMap(old_file, new_file, true);
+			else
+				msg = linkFile(old_file, new_file);
+
+			if (msg)
 				pg_fatal("error while creating link for relation \"%s.%s\" (\"%s\" to \"%s\"): %s\n",
 						 map->nspname, map->relname, old_file, new_file, msg);
 		}
diff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh
index ba79fb3..8f037d5 100644
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -174,6 +174,12 @@ if "$MAKE" -C "$oldsrc" installcheck; then
 		mv "$temp_root"/dump1.sql "$temp_root"/dump1.sql.orig
 		sed "s;$oldsrc;$newsrc;g" "$temp_root"/dump1.sql.orig >"$temp_root"/dump1.sql
 	fi
+
+	# After vacuum all relation, dump rellallvisible values of all relation in pre-upgarde
+	# cluster and save them to vm_test1.txt to visibility map rewriting regression test.
+	vm_sql="SELECT c.relname, c.relallvisible FROM pg_class as c, pg_namespace as n WHERE c.relnamespace = n.oid AND n.nspname NOT IN ('information_schema', 'pg_toast', 'pg_catalog') ORDER BY c.relname;"
+	vacuumdb -d regression || visibilitymap_vacuum1_status=$?
+	psql -d regression -c "$vm_sql" > "$temp_root"/vm_test1.txt || visibilitymap_test1_status=$?
 else
 	make_installcheck_status=$?
 fi
@@ -188,6 +194,14 @@ if [ -n "$pg_dumpall1_status" ]; then
 	echo "pg_dumpall of pre-upgrade database cluster failed"
 	exit 1
 fi
+if [ -n "$visibilitymap_vacuum1_status" ];then
+	echo "VACUUM of pre-upgrade database cluster for visibility map test failed"
+	exit 1
+fi
+if [ -n "$visibilitymap_test1_status" ];then
+	echo "SELECT of pre-upgrade database cluster for visibility map test failed"
+	exit 1
+fi
 
 PGDATA=$BASE_PGDATA
 
@@ -203,6 +217,10 @@ case $testhost in
 esac
 
 pg_dumpall -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+# After analyze (do not vacuum) all relation, dump relallvisible values of all relation in
+# post-upgrade cluster to vm_test2.txt
+psql -d regression -c "$vm_sql" > "$temp_root"/vm_test2.txt || visibilitymap_test2_status=$?
+
 pg_ctl -m fast stop
 
 # no need to echo commands anymore
@@ -214,11 +232,28 @@ if [ -n "$pg_dumpall2_status" ]; then
 	exit 1
 fi
 
+if [ -n "$visibilitymap_vacuum2_status" ];then
+	echo "VACUUM of post-upgrade database cluster for visibility map test failed"
+	exit 1
+fi
+
+if [ -n "$visibilitymap_test2_status" ];then
+	echo "SELECT of post-upgrade database cluster for visibility map test failed"
+	exit 1
+fi
+
 case $testhost in
 	MINGW*)	cmd /c delete_old_cluster.bat ;;
 	*)	    sh ./delete_old_cluster.sh ;;
 esac
 
+# Compare relallvisible values of all relation between pre-upgrade cluster and
+# post-upgrade cluster.
+if ! diff "$temp_root"/vm_test1.txt "$temp_root"/vm_test2.txt >/dev/null; then
+	echo "Visibility map rewriting test failed"
+	exit 1
+fi
+
 if diff "$temp_root"/dump1.sql "$temp_root"/dump2.sql >/dev/null; then
 	echo PASSED
 	exit 0
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to