Bruce Momjian wrote:
> > This is the "production system". I have absolutely no indications that
> > anything should be wrong in there. It has run rock-solid since it got
> > migrated (dump/restore) to 8.4 for about 7 months now. So I am a bit
> > scared about you telling that it seems wrong. (but that cannot be
> > attributed to pg_upgrade)
> 
> I am on chat with Alvaro now and it seems we do somehow connect to
> template0 for transaction id wraparound.  I think Alvaro will post
> shortly on this.
> 
> > > OK, thanks.  This does seem odd.  Frankly, having template0's
> > > datfrozenxid be wrong would not cause any kind of instability because
> > > template0 is used only by pg_dump, so I am wondering if something else
> > > is seriously wrong.
> > >    
> > I also think that something was seriously wrong with the pg_upgrade'd
> > version. I'll try to reproduce and be a bit more carefull in tracking 
> > the steps
> > this time.
> 
> Thanks, but I think the entire problem might be this template0 xid issue
> that Alvaro and I are researching.  I can now see how invalid template0
> xids could cause the instability you saw in the new database.  Odd no
> one has seen this bug before.

OK, after talking to Alvaro and Heikki, the problem is that while you
cannot connect to template0, it is accessed by autovacuum for vacuum
freeze, even if autovacuum is turned off.  I think the reason you are
seeing this bug is that your xid counter is near 2 billion (50% to
wraparound) and the original template0 xids are the maximum distance
from your counter.

I am attaching the newest patch which fixes this issue.  I did modify
this code yesterday with another patch, and I am unclear exactly if you
need that patch as well.  CVS now has all these changes.

If you could test with this and the earlier patch, I think it will now
work fine.  Thanks for the valuable testing, and quick feedback.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
Index: contrib/pg_upgrade/pg_upgrade.c
===================================================================
RCS file: /cvsroot/pgsql/contrib/pg_upgrade/pg_upgrade.c,v
retrieving revision 1.3
diff -c -c -r1.3 pg_upgrade.c
*** contrib/pg_upgrade/pg_upgrade.c	18 May 2010 18:40:51 -0000	1.3
--- contrib/pg_upgrade/pg_upgrade.c	19 May 2010 18:20:03 -0000
***************
*** 164,170 ****
  	check_ok(ctx);
  
  	/*
! 	 * We do freeze after analyze so pg_statistic is also frozen
  	 */
  	prep_status(ctx, "Freezing all rows on the new cluster");
  	exec_prog(ctx, true,
--- 164,173 ----
  	check_ok(ctx);
  
  	/*
! 	 * We do freeze after analyze so pg_statistic is also frozen.
! 	 * template0 is not frozen here, but data rows were frozen by initdb,
! 	 * and we set its datfrozenxid and relfrozenxids later to match the
! 	 * new xid counter later.
  	 */
  	prep_status(ctx, "Freezing all rows on the new cluster");
  	exec_prog(ctx, true,
***************
*** 292,339 ****
  set_frozenxids(migratorContext *ctx)
  {
  	int			dbnum;
! 	PGconn	   *conn;
  	PGresult   *dbres;
  	int			ntups;
  
  	prep_status(ctx, "Setting frozenxid counters in new cluster");
  
! 	conn = connectToServer(ctx, "template1", CLUSTER_NEW);
  
  	/* set pg_database.datfrozenxid */
! 	PQclear(executeQueryOrDie(ctx, conn,
  							  "UPDATE pg_catalog.pg_database "
! 							  "SET	datfrozenxid = '%u' "
! 							  "WHERE datallowconn = true",
  							  ctx->old.controldata.chkpnt_nxtxid));
  
  	/* get database names */
! 	dbres = executeQueryOrDie(ctx, conn,
! 							  "SELECT	datname "
! 							  "FROM	pg_catalog.pg_database "
! 							  "WHERE datallowconn = true");
  
! 	/* free dbres below */
! 	PQfinish(conn);
  
  	ntups = PQntuples(dbres);
  	for (dbnum = 0; dbnum < ntups; dbnum++)
  	{
! 		conn = connectToServer(ctx, PQgetvalue(dbres, dbnum, 0), CLUSTER_NEW);
  
  		/* set pg_class.relfrozenxid */
  		PQclear(executeQueryOrDie(ctx, conn,
  								  "UPDATE	pg_catalog.pg_class "
  								  "SET	relfrozenxid = '%u' "
  		/* only heap and TOAST are vacuumed */
! 								  "WHERE	relkind = 'r' OR "
! 								  "		relkind = 't'",
  								  ctx->old.controldata.chkpnt_nxtxid));
  		PQfinish(conn);
  	}
  
  	PQclear(dbres);
  
  	check_ok(ctx);
  }
  
--- 295,366 ----
  set_frozenxids(migratorContext *ctx)
  {
  	int			dbnum;
! 	PGconn	   *conn, *conn_template1;
  	PGresult   *dbres;
  	int			ntups;
+ 	int			i_datname;
+ 	int			i_datallowconn;
  
  	prep_status(ctx, "Setting frozenxid counters in new cluster");
  
! 	conn_template1 = connectToServer(ctx, "template1", CLUSTER_NEW);
  
  	/* set pg_database.datfrozenxid */
! 	PQclear(executeQueryOrDie(ctx, conn_template1,
  							  "UPDATE pg_catalog.pg_database "
! 							  "SET	datfrozenxid = '%u'",
  							  ctx->old.controldata.chkpnt_nxtxid));
  
  	/* get database names */
! 	dbres = executeQueryOrDie(ctx, conn_template1,
! 							  "SELECT	datname, datallowconn "
! 							  "FROM	pg_catalog.pg_database");
  
! 	i_datname = PQfnumber(dbres, "datname");
! 	i_datallowconn = PQfnumber(dbres, "datallowconn");
  
  	ntups = PQntuples(dbres);
  	for (dbnum = 0; dbnum < ntups; dbnum++)
  	{
! 		char *datname = PQgetvalue(dbres, dbnum, i_datname);
! 		char *datallowconn= PQgetvalue(dbres, dbnum, i_datallowconn);
! 
! 		/*
! 		 *	We must update databases where datallowconn = false, e.g.
! 		 *	template0, because autovacuum increments their datfrozenxids and
! 		 *	relfrozenxids even if autovacuum is turned off, and even though
! 		 *	all the data rows are already frozen  To enable this, we
! 		 *	temporarily change datallowconn.
! 		 */
! 		if (strcmp(datallowconn, "f") == 0)
! 			PQclear(executeQueryOrDie(ctx, conn_template1,
! 								  "UPDATE pg_catalog.pg_database "
! 								  "SET	datallowconn = true "
! 								  "WHERE datname = '%s'", datname));
! 
! 		conn = connectToServer(ctx, datname, CLUSTER_NEW);
  
  		/* set pg_class.relfrozenxid */
  		PQclear(executeQueryOrDie(ctx, conn,
  								  "UPDATE	pg_catalog.pg_class "
  								  "SET	relfrozenxid = '%u' "
  		/* only heap and TOAST are vacuumed */
! 								  "WHERE	relkind IN ('r', 't')",
  								  ctx->old.controldata.chkpnt_nxtxid));
  		PQfinish(conn);
+ 
+ 		/* Reset datallowconn flag */
+ 		if (strcmp(datallowconn, "f") == 0)
+ 			PQclear(executeQueryOrDie(ctx, conn_template1,
+ 								  "UPDATE pg_catalog.pg_database "
+ 								  "SET	datallowconn = false "
+ 								  "WHERE datname = '%s'", datname));
  	}
  
  	PQclear(dbres);
  
+ 	PQfinish(conn_template1);
+ 
  	check_ok(ctx);
  }
  
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to