On Tue, Jan  7, 2014 at 08:19:49AM -0500, Peter Eisentraut wrote:
> That was probably me. I'll look into it.
> 
> 
> 
> > On Jan 6, 2014, at 11:40 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > 
> > Bruce Momjian <br...@momjian.us> writes:
> >>> On Sun, Dec 29, 2013 at 02:48:21AM -0500, Tom Lane wrote:
> >>> 3. pg_upgrade ignores the fact that pg_resetxlog failed, and keeps going.
> > 
> >> Does pg_resetxlog return a non-zero exit status?  If so, pg_upgrade
> >> should have caught that and exited.
> > 
> > It certainly does:
> > 
> >    if (errno)
> >    {
> >        fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
> >                progname, XLOGDIR, strerror(errno));
> >        exit(1);
> >    }
> > 
> > The bug is that pg_upgrade appears to assume (in many places not just this
> > one) that exec_prog() will abort if the called program fails, but *it
> > doesn't*, contrary to the claim in its own header comment.  This is
> > because pg_log(FATAL, ...) doesn't call exit().  pg_fatal() does, but
> > that's not what's being called in the throw_error case.
> > 
> > I imagine that this used to work correctly and got broken by some
> > ill-advised refactoring, but whatever the origin, it's 100% broken today.

I know Peter is looking at this, but I looked at and I can't see the
problem.  Every call of exec_prog() that uses pg_resetxlog has
throw_error = true, and the test there is:

    result = system(cmd);

    if (result != 0)
        ...
        pg_log(FATAL, ...)

and in pg_log_v() I see:

        switch (type)
        ...
        case PG_FATAL:
            printf("\n%s", _(message));
            printf("Failure, exiting\n");
-->         exit(1);
            break;

so I am not clear how you are seeing the return status of pg_resetxlog
ignored.  I tried the attached patch which causes pg_resetxlog -f to
return -1, and got the proper error from pg_upgrade in git head:

        Performing Upgrade
        ------------------
        Analyzing all rows in the new cluster                       ok
        Freezing all rows on the new cluster                        ok
        Deleting files from new pg_clog                             ok
        Copying old pg_clog to new server                           ok
        Setting next transaction ID for new cluster
        *failure*
        
        Consult the last few lines of "pg_upgrade_utility.log" for
        the probable cause of the failure.
        Failure, exiting

and the last line in "pg_upgrade_utility.log" is:

        command: "/u/pgsql/bin/pg_resetxlog" -f -x 683 "/u/pgsql/data" >>
        "pg_upgrade_utility.log" 2>&1

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c
new file mode 100644
index 03f2fad..3e67630
*** a/src/bin/pg_resetxlog/pg_resetxlog.c
--- b/src/bin/pg_resetxlog/pg_resetxlog.c
*************** main(int argc, char *argv[])
*** 121,126 ****
--- 121,127 ----
  		{
  			case 'f':
  				force = true;
+ exit(1);
  				break;
  
  			case 'n':
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to