Andres Freund wrote:
> Hi,
>
> On 2013-12-24 12:58:04 -0300, Alvaro Herrera wrote:
> > > Shortly after this patch was committed, buildfarm member locust (running
> > > Mac OS X 10.5 apparently) started failing the pg_upgrade check:
> > >
> > > command:
> > > "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/install//Users/pgbuildfarm/Documents/workdir//HEAD/inst/bin/pg_ctl"
> > > -w -l "pg_upgrade_server.log" -D
> > > "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/data"
> > > -o "-p 57632 -b -c synchronous_commit=off -c fsync=off -c
> > > full_page_writes=off -c listen_addresses='' -c
> > > unix_socket_permissions=0700 -c
> > > unix_socket_directories='/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade'"
> > > start >> "pg_upgrade_server.log" 2>&1
> > > waiting for server to start....LOG: database system was shut down at
> > > 2013-12-19 12:51:16 CET
> > > LOG: invalid primary checkpoint record
> > > LOG: invalid secondary checkpoint link in control file
> > > PANIC: could not locate a valid checkpoint record
> >
> > Any comment on this problem? Somehow ReadRecord is unable to find a
> > checkpoint, yet there's no error message to be seen anywhere, whereas
> > pg_resetxlog does report it:
> >
> > > command:
> > > "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/install//Users/pgbuildfarm/Documents/workdir//HEAD/inst/bin/pg_resetxlog"
> > > -l 000000010000000000000009
> > > "/Users/pgbuildfarm/Documents/workdir/HEAD/pgsql.82393/contrib/pg_upgrade/tmp_check/data"
> > > >> "pg_upgrade_utility.log" 2>&1
> > > pg_resetxlog: could not read from directory "pg_xlog": Invalid argument
> >
> > I cannot but think xlogreader is at fault.
> >
> > Regardless of the solution to the Mac OS X problem, ISTM this should be
> > fixed.
>
> I didn't look at any code, and I won't today, but it doesn't look
> surprising - the report when starting the server above is presumably the
> one in ReadCheckpoint() (or similar) and it probably just reports that
> ReadRecord() didn't return a record.

How is this not surprising? Surely failing to find a checkpoint record
is not a problem to be taken lightly.

> pg_resetxlog (which doesn't use xlogreader!) reports that it couldn't
> read from directory "pg_xlog", so there's something wonky independently
> from xlogreader.

Yes, most likely there is. My point is that the LOG messages above
should have logged the system error that caused the checkpoint record to
be unfindable.

> I'd guess that xlog.c read_page callback errors out without reporting
> an error. IIRC we're logging some failures as DEBUG there, because
> they really aren't unexpected, and normally just signal the end of
> wal.

Hmm? At least, I recall something like an "unexpected pageaddr" message
is sometimes logged when end-of-wal is found. Why would other error
messages be hidden?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
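
To make the point about logging the underlying system error concrete,
here is a minimal sketch. It is not PostgreSQL code: the function name
read_block_reporting_errno and its signature are hypothetical, chosen
only for illustration. The idea is that a low-level read routine hands
the caller a message that already contains the system error text, so
whoever reports "invalid checkpoint record" can also say why the read
failed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): read exactly "len" bytes
 * at "offset" from "path" into "buf".
 *
 * Returns 0 on success.  On failure returns -1 and writes a message into
 * errbuf that includes the system error text, so the caller can log the
 * real cause instead of a bare "record not found".
 */
int
read_block_reporting_errno(const char *path, off_t offset,
                           char *buf, size_t len,
                           char *errbuf, size_t errlen)
{
    int         fd;
    ssize_t     nread;

    assert(errbuf != NULL && errlen > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        snprintf(errbuf, errlen, "could not open file \"%s\": %s",
                 path, strerror(errno));
        return -1;
    }

    nread = pread(fd, buf, len, offset);
    if (nread < 0)
    {
        int         save_errno = errno; /* close() may overwrite errno */

        close(fd);
        snprintf(errbuf, errlen, "could not read from file \"%s\": %s",
                 path, strerror(save_errno));
        return -1;
    }
    close(fd);

    if ((size_t) nread != len)
    {
        /* Not a system error, but still worth saying what happened. */
        snprintf(errbuf, errlen,
                 "short read from file \"%s\": got %zd of %zu bytes",
                 path, nread, len);
        return -1;
    }

    return 0;
}
```

A caller would pass the contents of errbuf along with its own
higher-level message, at a level that is actually visible, rather than
dropping it or demoting it to DEBUG.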