Re: [HACKERS] invalid magic number in log segment

2013-12-11 Thread Heikki Linnakangas

On 12/11/2013 09:44 AM, Erikjan Rijkers wrote:

I don't know whether the below constitutes a bug, but:

Daily (sometimes even more often) I recompile 9.4devel (after git pull)  to run 
a large dev database (100 GB or so).

To avoid frequent initdb and many-hour-restore of data, I do this only when the 
following two #defines are unchanged:
   CATALOG_VERSION_NO in src/include/catalog/catversion.h, and
   PG_CONTROL_VERSION in src/include/catalog/pg_control.h

the goal being to always run the latest db, without having to reload the ~100 
GB dev db unexpectedly at inconvenient times.

Generally, this works OK.

However, the last few weeks I sometimes get, after such recompiling,  'invalid 
magic number' errors from which I don't know
how to recover (it means, apparently, an initdb is needed and I have then to 
reload the database).


2013-12-11 00:15:25.627 CET 25304 LOG:  received smart shutdown request
2013-12-11 00:15:25.631 CET 25306 LOG:  shutting down
2013-12-11 00:15:25.904 CET 25306 LOG:  database system is shut down
2013-12-11 08:11:59.858 CET 25490 LOG:  database system was shut down at 
2013-12-11 00:15:25 CET
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid primary checkpoint record
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid secondary checkpoint record
2013-12-11 08:11:59.901 CET 25490 PANIC:  could not locate a valid checkpoint 
record
2013-12-11 08:12:00.326 CET 25492 FATAL:  the database system is starting up
2013-12-11 08:12:01.328 CET 25493 FATAL:  the database system is starting up
2013-12-11 08:12:01.682 CET 25489 LOG:  startup process (PID 25490) was 
terminated by signal 6: Aborted
2013-12-11 08:12:01.682 CET 25489 LOG:  aborting startup due to startup process 
failure


My question is two-fold:

1. (general:)  is this 'invalid magic number' unexpected, and should it be 
reported always?


The magic number it's complaining about is the constant stored in the 
WAL logs, that acts as a version number. Think of it as the 
CATALOG_VERSION_NO, but for the WAL log format. It's bumped whenever the 
WAL format changes.



2. (for my setup specifically:)  is there any way that I can recognize, 
beforehand, at the code base level, such an
impending 'invalid magic number' state?


Keep an eye on the XLOG_PAGE_MAGIC constaint in xlog_internal.h.


Can de db be recovered from easily? (although this dev database is expendable, 
it takes many hours to rebuild; I'd like to
avoid that if possible).


Yes, you can use pg_resetxlog to recover.

BTW, you could also use pg_upgrade. That should work also when 
catversion has been bumped.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] invalid magic number in log segment

2013-12-10 Thread Erikjan Rijkers
I don't know whether the below constitutes a bug, but:

Daily (sometimes even more often) I recompile 9.4devel (after git pull)  to run 
a large dev database (100 GB or so).

To avoid frequent initdb and many-hour-restore of data, I do this only when the 
following two #defines are unchanged:
  CATALOG_VERSION_NO in src/include/catalog/catversion.h, and
  PG_CONTROL_VERSION in src/include/catalog/pg_control.h

the goal being to always run the latest db, without having to reload the ~100 
GB dev db unexpectedly at inconvenient times.

Generally, this works OK.

However, the last few weeks I sometimes get, after such recompiling,  'invalid 
magic number' errors from which I don't know
how to recover (it means, apparently, an initdb is needed and I have then to 
reload the database).


2013-12-11 00:15:25.627 CET 25304 LOG:  received smart shutdown request
2013-12-11 00:15:25.631 CET 25306 LOG:  shutting down
2013-12-11 00:15:25.904 CET 25306 LOG:  database system is shut down
2013-12-11 08:11:59.858 CET 25490 LOG:  database system was shut down at 
2013-12-11 00:15:25 CET
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid primary checkpoint record
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid magic number D078 in log 
segment 000100630034, offset 0
2013-12-11 08:11:59.901 CET 25490 LOG:  invalid secondary checkpoint record
2013-12-11 08:11:59.901 CET 25490 PANIC:  could not locate a valid checkpoint 
record
2013-12-11 08:12:00.326 CET 25492 FATAL:  the database system is starting up
2013-12-11 08:12:01.328 CET 25493 FATAL:  the database system is starting up
2013-12-11 08:12:01.682 CET 25489 LOG:  startup process (PID 25490) was 
terminated by signal 6: Aborted
2013-12-11 08:12:01.682 CET 25489 LOG:  aborting startup due to startup process 
failure


My question is two-fold:

1. (general:)  is this 'invalid magic number' unexpected, and should it be 
reported always?

2. (for my setup specifically:)  is there any way that I can recognize, 
beforehand, at the code base level, such an
impending 'invalid magic number' state?
Can de db be recovered from easily? (although this dev database is expendable, 
it takes many hours to rebuild; I'd like to
avoid that if possible).

thanks,

Erikjan Rijkers









-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers