Re: [HACKERS] Recovery from multi trouble
On Mon, 2006-03-27 at 12:14 +0900, OKADA Satoshi wrote:
> Our aim is to give the database administrator a chance to recover the
> database at PostgreSQL startup time when there is a possibility of data
> loss due to lost log files.

There is no possibility of data loss because of loss of a single log
file, if you have your hardware configured correctly.

Best Regards, Simon Riggs

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Recovery from multi trouble
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2006-03-27 at 12:14 +0900, OKADA Satoshi wrote:
>> Our aim is to give the database administrator a chance to recover the
>> database at PostgreSQL startup time when there is a possibility of data
>> loss due to lost log files.
>
> There is no possibility of data loss because of loss of a single log
> file, if you have your hardware configured correctly.

I'm fairly concerned about whether this isn't just replacing one failure
mode with another one. See nearby discussion with Alex Bahdushka for a
graphic reminder why PANICing on any little thing during WAL replay is
not necessarily such a great idea.

regards, tom lane
Re: [HACKERS] Recovery from multi trouble
Martijn van Oosterhout wrote:
> On Thu, Dec 22, 2005 at 10:53:39AM +, Simon Riggs wrote:
>> IMHO the problem is the deletion of the xlog file, not the error
>> message. If you *did* lose an xlog file, would you not expect the system
>> to come up anyway? You're saying that you'd want the system to stay down
>> because of this? Would you want the system to be less available in that
>> circumstance?
>
> Well, that's what pg_resetxlog does. If you have an unclean shutdown and
> you lose the xlog, you've possibly lost data. Should the postmaster just
> come up and pretend nothing happened?
>
>> I guess you might want a new postmaster option: don't come up if you
>> are damaged. Would you really use that?
>
> Well, we have zero_damaged_pages, which is off by default.
>
>> Overall, thank you for doing the durability testing. It is good to know
>> that you're doing that and taking the time to report any issues you see.
>
> Having a system that just blithely continues in the face of possible
> data loss doesn't seem very nice either. Sure, it's nice to know about
> it, but is it really something we can do anything about? The admin
> either restores from backup or runs pg_resetxlog, accepting the fact
> that data will be lost. I don't think this is something postgres should
> be doing on its own.

Thank you for your comments, and I'm sorry for my late reply.

Our aim is to give the database administrator a chance to recover the
database at PostgreSQL startup time when there is a possibility of data
loss due to lost log files.

Because we plan to use clustering software that automatically switches
to another server machine when the PostgreSQL server goes down for any
reason, it is appropriate for PostgreSQL to stop the startup process and
report the anomaly in such a case. The database administrator can then
perform a proper recovery operation manually.

Therefore, we propose a new startup option for the postmaster:

  * Introduce a new startup option (e.g. postmaster -R).
  * When the postmaster is started with this option and cannot read the
    xlog, it stops the startup process with a PANIC.

We will make a patch. What do you think of this idea?

OKADA Satoshi
NTT Cyberspace Lab.
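Without a server-side option, a clustering setup could roughly approximate the proposed behavior from outside by scanning the server log after startup. The following is a minimal sketch under stated assumptions: the function name redo_hit_missing_segment is hypothetical, and it assumes the 8.1-era message texts quoted in this thread ("redo starts at", "could not open file ... pg_xlog ...", "redo done at"). It only pattern-matches log text, so it cannot by itself tell whether any data was actually lost.

```sh
#!/bin/sh
# Hypothetical helper, not part of PostgreSQL.
# redo_hit_missing_segment LOGFILE
#   Succeeds (exit 0) if, between "redo starts at" and "redo done at",
#   the server log reports that it could not open a pg_xlog file.
#   Assumes the 8.1-era log message wording shown in this thread.
redo_hit_missing_segment() {
  awk '
    /redo starts at/                           { in_redo = 1 }
    in_redo && /could not open file .*pg_xlog/ { missing = 1 }
    /redo done at/                             { in_redo = 0 }
    END { exit (missing ? 0 : 1) }
  ' "$1"
}
```

A wrapper could run this on the log file given to `pg_ctl -l` and, if it succeeds, stop the server with `pg_ctl stop` and alert the administrator. Note that by the time "database system is ready" is logged, clients may already have connected, which is one reason a check inside the server (as proposed) would be tighter than this external approximation.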
Re: [HACKERS] Recovery from multi trouble
On Mon, 2005-12-19 at 17:17 +0900, OKADA Satoshi wrote:
> Tom Lane wrote:
>> OKADA Satoshi [EMAIL PROTECTED] writes:
>>> The loss of log was simulated by deleting the latest xlog file.
>>
>> What does that have to do with reality? Postgres is very careful not to
>> use an xlog file until it's been fully metadata-synced. You might as
>> well complain that PG doesn't recover after rm -rf / ...
>
> In this case (abnormal postmaster termination plus a lost log), I
> understand that the database cannot recover normally. Even so, the
> postmaster does not output a clear message indicating this situation.
> I think that is a problem.

IMHO the problem is the deletion of the xlog file, not the error message.
If you *did* lose an xlog file, would you not expect the system to come
up anyway? You're saying that you'd want the system to stay down because
of this? Would you want the system to be less available in that
circumstance?

I guess you might want a new postmaster option: don't come up if you are
damaged. Would you really use that?

Overall, thank you for doing the durability testing. It is good to know
that you're doing that and taking the time to report any issues you see.

Best Regards, Simon Riggs
Re: [HACKERS] Recovery from multi trouble
On Thu, Dec 22, 2005 at 10:53:39AM +, Simon Riggs wrote:
> IMHO the problem is the deletion of the xlog file, not the error
> message. If you *did* lose an xlog file, would you not expect the system
> to come up anyway? You're saying that you'd want the system to stay down
> because of this? Would you want the system to be less available in that
> circumstance?

Well, that's what pg_resetxlog does. If you have an unclean shutdown and
you lose the xlog, you've possibly lost data. Should the postmaster just
come up and pretend nothing happened?

> I guess you might want a new postmaster option: don't come up if you
> are damaged. Would you really use that?

Well, we have zero_damaged_pages, which is off by default.

> Overall, thank you for doing the durability testing. It is good to know
> that you're doing that and taking the time to report any issues you see.

Having a system that just blithely continues in the face of possible
data loss doesn't seem very nice either. Sure, it's nice to know about
it, but is it really something we can do anything about? The admin
either restores from backup or runs pg_resetxlog, accepting the fact
that data will be lost. I don't think this is something postgres should
be doing on its own.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.
Re: [HACKERS] Recovery from multi trouble
Tom Lane wrote:
> OKADA Satoshi [EMAIL PROTECTED] writes:
>> The loss of log was simulated by deleting the latest xlog file.
>
> What does that have to do with reality? Postgres is very careful not to
> use an xlog file until it's been fully metadata-synced. You might as
> well complain that PG doesn't recover after rm -rf / ...

In this case (abnormal postmaster termination plus a lost log), I
understand that the database cannot recover normally. Even so, the
postmaster does not output a clear message indicating this situation.
I think that is a problem.

Thanks,
OKADA Satoshi
[HACKERS] Recovery from multi trouble
Hi,

I am testing the durability of PostgreSQL. I tested how PostgreSQL
recovers when part of the log is lost due to a failure (power supply
trouble, etc.). The loss of log was simulated by deleting the latest
xlog file.

At postmaster startup, it only output the log message "No such file or
directory" while recovering the database. After that, clients can
connect to the database. I tried to access the tables with psql, but
either could not access them or found that all the queries had been
lost.

When the xlog is lost or destroyed, I think the postmaster should output
a PANIC message and interrupt the startup process. (At least, I think
the log level should be ERROR.) What do you think?

My operations are as follows.

PostgreSQL Version: 8.1.1
OS: Fedora Core 4

* create database and tables
  $ createdb testdb
  CREATE DATABASE
  $ pgbench -d testdb -i

* access the database using pgbench
  $ pgbench testdb -t 1

* kill the PostgreSQL processes before pgbench finishes (simulate trouble)
  $ ps -ef | grep post
  postgres 18901     1  0 03:10 pts/2 00:00:00 /home/postgres/Work/pg811/bin/postmaster
  postgres 18903 18901  0 03:10 pts/2 00:00:00 postgres: writer process
  postgres 18904 18901  0 03:10 pts/2 00:00:00 postgres: stats buffer process
  postgres 18905 18904  0 03:10 pts/2 00:00:00 postgres: stats collector process
  postgres 18947 18901 59 03:11 pts/2 00:00:01 postgres: postgres testdb [local] UPDATE
  postgres 18949 18918  0 03:11 pts/3 00:00:00 grep post
  $ kill -9 18901; kill -9 18947

* remove the latest xlog (simulate xlog trouble)
  $ cd $PGDATA/pg_xlog; ls
  0001 00010001 archive_status/
  $ mv 00010001 /tmp/.
* start the postmaster (recovery starts)
  $ pg_ctl -l pg811_1.log start

* check the pgbench tables
  $ psql testdb
  testdb=# SELECT * from history ;
   tid | bid | aid | delta | mtime | filler
  -----+-----+-----+-------+-------+--------
  (0 rows)

  testdb=# SELECT * from accounts ;
  ERROR: xlog flush request 0/1200F88 is not satisfied flushed only to 0/160
  CONTEXT: writing block 12 of relation 1663/16384/2619

* content of the postmaster log file, pg811_1.log
---
LOG: database system was interrupted at 2005-12-14 03:10:51 JST
LOG: checkpoint record is at 0/33AC48
LOG: redo record is at 0/33AC48; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 567; next OID: 24576
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/33AC8C
LOG: could not open file pg_xlog/00010001 (log file 0, segment 1): no such file or directory
LOG: redo done at 0/5C
LOG: database system is ready
LOG: transaction ID wrap limit is 1073742415, limited by database testdb
ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 12 of relation 1663/16384/2619
ERROR: xlog flush request 0/11FDE40 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 4 of relation 1663/16384/2619
ERROR: xlog flush request 0/1205938 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2696
ERROR: xlog flush request 0/11F0DB4 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 9 of relation 1663/16384/2619
ERROR: xlog flush request 0/11F0290 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2619
ERROR: xlog flush request 0/1206B80 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 2 of relation 1663/16384/2696
ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 12 of relation 1663/16384/2619
WARNING: could not write block 12 of 1663/16384/2619
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 0/11FDE40 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 4 of relation 1663/16384/2619
WARNING: could not write block 4 of 1663/16384/2619
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 0/12D69A0 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 0 of relation 1663/16384/16391
ERROR: xlog flush request 0/134F6D4 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/16391
ERROR: xlog flush request 0/139AF10 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 2 of relation 1663/16384/16391
ERROR: xlog flush request 0/13CC814 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 3 of relation 1663/16384/16391
LOG: received smart shutdown request
LOG: shutting down
PANIC: xlog flush request 0/1205938 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2696
LOG: background writer process (PID 18969) was terminated by signal 6
LOG: terminating any other active server processes
LOG: all server processes terminated;
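The ERROR and PANIC lines in this log follow from the write-ahead rule: before a dirty page may be written, WAL must be flushed at least up to that page's LSN, and after redo stopped early, pages on disk still carry LSNs pointing past the end of the WAL that survived. The sketch below shows that arithmetic, assuming the 8.1 conventions of an "XLOGID/XRECOFF" hex location, the default 16 MiB segment size and timeline 1. The function names are hypothetical helpers, not PostgreSQL tools.

```sh
#!/bin/sh
# Assumptions: 16 MiB WAL segments (the default) and timeline 1.
SEG_SIZE=$((16 * 1024 * 1024))

# lsn_to_int LSN: turn an "XLOGID/XRECOFF" hex pair such as 0/1200F88
# into a single integer so two locations can be compared.
lsn_to_int() {
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( (0x$hi << 32) | 0x$lo ))
}

# lsn_segment_file LSN [TLI]: name of the segment file that holds LSN,
# formatted as timeline, log id and segment number, 8 hex digits each.
lsn_segment_file() {
  hi=${1%%/*}
  lo=${1##*/}
  printf '%08X%08X%08X\n' "${2:-1}" "$((0x$hi))" "$(( 0x$lo / SEG_SIZE ))"
}

# flush_satisfied REQUEST FLUSHED: succeed only if WAL is durable at least
# up to REQUEST; a page stamped with LSN REQUEST may not be written before.
flush_satisfied() {
  [ "$(lsn_to_int "$2")" -ge "$(lsn_to_int "$1")" ]
}
```

For example, `lsn_segment_file 0/1200F88` prints 000000010000000000000001. Every flush request in the log lies between 0/1000000 and 0/1FFFFFF, so each maps to log file 0, segment 1, the segment that recovery reported it could not open, while the checkpoint record at 0/33AC48 maps to segment 0.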
Re: [HACKERS] Recovery from multi trouble
OKADA Satoshi [EMAIL PROTECTED] writes:
> The loss of log was simulated by deleting the latest xlog file.

What does that have to do with reality? Postgres is very careful not to
use an xlog file until it's been fully metadata-synced. You might as well
complain that PG doesn't recover after rm -rf / ...

regards, tom lane
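Tom's point about metadata syncing refers to a common durability pattern: a new log file is fully written and fsynced under a temporary name, and only then renamed to its final name, so a crash can never leave a half-prepared file under a real segment name. A minimal sketch of that general technique follows. It is not PostgreSQL's code; the helper name install_segment and the xlogtemp.PID temporary name are illustrative assumptions, and it assumes GNU dd for `conv=fsync`.

```sh
#!/bin/sh
# Hypothetical helper illustrating "prepare, sync, then rename".
# install_segment DIR NAME [BLOCKS]
#   Creates DIR/NAME as a zero-filled file of BLOCKS x 8 KiB
#   (default 2048 blocks = 16 MiB) that is durable before it
#   ever appears under its final name.
install_segment() {
  local dir="$1" name="$2" blocks="${3:-2048}"
  local tmp="$dir/xlogtemp.$$"
  # Zero-fill the temporary file and fsync its data (GNU dd conv=fsync).
  dd if=/dev/zero of="$tmp" bs=8192 count="$blocks" conv=fsync 2>/dev/null || return 1
  # rename(2) within one filesystem is atomic: readers see either no file
  # or the complete one, never a partially written segment.
  mv -f -- "$tmp" "$dir/$name" || return 1
  # Flush the directory update as well; plain sync flushes all filesystems.
  sync
}
```

The design choice is that the expensive, failure-prone work (allocating and zero-filling) happens under a name nobody reads, and the only step that makes the file visible is a single atomic rename.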