Re: [HACKERS] Recovery from multi trouble

2006-03-27 Thread Simon Riggs
On Mon, 2006-03-27 at 12:14 +0900, OKADA Satoshi wrote:

> Our aim is to give the database administrator a chance to recover the
> database at PostgreSQL startup time when data may have been lost because
> log files are missing.

There is no possibility of data loss because of loss of a single log
file, if you have your hardware configured correctly.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Recovery from multi trouble

2006-03-27 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> On Mon, 2006-03-27 at 12:14 +0900, OKADA Satoshi wrote:
> > Our aim is to give the database administrator a chance to recover the
> > database at PostgreSQL startup time when data may have been lost because
> > log files are missing.
>
> There is no possibility of data loss because of loss of a single log
> file, if you have your hardware configured correctly.

I'm fairly concerned about whether this isn't just replacing one failure
mode with another one.  See nearby discussion with Alex Bahdushka for a
graphic reminder why PANICing on any little thing during WAL replay is
not necessarily such a great idea.

regards, tom lane



Re: [HACKERS] Recovery from multi trouble

2006-03-26 Thread OKADA Satoshi
Martijn van Oosterhout wrote:

> On Thu, Dec 22, 2005 at 10:53:39AM +, Simon Riggs wrote:
> > IMHO the problem is the deletion of the xlog file, not the error
> > message.
> >
> > If you *did* lose an xlog file, would you not expect the system to come
> > up anyway? You're saying that you'd want the system to stay down because
> > of this? Would you want the system to be less available in that
> > circumstance?
>
> Well, that's what pg_resetxlog does. If you have an unclean shutdown
> and you lose the xlog, you've possibly lost data. Should the postmaster
> just come up and pretend nothing happened?
>
> > I guess you might want a new postmaster option: don't come up if you
> > are damaged. Would you really use that?
>
> Well, we have zero_damaged_pages, which is off by default.
>
> > Overall, thank you for doing the durability testing. It is good to know
> > that you're doing that and taking the time to report any issues you see.
>
> Having a system that just blithely continues in the face of possible
> data loss doesn't seem very nice either. Sure, it's nice to know about
> it but is it really something we can do something about? The admin
> either restores from backup or runs pg_resetxlog, accepting the fact
> data will be lost. I don't think this is something postgres should be
> doing on its own.

Thank you for your comments, and I'm sorry for my late reply.

Our aim is to give the database administrator a chance to recover the
database at PostgreSQL startup time when data may have been lost because
log files are missing.

Because we plan to use clustering software that automatically switches
to another server machine when the PostgreSQL server goes down for any
reason, it is appropriate for PostgreSQL to stop the startup process and
report the anomaly in such a case. The database administrator would then
perform a proper recovery operation manually.

Therefore, we are considering a new startup option for postmaster:
* Introduce a new startup option (e.g. postmaster -R).
* When postmaster cannot read the xlog and was started with this option,
postmaster stops the startup process with PANIC.

We will make a patch. What do you think of this patch?
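
Until such an option exists, clustering software could approximate the proposed behaviour from outside the server by inspecting the postmaster log after startup. The following is a minimal sketch, not the proposed patch: the function name is hypothetical, and the message fragments are copied from the 8.1.1 log posted earlier in this thread, so they are an assumption for any other version or locale.

```python
import re

# Message fragments taken from the 8.1.1 log quoted in this thread.
# Assumption: other versions or locales may word these differently.
RECOVERY_START = "automatic recovery in progress"
RECOVERY_END = "database system is ready"
MISSING_XLOG = re.compile(r'could not open file "?pg_xlog/')


def recovery_hit_missing_xlog(log_lines):
    """Return True if a pg_xlog segment could not be opened during crash recovery."""
    in_recovery = False
    for line in log_lines:
        if RECOVERY_START in line:
            in_recovery = True
        elif RECOVERY_END in line:
            in_recovery = False
        elif in_recovery and MISSING_XLOG.search(line):
            return True
    return False


sample = [
    "LOG: database system was not properly shut down; automatic recovery in progress",
    "LOG: redo starts at 0/33AC8C",
    "LOG: could not open file pg_xlog/00010001 (log file 0, segment 1): no such file or directory",
    "LOG: redo done at 0/5C",
    "LOG: database system is ready",
]
print(recovery_hit_missing_xlog(sample))  # True
```

A failover script might call this on the log written by pg_ctl and keep the node out of service when it returns True, leaving the recovery decision to the administrator.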


OKADA Satoshi
NTT Cyberspace Lab.



Re: [HACKERS] Recovery from multi trouble

2005-12-22 Thread Simon Riggs
On Mon, 2005-12-19 at 17:17 +0900, OKADA Satoshi wrote:
> Tom Lane wrote:
> > OKADA Satoshi [EMAIL PROTECTED] writes:
> > > The loss of log was simulated by deleting the latest xlog file.
> >
> > What does that have to do with reality?  Postgres is very careful not to
> > use an xlog file until it's been fully metadata-synced.  You might as
> > well complain that PG doesn't recover after rm -rf / ...
>
> In this case (postmaster ended abnormally, and the log is lost), I
> understand that the database cannot recover normally.
>
> Although the database cannot recover normally, postmaster does not output
> a clear message indicating this situation. I think that is a problem.

IMHO the problem is the deletion of the xlog file, not the error
message.

If you *did* lose an xlog file, would you not expect the system to come
up anyway? You're saying that you'd want the system to stay down because
of this? Would you want the system to be less available in that
circumstance?

I guess you might want a new postmaster option: don't come up if you
are damaged. Would you really use that?

Overall, thank you for doing the durability testing. It is good to know
that you're doing that and taking the time to report any issues you see.

Best Regards, Simon Riggs




Re: [HACKERS] Recovery from multi trouble

2005-12-22 Thread Martijn van Oosterhout
On Thu, Dec 22, 2005 at 10:53:39AM +, Simon Riggs wrote:
> IMHO the problem is the deletion of the xlog file, not the error
> message.
>
> If you *did* lose an xlog file, would you not expect the system to come
> up anyway? You're saying that you'd want the system to stay down because
> of this? Would you want the system to be less available in that
> circumstance?

Well, that's what pg_resetxlog does. If you have an unclean shutdown
and you lose the xlog, you've possibly lost data. Should the postmaster
just come up and pretend nothing happened?

> I guess you might want a new postmaster option: don't come up if you
> are damaged. Would you really use that?

Well, we have zero_damaged_pages, which is off by default.

> Overall, thank you for doing the durability testing. It is good to know
> that you're doing that and taking the time to report any issues you see.

Having a system that just blithely continues in the face of possible
data loss doesn't seem very nice either. Sure, it's nice to know about
it but is it really something we can do something about? The admin
either restores from backup or runs pg_resetxlog, accepting the fact
data will be lost. I don't think this is something postgres should be
doing on its own.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Recovery from multi trouble

2005-12-19 Thread OKADA Satoshi
Tom Lane wrote:

> OKADA Satoshi [EMAIL PROTECTED] writes:
> > The loss of log was simulated by deleting the latest xlog file.
>
> What does that have to do with reality?  Postgres is very careful not to
> use an xlog file until it's been fully metadata-synced.  You might as
> well complain that PG doesn't recover after rm -rf / ...

In this case (postmaster ended abnormally, and the log is lost), I
understand that the database cannot recover normally.

Although the database cannot recover normally, postmaster does not output
a clear message indicating this situation. I think that is a problem.


Thanks, OKADA Satoshi



[HACKERS] Recovery from multi trouble

2005-12-18 Thread OKADA Satoshi
Hi

I am testing the durability of PostgreSQL. I tested how PostgreSQL
recovers when part of the log is lost due to some trouble (a power
supply failure, etc.).

The loss of log was simulated by deleting the latest xlog file.

When postmaster started up, it only output the log message "No such file
or directory" while recovering the database.
At this point, clients could still connect to the database. I tried to
access the tables with psql, but either could not access them or they
came back empty.


When the xlog is lost or destroyed, I think that postmaster
should output a PANIC message and abort the startup process.
(At the very least, I think the log level should be ERROR.)

What do you think?

My operations are as follows.


PostgreSQL Version: 8.1.1
OS: Fedora Core 4

* create database and table
   createdb testdb
  CREATE DATABASE
   pgbench -d testdb -i

* database access by using pgbench
   pgbench testdb -t 1 

* kill postgresql processes before end of pgbench execution (simulate trouble)
   ps -ef | grep post
  postgres 18901 1 0 03:10 pts/2 00:00:00 /home/postgres/Work/pg811/bin/postmaster
  postgres 18903 18901 0 03:10 pts/2 00:00:00 postgres: writer process
  postgres 18904 18901 0 03:10 pts/2 00:00:00 postgres: stats buffer process
  postgres 18905 18904 0 03:10 pts/2 00:00:00 postgres: stats collector process
  postgres 18947 18901 59 03:11 pts/2 00:00:01 postgres: postgres testdb [local] UPDATE
  postgres 18949 18918 0 03:11 pts/3 00:00:00 grep post
   kill -9 18901; kill -9 18947


* remove the latest xlog (simulate xlog trouble)
   cd $PGDATA/pg_xlog; ls
  0001 00010001 archive_status/
   mv 00010001 /tmp/.

* postmaster start(recovery start)
   pg_ctl -l pg811_1.log start

* check tables of pgbench
   psql testdb
 testdb=# SELECT * from history ;
   tid | bid | aid | delta | mtime | filler
  -----+-----+-----+-------+-------+--------
  (0 rows) 

  testdb=# SELECT * from accounts ;
  ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160
  CONTEXT: writing block 12 of relation 1663/16384/2619


* content of postmaster log file, pg811_1.log
---
LOG: database system was interrupted at 2005-12-14 03:10:51 JST
LOG: checkpoint record is at 0/33AC48
LOG: redo record is at 0/33AC48; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 567; next OID: 24576
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 0/33AC8C
LOG: could not open file pg_xlog/00010001 (log file 0, segment 1): no such file or directory
LOG: redo done at 0/5C
LOG: database system is ready
LOG: transaction ID wrap limit is 1073742415, limited by database testdb
ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 12 of relation 1663/16384/2619
ERROR: xlog flush request 0/11FDE40 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 4 of relation 1663/16384/2619
ERROR: xlog flush request 0/1205938 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2696
ERROR: xlog flush request 0/11F0DB4 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 9 of relation 1663/16384/2619
ERROR: xlog flush request 0/11F0290 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2619
ERROR: xlog flush request 0/1206B80 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 2 of relation 1663/16384/2696
ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 12 of relation 1663/16384/2619
WARNING: could not write block 12 of 1663/16384/2619
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 0/11FDE40 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 4 of relation 1663/16384/2619
WARNING: could not write block 4 of 1663/16384/2619
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 0/12D69A0 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 0 of relation 1663/16384/16391
ERROR: xlog flush request 0/134F6D4 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/16391
ERROR: xlog flush request 0/139AF10 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 2 of relation 1663/16384/16391
ERROR: xlog flush request 0/13CC814 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 3 of relation 1663/16384/16391
LOG: received smart shutdown request
LOG: shutting down
PANIC: xlog flush request 0/1205938 is not satisfied --- flushed only to 0/160
CONTEXT: writing block 1 of relation 1663/16384/2696
LOG: background writer process (PID 18969) was terminated by signal 6
LOG: terminating any other active server processes
LOG: all server processes terminated; 
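
The repeated "xlog flush request ... is not satisfied" errors above each compare two log positions written as two hex numbers separated by a slash. A minimal sketch of how such lines could be parsed and compared follows; the function names are hypothetical, the regular expression assumes the exact message wording shown in this log, and the sample values are copied as they appear in this thread.

```python
import re

# Assumption: the message wording matches the 8.1.1 log in this thread.
FLUSH_ERROR = re.compile(
    r"xlog flush request ([0-9A-Fa-f]+)/([0-9A-Fa-f]+) is not satisfied"
    r" --- flushed only to ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)"
)


def lsn_to_int(hi, lo):
    """Combine the two hex halves of an 'X/Y' log position into one integer."""
    return (int(hi, 16) << 32) | int(lo, 16)


def unsatisfied_flushes(log_lines):
    """Yield (requested, flushed) integer positions for each flush error line."""
    for line in log_lines:
        m = FLUSH_ERROR.search(line)
        if m:
            requested = lsn_to_int(m.group(1), m.group(2))
            flushed = lsn_to_int(m.group(3), m.group(4))
            yield requested, flushed


sample = [
    "ERROR: xlog flush request 0/1200F88 is not satisfied --- flushed only to 0/160",
    "CONTEXT: writing block 12 of relation 1663/16384/2619",
]
for requested, flushed in unsatisfied_flushes(sample):
    # The requested position lies beyond what the server has flushed.
    print(requested > flushed)  # True
```

In every error in this log the requested position is greater than the flushed one, which is consistent with data pages carrying positions from the xlog segment that was removed before recovery.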

Re: [HACKERS] Recovery from multi trouble

2005-12-18 Thread Tom Lane
OKADA Satoshi [EMAIL PROTECTED] writes:
> The loss of log was simulated by deleting the latest xlog file.

What does that have to do with reality?  Postgres is very careful not to
use an xlog file until it's been fully metadata-synced.  You might as
well complain that PG doesn't recover after rm -rf / ...

regards, tom lane
