Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-12 Thread Heikki Linnakangas
Ok, I committed a patch to reduce the chatter a bit:

Fujii Masao wrote:
 On Fri, Feb 5, 2010 at 3:58 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 LOG:  starting archive recovery
 
 This is reported even if restore_command is not given and the WAL files are
 never restored from the archive. We should get rid of this in that case?

Yeah, removed.

 LOG:  restore_command = 'cp /home/hlinnaka/pgsql.cvshead/walarchive/%f %p'
 LOG:  standby_mode = 'true'
 LOG:  primary_conninfo = 'host=localhost port=5432 user=rep_user 
 password=reppass'
 LOG:  trigger_file = '/tmp/standby-trigger'
 Do we really need to echo all the lines in recovery.conf? That might be
 interesting information, but perhaps it could be condensed and worded
 more nicely.
 
 It's OK for me to move them from LOG to DEBUG.

Done.

 cp: cannot stat 
 `/home/hlinnaka/pgsql.cvshead/walarchive/00010007': No such 
 file or directory
 This is the noise line Josh started this thread with.
 
 Agreed. But the messages other than ENOENT that restore_command emits
 might be useful.
 
 LOG:  automatic recovery in progress
 Ok, so what?
 
 Seems unnecessary.

I replaced this with a more informative message that says either:

entering standby mode, or
starting point-in-time recovery to XID %u, or
starting archive recovery

 LOG:  initializing recovery connections
 Seems like unnecessary noise, doesn't mean anything to a DBA.
 
 Agreed.

Demoted to DEBUG1.

 LOG:  redo starts at 0/700140C
 I guess this could be useful debug information sometimes.
 
 Agreed.

I left this as it was.

 LOG:  consistent recovery state reached at 0/700142C
 It's nice to know that it has reached consistency, but was there any way
 to know before this line that it hadn't been reached yet? Perhaps the
 redo starts line should mention that consistency hasn't been reached yet.
 
 But redo might restart from the consistent database if the standby
 server was shut down after it reached the consistent status.

Yeah, it would be nice to say whether the database is already consistent
or not, but I've left this alone for now.

 Now, should we print a line when we connect to the master successfully?
 Seems like useful information.
 
 Agreed.
 
 Then there's the LOG lines whenever a file is restored successfully from
 archive; are the necessary anymore, now that you can connect to the
 standby and use pg_last_xlog_replay_location() to poll its status?
 
 How about moving those messages from LOG to DEBUG?

Didn't touch these yet.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Josh Berkus

 Yeah, this is not a bug.
 
 At first, the standby performs an archive recovery until an invalid
 WAL record is found. Then it starts replication and tries to receive
 the missing WAL records from the primary. So such an error message
 would be logged whenever an invalid record is found and replication
 is started.

Can we improve the error message?  Right now it's alarming people.  Such as:

cannot stat
`/var/data1/pg_stuff/dump/replication_archive/00010002':
End of Log

Of course, it would be even better to supress this error message
entirely unless the user is at DEBUG1 or better.

--Josh Berkus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Heikki Linnakangas
Josh Berkus wrote:
 Can we improve the error message?  Right now it's alarming people.  Such as:
 
 cannot stat
 `/var/data1/pg_stuff/dump/replication_archive/00010002':
 End of Log

Not really, it's coming from 'cp'. Not sure if we could capture the
stderr and somehow decorate the message to make it less alarming.

This isn't really a new problem, PITR has always done that, but I agree
it's always been a bit clumsy to have those messages in the log.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Magnus Hagander
2010/2/4 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Josh Berkus wrote:
 Can we improve the error message?  Right now it's alarming people.  Such as:

 cannot stat
 `/var/data1/pg_stuff/dump/replication_archive/00010002':
 End of Log

 Not really, it's coming from 'cp'. Not sure if we could capture the
 stderr and somehow decorate the message to make it less alarming.

 This isn't really a new problem, PITR has always done that, but I agree
 it's always been a bit clumsy to have those messages in the log.

Can we follow it up with a well-phrased message that we switched to
streaming mode right after it, so people realize it's not a problem?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Erik Rijkers
On Thu, February 4, 2010 19:29, Heikki Linnakangas wrote:
 Josh Berkus wrote:
 Can we improve the error message?  Right now it's alarming people.  Such as:

 cannot stat
 `/var/data1/pg_stuff/dump/replication_archive/00010002':
 End of Log

 Not really, it's coming from 'cp'. Not sure if we could capture the
 stderr and somehow decorate the message to make it less alarming.

 This isn't really a new problem, PITR has always done that, but I agree
 it's always been a bit clumsy to have those messages in the log.


It was (eventually) clear to me that it was not really a problem,
but only because I was testing anyway :)

I think most people would do what I did: search the documentation
for the string 'No such file or directory' or 'cannot stat'.
(But that produces no mention of this (non-)problem.)

ISTM mentioning in the documentation is good enough.



Erik Rijkers














-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Heikki Linnakangas
Magnus Hagander wrote:
 2010/2/4 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Josh Berkus wrote:
 Can we improve the error message?  Right now it's alarming people.  Such as:

 cannot stat
 `/var/data1/pg_stuff/dump/replication_archive/00010002':
 End of Log
 Not really, it's coming from 'cp'. Not sure if we could capture the
 stderr and somehow decorate the message to make it less alarming.

 This isn't really a new problem, PITR has always done that, but I agree
 it's always been a bit clumsy to have those messages in the log.
 
 Can we follow it up with a well-phrased message that we switched to
 streaming mode right after it, so people realize it's not a problem?

Yeah, that would be a good idea. We should give the messages we emit
some holistic thinking. There's quite a few new messages printed thanks
to both Hot Standby and Streaming replication, and it seems we still
need a few more. Which ones are really required, and which ones are just
noise?

Here's what I see in a test standby server:

 LOG:  database system was interrupted while in recovery at log time 
 2010-02-04 20:45:40 EET
 HINT:  If this has occurred more than once some data might be corrupted and 
 you might need to choose an earlier recovery target.

Can we avoid printing this? It's a bit alarming to talk about corrupted
data.

 LOG:  starting archive recovery
 LOG:  restore_command = 'cp /home/hlinnaka/pgsql.cvshead/walarchive/%f %p'
 LOG:  standby_mode = 'true'
 LOG:  primary_conninfo = 'host=localhost port=5432 user=rep_user 
 password=reppass'
 LOG:  trigger_file = '/tmp/standby-trigger'

Do we really need to echo all the lines in recovery.conf? That might be
interesting information, but perhaps it could be condensed and worded
more nicely.

 cp: cannot stat 
 `/home/hlinnaka/pgsql.cvshead/walarchive/00010007': No such 
 file or directory

This is the noise line Josh started this thread with.

 LOG:  automatic recovery in progress

Ok, so what?

 LOG:  initializing recovery connections

Seems like unnecessary noise, doesn't mean anything to a DBA.

 LOG:  redo starts at 0/700140C

I guess this could be useful debug information sometimes.

 LOG:  consistent recovery state reached at 0/700142C

It's nice to know that it has reached consistency, but was there any way
to know before this line that it hadn't been reached yet? Perhaps the
redo starts line should mention that consistency hasn't been reached yet.

 LOG:  database system is ready to accept read only connections

This is an important piece of information with Hot Standby.

Now, should we print a line when we connect to the master successfully?
Seems like useful information.

Then there's the LOG lines whenever a file is restored successfully from
archive; are the necessary anymore, now that you can connect to the
standby and use pg_last_xlog_replay_location() to poll its status?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-04 Thread Fujii Masao
On Fri, Feb 5, 2010 at 3:58 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 LOG:  database system was interrupted while in recovery at log time 
 2010-02-04 20:45:40 EET
 HINT:  If this has occurred more than once some data might be corrupted and 
 you might need to choose an earlier recovery target.

 Can we avoid printing this? It's a bit alarming to talk about corrupted
 data.

I think this is useful information as a hint of data corruption. But it's
odd for this to be reported even when restarting the standby server that
has been previously shut down normally (i.e., in fact shutdown mode).

How about adding new system status like DB_SHUTDOWNED_IN_RECOVERY, setting
the status to it when the shutdown is performed during recovery, and
reporting the suitable message when starting up the server from it?

 LOG:  starting archive recovery

This is reported even if restore_command is not given and the WAL files are
never restored from the archive. We should get rid of this in that case?

 LOG:  restore_command = 'cp /home/hlinnaka/pgsql.cvshead/walarchive/%f %p'
 LOG:  standby_mode = 'true'
 LOG:  primary_conninfo = 'host=localhost port=5432 user=rep_user 
 password=reppass'
 LOG:  trigger_file = '/tmp/standby-trigger'

 Do we really need to echo all the lines in recovery.conf? That might be
 interesting information, but perhaps it could be condensed and worded
 more nicely.

It's OK for me to move them from LOG to DEBUG.

 cp: cannot stat 
 `/home/hlinnaka/pgsql.cvshead/walarchive/00010007': No such 
 file or directory

 This is the noise line Josh started this thread with.

Agreed. But the messages other than ENOENT that restore_command emits
might be useful.

 LOG:  automatic recovery in progress

 Ok, so what?

Seems unnecessary.

 LOG:  initializing recovery connections

 Seems like unnecessary noise, doesn't mean anything to a DBA.

Agreed.

 LOG:  redo starts at 0/700140C

 I guess this could be useful debug information sometimes.

Agreed.

 LOG:  consistent recovery state reached at 0/700142C

 It's nice to know that it has reached consistency, but was there any way
 to know before this line that it hadn't been reached yet? Perhaps the
 redo starts line should mention that consistency hasn't been reached yet.

But redo might restart from the consistent database if the standby
server was shut down after it reached the consistent status.

 LOG:  database system is ready to accept read only connections

 This is an important piece of information with Hot Standby.

Agreed.

 Now, should we print a line when we connect to the master successfully?
 Seems like useful information.

Agreed.

 Then there's the LOG lines whenever a file is restored successfully from
 archive; are the necessary anymore, now that you can connect to the
 standby and use pg_last_xlog_replay_location() to poll its status?

How about moving those messages from LOG to DEBUG?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] testing cvs HEAD - HS/SR - cannot stat

2010-02-03 Thread Fujii Masao
On Thu, Feb 4, 2010 at 6:39 AM, Erik Rijkers e...@xs4all.nl wrote:
 However, whenever (re-)starting the slave the I get
 messages like:

 cp: cannot stat 
 `/var/data1/pg_stuff/dump/replication_archive/00010002': No 
 such
 file or directory

 At this point,  /var/data1/pg_stuff/dump/replication_archive:

 -rw--- 1        240 Feb  3 22:35 
 00010001.0020.backup
 -rw--- 1   16777216 Feb  3 22:35 00010001
 -rw--- 1   16777216 Feb  3 22:35 0001


 Maybe the message is not really a bug: everything works fine otherwise.

Yeah, this is not a bug.

At first, the standby performs an archive recovery until an invalid
WAL record is found. Then it starts replication and tries to receive
the missing WAL records from the primary. So such an error message
would be logged whenever an invalid record is found and replication
is started.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers