Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-06 Thread Simon Riggs
On Fri, 2007-01-05 at 22:57 -0500, Tom Lane wrote:
 Jim Nasby [EMAIL PROTECTED] writes:
  On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
  Ok, so when you need CRC's on a replicate (but not on the master) you
 
  Which sounds to me like a good reason to allow the option in  
  recovery.conf as well...
 
 Actually, I'm not seeing the use-case for a slave having a different
 setting from the master at all?
 
   My backup server is less reliable than the primary.
 
   My backup server is more reliable than the primary.
 
 Somehow, neither of these statements seem likely to be uttered by
 a sane DBA ...

If I take a backup of a server and bring it up on a new system, the
blocks in the backup will not have been CRC checked before they go to
disk.

If I take the same server and now stream log records across to it, why
*must* that data be CRC checked, when the original data has not been?

I'm proposing choice, with a safe default. That's all.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-06 Thread Bruce Momjian
Simon Riggs wrote:
  Somehow, neither of these statements seem likely to be uttered by
  a sane DBA ...
 
 If I take a backup of a server and bring it up on a new system, the
 blocks in the backup will not have been CRC checked before they go to
 disk.
 
 If I take the same server and now stream log records across to it, why
 *must* that data be CRC checked, when the original data has not been?
 
 I'm proposing choice, with a safe default. That's all.

Are there performance numbers to justify the option?  We don't give
people options unless there is real value to it.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Zeugswetter Andreas ADI SD

   Recovery can occur with/without same setting of wal_checksum, to
   avoid complications from crashes immediately after turning GUC on.
   
   Surely not.  Otherwise even the "on" setting is not really a
   defense.
  
   Only when the CRC is exactly zero, which happens very very rarely.
  
  "It works most of the time" doesn't exactly satisfy me.  What's the

Agreed

  use-case for changing the variable on the fly anyway?  Seems a
  better solution is just to lock down the setting at postmaster start.

I guess that the use case is more for a WAL based replicate that
has/wants a different setting. Maybe we want a WAL entry for the change,
or force a log switch (so you can interrupt the replicate, change its
setting and proceed with the next log)?

Maybe a 3rd mode for replicates that ignores 0 CRC's?

Andreas



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Simon Riggs
On Fri, 2007-01-05 at 11:01 +0100, Zeugswetter Andreas ADI SD wrote:

   What's the use-case for changing the variable on the fly anyway?  Seems a
   better solution is just to lock down the setting at postmaster start.
 
 I guess that the use case is more for a WAL based replicate that
 has/wants a different setting. Maybe we want a WAL entry for the change,
 or force a log switch (so you can interrupt the replicate, change its
 setting and proceed with the next log)?
 
 Maybe a 3rd mode for replicates that ignores 0 CRC's ?

Well, wal_checksum allows you to have this turned ON for the main server
and OFF on a Warm Standby. 

The recovery process doesn't check for postgresql.conf reloads, so
setting it at server start is effectively the same thing in that case.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Zeugswetter Andreas ADI SD

   What's the use-case for changing the variable on the fly anyway?
   Seems a better solution is just to lock down the setting at
   postmaster start.
  
  I guess that the use case is more for a WAL based replicate that
  has/wants a different setting. Maybe we want a WAL entry for the
  change, or force a log switch (so you can interrupt the replicate,
  change its setting and proceed with the next log)?
  
  Maybe a 3rd mode for replicates that ignores 0 CRC's?
 
 Well, wal_checksum allows you to have this turned ON for the main
 server and OFF on a Warm Standby.

Ok, so when you need CRC's on a replicate (but not on the master) you
turn it off during standby replay, but turn it on when you start the
replicate for normal operation.

Andreas



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Zeugswetter Andreas ADI SD

  Ok, so when you need CRC's on a replicate (but not on the master)
  you turn it off during standby replay, but turn it on when you start
  the replicate for normal operation.
 
 Thought: even when it's off, the CRC had better be computed for
 shutdown-checkpoint records.  Else there's no way to turn it on even
 with a postmaster restart --- unless we accept the idea of poking a
 hole in the normal mode.  (Which I still dislike, and even more so if
 the special value is zero.  Almost any other value would be safer than
 zero.)
 
 On the whole, though, I still don't want to put this in.  I don't
 think Simon has thought it through sufficiently,

Well, the part that we do not really want a special value (at least not
0) is new, and makes things a bit more complicated.

 and we haven't even seen any demonstration of a big speedup.

Yes, iirc the demonstration was with the 64-bit CRC instead of the
sufficient 32-bit one (or a bad CRC compiler optimization?).
But I do think it can be shown to provide significant speedup
(at least peak burst performance).

Especially on target hardware, WAL write IO is extremely fast
(since it is write cached), so the CPU cost should show.

Andreas



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Jim Nasby

On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:

Ok, so when you need CRC's on a replicate (but not on the master) you
turn it off during standby replay, but turn it on when you start the
replicate for normal operation.


Which sounds to me like a good reason to allow the option in  
recovery.conf as well...

--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 On Jan 5, 2007, at 6:30 AM, Zeugswetter Andreas ADI SD wrote:
 Ok, so when you need CRC's on a replicate (but not on the master) you

 Which sounds to me like a good reason to allow the option in  
 recovery.conf as well...

Actually, I'm not seeing the use-case for a slave having a different
setting from the master at all?

My backup server is less reliable than the primary.

My backup server is more reliable than the primary.

Somehow, neither of these statements seem likely to be uttered by
a sane DBA ...

regards, tom lane



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-05 Thread Joshua D. Drake

 Actually, I'm not seeing the use-case for a slave having a different
 setting from the master at all?
 
   My backup server is less reliable than the primary.
 
   My backup server is more reliable than the primary.
 
 Somehow, neither of these statements seem likely to be uttered by
 a sane DBA ...

My backup server is actually my dev machine.
My backup server is just a reporting machine.
My backup machine is using SATA just because it is just an absolute
emergency machine.
My backup machine is also my web server.

Real world dictates differently. Let's not forget that not every company
can spend 100k on two identical machines, yet many companies can spend
50k + 5k for a backup machine based on SATA or secondary services.

Sincerely,

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Simon Riggs
On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Thu, 2007-01-04 at 10:00 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
  Recovery can occur with/without same setting of wal_checksum, to avoid
  complications from crashes immediately after turning GUC on.
  
  Surely not.  Otherwise even the "on" setting is not really a defense.
 
  Only when the CRC is exactly zero, which happens very very rarely.
 
 "It works most of the time" doesn't exactly satisfy me.  What's the
 use-case for changing the variable on the fly anyway?  Seems a better
 solution is just to lock down the setting at postmaster start.

That would prevent us from using the secondary checkpoint location, in
the case of a crash affecting the primary checkpoint when it is a
shutdown checkpoint where we changed the setting of wal_checksum. It
seemed safer to allow a very rare error through to the next level of
error checking rather than to close the door so tight that recovery
would not be possible in a very rare case.

If you're good with server start, so am I.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Simon Riggs
On Thu, 2007-01-04 at 17:58 +0100, Florian Weimer wrote:
 * Simon Riggs:
 
  Surely not.  Otherwise even the "on" setting is not really a defense.
 
  Only when the CRC is exactly zero, which happens very very rarely.
 
 Have you tried switching to Adler32 instead of CRC32?

No. Please explain further.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Tom Lane
Florian Weimer [EMAIL PROTECTED] writes:
 Have you tried switching to Adler32 instead of CRC32?

Is anything known about the error detection capabilities of Adler32?
There's a lot of math behind CRCs but AFAIR Adler's method is pretty
much ad-hoc.

regards, tom lane



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Thu, 2007-01-04 at 11:09 -0500, Tom Lane wrote:
 "It works most of the time" doesn't exactly satisfy me.

 It seemed safer to allow a very rare error through to the next level of
 error checking rather than to close the door so tight that recovery
 would not be possible in a very rare case.

If a DBA is turning checksums off at all, he's already bought into the
assumption that he's prepared to recover from backups.  What you don't
seem to get here is that this feature is pretty darn questionable in
the first place, and for it to have a side effect of poking a hole in
the system's reliability even when it's off is more than enough to get
it rejected outright.  It's just a No Sale.

I don't believe that the hole is real small, either, as
overwrite-with-zeroes is not exactly an unheard-of failure mode for
filesystems.
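
To make the objection concrete, here is a minimal sketch of the hole being
described. It assumes a toy record layout (a 4-byte CRC followed by the
payload) and a reserved CRC value of zero meaning "not computed"; neither the
layout nor the function names are PostgreSQL's, they are made up for
illustration. The point is only that a verifier which skips records carrying
the reserved value will also wave through a record that a filesystem has
overwritten with zeroes.

```python
import struct
import zlib

# Hypothetical sentinel: a stored CRC of zero means "writer skipped the CRC".
CRC_NOT_COMPUTED = 0


def make_record(payload: bytes, checksum: bool = True) -> bytes:
    """Build a toy record: 4-byte little-endian CRC followed by the payload."""
    crc = zlib.crc32(payload) if checksum else CRC_NOT_COMPUTED
    return struct.pack("<I", crc) + payload


def verify_strict(record: bytes) -> bool:
    """Accept a record only if the stored CRC matches the payload."""
    (stored,) = struct.unpack_from("<I", record)
    return stored == zlib.crc32(record[4:])


def verify_lenient(record: bytes) -> bool:
    """Like verify_strict, but treat a stored CRC of zero as 'not computed'."""
    (stored,) = struct.unpack_from("<I", record)
    if stored == CRC_NOT_COMPUTED:
        return True  # this is the hole: nothing is checked at all
    return stored == zlib.crc32(record[4:])


if __name__ == "__main__":
    good = make_record(b"INSERT tuple 42")
    zeroed = bytes(len(good))  # the same bytes after an overwrite-with-zeroes
    print("good,   strict :", verify_strict(good))
    print("zeroed, strict :", verify_strict(zeroed))
    print("zeroed, lenient:", verify_lenient(zeroed))
```

Any other reserved value would still weaken the check, but zero is the one
value that a common failure mode produces for free.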

regards, tom lane



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Florian Weimer
* Tom Lane:

 Florian Weimer [EMAIL PROTECTED] writes:
 Have you tried switching to Adler32 instead of CRC32?

 Is anything known about the error detection capabilities of Adler32?
 There's a lot of math behind CRCs but AFAIR Adler's method is pretty
 much ad-hoc.

Correct me if I'm wrong, but the main reason for the WAL CRC is to
detect partial WAL writes (due to improper caching, for instance).
This means that you're out of the realm of traditional CRC analysis
anyway, because the things you are guarding against are neither burst
errors nor n-bit errors (for small n).



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Tom Lane
Florian Weimer [EMAIL PROTECTED] writes:
 * Tom Lane:
 There's a lot of math behind CRCs but AFAIR Adler's method is pretty
 much ad-hoc.

 Correct me if I'm wrong, but the main reason for the WAL CRC is to
 detect partial WAL writes (due to improper caching, for instance).

Well, that's *a* reason, but not the only one, and IMHO not one that
gives any particular guidance on what kind of checksum to use.

 This means that you're out of the realm of traditional CRC analysis
 anyway, because the things you are guarding against are neither burst
 errors nor n-bit errors (for small n).

I think short burst errors are fairly likely: the kind of scenario I'm
worried about is a wild store corrupting a word of a WAL entry while
it's waiting around to be written in the WAL buffers.  So the CRC math
does give me some comfort that that'll be detected.
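
The guarantee behind that comfort is that a 32-bit CRC detects every burst
error no longer than 32 bits, so a single corrupted word cannot slip through.
Adler-32 offers no such guarantee: its two running sums can cancel. The sketch
below uses Python's zlib module and a made-up buffer standing in for a WAL
entry (the contents and the corrupt() helper are illustrative assumptions, not
anything from the PostgreSQL source). Adding +1, -2, +1 to three adjacent
bytes leaves both Adler-32 sums unchanged, while the CRC-32 changes.

```python
import zlib


def corrupt(buf: bytes, offset: int, deltas: tuple) -> bytes:
    """Return a copy of buf with deltas added to consecutive bytes at offset."""
    out = bytearray(buf)
    for i, d in enumerate(deltas):
        out[offset + i] = (out[offset + i] + d) % 256
    return bytes(out)


if __name__ == "__main__":
    # Made-up stand-in for a WAL entry sitting in the WAL buffers.
    record = b"WAL record: " + bytes([10, 20, 30, 40]) + b" :end"

    # A "wild store" that damages three adjacent bytes inside one word.
    # The deltas sum to zero and their weighted sum is zero as well, which
    # is exactly what both Adler-32 accumulators measure.
    damaged = corrupt(record, 12, (+1, -2, +1))

    print("buffers differ :", damaged != record)
    print("Adler-32 equal :", zlib.adler32(damaged) == zlib.adler32(record))
    print("CRC-32 equal   :", zlib.crc32(damaged) == zlib.crc32(record))
```

This says nothing about speed, and it does not cover the partial-write case
raised earlier; it only shows why the burst-error argument favours a CRC.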

regards, tom lane



Re: [HACKERS] [PATCHES] wal_checksum = on (default) | off

2007-01-04 Thread Tom Lane
Florian Weimer [EMAIL PROTECTED] writes:
 Ah, does this mean that each WAL entry gets its own checksum?

Right.

 (I had assumed that PostgreSQL's WAL checksumming was justified by the
 partial write issue.  The wild store could easily occur with a heap
 page, too, and AFAIK, tuples aren't checksummed.  Which would be an
 interesting option, I guess.)

We've discussed it but there's never been a pressing reason to do it.

regards, tom lane
