* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund <and...@anarazel.de> writes:
> > Sure, it might be easy, but we don't have it.  Personally I think
> > checksums just aren't even ready for prime time. If we had:
> > - ability to switch on/off at runtime (early patches for that have IIRC
> >   been posted)
> > - *builtin* tooling to check checksums for everything
> > - *builtin* tooling to compute checksums after changing setting
> > - configurable background sweeps for checksums
> Yeah, and there's a bunch of usability tooling that we don't have,
> centered around "what do you do after you get a checksum error?".
> AFAIK there's no way to check or clear such an error; but without
> such tools, I'm afraid that checksums are as much of a foot-gun
> as a benefit.

Uh, ignore_checksum_failure and zero_damaged_pages ...?

Not that I'd suggest flipping those on for your production database the
first time you see a checksum failure, but we aren't completely without
a way to address such cases.

There's also the to-be-implemented ability to disable checksums for a
cluster.
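
For reference, a rough sketch of how the two existing settings might be used
in a superuser session to salvage what is still readable. Both are real
developer options, but the table names are hypothetical, and either setting
can let corrupt data through or discard it silently, so this belongs on a
copy of the cluster rather than on production:

```sql
-- Confirm whether the cluster was initialized with checksums (read-only).
SHOW data_checksums;

-- Report checksum failures as warnings instead of errors for this session,
-- so a page that fails verification can still be read.
SET ignore_checksum_failure = on;

-- Hypothetical table names: copy out whatever is still readable.
CREATE TABLE salvaged_copy AS SELECT * FROM damaged_table;

SET ignore_checksum_failure = off;

-- Last resort: a page with a damaged header is zeroed in memory when read,
-- which means the rows on that page are lost to this session.
SET zero_damaged_pages = on;
```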

> I think developing all this stuff is a good long-term activity,
> but I'm hesitant about turning checksums loose on the average
> user before we have it.

What I dislike about this stance is that it just means we're going to
have more and more systems out there that won't have checksums enabled,
and there's not going to be an easy way to fix that.

> To draw a weak analogy, our checksums right now are more or less
> where our replication was in 9.1 --- we had it, but there were still
> an awful lot of rough edges.  It was only just in this past release
> cycle that we started to change default settings towards the idea
> that they should support replication by default.  I think the checksum
> tooling likewise needs years of maturation before we can say that it's
> realistically ready to be the default.

Well, checksums were introduced in 9.3, which would mean that this is
really only being proposed a year earlier than the replication timeline
case, if I'm following correctly.  I do agree that checksums have not
seen quite as much love as the replication work, though I'm tempted to
argue that's because they aren't an "interesting" feature now that we've
got them, but even those uninteresting features really need someone to
champion them when they're important.  Unfortunately, our situation with
checksums does make me feel a bit like they were added to satisfy a
check-box requirement and then not really developed very much further.

Following your weak analogy though, it's not like users are going to
have to dump/restore their entire cluster to change their systems to
take advantage of the new replication capabilities.


