Re: [HACKERS] Checksums by default?

Robert Haas Wed, 25 Jan 2017 16:26:35 -0800

On Wed, Jan 25, 2017 at 6:30 PM, Stephen Frost <[email protected]> wrote:
> I hope to discuss it further after we have the ability to turn it off
> easily.


I think we should have the ability to flip it in BOTH directions easily.

>> Second, really hard to enable is a relative term.  I accept that
>> enabling checksums is not a pleasant process.  Right now, you'd have
>> to do a dump/restore, or use logical replication to replicate the data
>> to a new cluster and then switch over.  On the other hand, if
>> checksums are really a critical feature, how are people getting to the
>> point where they've got a mission-critical production system and only
>> then discovering that they want to enable checksums?
>
> I truly do wish everyone would come talk to me before building out a
> database.  Perhaps that's been your experience, in which case, I envy
> you, but I tend to get a reaction more along the lines of "wait, what do
> you mean I had to pass some option to initdb to enable checksum?!?!".
> The fact that we've got a WAL implementation and clearly understand
> fsync requirements, why full page writes make sense, and that our WAL
> has its own CRCs which aren't possible to disable, tends to lead people
> to think we really know what we're doing and that we care a lot about
> their data.

It sounds to me like you are misleading users about the positives and
negatives of checksums, which then causes them to be shocked that they
are not the default.

> As I have said, I don't believe it has to be on for everyone.

For the second time, I didn't say that.  But the default has a
powerful influence on behavior.  If it didn't, you wouldn't be trying
to get it changed.

> [ unsolicited bragging about an unspecified backup tool, presumably 
> pgbackrest ]

Great.

> Presently, last I checked at least, the database system doesn't fall
> over and die if a single page's checksum fails.

This is another thing that I never said.

> [ more unsolicited bragging about an unspecified backup tool, presumably
> still pgbackrest ]

Swell.

>> I'm not trying to downplay the usefulness of checksums *in a certain
>> context*.  It's a good feature, and I'm glad we have it.  But I think
>> you're somewhat inflating the utility of it while discounting the very
>> real costs.
>
> The costs for checksums don't bother me any more than the costs for WAL
> or WAL CRCs or full page writes.

Obviously.  But I think they should.  Frankly, I think the costs for
full page writes should bother the heck out of all of us, but the
solution isn't to shut them off any more than it is to enable
checksums despite the cost.  It's to find a way to reduce the costs.

> They may not be required on every
> system, but they're certainly required on more than 'zero' entirely
> reasonable systems which people deploy in their production environments.

Nobody said otherwise.

> I'd rather walk into an engagement where the user is saying "yeah, we
> enabled checksums and it caught this corruption issue" than having to
> break the bad news, which I've had to do over and over, that their
> existing system hasn't got checksums enabled.  This isn't hypothetical,
> it's what I run into regularly with entirely reasonable and skilled
> engineers who have been deploying PG.

Maybe you should just stop telling them and use the time thus freed up
to work on improving the checksum feature.

I'm skeptical of this whole discussion because you seem to be filled
with unalloyed confidence that checksums have little performance
impact and will do wonderful things to prevent data loss, whereas I
think they have significant performance impact and will only very
slightly help to prevent data loss.  I admit that the idea of having
pgbackrest verify checksums while backing up seems like it could
greatly improve the chances of checksums being useful, but I'm not
going to endorse changing PostgreSQL's default for pgbackrest's
benefit.  It's got to be to the benefit of PostgreSQL users broadly,
not just the subset of those people who use one particular backup
tool.  Also, the massive hit that will probably occur on
high-concurrency OLTP workloads larger than shared_buffers is going to
be hard to justify for any amount of backup security.  I think that
problem's got to be solved or at least mitigated before we think about
changing this.  I realize that not everyone would set the bar that
high, but I see far too many customers with exactly that workload to
dismiss it lightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


