> On Feb 19, 2017, at 8:14 PM, Jim Nasby <jim.na...@bluetreble.com> wrote:
> On 2/19/17 11:02 AM, David Christensen wrote:
>> My design notes for the patch were submitted to the list with little 
>> comment; see: 
>> https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>> I have since added the WAL logging of checksum states, however I’d be glad 
>> to take feedback on the other proposed approaches (particularly the system 
>> catalog changes + the concept of a checksum cycle).
> A couple notes:
> - AFAIK unlogged tables get checksummed today if checksums are enabled; the 
> same should hold true if someone enables checksums on the whole cluster.

Agreed; AFAIK this should already work for anything going through the 
PageIsVerified() API, since I effectively just modified the logic there to 
depend on the checksum state.

> - Shared relations should be handled as well; you don't mention them.

I agree that they should be handled as well; I thought I had mentioned it later 
in the design doc, though TBH I’m not sure there is more involved than just 
visiting the global relations in pg_class.  In addition, we need to visit all 
forks of each relation.  I’m not certain whether loading those into 
shared_buffers would be sufficient; I think so, but I’d be glad to be informed 
otherwise.  (As long as they’re already using the PageIsVerified() API call 
they get this logic for free.)

> - If an entire cluster is going to be considered as checksummed, then even 
> databases that don't allow connections would need to get enabled.

Yeah, the workaround for now would be to require “datallowconn” to be set to 
‘t’ for all databases before proceeding, unless there’s a way to connect to 
those databases internally that bypasses that check.  Open to ideas, though for 
a first pass the “datallowconn” approach seems like the least amount of work.

> I like the idea of revalidation, but I'd suggest leaving that off of the 
> first pass.

Yeah, agreed.

> It might be easier on a first pass to look at supporting per-database 
> checksums (in this case, essentially treating shared catalogs as their own 
> database). All normal backends do per-database stuff (such as setting 
> current_database) during startup anyway. That doesn't really help for things 
> like recovery and replication though. :/ And there's still the question of 
> SLRUs (or are those not checksum'd today??).

So you’re suggesting that the data_checksums GUC be tracked in a per-database 
context, so that once checksums are fully enabled in a specific database, that 
database is treated as in the enforcing state even if the rest of the cluster 
hasn’t completed?  Hmm, I might think on that a bit, but it seems pretty 
straightforward.

What issues do you see affecting replication and recovery specifically (other 
than the entire cluster not being complete)?  Since the checksum state changes 
are WAL-logged, it seems you’d be no worse for the wear on a standby if the 
state changed mid-cycle.

> BTW, it occurs to me that this is related to the problem we have with trying 
> to make changes that break page binary compatibility. If we had a method for 
> handling that it would probably be useful for enabling checksums as well. 
> You'd essentially treat an un-checksum'd page as if it was an "old page 
> version". The biggest problem there is dealing with the potential that the 
> new page needs to be larger than the old one was, but maybe there's some 
> useful progress to be had in this area before tackling the "page too small" 
> problem.

I agree it’s very similar; my issue is that I don’t want to postpone handling a 
specific case in the hope of some future infrastructure.  That said, the piece 
I could see being generalized is the part that walks all pages in the cluster: 
as long as the checksum/format validation happens at page read time, we could 
do page version checks/conversions at the same time, and the only special code 
we’d need is to keep track of which files still need to be visited and to 
minimize the impact on the cluster via some sort of throttling/leveling.  (It’s 
also similar to vacuum in that way; however, we have been going out of our way 
to make vacuum smart enough to *avoid* visiting every page, so I think it is a 
different enough use case that we can’t tie the two systems together, nor do I 
feel like taking that project on.)

We could call the checksum_cycle something else (page_walk_cycle?  bikeshed 
time!) and basically have the infrastructure to initiate online/gradual 
conversion/processing of all pages for free.


David Christensen
End Point Corporation

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)