Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Thomas Munro
On Thu, Aug 2, 2018 at 1:20 PM, Alvaro Herrera  wrote:
> On 2018-Aug-02, Thomas Munro wrote:
>> PostgreSQL only requires atomic writes of 512 bytes (see
>> PG_CONTROL_MAX_SAFE_SIZE), the traditional sector size for disks made
>> approximately 1980-2010, though as far as I know spinning disks made
>> this decade use 4KB sectors, and for SSDs there is more variation.  I
>> suppose the theory for torn SLRU page safety today is that the
>> existing SLRU users all have fully independent values that don't cross
>> sector boundaries, so torn writes can't corrupt them.
>
> Hmm, I wonder if this is true for multixact/members.  I think it's not
> true for either 4kB sectors nor for 512 byte sectors.

Hmm, right, the set of members can span sectors.  Let me try that
again.  You can cross sector boundaries, but only if you don't require
any kind of multi-sector consistency during replay.

I think the important property for correct operation without FPWs is
that you can't read data from the page itself in order to redo writes
to the page.  That rules out whole-page checksum verification, and
probably requires "physical" addressing.  By physical addressing I
mean for example that the WAL record that writes member data must know
exactly where to put it on the page without, for example, consulting
the page header or item pointers to data that can move data around
("logical" intra-page addressing).  We make the page consistent
incrementally, because each WAL record that writes new members into a
page is concerned with a specific physical part of the page identified
by offset and doesn't care about the rest, and no one should ever try
to read any part of it that hasn't already been made consistent.  This
seems OK.

Another way to say it is that FPWs are physical logging of whole pages
(they say how to set every single bit), and WAL for multixacts is a
bit like physical logging of smaller regions of the page.  Physical
logging doesn't suffer from torn pages, as long as readers are also
looking stuff up by physical addresses and never trying to read areas
of the page that haven't been written to yet.  If you want page-level
checksums, though, the incremental approach won't work.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Andres Freund
On 2018-08-01 21:20:22 -0400, Alvaro Herrera wrote:
> On 2018-Aug-02, Thomas Munro wrote:
> 
> > PostgreSQL only requires atomic writes of 512 bytes (see
> > PG_CONTROL_MAX_SAFE_SIZE), the traditional sector size for disks made
> > approximately 1980-2010, though as far as I know spinning disks made
> > this decade use 4KB sectors, and for SSDs there is more variation.  I
> > suppose the theory for torn SLRU page safety today is that the
> > existing SLRU users all have fully independent values that don't cross
> > sector boundaries, so torn writes can't corrupt them.
> 
> Hmm, I wonder if this is true for multixact/members.  I think it's not
> true for either 4kB sectors nor for 512 byte sectors.

Hm, why not? Individual entries are 4bytes large and aligned, no? And as
we're solely appending (logicly if not physicly), that should be ok?

Greetings,

Andres Freund



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Thomas Munro
On Wed, Aug 1, 2018 at 11:06 PM, Andrey Borodin  wrote:
>> 1 авг. 2018 г., в 13:49, Thomas Munro  
>> написал(а):
>> Hmm.  This proposal doesn't seem to deal with torn writes.
>
> That's true, but it's a bit orthogonal to problem solved with checksums.
> Checksums provide way to avoid reading bad page, torn pages - is about 
> preventing writing bad writes.

It's a problem if you look at it like this:  Without your patch, my
database can recover after power loss.  With your patch, a torn SLRU
page can cause recovery to fail.  Then my only option is to set
ignore_checksum_failure=on so that my cluster can start up.  Without
significant effort I can't tell if the checksum verification failed
because data was arbitrarily corrupted (the reason for this feature to
exist), or because of a torn page (*expected behaviour* on
interruption of a storage system with atomic write size < BLCKSZ).
This may also apply also to online filesystem-level backups.

PostgreSQL only requires atomic writes of 512 bytes (see
PG_CONTROL_MAX_SAFE_SIZE), the traditional sector size for disks made
approximately 1980-2010, though as far as I know spinning disks made
this decade use 4KB sectors, and for SSDs there is more variation.  I
suppose the theory for torn SLRU page safety today is that the
existing SLRU users all have fully independent values that don't cross
sector boundaries, so torn writes can't corrupt them.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Thomas Munro
On Thu, Aug 2, 2018 at 6:17 AM, Andres Freund  wrote:
> On 2018-08-01 11:14:18 -0700, Shawn Debnath wrote:
>> On Wed, Aug 01, 2018 at 11:00:46AM -0700, Andres Freund wrote:
>> > I believe you also planned to work on that, do I remember that
>> > correctly, or is that just wishful thinking?
>>
>> Yep - I am actively working on this at the moment, planning on having a
>> proposal out to this list in a week or two.
>
> Cool!

+1

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Andres Freund
On 2018-08-01 11:14:18 -0700, Shawn Debnath wrote:
> On Wed, Aug 01, 2018 at 11:00:46AM -0700, Andres Freund wrote:
> > I believe you also planned to work on that, do I remember that
> > correctly, or is that just wishful thinking?
> 
> Yep - I am actively working on this at the moment, planning on having a 
> proposal out to this list in a week or two.

Cool!



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 10:58:03 -0700, Shawn Debnath wrote:
> On Wed, Aug 01, 2018 at 02:06:44PM +0300, Andrey Borodin wrote:
> > But adding LSNs, and whole regular PageHeader is quite easy in this 
> > patch. Do you think we should really go that way?
> > 
> > Putting SLRUs into usual shared buffers and WAL-logging looks like a good 
> > idea, but a lot of work.
> 
> Back at PgCon in Ottawa this year, I pitched the idea to moving 
> components off of SLRU and on to the buffer cache.

I believe you also planned to work on that, do I remember that
correctly, or is that just wishful thinking?


> That idea would 
> involve adding regular page headers to SLRU blocks, and transitioning 
> various components like multixact, commit timestamp, clog, subtrans and 
> others, to being first class residents in the buffer cache. The goal is 
> to provide performance benefits along with resiliency. 
> 
> I believe Heikki also mentioned wanting support for stamping LSNs to 
> ensure durability in clog. I agree with the idea of incorporating the 
> PageHeader at the beginning of the SLRU page to maintain checksums and 
> we can iteratively add support for tracking LSNs if necessary.

I'm entirely in agreement with that goal, for the reasons stated above.

Greetings,

Andres Freund



Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Andrey Borodin
Hi!

> 1 авг. 2018 г., в 13:49, Thomas Munro  
> написал(а):
> 
> Hmm.  This proposal doesn't seem to deal with torn writes.
That's true, but it's a bit orthogonal to problem solved with checksums.
Checksums provide way to avoid reading bad page, torn pages - is about 
preventing writing bad writes.

But adding LSNs, and whole regular PageHeader is quite easy in this patch. Do 
you think we should really go that way?

Putting SLRUs into usual shared buffers and WAL-logging looks like a good idea, 
but a lot of work.

Best regards, Andrey Borodin.


Re: [Patch] Checksums for SLRU files

2018-08-01 Thread Thomas Munro
On Wed, Jul 18, 2018 at 5:54 PM, Thomas Munro
 wrote:
> On Wed, Jul 18, 2018 at 5:41 PM, Andrey Borodin  wrote:
>>> I think we'd want pg_upgrade tests showing an example of each SLRU
>>> growing past one segment, and then being upgraded, and then being
>>> accessed in various different pages and segment files, so that we can
>>> see that we're able to shift the data to the right place successfully.
>>> For example I think I'd want to see that a single aborted transaction
>>> surrounded by many committed transactions shows up in the right place
>>> after an upgrade.
>>
>> Can you elaborate a bit on how to implement this test. I've searched for 
>> some automated pg_upgrade tests but didn't found one.
>> Should it be one-time test script or something "make check-world"-able?

Hmm.  This proposal doesn't seem to deal with torn writes.  If someone
modifies an 8KB SLRU page and it is partially written out (say,
because your disk has 4KB sectors, and the power cuts out after only
one sector is modified), then during recovery we'll try to read that
block back in and the checksum will be wrong.

The way PostgreSQL usually deals with this problem (and higher level
problems caused by torn writes) is by putting a full page image into
the WAL the first time each page is modified after each checkpoint.
(There are other approaches used by other databases, such as MySQL's
write-two-copies strategy with a barrier in between, since they can't
both be torn, and the problem goes away if your filesystem somehow
magically provides atomic 8KB blocks so you can turn full page writes
off.)

To reuse the existing machinery, in theory I think you'd call
XLogRegisterBuffer() in every place that modifies an SLRU page, and
PageSetLSN() after inserting the WAL.  The problem is that these pages
are not in the regular buffer pool and don't have an LSN in the
standard place, so that won't work.  I heard about a project to put
SLRUs into the regular buffer pool, but I don't know the status.
Without that I think you might need to invent equivalent machinery
that can register SLRU buffers with xloginsert.c.

To avoid writing full page images for every SLRU page, you'd probably
want to use something like REGBUF_WILL_INIT to skip FPW for pages
you're zero-initialising (eg in ZeroCLOGPage()).  I haven't studied
the synchronisation problems lurking there.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-07-17 Thread Thomas Munro
On Wed, Jul 18, 2018 at 5:41 PM, Andrey Borodin  wrote:
>> I think we'd want pg_upgrade tests showing an example of each SLRU
>> growing past one segment, and then being upgraded, and then being
>> accessed in various different pages and segment files, so that we can
>> see that we're able to shift the data to the right place successfully.
>> For example I think I'd want to see that a single aborted transaction
>> surrounded by many committed transactions shows up in the right place
>> after an upgrade.
>
> Can you elaborate a bit on how to implement this test. I've searched for some 
> automated pg_upgrade tests but didn't found one.
> Should it be one-time test script or something "make check-world"-able?

Hi Andrey,

Like this (also reached by check-world):

$ cd src/bin/pg_upgrade
$ make check

It looks like the interesting bits are in test.sh.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-07-17 Thread Andrey Borodin
Hi, Tomas!

> 
> I think we'd want pg_upgrade tests showing an example of each SLRU
> growing past one segment, and then being upgraded, and then being
> accessed in various different pages and segment files, so that we can
> see that we're able to shift the data to the right place successfully.
> For example I think I'd want to see that a single aborted transaction
> surrounded by many committed transactions shows up in the right place
> after an upgrade.

Can you elaborate a bit on how to implement this test. I've searched for some 
automated pg_upgrade tests but didn't found one.
Should it be one-time test script or something "make check-world"-able?

Best regards, Andrey Borodin.


Re: [Patch] Checksums for SLRU files

2018-05-16 Thread Andrey Borodin
Hi, Ivan!

> 19 марта 2018 г., в 15:32, Andrey Borodin  написал(а):
> I was looking into this patch mainly because I was reviewing other checksums 
> patch in different thread. But the purpose of this patch seems viable for me.
> After looking into the patch I'd like to propose some editorialization:

I propose adding this patch to commitfest, if you do not object. We will have 
to add support for online SLRU checksum computation for this pages in spite of 
"Online enabling of checksums" patch(chances are that it will be committed 
earlier), so there is some work to do anyway. But I think it is good to keep 
this patch on the radars of potential reviewers.

Best regards, Andrey Borodin.


Re: [Patch] Checksums for SLRU files

2018-03-19 Thread Andrey Borodin
Hi, Ivan!

> 5 марта 2018 г., в 20:58, Andrey Borodin  написал(а):
> 
> I've found that there are few more places with SLRU items per page

I was looking into this patch mainly because I was reviewing other checksums 
patch in different thread. But the purpose of this patch seems viable for me.
After looking into the patch I'd like to propose some editorialization:
0. Removed GUC: ignore_checksum_failure looks good enough
1. Removed copy on read from disk. Also I do not see a reason to copy page 
before write
2. multis are upgraded by WAL reset, CLOG is actually upgraded
3. Updated all places where SLRU block size was used


Best regards, Andrey Borodin.


0001-SLRU-checksums-patch.patch
Description: Binary data


Re: [Patch] Checksums for SLRU files

2018-03-05 Thread Andrey Borodin
Hi, Ivan!

I've found that there are few more places with SLRU items per page, where you 
have to update usable page size. Please find the diff attached.
I agree that there is a little chance to get this commitable quickly, but 
still, the feature worth working on, from my point of view.

Best regards, Andrey Borodin.


moar_slru.diff
Description: Binary data


Re: [Patch] Checksums for SLRU files

2018-03-02 Thread Andres Freund
On 2018-03-02 11:49:05 -0500, Robert Haas wrote:
> On Thu, Mar 1, 2018 at 8:25 PM, Andres Freund  wrote:
> > On 2018-02-02 11:37:34 +1300, Thomas Munro wrote:
> >> > 3. pg_upgrade isn't considered.  This patch should provide upgrading 
> >> > SLRUs
> >> > to adopt changed useful size of page.  That seems to be hardest patch of
> >> > this patch to be written.
> >>
> >> +1
> >>
> >> I think we'd want pg_upgrade tests showing an example of each SLRU
> >> growing past one segment, and then being upgraded, and then being
> >> accessed in various different pages and segment files, so that we can
> >> see that we're able to shift the data to the right place successfully.
> >> For example I think I'd want to see that a single aborted transaction
> >> surrounded by many committed transactions shows up in the right place
> >> after an upgrade.
> >
> > This patch is in the 2018-03 CF, but I don't see any progress since the
> > last comments. As it has been Waiting on author since the last CF, I
> > think we should mark this as returned with feedback.
> 
> +1.

Done.



Re: [Patch] Checksums for SLRU files

2018-03-02 Thread Robert Haas
On Thu, Mar 1, 2018 at 8:25 PM, Andres Freund  wrote:
> On 2018-02-02 11:37:34 +1300, Thomas Munro wrote:
>> > 3. pg_upgrade isn't considered.  This patch should provide upgrading SLRUs
>> > to adopt changed useful size of page.  That seems to be hardest patch of
>> > this patch to be written.
>>
>> +1
>>
>> I think we'd want pg_upgrade tests showing an example of each SLRU
>> growing past one segment, and then being upgraded, and then being
>> accessed in various different pages and segment files, so that we can
>> see that we're able to shift the data to the right place successfully.
>> For example I think I'd want to see that a single aborted transaction
>> surrounded by many committed transactions shows up in the right place
>> after an upgrade.
>
> This patch is in the 2018-03 CF, but I don't see any progress since the
> last comments. As it has been Waiting on author since the last CF, I
> think we should mark this as returned with feedback.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [Patch] Checksums for SLRU files

2018-03-01 Thread Andres Freund
On 2018-02-02 11:37:34 +1300, Thomas Munro wrote:
> > 3. pg_upgrade isn't considered.  This patch should provide upgrading SLRUs
> > to adopt changed useful size of page.  That seems to be hardest patch of
> > this patch to be written.
> 
> +1
> 
> I think we'd want pg_upgrade tests showing an example of each SLRU
> growing past one segment, and then being upgraded, and then being
> accessed in various different pages and segment files, so that we can
> see that we're able to shift the data to the right place successfully.
> For example I think I'd want to see that a single aborted transaction
> surrounded by many committed transactions shows up in the right place
> after an upgrade.

This patch is in the 2018-03 CF, but I don't see any progress since the
last comments. As it has been Waiting on author since the last CF, I
think we should mark this as returned with feedback.

Greetings,

Andres Freund



Re: [Patch] Checksums for SLRU files

2018-02-01 Thread Thomas Munro
On Wed, Jan 3, 2018 at 11:21 AM, Alexander Korotkov
 wrote:
> On Mon, Jan 1, 2018 at 9:19 PM, Andrey Borodin  wrote:
>> > 31 дек. 2017 г., в 22:30, Ivan Kartyshov 
>> > написал(а):
>> >
>> > Hello, I`d like to show my implementation of SLRU file protection with
>> > checksums.
>> > .
>> > I would like to hear your thoughts over my patch.
>>
>> As far as I can see, the patch solves problem of hardware corruption in
>> SLRU.
>> This seems a valid concern.

+1

It doesn't make sense to have a checksum feature that protects
relation files, but doesn't protect these super important meta-data
files that affect our interpretation of the relation files.

> [...]
> 3. pg_upgrade isn't considered.  This patch should provide upgrading SLRUs
> to adopt changed useful size of page.  That seems to be hardest patch of
> this patch to be written.

+1

I think we'd want pg_upgrade tests showing an example of each SLRU
growing past one segment, and then being upgraded, and then being
accessed in various different pages and segment files, so that we can
see that we're able to shift the data to the right place successfully.
For example I think I'd want to see that a single aborted transaction
surrounded by many committed transactions shows up in the right place
after an upgrade.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [Patch] Checksums for SLRU files

2018-01-02 Thread Alexander Korotkov
On Mon, Jan 1, 2018 at 9:19 PM, Andrey Borodin  wrote:

> > 31 дек. 2017 г., в 22:30, Ivan Kartyshov 
> написал(а):
> >
> > Hello, I`d like to show my implementation of SLRU file protection with
> checksums.
> > .
> > I would like to hear your thoughts over my patch.
>
> As far as I can see, the patch solves problem of hardware corruption in
> SLRU.
> This seems a valid concern. I've tried to understand your patch and few
> questions arose which I could not answer myself.
>
> 1. Why do you propose different GUC besides ignore_checksum_failure? What
> is scenario of it's use which is not covered by general GUC switch?
> 2. What is performance penalty of having this checksums?
>
> Besides this, some things seems suspicious to me:
> 1. This comment seems excessive. I'd leave just first one first line.
> +/*
> + * GUC variable
> + * Set from backend:
> + * alter system set ignore_slru_checksum_failure = on/off;
> + * select pg_reload_conf();
> + */
> 2. Functions pg_checksum_slru_page() and pg_getchecksum_slru_page()
> operate with two bytes instead of uint16. This seems strange.
> 3. This line
> checksum = (pg_checksum_block(page, BLCKSZ) % 65535) + 1;
> Need to share comment with previous function (pg_checksum_page()). +1 was
> a tough thing for me to understand before looking around and reading those
> comments.
> 4. I could not understand purpose of this expression
> page[BLCKSZ - 1] & 0X00FF
>

Andrey, thank you for review.
Ivan, thank you for submitting this patch.  I also have some notes on it
from the first glance.

1. It seems that you need to define some macro for (BLCKSZ - CHKSUMSZ) in
order to evade typing it multiple times.
2. You also didn't modify all the SLRU macros depending on useful size of
page.  You missed at least COMMIT_TS_XACTS_PER_PAGE,
MULTIXACT_MEMBERGROUPS_PER_PAGE and SUBTRANS_XACTS_PER_PAGE.
3. pg_upgrade isn't considered.  This patch should provide upgrading SLRUs
to adopt changed useful size of page.  That seems to be hardest patch of
this patch to be written.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [Patch] Checksums for SLRU files

2018-01-01 Thread Andrey Borodin
Hi, Ivan!
> 31 дек. 2017 г., в 22:30, Ivan Kartyshov  
> написал(а):
> 
> Hello, I`d like to show my implementation of SLRU file protection with 
> checksums.
> .
> I would like to hear your thoughts over my patch.

As far as I can see, the patch solves problem of hardware corruption in SLRU.
This seems a valid concern. I've tried to understand your patch and few 
questions arose which I could not answer myself.

1. Why do you propose different GUC besides ignore_checksum_failure? What is 
scenario of it's use which is not covered by general GUC switch?
2. What is performance penalty of having this checksums?

Besides this, some things seems suspicious to me:
1. This comment seems excessive. I'd leave just first one first line.
+/*
+ * GUC variable
+ * Set from backend:
+ * alter system set ignore_slru_checksum_failure = on/off;
+ * select pg_reload_conf();
+ */
2. Functions pg_checksum_slru_page() and pg_getchecksum_slru_page() operate 
with two bytes instead of uint16. This seems strange.
3. This line
checksum = (pg_checksum_block(page, BLCKSZ) % 65535) + 1;
Need to share comment with previous function (pg_checksum_page()). +1 was a 
tough thing for me to understand before looking around and reading those 
comments.
4. I could not understand purpose of this expression
page[BLCKSZ - 1] & 0X00FF

Happy New Year :) Keep up good work.

Best regards, Andrey Borodin.