Re: Online verification of checksums

2021-09-13 Thread Daniel Gustafsson
> On 2 Sep 2021, at 13:18, Daniel Gustafsson wrote: > >> On 9 Jul 2021, at 22:00, Ibrar Ahmed wrote: > >> I am changing the status to "Waiting on Author" based on the latest comments >> of @David Steele >> and secondly the patch does not apply cleanly. > > This patch hasn’t moved since

Re: Online verification of checksums

2021-09-02 Thread Daniel Gustafsson
> On 9 Jul 2021, at 22:00, Ibrar Ahmed wrote: > I am changing the status to "Waiting on Author" based on the latest comments > of @David Steele > and secondly the patch does not apply cleanly. This patch hasn’t moved since marked as WoA in the last CF and still doesn’t apply, unless there is

Re: Online verification of checksums

2021-07-09 Thread Ibrar Ahmed
On Tue, Mar 9, 2021 at 10:43 PM David Steele wrote: > On 11/30/20 6:38 PM, David Steele wrote: > > On 11/30/20 9:27 AM, Stephen Frost wrote: > >> * Michael Paquier (mich...@paquier.xyz) wrote: > >>> On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote: > * Magnus Hagander

Re: Online verification of checksums

2021-03-09 Thread David Steele
On 11/30/20 6:38 PM, David Steele wrote: On 11/30/20 9:27 AM, Stephen Frost wrote: * Michael Paquier (mich...@paquier.xyz) wrote: On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote: * Magnus Hagander (mag...@hagander.net) wrote: On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier

Re: Online verification of checksums

2020-11-30 Thread David Steele
On 11/30/20 9:27 AM, Stephen Frost wrote: Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote: * Magnus Hagander (mag...@hagander.net) wrote: On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier wrote: But here the checksum is

Re: Online verification of checksums

2020-11-30 Thread Stephen Frost
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote: > > * Magnus Hagander (mag...@hagander.net) wrote: > >> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier > >> wrote: > >>> But here the checksum is broken, so while the

Re: Online verification of checksums

2020-11-28 Thread Michael Paquier
On Fri, Nov 27, 2020 at 11:15:27AM -0500, Stephen Frost wrote: > * Magnus Hagander (mag...@hagander.net) wrote: >> On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier wrote: >>> But here the checksum is broken, so while the offset is something we >>> can rely on how do you make sure that the LSN is

Re: Online verification of checksums

2020-11-27 Thread Stephen Frost
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier wrote: > > On Tue, Nov 24, 2020 at 12:38:30PM -0500, David Steele wrote: > > > We are not just looking at one LSN value. Here are the steps we are > > > proposing (I'll skip checks for

Re: Online verification of checksums

2020-11-26 Thread Magnus Hagander
On Thu, Nov 26, 2020 at 8:42 AM Michael Paquier wrote: > > On Tue, Nov 24, 2020 at 12:38:30PM -0500, David Steele wrote: > > We are not just looking at one LSN value. Here are the steps we are > > proposing (I'll skip checks for zero pages here): > > > > 1) Test the page checksum. If it passes

Re: Online verification of checksums

2020-11-25 Thread Michael Paquier
On Tue, Nov 24, 2020 at 12:38:30PM -0500, David Steele wrote: > We are not just looking at one LSN value. Here are the steps we are > proposing (I'll skip checks for zero pages here): > > 1) Test the page checksum. If it passes the page is OK. > 2) If the checksum does not pass then record the

Re: Online verification of checksums

2020-11-24 Thread David Steele
Hi Michael, On 11/23/20 8:10 PM, Michael Paquier wrote: On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote: Also- what is the point of reading the page from shared buffers anyway..? All we need to do is prove that the page will be rewritten during WAL replay. If we can prove

Re: Online verification of checksums

2020-11-23 Thread Stephen Frost
Greetings, On Mon, Nov 23, 2020 at 20:28 Michael Paquier wrote: > On Mon, Nov 23, 2020 at 05:28:52PM -0500, Stephen Frost wrote: > > * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: > >> Yes and this is a tricky part. Until you have explained it in your > latest > >> message, I

Re: Online verification of checksums

2020-11-23 Thread Michael Paquier
On Mon, Nov 23, 2020 at 05:28:52PM -0500, Stephen Frost wrote: > * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: >> Yes and this is a tricky part. Until you have explained it in your latest >> message, I wasn't sure how we can distinguish a concurrent update from a page >> header

Re: Online verification of checksums

2020-11-23 Thread Michael Paquier
On Mon, Nov 23, 2020 at 10:35:54AM -0500, Stephen Frost wrote: > * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: >> It seems reasonable to me to rely on checksums only. >> >> As for retry, I think that API for concurrent I/O will be complicated. >> Instead, we can introduce a

Re: Online verification of checksums

2020-11-23 Thread Stephen Frost
Greetings, * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: > On 23.11.2020 18:35, Stephen Frost wrote: > >* Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: > >>On 21.11.2020 04:30, Michael Paquier wrote: > >>>The only method I can think as being really > >>>reliable is

Re: Online verification of checksums

2020-11-23 Thread Anastasia Lubennikova
On 23.11.2020 18:35, Stephen Frost wrote: Greetings, * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: On 21.11.2020 04:30, Michael Paquier wrote: The only method I can think as being really reliable is based on two facts: - Do a check only on pd_checksums, as that validates the

Re: Online verification of checksums

2020-11-23 Thread Stephen Frost
Greetings, * Anastasia Lubennikova (a.lubennik...@postgrespro.ru) wrote: > On 21.11.2020 04:30, Michael Paquier wrote: > >The only method I can think as being really > >reliable is based on two facts: > >- Do a check only on pd_checksums, as that validates the full contents > >of the page. > >-

Re: Online verification of checksums

2020-11-23 Thread Stephen Frost
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Fri, Nov 20, 2020 at 11:08:27AM -0500, Stephen Frost wrote: > > David Steele (da...@pgmasters.net) wrote: > >> Our current plan for pgBackRest: > >> > >> 1) Remove the LSN check as you have done in your patch and when rechecking > >>

Re: Online verification of checksums

2020-11-23 Thread Anastasia Lubennikova
On 21.11.2020 04:30, Michael Paquier wrote: The only method I can think as being really reliable is based on two facts: - Do a check only on pd_checksums, as that validates the full contents of the page. - When doing a retry, make sure that there is no concurrent I/O activity in the shared

Re: Online verification of checksums

2020-11-20 Thread Michael Paquier
On Fri, Nov 20, 2020 at 11:08:27AM -0500, Stephen Frost wrote: > David Steele (da...@pgmasters.net) wrote: >> Our current plan for pgBackRest: >> >> 1) Remove the LSN check as you have done in your patch and when rechecking >> see if the page has become valid *or* the LSN is ascending. >> 2)
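
A minimal Python sketch of the recheck rule quoted in item 1 above, under stated assumptions: `checksum_ok` is a placeholder callback for the real page checksum test, and `page_lsn` assumes pd_lsn occupies the first eight bytes of the page as two native-endian 32-bit halves (high, then low). The function names are illustrative and are not taken from pgBackRest or from the patch.

```python
import struct
from typing import Callable


def page_lsn(page: bytes) -> int:
    """Read the page LSN. Assumes pd_lsn is stored at offset 0 as two
    native-endian uint32 values: high half first, then low half."""
    hi, lo = struct.unpack_from("=II", page, 0)
    return (hi << 32) | lo


def recheck_passes(first_read: bytes,
                   second_read: bytes,
                   checksum_ok: Callable[[bytes], bool]) -> bool:
    """After a checksum failure on first_read, accept the page if the
    second read is valid *or* its LSN has moved forward, which suggests
    the page was being written concurrently."""
    if checksum_ok(second_read):
        return True
    return page_lsn(second_read) > page_lsn(first_read)
```

Note that this rule reads the LSN from a page whose checksum has already failed, so it can only be as reliable as that header field.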

Re: Online verification of checksums

2020-11-20 Thread Stephen Frost
Greetings, * David Steele (da...@pgmasters.net) wrote: > On 11/20/20 2:28 AM, Michael Paquier wrote: > >On Mon, Nov 16, 2020 at 11:41:51AM +0100, Magnus Hagander wrote: > >>I was referring to the latest patch on the thread. But as I said, I have > >>not read up on all the different issues raised

Re: Online verification of checksums

2020-11-20 Thread David Steele
Hi Michael, On 11/20/20 2:28 AM, Michael Paquier wrote: On Mon, Nov 16, 2020 at 11:41:51AM +0100, Magnus Hagander wrote: I was referring to the latest patch on the thread. But as I said, I have not read up on all the different issues raised in the thread, so take it with a big grain of salt.

Re: Online verification of checksums

2020-11-19 Thread Michael Paquier
On Mon, Nov 16, 2020 at 11:41:51AM +0100, Magnus Hagander wrote: > I was referring to the latest patch on the thread. But as I said, I have > not read up on all the different issues raised in the thread, so take it > with a big grain of salt. > > And I would also echo the previous comment that

Re: Online verification of checksums

2020-11-16 Thread Magnus Hagander
On Mon, Nov 16, 2020 at 1:23 AM Michael Paquier wrote: > On Sun, Nov 15, 2020 at 04:37:36PM +0100, Magnus Hagander wrote: > > On Tue, Nov 10, 2020 at 5:44 AM Michael Paquier > wrote: > >> On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote: > >>> I was referring to the patch I sent

Re: Online verification of checksums

2020-11-15 Thread Michael Paquier
On Sun, Nov 15, 2020 at 04:37:36PM +0100, Magnus Hagander wrote: > On Tue, Nov 10, 2020 at 5:44 AM Michael Paquier wrote: >> On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote: >>> I was referring to the patch I sent on this thread that fixes the >>> detection of a corruption for the

Re: Online verification of checksums

2020-11-15 Thread Magnus Hagander
On Tue, Nov 10, 2020 at 5:44 AM Michael Paquier wrote: > On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote: > > I was referring to the patch I sent on this thread that fixes the > > detection of a corruption for the zero-only case and where pd_lsn > > and/or pd_upper are trashed by

Re: Online verification of checksums

2020-11-09 Thread Michael Paquier
On Thu, Nov 05, 2020 at 10:57:16AM +0900, Michael Paquier wrote: > I was referring to the patch I sent on this thread that fixes the > detection of a corruption for the zero-only case and where pd_lsn > and/or pd_upper are trashed by a corruption of the page header. Both > cases allow a base

Re: Online verification of checksums

2020-11-04 Thread Michael Paquier
On Wed, Nov 04, 2020 at 05:41:39PM +0100, Michael Banck wrote: > Am Mittwoch, den 04.11.2020, 17:48 +0900 schrieb Michael Paquier: >> So, I have done much more testing of this patch using an instance with >> a small shared buffer pool and pgbench running in parallel for having >> a large eviction

Re: Online verification of checksums

2020-11-04 Thread Michael Banck
Hi, Am Mittwoch, den 04.11.2020, 17:48 +0900 schrieb Michael Paquier: > On Fri, Oct 30, 2020 at 11:30:28AM +0900, Michael Paquier wrote: > > Playing with dd and generating random pages, this detects random > > corruptions, making use of a wait/retry loop if a failure is detected. > > As mentioned

Re: Online verification of checksums

2020-11-04 Thread Michael Paquier
On Fri, Oct 30, 2020 at 11:30:28AM +0900, Michael Paquier wrote: > Playing with dd and generating random pages, this detects random > corruptions, making use of a wait/retry loop if a failure is detected. > As mentioned upthread, this is a double-edged sword, increasing the > number of retries

Re: Online verification of checksums

2020-10-29 Thread Michael Paquier
On Thu, Oct 22, 2020 at 10:41:53AM +0900, Michael Paquier wrote: > We cannot trust the fields of the page header because these may > have been messed up with some random corruption, so what really > matters is that the checksums don't match, and that we can just rely > on that. The

Re: Online verification of checksums

2020-10-21 Thread Michael Paquier
On Wed, Oct 21, 2020 at 07:10:34PM +0900, Michael Paquier wrote: > My guess is that we should be able to make use of that for base > backups as well, but I also think that I'd rather let v13 go with more > retries without depending on a new API layer, removing of the LSN > check altogether.

Re: Online verification of checksums

2020-10-21 Thread Michael Paquier
On Wed, Oct 21, 2020 at 12:00:23PM +0200, Michael Banck wrote: > The check was ported (or the concept of it adapted) from pgBackRest if I > remember correctly. Okay, I did not know that. >> As things stand, I'd like to think that it would be much more useful >> to remove this check and to have

Re: Online verification of checksums

2020-10-21 Thread Michael Banck
Hi, Am Dienstag, den 20.10.2020, 18:11 +0900 schrieb Michael Paquier: > On Mon, Apr 06, 2020 at 04:45:44PM -0400, Tom Lane wrote: > > Actually, after thinking about that a bit more: why is there an LSN-based > > special condition at all? It seems like it'd be far more useful to > > checksum

Re: Online verification of checksums

2020-10-20 Thread Michael Paquier
On Mon, Apr 06, 2020 at 04:45:44PM -0400, Tom Lane wrote: > Actually, after thinking about that a bit more: why is there an LSN-based > special condition at all? It seems like it'd be far more useful to > checksum everything, and on failure try to re-read and re-verify the page > once or twice,
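
The re-read-and-re-verify idea quoted above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patch's code: `toy_checksum_ok` uses a CRC32 stored in the first four bytes as a stand-in and is NOT PostgreSQL's page checksum algorithm, and `make_page`, `verify_with_retry`, and the `max_rereads` parameter are hypothetical names.

```python
import zlib
from typing import Callable

PAGE_SIZE = 8192  # PostgreSQL's default block size


def make_page(body: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 of the body, then the zero-padded body."""
    body = body.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "little") + body


def toy_checksum_ok(page: bytes) -> bool:
    """Stand-in verification (assumption): compare the stored CRC32 with
    a CRC32 computed over the rest of the page."""
    stored = int.from_bytes(page[:4], "little")
    return stored == zlib.crc32(page[4:])


def verify_with_retry(read_page: Callable[[], bytes],
                      max_rereads: int = 2) -> bool:
    """Checksum the page; on failure re-read and re-verify up to
    max_rereads times. Report failure only if every attempt fails."""
    for _ in range(1 + max_rereads):
        if toy_checksum_ok(read_page()):
            return True
    return False
```

A reader that first returns a torn page and then a consistent one passes, while a page that stays inconsistent across all attempts is reported as a failure.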

Re: Online verification of checksums

2020-07-30 Thread Daniel Gustafsson
> On 5 Jul 2020, at 13:52, Daniel Gustafsson wrote: > >> On 6 Apr 2020, at 23:15, Michael Banck wrote: > >> Probably we need to take a step back; > > This patch has been Waiting on Author since the last commitfest (and no longer > applies as well), and by the sounds of the thread there are

Re: Online verification of checksums

2020-07-05 Thread Daniel Gustafsson
> On 6 Apr 2020, at 23:15, Michael Banck wrote: > Probably we need to take a step back; This patch has been Waiting on Author since the last commitfest (and no longer applies as well), and by the sounds of the thread there are some open issues with it. Should it be Returned with Feedback to be

Re: Online verification of checksums

2020-04-06 Thread Michael Banck
Hi, Am Montag, den 06.04.2020, 16:45 -0400 schrieb Tom Lane: > I wrote: > > Another thing that's bothering me is that the patch compares page LSN > > against GetInsertRecPtr(); but that function says > > ... > > I'm not convinced that an approximation is good enough here. It seems > > like a

Re: Online verification of checksums

2020-04-06 Thread Tom Lane
I wrote: > Another thing that's bothering me is that the patch compares page LSN > against GetInsertRecPtr(); but that function says > ... > I'm not convinced that an approximation is good enough here. It seems > like a page that's just now been updated could have an LSN beyond the > current XLOG

Re: Online verification of checksums

2020-04-06 Thread Tom Lane
Michael Banck writes: > [ 0001-Fix-checksum-verification-in-base-backups-for-random_V3.patch ] I noticed that the cfbot wasn't testing this because of a minor merge conflict. I rebased it over that, and also readjusted things a little bit to avoid unnecessarily reindenting existing code, in

Re: Online verification of checksums

2020-03-12 Thread Michael Banck
Hi, thanks for reviewing this patch! Am Donnerstag, den 27.02.2020, 10:57 + schrieb Asif Rehman: > The following review has been posted through the commitfest application: > make installcheck-world: tested, passed > Implements feature: tested, passed > Spec compliant:

Re: Online verification of checksums

2020-02-27 Thread Asif Rehman
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: not tested The patch applies cleanly and works as expected. Just a few

Re: Online verification of checksums

2019-03-30 Thread Andres Freund
Hi, On 2019-03-30 12:56:21 +0100, Magnus Hagander wrote: > > ISTM that the fact that we had to teach it about different segment files > > for checksum verification by splitting up the filename at "." implies > > that it is not the correct level of abstraction (but maybe it could get > > schooled

Re: Online verification of checksums

2019-03-30 Thread Magnus Hagander
On Fri, Mar 29, 2019 at 10:08 PM Michael Banck wrote: > Hi, > > Am Freitag, den 29.03.2019, 16:52 +0100 schrieb Magnus Hagander: > > On Fri, Mar 29, 2019 at 4:30 PM Stephen Frost > wrote: > > > * Magnus Hagander (mag...@hagander.net) wrote: > > > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra

Re: Online verification of checksums

2019-03-29 Thread Michael Banck
Hi, Am Freitag, den 29.03.2019, 16:52 +0100 schrieb Magnus Hagander: > On Fri, Mar 29, 2019 at 4:30 PM Stephen Frost wrote: > > * Magnus Hagander (mag...@hagander.net) wrote: > > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra > > > > > > wrote: > > > > On Thu, Mar 28, 2019 at 01:11:40PM

Re: Online verification of checksums

2019-03-29 Thread Magnus Hagander
On Fri, Mar 29, 2019 at 4:30 PM Stephen Frost wrote: > Greetings, > > * Magnus Hagander (mag...@hagander.net) wrote: > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra < > tomas.von...@2ndquadrant.com> > > wrote: > > > > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote: > > > >Hi,

Re: Online verification of checksums

2019-03-29 Thread Andres Freund
Hi, On 2019-03-29 11:38:02 -0400, Stephen Frost wrote: > The server-side function would essentially lock the page against i/o, > re-read it off disk into an independent location, unlock the page, then > calculate the checksum and report back? Right. I think there's a few minor variations of how

Re: Online verification of checksums

2019-03-29 Thread Stephen Frost
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2019-03-29 11:30:15 -0400, Stephen Frost wrote: > > * Magnus Hagander (mag...@hagander.net) wrote: > > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra > > > > > > wrote: > > > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund

Re: Online verification of checksums

2019-03-29 Thread Andres Freund
Hi, On 2019-03-29 11:30:15 -0400, Stephen Frost wrote: > * Magnus Hagander (mag...@hagander.net) wrote: > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra > > wrote: > > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote: > > > >Hi, > > > > > > > >On 2019-03-28 21:09:22 +0100,

Re: Online verification of checksums

2019-03-29 Thread Stephen Frost
Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra > wrote: > > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote: > > >Hi, > > > > > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote: > > >> I agree that the current patch

Re: Online verification of checksums

2019-03-29 Thread Magnus Hagander
On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra wrote: > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote: > >Hi, > > > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote: > >> I agree that the current patch might have some corner-cases where it > >> does not guarantee 100% accuracy

Re: Online verification of checksums

2019-03-28 Thread Tomas Vondra
On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote: Hi, On 2019-03-28 21:09:22 +0100, Michael Banck wrote: I agree that the current patch might have some corner-cases where it does not guarantee 100% accuracy in online mode, but I hope the current version at least has no more false

Re: Online verification of checksums

2019-03-28 Thread Andres Freund
Hi, On 2019-03-28 21:09:22 +0100, Michael Banck wrote: > I agree that the current patch might have some corner-cases where it > does not guarantee 100% accuracy in online mode, but I hope the current > version at least has no more false negatives. False positives are *bad*. We shouldn't

Re: Online verification of checksums

2019-03-28 Thread Michael Banck
Hi, Am Donnerstag, den 28.03.2019, 18:19 +0100 schrieb Tomas Vondra: > On Thu, Mar 28, 2019 at 05:08:33PM +0100, Michael Banck wrote: > > I also fixed the two issues Andres reported, namely a zeroed-out > > pageheader and a random LSN. The first is caught by checking for an all- > > zero-page in

Re: Online verification of checksums

2019-03-28 Thread Tomas Vondra
On Thu, Mar 28, 2019 at 05:08:33PM +0100, Michael Banck wrote: Hi, I have rebased this patch now. I also fixed the two issues Andres reported, namely a zeroed-out pageheader and a random LSN. The first is caught by checking for an all-zero-page in the way PageIsVerified() does. The second is

Re: Online verification of checksums

2019-03-28 Thread Michael Banck
Hi, I have rebased this patch now. I also fixed the two issues Andres reported, namely a zeroed-out pageheader and a random LSN. The first is caught by checking for an all-zero-page in the way PageIsVerified() does. The second is caught by comparing the upper 32 bits of the LSN as well and

Re: [Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Michael Banck
Hi, Am Dienstag, den 26.03.2019, 10:30 -0700 schrieb Andres Freund: > On 2019-03-26 18:22:55 +0100, Michael Banck wrote: > > Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > > > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM > > > generate_series(1, 100) g(i); > > >

Re: [Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Andres Freund
On 2019-03-26 18:22:55 +0100, Michael Banck wrote: > Hi, > > Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, > > 100) g(i); > > SELECT pg_relation_size('corruptme'); > > postgres[22890][1]=# SELECT

[Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, > 100) g(i); > SELECT pg_relation_size('corruptme'); > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || >

Re: Online verification of checksums

2019-03-19 Thread Michael Paquier
On Tue, Mar 19, 2019 at 02:44:52PM -0700, Andres Freund wrote: > That's *PRECISELY* my point. I think it's a bad idea to do online > checksumming from outside the backend. It needs to be inside the > backend, and if there's any verification failures on a block, it needs > to acquire the IO lock on

Re: Online verification of checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-19 22:39:16 +0100, Michael Banck wrote: > Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > > a) checks that the page is all zeroes if PageIsNew() (like > >PageIsVerified() does for the backend). That avoids missing cases > >where corruption just zeroed out
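
Point a) above can be illustrated with a short Python sketch. It assumes the usual page header layout, with pd_upper as a native-endian 16-bit field at byte offset 14, and treats a page as "new" when pd_upper is zero, as PageIsNew() does. The function names are hypothetical, and this is a sketch rather than the backend's implementation.

```python
import struct

PAGE_SIZE = 8192      # PostgreSQL's default block size
PD_UPPER_OFFSET = 14  # assumed offset of pd_upper in the page header


def page_is_new(page: bytes) -> bool:
    """A page counts as new when its pd_upper field is zero."""
    (pd_upper,) = struct.unpack_from("=H", page, PD_UPPER_OFFSET)
    return pd_upper == 0


def new_page_is_all_zeroes(page: bytes) -> bool:
    """For a page that looks new, accept it only if every byte is zero.
    A zeroed header followed by leftover data indicates corruption."""
    return page_is_new(page) and not any(page)
```

Without the all-zeroes test, a page whose header was wiped by corruption would look new and be skipped without any checksum comparison.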

Re: Online verification of checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund: > On 2019-03-20 03:27:55 +0800, Stephen Frost wrote: > > On Tue, Mar 19, 2019 at 23:59 Andres Freund wrote: > > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote: > > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert

Re: Online verification of checksums

2019-03-19 Thread Robert Haas
On Tue, Mar 19, 2019 at 4:49 PM Andres Freund wrote: > To demonstrate that I ran a loop that verified that a) a normal backend > query using the table detects the corruption b) pg_basebackup doesn't. > > i=0; > while true; do > i=$(($i+1)); > echo attempt $i; > dd if=/dev/urandom

Re: Online verification of checksums

2019-03-19 Thread Andres Freund
On 2019-03-19 13:00:50 -0700, Andres Freund wrote: > As it stands, the logic seems to give more false confidence than > anything else. To demonstrate that I ran a loop that verified that a) a normal backend query using the table detects the corruption b) pg_basebackup doesn't. i=0; while true; do

Re: Online verification of checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-20 03:27:55 +0800, Stephen Frost wrote: > On Tue, Mar 19, 2019 at 23:59 Andres Freund wrote: > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote: > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas: > > > > It's torn pages that I am concerned about - the server is

Re: Online verification of checksums

2019-03-19 Thread Stephen Frost
Greetings, On Tue, Mar 19, 2019 at 23:59 Andres Freund wrote: > Hi, > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote: > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas: > > > It's torn pages that I am concerned about - the server is writing and > > > we are reading, and we

Re: Online verification of checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-19 16:52:08 +0100, Michael Banck wrote: > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas: > > It's torn pages that I am concerned about - the server is writing and > > we are reading, and we get a mix of old and new content. We have been > > quite diligent about

Re: Online verification of checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas: > It's torn pages that I am concerned about - the server is writing and > we are reading, and we get a mix of old and new content. We have been > quite diligent about protecting ourselves from such risks elsewhere, > and checksum

Re: Online verification of checksums

2019-03-19 Thread Robert Haas
On Mon, Mar 18, 2019 at 2:38 AM Stephen Frost wrote: > Sure the backend has those facilities since it needs to, but these > frontend tools *don't* need that to *never* have any false positives, so > why are we complicating things by saying that this frontend tool and the > backend have to

Re: Online verification of checksums

2019-03-18 Thread Stephen Frost
Greetings, On Tue, Mar 19, 2019 at 04:15 Michael Banck wrote: > Am Montag, den 18.03.2019, 16:11 +0800 schrieb Stephen Frost: > > On Mon, Mar 18, 2019 at 15:52 Michael Banck > wrote: > > > Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost: > > > > Thanks for that. Reading through

Re: Online verification of checksums

2019-03-18 Thread Michael Banck
Hi, Am Montag, den 18.03.2019, 16:11 +0800 schrieb Stephen Frost: > On Mon, Mar 18, 2019 at 15:52 Michael Banck wrote: > > Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost: > > > Thanks for that.  Reading through the code though, I don't entirely > > > understand why we're making

Re: Online verification of checksums

2019-03-18 Thread Robert Haas
On Mon, Mar 18, 2019 at 2:06 AM Michael Paquier wrote: > The mentions on this thread that the server has all the facility in > place to properly lock a buffer and make sure that a partial read > *never* happens and that we *never* have any kind of false positives, > directly preventing the set of

Re: Online verification of checksums

2019-03-18 Thread Stephen Frost
Greetings, On Mon, Mar 18, 2019 at 15:52 Michael Banck wrote: > Hi. > > Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost: > > * Michael Banck (michael.ba...@credativ.de) wrote: > > > Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost: > > > > * Michael Paquier

Re: Online verification of checksums

2019-03-18 Thread Michael Banck
Hi. Am Montag, den 18.03.2019, 03:34 -0400 schrieb Stephen Frost: > * Michael Banck (michael.ba...@credativ.de) wrote: > > Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost: > > > * Michael Paquier (mich...@paquier.xyz) wrote: > > > > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen

Re: Online verification of checksums

2019-03-18 Thread Michael Banck
Hi, Am Montag, den 18.03.2019, 08:18 +0100 schrieb Michael Banck: > I have now rebased that patch on top of the pg_verify_checksums -> > pg_checksums renaming, see attached. Sorry, I had missed some hunks in the TAP tests, fixed-up patch attached. Michael -- Michael Banck Projektleiter /

Re: Online verification of checksums

2019-03-18 Thread Stephen Frost
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Mar 18, 2019 at 02:38:10AM -0400, Stephen Frost wrote: > > Uh, we are, of course, going to have partial reads- we just need to > > handle them appropriately, and that's not hard to do in a way that we > > never have false

Re: Online verification of checksums

2019-03-18 Thread Stephen Frost
Greetings, * Michael Banck (michael.ba...@credativ.de) wrote: > Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost: > > * Michael Paquier (mich...@paquier.xyz) wrote: > > > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote: > > > > To be clear, I agree completely that we

Re: Online verification of checksums

2019-03-18 Thread Michael Paquier
On Mon, Mar 18, 2019 at 02:38:10AM -0400, Stephen Frost wrote: > Uh, we are, of course, going to have partial reads- we just need to > handle them appropriately, and that's not hard to do in a way that we > never have false positives. Ere, my apologies here. I meant the read of a torn page, not

Re: Online verification of checksums

2019-03-18 Thread Michael Banck
Hi, Am Montag, den 18.03.2019, 02:38 -0400 schrieb Stephen Frost: > * Michael Paquier (mich...@paquier.xyz) wrote: > > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote: > > > To be clear, I agree completely that we don't want to be reporting false > > > positives or "this might mean

Re: Online verification of checksums

2019-03-18 Thread Stephen Frost
Greetings, * Michael Paquier (mich...@paquier.xyz) wrote: > On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote: > > To be clear, I agree completely that we don't want to be reporting false > > positives or "this might mean corruption!" to users running the tool, > > but I haven't seen

Re: Online verification of checksums

2019-03-18 Thread Michael Paquier
On Mon, Mar 18, 2019 at 01:43:08AM -0400, Stephen Frost wrote: > To be clear, I agree completely that we don't want to be reporting false > positives or "this might mean corruption!" to users running the tool, > but I haven't seen a good explanation of why this needs to involve the > server to

Re: Online verification of checksums

2019-03-17 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > If we want to run it from the server itself, then I guess a background > worker would be a better solution. Incidentally, that's something I've > been toying with some time ago, see [1]. So, I'm a big fan of this idea of having a

Re: Online verification of checksums

2019-03-17 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 3/2/19 12:03 AM, Robert Haas wrote: > > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck > > wrote: > >> I have added a retry for this as well now, without a pg_sleep() as well. > >> This catches around 80% of the half-reads, but

Re: Online verification of checksums

2019-03-08 Thread Julien Rouhaud
On Fri, Mar 8, 2019 at 6:50 PM Tomas Vondra wrote: > > On 3/8/19 4:19 PM, Julien Rouhaud wrote: > > On Thu, Mar 7, 2019 at 7:00 PM Andres Freund wrote: > >> > >> On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote: > >>> > >>> But then again, we could just > >>> hack a special version of

Re: Online verification of checksums

2019-03-08 Thread Tomas Vondra
On 3/8/19 4:19 PM, Julien Rouhaud wrote: > On Thu, Mar 7, 2019 at 7:00 PM Andres Freund wrote: >> >> On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote: >>> >>> But then again, we could just >>> hack a special version of ReadBuffer_common() which would just >> >>> (a) check if a page is in shared

Re: Online verification of checksums

2019-03-08 Thread Julien Rouhaud
On Thu, Mar 7, 2019 at 7:00 PM Andres Freund wrote: > > On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote: > > > > But then again, we could just > > hack a special version of ReadBuffer_common() which would just > > > (a) check if a page is in shared buffers, and if it is then consider the > >

Re: Online verification of checksums

2019-03-08 Thread Michael Banck
Hi, Am Sonntag, den 03.03.2019, 11:51 +0100 schrieb Michael Banck: > Am Samstag, den 02.03.2019, 11:08 -0500 schrieb Stephen Frost: > > I'm not necessarily against skipping to the next file, to be clear, > > but I think I'd be happier if we kept reading the file until we > > actually get EOF. >

Re: Online verification of checksums

2019-03-07 Thread Andres Freund
Hi, On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote: > On 3/6/19 6:42 PM, Andres Freund wrote: > > > > ... > > > > To me the right way seems to be to IO lock the page via PG after such a > > failure, and then retry. Which should be relatively easily doable for > > the basebackup case, but

Re: Online verification of checksums

2019-03-07 Thread Tomas Vondra
On 3/6/19 6:42 PM, Andres Freund wrote: > ... > To me the right way seems to be to IO lock the page via PG after such a failure, and then retry. Which should be relatively easily doable for the basebackup case, but obviously harder for the pg_verify_checksums case. Actually, what do you

Re: Online verification of checksums

2019-03-06 Thread Michael Paquier
On Wed, Mar 06, 2019 at 08:53:57PM +0100, Tomas Vondra wrote: > Not sure. AFAICS that would require a single transaction, and if we > happen to add some sort of throttling (which is a feature request I'd > expect pretty soon to make it usable on live clusters) that might be > quite

Re: Online verification of checksums

2019-03-06 Thread Tomas Vondra
On 3/6/19 8:41 PM, Andres Freund wrote: > Hi, > > On 2019-03-06 20:37:39 +0100, Tomas Vondra wrote: >> Not sure how to integrate it into the CLI tool, though. Perhaps it >> could require connection info so that it can execute a function, when >> executed in online mode? > > To me the right

Re: Online verification of checksums

2019-03-06 Thread Andres Freund
Hi, On 2019-03-06 20:37:39 +0100, Tomas Vondra wrote: > Not sure how to integrate it into the CLI tool, though. Perhaps it > could require connection info so that it can execute a function, when > executed in online mode? To me the right fix would be to simply have this run as part of the

Re: Online verification of checksums

2019-03-06 Thread Tomas Vondra
On 3/6/19 6:42 PM, Andres Freund wrote: > On 2019-03-06 12:33:49 -0500, Robert Haas wrote: >> On Sat, Mar 2, 2019 at 5:45 AM Michael Banck >> wrote: >>> On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote: On Tue, Sep 18, 2018 at 10:37 AM Michael Banck wrote: > I have

Re: Online verification of checksums

2019-03-06 Thread Tomas Vondra
On 3/6/19 6:26 PM, Robert Haas wrote: > On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra > wrote: >> FWIW I don't think this qualifies as torn page - i.e. it's not a full >> read with a mix of old and new data. This is a partial write, most likely >> because we read the blocks one by one, and when we

Re: Online verification of checksums

2019-03-06 Thread Andres Freund
On 2019-03-06 12:33:49 -0500, Robert Haas wrote: > On Sat, Mar 2, 2019 at 5:45 AM Michael Banck > wrote: > > On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote: > > > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck > > > wrote: > > > > I have added a retry for this as well now, without

Re: Online verification of checksums

2019-03-06 Thread Robert Haas
On Sat, Mar 2, 2019 at 5:45 AM Michael Banck wrote: > On Friday, 01.03.2019 at 18:03 -0500, Robert Haas wrote: > > On Tue, Sep 18, 2018 at 10:37 AM Michael Banck > > wrote: > > > I have added a retry for this as well now, without a pg_sleep() as well. > > > This catches around 80% of the

Re: Online verification of checksums

2019-03-06 Thread Robert Haas
On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra wrote: > FWIW I don't think this qualifies as torn page - i.e. it's not a full > read with a mix of old and new data. This is a partial write, most likely > because we read the blocks one by one, and when we hit the last page > while the table is being

Re: Online verification of checksums

2019-03-05 Thread Stephen Frost
Greetings, On Tue, Mar 5, 2019 at 18:36 Michael Paquier wrote: > On Tue, Mar 05, 2019 at 02:08:03PM +0100, Tomas Vondra wrote: > > Based on quickly skimming that thread the main issue seems to be > > deciding which files in the data directory are expected to have > > checksums. Which is a valid

Re: Online verification of checksums

2019-03-05 Thread Michael Paquier
On Tue, Mar 05, 2019 at 02:08:03PM +0100, Tomas Vondra wrote: > Based on quickly skimming that thread the main issue seems to be > deciding which files in the data directory are expected to have > checksums. Which is a valid issue, of course, but I was expecting > something about partial

Re: Online verification of checksums

2019-03-05 Thread Tomas Vondra
On 3/5/19 4:12 AM, Michael Paquier wrote: > On Mon, Mar 04, 2019 at 03:08:09PM +0100, Tomas Vondra wrote: >> I still don't understand what issue you see in how basebackup verifies >> checksums. Can you point me to the explanation you've sent after 11 was >> released? > > The history is mostly on
