Re: Online enabling of checksums

2019-01-31 Thread Magnus Hagander
On Thu, Jan 31, 2019 at 11:57 AM Andres Freund  wrote:

> On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
> >
> >
> > On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > > Greetings,
> > >
> > > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> > >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> > >>> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >  While looking at the online checksum verification patch (which I guess
> >  will get committed before this one), it occurred to me that disabling
> >  checksums may need to be more elaborate, to protect against someone
> >  using the stale flag value (instead of simply switching to "off"
> >  assuming that's fine).
> > 
> >  The signals etc. seem good enough for our internal stuff, but what if
> >  someone uses the flag in a different way? E.g. the online checksum
> >  verification runs as an independent process (i.e. not a backend) and
> >  reads the control file to find out if the checksums are enabled or not.
> >  So if we just switch from "on" to "off" that will break.
> > 
> >  Of course, we may also say "Don't disable checksums while online
> >  verification is running!" but that's not ideal.
> > >>>
> > >>> I'm not really sure what else we could say here..?  I don't particularly
> > >>> see an issue with telling people that if they disable checksums while
> > >>> they're running a tool that's checking the checksums that they're going
> > >>> to get odd results.
> > >>
> > >> I don't know, to be honest. I was merely looking at the online
> > >> verification patch and realized that if someone disables checksums it
> > >> won't notice it (because it only reads the flag once, at the very
> > >> beginning) and will likely produce bogus errors.
> > >>
> > >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> > >> fix it. The checkpoint LSN is read from the same controlfile as the
> > >> flag, so we know the checksums were enabled during that checkpoint. So
> > >> if we ignore failures with a newer LSN, that should do the trick, no?
> > >>
> > >> So perhaps that's the right "protocol" to handle this?
> > >
> > > I certainly don't think we need to do anything more.
> > >
> >
> > Not sure I agree. I'm not suggesting we absolutely have to write huge
> > amount of code to deal with this issue, but I hope we agree we need to
> > at least understand the issue so that we can put warnings into docs.
> >
> > FWIW pg_basebackup (in the default "verify checksums") has this issue
> > too AFAICS, and it seems rather unfriendly to just start reporting
> > checksum errors during backup in that case.
> >
> > But as I mentioned, maybe there's no problem at all and using the
> > checkpoint LSN deals with it automatically.
>
> Given that this patch has not been developed in a few months, I don't
> see why this has an active 2019-01 CF entry? I think we should mark this
> as Returned With Feedback.
>
> https://commitfest.postgresql.org/21/1535/
>
>
Unfortunately, I agree. I wish that wasn't the case, but due to things
outside my control, that's what happened.

So yes, let's punt it.

//Magnus


Re: Online enabling of checksums

2019-01-31 Thread Andres Freund
On 2018-09-30 10:48:36 +0200, Tomas Vondra wrote:
> 
> 
> On 09/29/2018 06:51 PM, Stephen Frost wrote:
> > Greetings,
> > 
> > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> >>> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>  While looking at the online checksum verification patch (which I guess
>  will get committed before this one), it occurred to me that disabling
>  checksums may need to be more elaborate, to protect against someone
>  using the stale flag value (instead of simply switching to "off"
>  assuming that's fine).
> 
>  The signals etc. seem good enough for our internal stuff, but what if
>  someone uses the flag in a different way? E.g. the online checksum
>  verification runs as an independent process (i.e. not a backend) and
>  reads the control file to find out if the checksums are enabled or not.
>  So if we just switch from "on" to "off" that will break.
> 
>  Of course, we may also say "Don't disable checksums while online
>  verification is running!" but that's not ideal.
> >>>
> >>> I'm not really sure what else we could say here..?  I don't particularly
> >>> see an issue with telling people that if they disable checksums while
> >>> they're running a tool that's checking the checksums that they're going
> >>> to get odd results.
> >>
> >> I don't know, to be honest. I was merely looking at the online
> >> verification patch and realized that if someone disables checksums it
> >> won't notice it (because it only reads the flag once, at the very
> >> beginning) and will likely produce bogus errors.
> >>
> >> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> >> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
> >> if we ignore failures with a newer LSN, that should do the trick, no?
> >>
> >> So perhaps that's the right "protocol" to handle this?
> > 
> > I certainly don't think we need to do anything more.
> > 
> 
> Not sure I agree. I'm not suggesting we absolutely have to write huge
> amount of code to deal with this issue, but I hope we agree we need to
> at least understand the issue so that we can put warnings into docs.
> 
> FWIW pg_basebackup (in the default "verify checksums") has this issue
> too AFAICS, and it seems rather unfriendly to just start reporting
> checksum errors during backup in that case.
> 
> But as I mentioned, maybe there's no problem at all and using the
> checkpoint LSN deals with it automatically.

Given that this patch has not seen development in a few months, I don't
see why it has an active 2019-01 CF entry. I think we should mark this
as Returned With Feedback.

https://commitfest.postgresql.org/21/1535/

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-09-30 Thread Tomas Vondra



On 09/29/2018 06:51 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>> On 09/29/2018 02:19 PM, Stephen Frost wrote:
>>> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
 While looking at the online checksum verification patch (which I guess
 will get committed before this one), it occurred to me that disabling
 checksums may need to be more elaborate, to protect against someone
 using the stale flag value (instead of simply switching to "off"
 assuming that's fine).

 The signals etc. seem good enough for our internal stuff, but what if
 someone uses the flag in a different way? E.g. the online checksum
 verification runs as an independent process (i.e. not a backend) and
 reads the control file to find out if the checksums are enabled or not.
 So if we just switch from "on" to "off" that will break.

 Of course, we may also say "Don't disable checksums while online
 verification is running!" but that's not ideal.
>>>
>>> I'm not really sure what else we could say here..?  I don't particularly
>>> see an issue with telling people that if they disable checksums while
>>> they're running a tool that's checking the checksums that they're going
>>> to get odd results.
>>
>> I don't know, to be honest. I was merely looking at the online
>> verification patch and realized that if someone disables checksums it
>> won't notice it (because it only reads the flag once, at the very
>> beginning) and will likely produce bogus errors.
>>
>> Although, maybe it won't - it now uses a checkpoint LSN, so that might
>> fix it. The checkpoint LSN is read from the same controlfile as the
> >> flag, so we know the checksums were enabled during that checkpoint. So
>> if we ignore failures with a newer LSN, that should do the trick, no?
>>
>> So perhaps that's the right "protocol" to handle this?
> 
> I certainly don't think we need to do anything more.
> 

Not sure I agree. I'm not suggesting we absolutely have to write a huge
amount of code to deal with this issue, but I hope we agree we need to
at least understand the issue so that we can put warnings into the docs.

FWIW pg_basebackup (in the default "verify checksums") has this issue
too AFAICS, and it seems rather unfriendly to just start reporting
checksum errors during backup in that case.

But as I mentioned, maybe there's no problem at all and using the
checkpoint LSN deals with it automatically.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-09-29 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On 09/29/2018 02:19 PM, Stephen Frost wrote:
> > * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >> While looking at the online checksum verification patch (which I guess
> >> will get committed before this one), it occurred to me that disabling
> >> checksums may need to be more elaborate, to protect against someone
> >> using the stale flag value (instead of simply switching to "off"
> >> assuming that's fine).
> >>
> >> The signals etc. seem good enough for our internal stuff, but what if
> >> someone uses the flag in a different way? E.g. the online checksum
> >> verification runs as an independent process (i.e. not a backend) and
> >> reads the control file to find out if the checksums are enabled or not.
> >> So if we just switch from "on" to "off" that will break.
> >>
> >> Of course, we may also say "Don't disable checksums while online
> >> verification is running!" but that's not ideal.
> > 
> > I'm not really sure what else we could say here..?  I don't particularly
> > see an issue with telling people that if they disable checksums while
> > they're running a tool that's checking the checksums that they're going
> > to get odd results.
> 
> I don't know, to be honest. I was merely looking at the online
> verification patch and realized that if someone disables checksums it
> won't notice it (because it only reads the flag once, at the very
> beginning) and will likely produce bogus errors.
> 
> Although, maybe it won't - it now uses a checkpoint LSN, so that might
> fix it. The checkpoint LSN is read from the same controlfile as the
> flag, so we know the checksums were enabled during that checkpoint. So
> if we ignore failures with a newer LSN, that should do the trick, no?
> 
> So perhaps that's the right "protocol" to handle this?

I certainly don't think we need to do anything more.

Thanks!

Stephen




Re: Online enabling of checksums

2018-09-29 Thread Tomas Vondra
On 09/29/2018 02:19 PM, Stephen Frost wrote:
> Greetings,
> 
> * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
>> While looking at the online checksum verification patch (which I guess
>> will get committed before this one), it occurred to me that disabling
>> checksums may need to be more elaborate, to protect against someone
>> using the stale flag value (instead of simply switching to "off"
>> assuming that's fine).
>>
>> The signals etc. seem good enough for our internal stuff, but what if
>> someone uses the flag in a different way? E.g. the online checksum
>> verification runs as an independent process (i.e. not a backend) and
>> reads the control file to find out if the checksums are enabled or not.
>> So if we just switch from "on" to "off" that will break.
>>
>> Of course, we may also say "Don't disable checksums while online
>> verification is running!" but that's not ideal.
> 
> I'm not really sure what else we could say here..?  I don't particularly
> see an issue with telling people that if they disable checksums while
> they're running a tool that's checking the checksums that they're going
> to get odd results.
> 

I don't know, to be honest. I was merely looking at the online
verification patch and realized that if someone disables checksums it
won't notice it (because it only reads the flag once, at the very
beginning) and will likely produce bogus errors.

Although, maybe it won't - it now uses a checkpoint LSN, so that might
fix it. The checkpoint LSN is read from the same controlfile as the
flag, so we know the checksums were enabled during that checkpoint. So
if we ignore failures with a newer LSN, that should do the trick, no?

So perhaps that's the right "protocol" to handle this?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-09-29 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> While looking at the online checksum verification patch (which I guess
> will get committed before this one), it occurred to me that disabling
> checksums may need to be more elaborate, to protect against someone
> using the stale flag value (instead of simply switching to "off"
> assuming that's fine).
> 
> The signals etc. seem good enough for our internal stuff, but what if
> someone uses the flag in a different way? E.g. the online checksum
> verification runs as an independent process (i.e. not a backend) and
> reads the control file to find out if the checksums are enabled or not.
> So if we just switch from "on" to "off" that will break.
> 
> Of course, we may also say "Don't disable checksums while online
> verification is running!" but that's not ideal.

I'm not really sure what else we could say here..?  I don't particularly
see an issue with telling people that if they disable checksums while
they're running a tool that's checking the checksums that they're going
to get odd results.

Thanks!

Stephen




Re: Online enabling of checksums

2018-09-29 Thread Tomas Vondra
Hi,

While looking at the online checksum verification patch (which I guess
will get committed before this one), it occurred to me that disabling
checksums may need to be more elaborate, to protect against someone
using the stale flag value (instead of simply switching to "off"
assuming that's fine).

The signals etc. seem good enough for our internal stuff, but what if
someone uses the flag in a different way? E.g. the online checksum
verification runs as an independent process (i.e. not a backend) and
reads the control file to find out if the checksums are enabled or not.
So if we just switch from "on" to "off" that will break.

Of course, we may also say "Don't disable checksums while online
verification is running!" but that's not ideal.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-08-01 Thread Sergei Kornilov
Hi

> This doesn't test the consequences of the restart being skipped, nor
> does it review on a code level the correctness.
I checked more than one thing during my review. I looked at the code too: the 
checksumhelper.c bgworker is registered with:
> bgw.bgw_start_time = BgWorkerStart_RecoveryFinished;
and it then processes the whole cluster (even if checksumhelper ran before but 
exited before it completed). Or does BgWorkerStart_RecoveryFinished not 
guarantee a start only after recovery has finished?
Before starting any real work (and after recovery ends), checksumhelper checks 
the current cluster status again:

> +  * If a standby was restarted when in pending state, a background worker
> +  * was registered to start. If it's later promoted after the master has
> +  * completed enabling checksums, we need to terminate immediately and not
> +  * do anything. If the cluster is still in pending state when promoted,
> +  * the background worker should start to complete the job.

> What if your replicas are delayed (e.g. recovery_min_apply_delay)?
> What if you later need to do PITR?
If we start after replaying pg_enable_data_checksums() but before it has 
completed, we plan to start the bgworker when recovery finishes.
If we replay the checksumhelper finish, we _can_ start checksumhelper again, 
and this is handled during checksumhelper startup.

The behavior seems correct to me. Am I missing something badly wrong?

regards, Sergei



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 21:03:22 +0300, Sergei Kornilov wrote:
> > They fail over to a secondary to do maintenance on a primary.
> But this is not problem even in current patch state. We can restart replica 
> before failover and it works. I tested this behavior during my review.
> We can:
> - call pg_enable_data_checksums() on master
> - wait change data_checksums to inprogress on replica

That's *precisely* the problem. What if your replicas are delayed
(e.g. recovery_min_apply_delay)? How would you schedule that restart
properly?  What if you later need to do PITR?


> - restart replica - we can restart replica before promote, right?
> - promote this replica
> - checksum helper is launched now and working on this promoted cluster

This doesn't test the consequences of the restart being skipped, nor
does it amount to a code-level review of the correctness.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Sergei Kornilov
Hi

> They fail over to a secondary to do maintenance on a primary.
But this is not a problem even in the patch's current state. We can restart 
the replica before failover and it works. I tested this behavior during my 
review.
We can:
- call pg_enable_data_checksums() on the master
- wait for data_checksums to change to "inprogress" on the replica
- restart the replica - we can restart it before the promote, right?
- promote this replica
- the checksum helper is now launched and working on this promoted cluster

regards, Sergei



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 10:34:55 -0700, Joshua D. Drake wrote:
> Lastly, I think Alvaro has a point with the incremental development and I
> also think some others on this thread need to, "show me the patch" instead
> of being armchair directors of development.

Oh, FFS. I pointed out the issue that led to the restart being
introduced (reminder, in the committed but then reverted version that
didn't exist). I explained how the much less intrusive version would
work. I think it's absurd to describe that as "armchair directors of
development".

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Joshua D. Drake

On 08/01/2018 09:20 AM, Alvaro Herrera wrote:



> > my problem is that I think the "restart" approach is just using the
> > entirely wrong hammer to solve the problem at hand.  At the very least
> > it's very problematic in respect to replicas, which need to know about
> > the setting too, and can have similar problems the restart on the
> > primary is supposed to prevent.
> 
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.


People in mission critical environments do not "restart all servers". 
They fail over to a secondary to do maintenance on a primary. When you 
have a system where you literally lose thousands of dollars every minute 
the database is down you can't do what you are proposing. When you have 
a system that if the database is down for longer than X minutes, you 
actually lose a whole day because all of the fabricators have to 
revalidate before they begin work, you can't do that. Granted that is 
not the majority (which you mention) but let's not forget them.


The one place where a restart does happen and will continue to happen 
for around 5 (3 if you incorporate pg_logical and 9.6) more years is 
upgrades. Although we have logical replication for upgrades now, we are 
5 years away from the majority of users being on a version of PostgreSQL 
that supports logical replication for upgrades. So, I can see an 
argument for an incremental approach because people could enable 
checksums as part of their upgrade restart.



> For users that cannot do that,
> that's too bad, they'll have to wait to the next release in order to
> enable checksums (assuming they fund the necessary development).  But


I have to say, as a proponent of funded development for longer than most 
I like to see this refreshing take on the fact that this all does take 
money.



> there are many systems where it *is* possible to take everything down
> for five seconds, then back up.  They can definitely take advantage of
> checksummed data.


This is a good point.


> Currently, the only way to enable checksums is to initdb and create a
> new copy of the data from a logical backup, which could take hours or
> even days if data is large, or use logical replication.


Originally, I was going to -1 how this is being implemented. I too wish 
we had the "ALTER DATABASE ENABLE CHECKSUM" or equivalent without a 
restart. However, being able to just restart is a huge step forward from 
what we have now.


Lastly, I think Alvaro has a point with the incremental development and 
I also think some others on this thread need to, "show me the patch" 
instead of being armchair directors of development.


JD



--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 12:45:27 -0400, Bruce Momjian wrote:
> On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> > On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > > This patchset is weird because it is perhaps our first case of trying to
> > > change the state of the server while it is running.  We just don't have
> > > an established protocol for how to orchestrate that, so we are limping
> > > along toward a solution.
> > 
> > There's a number of GUCs that do this? Even in related areas,
> > cf. full_page_writes.
> 
> How is that taking the server from one state to the next in a
> non-instantaneous way?

Because it requires coordination around checkpoints, it can only really
fully take effect after the start of a checkpoint.  And it even has both
primary and replica differences...

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Bruce Momjian
On Wed, Aug  1, 2018 at 09:39:43AM -0700, Andres Freund wrote:
> On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> > This patchset is weird because it is perhaps our first case of trying to
> > change the state of the server while it is running.  We just don't have
> > an established protocol for how to orchestrate that, so we are limping
> > along toward a solution.
> 
> There's a number of GUCs that do this? Even in related areas,
> cf. full_page_writes.

How is that taking the server from one state to the next in a
non-instantaneous way?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
On 2018-08-01 12:36:13 -0400, Bruce Momjian wrote:
> This patchset is weird because it is perhaps our first case of trying to
> change the state of the server while it is running.  We just don't have
> an established protocol for how to orchestrate that, so we are limping
> along toward a solution.

There's a number of GUCs that do this? Even in related areas,
cf. full_page_writes.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Bruce Momjian
On Tue, Jul 31, 2018 at 04:05:23PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> > In the spirit of supporting incremental development, I think it's quite
> > sensible to get the current thing done, then see what it takes to get
> > the next thing done.  Each is an improvement on its own merits.  And it
> > doesn't have to be made by the same people.
> 
> I just don't buy this. An earlier version of this feature was committed
> to v11 without the restart, over objections. There's now extra state in
> the control file to support the restart based system, there's extra
> tests, extra docs. And it'd not be much code to just make it work
> without the restart. The process around this patchset is just plain
> weird.

This patchset is weird because it is perhaps our first case of trying to
change the state of the server while it is running.  We just don't have
an established protocol for how to orchestrate that, so we are limping
along toward a solution.  Forcing a restart is probably part of that
primitive orchestration.  We will probably have similar challenges if we
ever allow Postgres to change its data format on the fly.  These
challenges are one reason pg_upgrade only modifies the new cluster,
never the old one.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 18:25:48 +0200, Tomas Vondra wrote:
> Sure, if there are issues with this approach, that would make it
> unacceptable. I'm not sure why would it be an issue for replicas (which is
> what you mention elsewhere), considering those don't write data and so can't
> fail to update a checksum?

Standbys compute checksums on writeout as well, no? We compute checksums
not at buffer modification, but at writeout time. And replay just marks
buffers dirty, it doesn't directly write to disk.

Architecturally there'd also be hint bits as a source, but I think we
probably neutered them enough for that not to be a problem during
replay.

And then there's also promotions.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
On 2018-08-01 12:20:12 -0400, Alvaro Herrera wrote:
> Hello
> 
> On 2018-Aug-01, Andres Freund wrote:
> 
> > My problem isn't just that I shouldn't think this should be committed
> > without at least a firm committement to do better,
> 
> I take "I think this shouldn't be committed" is what you meant.

Yep.


> I'm not sure I agree with this line of argument.  The reality is that
> real life or diverging priorities preclude people from working on
> $stuff.

Right.


> This is a useful feature-1 we have here, and if we stall it
> until we have feature-2, we may not get either until a year later.
> That's not a great outcome.  We didn't wait for partitioning, parallel
> query, DDL progress reporting, logical replication, JIT, wait events (to
> name but a few) to solve world's hunger in order to start getting
> committed.  We move forward step by step, and that's a good thing.

But we asked that they implement something consistent, and rejected many
that were not deemed to be that.


> > my problem is that I think the "restart" approach is just using the
> > entirely wrong hammer to solve the problem at hand.  At the very least
> > it's very problematic in respect to replicas, which need to know about
> > the setting too, and can have similar problems the restart on the
> > primary is supposed to prevent.
> 
> If we define "restart" to mean taking all the servers down
> simultaneously, that can be planned.

It really can't realistically. That'd essentially mean breaking PITR.
You'd have to schedule the restart of any replicas to happen after a
specific record.  And what if there's replicas that are on a delay? What
if there's data centers that are currently offline?

And again, this isn't hard to do properly. I don't get why we're talking
about an at least operationally complex workaround when the proper
solution isn't hard.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Alvaro Herrera
Hello

On 2018-Aug-01, Andres Freund wrote:

> My problem isn't just that I shouldn't think this should be committed
> without at least a firm committement to do better,

I take "I think this shouldn't be committed" is what you meant.

I'm not sure I agree with this line of argument.  The reality is that
real life or diverging priorities preclude people from working on
$stuff.  This is a useful feature-1 we have here, and if we stall it
until we have feature-2, we may not get either until a year later.
That's not a great outcome.  We didn't wait for partitioning, parallel
query, DDL progress reporting, logical replication, JIT, wait events (to
name but a few) to solve world hunger in order to start getting
committed.  We move forward step by step, and that's a good thing.

Firm commitments are great things to have, and if the firmness leads to
feature-2 being part of the same release, great, but if it's not firm
enough, we can have feature-2 the next release (or whenever).  Even if
there's no such commitment, feature-1 is useful on its own.

> my problem is that I think the "restart" approach is just using the
> entirely wrong hammer to solve the problem at hand.  At the very least
> it's very problematic in respect to replicas, which need to know about
> the setting too, and can have similar problems the restart on the
> primary is supposed to prevent.

If we define "restart" to mean taking all the servers down
simultaneously, that can be planned.  For users that cannot do that,
that's too bad, they'll have to wait to the next release in order to
enable checksums (assuming they fund the necessary development).  But
there are many systems where it *is* possible to take everything down
for five seconds, then back up.  They can definitely take advantage of
checksummed data.

Currently, the only way to enable checksums is to initdb and create a
new copy of the data from a logical backup, which could take hours or
even days if data is large, or use logical replication.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 11:15:38 +0200, Tomas Vondra wrote:
> On 08/01/2018 10:40 AM, Michael Banck wrote:
> > If this was one week before feature freeze, I would agree with you 
> > that it makes sense to ship it with the restart requirement rather 
> > than not shipping it at all. But we're several commitfests away from 
> > v12, so making an effort to having this work without a downtime
> > looks like a reasonable requirement to me.
> > 
> 
> Why would all those pieces had to be committed at once? Why not to
> commit what we have now (with the restart) and then remove the
> restriction in a later commit?

Sure, if all the pieces existed in various degrees of solidness (with
the earlier pieces committable, but later ones needing work), I'd feel
*much* less concerned about it.


> In a way, the question is how far can we reasonably push the patch
> author(s) to implement stuff we consider desirable, but he/she/they
> decided it's not worth the time investment at this point.

We push people to only implement something really consistent all the
time.


> To me, it seems like an immensely useful feature even with the restart,
> and I don't think the restart is a major burden for most systems (it can
> be, if your system has no maintenance windows, or course).

I think it is a problem; my bigger problem is that I don't think it's
really a solution to the problem.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Andres Freund
Hi,

On 2018-08-01 10:40:24 +0200, Michael Banck wrote:
> If this was one week before feature freeze, I would agree with you that
> it makes sense to ship it with the restart requirement rather than not
> shipping it at all. But we're several commitfests away from v12, so
> making an effort to having this work without a downtime looks like a
> reasonable requirement to me.

My problem isn't just that I don't think this should be committed
without at least a firm commitment to do better; my problem is that I
think the "restart" approach is just using the entirely wrong hammer to
solve the problem at hand.  At the very least it's very problematic in
respect to replicas, which need to know about the setting too, and can
have similar problems the restart on the primary is supposed to
prevent.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-08-01 Thread Sergei Kornilov
Hello
I think one restart is acceptable for such a feature. I doubt users want to
disable and re-enable checksums often; in many cases checksums will be enabled
once for the whole life of the cluster. We already need more downtimes for
minor updates (4/year) and for changes to PGC_POSTMASTER settings
(max_connections or autovacuum workers, for example).

regards, Sergei



Re: Online enabling of checksums

2018-08-01 Thread Tomas Vondra
On 08/01/2018 10:40 AM, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
>> The ability to get checksums enabled is a killer feature; the
>> ability to do it with no restart ... okay, it's better than
>> requiring a restart, but it's not *that* big a deal.
> 
> Well, it's a downtime and service interruption from the client's POV,
> and arguably we should remove the 'online' from the patch subject
> then. You can activate checksums on an offline instance already via
> the pg_checksums extensions to pg_verify_checksums that Michael
> Paquier and I wrote independently; of course, that downtime will be
> linearly longer the more data you have.
> 

IMHO the main work still happens while the instance is running, so I
don't see why the restart would make it "not online".

But keeping or not keeping "online" is not the main dilemma faced by
this patch, I think. That is, if we drop "online" from the name I doubt
it'll make it any more acceptable for those objecting to having to
restart the cluster.

> If this was one week before feature freeze, I would agree with you 
> that it makes sense to ship it with the restart requirement rather 
> than not shipping it at all. But we're several commitfests away from 
> v12, so making an effort to having this work without a downtime
> looks like a reasonable requirement to me.
> 

Why would all those pieces have to be committed at once? Why not
commit what we have now (with the restart) and then remove the
restriction in a later commit?

I understand the desire to be able to enable checksums without a
restart, but kinda agree with Alvaro regarding incremental development.

In a way, the question is how far can we reasonably push the patch
author(s) to implement stuff we consider desirable, but he/she/they
decided it's not worth the time investment at this point.

To me, it seems like an immensely useful feature even with the restart,
and I don't think the restart is a major burden for most systems (it can
be, if your system has no maintenance windows, of course).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-08-01 Thread Michael Banck
Hi,

Am Dienstag, den 31.07.2018, 18:56 -0400 schrieb Alvaro Herrera:
> The ability to get checksums enabled is a killer feature; the ability to
> do it with no restart ... okay, it's better than requiring a restart,
> but it's not *that* big a deal.

Well, it's a downtime and service interruption from the client's POV,
and arguably we should remove the 'online' from the patch subject then.
You can activate checksums on an offline instance already via the
pg_checksums extensions to pg_verify_checksums that Michael Paquier and
I wrote independently; of course, that downtime will be linearly
longer the more data you have.

If this was one week before feature freeze, I would agree with you that
it makes sense to ship it with the restart requirement rather than not
shipping it at all. But we're several commitfests away from v12, so
making an effort to having this work without a downtime looks like a
reasonable requirement to me.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Online enabling of checksums

2018-07-31 Thread Andres Freund
Hi,

On 2018-07-31 18:56:29 -0400, Alvaro Herrera wrote:
> In the spirit of supporting incremental development, I think it's quite
> sensible to get the current thing done, then see what it takes to get
> the next thing done.  Each is an improvement on its own merits.  And it
> doesn't have to be made by the same people.

I just don't buy this. An earlier version of this feature was committed
to v11 without the restart, over objections. There's now extra state in
the control file to support the restart based system, there's extra
tests, extra docs. And it'd not be much code to just make it work
without the restart. The process around this patchset is just plain
weird.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-07-31 Thread Alvaro Herrera
On 2018-Jul-31, Tom Lane wrote:

> Andres Freund  writes:
> > On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> >> Not really arguing for or against, but just to understand the reasoning 
> >> before
> >> starting hacking.  Why do we feel that a restart (intended for safety 
> >> here) in
> >> this case is a burden on a use-once process?  Is it from a usability or
> >> technical point of view?  Just want to make sure we are on the same page 
> >> before
> >> digging in to not hack on this patch in a direction which isn’t what is
> >> requested.
> 
> > Having, at some arbitrary seeming point in the middle of enabling
> > checksums to restart the server makes it harder to use and to schedule.
> > The restart is only needed to fix a relatively small issue, and doesn't
> > save that much code.
> 
> Without taking a position on the merits ... I don't see how you can
> claim "it doesn't save that much code" when we don't have a patch to
> compare to that doesn't require the restart.  Maybe it will turn out
> not to be much code, but we don't know that now.

The ability to get checksums enabled is a killer feature; the ability to
do it with no restart ... okay, it's better than requiring a restart,
but it's not *that* big a deal.

In the spirit of supporting incremental development, I think it's quite
sensible to get the current thing done, then see what it takes to get
the next thing done.  Each is an improvement on its own merits.  And it
doesn't have to be made by the same people.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-07-31 Thread Tom Lane
Andres Freund  writes:
> On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
>> Not really arguing for or against, but just to understand the reasoning 
>> before
>> starting hacking.  Why do we feel that a restart (intended for safety here) 
>> in
>> this case is a burden on a use-once process?  Is it from a usability or
>> technical point of view?  Just want to make sure we are on the same page 
>> before
>> digging in to not hack on this patch in a direction which isn’t what is
>> requested.

> Having, at some arbitrary seeming point in the middle of enabling
> checksums to restart the server makes it harder to use and to schedule.
> The restart is only needed to fix a relatively small issue, and doesn't
> save that much code.

Without taking a position on the merits ... I don't see how you can
claim "it doesn't save that much code" when we don't have a patch to
compare to that doesn't require the restart.  Maybe it will turn out
not to be much code, but we don't know that now.

regards, tom lane



Re: Online enabling of checksums

2018-07-31 Thread Andres Freund
Hi,

On 2018-07-31 23:20:27 +0200, Daniel Gustafsson wrote:
> > On 26 Jul 2018, at 19:35, Andres Freund  wrote:
> > On July 26, 2018 10:03:39 AM PDT, Robert Haas  > > wrote:
> 
> >> Why can't we do better?
> > 
> > I don't think it's that hard to do better. IIRC I even outlined something 
> > before the freeze. If not, I certainly can (sketch: use procsignal based 
> > acknowledgment protocol, using a 64 bit integer. Useful for plenty other 
> > things).
> 
> Not really arguing for or against, but just to understand the reasoning before
> starting hacking.  Why do we feel that a restart (intended for safety here) in
> this case is a burden on a use-once process?  Is it from a usability or
> technical point of view?  Just want to make sure we are on the same page 
> before
> digging in to not hack on this patch in a direction which isn’t what is
> requested.

Having, at some arbitrary seeming point in the middle of enabling
checksums to restart the server makes it harder to use and to schedule.
The restart is only needed to fix a relatively small issue, and doesn't
save that much code.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-07-31 Thread Daniel Gustafsson
> On 31 Jul 2018, at 21:52, Joshua D. Drake  wrote:
> 
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:

>>> Thanks for reviewing, I’ve updated the patch with the above mentioned 
>>> incorrect
>>> linkends as well as fixed the comments you made in a previous review.
>>> 
>>> The CF-builder-bot is red, but it’s because it’s trying to apply the already
>>> committed patch which is in the attached datallowconn thread.
>> I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
>> Is "helper" the right word?

IIRC, “helper” was chosen to signal that it’s a single process where “worker”
may be thought of as a process of which there can be many.

> Based on other terminology within the postgresql.conf should it be 
> "checksum_worker_cost_delay”?

Yes, I think it makes sense to rename it to “worker” to align better with the
postgres nomenclature. Will fix.

cheers ./daniel


Re: Online enabling of checksums

2018-07-31 Thread Bruce Momjian
On Tue, Jul 31, 2018 at 12:52:40PM -0700, Joshua Drake wrote:
> On 07/31/2018 12:45 PM, Bruce Momjian wrote:
> >
> >>Hi!,
> >>
> >>Thanks for reviewing, I’ve updated the patch with the above mentioned 
> >>incorrect
> >>linkends as well as fixed the comments you made in a previous review.
> >>
> >>The CF-builder-bot is red, but it’s because it’s trying to apply the already
> >>committed patch which is in the attached datallowconn thread.
> >I think checksumhelper_cost_delay should be checksum_helper_cost_delay.
> >
> >Is "helper" the right word?
> 
> Based on other terminology within the postgresql.conf should it be
> "checksum_worker_cost_delay"?

+1

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Online enabling of checksums

2018-07-31 Thread Joshua D. Drake

On 07/31/2018 12:45 PM, Bruce Momjian wrote:



Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.

Is "helper" the right word?


Based on other terminology within the postgresql.conf should it be 
"checksum_worker_cost_delay"?


JD





--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Online enabling of checksums

2018-07-31 Thread Bruce Momjian
On Wed, Jul 25, 2018 at 11:35:31AM +0200, Daniel Gustafsson wrote:
> > On 24 Jul 2018, at 11:05, Sergei Kornilov  wrote:
> > 
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, failed
> > Implements feature:   not tested
> > Spec compliant:   not tested
> > Documentation:tested, failed
> > 
> > Hello
> > As I wrote a few weeks ago, I cannot build the documentation due to these errors:
> >> postgres.sgml:19625: element xref: validity error : IDREF attribute 
> >> linkend references an unknown ID "runtime-checksumhelper-cost-delay"
> >> postgres.sgml:19626: element xref: validity error : IDREF attribute 
> >> linkend references an unknown ID "runtime-checksumhelper-cost-limit"
> > 
> > After removing those xrefs for test purposes, the patch passes check-world.
> 
> Hi!,
> 
> Thanks for reviewing, I’ve updated the patch with the above mentioned 
> incorrect
> linkends as well as fixed the comments you made in a previous review.
> 
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.

I think checksumhelper_cost_delay should be checksum_helper_cost_delay.

Is "helper" the right word?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Online enabling of checksums

2018-07-26 Thread Andres Freund



On July 26, 2018 10:03:39 AM PDT, Robert Haas  wrote:
>On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander 
>wrote:
>> PFA an updated version of the patch for the next CF. We believe this
>one
>> takes care of all the things pointed out so far.
>>
>> For this version, we "implemented" the
>MegaExpensiveRareMemoryBarrier() by
>> simply requiring a restart of PostgreSQL to initiate the conversion in
>> the background. That is definitely going to guarantee a memory barrier.
>It's
>> certainly not ideal, but restarting the cluster is still a *lot*
>better than
>> having to do the entire conversion offline. This can of course be
>improved
>> upon in the future, but for now we stuck to the safe way.
>
>Honestly, I feel like the bar for this feature ought to be higher than
>that.
>
>(I half-expect a vigorous discussion of whether I have set the bar for
>the features I've developed in the right place or not, but I think
>that's not really a fair response.  If somebody thinks some feature I
>implemented should've been more baked, they might be right, but that's
>not what this thread is about.  I'm giving you MY opinion about THIS
>patch, nothing more or less.)

+1

>Why can't we do better?

I don't think it's that hard to do better. IIRC I even outlined something 
before the freeze. If not, I certainly can (sketch: use procsignal based 
acknowledgment protocol, using a 64 bit integer. Useful for plenty other 
things).

Andres


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Online enabling of checksums

2018-07-26 Thread Robert Haas
On Tue, Jun 26, 2018 at 7:45 AM, Magnus Hagander  wrote:
> PFA an updated version of the patch for the next CF. We believe this one
> takes care of all the things pointed out so far.
>
> For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
> simply requiring a restart of PostgreSQL to initiate the conversion in
> the background. That is definitely going to guarantee a memory barrier. It's
> certainly not ideal, but restarting the cluster is still a *lot* better than
> having to do the entire conversion offline. This can of course be improved
> upon in the future, but for now we stuck to the safe way.

Honestly, I feel like the bar for this feature ought to be higher than that.

(I half-expect a vigorous discussion of whether I have set the bar for
the features I've developed in the right place or not, but I think
that's not really a fair response.  If somebody thinks some feature I
implemented should've been more baked, they might be right, but that's
not what this thread is about.  I'm giving you MY opinion about THIS
patch, nothing more or less.)

Why can't we do better?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Online enabling of checksums

2018-07-25 Thread Sergei Kornilov
Hello
Thank you for the update! I only did a quick test for now: the patch applied and built cleanly.
But I have a reproducible error during check-world:

> t/001_standby_checksum.pl .. 6/10 
> #   Failed test 'ensure checksums are enabled on standby'
> #   at t/001_standby_checksum.pl line 84.
> #  got: 'inprogress'
> # expected: 'on'

In the standby log I found this error:
> 2018-07-25 13:13:05.463 MSK [16544] FATAL:  could not receive data from WAL 
> stream: ERROR:  requested WAL segment 00010003 has already 
> been removed

Checksumhelper obviously writes a lot of WAL. The test passes if I change the
restart order to restart the standby first:
> $node_standby_1->restart();
> $node_master->restart();
Otherwise we would need a replication slot setup.

Also, we get this log record after start:
> data checksums in pending state, starting background worker to enable
even in recovery, although the background worker is actually started only at
recovery end (or promote). I think it would be better to use a DEBUG ereport in
the postmaster and LOG in the checksumhelper.

regards, Sergei

25.07.2018, 12:35, "Daniel Gustafsson" :
>>  On 24 Jul 2018, at 11:05, Sergei Kornilov  wrote:
>>
>>  The following review has been posted through the commitfest application:
>>  make installcheck-world: tested, failed
>>  Implements feature: not tested
>>  Spec compliant: not tested
>>  Documentation: tested, failed
>>
>>  Hello
>>  As I wrote a few weeks ago, I cannot build the documentation due to these errors:
>>>  postgres.sgml:19625: element xref: validity error : IDREF attribute 
>>> linkend references an unknown ID "runtime-checksumhelper-cost-delay"
>>>  postgres.sgml:19626: element xref: validity error : IDREF attribute 
>>> linkend references an unknown ID "runtime-checksumhelper-cost-limit"
>>
>>  After removing those xrefs for test purposes, the patch passes check-world.
>
> Hi!,
>
> Thanks for reviewing, I’ve updated the patch with the above mentioned 
> incorrect
> linkends as well as fixed the comments you made in a previous review.
>
> The CF-builder-bot is red, but it’s because it’s trying to apply the already
> committed patch which is in the attached datallowconn thread.
>
> cheers ./daniel



Re: Online enabling of checksums

2018-07-25 Thread Daniel Gustafsson
> On 24 Jul 2018, at 11:05, Sergei Kornilov  wrote:
> 
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, failed
> 
> Hello
> As I wrote a few weeks ago, I cannot build the documentation due to these errors:
>> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend 
>> references an unknown ID "runtime-checksumhelper-cost-delay"
>> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend 
>> references an unknown ID "runtime-checksumhelper-cost-limit"
> 
> After removing those xrefs for test purposes, the patch passes check-world.

Hi!,

Thanks for reviewing, I’ve updated the patch with the above mentioned incorrect
linkends as well as fixed the comments you made in a previous review.

The CF-builder-bot is red, but it’s because it’s trying to apply the already
committed patch which is in the attached datallowconn thread.

cheers ./daniel



online_checksums13.patch
Description: Binary data


Re: Online enabling of checksums

2018-07-24 Thread Sergei Kornilov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:   not tested
Spec compliant:   not tested
Documentation:tested, failed

Hello
As I wrote a few weeks ago, I cannot build the documentation due to these errors:
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-delay"
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-limit"

After removing those xrefs for test purposes, the patch passes check-world.

regards, Sergei

The new status of this patch is: Waiting on Author


Re: Online enabling of checksums

2018-06-26 Thread Sergei Kornilov
Hello

I tried building this patch and got errors during make docs:
> postgres.sgml:19626: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-limit"
> postgres.sgml:19625: element xref: validity error : IDREF attribute linkend 
> references an unknown ID "runtime-checksumhelper-cost-delay"

Both new GUCs, checksumhelper_cost_delay and checksumhelper_cost_limit, are
mentioned in postgresql.conf with the special value -1 (-1 to use
vacuum_cost_limit), but this value is not mentioned in the docs. I also noticed
that the code and documentation describe different defaults.
Also, I found one "in progress" in the pg_enable_data_checksums()
description; in other places the status is called "inprogress" (without the space).

>   VacuumPageHit = 0;
>   VacuumPageMiss = 0;
>   VacuumPageDirty = 0;
Hm, why are these settings set to 0 in the checksumhelper process?

> /*
> * Force a checkpoint to get everything out to disk. XXX: this should
> * probably not be an IMMEDIATE checkpoint, but leave it there for now for
> * testing
> */
> RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | CHECKPOINT_IMMEDIATE);
We should not forget about that.

regards, Sergei



Re: Online enabling of checksums

2018-06-26 Thread Magnus Hagander
On Mon, Apr 9, 2018 at 7:22 PM, Magnus Hagander  wrote:

> On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund  wrote:
>
>> Hi,
>>
>> On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
>> > Note however that I'm sans-laptop until Sunday, so I will revert it
>> then or
>> > possibly Monday.
>>
>> I'll deactivate the isolationtester tests until then. They've been
>> intermittently broken for days now and prevent other tests from being
>> exercised.
>>
>
> Thanks.
>
> I've pushed the revert now, and left the pg_verify_checksums in place for
> the time being.
>
>
PFA an updated version of the patch for the next CF. We believe this one
takes care of all the things pointed out so far.

For this version, we "implemented" the MegaExpensiveRareMemoryBarrier() by
simply requiring a restart of PostgreSQL to initiate the conversion in
the background. That is definitely going to guarantee a memory barrier. It's
certainly not ideal, but restarting the cluster is still a *lot* better
than having to do the entire conversion offline. This can of course be
improved upon in the future, but for now we stuck to the safe way.

The concurrent create-database-from-one-that-had-no-checksums is handled by
simply looping over the list of databases as long as new databases show up,
and waiting for all open transactions to finish at the right moment to
ensure there is no concurrently running one as we get the database list.

Since the worker is now a regular background worker started from
postmaster, the cost-delay parameters had to be made GUCs instead of
function arguments.

(And the more or less broken isolation tests are simply removed)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7bfbc87109..108e049a85 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2011,6 +2011,42 @@ include_dir 'conf.d'
  
 
 
+
+ Online Checksumming
+
+ 
+  
+   checksumhelper_cost_delay (integer)
+   
+checksumhelper_cost_delay configuration parameter
+   
+   
+   
+
+ The length of time, in milliseconds, that the process will sleep when
+ the cost limit has been exceeded. The default value is zero, which
+ disables the cost-based checksumming delay feature. Positive values
+ enable cost-based checksumming.
+
+   
+  
+
+  
+   checksumhelper_cost_limit (integer)
+   
+checksumhelper_cost_limit configuration parameter
+   
+   
+   
+
+ The accumulated cost that will cause the checksumming process to sleep.
+ It is turned off by default.
+
+   
+  
+ 
+
+
 
  Asynchronous Behavior
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5dce8ef178..154cf40cd3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19582,6 +19582,74 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   
 
+  
+   Data Checksum Functions
+
+   
+The functions shown in  can
+be used to enable or disable data checksums in a running cluster.
+See  for details.
+   
+
+   
+Checksum SQL Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+  
+ 
+ 
+  
+   
+
+ pg_enable_data_checksums
+
+pg_enable_data_checksums()
+   
+   
+void
+   
+   
+
+ Initiates data checksums for the cluster. This will switch the data checksums mode
+ to in progress, but will not initiate the checksumming process.
+ In order to start checksumming the data pages the database must be restarted. Upon
+ restart a background worker will start processing all data in the database and enable
+ checksums for it. When all data pages have had checksums enabled, the cluster will
+ automatically switch to checksums on.
+
+
+ The checksumhelper_cost_delay and
+ checksumhelper_cost_limit GUCs are used to
+ throttle the speed of the process, using the same principles as
+ Cost-based Vacuum Delay.
+
+   
+  
+  
+   
+
+ pg_disable_data_checksums
+
+pg_disable_data_checksums()
+   
+   
+void
+   
+   
+Disables data checksums for the cluster. This takes effect immediately.
+   
+  
+ 
+
+   
+
+  
+
   
Database Object Management Functions
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 4489b585c7..be489e78b9 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -214,9 +214,9 @@ PostgreSQL documentation

 Use checksums on data pages to help detect corruption by the
 I/O system that would otherwise be silent. 

Re: Online enabling of checksums

2018-04-15 Thread Magnus Hagander
On Tue, Apr 10, 2018 at 6:18 PM, Robert Haas  wrote:

> On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund  wrote:
> > This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> > they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> > on it.
>
> Just a fine-grained note on this particular point:
>
> It's totally fine for parallel-restricted operations to write WAL,
> write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
> mean, the last one might be a bad idea for geopolitical reasons, but
> it's not a problem for parallel query.  It is a problem to insert or
> update heap tuples because it might extend the relation; mutual
> exclusion doesn't work properly there yet (there was a patch to fix
> that, but you had some concerns and it didn't go in).  It is a problem
> to update or delete heap tuples which might create new combo CIDs; not
> all workers will have the same view (there's no patch for this yet
> AFAIK, but the fix probably doesn't look that different from
> cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
> of the same infrastructure).
>
> TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
> non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.
>
>
That's a good summary, thanks!

Just to be clear in this case though -- the function itself doesn't write
out *anything*. It only starts a background worker that later does it. The
background worker itself is not parallelized, so the risk in this
particular use case would be that we ended up starting multiple workers (or
just failed), I think.

But the summary is very good to have regardless! :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-10 Thread Robert Haas
On Fri, Apr 6, 2018 at 8:59 PM, Andres Freund  wrote:
> This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
> they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
> on it.

Just a fine-grained note on this particular point:

It's totally fine for parallel-restricted operations to write WAL,
write to the filesystem, or launch nukes at ${ENEMY_NATION}.  Well, I
mean, the last one might be a bad idea for geopolitical reasons, but
it's not a problem for parallel query.  It is a problem to insert or
update heap tuples because it might extend the relation; mutual
exclusion doesn't work properly there yet (there was a patch to fix
that, but you had some concerns and it didn't go in).  It is a problem
to update or delete heap tuples which might create new combo CIDs; not
all workers will have the same view (there's no patch for this yet
AFAIK, but the fix probably doesn't look that different from
cc5f81366c36b3dd8f02bd9be1cf75b2cc8482bd and could probably use most
of the same infrastructure).

TL;DR: Writing pages (e.g. to set a checksum) doesn't make something
non-parallel-safe.  Writing heap tuples makes it parallel-unsafe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Online enabling of checksums

2018-04-09 Thread Magnus Hagander
On Sat, Apr 7, 2018 at 6:22 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> > Note however that I'm sans-laptop until Sunday, so I will revert it then
> or
> > possibly Monday.
>
> I'll deactivate the isolationtester tests until then. They've been
> intermittently broken for days now and prevent other tests from being
> exercised.
>

Thanks.

I've pushed the revert now, and left the pg_verify_checksums in place for
the time being.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 08:57:03 +0200, Magnus Hagander wrote:
> Note however that I'm sans-laptop until Sunday, so I will revert it then or
> possibly Monday.

I'll deactivate the isolationtester tests until then. They've been
intermittently broken for days now and prevent other tests from being
exercised.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-07 Thread Andres Freund
Hi,

On 2018-04-07 10:14:49 +0200, Michael Banck wrote:
> Can the pg_verify_checksums command be kept at least, please? 
> 
> AFAICT this one is not contentious, the code is isolated, it's really
> useful, orthogonal to online checksum activation and arguably could've
> been committed as a separate patch anyway.

I've not looked at it in any meaningful amount of detail, but it does
seem a lot lower risk from here.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-07 Thread Michael Banck
Hi,

On Sat, Apr 07, 2018 at 08:57:03AM +0200, Magnus Hagander wrote:
> On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund  wrote:
> > If it's not obvious: This isn't ready, should be reverted, cleaned up,
> > and re-submitted for v12.
> 
> While I do think that it's still definitely fixable in time for 11, I won't
> argue for it. Will revert.

:(

Can the pg_verify_checksums command be kept at least, please? 

AFAICT this one is not contentious, the code is isolated, it's really
useful, orthogonal to online checksum activation and arguably could've
been committed as a separate patch anyway.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: Online enabling of checksums

2018-04-07 Thread Magnus Hagander
On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund  wrote:

> On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> > + /*
> > +  * Create a database list.  We don't need to concern ourselves with
> > +  * rebuilding this list during runtime since any database created
> after
> > +  * this process started will be running with checksums turned on
> from the
> > +  * start.
> > +  */
> >
> > Why is this true? What if somebody runs CREATE DATABASE while the
> > launcher / worker are processing a different database? It'll copy the
> > template database on the filesystem level, and it very well might not
> > yet have checksums set?  Afaict the second time we go through this list
> > that's not cought.
>
> *caught
>
> It's indeed trivial to reproduce this, just slowing down a checksum run
> and copying the database yields:
> ./pg_verify_checksums -D /srv/dev/pgdev-dev
> pg_verify_checksums: checksum verification failed in file
> "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7
> but expected 0
> pg_verify_checksums: checksum verification failed in file
> "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D
> but expected 0
>
>
>
> further complaints:
>
> The new isolation test cannot be re-run on an existing cluster. That's
> because the first test expects checksums to be disabled. As even
> remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off
>
> How's that ok? You can leave database-wide objects around, but the
> cluster-wide stuff needs to be cleaned up.
>
>
> The tests don't actually make sure that no checksum launcher / apply is
> running anymore. They just assume that it's gone once the GUC shows
> checksums have been set.  If you wanted to make the tests stable, you'd
> need to wait for that to show true *and* then check that no workers are
> around anymore.
>
>
> If it's not obvious: This isn't ready, should be reverted, cleaned up,
> and re-submitted for v12.
>

While I do think that it's still definitely fixable in time for 11, I won't
argue for it. Will revert.

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
Here's a pass through the patch:
@@ -1033,7 +1034,7 @@ XLogInsertRecord(XLogRecData *rdata,
Assert(RedoRecPtr < Insert->RedoRecPtr);
RedoRecPtr = Insert->RedoRecPtr;
}
-   doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites);
+   doPageWrites = (Insert->fullPageWrites || Insert->forcePageWrites || DataChecksumsInProgress());

if (fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr && doPageWrites)

Why does this need an in-progress specific addition? Given that we
unconditionally log FPWs for all pages, and

#define XLogHintBitIsNeeded() (DataChecksumsNeedWrite() || wal_log_hints)

it's not clear what this achieves?  At the very least needs a comment.


@@ -4748,12 +4745,90 @@ GetMockAuthenticationNonce(void)
  * Are checksums enabled for data pages?
  */
 bool
-DataChecksumsEnabled(void)
+DataChecksumsNeedWrite(void)
 {
Assert(ControlFile != NULL);
return (ControlFile->data_checksum_version > 0);
 }

+bool
+DataChecksumsNeedVerify(void)
+{
+   Assert(ControlFile != NULL);
+
+   /*
+* Only verify checksums if they are fully enabled in the cluster. In
+* inprogress state they are only updated, not verified.
+*/
+   return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_VERSION);
+}
+
+bool
+DataChecksumsInProgress(void)
+{
+   Assert(ControlFile != NULL);
+   return (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION);
+}

As previously mentioned, the locking model around this is unclear.  It's
probably fine due to to surrounding memory barriers, but that needs to
be very very explicitly documented.


+void
+SetDataChecksumsOn(void)
+{
+   Assert(ControlFile != NULL);
+
+   LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
+
+   if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+   {
+   LWLockRelease(ControlFileLock);
+   elog(ERROR, "Checksums not in inprogress mode");
+   }
+
+   ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
+   UpdateControlFile();
+   LWLockRelease(ControlFileLock);
+
+   XlogChecksums(PG_DATA_CHECKSUM_VERSION);


As I've explained in
https://www.postgresql.org/message-id/20180406235126.d4sg4dtgicdpu...@alap3.anarazel.de
this appears to be unsafe.  There's no guarantee that the
ControlFile->data_checksum_version hasn't intermittently been set to 0.


@@ -7788,6 +7863,16 @@ StartupXLOG(void)
 */
CompleteCommitTsInitialization();

+   /*
+* If we reach this point with checksums in inprogress state, we notify
+* the user that they need to manually restart the process to enable
+* checksums.
+*/
+   if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
+   ereport(WARNING,
+   (errmsg("checksum state is \"inprogress\" with no worker"),
+errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
+

Hm, so one manually has to take action here. Any reason we can't just
re-start the worker? Also, this'll be issued on standbys etc too, that
seems misleading?

+
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
+Datum
+disable_data_checksums(PG_FUNCTION_ARGS)
+{
+   /*
+* If we don't need to write new checksums, then clearly they are already
+* disabled.
+*/
+   if (!DataChecksumsNeedWrite())
+   ereport(ERROR,
+   (errmsg("data checksums already disabled")));
+
+   ShutdownChecksumHelperIfRunning();
+
+   SetDataChecksumsOff();
+
+   PG_RETURN_VOID();

Unsafe, see SetDataChecksumsOn comment above.

Shouldn't this be named with a pg_? We normally do that for SQL callable
functions, no? See the preceding functions.

This function is marked PROPARALLEL_SAFE. That can't be right?

Also, shouldn't this refuse to work if called in recovery mode? Erroring
out with
ERROR:  XX000: cannot make new WAL entries during recovery
doesn't seem right.


+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
+Datum
+enable_data_checksums(PG_FUNCTION_ARGS)

This is PROPARALLEL_RESTRICTED. That doesn't strike me right, shouldn't
they be PROPARALLEL_UNSAFE? It might be fine, but I'd not want to rely
on it.


+/*
+ * Main entry point for checksumhelper launcher process.
+ */
+bool
+StartChecksumHelperLauncher(int cost_delay, int cost_limit)

entry point sounds a bit like it's the bgw invoked routine...


+/*
+ * ShutdownChecksumHelperIfRunning
+ * Request shutdown of the 

Re: Online enabling of checksums

2018-04-06 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> > I'm quite sure that bringing up MERGE in this thread and saying it needs
> > to be reverted without even having the committer of that feature on the
> > CC list isn't terribly useful and conflates two otherwise unrelated
> > patches and efforts.
> 
> Robert also mentioned it on the other thread, so... And no, they're not
> unrelated matters, in that it's pushing half baked stuff.

Apparently I've missed where he specifically called for it to be
reverted then, which is fine, and my apologies for missing it amongst
the depth of that particular thread.  I do think that specifically
asking for it to be reverted is distinct from expressing concerns about
it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-07 01:27:13 +0200, Daniel Gustafsson wrote:
> > On 07 Apr 2018, at 01:13, Andres Freund  wrote:
> > 
> > On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> >>> I'm fairly certain that the bug here is a simple race condition in the
> >>> test (not the main code!):
> >> 
> >> I wonder if it may perhaps be a case of both?
> > 
> > See my other message about the atomic fallback bit.
> 
> Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
> suspicion.

But note it fails because the code using it is WRONG. There's a reason
there's "unlocked" in the name. But even leaving that aside, it would
probably *still* be wrong if it were locked.

It seems *extremely* dubious that we allow re-enabling the checksums
while a worker is still doing stuff for the old cycle in the
background.  Consider what happens if the checksum helper is currently
doing RequestCheckpoint() (something that can certainly take a *LONG*
while). Another process disables checksums. Pages get written out
without checksums. Yet another process re-enables checksums. Helper
process does SetDataChecksumsOn(). Which succeeds because

if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
{
LWLockRelease(ControlFileLock);
elog(ERROR, "Checksums not in inprogress mode");
}

succeeds.  Boom. Cluster with partially set checksums but marked as
valid.


Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 19:31:56 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Robert Haas (robertmh...@gmail.com) wrote:
> > On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund  wrote:
> > Now one can entirely quibble with the rationale that this is ok (I'll
> > > post a patch cleaning up the atomics simulation of flags in a bit), but
> > > this is certainly not a correct locking strategy.
> > 
> > I think we have enough evidence at this point to conclude that this
> > patch, along with MERGE, should be reverted.
> 
> I'm not sure that I see some issues around getting the locking correct
> when starting/stopping the process is really evidence of a major problem
> with the patch-

Note that there've been several other things mentioned in the
thread. I'll add some more in a bit.


> yes, it obviously needs to be fixed and it would have been unfortuante
> if we hadn't caught it, but a good bit of effort appears to have been
> taken to ensure that exactly this is tested (which is in part why the
> buildfarm is failing) and this evidently found an existing bug, which
> is hardly this patch's fault.

THAT is the problem. It costs people that haven't been involved in the
feature time. I've friggin started debugging this because nobody else
could be bothered. Even though I'd planned to spend that time on other
patches that have been submitted far ahead in time.


> I'm quite sure that bringing up MERGE in this thread and saying it needs
> to be reverted without even having the committer of that feature on the
> CC list isn't terribly useful and conflates two otherwise unrelated
> patches and efforts.

Robert also mentioned it on the other thread, so... And no, they're not
unrelated matters, in that it's pushing half baked stuff.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Stephen Frost
Greetings,

* Robert Haas (robertmh...@gmail.com) wrote:
> On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund  wrote:
> > Now one can entirely quibble with the rationale that this is ok (I'll
> > post a patch cleaning up the atomics simulation of flags in a bit), but
> > this is certainly not a correct locking strategy.
> 
> I think we have enough evidence at this point to conclude that this
> patch, along with MERGE, should be reverted.

I'm not sure that I see some issues around getting the locking correct
when starting/stopping the process is really evidence of a major problem
with the patch- yes, it obviously needs to be fixed and it would have
been unfortuante if we hadn't caught it, but a good bit of effort
appears to have been taken to ensure that exactly this is tested (which
is in part why the buildfarm is failing) and this evidently found an
existing bug, which is hardly this patch's fault.

In short, I don't agree (yet..) that this needs reverting.

I'm quite sure that bringing up MERGE in this thread and saying it needs
to be reverted without even having the committer of that feature on the
CC list isn't terribly useful and conflates two otherwise unrelated
patches and efforts.  Let's try to use the threads the way they're
intended and keep our responses to each on their respective threads.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Online enabling of checksums

2018-04-06 Thread Daniel Gustafsson
> On 07 Apr 2018, at 01:13, Andres Freund  wrote:
> 
> On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
>>> I'm fairly certain that the bug here is a simple race condition in the
>>> test (not the main code!):
>> 
>> I wonder if it may perhaps be a case of both?
> 
> See my other message about the atomic fallback bit.

Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
suspicion.

cheers ./daniel



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> > I'm fairly certain that the bug here is a simple race condition in the
> > test (not the main code!):
> 
> I wonder if it may perhaps be a case of both?

See my other message about the atomic fallback bit.


> > It's
> > exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> > make sure that a process has finished exiting.  Then followup tests fail
> > because the process is still running
> 
> I can reproduce the error when building with --disable-atomics, and it seems
> that all the failing members either do that, lack atomic.h, lack atomics or a
> combination.

atomics.h isn't important, it's just relevant for solaris (IIRC).  Only
one of the failing ones lacks atomics afaict. See

On 2018-04-06 14:19:09 -0700, Andres Freund wrote:
> Is that an explanation for
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial=2018-04-06%2019%3A18%3A11
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack=2018-04-06%2016%3A03%3A01
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2018-04-06%2015%3A46%3A16
> ? Those all don't seem fall under that? Having proper atomics?

So there it's the timing. Note that they didn't always fail either.


> > really?  Let's just force the test to take at least 6s purely from
> > sleeping?
> 
> The test needs continuous reading in a session to try and trigger any bugs in
> read access on the cluster during checksumming, is there a good way to do that
> in the isolationtester?  I have failed to find a good way to repeat a step
> like that, but I might be missing something.

IDK, I know this isn't right.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Daniel Gustafsson
> On 07 Apr 2018, at 00:23, Andres Freund  wrote:

> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
>> Applying this makes the _cancel test pass, moving the failure instead to the
>> following _enable test (which matches what coypu and mylodon are seeing).
> 
> FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
> get the buildfarm green again.

Sorry about that, I’m a bit slow due to various $family situations at the
moment.

> I'm fairly certain that the bug here is a simple race condition in the
> test (not the main code!):

I wonder if it may perhaps be a case of both?

> It's
> exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
> make sure that a process has finished exiting.  Then followup tests fail
> because the process is still running

I can reproduce the error when building with --disable-atomics, and it seems
that all the failing members either do that, lack atomic.h, lack atomics or a
combination.  checksumhelper cancellation use pg_atomic_unlocked_test_flag() to
test if it’s running when asked to abort, something which seems unsafe to do in
semaphore simulation as it always returns true.  If I for debugging synthesize
a flag test with testset/clear, the tests pass green (with the upstream patch
for pg_atomic_test_set_flag_impl() applied).  Cancelling with semaphore sim is
thus doomed to never work IIUC.  Or it’s a red herring.

As Magnus mentioned upstream, rewriting to not use an atomic flag is probably
the best option, once the current failure is understood.

> really?  Let's just force the test to take at least 6s purely from
> sleeping?

The test needs continuous reading in a session to try and trigger any bugs in
read access on the cluster during checksumming, is there a good way to do that
in the isolationtester?  I have failed to find a good way to repeat a step like
that, but I might be missing something.

cheers ./daniel


Re: Online enabling of checksums

2018-04-06 Thread Robert Haas
On Fri, Apr 6, 2018 at 6:56 PM, Andres Freund  wrote:
> Now one can entirely quibble with the rationale that this is ok (I'll
> post a patch cleaning up the atomics simulation of flags in a bit), but
> this is certainly not a correct locking strategy.

I think we have enough evidence at this point to conclude that this
patch, along with MERGE, should be reverted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 14:33:48 -0700, Andres Freund wrote:
> On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> > Looking into the isolationtester failure on piculet, which builds using
> > --disable-atomics, and locust which doesn’t have atomics, the code for
> > pg_atomic_test_set_flag seems a bit odd.
> > 
> > TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> > defined to return True if it could set.  When running without atomics,
> > don’t we need to do something like the below diff to make these APIs match?  :
> > 
> > --- a/src/backend/port/atomics.c
> > +++ b/src/backend/port/atomics.c
> > @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
> >  bool
> >  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
> >  {
> > -   return TAS((slock_t *) &ptr->sema);
> > +   return TAS((slock_t *) &ptr->sema) == 0;
> >  }
> 
> Yes, this looks wrong.

And the reason the tests fail reliably after is because the locking
model around ChecksumHelperShmem->launcher_started arguably is broken:

/* If the launcher isn't started, there is nothing to shut down */
	if (pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
		return;

This uses a non-concurrency safe primitive. Which then spuriously
triggers:

#define PG_HAVE_ATOMIC_UNLOCKED_TEST_FLAG
static inline bool
pg_atomic_unlocked_test_flag_impl(volatile pg_atomic_flag *ptr)
{
/*
	 * Can't do this efficiently in the semaphore based implementation - we'd
 * have to try to acquire the semaphore - so always return true. That's
 * correct, because this is only an unlocked test anyway. Do this in the
 * header so compilers can optimize the test away.
 */
return true;
}

Now one can entirely quibble with the rationale that this is ok (I'll
post a patch cleaning up the atomics simulation of flags in a bit), but
this is certainly not a correct locking strategy.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
Hi,

On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Applying this makes the _cancel test pass, moving the failure instead to the
> following _enable test (which matches what coypu and mylodon are seeing).

FWIW, I'm somewhat annoyed that I'm now spending time debugging this to
get the buildfarm green again.

I'm fairly certain that the bug here is a simple race condition in the
test (not the main code!):

The flag informing whether the worker has started is cleared via an
on_shmem_exit() hook:

static void
launcher_exit(int code, Datum arg)
{
ChecksumHelperShmem->abort = false;
	pg_atomic_clear_flag(&ChecksumHelperShmem->launcher_started);
}

but the wait in the test is done via functions like:

CREATE OR REPLACE FUNCTION test_checksums_on() RETURNS boolean AS $$
DECLARE
enabled boolean;
BEGIN
LOOP
		SELECT setting = 'on' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
IF enabled THEN
EXIT;
END IF;
PERFORM pg_sleep(1);
END LOOP;
RETURN enabled;
END;
$$ LANGUAGE plpgsql;

INSERT INTO t1 (b, c) VALUES (generate_series(1,1), 'starting values');

CREATE OR REPLACE FUNCTION test_checksums_off() RETURNS boolean AS $$
DECLARE
enabled boolean;
BEGIN
PERFORM pg_sleep(1);
		SELECT setting = 'off' INTO enabled FROM pg_catalog.pg_settings WHERE name = 'data_checksums';
RETURN enabled;
END;
$$ LANGUAGE plpgsql;

which just waits for setting checksums to have finished.  It's
exceedingly unsurprising that a 'pg_sleep(1)' is not a reliable way to
make sure that a process has finished exiting.  Then followup tests fail
because the process is still running.

Also:
CREATE OR REPLACE FUNCTION reader_loop() RETURNS boolean AS $$
DECLARE
counter integer;
BEGIN
FOR counter IN 1..30 LOOP
PERFORM count(a) FROM t1;
PERFORM pg_sleep(0.2);
END LOOP;
RETURN True;
END;
$$ LANGUAGE plpgsql;
}

really?  Let's just force the test to take at least 6s purely from
sleeping?

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 02:28:17 +0200, Daniel Gustafsson wrote:
> Looking into the isolationtester failure on piculet, which builds using
> --disable-atomics, and locust which doesn’t have atomics, the code for
> pg_atomic_test_set_flag seems a bit odd.
> 
> TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
> defined to return True if it could set.  When running without atomics,
> don’t we need to do something like the below diff to make these APIs match?  :
> 
> --- a/src/backend/port/atomics.c
> +++ b/src/backend/port/atomics.c
> @@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
>  bool
>  pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
>  {
> -   return TAS((slock_t *) &ptr->sema);
> +   return TAS((slock_t *) &ptr->sema) == 0;
>  }

Yes, this looks wrong.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Tomas Vondra


On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
 Sure. But what would that be? I can't think of anything. A process that
 modifies a buffer (or any other piece of shared state) without holding
 some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such guarantee
> afaict, as it's required that the content lock is held in shared mode
> during IO. ProcessSingleRelationFork() happens to use exclusive mode
> (which could and possibly should be optimized), so that's probably
> sufficient from that end though.
> 

Oh, I've just realized the phrasing of my previous message was rather
confusing. What I meant to say is this:

  Sure, but the checksum version is changed before the checksumhelper
  launcher/worker is even started. So if you're holding the buffer lock
  at that time, then the buffer is essentially guaranteed to be updated
  by the worker later.

Sorry if it seemed I'm suggesting the buffer lock itself guarantees
something about the worker startup.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-06 Thread Tomas Vondra
On 04/06/2018 08:13 PM, Andres Freund wrote:
> On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
>> On 04/06/2018 07:46 PM, Andres Freund wrote:
 Sure. But what would that be? I can't think of anything. A process that
 modifies a buffer (or any other piece of shared state) without holding
 some sort of lock seems broken by default.
>>>
>>> You can quite possibly already *hold* a lock if it's not an exclusive
>>> one.
>>>
>>
>> Sure, but if you're holding the buffer lock when the checksum version is
>> changed, then the checksumhelper is obviously not running yet. In which
>> case it will update the checksum on the buffer later.
> 
> The buffer content lock itself doesn't generally give any such
> guarantee afaict, as it's required that the content lock is held in
> shared mode during IO. ProcessSingleRelationFork() happens to use
> exclusive mode (which could and possibly should be optimized), so
> that's probably sufficient from that end though.
> 

Yes.

> I'm mainly disconcerted this isn't well discussed & documented.
> 

Agreed, no argument here.


-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 14:14:40 -0400, Robert Haas wrote:
> and granted, too, what is Magnus supposed to do about a couple of
> committers expressing doubts about whether something really ought to
> be committed?  Is that an absolute bar?  It wasn't phrased as such,
> nor do we really have the authority.  At the same time, those concerns
> didn't generate much discussion and, at least in my case, are not
> withdrawn merely because time has passed.

Yea, I don't think they're an absolute blocker. But imo more than
sufficient reason to give the list a heads-up of a day or two that the
patch is intended to be committed.

I'd only pointed the message out because JD said something about me not
having participated in the earlier discussion.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Robert Haas
On Thu, Apr 5, 2018 at 5:01 PM, Andres Freund  wrote:
> I've commented weeks ago about my doubts, and Robert concurred:
> http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com

Yes, and I expressed some previous reservations as well.  Granted, my
reservations about this patch are less than for MERGE, and granted,
too, what is Magnus supposed to do about a couple of committers
expressing doubts about whether something really ought to be
committed?  Is that an absolute bar?  It wasn't phrased as such, nor
do we really have the authority.  At the same time, those concerns
didn't generate much discussion and, at least in my case, are not
withdrawn merely because time has passed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 19:59:17 +0200, Tomas Vondra wrote:
> On 04/06/2018 07:46 PM, Andres Freund wrote:
> >> Sure. But what would that be? I can't think of anything. A process that
> >> modifies a buffer (or any other piece of shared state) without holding
> >> some sort of lock seems broken by default.
> > 
> > You can quite possibly already *hold* a lock if it's not an exclusive
> > one.
> > 
> 
> Sure, but if you're holding the buffer lock when the checksum version is
> changed, then the checksumhelper is obviously not running yet. In which
> case it will update the checksum on the buffer later.

The buffer content lock itself doesn't generally give any such guarantee
afaict, as it's required that the content lock is held in shared mode
during IO. ProcessSingleRelationFork() happens to use exclusive mode
(which could and possibly should be optimized), so that's probably
sufficient from that end though.

I'm mainly disconcerted this isn't well discussed & documented.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Tomas Vondra


On 04/06/2018 07:46 PM, Andres Freund wrote:
> On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
>> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
>> interlock. It's a pretty direct and intentional interlock, I think.
> 
> I mean it's a side-channel as far as DataChecksumsNeedWrite() is 
> concerned. You're banking on all callers using a barrier implying 
> operation around it.

Ah, OK.

> 
>> Sure. But what would that be? I can't think of anything. A process that
>> modifies a buffer (or any other piece of shared state) without holding
>> some sort of lock seems broken by default.
> 
> You can quite possibly already *hold* a lock if it's not an exclusive
> one.
> 

Sure, but if you're holding the buffer lock when the checksum version is
changed, then the checksumhelper is obviously not running yet. In which
case it will update the checksum on the buffer later.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 19:40:59 +0200, Tomas Vondra wrote:
> In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
> interlock. It's a pretty direct and intentional interlock, I think.

I mean it's a side-channel as far as DataChecksumsNeedWrite() is
concerned. You're banking on all callers using a barrier implying
operation around it.


> Sure. But what would that be? I can't think of anything. A process that
> modifies a buffer (or any other piece of shared state) without holding
> some sort of lock seems broken by default.

You can quite possibly already *hold* a lock if it's not an exclusive
one.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Tomas Vondra
On 04/06/2018 07:22 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
>>> Oh, that's not my intention either -- I just wanted to make sure I
>>> was thinking about the same issue you were.
> 
>> I agree we shouldn't rely on chance here - if we might read a stale
>> value, we need to fix that of course.
> 
> It's perfectly possible that some side-conditions mitigate this. What
> concerns me is that
> a) Nobody appears to have raised this issue beforehand, besides an
>unlocked read of a critical variable being a fairly obvious
>issue. This kind of thing needs to be carefully thought about.
> b) If there's some "side channel" interlock, it's not documented.
> 
> I noticed the issue because of an IM question about the general feature,
> and I did a three minute skim and saw the read without a comment.
> 

All I can say is that I did consider this issue while reviewing the
patch, and I've managed to convince myself it's not an issue (using the
logic that I've just outlined here). Which is why I haven't raised it as
an issue, because I don't think it is.

You're right it might have been mentioned explicitly, of course.

In any case, I wouldn't call LockBufHdr/UnlockBufHdr a "side channel"
interlock. It's a pretty direct and intentional interlock, I think.

> 
>> I'm not quite sure I fully understand the issue, though. I assume both
>> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
>> happen the process would need to be already past LockBufHdr when the
>> checksum version is updated. In which case it can use a stale version
>> when writing the buffer out. Correct?
> 
> Yes, they are memory barriers.
> 

Phew! ;-)

> 
>> I wonder if that's actually a problem, considering the checksum worker
>> will then overwrite all data with correct checksums anyway. So the other
>> process would have to overwrite the buffer after checksum worker, at
>> which point it'll have to go through LockBufHdr.
> 
> Again, I'm not sure if there's some combination of issues that make this
> not a problem in practice. I just *asked* if there's something
> preventing this from being a problem.
> 
> The really problematic case would be if it is possible for some process
> to wait long enough, without executing a barrier-implying operation,
> that it'd try to write out a page that the checksum worker has already
> passed over.
> 

Sure. But what would that be? I can't think of anything. A process that
modifies a buffer (or any other piece of shared state) without holding
some sort of lock seems broken by default.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
Hi,

On 2018-04-06 14:34:43 +0200, Tomas Vondra wrote:
> > Oh, that's not my intention either -- I just wanted to make sure I
> > was thinking about the same issue you were.

> I agree we shouldn't rely on chance here - if we might read a stale
> value, we need to fix that of course.

It's perfectly possible that some side-conditions mitigate this. What
concerns me is that
a) Nobody appears to have raised this issue beforehand, despite an
   unlocked read of a critical variable being a fairly obvious
   issue. This kind of thing needs to be carefully thought about.
b) If there's some "side channel" interlock, it's not documented.

I noticed the issue because of an IM question about the general feature,
and I did a three minute skim and saw the read without a comment.


> I'm not quite sure I fully understand the issue, though. I assume both
> LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
> happen the process would need to be already past LockBufHdr when the
> checksum version is updated. In which case it can use a stale version
> when writing the buffer out. Correct?

Yes, they are memory barriers.


> I wonder if that's actually a problem, considering the checksum worker
> will then overwrite all data with correct checksums anyway. So the other
> process would have to overwrite the buffer after checksum worker, at
> which point it'll have to go through LockBufHdr.

Again, I'm not sure if there's some combination of issues that make this
not a problem in practice. I just *asked* if there's something
preventing this from being a problem.

The really problematic case would be if it is possible for some process
to wait long enough, without executing a barrier-implying operation,
that it'd try to write out a page that the checksum worker has already
passed over.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Andres Freund
On 2018-04-06 11:25:59 +0200, Magnus Hagander wrote:
> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start doing
> anything. Obviously transactions != buffer writes (and we have things like
> the checkpointer/bgwriter to consider). Is there something else that we
> could safely just *wait* for? I have no problem whatsoever if this is a
> long wait (given the total time). I mean to the point of "what if we just
> stick a sleep(10) in there" level waiting.

I don't think anything just related to "time" is reasonable in any sort
of way. On an overloaded system you can see long long stalls of processes
that have done a lot of work. Locking protocols should be correct, and
that's that.


> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as throwing
> a barrier in there?

Worse.

I wonder if we could introduce "MegaExpensiveRareMemoryBarrier()" that
goes through pgproc and signals every process with a signal that
requires the other side to do an operation implying a memory barrier.

That's actually not hard to do (e.g. every latch operation qualifies),
the problem is that signal delivery isn't synchronous. So you need some
acknowledgement protocol.

I think you could introduce a procsignal message that does a memory
barrier and then sets PGPROC->barrierGeneration to
ProcArrayStruct->barrierGeneration. MegaExpensiveRareMemoryBarrier()
increments ProcArrayStruct->barrierGeneration, signals everyone, and
then waits till every PGPROC->barrierGeneration has surpassed
ProcArrayStruct->barrierGeneration.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-06 Thread Tomas Vondra


On 04/06/2018 11:25 AM, Magnus Hagander wrote:
> 
> 
On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund wrote:
> 
> Hi,
> 
> On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund wrote:
> > > Is there any sort of locking that guarantees that worker processes see
> > > an up2date value of
> > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > > there's not. So you can afaict end up with checksums being computed by
> > > the worker, but concurrent writes missing them.  The window is going to
> > > be at most one missed checksum per process (as the unlocking of the page
> > > is a barrier) and is probably not easy to hit, but that's dangerous
> > > enough.
> > >
> >
> > So just to be clear of the case you're worried about. It's basically:
> > Session #1 - sets checksums to inprogress
> > Session #1 - starts dynamic background worker ("launcher")
> > Launcher reads and enumerates pg_database
> > Launcher starts worker in first database
> > Worker processes first block of data in database
> > And at this point, Session #2 has still not seen the "checksums inprogress"
> > flag and continues to write without checksums?
> 
> Yes.  I think there are some variations of that, but yes, that's pretty
> much it.
> 
> 
> > That seems like quite a long time to me -- is that really a problem?
> 
> We don't generally build locking models that are only correct based on
> likelihood. Especially not without a lengthy comment explaining that
> analysis.
> 
> 
> Oh, that's not my intention either -- I just wanted to make sure I
> was thinking about the same issue you were.
> 

I agree we shouldn't rely on chance here - if we might read a stale
value, we need to fix that of course.

I'm not quite sure I fully understand the issue, though. I assume both
LockBufHdr and UnlockBufHdr are memory barriers, so for bad things to
happen the process would need to be already past LockBufHdr when the
checksum version is updated. In which case it can use a stale version
when writing the buffer out. Correct?

I wonder if that's actually a problem, considering the checksum worker
will then overwrite all data with correct checksums anyway. So the other
process would have to overwrite the buffer after checksum worker, at
which point it'll have to go through LockBufHdr.

So I'm not sure I see the problem here. But perhaps LockBufHdr is not a
memory barrier?


> Since you know a lot more about that type of interlocks than I do :) We
> already wait for all running transactions to finish before we start
> doing anything. Obviously transactions != buffer writes (and we have
> things like the checkpointer/bgwriter to consider). Is there something
> else that we could safely just *wait* for? I have no problem whatsoever
> if this is a long wait (given the total time). I mean to the point of
> "what if we just stick a sleep(10) in there" level waiting.
> 
> Or can that somehow be cleanly solved using some of the new atomic
> operators? Or is that likely to cause the same kind of overhead as
> throwing a barrier in there?
> 

Perhaps the easiest thing we could do is walk shared buffers and do
LockBufHdr/UnlockBufHdr, which would guarantee no session is in the process
of writing out a buffer with possibly stale checksum flag. Of course,
it's a bit brute-force-ish, but it's not that different from the waits
for running transactions and temporary tables.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-06 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 11:41 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> > On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund 
> wrote:
> > > Is there any sort of locking that guarantees that worker processes see
> > > an up2date value of
> > > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > > there's not. So you can afaict end up with checksums being computed by
> > > the worker, but concurrent writes missing them.  The window is going to
> > > be at most one missed checksum per process (as the unlocking of the page
> > > is a barrier) and is probably not easy to hit, but that's dangerous
> > > enough.
> > >
> >
> > So just to be clear of the case you're worried about. It's basically:
> > Session #1 - sets checksums to inprogress
> > Session #1 - starts dynamic background worker ("launcher")
> > Launcher reads and enumerates pg_database
> > Launcher starts worker in first database
> > Worker processes first block of data in database
> > And at this point, Session #2 has still not seen the "checksums inprogress"
> > flag and continues to write without checksums?
>
> Yes.  I think there are some variations of that, but yes, that's pretty
> much it.
>
>
> > That seems like quite a long time to me -- is that really a problem?
>
> We don't generally build locking models that are only correct based on
> likelihood. Especially not without a lengthy comment explaining that
> analysis.
>

Oh, that's not my intention either -- I just wanted to make sure I was
thinking about the same issue you were.

Since you know a lot more about that type of interlocks than I do :) We
already wait for all running transactions to finish before we start doing
anything. Obviously transactions != buffer writes (and we have things like
the checkpointer/bgwriter to consider). Is there something else that we
could safely just *wait* for? I have no problem whatsoever if this is a
long wait (given the total time). I mean to the point of "what if we just
stick a sleep(10) in there" level waiting.

Or can that somehow be cleanly solved using some of the new atomic
operators? Or is that likely to cause the same kind of overhead as throwing
a barrier in there?


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Daniel Gustafsson
> On 05 Apr 2018, at 23:13, Magnus Hagander  wrote:

> (And yes, we've noticed it's failing in isolationtester on a number of boxes 
> -- Daniel is currently investigating)

Looking into the isolationtester failure on piculet, which builds using
--disable-atomics, and locust which doesn’t have atomics, the code for
pg_atomic_test_set_flag seems a bit odd.

TAS() is defined to return zero if successful, and pg_atomic_test_set_flag()
is defined to return true if it could set the flag.  When running without
atomics, don’t we need to do something like the below diff to make these
APIs match?

--- a/src/backend/port/atomics.c
+++ b/src/backend/port/atomics.c
@@ -73,7 +73,7 @@ pg_atomic_init_flag_impl(volatile pg_atomic_flag *ptr)
 bool
 pg_atomic_test_set_flag_impl(volatile pg_atomic_flag *ptr)
 {
-   return TAS((slock_t *) &ptr->sema);
+   return TAS((slock_t *) &ptr->sema) == 0;
 }

Applying this makes the _cancel test pass, moving the failure instead to the
following _enable test (which matches what coypu and mylodon are seeing).

AFAICT there are no other callers of this than the online checksum code, and
this is not executed by the tests when running without atomics, which could
explain why nothing else is broken.

Before continuing the debugging, does this theory hold any water?  This isn’t
code I’m deeply familiar with so would appreciate any pointers.

cheers ./daniel


Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 11:48 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > I have now pushed this latest version with some minor text adjustments
> and
> > a catversion bump.
>
> crake is not happy --- it's failing cross-version upgrade tests because:
>
>
> Performing Consistency Checks
> -
> Checking cluster versions   ok
>
> old cluster uses data checksums but the new one does not
> Failure, exiting
>
>
> This seems to indicate that you broke pg_upgrade's detection of
> checksumming status, or that this patch changed the default
> checksum state (which it surely isn't described as doing).
>

It's not supposed to.

Without checking into it (just about off to bed now), one guess is that
it's actually a leftover from a previous stage -- what state is the cluster
actually in when it does that upgrade? Because the specific checksums tests
do leave their cluster with checksums on, which I think would perhaps be
the outcome of the testmodules-install-check-C test. The actual definitions
of those tests are somewhere in the buildfarm client code, right?

In that case, the easy fix is probably to have the checksums tests actually
turn off the checksums again when they're done.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Tom Lane
Magnus Hagander  writes:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

crake is not happy --- it's failing cross-version upgrade tests because:


Performing Consistency Checks
-
Checking cluster versions   ok

old cluster uses data checksums but the new one does not
Failure, exiting


This seems to indicate that you broke pg_upgrade's detection of
checksumming status, or that this patch changed the default
checksum state (which it surely isn't described as doing).

regards, tom lane



Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 23:32:19 +0200, Magnus Hagander wrote:
> On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund  wrote:
> > Is there any sort of locking that guarantees that worker processes see
> > an up2date value of
> > DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> > there's not. So you can afaict end up with checksums being computed by
> > the worker, but concurrent writes missing them.  The window is going to
> > be at most one missed checksum per process (as the unlocking of the page
> > is a barrier) and is probably not easy to hit, but that's dangerous
> > enough.
> >
> 
> So just to be clear of the case you're worried about. It's basically:
> Session #1 - sets checksums to inprogress
> Session #1 - starts dynamic background worker ("launcher")
> Launcher reads and enumerates pg_database
> Launcher starts worker in first database
> Worker processes first block of data in database
> And at this point, Session #2 has still not seen the "checksums inprogress"
> flag and continues to write without checksums?

Yes.  I think there are some variations of that, but yes, that's pretty
much it.


> That seems like quite a long time to me -- is that really a problem?

We don't generally build locking models that are only correct based on
likelihood. Especially not without a lengthy comment explaining that
analysis.


> I'm guessing you're seeing a shorter path between the two that I can't
> see right now (I'll blame the late evening...)?

I don't think it matters terribly much how long that path is.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 11:23 PM, Andres Freund  wrote:

> Hi,
>
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments
> and
> > a catversion bump.
>
> Is there any sort of locking that guarantees that worker processes see
> an up2date value of
> DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
> there's not. So you can afaict end up with checksums being computed by
> the worker, but concurrent writes missing them.  The window is going to
> be at most one missed checksum per process (as the unlocking of the page
> is a barrier) and is probably not easy to hit, but that's dangerous
> enough.
>

So just to be clear of the case you're worried about. It's basically:
Session #1 - sets checksums to inprogress
Session #1 - starts dynamic background worker ("launcher")
Launcher reads and enumerates pg_database
Launcher starts worker in first database
Worker processes first block of data in database
And at this point, Session #2 has still not seen the "checksums inprogress"
flag and continues to write without checksums?

That seems like quite a long time to me -- is that really a problem? I'm
guessing you're seeing a shorter path between the two that I can't see
right now (I'll blame the late evening...)?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
Hi,

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.

Is there any sort of locking that guarantees that worker processes see
an up2date value of
DataChecksumsNeedWrite()/ControlFile->data_checksum_version? Afaict
there's not. So you can afaict end up with checksums being computed by
the worker, but concurrent writes missing them.  The window is going to
be at most one missed checksum per process (as the unlocking of the page
is a barrier) and is probably not easy to hit, but that's dangerous
enough.

Just plonking a barrier into DataChecksumsNeedWrite() etc is a
possibility, but it's also not free...

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-05 Thread Peter Geoghegan
On Thu, Apr 5, 2018 at 1:51 PM, Joshua D. Drake  wrote:
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks of which you barely participated. I
> certainly understand wanting some notice before commit but there has been
> lots of discussion, multiple people publicly commenting on the patch and
> Magnus has been very receptive to all feedback (that I have seen). Perhaps
> we are being a bit sensitive because of another patch that is actually
> ramrodding the process and we need to take a step back?

I wish it was just one patch. I can think of several.

-- 
Peter Geoghegan



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 11:09 PM, Peter Geoghegan  wrote:

> On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund  wrote:
> >>At least this patch was posted on the lists before commit, unlike many
> >>others from many different people. And AFAIK there has never been such a
> >>rule.
>
> The rules cannot possibly anticipate every situation or subtlety. The
> letter of the law is a slightly distinct thing to its spirit.
>
> > The more debatable a decision is, the more important it IMO becomes to
> give people a chance to object. Don't think there needs to be a hard rule
> to always announce an intent to commit.
>
> +1
>
> Andres' remarks need to be seen in the context of the past couple of
> weeks, and in the context of this being a relatively high risk patch
> that was submitted quite late in the cycle.
>

I would argue that this is a pretty isolated patch. A large majority of the
code is completely isolated from the rest. And I would argue that this
reduces the risk of the patch substantially.

(And yes, we've noticed it's failing in isolationtester on a number of
boxes -- Daniel is currently investigating)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Peter Geoghegan
On Thu, Apr 5, 2018 at 1:27 PM, Andres Freund  wrote:
>>At least this patch was posted on the lists before commit, unlike many
>>others from many different people. And AFAIK there has never been such a
>>rule.

The rules cannot possibly anticipate every situation or subtlety. The
letter of the law is a slightly distinct thing to its spirit.

> The more debatable a decision is, the more important it IMO becomes to give 
> people a chance to object. Don't think there needs to be a hard rule to 
> always announce an intent to commit.

+1

Andres' remarks need to be seen in the context of the past couple of
weeks, and in the context of this being a relatively high risk patch
that was submitted quite late in the cycle.

-- 
Peter Geoghegan



Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
On 2018-04-05 13:51:41 -0700, Joshua D. Drake wrote:
> On 04/05/2018 01:12 PM, Andres Freund wrote:
> > I want to be on the record that I think merging a nontrivial feature that
> > got submitted 2018-02-21, just before the start of the last last CF, is
> > an abuse of process, and not cool.  We've other people working hard to
> > follow the process, and circumventing it like this just signals to
> > people trying to follow the rules that they're fools.
> > 
> > Merging ~2kloc patches like that is going to cause pain. And even if
> > not, it causes procedural damage.
> 
> Perhaps I am missing something but there has been a lot of public discussion
> on this feature for the last 7 weeks of which you barely participated.

I've commented weeks ago about my doubts, and Robert concurred:
http://archives.postgresql.org/message-id/CA%2BTgmoZPRfMqZoK_Fbo_tD9OH9PdPFcPBsi-sdGZ6Jg8OMM2PA%40mail.gmail.com


> I certainly understand wanting some notice before commit but there has
> been lots of discussion, multiple people publicly commenting on the
> patch and Magnus has been very receptive to all feedback (that I have
> seen).

It's perfectly reasonable to continue review / improvement cycles of a
patch, even if it's not going to get in the current release. What does
that have to do with what I am concerned about?


> Perhaps we are being a bit sensitive because of another patch that is
> actually ramrodding the process and we need to take a step back?

No. See link above.

Please don't use "we" in this childishness-implying fashion.

- Andres



Re: Online enabling of checksums

2018-04-05 Thread Joshua D. Drake

On 04/05/2018 01:12 PM, Andres Freund wrote:

On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last last CF, is
an abuse of process, and not cool.  We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.


Perhaps I am missing something but there has been a lot of public 
discussion on this feature for the last 7 weeks of which you barely 
participated. I certainly understand wanting some notice before commit 
but there has been lots of discussion, multiple people publicly 
commenting on the patch and Magnus has been very receptive to all 
feedback (that I have seen). Perhaps we are being a bit sensitive because of 
another patch that is actually ramrodding the process and we need to 
take a step back?


Thanks,

JD




- Andres



--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Online enabling of checksums

2018-04-05 Thread Andres Freund


On April 5, 2018 1:20:52 PM PDT, Magnus Hagander  wrote:
>On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund wrote:
>
>> And even worse, without even announcing an intent to commit and giving
>> people a chance to object.
>>
>
>At least this patch was posted on the lists before commit, unlike many
>others from many different people. And AFAIK there has never been such a
>rule.

The more debatable a decision is, the more important it IMO becomes to give 
people a chance to object. Don't think there needs to be a hard rule to always 
announce an intent to commit.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 10:14 PM, Andres Freund  wrote:

> On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> > On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > > I have now pushed this latest version with some minor text adjustments
> and
> > > a catversion bump.
> > >
> > > Thanks for all the reviews!
> >
> > I want to be on the record that I think merging a nontrivial feature that
> > got submitted 2018-02-21, just before the start of the last last CF, is
> > an abuse of process, and not cool.  We've other people working hard to
> > follow the process, and circumventing it like this just signals to
> > people trying to follow the rules that they're fools.
> >
> > Merging ~2kloc patches like that is going to cause pain. And even if
> > not, it causes procedural damage.
>

I can understand those arguments, and if that's the view of others as well,
I can of course revert that.


And even worse, without even announcing an intent to commit and giving
> people a chance to object.
>

At least this patch was posted on the lists before commit, unlike many
others from many different people. And AFAIK there has never been such a
rule.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
On 2018-04-05 13:12:08 -0700, Andres Freund wrote:
> On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> > I have now pushed this latest version with some minor text adjustments and
> > a catversion bump.
> > 
> > Thanks for all the reviews!
> 
> I want to be on the record that I think merging a nontrivial feature that
> got submitted 2018-02-21, just before the start of the last last CF, is
> an abuse of process, and not cool.  We've other people working hard to
> follow the process, and circumventing it like this just signals to
> people trying to follow the rules that they're fools.
> 
> Merging ~2kloc patches like that is going to cause pain. And even if
> not, it causes procedural damage.

And even worse, without even announcing an intent to commit and giving
people a chance to object.

Greetings,

Andres Freund



Re: Online enabling of checksums

2018-04-05 Thread Andres Freund
On 2018-04-05 22:06:36 +0200, Magnus Hagander wrote:
> I have now pushed this latest version with some minor text adjustments and
> a catversion bump.
> 
> Thanks for all the reviews!

I want to be on the record that I think merging a nontrivial feature that
got submitted 2018-02-21, just before the start of the last last CF, is
an abuse of process, and not cool.  We've other people working hard to
follow the process, and circumventing it like this just signals to
people trying to follow the rules that they're fools.

Merging ~2kloc patches like that is going to cause pain. And even if
not, it causes procedural damage.

- Andres



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 7:30 PM, Magnus Hagander  wrote:

> On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin 
> wrote:
>
>>
>>
>> > 5 апр. 2018 г., в 19:58, Magnus Hagander 
>> написал(а):
>> >
>> >
>> >
>> > On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin 
>> wrote:
>> >
>> >
>> > > 5 апр. 2018 г., в 14:33, Tomas Vondra 
>> написал(а):
>> > >
>> > > This patch version seems fine to me. I'm inclined to mark it RFC.
>> > +1
>> > The patch works fine for me. I've tried different combinations of
>> backend cancelation and the only suspicious thing I found is that you can
>> start multiple workers by cancelling the launcher and not cancelling the
>> worker. Is that problematic behavior? If we run pg_enable_data_checksums()
>> it checks for an existing launcher for a reason; maybe it should check for
>> a worker too?
>> >
>> > I don't think it's a problem in itself -- it will cause pointless work,
>> but not actually cause any problems I think (whereas duplicate launchers
>> could cause interesting things to happen).
>> >
>> > How did you actually cancel the launcher to end up in this situation?
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>> select pg_enable_data_checksums(1,1);
>> select pg_sleep(0.1);
>> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
>> backend_type ~ 'checksumhelper launcher' ;
>>
>> select pid,backend_type from pg_stat_activity where backend_type
>> ~'checks';
>>   pid  | backend_type
>> -------+-----------------------
>>  98587 | checksumhelper worker
>>  98589 | checksumhelper worker
>>  98591 | checksumhelper worker
>> (3 rows)
>>
>> There is a way to shoot yourself in the foot then by calling
>> pg_disable_data_checksums(), but this is extremely stupid for a user
>>
>
>
Ah, didn't consider query cancel. I'm not sure how much we should
> actually care about it, but it's easy enough to trap that signal and just
> do a clean shutdown on it, so I've done that.
>
> PFA a patch that does that, and also rebased over the datallowconn patch
> just landed (which also removes some docs).
>
>

I have now pushed this latest version with some minor text adjustments and
a catversion bump.

Thanks for all the reviews!


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 5:08 PM, Andrey Borodin  wrote:

>
>
> > 5 апр. 2018 г., в 19:58, Magnus Hagander 
> написал(а):
> >
> >
> >
> > On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin 
> wrote:
> >
> >
> > > 5 апр. 2018 г., в 14:33, Tomas Vondra 
> написал(а):
> > >
> > > This patch version seems fine to me. I'm inclined to mark it RFC.
> > +1
> > The patch works fine for me. I've tried different combinations of
> backend cancelation and the only suspicious thing I found is that you can
> start multiple workers by cancelling the launcher and not cancelling the
> worker. Is that problematic behavior? If we run pg_enable_data_checksums()
> it checks for an existing launcher for a reason; maybe it should check for
> a worker too?
> >
> > I don't think it's a problem in itself -- it will cause pointless work,
> but not actually cause any problems I think (whereas duplicate launchers
> could cause interesting things to happen).
> >
> > How did you actually cancel the launcher to end up in this situation?
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
> select pg_enable_data_checksums(1,1);
> select pg_sleep(0.1);
> select pg_cancel_backend(pid),backend_type from pg_stat_activity where
> backend_type ~ 'checksumhelper launcher' ;
>
> select pid,backend_type from pg_stat_activity where backend_type ~'checks';
>   pid  | backend_type
> -------+-----------------------
>  98587 | checksumhelper worker
>  98589 | checksumhelper worker
>  98591 | checksumhelper worker
> (3 rows)
>
> There is a way to shoot yourself in the foot then by calling
> pg_disable_data_checksums(), but this is extremely stupid for a user
>


Ah, didn't consider query cancel. I'm not sure how much we should actually
care about it, but it's easy enough to trap that signal and just do a clean
shutdown on it, so I've done that.

PFA a patch that does that, and also rebased over the datallowconn patch
just landed (which also removes some docs).
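The trap-the-signal-and-finish-cleanly approach can be sketched as a small
standalone C snippet. This is only an illustration of the pattern, not code
from the patch: `run_until_cancel`, the block loop, and the self-raised
SIGINT are hypothetical stand-ins for the checksumhelper's page loop and for
a pg_cancel_backend() arriving from outside.

```c
#include <signal.h>

/* The handler does nothing except set a flag; the loop notices the flag
 * at the next safe point and exits cleanly instead of aborting mid-page. */
static volatile sig_atomic_t got_cancel = 0;

static void
handle_cancel(int signo)
{
    (void) signo;
    got_cancel = 1;
}

/* Process up to nblocks "pages"; raise SIGINT after block cancel_at to
 * simulate a user cancelling the backend.  Returns the number of blocks
 * completed before the clean shutdown. */
int
run_until_cancel(int nblocks, int cancel_at)
{
    int     done = 0;

    got_cancel = 0;
    signal(SIGINT, handle_cancel);

    for (int block = 0; block < nblocks; block++)
    {
        if (got_cancel)
            break;              /* clean shutdown point */
        /* ... process one page here ... */
        done++;
        if (block == cancel_at)
            raise(SIGINT);      /* simulate pg_cancel_backend() */
    }
    return done;
}
```

The point is simply that cancellation never interrupts work mid-page: the
current block is finished, then the worker shuts down in an orderly way.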



-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 122f034f17..6257563eaa 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19540,6 +19540,71 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 
   
 
+  
+   Data Checksum Functions
+
+   
+The functions shown in  can
+be used to enable or disable data checksums in a running cluster.
+See  for details.
+   
+
+   
+Checksum SQL Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+  
+ 
+ 
+  
+   
+
+ pg_enable_data_checksums
+
+pg_enable_data_checksums(cost_delay int, cost_limit int)
+   
+   
+void
+   
+   
+
+ Initiates data checksums for the cluster. This will switch the data checksums mode
+ to in progress and start a background worker that will process
+ all data in the database and enable checksums for it. When all data pages have had
+ checksums enabled, the cluster will automatically switch to checksums
+ on.
+
+
+ If cost_delay and cost_limit are
+ specified, the speed of the process is throttled using the same principles as
+ Cost-based Vacuum Delay.
+
+   
+  
+  
+   
+
+ pg_disable_data_checksums
+
+pg_disable_data_checksums()
+   
+   
+void
+   
+   
+Disables data checksums for the cluster.
+   
+  
+ 
+
+   
+
+  
+
   
Database Object Management Functions
 
diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 4e01e5641c..7cd6ee85dc 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -211,6 +211,7 @@ Complete list of usable sgml source files in this directory.
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 949b5a220f..826dd91f72 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -195,9 +195,9 @@ PostgreSQL documentation

 Use checksums on data pages to help detect corruption by the
 I/O system that would otherwise be silent. Enabling checksums
-may incur a noticeable performance penalty. This option can only
-be set during initialization, and cannot be changed later. If
-set, checksums are calculated for all objects, in all databases.

Re: Online enabling of checksums

2018-04-05 Thread Andrey Borodin


> On 5 Apr 2018, at 19:58, Magnus Hagander wrote:
> 
> 
> 
> On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin  wrote:
> 
> 
> > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend
> cancellation and the only suspicious thing I found is that you can start
> multiple workers by cancelling the launcher and not cancelling the worker.
> Is that problematic behavior? If we run pg_enable_data_checksums() it
> checks for an existing launcher for a reason; maybe it should check for
> workers too?
> 
> I don't think it's a problem in itself -- it will cause pointless work, but 
> not actually cause any problems, I think (whereas duplicate launchers could
> cause interesting things to happen).
> 
> How did you actually cancel the launcher to end up in this situation?
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;
select pg_enable_data_checksums(1,1);
select pg_sleep(0.1);
select pg_cancel_backend(pid),backend_type from pg_stat_activity where 
backend_type ~ 'checksumhelper launcher' ;

select pid,backend_type from pg_stat_activity where backend_type ~'checks';
  pid  | backend_type  
---+---
 98587 | checksumhelper worker
 98589 | checksumhelper worker
 98591 | checksumhelper worker
(3 rows)

There is a way to shoot yourself in the foot then by calling
pg_disable_data_checksums(), but doing that would be extremely foolish of a user.

Best regards, Andrey Borodin.


Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Thu, Apr 5, 2018 at 4:55 PM, Andrey Borodin  wrote:

>
>
> > On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> >
> > This patch version seems fine to me. I'm inclined to mark it RFC.
> +1
> The patch works fine for me. I've tried different combinations of backend
> cancellation and the only suspicious thing I found is that you can start
> multiple workers by cancelling the launcher and not cancelling the worker.
> Is that problematic behavior? If we run pg_enable_data_checksums() it
> checks for an existing launcher for a reason; maybe it should check for
> workers too?
>

I don't think it's a problem in itself -- it will cause pointless work, but
not actually cause any problems, I think (whereas duplicate launchers could
cause interesting things to happen).

How did you actually cancel the launcher to end up in this situation?

I think it's worth capitalizing WAL in "re-write the page to wal".
>

In the comment, right? Yeah, easy fix.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Online enabling of checksums

2018-04-05 Thread Andrey Borodin


> On 5 Apr 2018, at 14:33, Tomas Vondra wrote:
> 
> This patch version seems fine to me. I'm inclined to mark it RFC.
+1
The patch works fine for me. I've tried different combinations of backend
cancellation and the only suspicious thing I found is that you can start
multiple workers by cancelling the launcher and not cancelling the worker.
Is that problematic behavior? If we run pg_enable_data_checksums() it
checks for an existing launcher for a reason; maybe it should check for
workers too?

I think it's worth capitalizing WAL in "re-write the page to wal".

Best regards, Andrey Borodin.


Re: Online enabling of checksums

2018-04-05 Thread Tomas Vondra


On 4/5/18 11:07 AM, Magnus Hagander wrote:
> 
> 
> On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra wrote:
>
> ...
>
> It however still fails to initialize the attempts field after allocating
> the db entry in BuildDatabaseList, so if you try running with
> -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
> 
>  WARNING:  attempts = -1684366952
>  WARNING:  attempts = 1010514489
>  WARNING:  attempts = -1145390664
>  WARNING:  attempts = 1162101570
> 
> I guess those are not the droids we're looking for?
> 
> 
> When looking at that and after a quick discussion, we just decided it's
> better to completely remove the retry logic. As you mentioned in some
> earlier mail, we had all this logic to retry databases (unlikely) but
> not relations (likely). Attached patch simplifies it to only detect the
> "database was dropped" case (which is fine), and consider every other
> kind of failure a permanent one and just not turn on checksums in those
> cases.
> 

OK, works for me.

> 
> 
> Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
> initialized. I think it only ever gets set in launcher_exit(), but that
> does not seem sufficient. I suspect it's the reason for this behavior:
> 
> 
> It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
> (The whole memset-to-zero)
> 

OK, seems fine now.

> 
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  database "template0" does not allow connections
>     HINT:  Allow connections using ALTER DATABASE and try again.
>     test=# alter database template0 allow_connections = true;
>     ALTER DATABASE
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: already running
>     test=# select pg_disable_data_checksums();
>      pg_disable_data_checksums
>     ---
> 
>     (1 row)
> 
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: has been cancelled
> 
> 
> Turns out that wasn't the problem. The problem was that we *set* it
> before erroring out with the "does not allow connections", but never
> cleared it. In that case, it would be listed as launcher-is-running even
> though the launcher was never started.
> 
> Basically the check for datallowconn was put in the wrong place. That
> check should go away completely once we merge (because we should also
> merge the part that allows us to bypass it), but for now I have moved
> the check to the correct place.
> 

Ah, OK. I was just guessing.

> 
> 
> Attached is a diff with a couple of minor comment tweaks, and correct
> initialization of the attempts field.
> 
> 
> Thanks. This is included in the attached update, along with the above
> fixes and some other small touches from Daniel. 
> 

This patch version seems fine to me. I'm inclined to mark it RFC.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Online enabling of checksums

2018-04-05 Thread Magnus Hagander
On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra 
wrote:

> On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> > On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander wrote:
> >
> > On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra wrote:
>> > And if you try this with a temporary table (not
> > hidden in transaction,
> > > > so the bgworker can see it), the worker will fail
> > with this:
> > > >
> > > >   ERROR:  cannot access temporary tables of other
> > sessions
> > > >
> > > > But of course, this is just another way how to crash
> > without updating
> > > > the result for the launcher, so checksums may end up
> > being enabled
> > > > anyway.
> > > >
> > > >
> > > > Yeah, there will be plenty of side-effect issues from
> that
> > > > crash-with-wrong-status case. Fixing that will at least
> > make things
> > > > safer -- in that checksums won't be enabled when not put
> > on all pages.
> > > >
> > >
> > > Sure, the outcome with checksums enabled incorrectly is a
> > consequence of
> > > bogus status, and fixing that will prevent that. But that
> > wasn't my main
> > > point here - not articulated very clearly, though.
> > >
> > > The bigger question is how to handle temporary tables
> > gracefully, so
> > > that it does not terminate the bgworker like this at all.
> > This might be
> > > even bigger issue than dropped relations, considering that
> > temporary
> > > tables are pretty common part of applications (and it also
> > includes
> > > CREATE/DROP).
> > >
> > > For some clusters it might mean the online checksum
> > enabling would
> > > crash+restart infinitely (well, until reaching
> MAX_ATTEMPTS).
> > >
> > > Unfortunately, try_relation_open() won't fix this, as the
> > error comes
> > > from ReadBufferExtended. And it's not a matter of simply
> > creating a
> > > ReadBuffer variant without that error check, because
> > temporary tables
> > > use local buffers.
> > >
> > > I wonder if we could just go and set the checksums anyway,
> > ignoring the
> > > local buffers. If the other session does some changes,
> > it'll overwrite
> > > our changes, this time with the correct checksums. But it
> > seems pretty
> > > dangerous (I mean, what if they're writing stuff while
> > we're updating
> > > the checksums? Considering the various short-cuts for
> > temporary tables,
> > > I suspect that would be a boon for race conditions.)
> > >
> > > Another option would be to do something similar to running
> > transactions,
> > > i.e. wait until all temporary tables (that we've seen at
> > the beginning)
> > > disappear. But we're starting to wait on more and more
> stuff.
> > >
> > > If we do this, we should clearly log which backends we're
> > waiting for,
> > > so that the admins can go and interrupt them manually.
> > >
> > >
> > >
> > > Yeah, waiting for all transactions at the beginning is pretty
> > simple.
> > >
> > > Making the worker simply ignore temporary tables would also be
> > easy.
> > >
> > > One of the bigger issues here is temporary tables are
> > *session* scope
> > > and not transaction, so we'd actually need the other session
> > to finish,
> > > not just the transaction.
> > >
> > > I guess what we could do is something like this:
> > >
> > > 1. Don't process temporary tables in the checksumworker,
> period.
> > > Instead, build a list of any temporary tables that existed
> > when the
> > > worker started in this particular database (basically anything
> > that we
> > > got in our scan). Once we have processed the complete
> > database, keep
> > > re-scanning pg_class until those particular tables are gone
> > (search by oid).
> > >
> > > That means that any temporary tables that are created *while*
> > we are
> > > processing a database are ignored, but they should already be
> > receiving
> > > checksums.
> > >
> > > It 

Re: Online enabling of checksums

2018-04-04 Thread Michael Banck
Hi,

On Tue, Apr 03, 2018 at 02:05:04PM +0200, Magnus Hagander wrote:
> PFA a rebase on top of the just committed verify-checksums patch.

For the record, I am on vacation this week and won't be able to do
further in-depth review or testing of this patch before the end of the
commitfest, sorry.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer



Re: Online enabling of checksums

2018-04-03 Thread Tomas Vondra
On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander wrote:
> 
> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra wrote:
> 
> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra wrote:
> >
> > ...
> >
> >     I do think just waiting for all running transactions to 
> complete is
> >     fine, and it's not the first place where we use it - CREATE 
> SUBSCRIPTION
> >     does pretty much exactly the same thing (and CREATE INDEX 
> CONCURRENTLY
> >     too, to some extent). So we have a precedent / working code we 
> can copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher 
> before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
> 
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want to make the checksumming proceed, knowing
> the oldest running XID is useless for that. If we log waitforxid, it
> can be used to query pg_stat_activity and interrupt the sessions
> somehow.
> 
> 
> Yeah, makes sense. Updated.
> 
>  
> 
> >     >     And if you try this with a temporary table (not
> hidden in transaction,
> >     >     so the bgworker can see it), the worker will fail
> with this:
> >     >
> >     >       ERROR:  cannot access temporary tables of other
> sessions
> >     >
> >     >     But of course, this is just another way how to crash
> without updating
> >     >     the result for the launcher, so checksums may end up
> being enabled
> >     >     anyway.
> >     >
> >     >
> >     > Yeah, there will be plenty of side-effect issues from that
> >     > crash-with-wrong-status case. Fixing that will at least
> make things
> >     > safer -- in that checksums won't be enabled when not put
> on all pages. 
> >     >
> >
> >     Sure, the outcome with checksums enabled incorrectly is a
> consequence of
> >     bogus status, and fixing that will prevent that. But that
> wasn't my main
> >     point here - not articulated very clearly, though.
> >
> >     The bigger question is how to handle temporary tables
> gracefully, so
> >     that it does not terminate the bgworker like this at all.
> This might be
> >     even bigger issue than dropped relations, considering that
> temporary
> >     tables are pretty common part of applications (and it also
> includes
> >     CREATE/DROP).
> >
> >     For some clusters it might mean the online checksum
> enabling would
> >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> >     Unfortunately, try_relation_open() won't fix this, as the
> error comes
> >     from ReadBufferExtended. And it's not a matter of simply
> creating a
> >     ReadBuffer variant without that error check, because
> temporary tables
> >     use local buffers.
> >
> >     I wonder if we could just go and set the checksums anyway,
> ignoring the
> >     local buffers. If the other session does some changes,
> it'll overwrite
> >     our changes, this time with the correct checksums. But it
> seems pretty
> >     dangerous (I mean, what if they're writing stuff while
> we're updating
> >     the checksums? Considering the various short-cuts for
> temporary tables,
> >     I suspect that would be a boon for race conditions.)
> >
> >     Another option would be to do something similar to running
> transactions,
> >     i.e. wait until all temporary tables (that we've seen at
> the beginning)
> >     disappear. But we're starting to wait on more and more stuff.
> >
> >     If we do this, we should clearly log which backends we're
> waiting for,
> >     so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, 

Re: Online enabling of checksums

2018-04-03 Thread Magnus Hagander
On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander  wrote:

> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra wrote:
>> >
>> > ...
>> >
>> > I do think just waiting for all running transactions to complete is
>> > fine, and it's not the first place where we use it - CREATE
>> SUBSCRIPTION
>> > does pretty much exactly the same thing (and CREATE INDEX
>> CONCURRENTLY
>> > too, to some extent). So we have a precedent / working code we can
>> copy.
>> >
>> >
>> > Thinking again, I don't think it should be done as part of
>> > BuildRelationList(). We should just do it once in the launcher before
>> > starting, that'll be both easier and cleaner. Anything started after
>> > that will have checksums on it, so we should be fine.
>> >
>> > PFA one that does this.
>> >
>>
>> Seems fine to me. I'd however log waitforxid, not the oldest one. If
>> you're a DBA and you want to make the checksumming proceed, knowing
>> the oldest running XID is useless for that. If we log waitforxid, it can
>> be used to query pg_stat_activity and interrupt the sessions somehow.
>>
>
> Yeah, makes sense. Updated.
>
>
>
>> > > And if you try this with a temporary table (not hidden in
>> transaction,
>> > > so the bgworker can see it), the worker will fail with this:
>> > >
>> > >   ERROR:  cannot access temporary tables of other sessions
>> > >
>> > > But of course, this is just another way how to crash without
>> updating
>> > > the result for the launcher, so checksums may end up being
>> enabled
>> > > anyway.
>> > >
>> > >
>> > > Yeah, there will be plenty of side-effect issues from that
>> > > crash-with-wrong-status case. Fixing that will at least make
>> things
>> > > safer -- in that checksums won't be enabled when not put on all
>> pages.
>> > >
>> >
>> > Sure, the outcome with checksums enabled incorrectly is a
>> consequence of
>> > bogus status, and fixing that will prevent that. But that wasn't my
>> main
>> > point here - not articulated very clearly, though.
>> >
>> > The bigger question is how to handle temporary tables gracefully, so
>> > that it does not terminate the bgworker like this at all. This
>> might be
>> > even bigger issue than dropped relations, considering that temporary
>> > tables are pretty common part of applications (and it also includes
>> > CREATE/DROP).
>> >
>> > For some clusters it might mean the online checksum enabling would
>> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>> >
>> > Unfortunately, try_relation_open() won't fix this, as the error
>> comes
>> > from ReadBufferExtended. And it's not a matter of simply creating a
>> > ReadBuffer variant without that error check, because temporary
>> tables
>> > use local buffers.
>> >
>> > I wonder if we could just go and set the checksums anyway, ignoring
>> the
>> > local buffers. If the other session does some changes, it'll
>> overwrite
>> > our changes, this time with the correct checksums. But it seems
>> pretty
>> > dangerous (I mean, what if they're writing stuff while we're
>> updating
>> > the checksums? Considering the various short-cuts for temporary
>> tables,
>> > I suspect that would be a boon for race conditions.)
>> >
>> > Another option would be to do something similar to running
>> transactions,
>> > i.e. wait until all temporary tables (that we've seen at the
>> beginning)
>> > disappear. But we're starting to wait on more and more stuff.
>> >
>> > If we do this, we should clearly log which backends we're waiting
>> for,
>> > so that the admins can go and interrupt them manually.
>> >
>> >
>> >
>> > Yeah, waiting for all transactions at the beginning is pretty simple.
>> >
>> > Making the worker simply ignore temporary tables would also be easy.
>> >
>> > One of the bigger issues here is temporary tables are *session* scope
>> > and not transaction, so we'd actually need the other session to finish,
>> > not just the transaction.
>> >
>> > I guess what we could do is something like this:
>> >
>> > 1. Don't process temporary tables in the checksumworker, period.
>> > Instead, build a list of any temporary tables that existed when the
>> > worker started in this particular database (basically anything that we
>> > got in our scan). Once we have processed the complete database, keep
>> > re-scanning pg_class until those particular tables are gone (search by
>> oid).
>> >
>> > That means that any temporary tables that are created *while* we are
>> > processing a database are ignored, but they should already be receiving
>> > checksums.
>> >
>> > It definitely leads to a potential issue 

Re: Online enabling of checksums

2018-04-01 Thread Magnus Hagander
On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra 
wrote:

> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra wrote:
> >
> > ...
> >
> > I do think just waiting for all running transactions to complete is
> > fine, and it's not the first place where we use it - CREATE
> SUBSCRIPTION
> > does pretty much exactly the same thing (and CREATE INDEX
> CONCURRENTLY
> > too, to some extent). So we have a precedent / working code we can
> copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want to make the checksumming proceed, knowing
> the oldest running XID is useless for that. If we log waitforxid, it can
> be used to query pg_stat_activity and interrupt the sessions somehow.
>

Yeah, makes sense. Updated.



> > > And if you try this with a temporary table (not hidden in
> transaction,
> > > so the bgworker can see it), the worker will fail with this:
> > >
> > >   ERROR:  cannot access temporary tables of other sessions
> > >
> > > But of course, this is just another way how to crash without
> updating
> > > the result for the launcher, so checksums may end up being
> enabled
> > > anyway.
> > >
> > >
> > > Yeah, there will be plenty of side-effect issues from that
> > > crash-with-wrong-status case. Fixing that will at least make things
> > > safer -- in that checksums won't be enabled when not put on all
> pages.
> > >
> >
> > Sure, the outcome with checksums enabled incorrectly is a
> consequence of
> > bogus status, and fixing that will prevent that. But that wasn't my
> main
> > point here - not articulated very clearly, though.
> >
> > The bigger question is how to handle temporary tables gracefully, so
> > that it does not terminate the bgworker like this at all. This might
> be
> > even bigger issue than dropped relations, considering that temporary
> > tables are pretty common part of applications (and it also includes
> > CREATE/DROP).
> >
> > For some clusters it might mean the online checksum enabling would
> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> > Unfortunately, try_relation_open() won't fix this, as the error comes
> > from ReadBufferExtended. And it's not a matter of simply creating a
> > ReadBuffer variant without that error check, because temporary tables
> > use local buffers.
> >
> > I wonder if we could just go and set the checksums anyway, ignoring
> the
> > local buffers. If the other session does some changes, it'll
> overwrite
> > our changes, this time with the correct checksums. But it seems
> pretty
> > dangerous (I mean, what if they're writing stuff while we're updating
> > the checksums? Considering the various short-cuts for temporary
> tables,
> > I suspect that would be a boon for race conditions.)
> >
> > Another option would be to do something similar to running
> transactions,
> > i.e. wait until all temporary tables (that we've seen at the
> beginning)
> > disappear. But we're starting to wait on more and more stuff.
> >
> > If we do this, we should clearly log which backends we're waiting
> for,
> > so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty simple.
> >
> > Making the worker simply ignore temporary tables would also be easy.
> >
> > One of the bigger issues here is temporary tables are *session* scope
> > and not transaction, so we'd actually need the other session to finish,
> > not just the transaction.
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed when the
> > worker started in this particular database (basically anything that we
> > got in our scan). Once we have processed the complete database, keep
> > re-scanning pg_class until those particular tables are gone (search by
> oid).
> >
> > That means that any temporary tables that are created *while* we are
> > processing a database are ignored, but they should already be receiving
> > checksums.
> >
> > It definitely leads to a potential issue with long running temp tables.
> > But as long as we look at the *actual tables* (by oid), we should be
> > able to handle long-running sessions once they have dropped their temp
> > tables.
> >
> > Does that sound 

Re: Online enabling of checksums

2018-03-31 Thread Magnus Hagander
On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra 
wrote:

>
> On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra wrote:
> >
> >
> > But wait - there is more ;-) BuildRelationList is using
> heap_beginscan
> > with the regular snapshot, so it does not see uncommitted
> transactions.
> > So if you do this:
> >
> >   BEGIN;
> >   CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i);
> >   -- run pg_enable_data_checksums() from another session
> >   SELECT COUNT(*) FROM t;
> >
> > then the table will be invisible to the checksum worker, it won't
> have
> > checksums updated and the cluster will get checksums enabled. Which
> > means this:
> >
> >
> > Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> > that I forgot to check that specific case :/
> >
>
> But I was faster in reporting it ;-)
>

Indeed you were :)



> >   test=# SELECT COUNT(*) FROM t;
> >   WARNING:  page verification failed, calculated checksum 27170 but
> > expected 0
> >   ERROR:  invalid page in block 0 of relation base/16677/16683
> >
> > Not sure what's the best way to fix this - maybe we could wait for
> all
> > running transactions to end, before starting the work.
> >
> >
> > I was thinking of that as one way to deal with it, yes.
> >
> > I guess a reasonable way to do that would be to do it as part of
> > BuildRelationList() -- basically have that one wait until there is no
> > other running transaction in that specific database before we take the
> > snapshot?
> >
> > A first thought I had was to try to just take an access exclusive lock
> > on pg_class for a very short time, but a transaction that does create
> > table doesn't actually keep it's lock on that table so there is no
> conflict.
> >
>
> Yeah, I don't think that's going to work, because you don't even know
> you need to lock/wait for something.


> I do think just waiting for all running transactions to complete is
> fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
> does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
> too, to some extent). So we have a precedent / working code we can copy.
>

Thinking again, I don't think it should be done as part of
BuildRelationList(). We should just do it once in the launcher before
starting, that'll be both easier and cleaner. Anything started after that
will have checksums on it, so we should be fine.

PFA one that does this.
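For reference, the kind of up-front wait the launcher would do, borrowed
from what CREATE INDEX CONCURRENTLY does, might look roughly like this.
This is C-style pseudocode recalling the backend API from memory, not code
taken from the patch:

```c
/* Pseudocode sketch: wait for every transaction that was already running
 * when the launcher started.  Anything that begins afterwards sees the
 * "inprogress" checksum state and writes checksummed pages itself. */
static void
WaitForOlderTransactions(void)
{
    VirtualTransactionId *vxids;
    int         nvxids;

    /* Snapshot of all currently running transactions, in all databases. */
    vxids = GetCurrentVirtualXIDs(InvalidTransactionId,
                                  true, true, 0, &nvxids);

    for (int i = 0; i < nvxids; i++)
        VirtualXactLock(vxids[i], true);    /* block until it finishes */
}
```

Doing this once in the launcher, before any relation list is built, is what
makes the "anything started after that will have checksums" argument hold.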


> And if you try this with a temporary table (not hidden in transaction,
> > so the bgworker can see it), the worker will fail with this:
> >
> >   ERROR:  cannot access temporary tables of other sessions
> >
> > But of course, this is just another way how to crash without updating
> > the result for the launcher, so checksums may end up being enabled
> > anyway.
> >
> >
> > Yeah, there will be plenty of side-effect issues from that
> > crash-with-wrong-status case. Fixing that will at least make things
> > safer -- in that checksums won't be enabled when not put on all pages.
> >
>
> Sure, the outcome with checksums enabled incorrectly is a consequence of
> bogus status, and fixing that will prevent that. But that wasn't my main
> point here - not articulated very clearly, though.
>
> The bigger question is how to handle temporary tables gracefully, so
> that it does not terminate the bgworker like this at all. This might be
> even bigger issue than dropped relations, considering that temporary
> tables are pretty common part of applications (and it also includes
> CREATE/DROP).
>
> For some clusters it might mean the online checksum enabling would
> crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
> Unfortunately, try_relation_open() won't fix this, as the error comes
> from ReadBufferExtended. And it's not a matter of simply creating a
> ReadBuffer variant without that error check, because temporary tables
> use local buffers.
>
> I wonder if we could just go and set the checksums anyway, ignoring the
> local buffers. If the other session does some changes, it'll overwrite
> our changes, this time with the correct checksums. But it seems pretty
> dangerous (I mean, what if they're writing stuff while we're updating
> the checksums? Considering the various short-cuts for temporary tables,
> I suspect that would be a boon for race conditions.)
>
> Another option would be to do something similar to running transactions,
> i.e. wait until all temporary tables (that we've seen at the beginning)
> disappear. But we're starting to wait on more and more stuff.
>
> If we do this, we should clearly log which backends we're waiting for,
> so that the admins can go and interrupt them manually.
>


Yeah, waiting for all transactions at the beginning is pretty simple.

Making the 

Re: Online enabling of checksums

2018-03-31 Thread Tomas Vondra
Hi,

On 03/31/2018 02:02 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra
> > wrote:
>
> ...
> 
> (a) Should not be difficult to do, I think. We don't have relation_open
> with a missing_ok flag, but implementing something like that should not
> be difficult. Even a simple "does OID exist" should be enough.
> 
> 
> Not entirely sure what you mean by "even a simple does oid exist"?
> I mean, if we check for the file, that won't help us -- there
> will still be a tiny race between the check and us opening it, won't it?
> 

I meant to say "even a simple check if the OID still exists" but it was
a bit of a too late / not enough caffeine issue. You're right, there would
be a tiny race condition window - it'd be much shorter though, possibly
short enough to make the error+restart approach acceptable.

> However, we have try_relation_open().  Which is documented as:
>  *Same as relation_open, except return NULL instead of failing
>  *if the relation does not exist.
> 
> So I'm pretty sure it's just a matter of using try_relation_open()
> instead, and checking for NULL?
> 

Oh, right. I thought we had a relation_open variant that handles this,
but have been looking for one with missing_ok flag and so I missed this.
try_relation_open should do the trick when it comes to dropped tables.

> 
> (b) But just handling dropped relations is not enough, because I could
> simply kill the bgworker directly, and it would have exactly the same
> consequences. What needs to happen is something like this:
> 
> 
> 
> And now I see your code, which was below the fold when I first read, after
> having written a very similar fix myself. I'm glad this code looks
> mostly identical to what I suggested above, so I think that's a good
> solution.
> 

Good ;-)

>  
> 
> BTW I don't think handling dropped relations by letting the bgworker
> crash and restart is an acceptable approach. That would pretty much mean
> any DDL changes are prohibited on the system while the checksum process
> is running, which is not quite possible (e.g. for systems doing stuff
> with temporary tables).
> 
> 
> No, I don't like that at all. We need to handle them gracefully, by
> skipping them - crash and restart is not acceptable for something that
> common.
> 

Yeah, I was just thinking aloud.

> 
> Which however reminds me I've also run into a bug in the automated retry
> system, because you may get messages like this:
> 
>     ERROR:  failed to enable checksums in "test", giving up (attempts
>             639968292).
> 
> This happens because BuildDatabaseList() does just palloc() and does not
> initialize the 'attempts' field. It may get initialized to 0 by chance,
> but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
> high value.
> 
> 
> Eh. I don't have that "(attempts" part in my code at all. Is that either
> from some earlier version of the patch, or did you add that yourself for
> testing?
> 

Apologies, you're right - I tweaked the message a bit (just adding the
number of attempts to it). The logic however remains the same, and the
bug is real.

>
> But wait - there is more ;-) BuildRelationList is using heap_beginscan
> with the regular snapshot, so it does not see uncommitted transactions.
> So if you do this:
> 
>   BEGIN;
>   CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i);
>   -- run pg_enable_data_checksums() from another session
>   SELECT COUNT(*) FROM t;
> 
> then the table will be invisible to the checksum worker, so it won't have
> its checksums updated, and yet the cluster will get checksums enabled. Which
> means this:
> 
> 
> Ugh. Interestingly enough I just put that on my TODO list *yesterday*
> that I forgot to check that specific case :/
> 

But I was faster in reporting it ;-)

> 
>   test=# SELECT COUNT(*) FROM t;
>   WARNING:  page verification failed, calculated checksum 27170 but
>             expected 0
>   ERROR:  invalid page in block 0 of relation base/16677/16683
> 
> Not sure what's the best way to fix this - maybe we could wait for all
> running transactions to end, before starting the work.
> 
> 
> I was thinking of that as one way to deal with it, yes.
> 
> I guess a reasonable way to do that would be to do it as part of
> BuildRelationList() -- basically have that one wait until there is no
> other running transaction in that specific database before we take the
> snapshot?
> 
> A first thought I had was to try to just take an access exclusive lock
> on pg_class for a very short time, but a transaction that does create
> table doesn't actually keep its lock on that table, so there is no conflict.
> 

Yeah, I don't think that's going to work, because you don't even know
you need to lock/wait for something.

I do think just waiting for all running transactions to complete is
fine, 

Re: Online enabling of checksums

2018-03-31 Thread Magnus Hagander
On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra 
wrote:

> Hi,
>
> I've been looking at the patch a bit more, and I think there are a
> couple of fairly serious issues in the error handling.
>

Thanks!


>
> Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
> skipping dropped databases, but ChecksumHelperWorkerMain does not do the
> same thing with tables. I'm not exactly sure why, but I'd say dropped
> tables are more likely than dropped databases (e.g. because of temporary
> tables) and it's strange to gracefully handle only the rarer case.
>

Uh, yes. I could've sworn we had code for that, but I fully agree with your
assessment that it's not there :)



Now, when a table gets dropped after BuildRelationList() does its work,
> we end up calling ProcessSingleRelationByOid() on that OID. Which calls
> relation_open(), which fails with elog(ERROR), terminating the whole
> bgworker with an error like this:
>
> ERROR:  could not open relation with OID 16632
> LOG:  background worker "checksumhelper worker" (PID 27152) exited
>   with exit code 1
>

Yeah. We need to trap that error and not crash and burn.


Which however means the error handling in ChecksumHelperWorkerMain() has
> no chance to kick in, because the bgworker dies right away. The code
> looks like this:
>
> foreach(lc, RelationList)
> {
> ChecksumHelperRelation *rel
>  = (ChecksumHelperRelation *) lfirst(lc);
>
> if (!ProcessSingleRelationByOid(rel->reloid, strategy))
> {
> ChecksumHelperShmem->success = ABORTED;
> break;
> }
> else
> ChecksumHelperShmem->success = SUCCESSFUL;
> }
> list_free_deep(RelationList);
>
> Now, assume the first relation in the list still exists and gets
> processed correctly, so "success" ends up being SUCCESSFUL. Then the
> second OID is the dropped relation, which kills the bgworker ...
>

Indeed, that's just a very simple bug. It shouldn't be set to SUCCESSFUL
until *all* tables have been processed. I believe the code needs to be this:


IMHO this error handling is broken by design - two things need to
> happen, I think: (a) graceful handling of dropped relations and (b)
> proper error reporting from the bgworker.
>
> (a) Should not be difficult to do, I think. We don't have relation_open
> with a missing_ok flag, but implementing something like that should not
> be difficult. Even a simple "does OID exist" should be enough.
>

Not entirely sure what you mean by "even a simple does oid exist"?
I mean, if we check for the file, that won't help us -- there will still be
a tiny race between the check and us opening it, won't it?

However, we have try_relation_open().  Which is documented as:
 * Same as relation_open, except return NULL instead of failing
 * if the relation does not exist.

So I'm pretty sure it's just a matter of using try_relation_open() instead,
and checking for NULL?


(b) But just handling dropped relations is not enough, because I could
> simply kill the bgworker directly, and it would have exactly the same
> consequences. What needs to happen is something like this:
>


And now I see your code, which was below the fold when I first read, after
having written a very similar fix myself. I'm glad this code looks mostly
identical to what I suggested above, so I think that's a good solution.



> BTW I don't think handling dropped relations by letting the bgworker
> crash and restart is an acceptable approach. That would pretty much mean
> any DDL changes are prohibited on the system while the checksum process
> is running, which is not quite possible (e.g. for systems doing stuff
> with temporary tables).
>

No, I don't like that at all. We need to handle them gracefully, by
skipping them - crash and restart is not acceptable for something that
common.


Which however reminds me I've also run into a bug in the automated retry
> system, because you may get messages like this:
>
> ERROR:  failed to enable checksums in "test", giving up (attempts
> 639968292).
>
> This happens because BuildDatabaseList() does just palloc() and does not
> initialize the 'attempts' field. It may get initialized to 0 by chance,
> but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
> high value.
>

Eh. I don't have that "(attempts" part in my code at all. Is that either
from some earlier version of the patch, or did you add that yourself for
testing?



BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
> a 'success' field which is actually unused (and uninitialized).
>

Correct. These are old leftovers from the "partial restart" logic, and
should be removed.


But wait - there is more ;-) BuildRelationList is using heap_beginscan
> with the regular snapshot, so it does not see uncommitted transactions.
> So if you do this:
>
>   BEGIN;
>   CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i);
>   -- run 

Re: Online enabling of checksums

2018-03-30 Thread Tomas Vondra
Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Firstly ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables) and it's strange to gracefully handle only the rarer case.

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID. Which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

ERROR:  could not open relation with OID 16632
LOG:  background worker "checksumhelper worker" (PID 27152) exited
  with exit code 1

Which however means the error handling in ChecksumHelperWorkerMain() has
no chance to kick in, because the bgworker dies right away. The code
looks like this:

foreach(lc, RelationList)
{
ChecksumHelperRelation *rel
 = (ChecksumHelperRelation *) lfirst(lc);

if (!ProcessSingleRelationByOid(rel->reloid, strategy))
{
ChecksumHelperShmem->success = ABORTED;
break;
}
else
ChecksumHelperShmem->success = SUCCESSFUL;
}
list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

The launcher however does not realize anything went wrong, because the
flag still says SUCCESSFUL. And so it merrily switches checksums to
"on", leading to this on the rest of the relations:

WARNING:  page verification failed, calculated checksum 58644 but
  expected 0
ERROR:  invalid page in block 0 of relation base/16631/16653

Yikes!

IMHO this error handling is broken by design - two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworker.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should not
be difficult. Even a simple "does OID exist" should be enough.

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

ChecksumHelperResult local_success = SUCCESSFUL;

foreach(lc, RelationList)
{
ChecksumHelperRelation *rel
 = (ChecksumHelperRelation *) lfirst(lc);

if (!ProcessSingleRelationByOid(rel->reloid, strategy))
{
local_success = ABORTED;
break;
}
}
list_free_deep(RelationList);

ChecksumHelperShmem->success = local_success;

That is, leave the flag in shared memory set to FAILED until the very
last moment, and only when everything went fine set it to SUCCESSFUL.


BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not quite possible (e.g. for systems doing stuff
with temporary tables).

Which however reminds me I've also run into a bug in the automated retry
system, because you may get messages like this:

ERROR:  failed to enable checksums in "test", giving up (attempts
639968292).

This happens because BuildDatabaseList() does just palloc() and does not
initialize the 'attempts' field. It may get initialized to 0 by chance,
but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely
high value.

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have
a 'success' field which is actually unused (and uninitialized).

But wait - there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see uncommitted transactions.
So if you do this:

  BEGIN;
  CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i);
  -- run pg_enable_data_checksums() from another session
  SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, so it won't have
its checksums updated, and yet the cluster will get checksums enabled. Which
means this:

  test=# SELECT COUNT(*) FROM t;
  WARNING:  page verification failed, calculated checksum 27170 but
expected 0
  ERROR:  invalid page in block 0 of relation base/16677/16683

Not sure what's the best way to fix this - maybe we could wait for all
running transactions to end, before starting the work.

And if you try this with a temporary table (not hidden in a transaction,
so the bgworker can see it), the worker will fail with this:

  ERROR:  cannot access temporary tables of other sessions

But 
